Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp3708089imm; Mon, 18 Jun 2018 02:42:57 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJiUNcOqUBoV6BQeMP15nH5gO/R/R23FjTjtsmu7kmxj8OuJ1rarHlo1J11mlhxntn7LA28 X-Received: by 2002:a63:721d:: with SMTP id n29-v6mr10143378pgc.194.1529314977381; Mon, 18 Jun 2018 02:42:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529314977; cv=none; d=google.com; s=arc-20160816; b=SEzKLmV0PImuYgboCzLWtA8JdRAqePzhakxx+x53ndNBKs2Ksf0lrz5f0YAWpocK4Z I2Dc5eoDDC4FBnbtrvf8KRgbeetxzGVIpsoS+3cGxbT+4SUZhkAj0HMkwnj0Z4ACelXS nyQNZ7aTCSSpDYOkodlXl7IaJ5w0lJz0aiuieTPiwtKyv1buhFhPRbjB/pKv0baUoEqI QhMVj3xiwmS+AXh9sJI+qhFyngwtuZyE0X9nshPDo8qFB+7TWzhnEV5dLHYEWUKdmsjo s9YqlkgMMJ3TGHTr8h2hPejiJHMTjdU58h23kczQFgrbCmg7rBzlb9oV4bkmYvM+VgyA qwgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=D14VMdivnNPuV4+JRqkLSDY96TM30tWNxbe0YFl8kRk=; b=BWIuUMsyN4uixByaZo4K2tJ4eySyS3AZ9NMu/tt0U/7B01aJxz6esxIxyha854F4wC 0Kli51RRknejgUJepzZ43FVAUjtmqWvziQriMUNzgwb8ZLAVlec/BcDP8TNmktQ2X8yY Bt8Ea51w0q63OO7ZZukKSW8BHEwFIDRW4O/9uGCw/9I/Kqv1nWvkzxRFEd+6CvBO56Hx UWKEOjMhJgtpCSTDeUY60+YCyh793H7/MK57Js1epROwmsEh3/vJM1W7/tzImQKK193X /eTpls4xpELkzL2WNq5tNupL3zFUVZ+d2+2rTAjf1DrUDOI760RO/EN5fOepqW0A82T8 WGQw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p11-v6si14492189plq.33.2018.06.18.02.42.43; Mon, 18 Jun 2018 02:42:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935966AbeFRJmC (ORCPT + 99 others); Mon, 18 Jun 2018 05:42:02 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:55610 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935470AbeFRIWk (ORCPT ); Mon, 18 Jun 2018 04:22:40 -0400 Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 03899C93; Mon, 18 Jun 2018 08:22:39 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Gaurav Kohli , "Peter Zijlstra (Intel)" , Linus Torvalds , Oleg Nesterov , Thomas Gleixner , Ingo Molnar , Sasha Levin Subject: [PATCH 4.16 161/279] kthread, sched/wait: Fix kthread_parkme() completion issue Date: Mon, 18 Jun 2018 10:12:26 +0200 Message-Id: <20180618080615.492002239@linuxfoundation.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180618080608.851973560@linuxfoundation.org> References: <20180618080608.851973560@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.16-stable review patch. If anyone has any objections, please let me know. ------------------ From: Peter Zijlstra [ Upstream commit 85f1abe0019fcb3ea10df7029056cf42702283a8 ] Even with the wait-loop fixed, there is a further issue with kthread_parkme(). Upon hotplug, when we do takedown_cpu(), smpboot_park_threads() can return before all those threads are in fact blocked, due to the placement of the complete() in __kthread_parkme(). When that happens, sched_cpu_dying() -> migrate_tasks() can end up migrating such a still runnable task onto another CPU. Normally the task will have hit schedule() and gone to sleep by the time we do kthread_unpark(), which will then do __kthread_bind() to re-bind the task to the correct CPU. However, when we loose the initial TASK_PARKED store to the concurrent wakeup issue described previously, do the complete(), get migrated, it is possible to either: - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate the park and set TASK_RUNNING, or - __kthread_bind()'s wait_task_inactive() to observe the competing TASK_RUNNING store. Either way the WARN() in __kthread_bind() will trigger and fail to correctly set the CPU affinity. Fix this by only issuing the complete() when the kthread has scheduled out. This does away with all the icky 'still running' nonsense. The alternative is to promote TASK_PARKED to a special state, this guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING and we'll end up doing the right thing, but this preserves the whole icky business of potentially migating the still runnable thing. Reported-by: Gaurav Kohli Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/kthread.h | 1 + kernel/kthread.c | 43 +++++++++++++++++++------------------------ kernel/sched/core.c | 32 +++++++++++++++++++++----------- 3 files changed, 41 insertions(+), 35 deletions(-) --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -62,6 +62,7 @@ void *kthread_probe_data(struct task_str int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); +void kthread_park_complete(struct task_struct *k); int kthreadd(void *unused); extern struct task_struct *kthreadd_task; --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -55,7 +55,6 @@ enum KTHREAD_BITS { KTHREAD_IS_PER_CPU = 0, KTHREAD_SHOULD_STOP, KTHREAD_SHOULD_PARK, - KTHREAD_IS_PARKED, }; static inline void set_kthread_struct(void *kthread) @@ -181,11 +180,8 @@ static void __kthread_parkme(struct kthr set_current_state(TASK_PARKED); if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags)) break; - if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags)) - complete(&self->parked); schedule(); } - clear_bit(KTHREAD_IS_PARKED, &self->flags); __set_current_state(TASK_RUNNING); } @@ -195,6 +191,11 @@ void kthread_parkme(void) } EXPORT_SYMBOL_GPL(kthread_parkme); +void kthread_park_complete(struct task_struct *k) +{ + complete(&to_kthread(k)->parked); +} + static int kthread(void *_create) { /* Copy data: it's on kthread's stack */ @@ -451,22 +452,15 @@ void kthread_unpark(struct task_struct * { struct kthread *kthread = to_kthread(k); - clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags); /* - * We clear the IS_PARKED bit here as we don't wait - * until the task has left the park code. So if we'd - * park before that happens we'd see the IS_PARKED bit - * which might be about to be cleared. + * Newly created kthread was parked when the CPU was offline. + * The binding was lost and we need to set it again. */ - if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) { - /* - * Newly created kthread was parked when the CPU was offline. - * The binding was lost and we need to set it again. - */ - if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags)) - __kthread_bind(k, kthread->cpu, TASK_PARKED); - wake_up_state(k, TASK_PARKED); - } + if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags)) + __kthread_bind(k, kthread->cpu, TASK_PARKED); + + clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags); + wake_up_state(k, TASK_PARKED); } EXPORT_SYMBOL_GPL(kthread_unpark); @@ -489,12 +483,13 @@ int kthread_park(struct task_struct *k) if (WARN_ON(k->flags & PF_EXITING)) return -ENOSYS; - if (!test_bit(KTHREAD_IS_PARKED, &kthread->flags)) { - set_bit(KTHREAD_SHOULD_PARK, &kthread->flags); - if (k != current) { - wake_up_process(k); - wait_for_completion(&kthread->parked); - } + if (WARN_ON_ONCE(test_bit(KTHREAD_SHOULD_PARK, &kthread->flags))) + return -EBUSY; + + set_bit(KTHREAD_SHOULD_PARK, &kthread->flags); + if (k != current) { + wake_up_process(k); + wait_for_completion(&kthread->parked); } return 0; --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -30,6 +30,8 @@ #include #include +#include + #include #include #ifdef CONFIG_PARAVIRT @@ -2733,20 +2735,28 @@ static struct rq *finish_task_switch(str membarrier_mm_sync_core_before_usermode(mm); mmdrop(mm); } - if (unlikely(prev_state == TASK_DEAD)) { - if (prev->sched_class->task_dead) - prev->sched_class->task_dead(prev); + if (unlikely(prev_state & (TASK_DEAD|TASK_PARKED))) { + switch (prev_state) { + case TASK_DEAD: + if (prev->sched_class->task_dead) + prev->sched_class->task_dead(prev); + + /* + * Remove function-return probe instances associated with this + * task and put them back on the free list. + */ + kprobe_flush_task(prev); - /* - * Remove function-return probe instances associated with this - * task and put them back on the free list. - */ - kprobe_flush_task(prev); + /* Task is done with its stack. */ + put_task_stack(prev); - /* Task is done with its stack. */ - put_task_stack(prev); + put_task_struct(prev); + break; - put_task_struct(prev); + case TASK_PARKED: + kthread_park_complete(prev); + break; + } } tick_nohz_task_switch();