From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Peter Zijlstra, "Paul E. McKenney", Tejun Heo, Sasha Levin
Subject: [PATCH AUTOSEL 4.9 004/141] cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
Date: Fri, 14 Feb 2020 11:19:04 -0500
Message-Id: <20200214162122.19794-4-sashal@kernel.org>
In-Reply-To: <20200214162122.19794-1-sashal@kernel.org>
References: <20200214162122.19794-1-sashal@kernel.org>

From: Peter Zijlstra

[ Upstream commit 45178ac0cea853fe0e405bf11e101bdebea57b15 ]

Paul reported a very sporadic, rcutorture-induced workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.

Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
instantly complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).

It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:

  BP                                  AP

  bringup_cpu()
    __cpu_up(cpu, idle)        -->    start_secondary()
                                        ...
                                        cpu_startup_entry()
    bringup_wait_for_ap()
      wait_for_ap_thread()     <--        cpuhp_online_idle()
                                          while (1)
                                            do_idle()

                          ... available to run kthreads ...

      stop_machine_unpark()
        stopper->enabled = true;

Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.

Reported-by: "Paul E. McKenney"
Debugged-by: Tejun Heo
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: "Paul E. McKenney"
Signed-off-by: Sasha Levin
---
 kernel/cpu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index c2573e858009b..1fbe93fefc1fa 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -515,8 +515,7 @@ static int bringup_wait_for_ap(unsigned int cpu)
 	if (WARN_ON_ONCE((!cpu_online(cpu))))
 		return -ECANCELED;
 
-	/* Unpark the stopper thread and the hotplug thread of the target cpu */
-	stop_machine_unpark(cpu);
+	/* Unpark the hotplug thread of the target cpu */
 	kthread_unpark(st->thread);
 
 	/*
@@ -1115,8 +1114,8 @@ void notify_cpu_starting(unsigned int cpu)
 
 /*
  * Called from the idle task. Wake up the controlling task which brings the
- * stopper and the hotplug thread of the upcoming CPU up and then delegates
- * the rest of the online bringup to the hotplug thread.
+ * hotplug thread of the upcoming CPU up and then delegates the rest of the
+ * online bringup to the hotplug thread.
  */
 void cpuhp_online_idle(enum cpuhp_state state)
 {
@@ -1126,6 +1125,12 @@ void cpuhp_online_idle(enum cpuhp_state state)
 	if (state != CPUHP_AP_ONLINE_IDLE)
 		return;
 
+	/*
+	 * Unpark the stopper thread before we start the idle loop (and start
+	 * scheduling); this ensures the stopper task is always available.
+	 */
+	stop_machine_unpark(smp_processor_id());
+
 	st->state = CPUHP_AP_ONLINE_IDLE;
 	complete(&st->done);
 }
-- 
2.20.1