Date: Thu, 9 Mar 2017 16:12:55 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: "Paul E. McKenney", linux-kernel@vger.kernel.org, mingo@redhat.com, fweisbec@gmail.com
Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called
Message-ID: <20170309151255.GA3343@twins.programming.kicks-ass.net>
References: <20170308221656.GA11949@linux.vnet.ibm.com>

On Thu, Mar 09, 2017 at 02:08:23PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Mar 2017, Paul E. McKenney wrote:
> > [   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
> > [   30.694013]  get_work_pool+0x82/0x90
> > [   30.694013]  __queue_work+0x70/0x5f0
> > [   30.694013]  queue_work_on+0x33/0x70
> > [   30.694013]  clear_sched_clock_stable+0x33/0x40
> > [   30.694013]  early_init_intel+0xe7/0x2f0
> > [   30.694013]  init_intel+0x11/0x350
> > [   30.694013]  identify_cpu+0x344/0x5a0
> > [   30.694013]  identify_secondary_cpu+0x18/0x80
> > [   30.694013]  smp_store_cpu_info+0x39/0x40
> > [   30.694013]  start_secondary+0x4e/0x100
> > [   30.694013]  start_cpu+0x14/0x14
> >
> > Here is the relevant code from x86's smp_callin():
> >
> > 	/*
> > 	 * Save our processor parameters. Note: this information
> > 	 * is needed for clock calibration.
> > 	 */
> > 	smp_store_cpu_info(cpuid);
> >
> > The problem is that smp_store_cpu_info() indirectly invokes
> > schedule_work(), which wants to use RCU.  But RCU isn't informed
> > of the incoming CPU until the call to notify_cpu_starting(), which
> > causes lockdep to complain bitterly about the use of RCU by the
> > premature call to schedule_work().
>
> Right. And that wants to be fixed, not hacked around by silencing RCU.
>
> Peter????

I'm thinking this is hotplug? 30 seconds after boot is far too late for
SMP bringup, or you have a stupid slow machine. Because it only calls
schedule_work() after SMP-init.

In which case there are then two cases, either:

 - TSC was stable, hotplug wrecked it, TSC is now unstable, and we're
   screwed.

 - TSC was unstable, hotplug triggers and we want to mark it unstable
   _again_.

If this is the second, the below should fix it; if it's the first, I've
no idea yet on how to fix that properly :/

Bloody hotplug..

---
 kernel/sched/clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index a08795e..eecf388 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -172,7 +172,7 @@ void clear_sched_clock_stable(void)
 
 	smp_mb(); /* matches sched_clock_init_late() */
 
-	if (sched_clock_running == 2)
+	if (sched_clock_running == 2 && sched_clock_stable())
 		schedule_work(&sched_clock_work);
 }
 
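
To make the two cases above concrete, here is a rough user-space sketch of the patched condition. It is not kernel code: struct clock_model and the model_* helpers are hypothetical names invented for illustration. The fields stand in for sched_clock_running, sched_clock_stable() and a counter of would-be schedule_work() calls. It also assumes that the queued work is what eventually flips the stable flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the patch looks at; not kernel code. */
struct clock_model {
	int running;      /* stands in for sched_clock_running (2 == late init done) */
	bool stable;      /* stands in for what sched_clock_stable() would return */
	int work_queued;  /* counts calls that would have reached schedule_work() */
};

/* Mirrors the patched check: queue work only if the clock is still marked stable. */
static void model_clear_stable(struct clock_model *m)
{
	assert(m != NULL);

	if (m->running == 2 && m->stable)
		m->work_queued++;
}

/* Assumption: the queued work is what marks the clock unstable later on. */
static void model_run_work(struct clock_model *m)
{
	assert(m != NULL);

	m->stable = false;
}
```

In this model the second case (already unstable, hotplug marks it unstable again) never reaches the queueing path, which is all the one-line change is meant to achieve. The first case (stable until hotplug wrecked it) still queues work from the incoming CPU, so it stays unsolved here as well.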