Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754038AbdCHXns (ORCPT ); Wed, 8 Mar 2017 18:43:48 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:35156 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750869AbdCHXnq (ORCPT ); Wed, 8 Mar 2017 18:43:46 -0500 Date: Wed, 8 Mar 2017 15:41:52 -0800 From: "Paul E. McKenney" To: linux-kernel@vger.kernel.org Cc: mingo@redhat.com, peterz@infradead.org, fweisbec@gmail.com, tglx@linutronix.de Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called Reply-To: paulmck@linux.vnet.ibm.com References: <20170308221656.GA11949@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170308221656.GA11949@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17030823-0008-0000-0000-000001B11E60 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006747; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000206; SDB=6.00831908; UDB=6.00408205; IPR=6.00609500; BA=6.00005201; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00014565; XFM=3.00000013; UTC=2017-03-08 23:41:53 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17030823-0009-0000-0000-000033A48435 Message-Id: <20170308234152.GA14819@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-03-08_18:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1703080183 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7974 Lines: 203 On Wed, Mar 08, 2017 at 02:16:56PM -0800, Paul E. McKenney wrote: > Hello! > > I am seeing the following splat in rcutorture testing of v4.11-rc1: > > [ 30.694013] ============================= > [ 30.694013] WARNING: suspicious RCU usage > [ 30.694013] 4.11.0-rc1+ #1 Not tainted > [ 30.694013] ----------------------------- > [ 30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held! > [ 30.694013] > [ 30.694013] other info that might help us debug this: > [ 30.694013] > [ 30.694013] > [ 30.694013] RCU used illegally from offline CPU! > [ 30.694013] rcu_scheduler_active = 2, debug_locks = 0 > [ 30.694013] no locks held by swapper/1/0. > [ 30.694013] > [ 30.694013] stack backtrace: > [ 30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1 > [ 30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > [ 30.694013] Call Trace: > [ 30.694013] dump_stack+0x67/0x99 > [ 30.694013] lockdep_rcu_suspicious+0xe7/0x120 > [ 30.694013] get_work_pool+0x82/0x90 > [ 30.694013] __queue_work+0x70/0x5f0 > [ 30.694013] queue_work_on+0x33/0x70 > [ 30.694013] clear_sched_clock_stable+0x33/0x40 > [ 30.694013] early_init_intel+0xe7/0x2f0 > [ 30.694013] init_intel+0x11/0x350 > [ 30.694013] identify_cpu+0x344/0x5a0 > [ 30.694013] identify_secondary_cpu+0x18/0x80 > [ 30.694013] smp_store_cpu_info+0x39/0x40 > [ 30.694013] start_secondary+0x4e/0x100 > [ 30.694013] start_cpu+0x14/0x14 > > Here is the relevant code from x86's smp_callin(): > > /* > * Save our processor parameters. Note: this information > * is needed for clock calibration. > */ > smp_store_cpu_info(cpuid); > > /* > * Get our bogomips. > * Update loops_per_jiffy in cpu_data. Previous call to > * smp_store_cpu_info() stored a value that is close but not as > * accurate as the value just calculated. > */ > calibrate_delay(); > cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy; > pr_debug("Stack at about %p\n", &cpuid); > > /* > * This must be done before setting cpu_online_mask > * or calling notify_cpu_starting. > */ > set_cpu_sibling_map(raw_smp_processor_id()); > wmb(); > > notify_cpu_starting(cpuid); > > The problem is that smp_store_cpu_info() indirectly invokes > schedule_work(), which wants to use RCU. But RCU isn't informed > of the incoming CPU until the call to notify_cpu_starting(), which > causes lockdep to complain bitterly about the use of RCU by the > premature call to schedule_work(). > > I considered just moving the notify_cpu_starting() earlier in the > sequence, but the comments make it seem like this would not be > a wise choice. > > Any suggestions? And I should hasten to mention my default approach, which would be to provide a Kconfig option that causes notify_cpu_starting() to skip calling rcu_cpu_starting(), and to add a call to this function in x86's smp_callin() just before the call to smp_store_cpu_info(). But this seemed a bit crude, so I figured I should check with you guys before taking that approach. But what is the fun in that? Please see below for the corresponding lightly tested patch. ;-) Thanx, Paul ------------------------------------------------------------------------ commit aa805dad9ae66cc4f5106d004b6bb5102fd84434 Author: Paul E. McKenney Date: Wed Mar 8 15:32:29 2017 -0800 cpu: Move RCU earlier in x86 CPU-online procedure Running rcutorture on v4.11-rc1 results in the following splat on x86 on kernels built with both CPU hotplug and lockdep: [ 30.694013] ============================= [ 30.694013] WARNING: suspicious RCU usage [ 30.694013] 4.11.0-rc1+ #1 Not tainted [ 30.694013] ----------------------------- [ 30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held! [ 30.694013] [ 30.694013] other info that might help us debug this: [ 30.694013] [ 30.694013] [ 30.694013] RCU used illegally from offline CPU! [ 30.694013] rcu_scheduler_active = 2, debug_locks = 0 [ 30.694013] no locks held by swapper/1/0. [ 30.694013] [ 30.694013] stack backtrace: [ 30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1 [ 30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 30.694013] Call Trace: [ 30.694013] dump_stack+0x67/0x99 [ 30.694013] lockdep_rcu_suspicious+0xe7/0x120 [ 30.694013] get_work_pool+0x82/0x90 [ 30.694013] __queue_work+0x70/0x5f0 [ 30.694013] queue_work_on+0x33/0x70 [ 30.694013] clear_sched_clock_stable+0x33/0x40 [ 30.694013] early_init_intel+0xe7/0x2f0 [ 30.694013] init_intel+0x11/0x350 [ 30.694013] identify_cpu+0x344/0x5a0 [ 30.694013] identify_secondary_cpu+0x18/0x80 [ 30.694013] smp_store_cpu_info+0x39/0x40 [ 30.694013] start_secondary+0x4e/0x100 [ 30.694013] start_cpu+0x14/0x14 This is caused by the fact that smp_store_cpu_info() indirectly invokes schedule_work(), which wants to use RCU. But RCU isn't informed of the incoming CPU until the later call to notify_cpu_starting(), which causes lockdep to complain bitterly about the use of RCU by the premature call to schedule_work(). It is said to be unwise to move the call to notify_cpu_starting() to precede that to smp_store_cpu_info(), so this commit adds a ARCH_RCU_ONLINE_EARLY Kconfig option that is selected by the x86 architecture. This option suppresses the call to rcu_cpu_starting() from notify_cpu_starting(), and a call to rcu_cpu_starting() is added preceding the call to smp_store_cpu_info(). Signed-off-by: Paul E. McKenney Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Thomas Gleixner diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index cc98d5a294ee..bb42b76ec2dd 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -63,6 +63,7 @@ config X86 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO + select ARCH_RCU_ONLINE_EARLY select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index bd1f1ad35284..ca7386d04194 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -188,6 +188,12 @@ static void smp_callin(void) apic_ap_setup(); /* + * Enable RCU readers on this CPU, needed for workqueues, + * which are used in smp_store_cpu_info(). + */ + rcu_cpu_starting(cpuid); + + /* * Save our processor parameters. Note: this information * is needed for clock calibration. */ diff --git a/init/Kconfig b/init/Kconfig index c859c993c26f..31c2d5e08f16 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -543,6 +543,9 @@ config RCU_STALL_COMMON the tiny variants to disable RCU CPU stall warnings, while making these warnings mandatory for the tree variants. +config ARCH_RCU_ONLINE_EARLY + bool + config CONTEXT_TRACKING bool diff --git a/kernel/cpu.c b/kernel/cpu.c index f7c063239fa5..9d98f2785fa0 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -851,7 +851,8 @@ void notify_cpu_starting(unsigned int cpu) struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE); - rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ + if (!IS_ENABLED(CONFIG_ARCH_RCU_ONLINE_EARLY)) + rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ while (st->state < target) { st->state++; cpuhp_invoke_callback(cpu, st->state, true, NULL);