Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754913AbdCTMvr (ORCPT ); Mon, 20 Mar 2017 08:51:47 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:60810
	"EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753513AbdCTMu4 (ORCPT );
	Mon, 20 Mar 2017 08:50:56 -0400
Date: Mon, 20 Mar 2017 05:50:00 -0700
From: "Paul E. McKenney"
To: Tomeu Vizoso
Cc: Peter Zijlstra, Thomas Gleixner, "linux-kernel@vger.kernel.org",
	Ingo Molnar, fweisbec@gmail.com
Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170308221656.GA11949@linux.vnet.ibm.com>
	<20170309151255.GA3343@twins.programming.kicks-ass.net>
	<20170309152926.GT30506@linux.vnet.ibm.com>
	<20170309155030.GA13748@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20170320125000.GG3637@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 20, 2017 at 09:32:37AM +0100, Tomeu Vizoso wrote:
> On 9 March 2017 at 16:50, Paul E. McKenney wrote:
> >
> > On Thu, Mar 09, 2017 at 07:29:26AM -0800, Paul E. McKenney wrote:
> > > On Thu, Mar 09, 2017 at 04:12:55PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Mar 09, 2017 at 02:08:23PM +0100, Thomas Gleixner wrote:
> > > > > On Wed, 8 Mar 2017, Paul E. McKenney wrote:
> > > > > > [   30.694013] lockdep_rcu_suspicious+0xe7/0x120
> > > > > > [   30.694013] get_work_pool+0x82/0x90
> > > > > > [   30.694013] __queue_work+0x70/0x5f0
> > > > > > [   30.694013] queue_work_on+0x33/0x70
> > > > > > [   30.694013] clear_sched_clock_stable+0x33/0x40
> > > > > > [   30.694013] early_init_intel+0xe7/0x2f0
> > > > > > [   30.694013] init_intel+0x11/0x350
> > > > > > [   30.694013] identify_cpu+0x344/0x5a0
> > > > > > [   30.694013] identify_secondary_cpu+0x18/0x80
> > > > > > [   30.694013] smp_store_cpu_info+0x39/0x40
> > > > > > [   30.694013] start_secondary+0x4e/0x100
> > > > > > [   30.694013] start_cpu+0x14/0x14
> > > > > >
> > > > > > Here is the relevant code from x86's smp_callin():
> > > > > >
> > > > > > 	/*
> > > > > > 	 * Save our processor parameters. Note: this information
> > > > > > 	 * is needed for clock calibration.
> > > > > > 	 */
> > > > > > 	smp_store_cpu_info(cpuid);
> > > > > >
> > > > > > The problem is that smp_store_cpu_info() indirectly invokes
> > > > > > schedule_work(), which wants to use RCU. But RCU isn't informed
> > > > > > of the incoming CPU until the call to notify_cpu_starting(), which
> > > > > > causes lockdep to complain bitterly about the use of RCU by the
> > > > > > premature call to schedule_work().
> > > > >
> > > > > Right. And that wants to be fixed, not hacked around by silencing RCU.
> > > > >
> > > > > Peter????
> > > >
> > > > I'm thinking this is hotplug? 30 seconds after boot is far too late for
> > > > SMP bringup, or you have a stupid slow machine.
> > >
> > > And this certainly does qualify as "shortly", thank you!
> > >
> > > Yes, this only happens on hotplug with lockdep enabled, specifically
> > > on rcutorture scenarios TASKS01 and TREE05.
> > >
> > > > Because it only calls schedule_work() after SMP-init. In which case
> > > > there's then two cases, either:
> > > >
> > > >  - TSC was stable, hotplug wrecked it, TSC is now unstable, and we're
> > > >    screwed.
> > > >
> > > >  - TSC was unstable, hotplug triggers and we want to mark it unstable
> > > >    _again_.
> > > >
> > > > If this is the second, the below should fix it, if it's the first, I've
> > > > no idea yet on how to fix that properly :/
> > >
> > > I have applied this patch and started tests on TREE05 and TASKS01, should
> > > get results shortly.
> >
> > And the below patch passed light rcutorture testing, so looking good!
>
> I'm having trouble finding this patch in linux-next, has it been pushed already?

Peter pointed out that this v4.11-rc2 patch should fix the problem, see
Message-ID: <20170316155310.afq6zfzkzrnsqm5n@hirez.programming.kicks-ass.net>.
I rebased to v4.11-rc2, and haven't seen the problem, so I dropped the
patch referred to above.

f94c8d116997 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")

I am not sure whether Peter is sending another patch or was instead
going to amend f94c8d116997's changelog.

							Thanx, Paul