Date: Thu, 3 May 2018 15:56:17 +0200
From: Peter Zijlstra
To: Mike Galbraith
Cc: Matt Fleming, Ingo Molnar, linux-kernel@vger.kernel.org, Michal Hocko, Paul McKenney
Subject: Re: cpu stopper threads and load balancing leads to deadlock
Message-ID: <20180503135617.GC12217@hirez.programming.kicks-ass.net>
In-Reply-To: <1525354359.5576.1.camel@gmx.de>
References: <20180417142119.GA4511@codeblueprint.co.uk>
 <20180420095005.GH4064@hirez.programming.kicks-ass.net>
 <20180424133325.GA3179@codeblueprint.co.uk>
 <1525349542.9956.2.camel@gmx.de>
 <20180503122808.GZ12217@hirez.programming.kicks-ass.net>
 <1525351221.9956.4.camel@gmx.de>
 <20180503124943.GB12217@hirez.programming.kicks-ass.net>
 <1525354359.5576.1.camel@gmx.de>

On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> Dang. With $subject fix applied as well..

That's a NO then... :-(

> [ 151.103732] smpboot: Booting Node 0 Processor 2 APIC 0x4
> [ 151.104908] =============================
> [ 151.104909] WARNING: suspicious RCU usage
> [ 151.104910] 4.17.0.g66d489e-tip-default #84 Tainted: G E
> [ 151.104911] -----------------------------
> [ 151.104912] kernel/sched/core.c:1625 suspicious rcu_dereference_check() usage!
> [ 151.104913]
> other info that might help us debug this:
>
> [ 151.104914]
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 2, debug_locks = 0
> [ 151.104916] 3 locks held by swapper/2/0:
> [ 151.104916]  #0: 00000000560adb60 (stop_cpus_mutex){+.+.}, at: stop_machine_from_inactive_cpu+0x86/0x140
> [ 151.104923]  #1: 00000000e4fb0238 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x2d/0x5f0
> [ 151.104929]  #2: 000000003341403b (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x80
> [ 151.104934]
> stack backtrace:
> [ 151.104937] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G E 4.17.0.g66d489e-tip-default #84
> [ 151.104938] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 151.104938] Call Trace:
> [ 151.104942]  dump_stack+0x78/0xb3
> [ 151.104945]  ttwu_stat+0x121/0x130
> [ 151.104949]  try_to_wake_up+0x2c2/0x5f0
> [ 151.104953]  ? cpu_stop_park+0x30/0x30
> [ 151.104956]  wake_up_q+0x4a/0x70
> [ 151.104959]  cpu_stop_queue_work+0x6b/0xa0
> [ 151.104963]  queue_stop_cpus_work+0x61/0xb0
> [ 151.104968]  stop_machine_from_inactive_cpu+0xd8/0x140

> > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> > index f89014a2c238..a32518c2ba4a 100644
> > --- a/kernel/stop_machine.c
> > +++ b/kernel/stop_machine.c
> > @@ -650,8 +650,10 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
> >  	/* Schedule work on other CPUs and execute directly for local CPU */
> >  	set_state(&msdata, MULTI_STOP_PREPARE);
> >  	cpu_stop_init_done(&done, num_active_cpus());
> > -	queue_stop_cpus_work(cpu_active_mask, multi_cpu_stop, &msdata,
> > -			     &done);
> > +
> > +	RCU_NONIDLE(queue_stop_cpus_work(cpu_active_mask, multi_cpu_stop,
> > +					 &msdata, &done));
> > +
> >  	ret = multi_cpu_stop(&msdata);

Paul, any clue on what else to try here? The whole MTRR setup is
radically crazy, but it's something we're stuck with (yay hardware) :/

So the issue is that we're doing wakeups from an offline CPU (very early
during bringup) and RCU (rightfully) complains about that. I thought
RCU_NONIDLE() was the magic incantation that makes RCU 'watch', but
clearly it's not enough here.
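One possible reading of the splat, offered only as a hedged sketch: the message selected is the "offline CPU" one, and as far as I understand the lockdep code of this era, lockdep_rcu_suspicious() tests whether the CPU is online for RCU (rcu_lockdep_current_cpu_online()) before it tests whether RCU is watching (rcu_is_watching()). RCU_NONIDLE() only addresses the second condition. The user-space C model below is entirely hypothetical: `struct model_cpu`, `enum model_complaint`, `model_rcu_check()` and `model_rcu_nonidle()` are invented for illustration and are not kernel APIs; the two booleans merely stand in for the two kernel predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL model, not kernel code: the two facts RCU's lockdep
 * checks consult about the current CPU. */
struct model_cpu {
	bool rcu_online;   /* stands in for rcu_lockdep_current_cpu_online() */
	bool rcu_watching; /* stands in for rcu_is_watching() */
};

enum model_complaint {
	MODEL_NO_COMPLAINT,
	MODEL_OFFLINE_CPU, /* "RCU used illegally from offline CPU!" */
	MODEL_IDLE_CPU,    /* the idle-CPU variant of the splat */
};

/* Assumed ordering, mirroring the splat's message selection:
 * the offline test is checked first, the idle test second. */
static enum model_complaint model_rcu_check(const struct model_cpu *cpu)
{
	if (!cpu->rcu_online)
		return MODEL_OFFLINE_CPU;
	if (!cpu->rcu_watching)
		return MODEL_IDLE_CPU;
	return MODEL_NO_COMPLAINT;
}

/* Model of RCU_NONIDLE(a): RCU watches while the wrapped code runs,
 * and the previous watching state is restored afterwards.
 * Nothing here changes rcu_online. */
static enum model_complaint
model_rcu_nonidle(struct model_cpu *cpu,
		  enum model_complaint (*wrapped)(const struct model_cpu *))
{
	bool was_watching = cpu->rcu_watching;
	enum model_complaint ret;

	assert(wrapped != NULL);
	cpu->rcu_watching = true;
	ret = wrapped(cpu);
	cpu->rcu_watching = was_watching;
	return ret;
}
```

With a CPU modelled as `{ .rcu_online = false, .rcu_watching = false }` (roughly the early-bringup state in the trace), both `model_rcu_check()` and `model_rcu_nonidle(cpu, model_rcu_check)` report `MODEL_OFFLINE_CPU`, whereas an online-but-idle CPU is silenced by the RCU_NONIDLE() model. Keeping the two states independent is the point of the sketch: under this reading, making RCU watch does not make the CPU online as far as RCU is concerned.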