Date: Thu, 3 May 2018 14:28:08 +0200
From: Peter Zijlstra
To: Mike Galbraith
Cc: Matt Fleming, Ingo Molnar, linux-kernel@vger.kernel.org, Michal Hocko
Subject: Re: cpu stopper threads and load balancing leads to deadlock
Message-ID: <20180503122808.GZ12217@hirez.programming.kicks-ass.net>
References: <20180417142119.GA4511@codeblueprint.co.uk> <20180420095005.GH4064@hirez.programming.kicks-ass.net> <20180424133325.GA3179@codeblueprint.co.uk> <1525349542.9956.2.camel@gmx.de>
In-Reply-To: <1525349542.9956.2.camel@gmx.de>

On Thu, May 03, 2018 at 02:12:22PM +0200, Mike Galbraith wrote:
> [ 124.216939] =============================
> [ 124.216939] WARNING: suspicious RCU usage
> [ 124.216941] 4.17.0.g66d489e-tip-default #82 Tainted: G E
> [ 124.216941] -----------------------------
> [ 124.216943] kernel/sched/core.c:1614 suspicious rcu_dereference_check() usage!
> [ 124.216944]
> other info that might help us debug this:
>
> [ 124.216945]
> RCU used illegally from offline CPU!
> rcu_scheduler_active = 2, debug_locks = 0
> [ 124.216946] 4 locks held by swapper/2/0:
> [ 124.216947] #0: 000000001f9fa447 (stop_cpus_mutex){+.+.}, at: stop_machine_from_inactive_cpu+0x86/0x130
> [ 124.216953] #1: 000000004cb07b3b (&stopper->lock){..-.}, at: cpu_stop_queue_work+0x2d/0x80
> [ 124.216958] #2: 00000000d3a46b90 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x2d/0x5f0
> [ 124.216964] #3: 00000000f360767b (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x80
> [ 124.216969]
> stack backtrace:
> [ 124.216971] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G E 4.17.0.g66d489e-tip-default #82
> [ 124.216972] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 124.216973] Call Trace:
> [ 124.216977] dump_stack+0x78/0xb3
> [ 124.216979] ttwu_stat+0x121/0x130
> [ 124.216983] try_to_wake_up+0x2c2/0x5f0
> [ 124.216988] ? cpu_stop_park+0x30/0x30
> [ 124.216990] cpu_stop_queue_work+0x7c/0x80
> [ 124.216993] queue_stop_cpus_work+0x61/0xb0
> [ 124.216997] stop_machine_from_inactive_cpu+0xd3/0x130
> [ 124.216999] ? mtrr_restore+0x80/0x80
> [ 124.217005] mtrr_ap_init+0x62/0x70
> [ 124.217008] identify_secondary_cpu+0x18/0x80
> [ 124.217011] smp_store_cpu_info+0x44/0x50
> [ 124.217014] start_secondary+0x9a/0x1e0
> [ 124.217017] secondary_startup_64+0xa5/0xb0

Hurm.. I don't see how this is 'new'. We moved the wakeup out from under
the stopper lock, but that should not affect the RCU state.

The warning is of course valid: stop_machine_from_inactive_cpu() explicitly
runs on an 'offline' CPU. The patch didn't change this.
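
For illustration only, "moving the wakeup out from under the stopper lock"
can be modelled in plain userspace C roughly as below. This is not the patch
under discussion and not kernel code: every name here (toy_task, toy_stopper,
toy_wake_list, toy_queue_work, wakeups_under_lock) is made up for the sketch,
and the "lock" is just a flag. The idea is that the thread to wake is only
recorded while the lock is held, and the wakeup itself is issued after the
unlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Toy stand-ins, all hypothetical; none of these are kernel types. */
struct toy_task {
	const char *name;
	bool awake;
};

struct toy_stopper {
	bool locked;              /* models holding the stopper lock */
	int queued_works;         /* models the stopper's work list */
	struct toy_task *thread;  /* models the stopper thread */
};

/* Tasks collected under the lock, to be woken after it is dropped. */
struct toy_wake_list {
	struct toy_task *pending[MAX_PENDING];
	size_t count;
};

/* Counts wakeups issued while the toy lock was still held. */
static int wakeups_under_lock;

static void toy_wake_list_add(struct toy_wake_list *wl, struct toy_task *t)
{
	if (wl->count < MAX_PENDING)
		wl->pending[wl->count++] = t;
}

static void toy_wake(struct toy_stopper *s, struct toy_task *t)
{
	if (s->locked)
		wakeups_under_lock++;
	t->awake = true;
}

static void toy_queue_work(struct toy_stopper *s)
{
	struct toy_wake_list wl = { .count = 0 };
	size_t i;

	s->locked = true;                   /* take the lock */
	s->queued_works++;                  /* queue the work item */
	toy_wake_list_add(&wl, s->thread);  /* only record the wakeup */
	s->locked = false;                  /* drop the lock */

	/* The wakeups happen here, outside the locked region. */
	for (i = 0; i < wl.count; i++)
		toy_wake(s, wl.pending[i]);
}
```

Calling toy_queue_work() on a stopper whose thread is asleep leaves the
thread awake, one work item queued, and wakeups_under_lock at zero. Note
that the sketch only reorders the wakeup relative to the lock; nothing in it
models whether the calling CPU is online, which is the separate condition
the RCU warning above is about.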