Message-ID: <1526566209.8616.19.camel@gmx.de>
Subject: Re: cpu stopper threads and load balancing leads to deadlock
From: Mike Galbraith
To: paulmck@linux.vnet.ibm.com
Cc: Peter Zijlstra, Matt Fleming, Ingo Molnar, linux-kernel@vger.kernel.org, Michal Hocko
Date: Thu, 17 May 2018 16:10:09 +0200
In-Reply-To: <20180517140345.GI3803@linux.vnet.ibm.com>

On Thu, 2018-05-17 at 07:03 -0700, Paul E. McKenney wrote:
> On Tue, May 15, 2018 at 06:30:26AM +0200, Mike Galbraith wrote:
>
> > > Something like so perhaps? Mike, can you play around with that? Could
> > > burn your granny and eat your cookies.
> >
> > Did this get queued anywhere?
>
> I have not queued it, but given Peter's Signed-off-by and your Tested-by
> I would be happy to do so.

Here's the latter.

Tested-by: Mike Galbraith

> > > diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> > > index 7468de429087..07360523c3ce 100644
> > > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > > @@ -793,6 +793,9 @@ void mtrr_ap_init(void)
> > >  
> > >  	if (!use_intel() || mtrr_aps_delayed_init)
> > >  		return;
> > > +
> > > +	rcu_cpu_starting(smp_processor_id());
> > > +
> > >  	/*
> > >  	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
> > >  	 * changed, but this routine will be called in cpu boot time,
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 2a734692a581..4dab46950fdb 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3775,6 +3775,8 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  	return 0;
> > >  }
> > >  
> > > +static DEFINE_PER_CPU(int, rcu_cpu_started);
> > > +
> > >  /*
> > >   * Mark the specified CPU as being online so that subsequent grace periods
> > >   * (both expedited and normal) will wait on it.  Note that this means that
> > > @@ -3796,6 +3798,11 @@ void rcu_cpu_starting(unsigned int cpu)
> > >  	struct rcu_node *rnp;
> > >  	struct rcu_state *rsp;
> > >  
> > > +	if (per_cpu(rcu_cpu_started, cpu))
> > > +		return;
> > > +
> > > +	per_cpu(rcu_cpu_started, cpu) = 1;
> > > +
> > >  	for_each_rcu_flavor(rsp) {
> > >  		rdp = per_cpu_ptr(rsp->rda, cpu);
> > >  		rnp = rdp->mynode;
> > > @@ -3852,6 +3859,8 @@ void rcu_report_dead(unsigned int cpu)
> > >  	preempt_enable();
> > >  	for_each_rcu_flavor(rsp)
> > >  		rcu_cleanup_dying_idle_cpu(cpu, rsp);
> > > +
> > > +	per_cpu(rcu_cpu_started, cpu) = 0;
> > >  }
> > >  
> > >  /* Migrate the dead CPU's callbacks to the current CPU. */
> > >
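
For anyone following along, the guard added to rcu_cpu_starting() in the quoted patch can be sketched in plain user-space C. This is only an illustration of the pattern, not kernel code: the array `cpu_started`, the counter `online_work_runs`, the constant `NR_CPUS_SKETCH`, and the function names `cpu_starting_sketch()` and `report_dead_sketch()` are all made up here, a plain array stands in for the per-CPU variable, and the bool return value exists only so the effect is observable (the kernel function returns void).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, for illustration only */

/* Stand-in for the patch's DEFINE_PER_CPU(int, rcu_cpu_started). */
static int cpu_started[NR_CPUS_SKETCH];

/* Counts how many times the "real" onlining work ran for each CPU. */
static int online_work_runs[NR_CPUS_SKETCH];

/*
 * Mirrors the guard in the patched rcu_cpu_starting(): the first call for
 * a CPU does the work and sets the flag, repeat calls return early.
 * Returns true if this call did the work (return value is an assumption
 * added for the sketch).
 */
static bool cpu_starting_sketch(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);

	if (cpu_started[cpu])
		return false;

	cpu_started[cpu] = 1;
	online_work_runs[cpu]++;	/* placeholder for the per-flavor loop */
	return true;
}

/* Mirrors the patched rcu_report_dead(): clear the flag on the way down. */
static void report_dead_sketch(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	cpu_started[cpu] = 0;
}
```

Calling cpu_starting_sketch(1) twice in a row does the work once; after report_dead_sketch(1), the next cpu_starting_sketch(1) does it again, which is the behaviour the flag reset in rcu_report_dead() is there to preserve across a CPU going offline and coming back.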