Date: Thu, 8 Jun 2017 13:55:00 -0700
From: "Paul E. McKenney"
To: Krister Johansen
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
    dipankar@in.ibm.com, akpm@linux-foundation.org,
    mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de,
    peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
    edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
    bobby.prani@gmail.com, stable@vger.kernel.org, gregkh@linuxfoundation.org
Subject: Re: [PATCH tip/core/rcu 45/88] rcu: Add memory barriers for NOCB leader wakeup
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170525215934.GA11578@linux.vnet.ibm.com>
    <1495749601-21574-45-git-send-email-paulmck@linux.vnet.ibm.com>
    <20170608201148.GA2553@templeofstupid.com>
In-Reply-To: <20170608201148.GA2553@templeofstupid.com>
Message-Id: <20170608205500.GC3721@linux.vnet.ibm.com>

On Thu, Jun 08, 2017 at 01:11:48PM -0700, Krister Johansen wrote:
> Hi Paul,
>
> On Thu, May 25, 2017 at 02:59:18PM -0700, Paul E. McKenney wrote:
> > Wait/wakeup operations do not guarantee ordering on their own.  Instead,
> > either locking or memory barriers are required.  This commit therefore
> > adds memory barriers to wake_nocb_leader() and nocb_leader_wait().
> >
> > Signed-off-by: Paul E. McKenney
> > ---
> >  kernel/rcu/tree_plugin.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 0b1042545116..573fbe9640a0 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1810,6 +1810,7 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
> >  	if (READ_ONCE(rdp_leader->nocb_leader_sleep) || force) {
> >  		/* Prior smp_mb__after_atomic() orders against prior enqueue. */
> >  		WRITE_ONCE(rdp_leader->nocb_leader_sleep, false);
> > +		smp_mb(); /* ->nocb_leader_sleep before swake_up(). */
> >  		swake_up(&rdp_leader->nocb_wq);
> >  	}
> >  }
> > @@ -2064,6 +2065,7 @@ static void nocb_leader_wait(struct rcu_data *my_rdp)
> >  	 * nocb_gp_head, where they await a grace period.
> >  	 */
> >  	gotcbs = false;
> > +	smp_mb(); /* wakeup before ->nocb_head reads. */
> >  	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_follower) {
> >  		rdp->nocb_gp_head = READ_ONCE(rdp->nocb_head);
> >  		if (!rdp->nocb_gp_head)
>
> May I impose upon you to CC this patch to stable, and tag it as fixing
> abedf8e241?  I ran into this on a production 4.9 branch.  When I
> debugged it, I discovered that it went all the way back to 4.6.  The
> tl;dr is that at least for some environments, the missed wakeup
> manifests itself as a series of hung-task warnings to console and if I'm
> unlucky it can also generate a hang that can block interactive logins
> via ssh.

Interesting!  This is the first that I have heard that this was
anything other than a theoretical bug.  To the comment in your second
URL, it is wise to recall that a seismologist was in fact arrested for
failing to predict an earthquake.  Later acquitted/pardoned/whatever,
but arrested nonetheless.  ;-)

https://www.theguardian.com/world/2012/oct/23/jailing-italian-seismologists-scientific-community

Silliness aside, does my patch actually fix your problem in practice as
well as in theory?  If so, may I have your Tested-by?

Impressive investigative effort, by the way!

							Thanx, Paul