Date: Thu, 8 Jun 2017 16:47:43 -0700
From: "Paul E. McKenney"
To: Krister Johansen
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com, stable@vger.kernel.org, gregkh@linuxfoundation.org
Subject: Re: [PATCH tip/core/rcu 45/88] rcu: Add memory barriers for NOCB leader wakeup
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170525215934.GA11578@linux.vnet.ibm.com> <1495749601-21574-45-git-send-email-paulmck@linux.vnet.ibm.com> <20170608201148.GA2553@templeofstupid.com> <20170608205500.GC3721@linux.vnet.ibm.com> <20170608212814.GD2553@templeofstupid.com>
In-Reply-To: <20170608212814.GD2553@templeofstupid.com>
Message-Id: <20170608234743.GE3721@linux.vnet.ibm.com>

On Thu, Jun 08, 2017 at 02:28:14PM -0700, Krister Johansen wrote:
> On Thu, Jun 08, 2017 at 01:55:00PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 08, 2017 at 01:11:48PM -0700, Krister Johansen wrote:
> > > May I impose upon you to CC this patch to stable, and tag it as fixing
> > > abedf8e241? I ran into this on a production 4.9 branch. When I
> > > debugged it, I discovered that it went all the way back to 4.6. The
> > > tl;dr is that at least for some environments, the missed wakeup
> > > manifests itself as a series of hung-task warnings to console and if I'm
> > > unlucky it can also generate a hang that can block interactive logins
> > > via ssh.
> >
> > Interesting! This is the first that I have heard that this was anything
> > other than a theoretical bug. To the comment in your second URL, it is
> > wise to recall that a seismologist was in fact arrested for failing to
> > predict an earthquake. Later acquitted/pardoned/whatever, but arrested
> > nonetheless. ;-)
>
> Point taken. I do realize that we all make mistakes, and certainly I do
> too.

Indeed! Let's just say that the author of that email will have no
trouble returning the favor, and sooner rather than later. ;-)

> Perhaps I should have said that my survey of current callers of
> swake_up() was enough to convince me that I didn't have an immediate
> problem elsewhere, but that I'm not familiar enough with the code base
> to make that statement with a lot of authority. The concern being that if
> the patch came from RT-linux where the barrier was present in
> swake_up(), are there other places where swake_up() callers still assume
> this is being handled on their behalf?
>
> As part of this, I also pondered whether I should add a comment around
> swake_up(), similar to what's already there for waitqueue_active.
> I wasn't sure how subtle this is for other consumers, though.

In my case, I assume I need barriers for swake_up(), which is why I
found this bug by inspection. Still, I wouldn't mind a comment. Others
might have other opinions.

> > Silliness aside, does my patch actually fix your problem in practice as
> > well as in theory? If so, may I have your Tested-by?
>
> Yes, it absolutely does. Consider it given:
>
> Tested-by: Krister Johansen

Thank you!!!

							Thanx, Paul

> > Impressive investigative effort, by the way!
>
> Thanks!
>
> -K
>
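
For readers following the barrier discussion above, here is a minimal
userspace sketch of the general store-then-check pattern that a caller
has to get right when the wakeup primitive does not supply the barrier
itself. It is an illustration only, not the patch and not kernel code:
struct wake_model, wake_model_init(), waker_publish() and
waiter_must_sleep() are hypothetical names, and a C11 seq_cst fence is
assumed here as a stand-in for a full memory barrier on each side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a wakeup condition plus a "queue has a waiter" flag. */
struct wake_model {
	atomic_bool cond;       /* the condition the waiter sleeps on */
	atomic_bool has_waiter; /* models "the wait queue is non-empty" */
};

static void wake_model_init(struct wake_model *m)
{
	atomic_init(&m->cond, false);
	atomic_init(&m->has_waiter, false);
}

/*
 * Waker side: publish the condition, execute a full barrier, and only
 * then check for a waiter. Returns true if a wakeup must be issued.
 * Without the fence, the check could be satisfied before the store is
 * visible, and the waker could skip a wakeup that was needed.
 */
static bool waker_publish(struct wake_model *m)
{
	atomic_store_explicit(&m->cond, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* assumed full-barrier stand-in */
	return atomic_load_explicit(&m->has_waiter, memory_order_relaxed);
}

/*
 * Waiter side: announce itself, execute a full barrier, and only then
 * re-check the condition. Returns true if the waiter must sleep.
 */
static bool waiter_must_sleep(struct wake_model *m)
{
	atomic_store_explicit(&m->has_waiter, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* pairs with the waker's fence */
	return !atomic_load_explicit(&m->cond, memory_order_relaxed);
}
```

With both fences in place, at least one side observes the other's
store: either the waker sees the waiter and issues the wakeup, or the
waiter sees the condition and does not sleep. Dropping the fence on
either side permits the outcome where the waiter sleeps and the waker
sees no waiter, which is the missed-wakeup case.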