Date: Thu, 8 Jun 2017 14:28:14 -0700
From: Krister Johansen
To: "Paul E. McKenney"
Cc: Krister Johansen, linux-kernel@vger.kernel.org, mingo@kernel.org,
    jiangshanlai@gmail.com, dipankar@in.ibm.com, akpm@linux-foundation.org,
    mathieu.desnoyers@efficios.com, josh@joshtriplett.org, tglx@linutronix.de,
    peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
    edumazet@google.com, fweisbec@gmail.com, oleg@redhat.com,
    bobby.prani@gmail.com, stable@vger.kernel.org, gregkh@linuxfoundation.org
Subject: Re: [PATCH tip/core/rcu 45/88] rcu: Add memory barriers for NOCB leader wakeup
Message-ID: <20170608212814.GD2553@templeofstupid.com>

On Thu, Jun 08, 2017 at 01:55:00PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 08, 2017 at 01:11:48PM -0700, Krister Johansen wrote:
> > May I impose upon you to CC this patch to stable, and tag it as fixing
> > abedf8e241? I ran into this on a production 4.9 branch. When I
> > debugged it, I discovered that it went all the way back to 4.6. The
> > tl;dr is that at least for some environments, the missed wakeup
> > manifests itself as a series of hung-task warnings to console and if I'm
> > unlucky it can also generate a hang that can block interactive logins
> > via ssh.
>
> Interesting! This is the first that I have heard that this was anything
> other than a theoretical bug. To the comment in your second URL, it is
> wise to recall that a seismologist was in fact arrested for failing to
> predict an earthquake. Later acquitted/pardoned/whatever, but arrested
> nonetheless. ;-)

Point taken. I do realize that we all make mistakes, and certainly I do
too. Perhaps I should have said that my survey of current callers of
swake_up() was enough to convince me that I didn't have an immediate
problem elsewhere, but that I'm not familiar enough with the code base
to make that statement with a lot of authority. My concern is this: if
the patch came from RT-linux, where the barrier was present in
swake_up(), are there other places where swake_up() callers still
assume the barrier is being handled on their behalf?

As part of this, I also pondered whether I should add a comment around
swake_up(), similar to the one already there for waitqueue_active(). I
wasn't sure how subtle this is for other consumers, though.

> Silliness aside, does my patch actually fix your problem in practice as
> well as in theory? If so, may I have your Tested-by?

Yes, it absolutely does. Consider it given:

Tested-by: Krister Johansen

> Impressive investigative effort, by the way!

Thanks!

-K