Date: Thu, 29 Jun 2017 17:18:55 -0700
From: "Paul E. McKenney"
To: Jeffrey Hugo
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	pprakash@codeaurora.org, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Jens Axboe,
	Sebastian Andrzej Siewior, Thomas Gleixner, Richard Cochran,
	Boris Ostrovsky, Richard Weinberger
Subject: Re: [BUG] Deadlock due due to interactions of block, RCU, and cpu offline
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170630001855.GL2393@linux.vnet.ibm.com>
References: <20170326232843.GA3637@linux.vnet.ibm.com>
	<20170327181711.GF3637@linux.vnet.ibm.com>
	<20170620234623.GA16200@linux.vnet.ibm.com>
	<20170621161853.GB3721@linux.vnet.ibm.com>
	<20170623033456.GA15959@linux.vnet.ibm.com>
	<20170628001130.GB3721@linux.vnet.ibm.com>

On Thu, Jun 29, 2017 at 10:29:12AM -0600, Jeffrey Hugo wrote:
> On 6/27/2017 6:11 PM, Paul E. McKenney wrote:
> >On Tue, Jun 27, 2017 at 04:32:09PM -0600, Jeffrey Hugo wrote:
> >>On 6/22/2017 9:34 PM, Paul E. McKenney wrote:
> >>>On Wed, Jun 21, 2017 at 09:18:53AM -0700, Paul E. McKenney wrote:
> >>>>No worries, and I am very much looking forward to seeing the results of
> >>>>your testing.
> >>>
> >>>And please see below for an updated patch based on LKML review and
> >>>more intensive testing.
> >>>
> >>
> >>I spent some time on this today.  It didn't go as I expected.  I
> >>validated the issue is reproducible as before on 4.11 and 4.12 rcs 1
> >>through 4.  However, the version of stress-ng that I was using ran
> >>into constant errors starting with rc5, making it nearly impossible
> >>to make progress toward reproduction.  Upgrading stress-ng to tip
> >>fixes the issue, however, I've still been unable to repro the issue.
> >>
> >>It's my unfounded suspicion that something went in between rc4 and
> >>rc5 which changed the timing, and didn't actually fix the issue.  I
> >>will run the test overnight for 5 hours to try to repro.
> >>
> >>The patch you sent appears to be based on linux-next, and appears to
> >>have a number of dependencies which prevent it from cleanly applying
> >>on anything current that I'm able to repro on at this time.  Do you
> >>want to provide a rebased version of the patch which applies to say
> >>4.11?  I could easily test that and report back.
> >
> >Here is a very lightly tested backport to v4.11.
> >
>
> Works for me.  Always reproduced the lockup within 2 minutes on stock
> 4.11.
> With the change applied, I was able to test for 2 hours in
> the same conditions, and 4 hours with the full system and not
> encounter an issue.
>
> Feel free to add:
> Tested-by: Jeffrey Hugo

Applied, thank you!

> I'm going to go back to 4.12-rc5 and see if I can either repro
> the issue, or identify what changed.  Hopefully I can get to
> linux-next and double check the original version of the change as
> well.

Looking forward to hearing what you find!

							Thanx, Paul