Date: Thu, 26 Jun 2014 08:08:40 +1000
From: Dave Chinner
To: Tejun Heo
Cc: Austin Schuh, xfs, linux-kernel@vger.kernel.org
Subject: Re: On-stack work item completion race? (was Re: XFS crash?)
Message-ID: <20140625220840.GO9508@dastard>
References: <20140513063943.GQ26353@dastard>
 <20140513090321.GR26353@dastard>
 <20140624030240.GB9508@dastard>
 <20140624032521.GA12164@htj.dyndns.org>
 <20140625055641.GL9508@dastard>
 <20140625141836.GC26883@htj.dyndns.org>
In-Reply-To: <20140625141836.GC26883@htj.dyndns.org>

On Wed, Jun 25, 2014 at 10:18:36AM -0400, Tejun Heo wrote:
> Hello, Dave.
>
> On Wed, Jun 25, 2014 at 03:56:41PM +1000, Dave Chinner wrote:
> > Hmmm - that's different from my understanding of what the original
> > behaviour WQ_MEM_RECLAIM gave us. i.e. that WQ_MEM_RECLAIM
> > workqueues had a rescuer thread created to guarantee that the
> > *workqueue* could make forward progress executing work in a
> > reclaim context.
>
> From Documentation/workqueue.txt
>
>   WQ_MEM_RECLAIM
>
>     All wq which might be used in the memory reclaim paths _MUST_
>     have this flag set.  The wq is guaranteed to have at least one
>     execution context regardless of memory pressure.
>
> So, all that's guaranteed is that the workqueue has at least one
> worker executing its work items.  If that one worker is serving a work
> item which can't make forward progress, the workqueue is not
> guaranteed to make forward progress.

Adding that to the docco might be useful ;)

> > > As long as a WQ_RECLAIM workqueue doesn't depend upon itself,
> > > forward-progress is guaranteed.
> >
> > I can't find any documentation that actually defines what
> > WQ_MEM_RECLAIM means, so I can't tell when or how this requirement
> > came about.  If it's true, then I suspect most of the WQ_MEM_RECLAIM
> > workqueues in filesystems violate it.  Can you point me at
> > documentation/commits/code describing the constraints of
> > WQ_MEM_RECLAIM and the reasons for it?
>
> Documentation/workqueue.txt should be it but maybe we should be more
> explicit.  The behavior is maintaining what the
> pre-concurrency-management workqueue provided with static
> per-workqueue workers.  Each workqueue reserved its workers (either
> one per cpu or one globally) and it only supported single level of
> concurrency on each CPU.  WQ_MEM_RECLAIM is providing equivalent
> amount of forward progress guarantee and all the existing users
> shouldn't have issues on this front.  If we have grown incorrect
> usages from then on, we need to fix them.

Ok, so it hasn't changed. We're still using them like we used the
original workqueues, and we never, ever provided a guarantee of
forwards progress for them, either. So if the workqueues haven't
changed, and we haven't changed how we use workqueues, then
something else is causing all our recent problems....

Thanks for the clarification, Tejun!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
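
For illustration only, here is a toy userspace model of the guarantee
discussed above: each queue is assumed to have exactly one guaranteed
worker (the situation a WQ_MEM_RECLAIM workqueue can fall back to under
memory pressure), and a work item may wait for another item to complete.
This is not kernel code; the function name simulate() and the item and
queue names are hypothetical, and the model ignores everything except
"one worker per queue, FIFO order".

```python
from collections import deque


def simulate(queues, deps):
    """Toy model: one worker per queue, work items run in FIFO order.

    queues: dict mapping queue name -> list of work item names
    deps:   dict mapping work item -> item it waits for before completing
    Returns (set of completed items, True if any queue is stalled).
    """
    pending = {name: deque(items) for name, items in queues.items()}
    running = {name: None for name in queues}  # item held by the sole worker
    done = set()

    progress = True
    while progress:
        progress = False
        for name in queues:
            # The single worker picks up the next item only when it is idle.
            if running[name] is None and pending[name]:
                running[name] = pending[name].popleft()
                progress = True
            item = running[name]
            if item is not None:
                dep = deps.get(item)
                # The item completes only once its dependency has completed.
                if dep is None or dep in done:
                    done.add(item)
                    running[name] = None
                    progress = True

    stalled = any(running.values()) or any(pending.values())
    return done, stalled


if __name__ == "__main__":
    # A waits for B, but B is queued behind A on the same queue:
    # the only worker is stuck in A, so B never runs.
    done, stalled = simulate({"wq": ["A", "B"]}, {"A": "B"})
    print(sorted(done), stalled)  # [] True

    # Same dependency, but B lives on a different queue with its own worker.
    done, stalled = simulate({"wq1": ["A"], "wq2": ["B"]}, {"A": "B"})
    print(sorted(done), stalled)  # ['A', 'B'] False
```

The first case is the "workqueue depends upon itself" situation from
the quoted text: the one guaranteed worker is busy with an item that
cannot finish, so nothing behind it runs. The second case shows why
moving the dependency to a separate queue restores forward progress in
this model.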