Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752663AbcD0Sbs (ORCPT ); Wed, 27 Apr 2016 14:31:48 -0400
Received: from ns.lynxeye.de ([87.118.118.114]:37991 "EHLO lynxeye.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1751866AbcD0Sbq (ORCPT ); Wed, 27 Apr 2016 14:31:46 -0400
Message-ID: <1461781898.2516.10.camel@lynxeye.de>
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN
From: Lucas Stach
To: Dave Chinner
Cc: Brian Foster , linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Date: Wed, 27 Apr 2016 20:31:38 +0200
In-Reply-To: <20160425230833.GC18496@dastard>
References: <1461570163-4083-1-git-send-email-dev@lynxeye.de>
        <20160425142444.GC33882@bfoster.bfoster>
        <1461607897.2364.27.camel@lynxeye.de>
        <20160425230833.GC18496@dastard>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.5.2 (3.18.5.2-1.fc23)
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1904
Lines: 44

On Tuesday, 26.04.2016, at 09:08 +1000, Dave Chinner wrote:
[...]
> > > That said, I'm not sure whether there's a notable benefit of
> > > idling for 50ms over just scheduling out when we've hit the
> > > target lsn. It seems like anybody who pushes the target forward
> > > again is going to wake up the thread anyway. On the other hand,
> > > if the fs is idle the thread will eventually schedule out
> > > indefinitely.
> > Is this a problem? The patch tries to do exactly that: schedule out
> > aild indefinitely when there is no more work to do as nobody is
> > pushing the target LSN forward.
> If the filesystem is slowly being dirtied, then the aild shouldn't
> really idle at all.
>
> Keep in mind that the xfsaild has multiple functions, one of which
> is a watchdog that catches log space stalls that would otherwise
> hang the filesystem. Every time we've removed the watchdog function
> (i.e. aggressively idle the aild) we've had users report random,
> unreproducible hangs/stalls that have gone away when the watchdog
> function (i.e. don't idle until the log is covered and completely
> idle) was re-instated...
>
I can only see xfsaild_push() doing any work after it has hit the
target LSN if something moves the target LSN forward. You say that
aggressively idling aild might produce log stalls, which would imply
there are races in the code where a code path that moves the target LSN
forward doesn't properly wake up aild.

Wouldn't this problem also be present with non-aggressive idling of
aild, just with a significantly reduced probability of hitting it?

The commit that re-enabled non-aggressive aild idle specifically
mentions some races that have been fixed, and I think those fixes
should allow for aggressive aild idle. If they are insufficient, it
wouldn't be safe to idle aild at all, right?

Regards,
Lucas
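
For illustration, the wake-up contract discussed above can be modelled in user space. This is a minimal Python sketch, not XFS code: the class `AilWorker` and its fields `target`, `pushed` and `stop` are hypothetical stand-ins for xfsaild and the push target LSN. It assumes that every path advancing the target takes the same lock the worker holds while deciding to sleep, which is what makes indefinite idling safe in this model.

```python
import threading


class AilWorker:
    """Toy model (hypothetical, not kernel code) of a push thread that
    idles indefinitely once it has reached its target."""

    def __init__(self):
        self.cond = threading.Condition()
        self.target = 0    # stand-in for the push target LSN
        self.pushed = 0    # how far the worker has pushed so far
        self.stop = False

    def set_target(self, lsn):
        # Advance the target and wake the worker under the same lock the
        # worker holds while it decides to sleep. An update can therefore
        # never slip in between the worker's check and its sleep.
        with self.cond:
            if lsn > self.target:
                self.target = lsn
                self.cond.notify()

    def shutdown(self):
        with self.cond:
            self.stop = True
            self.cond.notify()

    def run(self):
        while True:
            with self.cond:
                # Idle with no timeout while there is nothing to push.
                while not self.stop and self.pushed >= self.target:
                    self.cond.wait()
                if self.pushed >= self.target:
                    return  # asked to stop and fully caught up
                goal = self.target
            # The "push" itself is only modelled as advancing a counter.
            with self.cond:
                self.pushed = goal


worker = AilWorker()
thread = threading.Thread(target=worker.run)
thread.start()
worker.set_target(5)
worker.set_target(9)
worker.shutdown()
thread.join(timeout=5)
```

If some caller instead wrote `target` without taking the lock and notifying, the worker could check, see nothing to do, and sleep forever on a stale value. That is the kind of missed wake-up that a periodic timeout would only mask, not fix.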