Message-ID: <1461607897.2364.27.camel@lynxeye.de>
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN
From: Lucas Stach <dev@lynxeye.de>
To: Brian Foster <bfoster@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>, linux-kernel@vger.kernel.org,
        xfs@oss.sgi.com
Date: Mon, 25 Apr 2016 20:11:37 +0200
In-Reply-To: <20160425142444.GC33882@bfoster.bfoster>
References: <1461570163-4083-1-git-send-email-dev@lynxeye.de>
	 <20160425142444.GC33882@bfoster.bfoster>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5818
Lines: 152

Am Montag, den 25.04.2016, 10:24 -0400 schrieb Brian Foster:
> On Mon, Apr 25, 2016 at 09:42:43AM +0200, Lucas Stach wrote:
> > 
> > The current logic only idles aild if the AIL is empty, rescheduling
> > the worker with a timeout of 50ms otherwise. If the target LSN
> > isn't
> > moved forward, the worker will not make any progress as it only
> > pushes the AIL up to the target LSN, leading to the empty AIL
> > condition to only be met after the log worker pushed the complete
> > AIL. As the log worker runs on the scale of minutes, this leaves
> > aild in the "no progress, 50ms timeout" state for extended periods
> > of
> > time, causing many unecessary wakeups.
> > 
> Where do you get the "scale of minutes" from? I believe the default
> log
> worker interval is 30s, though it can be increased by the admin.
> 
Uh, right. It's 30s, which is still a very long time when we consider
that new items can be added to the AIL right after the log worker
forced it out.

> The other thing to note is that the AIL target is also driven by
> demand
> on log space. E.g., see xlog_grant_push_ail(), which executes when a
> transaction attempts to reserve log space that isn't currently
> available
> in the log.
> 
And that's where I think the problem stems from: the above function
pushes the AIL only up to a target LSN to make room for the required
log space. Only the log worker empties the AIL completely by issuing a
xfs_ail_push_all_sync().

> That said, I'm not sure whether there's a notable benefit of idling
> for
> 50ms over just scheduling out when we've hit the target lsn. It seems
> like that anybody who pushes the target forward again is going to
> wake
> up the thread anyways. On the other hand, if the fs is idle the
> thread
> will eventually schedule out indefinitely. 

Is this a problem? The patch tries to do exactly that: schedule out
aild indefinitely when there is no more work to do as nobody is pushing
the target LSN forward.

On an idle FS this will stop aild when the target LSN is hit, will wake
it up again as soon as the log worker requests the complete AIL to be
pushed out, then it goes back to sleep indefinitely until the log fills
up again.

> Have you observed problematic
> behavior due to the current heuristic?
> 
Yes. On a FS with a low, but constant metadata workload I would expect
aild to be scheduled only when the log space is running out and/or the
log worker forces a sync.
What currently happens is that aild stays in the "target LSN hit -> no
work to do, schedule a timeout each 50ms" state for almost the entire
runtime of the system. This causes about 20 CPU wakeups per second per
mountpoint, most of which are completely wasted, as there is no work to
do.

My use case may be atypical, as I'm using XFS on a laptop, but in that
case the XFS aild is actually causing most of the CPU wakeups in an
otherwise idle system, which is clearly undesirable, as those wakeups
aren't really needed as they don't lead to actual work getting done.

> > 
> > Fix this by idling aild as soon as the AIL is pushed up to the
> > target
> > LSN. All code paths that move the target LSN forward already ensure
> > that aild is woken up again when needed.
> > 
> > Signed-off-by: Lucas Stach <dev@lynxeye.de>
> > ---
> > I'm pretty sure that the above deduction is correct, but as this my
> > first real encounter with the XFS code base, someone with a bit
> > more
> > knowledge than me should give this some thought.
> > ---
> >  fs/xfs/xfs_trans_ail.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > index d6c9c3e..7eca852 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -495,6 +495,7 @@ xfsaild(
> >  {
> >  	struct xfs_ail	*ailp = data;
> >  	long		tout = 0;	/* milliseconds */
> > +	struct xfs_log_item *lip;
> >  
> >  	current->flags |= PF_MEMALLOC;
> >  	set_freezable();
> > @@ -508,7 +509,8 @@ xfsaild(
> >  		spin_lock(&ailp->xa_lock);
> >  
> >  		/*
> > -		 * Idle if the AIL is empty and we are not racing
> > with a target
> > +		 * Idle if the AIL is empty or pushed up to the
> > requested
> > +		 * target LSN and we are not racing with a target
> >  		 * update. We check the AIL after we set the task
> > to a sleep
> >  		 * state to guarantee that we either catch an
> > xa_target update
> >  		 * or that a wake_up resets the state to
> > TASK_RUNNING.
> > @@ -517,7 +519,8 @@ xfsaild(
> >  		 * The barrier matches the xa_target update in
> > xfs_ail_push().
> >  		 */
> >  		smp_rmb();
> > -		if (!xfs_ail_min(ailp) &&
> > +		lip = xfs_ail_min(ailp);
> > +		if ((!lip || XFS_LSN_CMP(lip->li_lsn, ailp-
> > >xa_target) >= 0) &&
> I think you'd want to use > 0, as == 0 suggests you have an item that
> matches the target.
> 
This check just mirrors what is done in xfsaild_push() to check if the
target has been hit. Either both sites should be changed (which would
assume the target LSN is inclusive and we should push the AIL to at
least the target) or left as is (assuming the target LSN) is exclusive
and we should push up to the target).

> FWIW, another option might be to increase the timeout to something
> much
> larger than the current 50ms when we've hit the target.

I don't think we want to extend the timeout if someone already pushed
the target LSN forward, so we would need to mirror the check for the
target LSN race here. I believe my change is less intrusive.

Regards,
Lucas

>  See towards the
> end of xfsaild_push() where tout is determined based on the state of
> the
> push (and we already have logic to check against xa_target).
> 
> Brian
> 
> > 
> >  		    ailp->xa_target == ailp->xa_target_prev) {
> >  			spin_unlock(&ailp->xa_lock);
> >  			freezable_schedule();