2016-04-25 07:51:15

by Lucas Stach

[permalink] [raw]
Subject: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

The current logic only idles aild if the AIL is empty, rescheduling
the worker with a timeout of 50ms otherwise. If the target LSN isn't
moved forward, the worker will not make any progress as it only
pushes the AIL up to the target LSN, leading to the empty AIL
condition to only be met after the log worker pushed the complete
AIL. As the log worker runs on the scale of minutes, this leaves
aild in the "no progress, 50ms timeout" state for extended periods of
time, causing many unecessary wakeups.

Fix this by idling aild as soon as the AIL is pushed up to the target
LSN. All code paths that move the target LSN forward already ensure
that aild is woken up again when needed.

Signed-off-by: Lucas Stach <[email protected]>
---
I'm pretty sure that the above deduction is correct, but as this my
first real encounter with the XFS code base, someone with a bit more
knowledge than me should give this some thought.
---
fs/xfs/xfs_trans_ail.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index d6c9c3e..7eca852 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -495,6 +495,7 @@ xfsaild(
{
struct xfs_ail *ailp = data;
long tout = 0; /* milliseconds */
+ struct xfs_log_item *lip;

current->flags |= PF_MEMALLOC;
set_freezable();
@@ -508,7 +509,8 @@ xfsaild(
spin_lock(&ailp->xa_lock);

/*
- * Idle if the AIL is empty and we are not racing with a target
+ * Idle if the AIL is empty or pushed up to the requested
+ * target LSN and we are not racing with a target
* update. We check the AIL after we set the task to a sleep
* state to guarantee that we either catch an xa_target update
* or that a wake_up resets the state to TASK_RUNNING.
@@ -517,7 +519,8 @@ xfsaild(
* The barrier matches the xa_target update in xfs_ail_push().
*/
smp_rmb();
- if (!xfs_ail_min(ailp) &&
+ lip = xfs_ail_min(ailp);
+ if ((!lip || XFS_LSN_CMP(lip->li_lsn, ailp->xa_target) >= 0) &&
ailp->xa_target == ailp->xa_target_prev) {
spin_unlock(&ailp->xa_lock);
freezable_schedule();
--
2.5.5


2016-04-25 14:24:50

by Brian Foster

[permalink] [raw]
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

On Mon, Apr 25, 2016 at 09:42:43AM +0200, Lucas Stach wrote:
> The current logic only idles aild if the AIL is empty, rescheduling
> the worker with a timeout of 50ms otherwise. If the target LSN isn't
> moved forward, the worker will not make any progress as it only
> pushes the AIL up to the target LSN, leading to the empty AIL
> condition to only be met after the log worker pushed the complete
> AIL. As the log worker runs on the scale of minutes, this leaves
> aild in the "no progress, 50ms timeout" state for extended periods of
> time, causing many unecessary wakeups.
>

Where do you get the "scale of minutes" from? I believe the default log
worker interval is 30s, though it can be increased by the admin.

The other thing to note is that the AIL target is also driven by demand
on log space. E.g., see xlog_grant_push_ail(), which executes when a
transaction attempts to reserve log space that isn't currently available
in the log.

That said, I'm not sure whether there's a notable benefit of idling for
50ms over just scheduling out when we've hit the target lsn. It seems
like that anybody who pushes the target forward again is going to wake
up the thread anyways. On the other hand, if the fs is idle the thread
will eventually schedule out indefinitely. Have you observed problematic
behavior due to the current heuristic?

> Fix this by idling aild as soon as the AIL is pushed up to the target
> LSN. All code paths that move the target LSN forward already ensure
> that aild is woken up again when needed.
>
> Signed-off-by: Lucas Stach <[email protected]>
> ---
> I'm pretty sure that the above deduction is correct, but as this my
> first real encounter with the XFS code base, someone with a bit more
> knowledge than me should give this some thought.
> ---
> fs/xfs/xfs_trans_ail.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index d6c9c3e..7eca852 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -495,6 +495,7 @@ xfsaild(
> {
> struct xfs_ail *ailp = data;
> long tout = 0; /* milliseconds */
> + struct xfs_log_item *lip;
>
> current->flags |= PF_MEMALLOC;
> set_freezable();
> @@ -508,7 +509,8 @@ xfsaild(
> spin_lock(&ailp->xa_lock);
>
> /*
> - * Idle if the AIL is empty and we are not racing with a target
> + * Idle if the AIL is empty or pushed up to the requested
> + * target LSN and we are not racing with a target
> * update. We check the AIL after we set the task to a sleep
> * state to guarantee that we either catch an xa_target update
> * or that a wake_up resets the state to TASK_RUNNING.
> @@ -517,7 +519,8 @@ xfsaild(
> * The barrier matches the xa_target update in xfs_ail_push().
> */
> smp_rmb();
> - if (!xfs_ail_min(ailp) &&
> + lip = xfs_ail_min(ailp);
> + if ((!lip || XFS_LSN_CMP(lip->li_lsn, ailp->xa_target) >= 0) &&

I think you'd want to use > 0, as == 0 suggests you have an item that
matches the target.

FWIW, another option might be to increase the timeout to something much
larger than the current 50ms when we've hit the target. See towards the
end of xfsaild_push() where tout is determined based on the state of the
push (and we already have logic to check against xa_target).

Brian

> ailp->xa_target == ailp->xa_target_prev) {
> spin_unlock(&ailp->xa_lock);
> freezable_schedule();
> --
> 2.5.5
>
> _______________________________________________
> xfs mailing list
> [email protected]
> http://oss.sgi.com/mailman/listinfo/xfs

2016-04-25 18:11:45

by Lucas Stach

[permalink] [raw]
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

Am Montag, den 25.04.2016, 10:24 -0400 schrieb Brian Foster:
> On Mon, Apr 25, 2016 at 09:42:43AM +0200, Lucas Stach wrote:
> >
> > The current logic only idles aild if the AIL is empty, rescheduling
> > the worker with a timeout of 50ms otherwise. If the target LSN
> > isn't
> > moved forward, the worker will not make any progress as it only
> > pushes the AIL up to the target LSN, leading to the empty AIL
> > condition to only be met after the log worker pushed the complete
> > AIL. As the log worker runs on the scale of minutes, this leaves
> > aild in the "no progress, 50ms timeout" state for extended periods
> > of
> > time, causing many unecessary wakeups.
> >
> Where do you get the "scale of minutes" from? I believe the default
> log
> worker interval is 30s, though it can be increased by the admin.
>
Uh, right. It's 30s, which is still a very long time when we consider
that new items can be added to the AIL right after the log worker
forced it out.

> The other thing to note is that the AIL target is also driven by
> demand
> on log space. E.g., see xlog_grant_push_ail(), which executes when a
> transaction attempts to reserve log space that isn't currently
> available
> in the log.
>
And that's where I think the problem stems from: the above function
pushes the AIL only up to a target LSN to make room for the required
log space. Only the log worker empties the AIL completely by issuing a
xfs_ail_push_all_sync().

> That said, I'm not sure whether there's a notable benefit of idling
> for
> 50ms over just scheduling out when we've hit the target lsn. It seems
> like that anybody who pushes the target forward again is going to
> wake
> up the thread anyways. On the other hand, if the fs is idle the
> thread
> will eventually schedule out indefinitely.

Is this a problem? The patch tries to do exactly that: schedule out
aild indefinitely when there is no more work to do as nobody is pushing
the target LSN forward.

On an idle FS this will stop aild when the target LSN is hit, will wake
it up again as soon as the log worker requests the complete AIL to be
pushed out, then it goes back to sleep indefinitely until the log fills
up again.

> Have you observed problematic
> behavior due to the current heuristic?
>
Yes. On a FS with a low, but constant metadata workload I would expect
aild to be scheduled only when the log space is running out and/or the
log worker forces a sync.
What currently happens is that aild stays in the "target LSN hit -> no
work to do, schedule a timeout each 50ms" state for almost the entire
runtime of the system. This causes about 20 CPU wakeups per second per
mountpoint, most of which are completely wasted, as there is no work to
do.

My use case may be atypical, as I'm using XFS on a laptop, but in that
case the XFS aild is actually causing most of the CPU wakeups in an
otherwise idle system, which is clearly undesirable, as those wakeups
aren't really needed as they don't lead to actual work getting done.

> >
> > Fix this by idling aild as soon as the AIL is pushed up to the
> > target
> > LSN. All code paths that move the target LSN forward already ensure
> > that aild is woken up again when needed.
> >
> > Signed-off-by: Lucas Stach <[email protected]>
> > ---
> > I'm pretty sure that the above deduction is correct, but as this my
> > first real encounter with the XFS code base, someone with a bit
> > more
> > knowledge than me should give this some thought.
> > ---
> >  fs/xfs/xfs_trans_ail.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> > index d6c9c3e..7eca852 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -495,6 +495,7 @@ xfsaild(
> >  {
> >   struct xfs_ail *ailp = data;
> >   long tout = 0; /* milliseconds */
> > + struct xfs_log_item *lip;
> >  
> >   current->flags |= PF_MEMALLOC;
> >   set_freezable();
> > @@ -508,7 +509,8 @@ xfsaild(
> >   spin_lock(&ailp->xa_lock);
> >  
> >   /*
> > -  * Idle if the AIL is empty and we are not racing
> > with a target
> > +  * Idle if the AIL is empty or pushed up to the
> > requested
> > +  * target LSN and we are not racing with a target
> >    * update. We check the AIL after we set the task
> > to a sleep
> >    * state to guarantee that we either catch an
> > xa_target update
> >    * or that a wake_up resets the state to
> > TASK_RUNNING.
> > @@ -517,7 +519,8 @@ xfsaild(
> >    * The barrier matches the xa_target update in
> > xfs_ail_push().
> >    */
> >   smp_rmb();
> > - if (!xfs_ail_min(ailp) &&
> > + lip = xfs_ail_min(ailp);
> > + if ((!lip || XFS_LSN_CMP(lip->li_lsn, ailp-
> > >xa_target) >= 0) &&
> I think you'd want to use > 0, as == 0 suggests you have an item that
> matches the target.
>
This check just mirrors what is done in xfsaild_push() to check if the
target has been hit. Either both sites should be changed (which would
assume the target LSN is inclusive and we should push the AIL to at
least the target) or left as is (assuming the target LSN) is exclusive
and we should push up to the target).

> FWIW, another option might be to increase the timeout to something
> much
> larger than the current 50ms when we've hit the target.

I don't think we want to extend the timeout if someone already pushed
the target LSN forward, so we would need to mirror the check for the
target LSN race here. I believe my change is less intrusive.

Regards,
Lucas

> See towards the
> end of xfsaild_push() where tout is determined based on the state of
> the
> push (and we already have logic to check against xa_target).
>
> Brian
>
> >
> >       ailp->xa_target == ailp->xa_target_prev) {
> >   spin_unlock(&ailp->xa_lock);
> >   freezable_schedule();

2016-04-25 23:09:24

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

On Mon, Apr 25, 2016 at 08:11:37PM +0200, Lucas Stach wrote:
> Am Montag, den 25.04.2016, 10:24 -0400 schrieb Brian Foster:
> > On Mon, Apr 25, 2016 at 09:42:43AM +0200, Lucas Stach wrote:
> > >
> > > The current logic only idles aild if the AIL is empty, rescheduling
> > > the worker with a timeout of 50ms otherwise. If the target LSN
> > > isn't
> > > moved forward, the worker will not make any progress as it only
> > > pushes the AIL up to the target LSN, leading to the empty AIL
> > > condition to only be met after the log worker pushed the complete
> > > AIL. As the log worker runs on the scale of minutes, this leaves
> > > aild in the "no progress, 50ms timeout" state for extended periods
> > > of
> > > time, causing many unecessary wakeups.
> > >
> > Where do you get the "scale of minutes" from? I believe the default
> > log
> > worker interval is 30s, though it can be increased by the admin.
> >
> Uh, right. It's 30s, which is still a very long time when we consider
> that new items can be added to the AIL right after the log worker
> forced it out.

Standard writeback timeout for data and metadata.

> > The other thing to note is that the AIL target is also driven by
> > demand
> > on log space. E.g., see xlog_grant_push_ail(), which executes when a
> > transaction attempts to reserve log space that isn't currently
> > available
> > in the log.
> >
> And that's where I think the problem stems from: the above function
> pushes the AIL only up to a target LSN to make room for the required
> log space. Only the log worker empties the AIL completely by issuing a
> xfs_ail_push_all_sync().

xfs_ail_push_all(), actually, and the log worker runs every 30s by
default. i.e. we move the target to the end to the AIL to try to
empty it completely every 30s.

> > That said, I'm not sure whether there's a notable benefit of idling
> > for
> > 50ms over just scheduling out when we've hit the target lsn. It seems
> > like that anybody who pushes the target forward again is going to
> > wake
> > up the thread anyways. On the other hand, if the fs is idle the
> > thread
> > will eventually schedule out indefinitely.
>
> Is this a problem? The patch tries to do exactly that: schedule out
> aild indefinitely when there is no more work to do as nobody is pushing
> the target LSN forward.

If the filesystem is slowly being dirtied, then the aild should't
really idle at all.i

Keep in mind that the xfsaild has multiple functions, one of which
is a watchdog that catches log space stalls that would otherwise
hang the filesystem. Every time we've removed the watchdog function
(i.e. agressively idle the aild) we've had users report random,
unreproducable hangs/stalls that have gone away when the watchdog
function (i.e. don't idle until the log is covered and completely
idle) was re-instated...

> On an idle FS this will stop aild when the target LSN is hit, will wake
> it up again as soon as the log worker requests the complete AIL to be
> pushed out, then it goes back to sleep indefinitely until the log fills
> up again.

The behaviour suggests that your filesystem is not idle. The
filesystem takes up to 90s to be marked idle (log needs to be
covered, state machine takes 3x30s cycles to transition to idle
"covered" state.

If you want the filesytsem to idle quickly, then run the log worker
more frequently to get the target updated more quickly. This will
also speed up the log covering state machine as well.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-04-27 18:31:48

by Lucas Stach

[permalink] [raw]
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

Am Dienstag, den 26.04.2016, 09:08 +1000 schrieb Dave Chinner:
[...]
> >
> > >
> > > That said, I'm not sure whether there's a notable benefit of
> > > idling
> > > for
> > > 50ms over just scheduling out when we've hit the target lsn. It
> > > seems
> > > like that anybody who pushes the target forward again is going to
> > > wake
> > > up the thread anyways. On the other hand, if the fs is idle the
> > > thread
> > > will eventually schedule out indefinitely. 
> > Is this a problem? The patch tries to do exactly that: schedule out
> > aild indefinitely when there is no more work to do as nobody is
> > pushing
> > the target LSN forward.
> If the filesystem is slowly being dirtied, then the aild should't
> really idle at all.i
>
> Keep in mind that the xfsaild has multiple functions, one of which
> is a watchdog that catches log space stalls that would otherwise
> hang the filesystem. Every time we've removed the watchdog function
> (i.e.  agressively idle the aild) we've had users report random,
> unreproducable hangs/stalls that have gone away when the watchdog
> function (i.e. don't idle until the log is covered and completely
> idle) was re-instated...
>
I can only see xfsaild_push() doing any work after it has hit the
target LSN if something moves the target LSN forward. You say that
aggressively idling aild might produce log stalls, which would imply
there are races in the code where a code path that moves the target LSN
forward doesn't properly wake up aild.

Wouldn't this problem also be present when doing non-aggressive idle of
aild, just the probability of hitting the issue being reduced
significantly? The commit that re-enabled non-aggressive aild idle
especially mentions some races that have been fixed and I think those
fixes should allow for agressive aild idle. If they are insufficient it
wouldn't be safe to idle aild at all, right?

Regards,
Lucas

2016-04-27 22:29:55

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

On Wed, Apr 27, 2016 at 08:31:38PM +0200, Lucas Stach wrote:
> Am Dienstag, den 26.04.2016, 09:08 +1000 schrieb Dave Chinner:
> [...]
> > >
> > > >
> > > > That said, I'm not sure whether there's a notable benefit of
> > > > idling
> > > > for
> > > > 50ms over just scheduling out when we've hit the target lsn. It
> > > > seems
> > > > like that anybody who pushes the target forward again is going to
> > > > wake
> > > > up the thread anyways. On the other hand, if the fs is idle the
> > > > thread
> > > > will eventually schedule out indefinitely.?
> > > Is this a problem? The patch tries to do exactly that: schedule out
> > > aild indefinitely when there is no more work to do as nobody is
> > > pushing
> > > the target LSN forward.
> > If the filesystem is slowly being dirtied, then the aild should't
> > really idle at all.i
> >
> > Keep in mind that the xfsaild has multiple functions, one of which
> > is a watchdog that catches log space stalls that would otherwise
> > hang the filesystem. Every time we've removed the watchdog function
> > (i.e.??agressively idle the aild) we've had users report random,
> > unreproducable hangs/stalls that have gone away when the watchdog
> > function (i.e. don't idle until the log is covered and completely
> > idle) was re-instated...
> >
> I can only see?xfsaild_push() doing any work after it has hit the
> target LSN if something moves the target LSN forward. You say that
> aggressively idling aild might produce log stalls, which would imply
> there are races in the code where a code path that moves the target LSN
> forward doesn't properly wake up aild.

Well, yes. The code is horrifically complex, there's a heap of
lockless operations along with cross-subsystem co-ordinated
operations done under different locks amongst other things to
provide scalability. History tells me that no matter whether we
*think* we've got it right, there's always another bug lurking.

> Wouldn't this problem also be present when doing non-aggressive idle of
> aild, just the probability of hitting the issue being reduced
> significantly?

Welcome to Risk Management 101.

I've got better things to do with my time than remove a safety net
and the be forced to spend days or even weeks trying to solve all
the subtle, deeply hidden problems that have been around for 20
years that are now exposed to users. That's not a productive use of
the limited amount of XFS developer's time we have available. If you
want to go ahead and do all this, I'll be happy to spend a year
teaching you about how all the log space reservation code works...

> The commit that re-enabled non-aggressive aild idle
> especially mentions some races that have been fixed and I think those
> fixes should allow for agressive aild idle. If they are insufficient it
> wouldn't be safe to idle aild at all, right?

No - the log state machine that covers (idles) the log is the one we
really care about and it guarantees that the AIL is empty. i.e. The
AIL has active items in it until the log is covered and hence, by
definition, it can't be idle until the log is covered.

Cheers,

Dave.
--
Dave Chinner
[email protected]