2006-08-25 05:24:47

by NeilBrown

[permalink] [raw]
Subject: Re: RFC - how to balance Dirty+Writeback in the face of slow writeback.

On Tuesday August 22, [email protected] wrote:
> On Mon, Aug 21, 2006 at 05:24:14PM +1000, Neil Brown wrote:
> > So my feeling (at the moment) is that balance_dirty_pages should look
> > like:
> >
> > if below threshold
> > return
> > writeback_inodes({.bdi = mapping->backing_dev_info)} )
> >
> > while (above threshold + 10%)
> > writeback_inodes(.bdi = NULL)
> > blk_congestion_wait
> >
> > and all bdis should impose a queue limit.
>
> I don't really like the "+ 10%" in there - it's too rubbery given
> the range of memory sizes Linux supports (think of an Altix with
> several TBs of RAM in it ;). With bdis imposing a queue limit, the
> number of writeback pages should be bound and so we shouldn't need
> headroom like this.

I had that there precisely because some BDIs are not bounded - nfs in
particular (which is what started this whole thread).
I think I'm now convinced that nfs really need to limit its writeout
queue.

>
> Hmmm - the above could put the writer to sleep on the request queue
> of the slow device that holds all dirty+writeback. This could
> effectively slow all writers down to the rate of the slowest device
> in the system as they all attempt to do blocking writeback on the
> only dirty bdi (the really slow one).


>
> AFAICT, all we need to do is prevent interactions between bdis and
> the current problem is that we loop on clean bdis waiting for slow
> dirty ones to drain.
>
> My thoughts are along the lines of a decay in nr_to_write between
> loop iterations when we don't write out enough pages (i.e. clean
> bdi) so we break out of the loop sooner rather than later.

I don't understand the purpose of the decay. Once you are sure the
bdi is clean, why not break out of the loop straight away?

Also, your code is a little confusing. The
pages_written += write_chunk - wbc.nr_to_write
in the loop assumes that wbc.nr_to_write equalled write_chunk just
before the call to writeback_inodes, however as you have moved the
initialisation of wbc out of the loop, this is no longer true.

So I would like us to break out of the loop as soon as there is good
reason to believe the bdi is clean.

So maybe something like this..
Note that we *must* have bounded queue on all bdis or this patch can
cause substantial badness.

NeilBrown

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./mm/page-writeback.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff .prev/mm/page-writeback.c ./mm/page-writeback.c
--- .prev/mm/page-writeback.c 2006-08-25 15:18:37.000000000 +1000
+++ ./mm/page-writeback.c 2006-08-25 15:22:39.000000000 +1000
@@ -187,7 +187,7 @@ static void balance_dirty_pages(struct a
.bdi = bdi,
.sync_mode = WB_SYNC_NONE,
.older_than_this = NULL,
- .nr_to_write = write_chunk,
+ .nr_to_write = write_chunk - pages_written,
.range_cyclic = 1,
};

@@ -217,10 +217,13 @@ static void balance_dirty_pages(struct a
global_page_state(NR_WRITEBACK)
<= dirty_thresh)
break;
- pages_written += write_chunk - wbc.nr_to_write;
+ if (pages_written == write_chunk - wbc.nr_to_write)
+ break; /* couldn't write - must be clean */
+ pages_written = write_chunk - wbc.nr_to_write;
if (pages_written >= write_chunk)
break; /* We've done our duty */
- }
+ } else
+ break;
blk_congestion_wait(WRITE, HZ/10);
}


2006-08-28 01:56:16

by David Chinner

[permalink] [raw]
Subject: Re: RFC - how to balance Dirty+Writeback in the face of slow writeback.

On Fri, Aug 25, 2006 at 03:24:47PM +1000, Neil Brown wrote:
> On Tuesday August 22, [email protected] wrote:
> > AFAICT, all we need to do is prevent interactions between bdis and
> > the current problem is that we loop on clean bdis waiting for slow
> > dirty ones to drain.
> >
> > My thoughts are along the lines of a decay in nr_to_write between
> > loop iterations when we don't write out enough pages (i.e. clean
> > bdi) so we break out of the loop sooner rather than later.
>
> I don't understand the purpose of the decay. Once you are sure the
> bdi is clean, why not break out of the loop straight away?

Simply to slow down the rate at which any process is dirtying
memory. The decay only becomes active when you're writing to a
clean device when there are lots of dirty pages on a slow device,
otherwise it's a no-op.

To illustrate the problem of breaking straight out of the throttle
loop, even though we hit the dirty rate limit we may have
dirtied pages on multiple bdis but we are only flushing on one of
them. Hence we could potentially trigger increasing numbers of
dirty pages if we don't back off in some way when throttling here
even though the device we throttled on was clean.

e.g. Think of writing data to a slow device, then a log entry to a fast
device, and every time the write to the fast device triggers the
throttling which gets cleaned and we go and dirty more pages on
the slow device immediately without throttling....

> Also, your code is a little confusing. The

Sorry, it was a quick hack to illustrate my thinking.....

> So I would like us to break out of the loop as soon as there is good
> reason to believe the bdi is clean.

Which was exactly my line of thinking, but tempered by the fact that
just breaking out of the loop could introduce a nasty problem....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group