LinuxLists.cc - Likley stupid question on "throttle_vm

2009-11-09 15:22:04

Subject: Likley stupid question on "throttle_vm_writeout"

Hi, (please CC me on replies)

I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:

	if (global_page_state(NR_UNSTABLE_NFS) +
	    global_page_state(NR_WRITEBACK) <= dirty_thresh)
		break;
	congestion_wait(WRITE, HZ/10);

Shouldn't the NR_FILE_DIRTY pages be considered as well?

Cheers

Martin
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

2009-11-09 15:26:33

by Peter Zijlstra

Subject: Re: Likley stupid question on "throttle_vm_writeout"

On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> Hi, (please CC me on replies)
>
> I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
>
> if (global_page_state(NR_UNSTABLE_NFS) +
> global_page_state(NR_WRITEBACK) <= dirty_thresh)
> break;
> congestion_wait(WRITE, HZ/10);
>
> Shouldn't the NR_FILE_DIRTY pages be considered as well?

Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)

The intent of throttle_vm_writeout() is to limit the total pages in
writeout and to wait for them to go-away.

Everybody hates the function, nobody managed to actually come up with
anything better.

2009-11-10 02:09:01

by Fengguang Wu

Subject: Re: Likley stupid question on "throttle_vm_writeout"

On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > Hi, (please CC me on replies)
> >
> > I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
> >
> > if (global_page_state(NR_UNSTABLE_NFS) +
> > global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > break;
> > congestion_wait(WRITE, HZ/10);
> >
> > Shouldn't the NR_FILE_DIRTY pages be considered as well?
>
> Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
>
> The intent of throttle_vm_writeout() is to limit the total pages in
> writeout and to wait for them to go-away.

Like this:

vmscan fast => large NR_WRITEBACK => throttle vmscan based on it

> Everybody hates the function, nobody managed to actually come up with
> anything better.

btw, here is another reason to limit NR_WRITEBACK: I saw many
throttle_vm_writeout() waits if there is no wait queue to limit
NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
is _not_ caused by fast vmscan..

Thanks,
Fengguang

2009-11-10 06:55:23

by KOSAKI Motohiro

Subject: Re: Likley stupid question on "throttle_vm_writeout"

> On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> > On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > > Hi, (please CC me on replies)
> > >
> > > I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
> > >
> > > if (global_page_state(NR_UNSTABLE_NFS) +
> > > global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > break;
> > > congestion_wait(WRITE, HZ/10);
> > >
> > > Shouldn't the NR_FILE_DIRTY pages be considered as well?
> >
> > Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
> >
> > The intent of throttle_vm_writeout() is to limit the total pages in
> > writeout and to wait for them to go-away.
>
> Like this:
>
> vmscan fast => large NR_WRITEBACK => throttle vmscan based on it
>
> > Everybody hates the function, nobody managed to actually come up with
> > anything better.
>
> btw, here is another reason to limit NR_WRITEBACK: I saw many
> throttle_vm_writeout() waits if there is no wait queue to limit
> NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
> is _not_ caused by fast vmscan..

btw, this explanation points out why we should remove the other bare congestion_wait()
calls in the reclaim path.
At least, I have observed that the congestion_wait() in do_try_to_free_pages() reduces
reclaim performance considerably.

2009-11-10 12:01:43

by Martin Knoblauch

Subject: Re: Likley stupid question on "throttle_vm_writeout"

----- Original Message ----

> From: Wu Fengguang <[email protected]>
> To: Peter Zijlstra <[email protected]>
> Cc: Martin Knoblauch <[email protected]>; [email protected]
> Sent: Tue, November 10, 2009 3:08:58 AM
> Subject: Re: Likley stupid question on "throttle_vm_writeout"
>
> On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> > On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > > Hi, (please CC me on replies)
> > >
> > > I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
> > >
> > > if (global_page_state(NR_UNSTABLE_NFS) +
> > > global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > break;
> > > congestion_wait(WRITE, HZ/10);
> > >
> > > Shouldn't the NR_FILE_DIRTY pages be considered as well?
> >
> > Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
> >
> > The intent of throttle_vm_writeout() is to limit the total pages in
> > writeout and to wait for them to go-away.
>
> Like this:
>
> vmscan fast => large NR_WRITEBACK => throttle vmscan based on it
>
> > Everybody hates the function, nobody managed to actually come up with
> > anything better.
>
> btw, here is another reason to limit NR_WRITEBACK: I saw many
> throttle_vm_writeout() waits if there is no wait queue to limit
> NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
> is _not_ caused by fast vmscan..
>

That is exactly what made me look into the code again. My observation is that when doing something like:

dd if=/dev/zero of=fast-local-disk bs=1M count=15000

most of the "dirty" pages are in NR_FILE_DIRTY with some relatively small amount (10% or so) in NR_WRITEBACK. If I do:

dd if=/dev/zero of=some-nfs-mount bs=1M count=15000

NR_WRITEBACK almost immediately goes up to dirty_ratio, with NR_UNSTABLE_NFS small. Over time NR_UNSTABLE_NFS grows, but is always lower than NR_WRITEBACK (maybe 40/60).

But don't ask what happens if I do both in parallel.... The local IO really slows to a crawl and sometimes the system just becomes very unresponsive. Have we heard that before? :-)

Somehow I have the impression that NFS writeout is able to absolutely dominate the dirty pages to an extent that the system is unusable.
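
One way to watch these counters during such a dd run is to sample /proc/meminfo, where the Dirty, Writeback and NFS_Unstable fields report NR_FILE_DIRTY, NR_WRITEBACK and NR_UNSTABLE_NFS in kB on kernels of this vintage. The helper names below (`meminfo_field_kb()`, `read_meminfo()`) are made up for this sketch, and this is not necessarily how the numbers above were collected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Name:   value kB" field in meminfo-formatted text.
 * Returns the value in kB, or -1 if the field is absent or malformed.
 */
long meminfo_field_kb(const char *text, const char *name)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && name != NULL);
	len = strlen(name);
	while (line != NULL && *line != '\0') {
		/* Require the ':' so "Writeback" does not match "WritebackTmp". */
		if (strncmp(line, name, len) == 0 && line[len] == ':') {
			long kb;

			if (sscanf(line + len + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* step past the newline to the next field */
	}
	return -1;
}

/* Read /proc/meminfo into buf. Returns 0 on success, -1 on failure. */
int read_meminfo(char *buf, size_t size)
{
	FILE *fp;
	size_t n;

	assert(buf != NULL && size > 0);
	fp = fopen("/proc/meminfo", "r");
	if (fp == NULL)
		return -1;
	n = fread(buf, 1, size - 1, fp);
	buf[n] = '\0';
	fclose(fp);
	return 0;
}
```

Calling read_meminfo() once per second and printing the three fields while the two dd commands run should make the Dirty/Writeback split visible for the local and the NFS case.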

Cheers
Martin

2009-11-10 13:08:32

by Fengguang Wu

Subject: Re: Likley stupid question on "throttle_vm_writeout"

On Tue, Nov 10, 2009 at 08:01:47PM +0800, Martin Knoblauch wrote:
> ----- Original Message ----
>
> > From: Wu Fengguang <[email protected]>
> > To: Peter Zijlstra <[email protected]>
> > Cc: Martin Knoblauch <[email protected]>; [email protected]
> > Sent: Tue, November 10, 2009 3:08:58 AM
> > Subject: Re: Likley stupid question on "throttle_vm_writeout"
> >
> > On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> > > On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > > > Hi, (please CC me on replies)
> > > >
> > > > I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
> > > >
> > > > if (global_page_state(NR_UNSTABLE_NFS) +
> > > > global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > > break;
> > > > congestion_wait(WRITE, HZ/10);
> > > >
> > > > Shouldn't the NR_FILE_DIRTY pages be considered as well?
> > >
> > > Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
> > >
> > > The intent of throttle_vm_writeout() is to limit the total pages in
> > > writeout and to wait for them to go-away.
> >
> > Like this:
> >
> > vmscan fast => large NR_WRITEBACK => throttle vmscan based on it
> >
> > > Everybody hates the function, nobody managed to actually come up with
> > > anything better.
> >
> > btw, here is another reason to limit NR_WRITEBACK: I saw many
> > throttle_vm_writeout() waits if there is no wait queue to limit
> > NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
> > is _not_ caused by fast vmscan..
> >
>
> That is exactly what made me look into the code again. My observation is that when doing something like:
>
> dd if=/dev/zero of=fast-local-disk bs=1M count=15000
>
> most of the "dirty" pages are in NR_FILE_DIRTY with some relatively small amount (10% or so) in NR_WRITEBACK. If I do:
>
> dd if=/dev/zero of=some-nfs-mount bs=1M count=15000
>
> NR_WRITEBACK almost immediately goes up to dirty_ratio, with
> NR_UNSTABLE_NFS small. Over time NR_UNSTABLE_NFS grows, but is
> always lower than NR_WRITEBACK (maybe 40/60).

This is interesting, though I don't see explicit NFS code to limit
NR_UNSTABLE_NFS. Maybe there are some implicit rules.

> But don't ask what happens if I do both in parallel.... The local
> IO really slows to a crawl and sometimes the system just becomes
> very unresponsive. Have we heard that before? :-)

You may be the first reporter as far as I can tell :)

> Somehow I have the impression that NFS writeout is able to
> absolutely dominate the dirty pages to an extent that the system is
> unusable.

This is why I want to limit NR_WRITEBACK for NFS:

[PATCH] NFS: introduce writeback wait queue
http://lkml.org/lkml/2009/10/3/198

Thanks,
Fengguang

2009-11-10 16:11:33

by Martin Knoblauch

Subject: Re: Likley stupid question on "throttle_vm_writeout"

----- Original Message ----

> From: Wu Fengguang <[email protected]>
> To: Martin Knoblauch <[email protected]>
> Cc: Peter Zijlstra <[email protected]>; "[email protected]" <[email protected]>; "Myklebust, Trond" <[email protected]>; Peter Staubach <[email protected]>; [email protected]
> Sent: Tue, November 10, 2009 2:08:18 PM
> Subject: Re: Likley stupid question on "throttle_vm_writeout"
>
> On Tue, Nov 10, 2009 at 08:01:47PM +0800, Martin Knoblauch wrote:
> > ----- Original Message ----
> >
> > > From: Wu Fengguang
> > > To: Peter Zijlstra
> > > Cc: Martin Knoblauch ; [email protected]
> > > Sent: Tue, November 10, 2009 3:08:58 AM
> > > Subject: Re: Likley stupid question on "throttle_vm_writeout"
> > >
> > > On Mon, Nov 09, 2009 at 04:26:33PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2009-11-09 at 07:15 -0800, Martin Knoblauch wrote:
> > > > > Hi, (please CC me on replies)
> > > > >
> > > > > I have a likely stupid question on the function "throttle_vm_writeout". Looking at the code I find:
> > > > >
> > > > > if (global_page_state(NR_UNSTABLE_NFS) +
> > > > > global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > > > > break;
> > > > > congestion_wait(WRITE, HZ/10);
> > > > >
> > > > > Shouldn't the NR_FILE_DIRTY pages be considered as well?
> > > >
> > > > Ha, you just trod onto a piece of ugly I'd totally forgotten about ;-)
> > > >
> > > > The intent of throttle_vm_writeout() is to limit the total pages in
> > > > writeout and to wait for them to go-away.
> > >
> > > Like this:
> > >
> > > vmscan fast => large NR_WRITEBACK => throttle vmscan based on it
> > >
> > > > Everybody hates the function, nobody managed to actually come up with
> > > > anything better.
> > >
> > > btw, here is another reason to limit NR_WRITEBACK: I saw many
> > > throttle_vm_writeout() waits if there is no wait queue to limit
> > > NR_WRITEBACK (eg. NFS). In that case the (steadily) big NR_WRITEBACK
> > > is _not_ caused by fast vmscan..
> > >
> >
> > That is exactly what made me look into the code again. My observation is that when doing something like:
> >
> > dd if=/dev/zero of=fast-local-disk bs=1M count=15000
> >
> > most of the "dirty" pages are in NR_FILE_DIRTY with some relatively small amount (10% or so) in NR_WRITEBACK. If I do:
> >
> > dd if=/dev/zero of=some-nfs-mount bs=1M count=15000
> >
> > NR_WRITEBACK almost immediately goes up to dirty_ratio, with
> > NR_UNSTABLE_NFS small. Over time NR_UNSTABLE_NFS grows, but is
> > always lower than NR_WRITEBACK (maybe 40/60).
>
> This is interesting, though I don't see explicit NFS code to limit
> NR_UNSTABLE_NFS. Maybe there are some implicit rules.
>
> > But don't ask what happens if I do both in parallel.... The local
> > IO really slows to a crawl and sometimes the system just becomes
> > very unresponsive. Have we heard that before? :-)
>
> You may be the first reporter as far as I can tell :)
>

Oh come on :-) I (and others) have been reporting bad writeout behaviour for years. But maybe not in the combination of local and NFS I/O.

> > Somehow I have the impression that NFS writeout is able to
> > absolutely dominate the dirty pages to an extent that the system is
> > unusable.
>
> This is why I want to limit NR_WRITEBACK for NFS:
>
> [PATCH] NFS: introduce writeback wait queue
> http://lkml.org/lkml/2009/10/3/198
>

Thanks. I will have a look. Is 2.6.32.x OK for testing?

Cheers
Martin

2009-11-11 00:45:53

by Fengguang Wu

Subject: Re: Likley stupid question on "throttle_vm_writeout"

On Wed, Nov 11, 2009 at 12:11:37AM +0800, Martin Knoblauch wrote:
> > This is why I want to limit NR_WRITEBACK for NFS:
> >
> > [PATCH] NFS: introduce writeback wait queue
> > http://lkml.org/lkml/2009/10/3/198
> >
>
> Thanks. I will have a look. Is 2.6.32.x OK for testing?

I have a more recent patch (attached) based on 2.6.32.

This on/off threshold based approach may not be good for producing
a smooth network flow. So I'm experimenting with more fine-grained
waits (i.e. instead of waiting for 10 jiffies at one time, we do 10
spread-out 1-jiffy sleeps).
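
The effect of the finer granularity can be shown with a toy model. This is not code from the attached patch: `ticks_slept()` and its parameters are hypothetical, and it simply assumes that in-flight writeback drains at a fixed rate per tick while the waiter re-checks the limit after every sleep chunk.

```c
#include <assert.h>

/*
 * Toy model: in_flight pages drain by drain_per_tick each tick. The
 * waiter sleeps in chunks of `chunk` ticks and re-checks the limit only
 * after each chunk. Returns the total ticks slept until in_flight is at
 * or below the limit.
 */
unsigned int ticks_slept(unsigned long in_flight, unsigned long limit,
			 unsigned long drain_per_tick, unsigned int chunk)
{
	unsigned int slept = 0;

	assert(drain_per_tick > 0 && chunk > 0);
	while (in_flight > limit) {
		unsigned long drained = drain_per_tick * chunk;

		/* Clamp at zero so the unsigned counter cannot wrap. */
		in_flight = drained < in_flight ? in_flight - drained : 0;
		slept += chunk;
	}
	return slept;
}
```

In this model a writer that is only slightly over the limit resumes after a few ticks with 1-tick chunks, whereas a single 10-tick sleep always costs the full 10 ticks, leaving the link idle for the remainder.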

Thanks,
Fengguang

Attachments:

(No filename) (592.00 B)
writeback-nfs-request-queue.patch (10.57 kB)