LinuxLists.cc - Page cache write performance issue

2004-10-13 05:47:11

Subject: Page cache write performance issue

Hi guys,

I've noticed the following performance regression
between 2.6.8.1 and 2.6.9-rc. It seems to have a very
pronounced effect on both ext2 and xfs.

- single thread, writing (what should be) straight into
the page cache, file size 1/2 of memory size (500MB vs
1GB), writes are in 1K chunks, most of memory is free,
machine was just booted;

- on 2.6.8 (and earlier 2.6 releases) I can typically
get ~50MB/sec on this machine doing this (or better
with larger I/O sizes, but that's not the point here);

- on 2.4.28-pre (and all 2.4 releases) I can typically
get ~70MB/sec, presumably writeback kicks in earlier on
2.6; OK, I guess we can live with that... probably some
tradeoff is being made there in the VM;

- on 2.6.9-rc I can only get _4_MB/sec (ext2 or xfs);
writeback commences very quickly, CPU utilisation drops
way down (from 100% to <10%)... looks like we go slower
cos we're initiating I/O almost from the start.
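Nathan's test program isn't included in the thread. Below is a minimal C sketch of the workload as described: one thread, sequential 1K buffered write(2) calls, no O_SYNC or fsync. The function name and the MB/sec calculation are assumptions for illustration. Only the write loop is timed, so the figure reflects how quickly the page cache absorbs the data, plus any writeback or throttling the kernel forces on the writer during the run.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction of the workload: write `total` bytes to
 * `path` sequentially in `chunk`-byte write(2) calls. No O_SYNC and no
 * fsync(), so the data should only be copied into the page cache unless
 * the kernel decides to start writeback. Returns MB/sec, or -1.0 on error.
 */
double buffered_write_mbps(const char *path, size_t total, size_t chunk)
{
    static char buf[65536];
    struct timespec t0, t1;
    size_t done = 0;
    double secs;
    int fd;

    if (chunk == 0 || chunk > sizeof(buf))
        return -1.0;
    memset(buf, 0, chunk);             /* like reading from /dev/zero */

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < total) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        ssize_t w = write(fd, buf, n);
        if (w <= 0) {                  /* error or no progress */
            close(fd);
            return -1.0;
        }
        done += (size_t)w;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;                   /* avoid dividing by zero on tiny runs */
    return (double)total / (1024.0 * 1024.0) / secs;
}
```

Calling `buffered_write_mbps("x", 500UL << 20, 1024)` corresponds to the 500MB case. It should behave much like the `dd if=/dev/zero of=x bs=1k` command used later in the thread.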

Now if I bump up /proc/sys/vm/dirty_background_ratio and
/proc/sys/vm/dirty_ratio from 40 to 80, I see the expected
performance again (actually, I see the 2.4 performance,
so the poorer early-2.6 numbers were probably due to I/O
commencing at the tail end of all the writes, since 50% is
more than 40% :). But 2.6.8 had the same default
dirty writeout ratios (40) as 2.6.9-rc does, didn't it?
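The "50% being more than 40%" arithmetic can be sketched in C. The sketch uses the `(ratio * total) / 100` form from the page-writeback.c hunk quoted later in the thread and assumes the whole 1GB (taken as 1024MB) counts towards the total. The real kernel computes the limit from dirtyable memory, so treat these figures as approximate. Both function names are hypothetical.

```c
/*
 * Sketch of the dirty-threshold arithmetic, in MB rather than pages.
 * Assumes all of RAM counts towards the total, which somewhat overstates
 * the kernel's real (dirtyable-memory) base.
 */
unsigned long dirty_threshold_mb(unsigned int ratio, unsigned long total_mb)
{
    return (ratio * total_mb) / 100;
}

/* Nonzero if `write_mb` of new dirty data would exceed the threshold. */
int write_exceeds_threshold(unsigned long write_mb, unsigned int ratio,
                            unsigned long total_mb)
{
    return write_mb > dirty_threshold_mb(ratio, total_mb);
}
```

With 1024MB, a ratio of 40 gives about 409MB. That is below the 500MB file, so writeout has to begin before the test finishes. At 80 the limit is about 819MB and the whole file fits under it.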

So, any ideas what happened to 2.6.9? What's the rationale
for commencing writeout earlier in 2.6 (even when there's
so much free memory available)? Any chance we can get the
defaults set to something much larger in the wake of the
other 2.6.9 VM changes, so we don't regress here?

thanks!

--
Nathan

2004-10-13 06:21:51

by Andrew Morton

Subject: Re: Page cache write performance issue

Nathan Scott <[email protected]> wrote:
>
> So, any ideas what happened to 2.6.9?

Does reverting the below fix it up?

> Whats the rationale for commencing writeout earlier in 2.6
> (even when there's
> so much free memory available)?

There wasn't much rationale behind that patch - that's why I dropped it the
first three times ;) I have no problem with making it four times.

It could be that small values of unmapped_ratio are making background_ratio
too small.

--- a/mm/page-writeback.c 10 Aug 2004 04:16:17 -0000 1.43
+++ a/mm/page-writeback.c 13 Oct 2004 06:12:03 -0000
@@ -153,9 +153,11 @@
 	if (dirty_ratio < 5)
 		dirty_ratio = 5;
 
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
+	/*
+	 * Keep the ratio between dirty_ratio and background_ratio roughly
+	 * what the sysctls are after dirty_ratio has been scaled (above).
+	 */
+	background_ratio = dirty_background_ratio * dirty_ratio/vm_dirty_ratio;
 
 	background = (background_ratio * total_pages) / 100;
 	dirty = (dirty_ratio * total_pages) / 100;
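The difference between the two hunks can be shown with a small C sketch that compares both ways of deriving `background_ratio` once `dirty_ratio` has been scaled. The function names are hypothetical. The sketch models only the lines shown in the hunk, not the rest of the function.

```c
/*
 * Old rule (the '-' lines): use the sysctl as-is, and fall back to half
 * of dirty_ratio only if the background ratio would otherwise reach it.
 */
int background_ratio_old(int dirty_background_ratio, int dirty_ratio)
{
    int background_ratio = dirty_background_ratio;

    if (background_ratio >= dirty_ratio)
        background_ratio = dirty_ratio / 2;
    return background_ratio;
}

/*
 * New rule (the '+' lines): scale the background ratio by the same
 * factor that was applied to dirty_ratio (dirty_ratio / vm_dirty_ratio).
 * Integer arithmetic, evaluated left to right as in the patch.
 */
int background_ratio_new(int dirty_background_ratio, int dirty_ratio,
                         int vm_dirty_ratio)
{
    if (vm_dirty_ratio <= 0)   /* guard added for the sketch only */
        return 0;
    return dirty_background_ratio * dirty_ratio / vm_dirty_ratio;
}
```

Take example sysctl values of 10 (background) and 40 (dirty). When dirty_ratio is not scaled, both rules give 10. If dirty_ratio is scaled down to 20, the old rule still gives 10 but the new one gives 5. If it is clamped to 5, they give 2 and 1. So under the new rule, background writeback starts earlier whenever dirty_ratio has been scaled down. That fits Andrew's guess that the scaling could be making background_ratio too small.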

2004-10-13 06:42:28

by Nathan Scott

Subject: Re: Page cache write performance issue

Hi Andrew,

On Tue, Oct 12, 2004 at 11:19:45PM -0700, Andrew Morton wrote:
> Nathan Scott <[email protected]> wrote:
> >
> > So, any ideas what happened to 2.6.9?
>
> Does reverting the below fix it up?

Reverting that one improves things slightly - I move up from
~4MB/sec to ~17MB/sec; that's just under a third of the 2.6.8
numbers I was seeing though, unfortunately.

cheers.

--
Nathan

2004-10-13 07:03:57

by Andrew Morton

Subject: Re: Page cache write performance issue

Nathan Scott <[email protected]> wrote:
>
> Hi Andrew,
>
> On Tue, Oct 12, 2004 at 11:19:45PM -0700, Andrew Morton wrote:
> > Nathan Scott <[email protected]> wrote:
> > >
> > > So, any ideas what happened to 2.6.9?
> >
> > Does reverting the below fix it up?
>
> Reverting that one improves things slightly - I move up from
> ~4MB/sec to ~17MB/sec; that's just under a third of the 2.6.8
> numbers I was seeing though, unfortunately.
>

Well something else is fishy: how can you possibly achieve only 4MB/sec?
Using floppy disks or something?

Does the same happen on ext2?

It's exactly a 500MB write on a 1000MB machine, yes?

2004-10-13 07:24:36

by Nathan Scott

Subject: Re: Page cache write performance issue

On Wed, Oct 13, 2004 at 12:02:06AM -0700, Andrew Morton wrote:
>
> Well something else is fishy: how can you possibly achieve only 4MB/sec?

These are 1K writes too remember, so it feels a bit like we
write 'em out one at a time, sync (though no O_SYNC, or fsync,
or such involved here). This is on an i686, so 4K pages, and
using 4K filesystem blocksizes (both xfs and ext2).

And now that you mention it, yes, this is multiple times below
the direct IO numbers too (which on this box are ~30MB/sec
for direct blkdev writes, IIRC, & XFS has similar numbers).

> Using floppy disks or something?

Heh, uh, no. (and no, not "pencils" either ;)

> Does the same happen on ext2?

Yes.

> It's exactly a 500MB write on a 1000MB machine, yes?

That's correct.

No slab/page/.. debug options enabled either - it's the same
.config that was performing ~10x better on 2.6.8. I also
verified that it wasn't any of the XFS changes either (they
wouldn't have affected ext2 anyway, of course) - the same
XFS code backported to 2.6.8 performs fine also.

cheers.

--
Nathan

2004-10-13 08:15:39

by Nick Piggin

Subject: Re: Page cache write performance issue

Nathan Scott wrote:

>On Wed, Oct 13, 2004 at 12:02:06AM -0700, Andrew Morton wrote:
>
>>Well something else is fishy: how can you possibly achieve only 4MB/sec?
>>
>
>These are 1K writes too remember, so it feels a bit like we
>write 'em out one at a time, sync (though no O_SYNC, or fsync,
>or such involved here). This is on an i686, so 4K pages, and
>using 4K filesystem blocksizes (both xfs and ext2).
>
>

Still shouldn't cause such a big slowdown. Seems like they
might be getting written off the end of the page reclaim
LRU (although in that case it is a bit odd that increasing
the dirty thresholds improves performance).

I don't think we have any vmscan metrics for this... kswapd
definitely has become more active in 2.6.9-rc. If you're stuck
for ideas, try editing mm/vmscan.c:may_write_to_queue - comment
out the if(current_is_kswapd()) check.

It is a long shot though. Andrew probably has better ideas.

2004-10-13 08:41:42

by Andrew Morton

Subject: Re: Page cache write performance issue

Nick Piggin <[email protected]> wrote:
>
> Andrew probably has better ideas.

uh, is this an ia32 highmem box?

If so, you've hit the VM sour spot. That 128M highmem zone gets 100%
filled with dirty pages and we end up doing a ton of writeout off the page
LRU. And we do that while `dd' is cheerfully writing to a totally
different part of the disk via balance_dirty_pages(). Seekstorm ensues.
Although last time I looked (a long time ago) the slowdown was only 2:1 -
perhaps your disk is in writethrough mode??

Basically, *any* other config is fine. 896MB and below, 1.5GB and above.

I could well understand that a minor kswapd tweak would make this bad
situation worse. Making the dirty ratios really small (dirty_ratio less
than the 128MB) should make it go away.
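Andrew's suggestion can be put into numbers. On a 1GB ia32 box, lowmem ends at 896MB, which leaves a 128MB highmem zone. The C sketch below finds the largest whole-number dirty_ratio whose limit stays below that zone. It assumes, as a simplification, that the limit is `ratio` percent of the full 1024MB. The function name is hypothetical.

```c
/*
 * Largest integer percentage r (0..100) such that r% of total_mb is
 * strictly below zone_mb, i.e. r * total_mb < zone_mb * 100.
 * Returns -1 if no such r exists (only possible when zone_mb == 0).
 */
int max_ratio_below_zone(unsigned long zone_mb, unsigned long total_mb)
{
    int r;

    for (r = 100; r >= 0; r--) {
        if ((unsigned long)r * total_mb < zone_mb * 100)
            return r;
    }
    return -1;
}
```

For a 128MB zone on a 1024MB box this gives 12, a limit of about 123MB. The default of 40 allows roughly 409MB of dirty data, more than three times the size of the highmem zone.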

If it's not ia32 then dunno.

2004-10-14 00:54:57

by Nathan Scott

Subject: Re: Page cache write performance issue

On Wed, Oct 13, 2004 at 01:39:41AM -0700, Andrew Morton wrote:
> Nick Piggin <[email protected]> wrote:
> >
> > Andrew probably has better ideas.
>
> uh, is this an ia32 highmem box?

Yep, it is.

> If so, you've hit the VM sour spot.
> ...
> Basically, *any* other config is fine. 896MB and below, 1.5GB and above.

I just tried switching CONFIG_HIGHMEM off, and so running the
machine with 512MB; then adjusted the test to write 256M into
the page cache, again in 1K sequential chunks. A similar mis-
behaviour happens, though the numbers are slightly better (up
from ~4 to ~6.5MB/sec). Both ext2 and xfs see this. When I
drop the file size down to 128M with this kernel, I see good
results again (as we'd expect).

I'm being pulled onto other issues atm, but in the background
I could try reverting specific changesets if you guys can
suggest anything in particular that might be triggering this?

thanks!

--
Nathan

2004-10-14 03:22:38

by Andrew Morton

Subject: Re: Page cache write performance issue

Nathan Scott <[email protected]> wrote:
>
> On Wed, Oct 13, 2004 at 01:39:41AM -0700, Andrew Morton wrote:
> > Nick Piggin <[email protected]> wrote:
> > >
> > > Andrew probably has better ideas.
> >
> > uh, is this an ia32 highmem box?
>
> Yep, it is.
>
> > If so, you've hit the VM sour spot.
> > ...
> > Basically, *any* other config is fine. 896MB and below, 1.5GB and above.
>
> I just tried switching CONFIG_HIGHMEM off, and so running the
> machine with 512MB; then adjusted the test to write 256M into
> the page cache, again in 1K sequential chunks. A similar mis-
> behaviour happens, though the numbers are slightly better (up
> from ~4 to ~6.5MB/sec). Both ext2 and xfs see this. When I
> drop the file size down to 128M with this kernel, I see good
> results again (as we'd expect).

No such problem here, with

dd if=/dev/zero of=x bs=1k count=128k

on a 256MB machine. xfs and ext2.

Can you exhibit this on more than one machine?

Silly question: what does `grep sync /etc/fstab' say over there? ;)

2004-10-14 07:20:39

by Nathan Scott

Subject: Re: Page cache write performance issue

On Wed, Oct 13, 2004 at 08:20:41PM -0700, Andrew Morton wrote:
> Nathan Scott <[email protected]> wrote:
> > I just tried switching CONFIG_HIGHMEM off, and so running the
> > machine with 512MB; then adjusted the test to write 256M into
> > the page cache, again in 1K sequential chunks. A similar mis-
> > behaviour happens, though the numbers are slightly better (up
> > from ~4 to ~6.5MB/sec). Both ext2 and xfs see this. When I
> > drop the file size down to 128M with this kernel, I see good
> > results again (as we'd expect).
>
> No such problem here, with
>
> dd if=/dev/zero of=x bs=1k count=128k
>
> on a 256MB machine. xfs and ext2.

Yup, rebooted with mem=128M on my box, and that crawls.
Maybe it's just this old hunk o' junk, I suppose; odd that
2.6.8 was OK with this though.

> Can you exhibit this on more than one machine?

I haven't got a second ia32 box atm - setting one up soon,
will let you know how it goes.

> Silly question: what does `grep sync /etc/fstab' say over there? ;)

Same thing it said on 2.6.8. :) Nada.

cheers.

--
Nathan

2004-10-14 08:01:42

by Nick Piggin

Subject: Re: Page cache write performance issue

Nathan Scott wrote:
> On Wed, Oct 13, 2004 at 08:20:41PM -0700, Andrew Morton wrote:
>
>>Nathan Scott <[email protected]> wrote:
>>
>>> I just tried switching CONFIG_HIGHMEM off, and so running the
>>> machine with 512MB; then adjusted the test to write 256M into
>>> the page cache, again in 1K sequential chunks. A similar mis-
>>> behaviour happens, though the numbers are slightly better (up
>>> from ~4 to ~6.5MB/sec). Both ext2 and xfs see this. When I
>>> drop the file size down to 128M with this kernel, I see good
>>> results again (as we'd expect).
>>
>>No such problem here, with
>>
>> dd if=/dev/zero of=x bs=1k count=128k
>>
>>on a 256MB machine. xfs and ext2.
>
>
> Yup, rebooted with mem=128M on my box, and that crawls.
> Maybe it's just this old hunk o' junk, I suppose; odd that
> 2.6.8 was OK with this though.
>

Just out of interest, can you get profiles and a few lines
of vmstat 1 from 2.6.8 and 2.6.9-rc, please?
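`vmstat 1` reports I/O and CPU activity but not how much of the page cache is dirty or under writeback. As a hedged complement, the `Dirty:` and `Writeback:` lines of /proc/meminfo can be sampled once a second alongside it. Here is a minimal C sketch; the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Return the value of `key` from text in /proc/meminfo format
 * ("Dirty:            1234 kB"), in kB, or -1 if the key is absent.
 * The key must be followed directly by ':' so "Writeback" does not
 * match a longer name such as "WritebackTmp".
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                    /* step past the newline */
    }
    return -1;
}

/* Print Dirty and Writeback once a second, `samples` times. */
int sample_dirty(int samples)
{
    char buf[16384];
    int i;

    for (i = 0; i < samples; i++) {
        FILE *f = fopen("/proc/meminfo", "r");
        size_t n;

        if (f == NULL)
            return -1;
        n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';
        printf("Dirty: %8ld kB  Writeback: %8ld kB\n",
               meminfo_kb(buf, "Dirty"), meminfo_kb(buf, "Writeback"));
        fflush(stdout);
        if (i + 1 < samples)
            sleep(1);
    }
    return 0;
}
```

Run alongside the write test, a Dirty figure that stops growing early while Writeback stays high would point at writeout starting well before the file is complete.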