2002-01-17 19:22:53

by Rik van Riel

Subject: [PATCH *] rmap VM 11c

For this release, IO tests are very much welcome ...


The third maintenance release of the 11th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

http://surriel.com/patches/2.4/2.4.17-rmap-11c
and http://linuxvm.bkbits.net/


My big TODO items for a next release are:
- fix page_launder() so it doesn't submit the whole
inactive_dirty list for writeout in one go
... no longer needed due to fixed elevator ???
- auto-tuning readahead, readahead per VMA

rmap 11c:
- oom_kill race locking fix (Andres Salomon)
- elevator improvement (Andrew Morton)
- dirty buffer writeout speedup (hopefully ;)) (me)
- small documentation updates (me)
- page_launder() never does synchronous IO, kswapd
and the processes calling it sleep on higher level (me)
- deadlock fix in touch_page() (me)
rmap 11b:
- added low latency reschedule points in vmscan.c (me)
- make i810_dma.c include mm_inline.h too (William Lee Irwin)
- wake up kswapd sleeper tasks on OOM kill so the
killed task can continue on its way out (me)
- tune page allocation sleep point a little (me)
rmap 11a:
- don't let refill_inactive() progress count for OOM (me)
- after an OOM kill, wait 5 seconds for the next kill (me)
- agpgart_be fix for hashed waitqueues (William Lee Irwin)
rmap 11:
- fix stupid logic inversion bug in wakeup_kswapd() (Andrew Morton)
- fix it again in the morning (me)
- add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
seems PPC calls pte_alloc() before mem_map[] init (me)
- disable the debugging code in rmap.c ... the code
is working and people are running benchmarks (me)
- let the slab cache shrink functions return a value
to help prevent early OOM killing (Ed Tomlinson)
- also, don't call the OOM code if we have enough
free pages (me)
- move the call to lru_cache_del into __free_pages_ok (Ben LaHaise)
- replace the per-page waitqueue with a hashed
  waitqueue, reducing the size of struct page from 64
  bytes to 52 bytes (48 bytes on non-highmem machines);
  a rough sketch of the idea follows this changelog (William Lee Irwin)
rmap 10:
- fix the livelock for real (yeah right), turned out
to be a stupid bug in page_launder_zone() (me)
- to make sure the VM subsystem doesn't monopolise
the CPU, let kswapd and some apps sleep a bit under
heavy stress situations (me)
- let __GFP_HIGH allocations dig a little bit deeper
into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
- improve comments all over the place (Michael Cohen)
- don't panic if page_remove_rmap() cannot find the
rmap in question, it's possible that the memory was
PG_reserved and belonging to a driver, but the driver
exited and cleared the PG_reserved bit (me)
- fix the VM livelock by replacing > by >= in a few
critical places in the pageout code (me)
- treat the reclaiming of an inactive_clean page like
allocating a new page, calling try_to_free_pages()
and/or fixup_freespace() if required (me)
- when low on memory, don't make things worse by
doing swapin_readahead (me)
rmap 8:
- add ANY_ZONE to the balancing functions to improve
kswapd's balancing a bit (me)
- regularize some of the maximum loop bounds in
vmscan.c for cosmetic purposes (William Lee Irwin)
- move page_address() to architecture-independent
code, now the removal of page->virtual is portable (William Lee Irwin)
- speed up free_area_init_core() by doing a single
pass over the pages and not using atomic ops (William Lee Irwin)
- documented the buddy allocator in page_alloc.c (William Lee Irwin)
rmap 7:
- clean up and document vmscan.c (me)
- reduce size of page struct, part one (William Lee Irwin)
- add rmap.h for other archs (untested, not for ARM) (me)
rmap 6:
- make the active and inactive_dirty list per zone,
this is finally possible because we can free pages
based on their physical address (William Lee Irwin)
- cleaned up William's code a bit (me)
- turn some defines into inlines and move those to
mm_inline.h (the includes are a mess ...) (me)
- improve the VM balancing a bit (me)
- add back inactive_target to /proc/meminfo (me)
rmap 5:
- fixed recursive buglet, introduced by directly
editing the patch for making rmap 4 ;))) (me)
rmap 4:
- look at the referenced bits in page tables (me)
rmap 3:
- forgot one FASTCALL definition (me)
rmap 2:
- teach try_to_unmap_one() about mremap() (me)
- don't assign swap space to pages with buffers (me)
- make the rmap.c functions FASTCALL / inline (me)
rmap 1:
- fix the swap leak in rmap 0 (Dave McCracken)
rmap 0:
- port of reverse mapping VM to 2.4.16 (me)
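
(As a rough illustration of the hashed-waitqueue item under rmap 11 above:
the idea is to hash each page to one slot in a small shared table of wait
queue heads instead of embedding a queue head in every struct page. The
sketch below is a minimal stand-alone C illustration; the names, table size
and hash are invented here and are not the identifiers used in the patch.)

#include <stdint.h>

#define WAIT_TABLE_BITS 10
#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)

/* Stand-in for the kernel's wait_queue_head_t, only so the sketch is
 * self-contained. */
struct waitqueue_head_sketch {
        int sleepers;
};

/* One shared table of wait queue heads instead of one head per page. */
static struct waitqueue_head_sketch page_wait_table[WAIT_TABLE_SIZE];

/* Map a page (represented here by its address) to a shared wait queue head.
 * Several pages share each queue, so a wakeup may wake unrelated sleepers,
 * but struct page no longer needs to carry its own queue head. */
static struct waitqueue_head_sketch *page_waitqueue_sketch(const void *page)
{
        uintptr_t h = (uintptr_t)page >> 12;    /* drop the in-page offset bits */

        h *= 2654435761u;                       /* Knuth multiplicative hash */
        return &page_wait_table[h & (WAIT_TABLE_SIZE - 1)];
}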

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/


2002-01-18 00:00:19

by Bill Davidsen

Subject: Re: [PATCH *] rmap VM 11c

On Thu, 17 Jan 2002, Rik van Riel wrote:

> For this release, IO tests are very much welcome ...
>
>
> The third maintenance release of the 11th version of the reverse
> mapping based VM is now available.
> This is an attempt at making a more robust and flexible VM
> subsystem, while cleaning up a lot of code at the same time.
> The patch is available from:
>
> http://surriel.com/patches/2.4/2.4.17-rmap-11c
> and http://linuxvm.bkbits.net/

Rik, I tried a simple test, building a kernel on a 128M P-II-400, and when
the load average got up to 50 or so the system became slow ;-) On the other
hand it was still usable for most normal things other than incoming mail,
which properly blocks at LA > 10 or so.

I'll be trying it on a large machine tomorrow, but it at least looks
stable. In real life no sane person would do that, would they? Make with a
nice -10 was essentially invisible.

Maybe tomorrow the latest -aa kernel on the same machine, with and
without my own personal patch.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-01-18 00:07:19

by Rik van Riel

Subject: Re: [PATCH *] rmap VM 11c

On Thu, 17 Jan 2002, Bill Davidsen wrote:

> > http://surriel.com/patches/2.4/2.4.17-rmap-11c
> > and http://linuxvm.bkbits.net/

> Rik, I tried a simple test, building a kernel on a 128M P-II-400, and
> when the load average got up to 50 or so the system became slow ;-) On
> the other hand it was still usable for most normal things other than
> incoming mail, which properly blocks at LA > 10 or so.

Hehehe, when the load average is 50 only 2% of the CPU is available
for you. With that many gccs you're also under a memory squeeze with
128 MB of RAM, so it's no big wonder things got slow. ;)

I'm happy to hear the system was still usable, though.

> I'll be trying it on a large machine tomorrow, but it at least looks
> stable. In real life no sane person would do that, would they? Make
> with a nice -10 was essentially invisible.

Neat ...

> Maybe tomorrow the latest -aa kernel on the same machine, with and
> without my own personal patch.

Looking forward to the results.

regards,

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/

2002-01-18 00:33:42

by Adam Kropelin

Subject: Re: [PATCH *] rmap VM 11c

Rik van Riel <[email protected]>:
> For this release, IO tests are very much welcome ...

Results from a run of my large FTP transfer test on this new release are...
interesting.

Overall time shows an improvement (6:28), though not enough of one to take the
lead over 2.4.13-ac7.

More interesting, perhaps, is the vmstat output, which shows this at first:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 47816 2992 84236 0 0 10 0 4462 174 1 33 66
1 0 0 0 41704 3004 89320 0 0 10 0 4322 167 0 33 67
0 1 0 0 36004 3012 94064 0 0 9 877 4030 163 1 30 69
0 1 1 0 33536 3016 96112 0 0 4 1616 1724 62 0 18 82
0 1 2 0 31068 3020 98160 0 0 4 2048 1729 52 1 15 83
0 1 1 0 28608 3024 100208 0 0 4 2064 1735 56 1 16 82
0 1 1 0 26144 3028 102256 0 0 4 2048 1735 50 0 16 84
0 1 1 0 23684 3032 104304 0 0 5 2048 1713 45 1 15 84
0 1 1 0 21216 3036 106352 0 0 3 2064 1723 52 1 14 85
1 0 2 0 18728 3040 108420 0 0 5 2048 1750 59 0 17 82
0 1 1 0 16292 3044 110448 0 0 3 2064 1722 60 0 15 84
1 0 1 0 13824 3048 112572 0 0 5 2032 1800 61 0 17 83
1 0 1 0 11696 3052 114548 0 0 4 2528 1658 47 0 14 86
1 0 1 0 9232 3056 116596 0 0 4 2048 1735 51 1 13 86
0 1 2 0 6808 3060 118640 0 0 3 1584 1729 84 0 16 84

(i.e., nice steady writeout reminiscent of -ac)

...but after about 20 seconds, behavior degrades again:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 1 0 1500 3124 123268 0 0 0 3788 534 20 0 8 92
0 1 1 0 1500 3124 123268 0 0 0 0 107 12 0 0 100
0 1 1 0 1500 3124 123268 0 0 0 0 123 10 0 0 100
0 1 1 0 1500 3124 123268 0 0 0 3666 123 12 0 2 97
0 1 1 0 1500 3124 123268 0 0 1 259 109 12 0 8 92
1 0 0 0 1404 3124 123360 0 0 2 0 1078 28 0 7 92
1 0 0 0 1404 3136 123444 0 0 11 0 4560 178 0 39 61
1 0 0 0 1404 3148 123448 0 0 10 0 4620 175 1 34 64
0 0 0 0 1312 3156 123568 0 0 11 0 4276 181 0 36 64
0 0 0 0 1404 3168 123492 0 0 10 0 4330 185 1 30 68
0 1 1 0 1404 3172 123488 0 0 4 6864 1742 69 0 17 83
0 1 1 0 1408 3172 123488 0 0 0 0 111 12 0 0 99
0 1 1 0 1408 3172 123488 0 0 0 0 126 8 0 0 100
0 1 1 0 1404 3172 123480 0 0 0 7456 518 18 0 10 90
0 1 1 0 1404 3172 123480 0 0 0 0 112 10 0 0 100
0 1 1 0 1404 3172 123480 0 0 0 0 123 9 0 0 100
0 1 1 0 1404 3172 123476 0 0 1 7222 120 16 0 5 95
0 1 1 0 1404 3172 123476 0 0 0 0 106 8 0 0 100
0 1 1 0 1524 3172 123352 0 0 0 3790 519 18 0 8 92
0 1 1 0 1524 3172 123352 0 0 0 0 113 8 0 0 100
0 1 1 0 1524 3172 123352 0 0 0 0 125 8 0 0 100

Previous tests showed fluctuating bo values from the start; this is the first
time I've seen them steady, so something in the patch definitely is showing
through here.

I've a couple more tests to run, such as combining -rmap11c with cpqarray and
eepro driver updates from -ac. I'll keep you posted.

--Adam


2002-01-18 00:57:14

by Rik van Riel

Subject: Re: [PATCH *] rmap VM 11c

On Thu, 17 Jan 2002, Adam Kropelin wrote:
> Rik van Riel <[email protected]>:
> > For this release, IO tests are very much welcome ...
>
> Results from a run of my large FTP transfer test on this new release
> are... interesting.
>
> Overall time shows an improvement (6:28), though not enough of one to
> take the lead over 2.4.13-ac7.

> (i.e., nice steady writeout reminiscent of -ac)
> ...but after about 20 seconds, behavior degrades again:
>
> Previous tests showed fluctuating bo values from the start; this is the first
> time I've seen them steady, so something in the patch definitely is showing
> through here.

Thank you for running this test. I'll try to debug the situation
and see what's going on ... this definitely isn't behaving like
it should.

kind regards,

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/


2002-01-18 10:07:27

by Roy Sigurd Karlsbakk

Subject: Re: [PATCH *] rmap VM 11c

This looks a little like my problem...

See http://karlsbakk.net/dev/kernel/vm-fsckup.txt

On Thu, 17 Jan 2002, Adam Kropelin wrote:

> Rik van Riel <[email protected]>:
> > For this release, IO tests are very much welcome ...
>
> Results from a run of my large FTP transfer test on this new release are...
> interesting.
>
> Overall time shows an improvement (6:28), though not enough of one to take the
> lead over 2.4.13-ac7.
>
> More interesting, perhaps, is the vmstat output, which shows this at first:
>
> [vmstat output snipped; duplicated from Adam's message above]
>
> (i.e., nice steady writeout reminiscent of -ac)
>
> ...but after about 20 seconds, behavior degrades again:
>
> [vmstat output snipped]
>
> Previous tests showed fluctuating bo values from the start; this is the first
> time I've seen them steady, so something in the patch definitely is showing
> through here.
>
> I've a couple more tests to run, such as combining -rmap11c with cpqarray and
> eepro driver updates from -ac. I'll keep you posted.
>
> --Adam
>

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-19 05:09:00

by Adam Kropelin

Subject: Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)

Ken Brownfield:

> Do you get more even throughput with this:
>
> /bin/echo "10 0 0 0 500 3000 10 0 0" > /proc/sys/vm/bdflush
>
> It seems to help significantly for me under heavy sustained I/O load.

With a little modification, Ken's suggestion makes -rmap11c a winner on my test
case.

/bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush

Switching to synchronous bdflush a little later than Ken did brings performance
up to ~2000 blocks/sec, which is similar to older -ac kernels. This writeout
rate is very consistent (even more so than -ac) and seems to be the top end in
all large writes to the RAID (tried FTP, samba, and local balls-to-the-wall "cat
/dev/zero >..."), which helps show that this is not a network driver or protocol
interaction.

The same bdflush tuning (leaving -aa's additional parameters at their defaults)
on 2.4.18-pre2aa2 yields some improvement, but rmap is consistently faster by a
good margin. 2.4.17 performs worse with this tuning and is pretty much eating
dust at this point.

Latest Results:
2.4.17-rmap11c: 5:41 (down from 6:58)
2.4.18-pre2aa2: 6:31 (down from 7:10)
2.4.17: 7:06 (up from 6:57)

Congrats, Rik and thanks, Ken!

--Adam


2002-01-19 17:50:16

by Andrea Arcangeli

Subject: Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)

On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> Ken Brownfield:
>
> > Do you get more even throughput with this:
> >
> > /bin/echo "10 0 0 0 500 3000 10 0 0" > /proc/sys/vm/bdflush
> >
> > It seems to help significantly for me under heavy sustained I/O load.
>
> With a little modification, Ken's suggestion makes -rmap11c a winner on my test
> case.
>
> /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
^
>
> Switching to synchronous bdflush a little later than Ken did brings performance
> up to ~2000 blocks/sec, which is similar to older -ac kernels. This writeout
> rate is very consistent (even more so than -ac) and seems to be the top end in
> all large writes to the RAID (tried FTP, samba, and local balls-to-the-wall "cat
> /dev/zero >..."), which helps show that this is not a network driver or protocol
> interaction.
>
> The same bdflush tuning (leaving aa's additional parameters at their defaults)

You cannot set the underlined one to zero (way too low, insane) or leave
it at its default (20) in -aa, or it will be a misconfigured setup
that can lead to anything. The rule is:

nfract_stop_bdflush <= nfract <= nfract_sync

you set:

nfract = 10
nfract_sync = 30

so nfract_stop_bdflush cannot be 20.

Furthermore, you set ndirty to 0, which is also an invalid setup.

With -aa something sane along the above lines is:

/bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush

This sets ndirty to 2000 (so you will write around 2 MB at every go
with a 1k fs like I bet you have, instead of 500k as the default), plus nfract
= 10%, nfract_sync = 30% and nfract_stop_bdflush = 5%
(nfract_stop_bdflush is available only in -aa). Of course ndirty should
really be expressed in bytes rather than in blocks, but oh well...
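
(For reference, here is a sketch of how the nine bdflush fields used in this
thread line up, reconstructed from the commands and percentages above rather
than copied from fs/buffer.c; the names of the unused slots are assumptions.)

/* Sketch of the nine /proc/sys/vm/bdflush fields as used in this thread
 * (2.4.x); pieced together from the thread, not copied from fs/buffer.c. */
struct bdflush_tunables_sketch {
        int nfract;              /* % dirty before async writeout starts (10 above) */
        int ndirty;              /* max buffers per bdflush pass; 2000 * 1k = ~2 MB */
        int unused1;             /* ignored in 2.4 */
        int unused2;             /* ignored in 2.4 */
        int interval;            /* jiffies between kupdate runs (500 = 5 s at HZ=100) */
        int age_buffer;          /* dirty buffer age before flush (3000 = 30 s) */
        int nfract_sync;         /* % dirty at which writers flush synchronously (30) */
        int nfract_stop_bdflush; /* stop threshold; -aa only (default 20) */
        int unused3;             /* ignored */
};

/* Andrea's constraint: nfract_stop_bdflush <= nfract <= nfract_sync, so the
 * -aa attempt "10 500 0 0 500 3000 30 20 0" violates it (20 > 10), while
 * "10 2000 0 0 500 3000 30 5 0" satisfies it (5 <= 10 <= 30). */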

Now it would be interesting to know how it performs this way with -aa.
The fact that you set nfract_stop_bdflush to either 0 or 20 in -aa could
very well explain the regression in async-flushing performance in your
previous test on top of -aa.

Andrea

2002-01-19 18:39:55

by Adam Kropelin

Subject: Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)

Andrea Arcangeli:
> On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> > /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
> ^
>
> You cannot set the underlined one to zero (way too low, insane) or leave
> it at its default (20) in -aa, or it will be a misconfigured setup
> that can lead to anything. The rule is:
>
> nfract_stop_bdflush <= nfract <= nfract_sync

<snip>

> so nfract_stop_bdflush cannot be 20.

Ok, thanks for straightening me out on that. I figured there might be some
consequence of the additional knobs in -aa which I didn't know about.

> Furthermore, you set ndirty to 0, which is also an invalid setup.

I didn't. That was one of the "additional parameters" that I left at the default
on -aa (500, it seems). Sorry, I should have been clearer about exactly what
settings I used on -aa; the quoted settings were for -rmap only. For reference,
the exact command I tried on -aa was:

/bin/echo "10 500 0 0 500 3000 30 20 0" > /proc/sys/vm/bdflush

> With -aa something sane along the above lines is:
>
> /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush

Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new record for
worst performance: 7:19.

An additional datapoint: The quoted bdflush settings which make 2.4.17-rmap11c a
winner do not do well at all on 2.4.17-rmap11a. Rik's initial reaction to the
issue was that there was a bug and I know he made some changes in rmap11c to
address it. The fact that 11c definitely performs better for me than 11a seems
to support this. Perhaps this bug or a variant thereof also exists in aa?

--Adam


2002-01-19 20:21:32

by Andrea Arcangeli

Subject: Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)

On Sat, Jan 19, 2002 at 01:39:22PM -0500, Adam Kropelin wrote:
> Andrea Arcangeli:
> > On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> > > /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
> > ^
> >
> > You cannot set the underlined one to zero (way too low, insane) or leave
> > it at its default (20) in -aa, or it will be a misconfigured setup
> > that can lead to anything. The rule is:
> >
> > nfract_stop_bdflush <= nfract <= nfract_sync
>
> <snip>
>
> > so nfract_stop_bdflush cannot be 20.
>
> Ok, thanks for straightening me out on that. I figured there might be some
> consequence of the additional knobs in -aa which I didn't know about.
>
> > Furthermore, you set ndirty to 0, which is also an invalid setup.
>
> I didn't. That was one of the "additional parameters" that I left at the default
> on -aa (500, it seems). Sorry, I should have been clearer about exactly what
> settings I used on -aa; the quoted settings were for -rmap only. For reference,
> the exact command I tried on -aa was:
>
> /bin/echo "10 500 0 0 500 3000 30 20 0" > /proc/sys/vm/bdflush
>
> > With -aa something sane along the above lines is:
> >
> > /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
>
> Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new record for
> worst performance: 7:19.

Then please try decreasing the ndirty variable again; the above set it
to 2000, and if you have a slow hard disk maybe that's too much, so you
can try setting it to 500 again.

I'd also give a try with the below settings:

/bin/echo "10 500 0 0 500 3000 80 8 0" > /proc/sys/vm/bdflush

(the 500 may vary, you may try 200 or 1000 instead, etc., but a large
nfract_sync should allow your program to keep getting data)

> An additional datapoint: The quoted bdflush settings which make 2.4.17-rmap11c a
> winner do not do well at all on 2.4.17-rmap11a. Rik's initial reaction to the
> issue was that there was a bug and I know he made some changes in rmap11c to
> address it. The fact that 11c definitely performs better for me than 11a seems
> to support this. Perhaps this bug or a variant thereof also exists in aa?

AFAIK there are no known bugs at the moment (things that can be called
bugs, I mean), and I cannot consider this particular speed variation a bug.
This buffer flushing behaviour only involves a few functions in buffer.c;
I don't see how the rmap design can make any difference to this
benchmark (if there is some real change that makes a difference, then it's
completely orthogonal to rmap).

On my hardware I benchmarked writes as fast as reads, and personally
I'm fine with the current behaviour of the async flushing in 2.4, so I
don't care much about this (the async flushing points are a heuristic,
so it's hard to make every single case faster than before; that's why
it's tunable, and what you are hitting could be a timing feedback
effect). I mainly care about finding exactly what makes the difference
for you, just to be sure it's nothing serious, as expected
(just different async flushing wakeup points).

Also just in case, I'd suggest to try to repeat each benchmark three
times, so we know we are not bitten by random variations in the numbers.

Andrea

2002-01-19 22:15:49

by Adam Kropelin

Subject: Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)

(Andrea, the previous version of this mail wasn't supposed to go out yet. I
fat-fingered and sent it before I was done. This is the full version.)

Andrea Arcangeli:
> On Sat, Jan 19, 2002 at 01:39:22PM -0500, Adam Kropelin wrote:
> > Andrea Arcangeli:
> > > With -aa something sane along the above lines is:
> > >
> > > /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
> >
> > Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new record for
> > worst performance: 7:19.
>
> Then please try decreasing the ndirty variable again; the above set it
> to 2000, and if you have a slow hard disk maybe that's too much, so you
> can try setting it to 500 again.

Yes, the harddisk is definitely slow: it's a hw RAID5 partition with older
drives, so writes are pretty slow.

I tried various ndirty settings:

/bin/echo "10 300 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
7:33

/bin/echo "10 500 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
6:00

/bin/echo "10 800 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
7:17

ndirty=500 seems to be the best and gets much closer to the performance of rmap
and -ac. Writeout is still very bursty compared to the other kernels, but that
may not really matter; I don't know.

> I'd also give a try with the below settings:
>
> /bin/echo "10 500 0 0 500 3000 80 8 0" > /proc/sys/vm/bdflush

7:08

<snip>

> Also just in case, I'd suggest to try to repeat each benchmark three
> times, so we know we are not bitten by random variations in the numbers.

I've been doing a variation on that theme already. The numbers I've been
reporting are best of 2 runs. I have never seen the 2 runs differ by more than
+/- 10 seconds.

--Adam