2003-05-30 17:03:07

by Rik van Riel

Subject: [PATCH] rmap 15j

The tenth maintenance release of the 15th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

http://surriel.com/patches/2.4/2.4.21-pre7-rmap15j
and http://linuxvm.bkbits.net/


My big TODO items for a next release are:
- finetune the O(1) VM code for strange corner cases
- add pte-highmem defines for more architectures
- highmem tweaks

rmap 15j:
- aggressive inode reclaim on highmem boxes (me)
- OOM killer tweaks, hopefully better now (me)
- better higher-order page allocations (me)
- small updates and tweaks
rmap 15i:
- drop behind only drops really new pages (Arjan van de Ven)
- lots of VM tuning left and right (Arjan, Ingo, me)
rmap 15h:
- fix obscure SMP race (Ben LaHaise, me)
- fix smp races with bufferhead reclaim (Andrew Morton, me)
- architecture updates (various)
- add barrier to page_unlock_rmap (Pete Zaitcev)
rmap 15g:
- more ppc64 pte-highmem stuff (Julie DeWandel)
- hammer pte-highmem stuff (Jim Paradis)
- reclaim buffer heads under memory pressure (me)
rmap 15f:
- remove pte-highmem compat define from ieee1394 (Marc-C. Petersen)
- clean up scan_active_list after suggestion from hch (me)
- lock ordering fix (me)
- add barrier() to page_chain_lock() (Pete Zaitcev)
- fix pte-highmem defines for ppc64 (Julie DeWandel)
- add pte-highmem defines for s390 & s390x (Pete Zaitcev)
rmap 15e:
- make reclaiming unused inodes more efficient (Arjan van de Ven)
| push to Marcelo and Andrew once it's well tested !
- fix DRM memory leak (Arjan van de Ven)
- fix potential infinite loop in kswapd (me)
- clean up elevator.h (no IO scheduler in -rmap...) (me)
- page aging interval tuned on a per zone basis, better
wakeup mechanism for sudden memory pressure (Arjan, me)
rmap 15d:
- compatibility with PREEMPT patch (me)
| fairly ugly, but should work
- bugfix for the pte_chain allocation code (Arjan van de Ven)
rmap 15c:
- backport and audit akpm's reliable pte_chain alloc
code from 2.5 (me)
- reintroduce cache size tuning knobs in /proc (me)
| on very, very popular request
rmap 15b:
- adjust anon/cache work table (me)
- make active_age_bias a per-active list thing (me)
- don't wake up kswapd early from mark_page_accessed (me)
- make sure pte-chains are cacheline aligned with PAE (me, Andrew Morton)
- change some O(1) VM thresholds (me)
- fix pte-highmem backport (me)
- 2.5 backport: pte-highmem (Ben LaHaise)
- 2.5 backport: large cacheline aligned pte-chains (Ben LaHaise)
- 2.5 backport: direct pte pointers (Ben LaHaise)
- undo __find_pagecache_page braindamage (Christoph Hellwig)
rmap 15a:
- more aggressive freeing for higher order allocations (me)
- export __find_pagecache_page, find_get_page define (me, Christoph, Arjan)
- make memory statistics SMP safe again (me)
- make page aging slow down again when needed (Andrew Morton)
- first stab at fine-tuning arjan's O(1) VM (me)
- split active list in cache / working set (me)
- fix SMP locking in arjan's O(1) VM (me)
rmap 15:
- small code cleanups and spelling fixes for O(1) VM (me)
- O(1) page launder, O(1) page aging (Arjan van de Ven)
- resync code with -ac (12 small patches) (me)
rmap 14c:
- fold page_over_rsslimit() into page_referenced() (me)
- 2.5 backport: get pte_chains from the slab cache (William Lee Irwin)
- remove dead code from page_launder_zone() (me)
- make OOM detection a bit more aggressive (me)
rmap 14b:
- don't unmap pages not in pagecache (ext3 & reiser) (Andrew Morton, me)
- clean up mark_page_accessed a bit (me)
- Alpha NUMA fix for Ingo's per-cpu pages (Flávio Leitner, me)
- remove explicit low latency schedule zap_page_range (Robert Love)
- fix OOM stuff for good, hopefully (me)
rmap 14a:
- Ingo Molnar's per-cpu pages (SMP speedup) (Christoph Hellwig)
- fix SMP bug in page_launder_zone (rmap14 only) (Arjan van de Ven)
- semicolon day, fix typo in rmap.c w/ DEBUG_RMAP (Craig Kulesa)
- remove unneeded pte_chain_unlock/lock pair vmscan.c (Craig Kulesa)
- low latency zap_page_range also without preempt (Arjan van de Ven)
- do some throughput tuning for kswapd/page_launder (me)
- don't allocate swap space for pages we're not writing (me)
rmap 14:
- get rid of stalls during swapping, hopefully (me)
- low latency zap_page_range (Robert Love)
rmap 13c:
- add wmb() to wakeup_memwaiters (Arjan van de Ven)
- remap_pmd_range now calls pte_alloc with full address (Paul Mackerras)
- #ifdef out pte_chain_lock/unlock on UP machines (Andrew Morton)
- un-BUG() truncate_complete_page, the race is expected (Andrew Morton, me)
- remove NUMA changes from rmap13a (Christoph Hellwig)
rmap 13b:
- prevent PF_MEMALLOC recursion for higher order allocs (Arjan van de Ven, me)
- fix small SMP race, PG_lru (Hugh Dickins)
rmap 13a:
- NUMA changes for page_address (Samuel Ortiz)
- replace vm.freepages with simpler kswapd_minfree (Christoph Hellwig)
rmap 13:
- rename touch_page to mark_page_accessed and uninline (Christoph Hellwig)
- NUMA bugfix for __alloc_pages (William Irwin)
- kill __find_page (Christoph Hellwig)
- make pte_chain_freelist per zone (William Irwin)
- protect pte_chains by per-page lock bit (William Irwin)
- minor code cleanups (me)
rmap 12i:
- slab cleanup (Christoph Hellwig)
- remove references to compiler.h from mm/* (me)
- move rmap to marcelo's bk tree (me)
- minor cleanups (me)
rmap 12h:
- hopefully fix OOM detection algorithm (me)
- drop pte quicklist in anticipation of pte-highmem (me)
- replace andrea's highmem emulation by ingo's one (me)
- improve rss limit checking (Nick Piggin)
rmap 12g:
- port to armv architecture (David Woodhouse)
- NUMA fix to zone_table initialisation (Samuel Ortiz)
- remove init_page_count (David Miller)
rmap 12f:
- for_each_pgdat macro (William Lee Irwin)
- put back EXPORT(__find_get_page) for modular rd (me)
- make bdflush and kswapd actually start queued disk IO (me)
rmap 12e:
- RSS limit fix, the limit can be 0 for some reason (me)
- clean up for_each_zone define to not need pgdata_t (William Lee Irwin)
- fix i810_dma bug introduced with page->wait removal (William Lee Irwin)
rmap 12d:
- fix compiler warning in rmap.c (Roger Larsson)
- read latency improvement (read-latency2) (Andrew Morton)
rmap 12c:
- fix small balancing bug in page_launder_zone (Nick Piggin)
- wakeup_kswapd / wakeup_memwaiters code fix (Arjan van de Ven)
- improve RSS limit enforcement (me)
rmap 12b:
- highmem emulation (for debugging purposes) (Andrea Arcangeli)
- ulimit RSS enforcement when memory gets tight (me)
- sparc64 page->virtual quickfix (Greg Procunier)
rmap 12a:
- fix the compile warning in buffer.c (me)
- fix divide-by-zero on highmem initialisation DOH! (me)
- remove the pgd quicklist (suspicious ...) (DaveM, me)
rmap 12:
- keep some extra free memory on large machines (Arjan van de Ven, me)
- higher-order allocation bugfix (Adrian Drzewiecki)
- nr_free_buffer_pages() returns inactive + free mem (me)
- pages from unused objects directly to inactive_clean (me)
- use fast pte quicklists on non-pae machines (Andrea Arcangeli)
- remove sleep_on from wakeup_kswapd (Arjan van de Ven)
- page waitqueue cleanup (Christoph Hellwig)
rmap 11c:
- oom_kill race locking fix (Andres Salomon)
- elevator improvement (Andrew Morton)
- dirty buffer writeout speedup (hopefully ;)) (me)
- small documentation updates (me)
- page_launder() never does synchronous IO, kswapd
and the processes calling it sleep on higher level (me)
- deadlock fix in touch_page() (me)
rmap 11b:
- added low latency reschedule points in vmscan.c (me)
- make i810_dma.c include mm_inline.h too (William Lee Irwin)
- wake up kswapd sleeper tasks on OOM kill so the
killed task can continue on its way out (me)
- tune page allocation sleep point a little (me)
rmap 11a:
- don't let refill_inactive() progress count for OOM (me)
- after an OOM kill, wait 5 seconds for the next kill (me)
- agpgart_be fix for hashed waitqueues (William Lee Irwin)
rmap 11:
- fix stupid logic inversion bug in wakeup_kswapd() (Andrew Morton)
- fix it again in the morning (me)
- add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
seems PPC calls pte_alloc() before mem_map[] init (me)
- disable the debugging code in rmap.c ... the code
is working and people are running benchmarks (me)
- let the slab cache shrink functions return a value
to help prevent early OOM killing (Ed Tomlinson)
- also, don't call the OOM code if we have enough
free pages (me)
- move the call to lru_cache_del into __free_pages_ok (Ben LaHaise)
- replace the per-page waitqueue with a hashed
waitqueue, reduces size of struct page from 64
bytes to 52 bytes (48 bytes on non-highmem machines) (William Lee Irwin)
rmap 10:
- fix the livelock for real (yeah right), turned out
to be a stupid bug in page_launder_zone() (me)
- to make sure the VM subsystem doesn't monopolise
the CPU, let kswapd and some apps sleep a bit under
heavy stress situations (me)
- let __GFP_HIGH allocations dig a little bit deeper
into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
- improve comments all over the place (Michael Cohen)
- don't panic if page_remove_rmap() cannot find the
rmap in question, it's possible that the memory was
PG_reserved and belonging to a driver, but the driver
exited and cleared the PG_reserved bit (me)
- fix the VM livelock by replacing > by >= in a few
critical places in the pageout code (me)
- treat the reclaiming of an inactive_clean page like
allocating a new page, calling try_to_free_pages()
and/or fixup_freespace() if required (me)
- when low on memory, don't make things worse by
doing swapin_readahead (me)
rmap 8:
- add ANY_ZONE to the balancing functions to improve
kswapd's balancing a bit (me)
- regularize some of the maximum loop bounds in
vmscan.c for cosmetic purposes (William Lee Irwin)
- move page_address() to architecture-independent
code, now the removal of page->virtual is portable (William Lee Irwin)
- speed up free_area_init_core() by doing a single
pass over the pages and not using atomic ops (William Lee Irwin)
- documented the buddy allocator in page_alloc.c (William Lee Irwin)
rmap 7:
- clean up and document vmscan.c (me)
- reduce size of page struct, part one (William Lee Irwin)
- add rmap.h for other archs (untested, not for ARM) (me)
rmap 6:
- make the active and inactive_dirty list per zone,
this is finally possible because we can free pages
based on their physical address (William Lee Irwin)
- cleaned up William's code a bit (me)
- turn some defines into inlines and move those to
mm_inline.h (the includes are a mess ...) (me)
- improve the VM balancing a bit (me)
- add back inactive_target to /proc/meminfo (me)
rmap 5:
- fixed recursive buglet, introduced by directly
editing the patch for making rmap 4 ;))) (me)
rmap 4:
- look at the referenced bits in page tables (me)
rmap 3:
- forgot one FASTCALL definition (me)
rmap 2:
- teach try_to_unmap_one() about mremap() (me)
- don't assign swap space to pages with buffers (me)
- make the rmap.c functions FASTCALL / inline (me)
rmap 1:
- fix the swap leak in rmap 0 (Dave McCracken)
rmap 0:
- port of reverse mapping VM to 2.4.16 (me)


2003-05-31 14:35:05

by Rik van Riel

Subject: [PATCH] rmap 15j for 2.4.21-rc6

On Fri, 30 May 2003, Rik van Riel wrote:

> The tenth maintenance release of the 15th version of the reverse
> mapping based VM is now available.
> This is an attempt at making a more robust and flexible VM
> subsystem, while cleaning up a lot of code at the same time.
> The patch is available from:
>
> http://surriel.com/patches/2.4/2.4.21-pre7-rmap15j
> and http://linuxvm.bkbits.net/

Today I finally merged rmap15j forward to marcelo's latest
release. The IO stall fixes should be especially interesting:

http://surriel.com/patches/2.4/2.4.21-rc6-rmap15j

2003-05-31 15:39:22

by Michael Frank

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

On Saturday 31 May 2003 22:48, Rik van Riel wrote:
>
> Today I finally merged rmap15j forward to marcelo's latest
> release. The IO stall fixes should be especially interesting:
>

Patched rc6 ex BK OK and compiled with gcc295-3 OK

On a P4/533-2.4GHz/512MB with udma5 IDE ~50MB/s:

Shows severe interactivity problems and hangs

Scroll and mouse hangs and delayed response to keyboard
greater than 1s are easily observable.

Test script: tstinter V0.1

http://www.ussg.iu.edu/hypermail/linux/kernel/0305.3/1291.html

To reproduce cd to dir with script and execute from X console:
./tstinter start

More instructions in script

Other kernel results:
2.4.18 PIO DIES - see msg w. script,
2.4.18 udma2 OK,
2.4.19 Bad,
2.4.20 Bad,
2.4.21-rc1 Bad,
2.4.21-rc6 OK
2.5.70,-mm1,-mm2,-mm3 (OK)

Regards
Michael

2003-06-01 04:43:47

by ghugh Song

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

Michael Frank wrote:

>On a P4/533-2.4GHz/512MB with udma5 IDE ~50MB/s:
>
>Shows severe interactivity problems and hangs
>
>Scroll and mouse hangs and delayed response to keyboard
>greater than 1s are easily observable.
>
>Test script: tstinter V0.1

Unlike your case,
On an Athlon 2700+/166MHz DDR/1GB on a board having an Nvidia2 chipset
with SCSI hard disks, it solved all my interactivity problems.
BTW, I do LaTeXing the most. I have never tested such test
scripts.

More information about my case can be found in

http://www.ussg.iu.edu/hypermail/linux/kernel/0305.3/1894.html

Regards,

G. Hugh Song


2003-06-01 06:36:53

by Michael Frank

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

On Sunday 01 June 2003 12:57, ghugh Song wrote:

> On an Athlon 2700+/166MHz DDR/1GB on a board having an Nvidia2 chipset
> with SCSI hard disks, it solved all my interactivity problems.
> BTW, I do LaTeXing the most.

Congratulations on winning at the roulette table of everyday use ;-)

> I have never tested such test scripts.

How many hours of users' and developers' time have been wasted on
"interactivity problems, locks and hangs" since 2.4.18 ?

Regards
Michael

2003-06-01 10:41:25

by Andreas Hartmann

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

Michael Frank wrote:

> On Saturday 31 May 2003 22:48, Rik van Riel wrote:
>>
>> Today I finally merged rmap15j forward to marcelo's latest
>> release. The IO stall fixes should be especially interesting:
>>
>
> Patched rc6 ex BK OK and compiled with gcc295-3 OK
>
> On a P4/533-2.4GHz/512MB with udma5 IDE ~50MB/s:
>
> Shows severe interactivity problems and hangs
>
> Scroll and mouse hangs and delayed response to keyboard
> greater than 1s are easily observable.
>

Well, I did the test with 2.4.21rc6 after patching your script (I got syntax
errors):

When I'm using the script as seen in the patch, I'm getting problems with df
(it's mostly very sluggish, about 20s delay or more), the load is 4, doing an
ls on some other directories is extremely slow. Mouse and keyboard
hang sometimes.
The write speed shown in xosview was between 1 and 15MB/s. Often the HD LED
was on, but no data seemed to be put to the HD.


My system:
AMD Athlon(tm) XP 2000+
512 MB RAM
Chipset Apollo KT266/A/333
WDC WD205AA, udma4, 20.5 GB
(multcount = 16 (on),I/O support = 1 (32-bit),
unmaskirq = 1 (on),using_dma = 1 (on),
readahead = 8 (on)
)
reiserfs


Regards,
Andreas Hartmann


--- /home/Andreas/tstinter Sun Jun 1 11:03:10 2003
+++ tstinter Sun Jun 1 11:27:18 2003
@@ -138,7 +138,7 @@
sleep 1
fi
#invoke dd write loops
- while (( i-- )); do
+ while (( i=`expr $i - 1` )); do
$TSTINTER _xwrite "$2" $i
sleep 1
done
@@ -163,13 +163,13 @@
#
_read)
if [ "$2" = "" ]; then
- count=100K
+ count=100000
else
count=$2
fi
- dd if=/dev/zero of=$TEMPFILE$3 bs=4K count=$count
+ dd if=/dev/zero of=$TEMPFILE$3 bs=4096 count=$count
while (( 1 )); do
- time dd if=$TEMPFILE$3 of=/dev/null bs=4K count=$count &> /dev/null
+ time dd if=$TEMPFILE$3 of=/dev/null bs=4096 count=$count &> /dev/null
done
;;
#
@@ -181,12 +181,12 @@
#
_write)
if [ "$2" = "" ]; then
- count=100K
+ count=100000
else
count=$2
fi
while (( 1 )); do
- time dd if=/dev/zero of=$TEMPFILE$3 bs=4K count=$count
+ time dd if=/dev/zero of=$TEMPFILE$3 bs=4096 count=$count
done
;;
#
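
A minimal sketch of the size-suffix substitution the patch above performs by hand, for dd versions that reject suffixed numbers. The helper name expand_k is hypothetical and is not part of tstinter. It assumes K means 1024, which is consistent with the patch turning bs=4K into bs=4096. Under that assumption count=100K corresponds to 102400 blocks, so the patched value of 100000 writes a slightly smaller file.

```shell
#!/bin/bash
# Hypothetical helper (not part of tstinter): expand a trailing "K"
# into a plain number, assuming K means 1024 as in bs=4K -> bs=4096.
expand_k() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;  # strip the suffix, multiply
        *)  echo "$1" ;;                  # already a plain number
    esac
}
```

With this, a call such as dd bs=$(expand_k 4K) count=$(expand_k 100K) would pass only plain integers to dd while keeping the original sizes.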

2003-06-01 11:13:00

by Michael Frank

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

On Sunday 01 June 2003 19:00, Andreas Hartmann wrote:
>
> Well, I did the test with 2.4.21rc6 after patching your script (I got
> syntax errors):

About your script changes, I would like to make it portable, and I use the
following versions:

GNU bash, version 2.05b.0(1)-release (i386-redhat-linux-gnu)

dd (coreutils) 4.5.3

What shell and coreutils are you using?

Avoiding short counts is easy but avoiding C-style expressions is primitive

- count=100K
+ count=100000

- while (( i-- )); do
+ while (( i=`expr $i - 1` )); do

In your opinion are your changes more portable across a wide range of systems?

>
> When I'm using the script as seen in the patch, I'm getting problems with
> df (it's mostly very lazy, about 20s delay or more), the load is 4, doing
> an ls on some other directories is extremely slow. Mouse and keyboard are
> hanging some times.
> The write speed shown in xosview was between 1 and 15MB/s. Often the HD LED
> was on, but no data seemed to be put to the HD.
>

It has a hard time to read anything else, the slower the disk, the worse.

Suppose rmap undoes the fixes introduced in -rc6.

Have you tried -rc6 plain?


Regards
Michael

2003-06-02 13:15:58

by Rik van Riel

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

On Sun, 1 Jun 2003, Michael Frank wrote:

> Suppose rmap undoes the fixes introduced in -rc6.

It shouldn't, unless bitkeeper made a serious merge
error (unlikely). The rmap patch should just change
VM code, not the IO improvements.

Having said that, I'd like to thank Michael Frank for
his script that reproduces problems. Now that the problem
is reproducible I can try to fix it ...



2003-06-02 16:36:44

by Andreas Hartmann

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

Michael Frank wrote:

> On Sunday 01 June 2003 19:00, Andreas Hartmann wrote:
>>
>> Well, I did the test with 2.4.21rc6 after patching your script (I got
>> syntax errors):
>
> About your script changes, I like to make it portable, and I use the
> following versions:
>
> GNU bash, version 2.05b.0(1)-release (i386-redhat-linux-gnu)

2.02.1(1)-release

>
> dd (coreutils) 4.5.3

dd (fileutils) 4.1

> What shell and coreutils are you using?

They are obviously much older :-).

> Avoiding short counts is easy but avoiding C-style expressions is
> primitive
>
> - count=100K
> + count=100000
>
> - while (( i-- )); do
> + while (( i=`expr $i - 1` )); do
>
> In your opinion are your changes more portable across a wide range of
> systems?

I didn't think about portability :-). I only made it work for me. Maybe
there are other people out there who have old versions too - so
they can use this patch.

>> When I'm using the script as seen in the patch, I'm getting problems with
>> df (it's mostly very lazy, about 20s delay or more), the load is 4, doing
>> an ls on some other directories is extremely slow. Mouse and keyboard are
>> hanging some times.
>> The write speed shown in xosview was between 1 and 15MB/s. Often the HD
>> LED was on, but no data seemed to be put to the HD.
>>
>
> It has a hard time to read anything else, the slower the disk, the worse.
>
> Suppose rmap undoes the fixes introduced in -rc6.
>
> Have you tried -rc6 plain?

Yes - I only tested this kernel.



Regards,
Andreas Hartmann

2003-06-02 19:24:21

by Willy Tarreau

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

Hi !

On Mon, Jun 02, 2003 at 06:55:48PM +0200, Andreas Hartmann wrote:
> Michael Frank wrote:
<...>
> > GNU bash, version 2.05b.0(1)-release (i386-redhat-linux-gnu)
>
> 2.02.1(1)-release
<...>
> > - while (( i-- )); do
> > + while (( i=`expr $i - 1` )); do
> >
> > In your opinion are your changes more portable across a wide range of
> > systems?
>
> I didn't think about portability :-). I only made it work for me. Maybe
> there are other people out there who have old versions too - so
> they can use this patch.

Well, I found that i--/i++ don't work with bash-2.03 (present about everywhere)
but i=i-1 or i=i+1 work well. So at least, for portability, this could be
rewritten as "while (( i=i-1 )); do".
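
A minimal sketch of the two loop forms, for comparison. The function names count_a and count_b are hypothetical and are not part of tstinter. Note that "(( i=i-1 ))" decrements before testing while "(( i-- ))" tests before decrementing, so the rewritten loop runs one iteration fewer and would not stop at zero if started from 0. Variant B is an assumption-level alternative that avoids ++/-- but keeps the original iteration count.

```shell
#!/bin/bash
# Variant A: the form suggested above. The decrement happens before
# the test, so for a start value n the body sees n-1 ... 1.
count_a() {
    local i=$1
    while (( i=i-1 )); do
        echo "$i"
    done
}

# Variant B: test first, then decrement, as "while (( i-- ))" does.
# For a start value n the body sees n-1 ... 0, and a start value of 0
# runs the body zero times.
count_b() {
    local i=$1
    while (( i > 0 )); do
        i=$(( i - 1 ))
        echo "$i"
    done
}
```

For a start value of 4, count_a prints 3, 2, 1 and count_b prints 3, 2, 1, 0.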

Cheers,
Willy

2003-06-03 02:57:30

by Andreas Hartmann

Subject: Re: [PATCH] rmap 15j for 2.4.21-rc6

Hi,

Willy Tarreau wrote:

> Hi !
>
> On Mon, Jun 02, 2003 at 06:55:48PM +0200, Andreas Hartmann wrote:
>> Michael Frank wrote:
> <...>
>> > GNU bash, version 2.05b.0(1)-release (i386-redhat-linux-gnu)
>>
>> 2.02.1(1)-release
> <...>
>> > - while (( i-- )); do
>> > + while (( i=`expr $i - 1` )); do
>> >
>> > In your opinion are your changes more portable across a wide range of
>> > systems?
>>
>> I didn't think about portability :-). I only made it work for me. Maybe
>> there are other people out there who have old versions too - so
>> they can use this patch.
>
> Well, I found that i--/i++ don't work with bash-2.03 (present about
> everywhere) but i=i-1 or i=i+1 work well. So at least, for portability,
> this could be rewritten as "while (( i=i-1 )); do".

Works fine with 2.02.1, too.


Regards,
Andreas Hartmann