2002-01-23 17:15:50

by Rik van Riel

Subject: [PATCH *] rmap VM, version 12

The first release of the 12th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

http://surriel.com/patches/2.4/2.4.17-rmap-12
and http://linuxvm.bkbits.net/

My big TODO items for a next release are:
- RSS ulimit enforcement
- auto-tuning readahead, readahead per VMA

rmap 12:
- keep some extra free memory on large machines (Arjan van de Ven, me)
- higher-order allocation bugfix (Adrian Drzewiecki)
- nr_free_buffer_pages() returns inactive + free mem (me)
- pages from unused objects directly to inactive_clean (me)
- use fast pte quicklists on non-pae machines (Andrea Arcangeli)
- remove sleep_on from wakeup_kswapd (Arjan van de Ven)
- page waitqueue cleanup (Christoph Hellwig)
rmap 11c:
- oom_kill race locking fix (Andres Salomon)
- elevator improvement (Andrew Morton)
- dirty buffer writeout speedup (hopefully ;)) (me)
- small documentation updates (me)
- page_launder() never does synchronous IO, kswapd
  and the processes calling it sleep on higher level (me)
- deadlock fix in touch_page() (me)
rmap 11b:
- added low latency reschedule points in vmscan.c (me)
- make i810_dma.c include mm_inline.h too (William Lee Irwin)
- wake up kswapd sleeper tasks on OOM kill so the
  killed task can continue on its way out (me)
- tune page allocation sleep point a little (me)
rmap 11a:
- don't let refill_inactive() progress count for OOM (me)
- after an OOM kill, wait 5 seconds for the next kill (me)
- agpgart_be fix for hashed waitqueues (William Lee Irwin)
rmap 11:
- fix stupid logic inversion bug in wakeup_kswapd() (Andrew Morton)
- fix it again in the morning (me)
- add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
  seems PPC calls pte_alloc() before mem_map[] init (me)
- disable the debugging code in rmap.c ... the code
  is working and people are running benchmarks (me)
- let the slab cache shrink functions return a value
  to help prevent early OOM killing (Ed Tomlinson)
- also, don't call the OOM code if we have enough
  free pages (me)
- move the call to lru_cache_del into __free_pages_ok (Ben LaHaise)
- replace the per-page waitqueue with a hashed
  waitqueue, reduces size of struct page from 64
  bytes to 52 bytes (48 bytes on non-highmem machines) (William Lee Irwin)
rmap 10:
- fix the livelock for real (yeah right), turned out
  to be a stupid bug in page_launder_zone() (me)
- to make sure the VM subsystem doesn't monopolise
  the CPU, let kswapd and some apps sleep a bit under
  heavy stress situations (me)
- let __GFP_HIGH allocations dig a little bit deeper
  into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
- improve comments all over the place (Michael Cohen)
- don't panic if page_remove_rmap() cannot find the
  rmap in question, it's possible that the memory was
  PG_reserved and belonging to a driver, but the driver
  exited and cleared the PG_reserved bit (me)
- fix the VM livelock by replacing > by >= in a few
  critical places in the pageout code (me)
- treat the reclaiming of an inactive_clean page like
  allocating a new page, calling try_to_free_pages()
  and/or fixup_freespace() if required (me)
- when low on memory, don't make things worse by
  doing swapin_readahead (me)
rmap 8:
- add ANY_ZONE to the balancing functions to improve
  kswapd's balancing a bit (me)
- regularize some of the maximum loop bounds in
  vmscan.c for cosmetic purposes (William Lee Irwin)
- move page_address() to architecture-independent
  code, now the removal of page->virtual is portable (William Lee Irwin)
- speed up free_area_init_core() by doing a single
  pass over the pages and not using atomic ops (William Lee Irwin)
- documented the buddy allocator in page_alloc.c (William Lee Irwin)
rmap 7:
- clean up and document vmscan.c (me)
- reduce size of page struct, part one (William Lee Irwin)
- add rmap.h for other archs (untested, not for ARM) (me)
rmap 6:
- make the active and inactive_dirty list per zone,
  this is finally possible because we can free pages
  based on their physical address (William Lee Irwin)
- cleaned up William's code a bit (me)
- turn some defines into inlines and move those to
  mm_inline.h (the includes are a mess ...) (me)
- improve the VM balancing a bit (me)
- add back inactive_target to /proc/meminfo (me)
rmap 5:
- fixed recursive buglet, introduced by directly
  editing the patch for making rmap 4 ;))) (me)
rmap 4:
- look at the referenced bits in page tables (me)
rmap 3:
- forgot one FASTCALL definition (me)
rmap 2:
- teach try_to_unmap_one() about mremap() (me)
- don't assign swap space to pages with buffers (me)
- make the rmap.c functions FASTCALL / inline (me)
rmap 1:
- fix the swap leak in rmap 0 (Dave McCracken)
rmap 0:
- port of reverse mapping VM to 2.4.16 (me)
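
The rmap 11 entry about replacing the per-page waitqueue with a hashed
waitqueue can be illustrated with a small userspace sketch. The table size,
the hash function and every name ending in _sketch are assumptions made for
illustration, not code taken from the patch; the only point is that pages
share a fixed table of queues instead of each embedding one, which is what
lets struct page shrink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; the real patch chooses its own. */
#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

struct wait_queue_head_sketch {
    int nr_waiters;             /* stand-in for a real list of sleepers */
};

struct page_sketch {
    unsigned long flags;        /* note: no embedded waitqueue */
};

static struct wait_queue_head_sketch wait_table[WAIT_TABLE_SIZE];

/* Map a page to one of the shared queues by hashing its address. */
static struct wait_queue_head_sketch *
page_waitqueue_sketch(const struct page_sketch *page)
{
    uintptr_t v = (uintptr_t)page / sizeof(struct page_sketch);
    uint32_t h = (uint32_t)v * 2654435761u;     /* multiplicative hash */
    size_t idx = h >> (32 - WAIT_TABLE_BITS);

    assert(idx < WAIT_TABLE_SIZE);
    return &wait_table[idx];
}
```

Because several pages can land on the same queue, a woken sleeper in such a
scheme has to recheck that its own page is really the one that became
available before proceeding.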

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/


2002-01-23 18:46:44

by David Miller

Subject: Re: [PATCH *] rmap VM, version 12

   From: Rik van Riel <[email protected]>
   Date: Wed, 23 Jan 2002 15:14:42 -0200 (BRST)

   - use fast pte quicklists on non-pae machines (Andrea Arcangeli)

Does this work on SMP? I remember they were turned off because
they were simply broken on SMP.

The problem is that when vmalloc() or whatever kernel mappings change
you have to update all the quicklist page tables to match.

Andrea probably fixed this, I haven't looked at the patch.
If so, ignore me.

2002-01-23 18:58:44

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, David S. Miller wrote:

> From: Rik van Riel <[email protected]>
> Date: Wed, 23 Jan 2002 15:14:42 -0200 (BRST)
>
> - use fast pte quicklists on non-pae machines (Andrea Arcangeli)
>
> Does this work on SMP? I remember they were turned off because
> they were simply broken on SMP.
>
> The problem is that when vmalloc() or whatever kernel mappings change
> you have to update all the quicklist page tables to match.

Actually, this is just using the pte_free_fast() and
{get,free}_pgd_fast() functions on non-pae machines.

I think this should be safe, unless there is a way
we could pagefault from inside interrupts (but I don't
think we do that).

OTOH, the -preempt people will want to add preemption
protection from the fiddling with the local pte freelist ;)
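
For readers unfamiliar with the quicklists, here is a rough userspace sketch
of the idea: freed page-table pages are pushed onto a per-CPU free list and
handed back on the next allocation without a trip to the page allocator.
Everything with a _sketch or _SKETCH suffix, the fixed CPU count and the
calloc()-backed slow path are assumptions for illustration; the real helpers
live in include/asm/pgalloc.h and differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH   4
#define PAGE_SIZE_SKETCH 4096

/* One free list per CPU; the first word of a free page links to the next. */
struct quicklist_sketch {
    void *head;
    unsigned long count;
};

static struct quicklist_sketch pte_quicklist[NR_CPUS_SKETCH];

/* Slow path: get a zeroed page from the general allocator. */
static void *pte_alloc_slow_sketch(void)
{
    return calloc(1, PAGE_SIZE_SKETCH);
}

/* Fast path: pop a page from this CPU's list, or return NULL if it is empty. */
static void *pte_alloc_fast_sketch(int cpu)
{
    struct quicklist_sketch *ql;
    void *page;

    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    ql = &pte_quicklist[cpu];
    page = ql->head;
    if (page) {
        ql->head = *(void **)page;  /* unlink from the list */
        *(void **)page = NULL;      /* page is all zeroes again */
        ql->count--;
    }
    return page;
}

/* Free: push an already emptied page-table page onto this CPU's list. */
static void pte_free_fast_sketch(int cpu, void *page)
{
    struct quicklist_sketch *ql;

    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    ql = &pte_quicklist[cpu];
    *(void **)page = ql->head;
    ql->head = page;
    ql->count++;
}
```

The push and pop take no lock because each list is meant to be touched only
by code running on its own CPU. That is why being interrupted or preempted in
the middle of one of these operations is the concern raised here.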

> Andrea probably fixed this, I haven't looked at the patch.
> If so, ignore me.

He doesn't seem to fix anything other than just switching
on these options, but I guess this is safe since it's with
the 00_ series of patches in -aa.

(I don't have good experiences with 20_highmem-debug-8,
with that patch in the system plain doesn't boot ;))

regards,

Rik

2002-01-23 19:02:14

by Badari Pulavarty

Subject: Re: [PATCH *] rmap VM, version 12


Does this explain why my SMP box does not boot with rmap12? It works fine
with rmap11c.

Machine: 4x 500MHz Pentium Pro with 3GB RAM

When I tried to boot 2.4.17+rmap12, the last message I see is

uncompressing linux ...
booting ..


Thanks,
Badari

2002-01-23 19:05:54

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, Badari Pulavarty wrote:

> Does this explain why my SMP box does not boot with rmap12 ? It works fine
> with rmap11c.
>
> Machine: 4x 500MHz Pentium Pro with 3GB RAM
>
> When I tried to boot 2.4.17+rmap12, last message I see is
>
> uncompressing linux ...
> booting ..

At this point we're not even near using pagetables yet,
so I guess this is something else ...

(I'm not 100% sure, though)

kind regards,

Rik

2002-01-23 19:08:04

by David Miller

Subject: Re: [PATCH *] rmap VM, version 12

   From: Rik van Riel <[email protected]>
   Date: Wed, 23 Jan 2002 16:57:58 -0200 (BRST)

   On Wed, 23 Jan 2002, David S. Miller wrote:

   > The problem is that when vmalloc() or whatever kernel mappings change
   > you have to update all the quicklist page tables to match.

   Actually, this is just using the pte_free_fast() and
   {get,free}_pgd_fast() functions on non-pae machines.

Rofl, you can't just do that. The page tables cache caches the kernel
mappings and if you don't update them properly on SMP you die.

I am seeing reports of SMP failing with rmap12 but not previous
patches. You need to revert this I think.

2002-01-23 19:11:04

by Robert Love

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 2002-01-23 at 13:57, Rik van Riel wrote:

> Actually, this is just using the pte_free_fast() and
> {get,free}_pgd_fast() functions on non-pae machines.
>
> I think this should be safe, unless there is a way
> we could pagefault from inside interrupts (but I don't
> think we do that).
>
> OTOH, the -preempt people will want to add preemption
> protection from the fiddling with the local pte freelist ;)

If you are using the stock mechanisms in include/asm/pgalloc.h they are
already made preempt-safe by the patch. ;)

Robert Love

2002-01-23 19:11:04

by Badari Pulavarty

Subject: Re: [PATCH *] rmap VM, version 12


Rik,

I just tried to boot 2.4.17+rmap12 with HIGHMEM turned off and it booted
just fine. So it has to do with some HIGHMEM change that happened between
rmap11c and rmap12.

Does this help?

Thanks,
Badari

2002-01-23 19:13:16

by Andi Kleen

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, Jan 23, 2002 at 05:05:13PM -0200, Rik van Riel wrote:
> > uncompressing linux ...
> > booting ..
>
> At this point we're not even near using pagetables yet,
> so I guess this is something else ...
>
> (I'm not 100% sure, though)

It happens when you crash before console initialization. VM is already
low level initialized there, but other CPUs should not have been booted yet.

Usual way to debug is to link with one of the patches that replace printk
with an "early_printk" that writes directly into the vga text buffer and
works without the console subsystem.
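
A minimal sketch of what such an "early_printk" boils down to, written so it
runs in userspace: the VGA text buffer is 80x25 cells of 16 bits each, with
the character in the low byte and the attribute in the high byte. In a kernel
the buffer would be the memory at physical address 0xB8000; here it is an
ordinary array, and all the _sketch names are hypothetical rather than taken
from any of the patches mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25
#define VGA_ATTR 0x07           /* light grey on black */

/* Stand-in for the text buffer at physical 0xB8000. */
static uint16_t vga_sketch[VGA_ROWS * VGA_COLS];
static size_t cursor_sketch;

/* Write one character straight into the text buffer. */
static void early_putc_sketch(char c)
{
    assert(cursor_sketch < VGA_ROWS * VGA_COLS);
    if (c == '\n') {
        /* jump to the start of the next row */
        cursor_sketch += VGA_COLS - (cursor_sketch % VGA_COLS);
    } else {
        vga_sketch[cursor_sketch++] =
            (uint16_t)((VGA_ATTR << 8) | (unsigned char)c);
    }
    if (cursor_sketch >= VGA_ROWS * VGA_COLS)
        cursor_sketch = 0;      /* wrap to the top instead of scrolling */
}

/* Write a NUL-terminated string, no formatting. */
static void early_puts_sketch(const char *s)
{
    while (*s)
        early_putc_sketch(*s++);
}
```

Nothing here depends on the console layer, so output of this kind can appear
even when the crash happens before console initialization.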

-andi

2002-01-23 19:21:24

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, Badari Pulavarty wrote:

> I just tried to boot 2.4.17+rmap12 with HIGHMEM turned off and it booted
> just fine. So it has to do with some HIGHMEM change that happened between
> rmap11c and rmap12.
>
> Does this help?

Yes. Time for a very very big DOH, the kind of
DOH that would make Homer Simpson blush ...

I think you're seeing a divide by zero on line
947 of page_alloc.c ... which also explains why
the highmem emulation patch wasn't a big success
here. ;)

I'll release an rmap-12a within the hour.

regards,

Rik

2002-01-23 19:23:35

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, David S. Miller wrote:

> Actually, this is just using the pte_free_fast() and
> {get,free}_pgd_fast() functions on non-pae machines.
>
> Rofl, you can't just do that. The page tables cache caches the kernel
> mappings and if you don't update them properly on SMP you die.

Umm, this list just contains _freed_ page tables without
any mappings, right ?

If there is some specific magic I'm missing, could you
please point me to the code I'm overlooking ? ;)

> I am seeing reports of SMP failing with rmap12 but not previous
> patches. You need to revert this I think.

Actually, the cause for Badari's bugreport is much more
stupid. If it wasn't so stupid I bet I'd have found it
earlier...

regards,

Rik


2002-01-23 19:30:14

by David Miller

Subject: Re: [PATCH *] rmap VM, version 12

   From: Rik van Riel <[email protected]>
   Date: Wed, 23 Jan 2002 17:22:30 -0200 (BRST)

   On Wed, 23 Jan 2002, David S. Miller wrote:

   > Actually, this is just using the pte_free_fast() and
   > {get,free}_pgd_fast() functions on non-pae machines.
   >
   > Rofl, you can't just do that. The page tables cache caches the kernel
   > mappings and if you don't update them properly on SMP you die.

   Umm, this list just contains _freed_ page tables without
   any mappings, right ?

No.

   If there is some specific magic I'm missing, could you
   please point me to the code I'm overlooking ? ;)

Look at what get_pgd_slow() in pgalloc.h does, this is the
case where it isn't going to the cache and it is really allocating the
memory.

When the pgd comes fresh off the cache chain, it doesn't do any
of this stuff, it just gives you the cached PGD with all the PMD's
filled in already, including the kernel PMDs.

Hmmm... maybe the "we can fault on kernel mappings" thing takes
care of this because kernel PMDs can only appear, not go away.
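
The difference between the two paths can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the pgalloc.h code: the
names carry a _sketch suffix, the quicklist is a single global list instead
of per-CPU, and the 768/1024 split assumes a non-PAE i386 layout with 3GB of
user space.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_PGD_SKETCH      1024
#define USER_PTRS_PER_PGD_SKETCH 768    /* assumed 3GB/1GB split */

typedef uintptr_t pgd_entry_sketch;

/* Stand-in for the master kernel page directory (swapper_pg_dir). */
static pgd_entry_sketch swapper_pg_dir_sketch[PTRS_PER_PGD_SKETCH];

/* Cache of freed pgds, linked through entry 0. */
static pgd_entry_sketch *pgd_quicklist_sketch;

/* Slow path: new pgd, user part cleared, kernel part copied from the master. */
static pgd_entry_sketch *get_pgd_slow_sketch(void)
{
    pgd_entry_sketch *pgd = malloc(sizeof(swapper_pg_dir_sketch));

    if (!pgd)
        return NULL;
    memset(pgd, 0, USER_PTRS_PER_PGD_SKETCH * sizeof(*pgd));
    memcpy(pgd + USER_PTRS_PER_PGD_SKETCH,
           swapper_pg_dir_sketch + USER_PTRS_PER_PGD_SKETCH,
           (PTRS_PER_PGD_SKETCH - USER_PTRS_PER_PGD_SKETCH) * sizeof(*pgd));
    return pgd;
}

/* Fast path: reuse a cached pgd as-is; its kernel entries are not refreshed. */
static pgd_entry_sketch *get_pgd_fast_sketch(void)
{
    pgd_entry_sketch *pgd = pgd_quicklist_sketch;

    if (!pgd)
        return get_pgd_slow_sketch();
    pgd_quicklist_sketch = (pgd_entry_sketch *)pgd[0];
    pgd[0] = 0;
    return pgd;
}

/* Free: push the pgd onto the cache, kernel entries left in place. */
static void free_pgd_fast_sketch(pgd_entry_sketch *pgd)
{
    assert(pgd != NULL);
    pgd[0] = (pgd_entry_sketch)pgd_quicklist_sketch;
    pgd_quicklist_sketch = pgd;
}
```

In this model a pgd that sat in the cache while a new kernel entry was added
to the master directory comes back without that entry, which is the case the
question about faulting on kernel mappings is concerned with.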

2002-01-23 19:37:15

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, David S. Miller wrote:

> If there is some specific magic I'm missing, could you
> please point me to the code I'm overlooking ? ;)
>
> Look at what get_pgd_slow() in pgalloc.h does, this is the
> case where it isn't going to the cache and it is really allocating the
> memory.

> Hmmm... maybe the "we can fault on kernel mappings" thing takes
> care of this because kernel PMDs can only appear, not go away.

OK, so only the _pgd_ quicklist is questionable and the
_pte_ quicklist is fine ?

regards,

Rik

2002-01-23 20:20:44

by David Miller

Subject: Re: [PATCH *] rmap VM, version 12

   From: Rik van Riel <[email protected]>
   Date: Wed, 23 Jan 2002 17:36:35 -0200 (BRST)

   OK, so only the _pgd_ quicklist is questionable and the
   _pte_ quicklist is fine ?

That is my understanding.

2002-01-23 22:13:18

by Rik van Riel

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, 23 Jan 2002, David S. Miller wrote:
> From: Rik van Riel <[email protected]>
> Date: Wed, 23 Jan 2002 17:36:35 -0200 (BRST)
>
> OK, so only the _pgd_ quicklist is questionable and the
> _pte_ quicklist is fine ?
>
> That is my understanding.

OK, then I'll disable the quick pgd list for now.
Considering the fact that the number of pgds is
small anyway it's probably not too much of a benefit
either.

The pte quicklist will stay, however. ;)

regards,

Rik

2002-01-24 03:46:36

by Andrea Arcangeli

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, Jan 23, 2002 at 11:06:24AM -0800, David S. Miller wrote:
> On Wed, 23 Jan 2002, David S. Miller wrote:
>
> > The problem is that when vmalloc() or whatever kernel mappings change
> > you have to update all the quicklist page tables to match.
>
> Actually, this is just using the pte_free_fast() and
> {get,free}_pgd_fast() functions on non-pae machines.
>
> Rofl, you can't just do that. The page tables cache caches the kernel
> mappings and if you don't update them properly on SMP you die.

The cache we're talking about here cannot cache anything; whatever is in
this cache must contain no information at all, otherwise the kernel
would crash immediately anyway. The code was disabled for no good reason
and there was nothing to fix there.

Andrea

2002-01-24 03:50:14

by Andrea Arcangeli

Subject: Re: [PATCH *] rmap VM, version 12

On Wed, Jan 23, 2002 at 12:18:57PM -0800, David S. Miller wrote:
>
> OK, so only the _pgd_ quicklist is questionable and the
> _pte_ quicklist is fine ?
>
> That is my understanding.

The pgd cache is fine too: the page fault will update the pgd from
swapper_pg_dir if needed. The swapper_pg_dir will only ever gain new
pmds, it will never deallocate them (vfree only invalidates the ptes
and frees the pages), so it's safe. If vfree deallocated them, a simple
context switch would break, regardless of the pgd cache.
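
A minimal sketch of that fault-time update, under the same kind of
assumptions as any userspace model: the function name, the index-based
interface and the 768/1024 split (non-PAE i386, 3GB of user space) are
hypothetical, and real fault handling does considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD_SKETCH      1024
#define USER_PTRS_PER_PGD_SKETCH 768    /* assumed 3GB/1GB split */

typedef uintptr_t pgd_entry_sketch;

/*
 * Fault on a kernel address: if the master directory has an entry that
 * this task's pgd lacks, copy it over so the access can be retried.
 * Returns false when the master has nothing there either, i.e. the
 * access is a genuine error.
 */
static bool vmalloc_fault_sketch(pgd_entry_sketch *task_pgd,
                                 const pgd_entry_sketch *master_pgd,
                                 size_t index)
{
    assert(index < PTRS_PER_PGD_SKETCH);
    if (index < USER_PTRS_PER_PGD_SKETCH)
        return false;           /* not a kernel address */
    if (master_pgd[index] == 0)
        return false;           /* nothing mapped in the master */
    task_pgd[index] = master_pgd[index];
    return true;
}
```

Since entries are only ever added to the master directory, a stale pgd can
only be missing entries; it never points at a pmd that has been freed.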

Andrea