2002-01-05 16:09:55

by Christoph Hellwig

Subject: [PATCH] updated version of radix-tree pagecache

I've just uploaded an updated version of Momchil Velikov's patch for a
scalable pagecache using radix trees. The patch can be found at:

ftp://ftp.kernel.org/pub/linux/kernel/people/hch/patches/v2.4/2.4.17/linux-2.4.17-ratpagecache.patch.gz

ftp://ftp.kernel.org/pub/linux/kernel/people/hch/patches/v2.4/2.4.17/linux-2.4.17-ratpagecache.patch.bz2

It contains a number of fixes and improvements by Momchil and me.

The basic advantage over the old version (besides the fixes :)) is that
the radix tree implementation is now independent of struct page /
struct address_space and thus can easily be used in other code.



=== Changelog ===


Momchil Velikov:

- It was possible to return a PG_locked page to the buddy
allocator, with a subsequent oops, if the call to rat_insert in
__add_to_page_cache failed. The function is therefore changed so
that it does not modify the page until rat_insert has
succeeded. Somewhat paranoid, I changed add_page_cache_locked
too.
- shmem_writepage was causing an infinite loop (effectively a
deadlock) when a couple of processes were yielding for kswapd,
_including kswapd itself_.
- Initialized swapper_space. On some architectures the spinlock is
initialized to 0, on some to 1, and who knows, maybe there are/will
be others. I have no idea why this didn't break the test on OSDL's
4- and 8-way boxes.
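
The first fix above comes down to ordering: do the insert that can fail
first, and touch the page only once it has succeeded. A minimal userspace
sketch of that ordering follows. It is not the kernel code: struct
mock_page, mock_insert(), add_to_cache() and PG_LOCKED are hypothetical
stand-ins invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PG_LOCKED 0x1UL	/* stand-in for the kernel's PG_locked bit */

/* Hypothetical, stripped-down page descriptor (not the real struct page). */
struct mock_page {
	unsigned long flags;
	unsigned long index;
	void *mapping;
};

/* Stand-in for rat_insert(): fails on request to simulate -ENOMEM. */
static int mock_insert(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/*
 * Insert first, modify afterwards.  If the insert fails, the page is
 * returned exactly as it came in, so the caller can free it without
 * handing a locked page back to the allocator.
 */
static int add_to_cache(struct mock_page *page, void *mapping,
			unsigned long index, int should_fail)
{
	int err;

	assert(page != NULL);

	err = mock_insert(should_fail);
	if (err)
		return err;	/* page untouched */

	page->flags |= PG_LOCKED;
	page->mapping = mapping;
	page->index = index;
	return 0;
}
```

With should_fail set, add_to_cache() returns -ENOMEM and leaves flags,
mapping and index as they were; otherwise all three are set together.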

Me:

- moved rat.c from mm/ to lib/.
- new structure: rat_root containing root-node, height and gfp_mask.
- changed rat_* arguments to struct rat_root * and void *.
- changed struct page * arguments to void *.
- moved all declarations in rat.h that are not public to rat.c
- replaced page_cache_init() by ratcache_init() in rat.c.
- rat_node slab handling moved to rat.c
- in swap_state.c removed 0/NULL initializers that aren't needed.
- replaced __find_get_page/__find_lock_page with non-prefixed versions.
- added kdoc-style comments to rat.c.
- fixed up whitespace in function declarations to match Linux style.
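
To give a feel for what a tree decoupled from struct page looks like, here
is a rough userspace sketch of a radix tree keyed by unsigned long and
storing void * items, with a rat_root holding root node, height and
gfp_mask as listed above. Everything else is an assumption made for
illustration and may differ from the actual rat.c: the 64-slot fan-out,
the node layout, the exact signatures and return values, and the use of
calloc() in place of the slab allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define RAT_MAP_SHIFT	6			/* assumed fan-out: 64 slots */
#define RAT_MAP_SIZE	(1UL << RAT_MAP_SHIFT)
#define RAT_MAP_MASK	(RAT_MAP_SIZE - 1)

struct rat_node {
	void *slots[RAT_MAP_SIZE];
};

struct rat_root {
	struct rat_node *rnode;	/* root node, NULL while empty */
	unsigned int height;	/* number of levels in the tree */
	int gfp_mask;		/* allocation flags; unused in userspace */
};

/* Largest index a tree of the given height can address. */
static unsigned long rat_maxindex(unsigned int height)
{
	unsigned int bits = height * RAT_MAP_SHIFT;

	if (bits >= sizeof(unsigned long) * 8)
		return ~0UL;
	return (1UL << bits) - 1;
}

/* Add one level on top; the old root becomes slot 0 of the new one. */
static int rat_extend(struct rat_root *root)
{
	struct rat_node *node = calloc(1, sizeof(*node));

	if (!node)
		return -ENOMEM;
	node->slots[0] = root->rnode;
	root->rnode = node;
	root->height++;
	return 0;
}

/* Store item at index.  NULL items are not allowed (NULL means empty). */
int rat_insert(struct rat_root *root, unsigned long index, void *item)
{
	struct rat_node *node;
	unsigned int level;

	assert(item != NULL);

	while (root->height == 0 || index > rat_maxindex(root->height))
		if (rat_extend(root) < 0)
			return -ENOMEM;

	/* Walk down, allocating missing interior nodes on the way. */
	node = root->rnode;
	for (level = root->height; level > 1; level--) {
		unsigned int shift = (level - 1) * RAT_MAP_SHIFT;
		void **slot = &node->slots[(index >> shift) & RAT_MAP_MASK];

		if (!*slot) {
			*slot = calloc(1, sizeof(struct rat_node));
			if (!*slot)
				return -ENOMEM;
		}
		node = *slot;
	}

	if (node->slots[index & RAT_MAP_MASK])
		return -EEXIST;
	node->slots[index & RAT_MAP_MASK] = item;
	return 0;
}

/* Return the item stored at index, or NULL if there is none. */
void *rat_lookup(const struct rat_root *root, unsigned long index)
{
	const struct rat_node *node = root->rnode;
	unsigned int level;

	if (root->height == 0 || index > rat_maxindex(root->height))
		return NULL;

	for (level = root->height; level > 1 && node; level--) {
		unsigned int shift = (level - 1) * RAT_MAP_SHIFT;

		node = node->slots[(index >> shift) & RAT_MAP_MASK];
	}
	return node ? node->slots[index & RAT_MAP_MASK] : NULL;
}
```

A caller starts from a zeroed struct rat_root, inserts any pointer with
rat_insert(&root, index, ptr) and gets it back with rat_lookup(&root,
index). Nothing in the sketch knows about pages or address spaces.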


2002-01-07 10:04:07

by Peter Wächtler

Subject: Re: [PATCH] updated version of radix-tree pagecache

Christoph Hellwig wrote:
>
> [please Cc [email protected] and lkml on reply]
>
> I've just uploaded an updated version of Momchil Velikov's patch for a
> scalable pagecache using radix trees. The patch can be found at:
>
> It contains a number of fixes and improvements by Momchil and me.
>

Can you sum up the advantages of this implementation?
I think it scales better on "big systems" where otherwise you end up with many
pages on the same hash?

Is it beneficial for small systems? (I think not)

2002-01-07 10:08:16

by Momchil Velikov

Subject: Re: [PATCH] updated version of radix-tree pagecache

>>>>> "Peter" == Peter Wächtler <[email protected]> writes:
Peter> Is it beneficial for small systems? (I think not)

Does it hurt performance on small systems? (I think not)


2002-01-07 11:05:00

by William Lee Irwin III

Subject: Re: [PATCH] updated version of radix-tree pagecache

Christoph Hellwig wrote:
>> [please Cc [email protected] and lkml on reply]
>>
>> I've just uploaded an updated version of Momchil Velikov's patch for a
>> scalable pagecache using radix trees. The patch can be found at:
>>
>> It contains a number of fixes and improvements by Momchil and me.

On Mon, Jan 07, 2002 at 11:05:08AM +0100, Peter Wächtler wrote:
> Can you sum up the advantages of this implementation?
> I think it scales better on "big systems" where otherwise you end up
> with many pages on the same hash?
>
> Is it beneficial for small systems? (I think not)

I speculate this would be good for small systems as well as it reduces
the size of struct page by 2*sizeof(unsigned long) bytes, allowing more
incremental allocation of pagecache metadata. I haven't tried it on my
smaller systems yet (due to lack of disk space and needing to build the
cross-toolchains), though I'm now curious as to its exact behavior there.
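
As a rough worked example of that saving: the helper below multiplies the
number of page frames by 2*sizeof(unsigned long). The function name, the
4 KB page size and the word sizes used in the examples are assumptions
for illustration, and the memory taken by the radix tree nodes themselves
is not counted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes saved over all page frames when every struct page shrinks by
 * two unsigned longs.  word_size is sizeof(unsigned long) on the
 * target machine, passed in so other targets can be estimated too.
 */
static unsigned long long struct_page_savings(unsigned long long ram_bytes,
					       unsigned long page_size,
					       size_t word_size)
{
	assert(page_size > 0);
	return (ram_bytes / page_size) * 2ULL * word_size;
}
```

For an 8 MB machine with 4 KB pages and a 4-byte unsigned long this gives
2048 pages * 8 bytes = 16 KB; for 4 GB under the same assumptions it
gives 1048576 pages * 8 bytes = 8 MB.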

Has anyone tried to do accounting on the radix tree metadata overhead yet?

Cheers,
Bill

2002-01-07 12:50:19

by Daniel Phillips

Subject: Re: [PATCH] updated version of radix-tree pagecache

On January 7, 2002 12:03 pm, William Lee Irwin III wrote:
> On Mon, Jan 07, 2002 at 11:05:08AM +0100, Peter Wächtler wrote:
> > Can you sum up the advantages of this implementation?
> > I think it scales better on "big systems" where otherwise you end up
> > with many pages on the same hash?
> >
> > Is it beneficial for small systems? (I think not)
>
> I speculate this would be good for small systems as well as it reduces
> the size of struct page by 2*sizeof(unsigned long) bytes, allowing more
> incremental allocation of pagecache metadata. I haven't tried it on my
> smaller systems yet (due to lack of disk space and needing to build the
> cross-toolchains), though I'm now curious as to its exact behavior there.

Benchmark it on UML. In my experience, performance on UML is quite predictive of
performance on native systems.

--
Daniel

2002-01-07 12:49:35

by Daniel Phillips

Subject: Re: [PATCH] updated version of radix-tree pagecache

On January 7, 2002 12:03 pm, William Lee Irwin III wrote:
> Christoph Hellwig wrote:
> >> [please Cc [email protected] and lkml on reply]
> >>
> >> I've just uploaded an updated version of Momchil Velikov's patch for a
> >> scalable pagecache using radix trees. The patch can be found at:
> >>
> >> It contains a number of fixes and improvements by Momchil and me.

2002-01-18 16:49:21

by Pavel Machek

Subject: Re: [PATCH] updated version of radix-tree pagecache

Hi!

> I speculate this would be good for small systems as well as it reduces
> the size of struct page by 2*sizeof(unsigned long) bytes, allowing more
> incremental allocation of pagecache metadata. I haven't tried it on my
> smaller systems yet (due to lack of disk space and needing to build the
> cross-toolchains), though I'm now curious as to its exact behavior there.

Why not mem=8M, nosmp on your "big" system?
Pavel

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.