2005-10-10 13:04:55

by Wu Fengguang

[permalink] [raw]
Subject: [RFC] use radix_tree for non-resident page tracking

Hi Rik,
The CLOCK-Pro page replacement is quite appealing, and I'd like to
contribute an idea: How about store bookkeeping info of dropped pages
in-place in radix_tree?

The slots in radix_tree_node can be used for bookkeeping data when
the corresponding pages are dropped. When all pages in a radix_tree_node
have been dropped, it is registered in an array/list for delayed
reclaim.

It would be fast and simple:
- no cache-line pollution
- no extra lock (with Nick Piggin's great RCU improvement)
- ready to use lookup code

The memory footprint should be roughly the same, though with a whole
word of space for each page ;)

It can not cover the swap space, which should be handled differently
anyway, I guess.

Regards,
Wu Fengguang


2005-10-10 13:54:31

by Nick Piggin

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

WU Fengguang wrote:
> Hi Rik,
> The CLOCK-Pro page replacement is quite appealing, and I'd like to
> contribute an idea: How about store bookkeeping info of dropped pages
> in-place in radix_tree?
>
> The slots in radix_tree_node can be used for bookkeeping data when
> the corresponding pages are dropped. When all pages in a radix_tree_node
> have been dropped, it is registered in an array/list for delayed
> reclaim.
>
> It would be fast and simple:
> - no cache-line pollution
> - no extra lock (with Nick Piggin's great RCU improvement)

Just a note if you are looking into this - the last version of the
radix-tree lockless readside patches I made public IIRC had some
bugs in them which I have since fixed.

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-10-10 14:00:36

by Rik van Riel

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Mon, 10 Oct 2005, WU Fengguang wrote:

> The CLOCK-Pro page replacement is quite appealing, and I'd like to
> contribute an idea: How about store bookkeeping info of dropped pages
> in-place in radix_tree?

How are you going to get the inter-reference distance
this way?

I do not see how the radix tree provides you with the
refault distance, which is needed to estimate the
inter-reference distance.

--
All Rights Reversed

2005-10-10 15:18:51

by Wu Fengguang

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Mon, Oct 10, 2005 at 09:37:47PM +0800, Yingchao Zhou wrote:
> The paper "The Performance Impact of Kernel Prefetching On Buffer
> Cache Replacement Algorithms" appeared in SIGMETRICS05 gives some
> interesting relative results of replacement algorithm when taking
Thanks for the information, it should help my work on read-ahead.
I noticed similar problem when testing read-ahead behavior for fs-on-loop.
The basic conclusion is read-ahead in the block device level can
complement file level read-ahead, for the latter does not do inter-file
read-ahead. But it has to be limited/conservative, because the risk of
cache miss is much higher.
> into account of prefetching. How about the situation in CLOCK-Pro
> project?
Sorry, I'm new to CLOCK-Pro, too...

--
WU Fengguang
Dept. of Automation
University of Science and Technology of China
Hefei, Anhui

2005-10-10 16:16:39

by Wu Fengguang

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Mon, Oct 10, 2005 at 10:00:30AM -0400, Rik van Riel wrote:
> How are you going to get the inter-reference distance
> this way?
>
> I do not see how the radix tree provides you with the
> refault distance, which is needed to estimate the
> inter-reference distance.
How about taking down the current sum of `pgfree' in the slot?

--
WU Fengguang
Dept. of Automation
University of Science and Technology of China
Hefei, Anhui

2005-10-10 17:13:47

by Wu Fengguang

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Mon, Oct 10, 2005 at 12:37:48PM -0400, Rik van Riel wrote:
> > > How are you going to get the inter-reference distance
> > > this way?
> >
> > How about taking down the current sum of `pgfree' in the slot?
>
> Sounds like a good idea to me. Definately worth a try!
Thanks.
I now realized that counters in struct zone should be better candidates.

And the slots may be used to calculate the real inter-reference distances
for dropped pages. For the present pages, we can either add entries to each
page, or to radix_tree_node. I suspect the latter is enough because it should
work well with sequential reads and sequential reads are the majority of page
activities and the main source of flushing.

--
WU Fengguang
Dept. of Automation
University of Science and Technology of China
Hefei, Anhui

2005-10-10 17:18:43

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking\

On Tue, Oct 11, 2005 at 12:20:04AM +0800, WU Fengguang wrote:
> On Mon, Oct 10, 2005 at 10:00:30AM -0400, Rik van Riel wrote:
> > How are you going to get the inter-reference distance
> > this way?
> >
> > I do not see how the radix tree provides you with the
> > refault distance, which is needed to estimate the
> > inter-reference distance.
> How about taking down the current sum of `pgfree' in the slot?

Check this radix implementation
http://marc.theaimsgroup.com/?l=linux-mm&m=112387857203221&w=2

Using a hashtable is much faster though, needs measurement.

2005-10-10 17:19:39

by Rik van Riel

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Tue, 11 Oct 2005, WU Fengguang wrote:

> I now realized that counters in struct zone should be better candidates.

Probably not, since a logical page could be swapped out from one
memory zone and paged back in to another.

--
All Rights Reversed

2005-10-11 03:14:04

by Wu Fengguang

[permalink] [raw]
Subject: Re: [RFC] use radix_tree for non-resident page tracking

On Mon, Oct 10, 2005 at 01:19:34PM -0400, Rik van Riel wrote:
> On Tue, 11 Oct 2005, WU Fengguang wrote:
>
> > I now realized that counters in struct zone should be better candidates.
>
> Probably not, since a logical page could be swapped out from one
> memory zone and paged back in to another.
I managed to keep the page aging of each zone balanced in the patch:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112856866504476&w=2
Zones in nearby nodes are also made balanced somewhat.

The ages are currently measured by accumulated number of pages pushed into lru.
I wonder that has the side effect of making other activities of each zone
roughly balanced, too. It needs some testing. If so, we can just sum up counters
of each zone in the current node.

--
WU Fengguang
Dept. of Automation
University of Science and Technology of China
Hefei, Anhui