2006-08-01 12:10:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

Jeremy Fitzhardinge <[email protected]> writes:

> 2 files changed, 99 insertions(+)
> include/linux/mm.h | 5 ++
> mm/memory.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> Add a new mm function apply_to_page_range() which applies a given
> function to every pte in a given virtual address range in a given mm
> structure. This is a generic alternative to cut-and-pasting the Linux
> idiomatic pagetable walking code in every place that a sequence of
> PTEs must be accessed.
>
> Although this interface is intended to be useful in a wide range of
> situations, it is currently used specifically by several Xen
> subsystems, for example: to ensure that pagetables have been allocated
> for a virtual address range, and to construct batched special
> pagetable update requests to map I/O memory (in ioremap()).

- You don't handle huge pages. For a generic function
that sounds like a problem.
- I believe there is a reason the kernel doesn't already have
a function like this. I seem to recall there being efficiency
and fast path arguments.
- Placing this code in mm/memory.c without a common consumer is
pure kernel bloat for everyone who doesn't use this function,
which is just about everyone.

Eric


2006-08-01 12:34:19

by Antonio Vargas

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On 8/1/06, Eric W. Biederman <[email protected]> wrote:
> Jeremy Fitzhardinge <[email protected]> writes:
>
> > 2 files changed, 99 insertions(+)
> > include/linux/mm.h | 5 ++
> > mm/memory.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> > Add a new mm function apply_to_page_range() which applies a given
> > function to every pte in a given virtual address range in a given mm
> > structure. This is a generic alternative to cut-and-pasting the Linux
> > idiomatic pagetable walking code in every place that a sequence of
> > PTEs must be accessed.
> >
> > Although this interface is intended to be useful in a wide range of
> > situations, it is currently used specifically by several Xen
> > subsystems, for example: to ensure that pagetables have been allocated
> > for a virtual address range, and to construct batched special
> > pagetable update requests to map I/O memory (in ioremap()).
>
> - You don't handle huge pages. For a generic function
> that sounds like a problem.
> - I believe there is a reason the kernel doesn't already have
> a function like this. I seem to recall there being efficiency
> and fast path arguments.

The proper trick for this is:

1. place you "for each page" code in a #define like so:

#define FOR_EACH_PAGE_INNER do{ ... code ... }while(0);

2. create your function in a separate .h file without the double-include guard

3. inside this code, exchange the indirect function call with your define name:

(*fn)(args); --> FOR_EACH_PAGE_INNER

4. document how the macro will receive certain variables from it's
outer scope, and should leave the "function result" in another one.

this in effect creates a different copy of the page walker for each
function, and inlines your code in it.. just like it would do with a
C++ template.

A place where you can see this technique working is the software
triangle filler from MESA.

The doubt is... is this acceptable regarding linux-kernel coding-style?


> - Placing this code in mm/memory.c without a common consumer is
> pure kernel bloat for everyone who doesn't use this function,
> which is just about everyone.
>


--
Greetz, Antonio Vargas aka winden of network

http://network.amigascne.org/
[email protected]
[email protected]

Every day, every year
you have to work
you have to study
you have to scene.

2006-08-01 21:13:31

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

* Eric W. Biederman ([email protected]) wrote:
> - You don't handle huge pages. For a generic function
> that sounds like a problem.
> - I believe there is a reason the kernel doesn't already have
> a function like this. I seem to recall there being efficiency
> and fast path arguments.
> - Placing this code in mm/memory.c without a common consumer is
> pure kernel bloat for everyone who doesn't use this function,
> which is just about everyone.

We got the opposite feedback the first time we posted this function.
Xen has some users, and I believe there's a couple in-tree functions we could
convert easily w/out overhead issues. It's generic and this is just the
infrastructure, I think we should leave it.

thanks,
-chris

2006-08-01 21:32:18

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On Tue, 1 Aug 2006, Chris Wright wrote:

> We got the opposite feedback the first time we posted this function.
> Xen has some users, and I believe there's a couple in-tree functions we could
> convert easily w/out overhead issues. It's generic and this is just the
> infrastructure, I think we should leave it.

Th generic method was proposed a number of times in the past including
by Nick Piggin and more recently by the page table abstraction layer
posted by Ian Wienand. See also

http://www.gelato.org/pdf/apr2006/gelato_ICE06apr_unsw.pdf
http://www.gelato.org/pdf/may2005/gelato_may2005_ia64vm_chubb_unsw.pdf.
http://lwn.net/Articles/124961/

Special functionality may be attached at various levels, and we are very
sensitive to changes in this area.

Would you please research this issue thoroughly and coordinate with others
who have the same interest?

2006-08-02 04:16:55

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On Tue, 2006-08-01 at 14:31 -0700, Christoph Lameter wrote:
> On Tue, 1 Aug 2006, Chris Wright wrote:
>
> > We got the opposite feedback the first time we posted this function.
> > Xen has some users, and I believe there's a couple in-tree functions we could
> > convert easily w/out overhead issues. It's generic and this is just the
> > infrastructure, I think we should leave it.
>
> Th generic method was proposed a number of times in the past including
> by Nick Piggin and more recently by the page table abstraction layer
> posted by Ian Wienand. See also
>
> http://www.gelato.org/pdf/apr2006/gelato_ICE06apr_unsw.pdf
> http://www.gelato.org/pdf/may2005/gelato_may2005_ia64vm_chubb_unsw.pdf.
> http://lwn.net/Articles/124961/
>
> Special functionality may be attached at various levels, and we are very
> sensitive to changes in this area.

Hi Christoph,

Thanks for the pointers, but as you've been debating for 18 months now,
no patches are in the -mm tree or obviously about to go in, and this new
helper function is orthogonal to your work, I don't think it's
reasonable to delay this patch.

All we can reasonably do is push this function back into the xen part
of the kernel tree for now.

Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-02 05:19:25

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On Wed, 2 Aug 2006, Rusty Russell wrote:

> Thanks for the pointers, but as you've been debating for 18 months now,
> no patches are in the -mm tree or obviously about to go in, and this new
> helper function is orthogonal to your work, I don't think it's
> reasonable to delay this patch.

I have not been involved in this issue for a long time now.
You need to contact the people actively working on code like this.
Most important is likely Ian Wienand.

2006-08-02 06:45:13

by Ian Wienand

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On Tue, Aug 01, 2006 at 10:18:33PM -0700, Christoph Lameter wrote:
> I have not been involved in this issue for a long time now.
> You need to contact the people actively working on code like this.
> Most important is likely Ian Wienand.

Paul Davies <[email protected]> is the person actively working
on this project. I might note he has not been doing it un-announced;
see

http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2

for the latest patches, or some of the other links Cristoph pointed
out. I'm sure he'd love to talk to anyone about it :)

-i

2006-08-02 06:53:17

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

* Ian Wienand ([email protected]) wrote:
> On Tue, Aug 01, 2006 at 10:18:33PM -0700, Christoph Lameter wrote:
> > I have not been involved in this issue for a long time now.
> > You need to contact the people actively working on code like this.
> > Most important is likely Ian Wienand.
>
> Paul Davies <[email protected]> is the person actively working
> on this project. I might note he has not been doing it un-announced;
> see
>
> http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2
>
> for the latest patches, or some of the other links Cristoph pointed
> out. I'm sure he'd love to talk to anyone about it :)

Well that looks closer to the iterator here than some of the eariler
links. The apply_to_page_range is pretty trivial, will have to look at
Paul's patches to see if there's something we can use. This is just for
Xen's use ATM, so we can always revert to keeping it Xen local if Paul's
changes are heading upstream, and use them once they're in.

thanks,
-chris

2006-08-02 13:18:16

by Paul Cameron Davies

[permalink] [raw]
Subject: Re: [PATCH 1 of 13] Add apply_to_page_range() which applies a function to a pte range

On Tue, 1 Aug 2006, Chris Wright wrote:

> * Ian Wienand ([email protected]) wrote:
>> On Tue, Aug 01, 2006 at 10:18:33PM -0700, Christoph Lameter wrote:
>>> I have not been involved in this issue for a long time now.
>>> You need to contact the people actively working on code like this.
>>> Most important is likely Ian Wienand.
>>
>> Paul Davies <[email protected]> is the person actively working
>> on this project. I might note he has not been doing it un-announced;
>> see
>>
>> http://marc.theaimsgroup.com/?l=linux-mm&m=115276500100695&w=2
>>
>> for the latest patches, or some of the other links Cristoph pointed
>> out. I'm sure he'd love to talk to anyone about it :)
>
> Well that looks closer to the iterator here than some of the eariler
> links. The apply_to_page_range is pretty trivial, will have to look at
> Paul's patches to see if there's something we can use. This is just for
> Xen's use ATM, so we can always revert to keeping it Xen local if Paul's
> changes are heading upstream, and use them once they're in.
>
> thanks,
> -chris
>
Hi Chris

I understand you are looking for a generic iterator to operate on a set
of ptes within a given address range.

Unfortunately just about every iteration in the kernel now varies slightly
(usually at the pte directory level) in such a way that abstracting out
each iteration into generic iterators becomes somewhat problematic.

I classify them into read, build and dual iterators.

READ iterators: read a page table within a range and operate on the ptes
eg: unmap_page_range, change protection, msync ...

BUILD iterators: build a page table in a range while operating on the
ptes.
eg: vmap_pte_range, remap_pfn_range, ...

DUAL iterators: read and build a page table within a range and operate on
the ptes.
eg: copy_page_range (src and dst page tables different)
eg: mremap iterator (src and dst page tables the same)

A little over a year ago it was possible to abstract the iterators
into these classes because they were relatively untailored. During the
last year a fair bit of customisation has occured on many iterators and
some additional iterations added. Some of these iterators are performance
critical and their tailoring necessary. The ones that are less important
would need to be tidied up so that abstraction to generic iterators can
occur.

Because of the continual divergence of the iterators I have removed the
generic iterators where I would pass it a function across the interface
because they became too unweildy.

One possible solution is to have a set of critical and non critical
iterators. The non critical iterators would comprise a generic read and
build iterator.

However at this stage the Linux community seems relatively content to
access the page table data structure in an open fashion with tailored cut
and paste iterators. This method does have its advantages but it makes
changing the page table implementation more difficult.

Cheers

Paul Davies