LinuxLists.cc - [PATCH] x86: create array based interface to change page attribute

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Dave Airlie wrote:
> When cpa was refactored to the new set_memory_ interfaces, it removed
> a special case fast path for AGP systems, which did a lot of page by page
> attribute changing and held the flushes until they were finished. The
> DRM memory manager also required this to get useable speeds.
>
> This introduces a new interface, which accepts an array of memory addresses
> to have attributes changed on and to flush once finished.
>
> Further changes to the AGP stack to actually use this interface will be
> published later.
>
> Signed-off-by: Dave Airlie <[email protected]>
> ---
> arch/x86/mm/pageattr-test.c | 12 ++-
> arch/x86/mm/pageattr.c | 164 +++++++++++++++++++++++++++++++-----------
> include/asm-x86/cacheflush.h | 3 +
> 3 files changed, 132 insertions(+), 47 deletions(-)
>
...
Dave,
Nice work, but how do we handle highmem pages? I know that they don't
need any attribute change since they're not in the kernel map, but both
the AGP module and the DRM memory manager typically hold highmem
addresses in their arrays, so the code should presumably detect those
and avoid them?

Since this is an AGPgart and DRM fastpath, the interface should ideally
be adapted to match the data structures used by those callers. The
AGPgart module uses an array of addresses, which effectively stops it
from using pages beyond the DMA32 limit. The DRM memory manager uses an
array of struct page pointers, but using that was, If I understand
things correctly, rejected.

So, if we, at some point, want to have an AGPgart module capable of
using anything beyond the DMA32 limit we will end up with an interface
that doesn't match neither AGPgart nor DRM, for which users the fastpath
was originally intended.

/Thomas

2008-03-31 07:25:27

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Dave Airlie <[email protected]> writes:
>
> +#define CPA_FLUSHTLB 1
> +#define CPA_ARRAY 2

I don't think CPA_ARRAY should be a separate case. Rather single
page flushing should be an array with only a single entry. pageattr
is already very complex, no need to make add more special cases.
> +
> + /*
> + * Only flush present addresses:
> + */
> + if (pte && (pte_val(*pte) & _PAGE_PRESENT))
> + clflush_cache_range((void *) *addr, PAGE_SIZE);

Also it is doubtful clflush really makes sense on a large array. Just
doing wbinvd might be faster then. Or perhaps better supporting Self-Snoop
should be revisited, that would at least eliminate it on most Intel
CPUs.

-Andi

2008-03-31 07:56:18

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Andi Kleen wrote:
> Dave Airlie <[email protected]> writes:
>
>>
>> +#define CPA_FLUSHTLB 1
>> +#define CPA_ARRAY 2
>>
>
> I don't think CPA_ARRAY should be a separate case. Rather single
> page flushing should be an array with only a single entry. pageattr
> is already very complex, no need to make add more special cases.
>
>> +
>> + /*
>> + * Only flush present addresses:
>> + */
>> + if (pte && (pte_val(*pte) & _PAGE_PRESENT))
>> + clflush_cache_range((void *) *addr, PAGE_SIZE);
>>
>
> Also it is doubtful clflush really makes sense on a large array. Just
> doing wbinvd might be faster then. Or perhaps better supporting Self-Snoop
> should be revisited, that would at least eliminate it on most Intel
> CPUs.
>
>
I agree that wbinvd() seems to be faster on large arrays on the
processors I've tested. But isn't there a severe latency problem with
that instruction, that makes people really want to avoid it in all
possible cases?

Also I think we need to clarify the semantics of the c_p_a
functionality. Right now both AGP and DRM relies on c_p_a doing an
explicit cache flush. Otherwise the data won't appear on the device side
of the aperture.
If we use self-snoop, the AGP and DRM drivers can't rely on this flush
being performed, and they have to do the flush themselves, and for
non-self-snooping processors, the flush needs to be done twice?

/Thomas

> -Andi
>

2008-03-31 08:34:46

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

> Also I think we need to clarify the semantics of the c_p_a
> functionality. Right now both AGP and DRM relies on c_p_a doing an
> explicit cache flush. Otherwise the data won't appear on the device side
> of the aperture.

But surely not in cpa I hope ? Or are you saying you first write data
and then do cpa? If true that would be quite an abuse of CPA
I would say and you should fix it ASAP.

> If we use self-snoop, the AGP and DRM drivers can't rely on this flush
> being performed, and they have to do the flush themselves, and for

They definitely should flush themselves if they want data to reach
the device. That is obviously required any time they reuse a page
too.

-Andi

2008-03-31 09:07:14

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Andi Kleen wrote:
>> Also I think we need to clarify the semantics of the c_p_a
>> functionality. Right now both AGP and DRM relies on c_p_a doing an
>> explicit cache flush. Otherwise the data won't appear on the device side
>> of the aperture.
>>
>
> But surely not in cpa I hope ? Or are you saying you first write data
> and then do cpa? If true that would be quite an abuse of CPA
> I would say and you should fix it ASAP.
>
>
As AGP buffers are moved in- and out of AGP, the caching policy changes,
so yes, there may be writes to cache coherent memory that needs to be
flushed at some point. Since CPA has been doing that up to now, and the
codepaths involved are quite time-critical, a double cache-flush is a
no-no, so if this is left to the caller, we must be able to tell CPA
that any needed cache-flush has already been performed.
>> If we use self-snoop, the AGP and DRM drivers can't rely on this flush
>> being performed, and they have to do the flush themselves, and for
>>
>
> They definitely should flush themselves if they want data to reach
> the device. That is obviously required any time they reuse a page
> too.
>
Understood,
but then we *must* really find a way to say "don't flush the cache
again", perhaps part of Dave's array function?

/Thomas
> -Andi
>

2008-03-31 09:15:00

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

On Mon, Mar 31, 2008 at 11:06:16AM +0200, Thomas Hellstr?m wrote:
> Andi Kleen wrote:
> >>Also I think we need to clarify the semantics of the c_p_a
> >>functionality. Right now both AGP and DRM relies on c_p_a doing an
> >>explicit cache flush. Otherwise the data won't appear on the device side
> >>of the aperture.
> >>
> >
> >But surely not in cpa I hope ? Or are you saying you first write data
> >and then do cpa? If true that would be quite an abuse of CPA
> >I would say and you should fix it ASAP.
> >
> >
> As AGP buffers are moved in- and out of AGP, the caching policy changes,
> so yes, there may be writes to cache coherent memory that needs to be
> flushed at some point. Since CPA has been doing that up to now, and the
> codepaths involved are quite time-critical, a double cache-flush is a

That doesn't make sense. You shouldn't be using CPA in any
time critical code path. It will always be slow.

For anything time critical you should keep a pool of uncached pages
once set up using CPA and reuse them.

CPA really should only be used on initialization or for
larger setup changes which are ok to go somewhat slower.

> no-no, so if this is left to the caller, we must be able to tell CPA
> that any needed cache-flush has already been performed.

Sounds like a bad design.

> >>If we use self-snoop, the AGP and DRM drivers can't rely on this flush
> >>being performed, and they have to do the flush themselves, and for
> >>
> >
> >They definitely should flush themselves if they want data to reach
> >the device. That is obviously required any time they reuse a page
> >too.
> >
> Understood,
> but then we *must* really find a way to say "don't flush the cache
> again", perhaps part of Dave's array function?

The cache must be flushed in CPA, there is no way around it.

If you write data into the buffers before you do CPA on them
you could avoid it, but I don't think you should do that anywhere
near time critical code, so it actually shouldn't matter.

-Andi

2008-03-31 09:33:25

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Thomas Hellstr?m wrote:
> Dave Airlie wrote:
>> When cpa was refactored to the new set_memory_ interfaces, it removed
>> a special case fast path for AGP systems, which did a lot of page by page
>> attribute changing and held the flushes until they were finished. The
>> DRM memory manager also required this to get useable speeds.
>>
>> This introduces a new interface, which accepts an array of memory
>> addresses
>> to have attributes changed on and to flush once finished.
>>
>> Further changes to the AGP stack to actually use this interface will be
>> published later.
>>
>> Signed-off-by: Dave Airlie <[email protected]>
>> ---
>> arch/x86/mm/pageattr-test.c | 12 ++-
>> arch/x86/mm/pageattr.c | 164
>> +++++++++++++++++++++++++++++++-----------
>> include/asm-x86/cacheflush.h | 3 +
>> 3 files changed, 132 insertions(+), 47 deletions(-)
>>
> ...
> Dave,
> Nice work, but how do we handle highmem pages?

Cache attributes fundamentally work on a mapping and not on physical memory.
(MTRR's are special there, they do work on physical memory, but that's a special case and not relevant here)

So to be honest, your question doesn't make sense; because all I can do is ask "which mapping of these pages".

Even the old interfaces prior to 2.6.24 only managed to deal with SOME of the mappings of a page.
And if/when future CPUs don't require all mappings to be in sync, clearly the kernel will only change
the mapping that is requested as well.

> Since this is an AGPgart and DRM fastpath, the interface should ideally
> be adapted to match the data structures used by those callers.

Actually, the interface has to make sense conceptually convey the right information,
the implementation should not have to second guess internals of AGP/DRM because that
would just be a recipe for disaster.
>The
> AGPgart module uses an array of addresses, which effectively stops it
> from using pages beyond the DMA32 limit. The DRM memory manager uses an
> array of struct page pointers, but using that was, If I understand
> things correctly, rejected.

yes because simply put, if you pass a struct page to such a function, you're not telling it which
mapping or mappings you want changed....
(And if you say "only the 1:1 mapping, so why doesn't the other side calculate that"... there's no speed gain in doing
the calculation for that on the other side of an interface, and that makes it zero reason to misdesign the interface
to only have the "which mapping" information implicit)

2008-03-31 09:57:36

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Andi Kleen wrote:

> Also it is doubtful clflush really makes sense on a large array. Just
> doing wbinvd might be faster then.

wbinvd is rarely a good idea; think about it... it'll flush 12Mb of cache *per socket* in one instruction.
(on a modern Intel consumer grade CPU, more on the enterprise ones)
This doesn't only impact the current logical thread, but ALL of the threads in the system, since the cache
coherency needs to be preserved, all have to go empty at the same time.
Forget real time... this can take really long even for non-realtime users ;)

At least clflush breaks this up into smaller pieces so total latency won't suck entirely.

2008-03-31 11:05:52

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Arjan van de Ven wrote:
> Thomas Hellstr?m wrote:
>> Dave Airlie wrote:
>>> When cpa was refactored to the new set_memory_ interfaces, it removed
>>> a special case fast path for AGP systems, which did a lot of page by
>>> page
>>> attribute changing and held the flushes until they were finished. The
>>> DRM memory manager also required this to get useable speeds.
>>>
>>> This introduces a new interface, which accepts an array of memory
>>> addresses
>>> to have attributes changed on and to flush once finished.
>>>
>>> Further changes to the AGP stack to actually use this interface will be
>>> published later.
>>>
>>> Signed-off-by: Dave Airlie <[email protected]>
>>> ---
>>> arch/x86/mm/pageattr-test.c | 12 ++-
>>> arch/x86/mm/pageattr.c | 164
>>> +++++++++++++++++++++++++++++++-----------
>>> include/asm-x86/cacheflush.h | 3 +
>>> 3 files changed, 132 insertions(+), 47 deletions(-)
>>>
>> ...
>> Dave,
>> Nice work, but how do we handle highmem pages?
>
> Cache attributes fundamentally work on a mapping and not on physical
> memory.
> (MTRR's are special there, they do work on physical memory, but that's
> a special case and not relevant here)
>
> So to be honest, your question doesn't make sense; because all I can
> do is ask "which mapping of these pages".
>
> Even the old interfaces prior to 2.6.24 only managed to deal with SOME
> of the mappings of a page.
> And if/when future CPUs don't require all mappings to be in sync,
> clearly the kernel will only change
> the mapping that is requested as well.
>
>
>
>> Since this is an AGPgart and DRM fastpath, the interface should
>> ideally be adapted to match the data structures used by those callers.
>
> Actually, the interface has to make sense conceptually convey the
> right information,
> the implementation should not have to second guess internals of
> AGP/DRM because that
> would just be a recipe for disaster.
>> The AGPgart module uses an array of addresses, which effectively
>> stops it from using pages beyond the DMA32 limit. The DRM memory
>> manager uses an array of struct page pointers, but using that was, If
>> I understand things correctly, rejected.
>
> yes because simply put, if you pass a struct page to such a function,
> you're not telling it which
> mapping or mappings you want changed....
> (And if you say "only the 1:1 mapping, so why doesn't the other side
> calculate that"... there's no speed gain in doing
> the calculation for that on the other side of an interface, and that
> makes it zero reason to misdesign the interface
> to only have the "which mapping" information implicit)
>
Hmm. I get the point. There should be ways to do reasonably efficient
workarounds in the drivers.

/Thomas

2008-03-31 11:11:35

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Andi Kleen wrote:
> On Mon, Mar 31, 2008 at 11:06:16AM +0200, Thomas Hellstr?m wrote:
>
>> Andi Kleen wrote:
>>
>>>> Also I think we need to clarify the semantics of the c_p_a
>>>> functionality. Right now both AGP and DRM relies on c_p_a doing an
>>>> explicit cache flush. Otherwise the data won't appear on the device side
>>>> of the aperture.
>>>>
>>>>
>>> But surely not in cpa I hope ? Or are you saying you first write data
>>> and then do cpa? If true that would be quite an abuse of CPA
>>> I would say and you should fix it ASAP.
>>>
>>>
>>>
>> As AGP buffers are moved in- and out of AGP, the caching policy changes,
>> so yes, there may be writes to cache coherent memory that needs to be
>> flushed at some point. Since CPA has been doing that up to now, and the
>> codepaths involved are quite time-critical, a double cache-flush is a
>>
>
> That doesn't make sense. You shouldn't be using CPA in any
> time critical code path. It will always be slow.
>
> For anything time critical you should keep a pool of uncached pages
> once set up using CPA and reuse them.
>
> CPA really should only be used on initialization or for
> larger setup changes which are ok to go somewhat slower.
>
Let me rehprase. Not really time-critical but it is of some importance
that CPA is done quickly.
We're dealing with the tradeoff of reading from uncached device memory
vs taking the pages out of
AGP, setting up a cache-coherent mapping, read and then change back.
What we'd really would like to set up is a pool of completely unmapped
(like highmem) pages. Then we could, to a large extent, avoid the CPA calls.

>
>
> The cache must be flushed in CPA, there is no way around it.
>
> If you write data into the buffers before you do CPA on them
> you could avoid it, but I don't think you should do that anywhere
> near time critical code, so it actually shouldn't matter.
>
> -Andi
>
But then we wouldn't really be discussing SS either? For DRM purposes,
the more performance we can squeeze out of CPA, the better.

/Thomas

2008-03-31 11:21:28

by Dave Airlie

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

On Mon, Mar 31, 2008 at 5:25 PM, Andi Kleen <[email protected]> wrote:
> Dave Airlie <[email protected]> writes:
> >
> > +#define CPA_FLUSHTLB 1
> > +#define CPA_ARRAY 2
>
> I don't think CPA_ARRAY should be a separate case. Rather single
> page flushing should be an array with only a single entry. pageattr
> is already very complex, no need to make add more special cases.

I thought about this but the current interface takes a start address
and number of pages from that point to cpa,
the array interface takes an array of page sized pages.

I don't really think we need to generate an array in the first case
with all the pages in it..

Dave.

2008-03-31 11:43:18

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

On Mon, Mar 31, 2008 at 09:21:19PM +1000, Dave Airlie wrote:
> On Mon, Mar 31, 2008 at 5:25 PM, Andi Kleen <[email protected]> wrote:
> > Dave Airlie <[email protected]> writes:
> > >
> > > +#define CPA_FLUSHTLB 1
> > > +#define CPA_ARRAY 2
> >
> > I don't think CPA_ARRAY should be a separate case. Rather single
> > page flushing should be an array with only a single entry. pageattr
> > is already very complex, no need to make add more special cases.
>
> I thought about this but the current interface takes a start address
> and number of pages from that point to cpa,
> the array interface takes an array of page sized pages.
>
> I don't really think we need to generate an array in the first case
> with all the pages in it..

Just put the length into the array members too.

-Andi

2008-03-31 16:10:38

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Thomas Hellstr?m wrote:

> Let me rehprase. Not really time-critical but it is of some importance
> that CPA is done quickly.
> We're dealing with the tradeoff of reading from uncached device memory

uncached or write combining ?

> vs taking the pages out of
> AGP, setting up a cache-coherent mapping, read and then change back.
> What we'd really would like to set up is a pool of completely unmapped
> (like highmem) pages. Then we could, to a large extent, avoid the CPA
> calls.

changing attributes by nature means a tlb flush and a bunch of expensive cache work.
That's never going to be cheap, I guess it all depends on how much work you do
on the memory for it to pay off or not...

2008-03-31 16:42:34

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Arjan van de Ven wrote:
> Thomas Hellstr?m wrote:
>
>> Let me rehprase. Not really time-critical but it is of some
>> importance that CPA is done quickly.
>> We're dealing with the tradeoff of reading from uncached device memory
>
> uncached or write combining ?
The user-space mappings (the ones that we really use) are usually
write-combined, whereas the kernel mappings are uncached. (I think this
is OK since both mapping types implies no cache coherency). Even if
(IIRC) write combining is theoretically prefetchable, some devices give
read speeds around 9MB/s.
>
>> vs taking the pages out of
>> AGP, setting up a cache-coherent mapping, read and then change back.
>> What we'd really would like to set up is a pool of completely
>> unmapped (like highmem) pages. Then we could, to a large extent,
>> avoid the CPA calls.
>
> changing attributes by nature means a tlb flush and a bunch of
> expensive cache work.
> That's never going to be cheap, I guess it all depends on how much
> work you do
> on the memory for it to pay off or not...
Indeed. Actually with the new non-wbinvd() CPA, We seem to benefit
already if the buffer is a single page, though it's probably hard to
measure the impact of repopulating the tlb.

/Thomas

2008-03-31 16:55:54

[permalink] [raw]

Subject: Re: [PATCH] x86: create array based interface to change page attribute

Thomas Hellstr?m wrote:
> Arjan van de Ven wrote:
>> Thomas Hellstr?m wrote:
>>
>>> Let me rehprase. Not really time-critical but it is of some
>>> importance that CPA is done quickly.
>>> We're dealing with the tradeoff of reading from uncached device memory
>>
>> uncached or write combining ?
> The user-space mappings (the ones that we really use) are usually
> write-combined, whereas the kernel mappings are uncached. (I think this
> is OK since both mapping types implies no cache coherency).

This is not officially allowed and may tripple fault your cpu..
To comply with the spec one needs to have ALL mappings the same unfortunately.
(And yes, this is a hard problem)

>Even if
> (IIRC) write combining is theoretically prefetchable, some devices give
> read speeds around 9MB/s.
>>
>>> vs taking the pages out of
>>> AGP, setting up a cache-coherent mapping, read and then change back.
>>> What we'd really would like to set up is a pool of completely
>>> unmapped (like highmem) pages. Then we could, to a large extent,
>>> avoid the CPA calls.
>>
>> changing attributes by nature means a tlb flush and a bunch of
>> expensive cache work.
>> That's never going to be cheap, I guess it all depends on how much
>> work you do
>> on the memory for it to pay off or not...
> Indeed. Actually with the new non-wbinvd() CPA, We seem to benefit
> already if the buffer is a single page, though it's probably hard to
> measure the impact of repopulating the tlb.
>
> /Thomas
>
>
>

2008-03-31 17:27:23