Hi all (glibc + kernel folks).
This isn't strictly a kernel issue, but I'm sure everyone who can
help is around here somewhere.
In the kernel we have the special memcpy_fromio/memcpy_toio functions,
which are used to copy data in and out of PCI space.
However, when we expose PCI device memory to userspace, userspace has
no way to know whether the mapping it has been given is suitable for
optimised copy operations.
For example, on certain IA64 platforms, doing a memcpy on an mmapped
PCI memory area can cause a hard lock.
Now I started to try and fix this in X but I'm wondering if this is
something glibc/kernel can solve between them.
My first thought was to add memcpy_io/memset_io/str*_io functions to
glibc and just have userspace use them.
However, this means code that operates on non-IO objects gets
penalised, e.g. a sw renderer rendering to a SW surface vs. the same
sw renderer rendering to a surface in video memory. The renderer
really doesn't know where the surface actually lives under the hood.
Thinking about this further, it would be nice if the standard
memcpy/memset/str* realised: hey, I'm working on a memory address or
VMA that is an IO mapping, so I really shouldn't do shiny prefetch
stuff on this. I'm not 100% sure how this could be implemented; maybe
some sort of private mmap return value or a flag like MAP_NO_OPTIMISE
(I realise this goes the wrong way, as the kernel should be telling
glibc; do we even have a channel for this info?).
Then, when a memory op is done, it checks the dest/src addresses to
see whether that segment is allowed to be optimised or not.
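
To make the dest/src check concrete, here is a rough userspace sketch in C.
Everything in it is hypothetical: io_range_register(), overlaps_io() and
checked_memcpy() are made-up names, not glibc or kernel interfaces, and the
ranges are registered by hand because no kernel-to-glibc channel exists.
The fallback path is a byte loop through volatile pointers, which only
assumes that plain byte accesses are safe on the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_IO_RANGES 16

/* Half-open address range [start, end) known to be an I/O mapping. */
struct io_range { uintptr_t start, end; };

static struct io_range io_ranges[MAX_IO_RANGES];
static size_t io_range_count;

/* Hypothetical: the code that did the mmap() records the mapping here. */
static int io_range_register(const volatile void *base, size_t len)
{
    if (io_range_count == MAX_IO_RANGES)
        return -1;
    io_ranges[io_range_count].start = (uintptr_t)base;
    io_ranges[io_range_count].end = (uintptr_t)base + len;
    io_range_count++;
    return 0;
}

/* Returns 1 if [p, p + n) overlaps any registered I/O range, else 0. */
static int overlaps_io(const volatile void *p, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < io_range_count; i++)
        if (a < io_ranges[i].end && a + n > io_ranges[i].start)
            return 1;
    return 0;
}

/* Take the optimised memcpy only when neither side touches I/O space. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
    if (overlaps_io(dst, n) || overlaps_io(src, n)) {
        volatile unsigned char *d = dst;
        const volatile unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];            /* one plain byte access at a time */
        return dst;
    }
    return memcpy(dst, src, n);
}
```

Note that every call pays for the range lookup, including copies between
ordinary buffers, and the check does nothing for copies the compiler
inlines.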
I'm sure this has come up before and I'm sure I'll either wish I never
posted this or someone will show me the crisp corpse of the last guy
who suggested it.
Dave.
On Thu, Dec 4, 2008 at 10:40 PM, Dave Airlie <[email protected]> wrote:
> I'm sure this has come up before and I'm sure I'll either wish I never
> posted this or someone will show me the crisp corpse of the last guy
> who suggested it.
Do you plan to prevent the compiler from issuing the same sorts of
instructions that might appear in an optimized memcpy?
Isn't it dangerous to have memory that doesn't behave like normal
memory, and yet try to treat it like normal memory?
This mismatch of abstractions is a warning that must not be ignored.
IMHO you need a thin library with a new API (adding support for HAL
would be a clever way to check for PCI devices and perhaps even
mappings).
Cheers,
Carlos.
From: "Carlos O'Donell" <[email protected]>
Date: Fri, 5 Dec 2008 12:32:04 -0500
> On Thu, Dec 4, 2008 at 10:40 PM, Dave Airlie <[email protected]> wrote:
> > I'm sure this has come up before and I'm sure I'll either wish I never
> > posted this or someone will show me the crisp corpse of the last guy
> > who suggested it.
>
> Do you plan to prevent the compiler from issuing the same sorts of
> instructions that might appear in an optimized memcpy?
>
> Isn't it dangerous to have memory that doesn't behave like normal
> memory, and yet try to treat it like normal memory?
>
> This mismatch of abstractions is a warning that must not be ignored.
This is basically my opinion as well.
You'll pretty much need to surround accesses to these places with
accessor macros that do whatever is necessary on a given platform and
avoid the "dangerous" instructions in cases like IA64.
Treating them like normal memory isn't going to work on all systems.
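
As an illustration of the accessor-macro approach, a minimal C sketch
follows. IO_READ32, IO_WRITE32 and io_copy_scanline32() are hypothetical
names, not an existing X.org, glibc or kernel API. The defaults assume
that single, naturally aligned 32-bit volatile accesses are safe on the
target; a platform with stricter rules would provide its own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessors. A port for a fussy platform would define its
 * own versions before this point; the defaults below are single volatile
 * 32-bit accesses, which the compiler may not merge or widen. */
#ifndef IO_READ32
#define IO_READ32(p)      (*(const volatile uint32_t *)(p))
#endif
#ifndef IO_WRITE32
#define IO_WRITE32(p, v)  (*(volatile uint32_t *)(p) = (uint32_t)(v))
#endif

/* Copy one scanline of 32-bit pixels into an I/O mapping, one store per
 * pixel, instead of handing the range to an optimised memcpy. */
static void io_copy_scanline32(void *io_dst, const uint32_t *src, size_t pixels)
{
    uint32_t *dst = io_dst;

    /* Assumption: the device requires naturally aligned accesses. */
    assert(((uintptr_t)io_dst & 3u) == 0);
    for (size_t i = 0; i < pixels; i++)
        IO_WRITE32(&dst[i], src[i]);
}
```

The cost is that every caller has to know it is touching I/O space and
use the accessors explicitly.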
BTW, the sunffb xorg driver has special code for "graphics copy"
which is essentially just a scanline-by-scanline GCOPY using the
MMX-like stuff sparc64 has. It also is mindful of avoiding access
patterns that are known to lock up that chip :)
That's just an aside, since sunffb doesn't provide any offscreen
pixmap memory and thus shouldn't be susceptible to this problem being
discussed here.
There is never going to be something in memcpy that makes any magical
kernel calls to find out about an address. That's just loony, sorry.
memcpy is inlined away by the compiler in good cases. It's like you'd
asked for "*ptr" to have magical constraints by the compiler generating
code to ask the kernel if "ptr" is a special address. Uh, really?
If you can think of an efficient way to determine it and do what you need,
then you could write optimized routines that do that in the vDSO perhaps.
Thanks,
Roland
On Sat, Dec 6, 2008 at 6:27 AM, Roland McGrath <[email protected]> wrote:
> There is never going to be something in memcpy that makes any magical
> kernel calls to find out about an address. That's just loony, sorry.
> memcpy is inlined away by the compiler in good cases. It's like you'd
> asked for "*ptr" to have magical constraints by the compiler generating
> code to ask the kernel if "ptr" is a special address. Uh, really?
>
Yeah, I didn't think it was a good idea; it's what HP-UX does, so that
implied it probably wasn't a good idea.
I was just hoping it might help people come up with a good idea.
Dave.
> If you can think of an efficient way to determine it and do what you need,
> then you could write optimized routines that do that in the vDSO perhaps.
>
>
> Thanks,
> Roland
>
On Sat, Dec 6, 2008 at 6:22 AM, David Miller <[email protected]> wrote:
> From: "Carlos O'Donell" <[email protected]>
> Date: Fri, 5 Dec 2008 12:32:04 -0500
>
>> On Thu, Dec 4, 2008 at 10:40 PM, Dave Airlie <[email protected]> wrote:
>> > I'm sure this has come up before and I'm sure I'll either wish I never
>> > posted this or someone will show me the crisp corpse of the last guy
>> > who suggested it.
>>
>> Do you plan to prevent the compiler from issuing the same sorts of
>> instructions that might appear in an optimized memcpy?
>>
>> Isn't it dangerous to have memory that doesn't behave like normal
>> memory, and yet try to treat it like normal memory?
>>
>> This mismatch of abstractions is a warning that must not be ignored.
>
> This is basically my opinion as well.
>
> You'll pretty much need to surround accesses to these places with
> accessor macros that do whatever is necessary on a given platform and
> avoid the "dangerous" instructions in cases like IA64.
>
> Treating them like normal memory isn't going to work on all systems.
It's a real pain in the ass with dynamic buffer objects: we don't want
userspace to care where they are located, since the kernel migrates
them in and out of video memory, GART, local RAM etc.
I suspect that on these platforms I just need to ban any CPU accesses
to pixmaps in VRAM. However, sw fallbacks to the front buffer will
always need these accesses.
It's going to be a real pain getting any traction for this stuff
upstream (X.org/Mesa), where the world is x86 and maybe the odd
powerpc; having to do special accessors for shithouse hw is never
going to be fun.
Maybe I should start libshithouse to encapsulate the problem, I'll
think about it some more.
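
A rough C sketch of what such an encapsulating layer might look like.
All names here (struct bo, enum bo_placement, bo_write()) are invented
for illustration and are not the DRM or Mesa API. It assumes the
library can learn the current placement of a buffer object and that
the placement does not change during the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placement tag; a real library would have to learn this
 * from the kernel and keep it current across migrations. */
enum bo_placement { BO_SYSTEM_RAM, BO_VRAM };

struct bo {
    void *map;                   /* CPU mapping of the buffer */
    size_t size;                 /* size of the mapping in bytes */
    enum bo_placement placement;
};

/* Write n bytes at offset; callers never see where the buffer lives.
 * Returns 0 on success, -1 if the write would run past the buffer. */
static int bo_write(struct bo *bo, size_t offset, const void *src, size_t n)
{
    if (offset > bo->size || n > bo->size - offset)
        return -1;

    unsigned char *dst = (unsigned char *)bo->map + offset;

    if (bo->placement == BO_SYSTEM_RAM) {
        memcpy(dst, src, n);     /* normal memory: optimised copy is fine */
        return 0;
    }

    /* I/O memory: plain byte stores only (assumed safe on the target). */
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return 0;
}
```

The renderer calls bo_write() either way; only the library decides
whether the optimised path is allowed.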
Dave.
> BTW, the sunffb xorg driver has special code for "graphics copy"
> which is essentially just a scanline by scanline GCOPY using the
> MMX like stuff sparc64 has. It also is mindful of avoiding access
> patterns that are known to lock up that chip :)
>
> That's just an aside, since sunffb doesn't provide any offscreen
> pixmap memory and thus shouldn't be susceptible to this problem being
> discussed here.
>
>
On Sat, Dec 6, 2008 at 1:34 AM, Dave Airlie <[email protected]> wrote:
> It's a real pain in the ass with dynamic buffer objects: we don't want
> userspace to care where they are located, since the kernel migrates
> them in and out of video memory, GART, local RAM etc.
>
> I suspect that on these platforms I just need to ban any CPU accesses
> to pixmaps in VRAM. However, sw fallbacks to the front buffer will
> always need these accesses.
>
> It's going to be a real pain getting any traction for this stuff
> upstream (X.org/Mesa), where the world is x86 and maybe the odd
> powerpc; having to do special accessors for shithouse hw is never
> going to be fun.
Is there no case on x86 when this matters?
What about ARM, ColdFire or MIPS?
As the embedded market continues to grow I hope to see X.org/Mesa on
more hardware with different memory access rules.
Cheers,
Carlos.
Carlos O'Donell wrote:
> On Sat, Dec 6, 2008 at 1:34 AM, Dave Airlie <[email protected]> wrote:
>> It's a real pain in the ass with dynamic buffer objects: we don't want
>> userspace to care where they are located, since the kernel migrates
>> them in and out of video memory, GART, local RAM etc.
>>
>> I suspect that on these platforms I just need to ban any CPU accesses
>> to pixmaps in VRAM. However, sw fallbacks to the front buffer will
>> always need these accesses.
>>
>> It's going to be a real pain getting any traction for this stuff
>> upstream (X.org/Mesa), where the world is x86 and maybe the odd
>> powerpc; having to do special accessors for shithouse hw is never
>> going to be fun.
>
> Is there no case on x86 when this matters?
>
> What about ARM, ColdFire or MIPS?
On x86, assuming the kernel hasn't done stupid things like mapping
memory ranges with conflicting memory types, then no, it doesn't matter
what instructions you use to beat on the memory range, which is as it
should be. If this IA64 case is as Dave describes, it really sounds
like a brain-damaged platform, IMHO. Having memory-mapped ranges where
using certain instructions to write to them locks the machine is just
ridiculous. This sounds like one of those cases where a hardware
designer pawns off a particular case as "software can deal with it" and
causes the software people 10 times as much aggravation as the hardware
people saved themselves.
>
> As the embedded market continues to grow I hope to see X.org/Mesa on
> more hardware with different memory access rules.
>
> Cheers,
> Carlos.