LinuxLists.cc - Changing radeon KMS cs+gem ioctl to merge read & write domain

2009-10-21 20:51:44

Subject: Changing radeon KMS cs+gem ioctl to merge read & write domain

Hi,

I think we should merge the read & write domains of radeon KMS
into a single domains field. I don't think there is a good
reason for separate read & write domains: we copied the Intel
model for that, and Intel uses it mostly for cache coherency &
cache flushing, as far as I understand. We make no good use of
this inside the kernel. To make this change less disruptive and
easier to introduce, I propose we keep the libdrm-radeon API
intact, so that userspace (xf86video-ati & the Mesa 3D driver)
doesn't need a single line of patch to adapt. Attached is a proof
of concept: a patch against libdrm which merges the read & write
domains and only uses the read domain to communicate with the
kernel. I am still in the process of stress testing this patch,
but so far neither X nor 3D has shown any glitches.
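
A minimal sketch of what "keep the libdrm-radeon API, but only send the read domain to the kernel" could look like. It assumes bitmask domains (as in the current read/write interface) and assumes the merge is a plain OR of the two masks; `struct rw_reloc` and `reloc_set_domains()` are hypothetical names, and this is not the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reloc entry that still has separate read/write fields
 * (layout assumed for illustration). */
struct rw_reloc {
    uint32_t handle;
    uint32_t read_domains;
    uint32_t write_domain;
    uint32_t flags;
};

/* Hypothetical helper: callers keep passing separate read/write
 * domains, but only one merged mask is handed to the kernel. */
static void reloc_set_domains(struct rw_reloc *reloc,
                              uint32_t read_domains, uint32_t write_domain)
{
    assert(reloc);
    /* Assumption: any domain the BO may be read from or written to
     * is an acceptable placement, so the two masks are ORed. */
    reloc->read_domains = read_domains | write_domain;
    /* The write domain is no longer communicated separately. */
    reloc->write_domain = 0;
}
```

Because the merge happens behind the existing call, callers such as the DDX or Mesa would not need to change.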

I also want to take advantage of this change to update the CS
reloc structure to the following:
struct drm_radeon_cs_reloc {
	uint32_t	handle;
	uint32_t	domains;
	uint32_t	unused;
	uint32_t	flags;
};

With the following rules: a domain is a 4-bit value (more than
enough, I believe). Userspace can then provide a domain preference
list for each relocation. For instance:
0 Invalid|CPU
1 VRAM
2 GTT

domains = (VRAM << 0) | (GTT << 4)
would mean: try to place the object in VRAM first; if there is
not enough VRAM, place it in GTT.

domains = (GTT << 0)
would mean the object can only be in GTT.
...
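
A minimal sketch of the packing rule above: each 4-bit slot holds one domain code, the most preferred one in bits 0-3, so a 32-bit `domains` field has room for 32 / 4 = 8 preferences. The `RELOC_*` names and helpers are hypothetical, and treating a 0 nibble after the first slot as "no further preference" is my reading of the `(GTT << 0)` example, not something the proposal states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical domain codes from the list above (4 bits each). */
enum reloc_domain {
    RELOC_DOMAIN_INVALID_CPU = 0,
    RELOC_DOMAIN_VRAM        = 1,
    RELOC_DOMAIN_GTT         = 2,
};

#define RELOC_DOMAIN_BITS 4u
#define RELOC_DOMAIN_MASK 0xFu
/* A 32-bit field holds at most 32 / 4 = 8 preferences. */
#define RELOC_MAX_DOMAINS (32u / RELOC_DOMAIN_BITS)

/* Pack n domains, most preferred first, starting at the low nibble. */
static uint32_t reloc_domains_pack(const uint32_t *prefs, unsigned n)
{
    uint32_t domains = 0;

    assert(n <= RELOC_MAX_DOMAINS);
    for (unsigned i = 0; i < n; i++) {
        assert(prefs[i] <= RELOC_DOMAIN_MASK);
        domains |= prefs[i] << (i * RELOC_DOMAIN_BITS);
    }
    return domains;
}

/* Return the i-th preference (0 in a later slot: list has ended). */
static uint32_t reloc_domains_get(uint32_t domains, unsigned i)
{
    assert(i < RELOC_MAX_DOMAINS);
    return (domains >> (i * RELOC_DOMAIN_BITS)) & RELOC_DOMAIN_MASK;
}
```

With these codes, "VRAM, then GTT" packs to `0x21` and "GTT only" packs to `0x2`.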

I believe this would be a lot more useful information than the
read|write domain. We would also now assume that userspace knows
what it's doing inside a single submitted CS, and that userspace
issues the necessary flushes if a BO is used in different ways,
which is what the DDX does.

I believe the only argument in favor of the read & write split
is broken AGP chipsets where the GPU can't write to GART. So far
we don't use this information to work around the issue; we don't
even always test AGP writeback. Thus I believe this change won't
impact current users. Note that I am working on code to work
around bad AGP chipsets (fallback to PCI GART for GPU writes +
detection of broken writeback).

I really think we should take advantage of being a staging
driver to get the ioctls right before we have to freeze them.

Cheers,
Jerome Glisse

Attachments:

0001-libdrm-libradeon-Unify-read-write-domain-into-a-sing.patch (11.98 kB)

2009-10-22 01:51:34

by Corbin Simpson

Subject: Re: Changing radeon KMS cs+gem ioctl to merge read & write domain

On 10/21/2009 03:49 PM, Jerome Glisse wrote:
> [...]

No objections from me. If you have further ioctl changes, raising them
sooner rather than later would be *greatly* appreciated since I'm
probably the only person touching them in Gallium.

~ C.

2009-10-22 10:55:11

by Jerome Glisse

Subject: Re: Changing radeon KMS cs+gem ioctl to merge read & write domain

On Wed, 2009-10-21 at 18:49 -0700, Corbin Simpson wrote:
> No objections from me. If you have further ioctl changes, raising them
> sooner rather than later would be *greatly* appreciated since I'm
> probably the only person touching them in Gallium.

This change should work without any update to the Gallium code,
but to take advantage of the placement list you would need to use
a new libdrm-radeon API for relocs.

Besides this merge, I don't have any other API change in mind. The
other things I am working on are mostly kernel-side only.

Cheers,
Jerome

2009-10-26 02:12:20

by Dave Airlie

Subject: Re: Changing radeon KMS cs+gem ioctl to merge read & write domain

On Thu, Oct 22, 2009 at 8:49 AM, Jerome Glisse <[email protected]> wrote:
> Hi,
>
> I think we should merge the read & write domain of radeon KMS
> into a single domains information. I don't think there is a
> good reason for separate read & write domain, we did copy intel
> model for that and intel use this mostly for cache coherency &
> cache flushing as far as i understand. We make no good use of
> this inside the kernel. In order to make this change less disruptive
> and easier to introduce i propose we keep libdrm-radeon api
> intact thus userspace xf86video-ati & mesa 3d driver doesn't
> need a single line patch to adapt. Attached is a proof of concept,
> a patch against libdrm which merge read & write domain and only
> use the read domain to communicate with the kernel. I am still
> in process of stress testing this patch but so far neither X
> or 3D had any glitches.
>

Can you list the advantages (speed, complexity reduction)? I really,
really don't like this patch at this point in the development process.
Yes, the API has warts, but does fixing it now help any, or does it just
increase the chance of regressions?

I don't think we've hit any of the limitations yet, and I suspect the
API limitations we do hit will require a new revision of the API, which
I'd rather do once than just hack away functionality because the current
underlying implementation doesn't use it yet.

I'd really like to use the read/write information to help decide VRAM/GTT
migration priorities in the future. I've mentioned this a few times and I
haven't heard how your scheme addresses this.

Some issues I can see: you haven't really defined how userspace users of
this API should look. Who decides the buffer placement? Userspace only?
Can the kernel override? How does the kernel know which allocs it can
override and which it can't? How does space checking work for the
VRAM-but-maybe-GTT buffers?

I don't think the API we have is perfect by any means; I just don't think
investing in tweaking the corners for no real benefit is worth it. Just
make the Gallium winsys API clean, and then you don't need to look at this
stuff; in a year or two we can do a new API that actually provides benefits
and speedups, after we've learned a bit from this one.

Dave.

2009-10-26 09:40:16

by Jerome Glisse

Subject: Re: Changing radeon KMS cs+gem ioctl to merge read & write domain

On Mon, 2009-10-26 at 12:12 +1000, Dave Airlie wrote:
> Can you list the advantages (speed, complexity reduction)?, I really
> really don't like this patch at this point in the development process,
> yes the API has warts does fixing it now help any or just increase
> the chance of regressions.

See below

> Like I don't think we've hit any of the limitations yet, and I suspect
> the API limitations we hit will require a new revision of the API, which
> I'd rather do once, than just hack away functionality because the current
> underlying implementation doesn't use it yet.
>
> I'd really like to use the read/write information to help decide VRAM/GTT
> migration priorities in the future, I've mentioned this a few times and I've
> haven't heard how your scheme addresses this.

The new API gives userspace the opportunity to provide a list of
preferred placements, which would be a lot better than read/write.
Userspace has a better view of how a buffer is used (used for one
frame? a big texture used over several frames? a big texture of which
only a few texels will be used? ...). The read/write domain fails to
transmit any such information to the kernel.
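
A minimal sketch of how userspace might turn the usage cases above into a placement list, using the 4-bit encoding from the original proposal (first preference in bits 0-3). The `bo_usage` hints, `placement_for_usage()`, and in particular the chosen orders are hypothetical illustrations; nothing in this thread specifies such a policy.

```c
#include <assert.h>
#include <stdint.h>

/* Domain codes from the proposal: 4 bits each, first preference lowest. */
enum { DOM_NONE = 0, DOM_VRAM = 1, DOM_GTT = 2 };

/* Hypothetical usage hints mirroring the examples above. */
enum bo_usage {
    BO_USAGE_SINGLE_FRAME,   /* used for one frame only */
    BO_USAGE_HOT_TEXTURE,    /* big texture used over several frames */
    BO_USAGE_SPARSE_TEXTURE, /* big texture, only a few texels used */
};

/* Hypothetical policy: the usage -> placement mapping is an assumption. */
static uint32_t placement_for_usage(enum bo_usage usage)
{
    switch (usage) {
    case BO_USAGE_SINGLE_FRAME:
        return DOM_GTT;                     /* GTT only, no migration */
    case BO_USAGE_HOT_TEXTURE:
        return DOM_VRAM | (DOM_GTT << 4);   /* VRAM, fall back to GTT */
    case BO_USAGE_SPARSE_TEXTURE:
        return DOM_GTT | (DOM_VRAM << 4);   /* GTT first, then VRAM */
    }
    assert(!"unknown usage hint");
    return DOM_GTT;
}
```

The point is only that this kind of per-buffer knowledge lives in userspace, while a read/write mask cannot express an order of preference.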

> Some issues I can see are you haven't really defined how userspace
> users of this API should look, like who decides the buffer placement?
> userspace only? can the kernel override? how does the kernel know
> which allocs it can override and which it can't? how does space
> checking work for the VRAM but maybe GTT buffers?

Userspace decides the list of placements and the kernel can't override
it; it has to pick one placement from the list (mostly because of PPC
and vertex buffer & swap utility). So this doesn't change what we have:
right now only read buffers can be in either VRAM or GTT.

Space checking works pretty much as it does now. We have two pools,
VRAM and GTT. When a new buffer comes in, you check whether there is
enough room for it in any of the pools it is valid to put it in. If
there isn't enough, you ask for a flush. You first pick the preferred
pool and decrease its free space if there is enough room; if not, you
try the second pool, if any.
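
A minimal sketch of the space check described above, assuming the 4-bit placement encoding from the original proposal and per-CS free-space counters for the two pools. `struct cs_space` and `cs_space_reserve()` are hypothetical names; the kernel only ever picks a pool from the userspace list, and a failed reservation means the caller flushes and retries.

```c
#include <assert.h>
#include <stdint.h>

/* Domain codes from the proposal; DOM_COUNT sizes the pool array. */
enum { DOM_NONE = 0, DOM_VRAM = 1, DOM_GTT = 2, DOM_COUNT = 3 };

/* Remaining space per pool for the CS being built.
 * Indexed by domain code; slot 0 (DOM_NONE) is unused. */
struct cs_space {
    uint64_t free[DOM_COUNT];
};

/* Walk the placement list in preference order (4 bits per entry,
 * 0 ends the list). Charge the BO to the first pool with room and
 * return its domain, or DOM_NONE if no allowed pool has room, in
 * which case the caller should flush the CS and retry. */
static int cs_space_reserve(struct cs_space *s, uint32_t domains,
                            uint64_t size)
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t dom = (domains >> (4 * i)) & 0xFu;

        if (dom == DOM_NONE)
            break;
        assert(dom < DOM_COUNT);
        if (s->free[dom] >= size) {
            s->free[dom] -= size;
            return (int)dom;
        }
    }
    return DOM_NONE;
}
```

For example, with 64 units of VRAM and 256 of GTT free, two 48-unit "VRAM, then GTT" (`0x21`) buffers land in VRAM and then GTT, while a third, VRAM-only (`0x1`) buffer fails and would trigger a flush.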

> I don't think the API we have is perfect by any means I just don't think
> investing in tweaking the corners for no real benefit is worth it. Just
> make the gallium winsys API clean and then you don't need to look
> at this stuff, and in a year or two a new API that actually provides benefits
> and speedups after we've learned a bit from this one.

My fear here is that we will have to support three different APIs (old
DRI1, KMS CS+read/write, KMS CS+new API). It's painful; of course, we
could always let the code rot and not touch it once it works.

GPUs are really different from other HW: we don't have a "simple"
high-level common API we can export to userspace (like network or any
other device I am aware of) :(.

Cheers,
Jerome