I noticed these in dmesg after running "glxgears":
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
[drm:via_pci_cmdbuffer] *ERROR* via_pci_cmdbuffer called without lock held
[drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
I was able to intermittently reproduce the messages by launching
glxgears and moving the window around.
Lee
Hi
> I noticed these in dmesg after running "glxgears":
>
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
> [drm:via_pci_cmdbuffer] *ERROR* via_pci_cmdbuffer called without lock held
> [drm:via_cmdbuffer] *ERROR* via_cmdbuffer called without lock held
>
> I was able to intermittently reproduce the messages by launching
> glxgears and moving the window around.
>
> Lee
I made a fix to the locking code in main drm a couple of months ago.

The X server tries to take the lock with DRM_QUIESCENT set, but when the
wait was interrupted by a signal (like when you move a window around),
the locking function returned without an error. This made the X server
release other clients' locks.
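
Roughly, the interruptible wait has to look like this (a hedged sketch
with made-up names; this is not the actual drm_lock() code):

struct example_lock {                    /* hypothetical */
        wait_queue_head_t wait_queue;
        /* ... lock word etc. ... */
};

static int example_lock_wait(struct example_lock *lock, int context)
{
        DECLARE_WAITQUEUE(entry, current);
        int ret = 0;

        add_wait_queue(&lock->wait_queue, &entry);
        for (;;) {
                set_current_state(TASK_INTERRUPTIBLE);
                if (example_try_take(lock, context)) /* hypothetical helper */
                        break;                       /* lock is ours */
                if (signal_pending(current)) {
                        ret = -EINTR;   /* the fix: report the signal */
                        break;
                }
                schedule();             /* sleep until unlock wakes us */
        }
        set_current_state(TASK_RUNNING);
        remove_wait_queue(&lock->wait_queue, &entry);
        return ret;
}

Returning 0 on the signal path instead of -EINTR is exactly what let the
X server think it held the lock when it didn't.
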
This affects all drivers with a quiescent() function, not only via.

But it looks like this fix never made it into the kernel source?
Dave?
/Thomas
> I made a fix to the locking code in main drm a couple of months ago.
>
> The X server tries to take the lock with DRM_QUIESCENT set, but when
> the wait was interrupted by a signal (like when you move a window
> around), the locking function returned without an error. This made
> the X server release other clients' locks.
>
> This affects all drivers with a quiescent() function, not only via.
>
> But it looks like this fix never made it into the kernel source?
> Dave?
oops... on its way now...
Dave.
--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG
On Thu, 2005-11-24 at 10:52 +0100, Thomas Hellström wrote:
> I made a fix to the locking code in main drm a couple of months ago.
>
> The X server tries to take the lock with DRM_QUIESCENT set, but when
> the wait was interrupted by a signal (like when you move a window
> around), the locking function returned without an error. This made
> the X server release other clients' locks.
>
> This affects all drivers with a quiescent() function, not only via.
>
> But it looks like this fix never made it into the kernel source?
Thanks.
BTW can you point me to a good explanation of DRM locking? There's so
much indirection in the DRM code I can't even tell whether there's one
DRM lock or several, what kind of lock it is or what it's protecting
(beyond "access to the hardware"). Is it just an advisory lock used by
DRM clients to keep from stepping on each other? It doesn't seem
related to spinlocks or mutexes or any of the other types of lock in the
kernel.
Lee
> On Thu, 2005-11-24 at 10:52 +0100, Thomas Hellström wrote:
>> I made a fix to the locking code in main drm a couple of months ago.
>>
>> The X server tries to take the lock with DRM_QUIESCENT set, but when
>> the wait was interrupted by a signal (like when you move a window
>> around), the locking function returned without an error. This made
>> the X server release other clients' locks.
>>
>> This affects all drivers with a quiescent() function, not only via.
>>
>> But it looks like this fix never made it into the kernel source?
>
> Thanks.
>
> BTW can you point me to a good explanation of DRM locking? There's so
> much indirection in the DRM code I can't even tell whether there's one
> DRM lock or several, what kind of lock it is or what it's protecting
> (beyond "access to the hardware"). Is it just an advisory lock used by
> DRM clients to keep from stepping on each other? It doesn't seem
> related to spinlocks or mutexes or any of the other types of lock in the
> kernel.
>
> Lee
There is some info in the old Precision Insight documentation about the
DRI infrastructure (I can't seem to find a link right now), but generally
there is only one global lock, plus something called the drawable
spinlock that is apparently not used anymore. The global lock is similar
to a futex, with the exception that the kernel is called both to resolve
contention and whenever a new context is about to take the lock, so that
an optional context switch can take place. The kernel is also called if
the client requests that some special action take place after locking is
done, like waiting for DMA ready or quiescent. The lock should be taken
before writing to the hardware or before submitting DMA commands. If you
want to be _sure_ that no one else uses the hardware (say you want to
read a particular register), you have to take the lock and wait for DMA
quiescent. For example, if you want to make sure the video scaler is
idle so you can write to it, you first take the lock so that no one else
writes to it or to the DMA queue, then you wait for the DMA queue to be
empty or make sure there are no pending commands for the scaler, and
then you wait for the scaler to become idle.
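
In libdrm terms the scaler example boils down to something like this (a
sketch only; the register poking is device-specific and left out, and
error handling is minimal):

#include <xf86drm.h>

/* fd is an authenticated DRM file descriptor, ctx a context handle from
 * drmCreateContext().  DRM_LOCK_QUIESCENT asks the kernel to also wait
 * for outstanding DMA to drain before returning. */
void poke_scaler(int fd, drm_context_t ctx)
{
        if (drmGetLock(fd, ctx, DRM_LOCK_QUIESCENT) != 0)
                return;        /* interrupted or failed; don't touch hw */

        /* ... now check the scaler is idle and write its registers ... */

        drmUnlock(fd, ctx);
}
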
The lock value is easily manipulated from user space and resides in one
of the shared memory areas. I guess this means that, with the current
DRM security policy, it should be regarded as an advisory lock between
DRM clients.
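
For reference, these are the relevant bits of the lock word as defined
in drm.h; the holder's context number lives in the low bits, with two
flag bits on top:

#define _DRM_LOCK_HELD 0x80000000U /* lock is held */
#define _DRM_LOCK_CONT 0x40000000U /* contended; unlock enters the kernel */
#define _DRM_LOCK_IS_HELD(lock)    ((lock) & _DRM_LOCK_HELD)
#define _DRM_LOCKING_CONTEXT(lock) ((lock) & ~(_DRM_LOCK_HELD|_DRM_LOCK_CONT))
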
At one point I was about to implement a scheme for via with a number of
similar locks, one for each independent function on the video chip, like
2D, 3D, MPEG decoder, and video scalers 1 and 2, so that they didn't
have to wait for each other. The global lock would then only be taken to
make sure that no drawables were touched by the X server or other
clients while the lock was held, which would be compatible with how the
X server works today. I never got around to doing that, however, but the
MPEG decoders have a futex scheme to prevent clients stepping on each
other. With that it is possible to have multiple clients use the same
hardware decoder.
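
That kind of futex scheme is, in outline (an illustrative sketch, not
the actual decoder-lock code):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Classic three-state futex mutex on a word in shared memory:
 * 0 = free, 1 = held, 2 = held with waiters. */
static void decoder_lock(atomic_int *w)
{
        int c = 0;
        if (atomic_compare_exchange_strong(w, &c, 1))
                return;                     /* uncontended fast path */
        if (c != 2)
                c = atomic_exchange(w, 2);  /* announce that we wait */
        while (c != 0) {                    /* sleep while it's held */
                syscall(SYS_futex, w, FUTEX_WAIT, 2, NULL, NULL, NULL);
                c = atomic_exchange(w, 2);
        }
}

static void decoder_unlock(atomic_int *w)
{
        if (atomic_exchange(w, 0) == 2)     /* somebody is waiting */
                syscall(SYS_futex, w, FUTEX_WAKE, 1, NULL, NULL, NULL);
}

The fast path never enters the kernel; only contended lock/unlock pairs
pay for a syscall.
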
/Thomas
On Thursday, November 24, 2005 4:50 am, Thomas Hellström wrote:
> There is some info in the old Precision Insight documentation about
> the DRI infrastructure (I can't seem to find a link right now), but
> generally there is only one global lock, plus something called the
> drawable spinlock that is apparently not used anymore. The global
> lock is similar to a futex, with the exception that the kernel is
> called both to resolve contention and whenever a new context is about
> to take the lock, so that an optional context switch can take place.
> The kernel is also called if the client requests that some special
> action take place after locking is done, like waiting for DMA ready
> or quiescent. The lock should be taken before writing to the hardware
> or before submitting DMA commands. If you want to be _sure_ that no
> one else uses the hardware (say you want to read a particular
> register), you have to take the lock and wait for DMA quiescent. For
> example, if you want to make sure the video scaler is idle so you can
> write to it, you first take the lock so that no one else writes to it
> or to the DMA queue, then you wait for the DMA queue to be empty or
> make sure there are no pending commands for the scaler, and then you
> wait for the scaler to become idle.
>
> The lock value is easily manipulated from user space and resides in
> one of the shared memory areas. I guess this means that, with the
> current DRM security policy, it should be regarded as an advisory
> lock between DRM clients.
This is a nice little write-up; maybe it could go into the kernel's
Documentation/ directory? It would be nice to document how the lock
and signal handling interact as well.
> At one point I was about to implement a scheme for via with a number
> of similar locks, one for each independent function on the video
> chip, like 2D, 3D, MPEG decoder, and video scalers 1 and 2, so that
> they didn't have to wait for each other. The global lock would then
> only be taken to make sure that no drawables were touched by the X
> server or other clients while the lock was held, which would be
> compatible with how the X server works today. I never got around to
> doing that, however, but the MPEG decoders have a futex scheme to
> prevent clients stepping on each other. With that it is possible to
> have multiple clients use the same hardware decoder.
Sounds interesting, but that would be card-specific, right? I mean, on
some cards the 2D and 3D locks would have to be the same because of
shared state, for example.
Jesse
>> At one point I was about to implement a scheme for via with a number
>> of similar locks, one for each independent function on the video
>> chip, like 2D, 3D, MPEG decoder, and video scalers 1 and 2, so that
>> they didn't have to wait for each other. The global lock would then
>> only be taken to make sure that no drawables were touched by the X
>> server or other clients while the lock was held, which would be
>> compatible with how the X server works today. I never got around to
>> doing that, however, but the MPEG decoders have a futex scheme to
>> prevent clients stepping on each other. With that it is possible to
>> have multiple clients use the same hardware decoder.
>
> Sounds interesting, but that would be card-specific, right? I mean, on
> some cards the 2D and 3D locks would have to be the same because of
> shared state, for example.
>
> Jesse
>
Yes, you're right. The idea was to provide an implementation of a set of
locks and context switch / idle hooks that the device-specific driver
could use for whatever part of the chip it wanted, _if_ it wanted to.
When a command stream is submitted, the driver would need to check that
the stream contains only commands for the locked parts of the chip.
There would also need to be a mechanism to check whether there are
pending DMA commands corresponding to a particular lock, to avoid making
DMA quiescent in unnecessary cases. Lock values would reside in a
separate shared memory area. However, it was all a bit complicated, and
there was too little time.
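
On paper the hooks might have looked something like this (all names
hypothetical; none of this exists in the tree):

struct engine_lock {
        volatile unsigned int lock;    /* futex-style word in a shared area */
        int (*context_switch)(void *dev, int old_ctx, int new_ctx);
        int (*idle)(void *dev);        /* wait for just this engine */
        int (*dma_pending)(void *dev); /* nonzero if commands are queued */
};
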
/Thomas.
On Thu, 2005-11-24 at 05:49 -0500, Lee Revell wrote:
> BTW can you point me to a good explanation of DRM locking? There's so
> much indirection in the DRM code I can't even tell whether there's one
> DRM lock or several, what kind of lock it is or what it's protecting
> (beyond "access to the hardware"). Is it just an advisory lock used by
> DRM clients to keep from stepping on each other? It doesn't seem
> related to spinlocks or mutexes or any of the other types of lock in the
> kernel.
It co-ordinates access between the X server and various 3D clients so
that they don't step on each other's drawing. A shared memory area is
used to co-ordinate other things, like clip lists, and which context may
have been stomped by another user if, when you retake the lock, you were
not the last holder.

Precisely what it protects is board-dependent.
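
The usual client-side pattern, board details aside, is roughly this
(illustrative only; ctx_owner stands in for whatever stamp a particular
driver's SAREA keeps):

/* Illustrative only -- SAREA layouts are per-driver. */
drmGetLock(fd, my_ctx, 0);              /* take the heavyweight lock */
if (sarea->ctx_owner != my_ctx) {
        /* Someone else held the lock since we last did; our hardware
         * state may have been stomped, so re-emit all of it. */
        upload_all_state();
        sarea->ctx_owner = my_ctx;
}
/* ... draw, honoring the current clip lists ... */
drmUnlock(fd, my_ctx);
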
On Thu, 2005-11-24 at 07:31 -0800, Jesse Barnes wrote:
> Sounds interesting, but that would be card-specific, right? I mean,
> on some cards the 2D and 3D locks would have to be the same because of
> shared state, for example.
Not especially; that's how most Linux drivers work. The locking in the
DRM seems unusually coarse-grained.
Lee
On Fri, 2005-11-25 at 14:05 -0500, Lee Revell wrote:
> On Thu, 2005-11-24 at 07:31 -0800, Jesse Barnes wrote:
> > Sounds interesting, but that would be card-specific, right? I mean,
> > on some cards the 2D and 3D locks would have to be the same because of
> > shared state, for example.
>
> Not especially; that's how most Linux drivers work. The locking in
> the DRM seems unusually coarse-grained.
Of course, sometimes having fewer but coarser locks is actually faster.
Taking/dropping a lock is not free, far from it.
On Fri, 2005-11-25 at 16:24 +0000, Alan Cox wrote:
> On Thu, 2005-11-24 at 05:49 -0500, Lee Revell wrote:
> > what kind of lock it is or what it's protecting
> It co-ordinates access between the X server and various 3D clients so
> that they don't step on each other's drawing. A shared memory area is
> used to co-ordinate other things, like clip lists, and which context
> may have been stomped by another user if, when you retake the lock,
> you were not the last holder.
>
> Precisely what it protects is board-dependent.
OK. So it's schedulable.
Any debugging advice for a DRI driver (radeon, not via) that I suspect
is causing scheduling blips and audio dropouts due to bus greediness or
other rude behavior? There seem to be a bunch of timeouts where it will
bit-bang the hardware in a loop; should I try reducing these?
Lee
On Fri, 2005-11-25 at 20:13 +0100, Arjan van de Ven wrote:
> Of course, sometimes having fewer but coarser locks is actually
> faster. Taking/dropping a lock is not free, far from it.
True, but couldn't it be a problem for devices like unichrome where you
have 3D and MPEG acceleration and they have to play nice? It just seems
like there may have been an implicit assumption that devices only
support one type of hardware acceleration.
Lee
On Fri, 2005-11-25 at 14:23 -0500, Lee Revell wrote:
> On Fri, 2005-11-25 at 20:13 +0100, Arjan van de Ven wrote:
> > Of course, sometimes having fewer but coarser locks is actually
> > faster. Taking/dropping a lock is not free, far from it.
>
> True, but couldn't it be a problem for devices like unichrome where you
> have 3D and MPEG acceleration and they have to play nice? It just seems
> like there may have been an implicit assumption that devices only
> support one type of hardware acceleration.
Not really. The DRI locking is what the driver makes of it. Generally
GPUs are internally very coarse-grained and don't like doing different
jobs at the same time anyway.

The nearest thing to compare it to, I think, would be futex locks, and
DRI could probably use futex locks with some glue for the X
authentication side of things. However, futex locks are not in FreeBSD
and may never be (IBM patent questions for non-GPL code), and DRI
predates futexes by a large margin.