I'm currently using an r200 (specifically, an agp 'ATI Technologies
Inc Radeon R200 QM [Radeon 9100]') on a uniproc Pentium 3 board
equipped with an intel 440bx/piix4 type chipset (the agp controller is
identified as 'Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev
03)').
All of this was tested with a virgin 2.6.8.1 (with debug info and
frame pointers enabled) and Debian's XFree86 4.3.0.1, using DarkPlaces
and Twilight (both popular quakeGL engine forks) as test apps, unless
otherwise noted.
Thanks to wli (who I owe at least one beer for this, maybe an entire
case), we've been able to figure out exactly what's going on. The driver
is turning off interrupts, then deadlocking. (No sysrq, no sshing in,
capslock's light doesn't work.)
Turning the NMI watchdog on 'fixes' the deadlock (thanks to the
watchdog, ssh and sysrq now work, though capslock's light still
doesn't), but the app and X are still dead. I can now ssh in and
kill -9 them both. Quite obviously, I can't start another X, but I
can reboot cleanly.
Things already tested that don't affect the bug:
Turning SMP on or off
Turning 4k stacks on or off
Using new radeon fbcon, using old radeon fbcon, using no fbcon
Turning Local APIC for uniproc and/or IO-APIC for uniproc on or off
Turning preempt on or off
Using mem=nopentium
Waving a dead chicken over the box
Things already tested for:
Kernels as far back as 2.6.0 have this bug; I haven't tested any earlier
Thanks to netconsole (which I recommend to anyone who can't set up
serial console stuff), I was able to capture the entire kernel output,
including sysrq-t output right after my test app crashes.
I'm including both the netconsole output and the .config. vmlinux and
radeon.ko (and anything else you need) are available upon request.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
Can you insmod the radeon drm module with drm_opts=debug, do the test,
and send on the trace? It may be getting wedged somewhere unexpected...
Dave.
--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person
On Sat, 2004-09-04 at 05:16 -0400, Patrick McFarland wrote:
>
> All of this was tested with a virgin 2.6.8.1 (with debug info and
> frame pointers enabled) and Debian's XFree86 4.3.0.1, [...]
What version of the DRI driver?
--
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <[email protected]> wrote:
> What version of the DRI driver?
Where do I look for that?
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Sat, 2004-09-04 at 16:36 -0400, Patrick McFarland wrote:
> On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <[email protected]> wrote:
> > What version of the DRI driver?
>
> Where do I look for that?
Where did you get r200_dri.so from?
--
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <[email protected]> wrote:
> On Sat, 2004-09-04 at 16:36 -0400, Patrick McFarland wrote:
> > On Sat, 04 Sep 2004 14:14:55 -0400, Michel Dänzer <[email protected]> wrote:
> > > What version of the DRI driver?
> >
> > Where do I look for that?
>
> Where did you get r200_dri.so from?
From the one that comes with the Deb X I mentioned above.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Sat, 4 Sep 2004 11:59:12 +0100 (IST), Dave Airlie <[email protected]> wrote:
>
> Can you insmod the radeon drm module with drm_opts=debug do the test and
> send on the trace, it may be getting wedged somewhere unexpected...
Here you go, but it doesn't look like it has output anything interesting.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <[email protected]> wrote:
> >
> > Where did you get r200_dri.so from?
>
> From the one that comes with the Deb X I mentioned above.
Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
snapshot from dri.sf.net.
--
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <[email protected]> wrote:
> On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <[email protected]> wrote:
> > >
> > > Where did you get r200_dri.so from?
> >
> > From the one that comes with the Deb X I mentioned above.
>
> Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> snapshot from dri.sf.net.
That shouldn't matter, should it? The userland stuff should never lock
the machine up.
I'll test it anyhow, though.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Sun, 2004-09-05 at 16:18 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <[email protected]> wrote:
> > On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <[email protected]> wrote:
> > > >
> > > > Where did you get r200_dri.so from?
> > >
> > > From the one that comes with the Deb X I mentioned above.
> >
> > Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> > snapshot from dri.sf.net.
>
> That shouldn't matter, should it? The userland stuff should never lock
> the machine up.
In an ideal world... Feel free to track down the cause and add code to
the DRM to prevent it.
--
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
On Sun, 05 Sep 2004 16:25:00 -0400, Michel Dänzer <[email protected]> wrote:
> On Sun, 2004-09-05 at 16:18 -0400, Patrick McFarland wrote:
> > That shouldn't matter, should it? The userland stuff should never lock
> > the machine up.
>
> In an ideal world... Feel free to track down the cause and add code to
> the DRM to prevent it.
I would, except, as many have noted before, even looking at the r200
driver requires years of therapy to get rid of the nightmares.
So, yeah, I'll check to see if today's dri cvs snapshot works. If it
doesn't, I'm not sure what to do.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 13:40:54 -0400, Michel Dänzer <[email protected]> wrote:
> > On Sun, 2004-09-05 at 04:22 -0400, Patrick McFarland wrote:
> > > On Sun, 05 Sep 2004 02:34:59 -0400, Michel Dänzer <[email protected]> wrote:
> > > >
> > > > Where did you get r200_dri.so from?
> > >
> > > From the one that comes with the Deb X I mentioned above.
> >
> > Please try something newer, e.g. my xlibmesa-gl1-dri-trunk or a binary
> > snapshot from dri.sf.net.
>
> That shouldn't matter, should it? The userland stuff should never lock
> the machine up.
> I'll test it anyhow, though.
No, it shouldn't. Anything that directly accesses hardware belongs in
the kernel. How to fix this is a pretty hot topic now.
Lee
On Sun, 05 Sep 2004 20:14:43 -0400
Lee Revell <[email protected]> wrote:
> On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
[snip]
> >
> > That shouldn't matter, should it? The userland stuff should never lock
> > the machine up.
> > I'll test it anyhow, though.
>
> No, it shouldn't. Anything that directly accesses hardware belongs in
> the kernel. How to fix this is a pretty hot topic now.
That's not the whole truth. There are just too many ways to lock up
those 3D chips. For instance I fixed a lockup in the r100 driver where
the order in which state changing commands were sent to the hardware
would cause a lockup. Each individual state changing command is
perfectly valid. Finding all permutations that trigger a lockup would
have been too much of a hassle and may not even have been true for all
supported hardware out there. So we made the user-space driver emit
state changing commands in a fixed order, which seems to work
everywhere.
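The fixed-order idea can be sketched roughly as follows. This is only an illustration, assuming made-up state atom names and a made-up order; the real r100 atoms and their required order are not given in this thread.

```python
# Hypothetical state atoms in a hypothetical canonical order.
# The actual r100 driver's atoms and ordering are NOT taken from this thread.
EMIT_ORDER = ["context", "viewport", "texture", "material", "lighting"]
RANK = {name: i for i, name in enumerate(EMIT_ORDER)}


def emit_state(dirty):
    """Return the dirty state atoms in canonical order.

    The application may have changed state in any order; the user-space
    driver always emits the resulting commands in the same fixed order.
    """
    unknown = set(dirty) - set(RANK)
    if unknown:
        raise ValueError(f"unknown state atoms: {sorted(unknown)}")
    # Deduplicate, then order by canonical rank.
    return sorted(set(dirty), key=RANK.__getitem__)


if __name__ == "__main__":
    print(emit_state(["lighting", "context", "texture"]))
```

The point of the design is that only one ordering ever reaches the hardware, so only that one ordering has to be known to work.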
Regards,
Felix
>
> Lee
>
| Felix Kühling <[email protected]> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |
On Sun, 05 Sep 2004 20:14:43 -0400, Lee Revell <[email protected]> wrote:
> How to fix this is a pretty hot topic now.
Yow, I didn't mean to cause such an upset. ;)
Currently, the dri cvs snapshot for 20040905 doesn't compile with
2.6.8.1 for me (I've sent a bug report to the dri-devel mailing list
about this), so Lee and Michel, you'll have to wait until tomorrow (or
maybe even the day after that) to see how the test goes.
I'm hoping it does work, this bug is pretty nasty imho. Who knew Quake
could take an entire box out in under 10 seconds. ;)
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Mon, 2004-09-06 at 07:01 -0400, Patrick McFarland wrote:
> On Sun, 05 Sep 2004 20:14:43 -0400, Lee Revell <[email protected]> wrote:
> > How to fix this is a pretty hot topic now.
>
> Yow, I didn't mean to cause such an upset. ;)
>
> Currently, the dri cvs snapshot for 20040905 doesn't compile with
> 2.6.8.1 for me (I've sent
> a bug report to the dri-devel mailing list about this) so Lee and
> Michel, you'll have to wait
> until tomorrow (or maybe even the day after that) to see how the test goes.
You can test the r200_dri.so from the snapshot with the DRM from the
kernel...
--
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
--- Felix Kühling <[email protected]> wrote:
> On Sun, 05 Sep 2004 20:14:43 -0400
> Lee Revell <[email protected]> wrote:
>
> > On Sun, 2004-09-05 at 16:18, Patrick McFarland wrote:
> [snip]
> > >
> > > That shouldn't matter, should it? The userland stuff should never lock
> > > the machine up.
> > > I'll test it anyhow, though.
> >
> > No, it shouldn't. Anything that directly accesses hardware belongs in
> > the kernel. How to fix this is a pretty hot topic now.
>
> That's not the whole truth. There are just too many ways to lock up
> those 3D chips. For instance I fixed a lockup in the r100 driver where
> the order in which state changing commands were sent to the hardware
> would cause a lockup. Each individual state changing command is
> perfectly valid. Finding all permutations that trigger a lockup would
> have been too much of a hassle and may not even have been true for all
> supported hardware out there. So we made the user-space driver emit
> state changing commands in a fixed order, which seems to work
> everywhere.
>
Does the DRM verify that the cmds are in this order? Why not just have
the DRM 'sort' the cmds? A simple bubble sort would have no more overhead
than the check for correct order, but it would fix misordered cmd
streams.
Once this is done the statement holds true, userland stuff should never...
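To make the comparison concrete, here is a rough sketch of both options: a single-pass order check and a bubble sort with early exit. The command names and ranks are invented for illustration, and whether reordering a stream is actually safe for the hardware is an open assumption, not something established here.

```python
# Hypothetical command ranks; the thread does not say what the real order is.
RANK = {"ctx": 0, "vpt": 1, "tex": 2, "mtl": 3}


def in_order(cmds):
    """Single pass: True if the command ranks never decrease."""
    return all(RANK[a] <= RANK[b] for a, b in zip(cmds, cmds[1:]))


def bubble_sort(cmds):
    """Stable bubble sort by rank with early exit.

    Returns (sorted_list, number_of_comparisons) so the cost can be
    compared with the single-pass check above.
    """
    out = list(cmds)
    comparisons = 0
    n = len(out)
    while n > 1:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if RANK[out[i]] > RANK[out[i + 1]]:
                out[i], out[i + 1] = out[i + 1], out[i]
                swapped = True
        if not swapped:
            break  # already ordered: cost equals one checking pass
        n -= 1
    return out, comparisons


if __name__ == "__main__":
    print(bubble_sort(["ctx", "vpt", "tex", "mtl"]))
    print(bubble_sort(["mtl", "tex", "vpt", "ctx"]))
```

On a stream that is already ordered the sort costs one pass, the same as the check; on a badly misordered stream the comparison count grows quadratically, so the "no more overhead" claim only holds for well-behaved clients.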
> Regars,
> Felix
>
> >
> > Lee
> >
>
> | Felix Kühling <[email protected]> http://fxk.de.vu |
> | PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |
>
>
> Dose the DRM varify that the cmds are in this order? Why not just have
> the DRM 'sort' the cmds? A simple bouble sort would have no more overhead
> then the check for correct order, but it would fix missordered cmd
> streams.
>
> Once this is done the statement holds true, userland stuff should never...
>
Feel free to implement it and profile it, but there are so many ways
to lock up a radeon chip it is scary. The above was just one example;
some days if you look at it funny it can lock up :-). It is accepted
that userland can crap out 3D chips; the Intel ones are fairly easy to
hang also..
Dave.
Most IMPORTANT is that someone, somewhere, keeps a list of ALL of
these. These are best in the form of code comments so that the respective
places in the code can be changed.
--- Dave Airlie <[email protected]> wrote:
> > Dose the DRM varify that the cmds are in this order? Why not just have
> > the DRM 'sort' the cmds? A simple bouble sort would have no more overhead
> > then the check for correct order, but it would fix missordered cmd
> > streams.
> >
> > Once this is done the statement holds true, userland stuff should never...
> >
>
> Feel free to implement it and profile it, but there are so many ways
> to lock up a radeon chip it is scary, the above was just one example,
> some days if you look at it funny it can lockup :-), it is accepted
> that userland can crap out 3D chips, the Intel ones are fairly easy to
> hangup also..
>
I'd love to, where do I start? The problem here is that I have no idea...
1. What values I'd need to test for and sort.
2. The order of the sorting (probably documented in DRI-client code).
3. Where in the DRM I can perform the needed test and sort.
I would also love a list of ALL of these so I can fix them one by one. A
good project for a new DRI developer, no?
> Dave.
>
On Mon, 06 Sep 2004 14:12:08 -0400, Michel Dänzer <[email protected]> wrote:
> You can test the r200_dri.so from the snapshot with the DRM from the
> kernel...
And drum roll please...
The dri cvs snapshot works fine on both its own kernel module and the
one that comes with 2.6.8.1. So now what? (And does this mean it isn't
a kernel bug?)
<rant>
Also, what happens to r200 users who happen to use Debian? Using dri
cvs snapshots obviously isn't an option for everyone (though I don't
mind at all), and upgrading to Xorg (when Xorg gets this fix, if it
doesn't have it already) is even less of an option. The official word
from the Debian X Strike Force is not to switch to Xorg until debriX
(modular X) gets somewhere.
</rant>
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
On Tue, 7 Sep 2004 05:07:45 -0400, Patrick McFarland <[email protected]> wrote:
> Lots of badly formatted text.
I do apologize to anyone who had to read that.
--
Patrick "Diablo-D3" McFarland || [email protected]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989
Mike Mestnik wrote:
> Most IMPORTANT is that some-one some-where there is a list of ALL of
> these. These are best in the form of code comments so the the respective
> places in the code can be changed.
>
> --- Dave Airlie <[email protected]> wrote:
>
>
>>>Dose the DRM varify that the cmds are in this order? Why not just have
>>>the DRM 'sort' the cmds? A simple bouble sort would have no more overhead
>>>then the check for correct order, but it would fix missordered cmd
>>>streams.
>>>
>>>Once this is done the statement holds true, userland stuff should never...
>>
>>Feel free to implement it and profile it, but there are so many ways
>>to lock up a radeon chip it is scary, the above was just one example,
>>some days if you look at it funny it can lockup :-), it is accepted
>>that userland can crap out 3D chips, the Intel ones are fairly easy to
>>hangup also..
>>
>
> I'd love to, where do I start? The problem he is that I have no-idea...
> 1. What values I'd neet to test for and sort.
> 2. The order of the sorting(probly documented in DRI-client code).
> 3. Where in the DRM I can proform the needed test and sort.
>
> I would also love a list of ALL of these so I can fix them one by one. A
> good project for a new DRI developer, no.
I seriously doubt this is doable. Unless you put the whole driver in the
kernel, which of course nobody wants. I frequently caused gpu lockups by
experimental driver changes (for instance, wrong vertex setup). I think
the consensus was that it's ok for the driver to lock up the gpu, but it
should not lock up the kernel.
It might be possible to prevent lockups by a watchdog, resetting the gpu
if a lockup is detected. This is how ATI deals with lockups in windows
(dubbed "VPU Recover"), and there is a patch floating around for DRI too
(though it is not exactly for that, and doesn't always work).
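The watchdog idea can be sketched in a few lines. This is a simulation only, under stated assumptions: FakeGpu, its fence counter, and the poll threshold are all hypothetical, and nothing here reflects how "VPU Recover" or the DRI patch actually work.

```python
class FakeGpu:
    """Hypothetical stand-in for a GPU; 'fence' counts completed commands."""

    def __init__(self):
        self.fence = 0
        self.pending = 0
        self.hung = False
        self.resets = 0

    def submit(self, n, hang=False):
        """Queue n commands; hang=True simulates a stream that wedges the chip."""
        self.pending += n
        self.hung = self.hung or hang

    def tick(self):
        """Complete one command per tick unless the chip is hung."""
        if self.pending and not self.hung:
            self.fence += 1
            self.pending -= 1

    def reset(self):
        """Simulated reset: pending work is dropped, which is a simplification."""
        self.pending = 0
        self.hung = False
        self.resets += 1


class Watchdog:
    """Resets the GPU if work is pending but the fence stops advancing."""

    def __init__(self, gpu, max_stalled_polls=3):
        self.gpu = gpu
        self.max_stalled = max_stalled_polls
        self.last_fence = gpu.fence
        self.stalled = 0

    def poll(self):
        """Call periodically; returns True if this poll reset the GPU."""
        gpu = self.gpu
        if gpu.pending and gpu.fence == self.last_fence:
            self.stalled += 1
        else:
            self.stalled = 0
        self.last_fence = gpu.fence
        if self.stalled >= self.max_stalled:
            gpu.reset()
            self.stalled = 0
            return True
        return False


if __name__ == "__main__":
    gpu = FakeGpu()
    wd = Watchdog(gpu)
    gpu.submit(2, hang=True)
    for _ in range(3):
        gpu.tick()
        print("reset" if wd.poll() else "ok")
```

The sketch detects a hang by lack of progress rather than by knowing why the chip hung, which is why it does not need a list of every bad command sequence.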
Roland
On Tue, 2004-09-07 at 10:07, Patrick McFarland wrote:
> Also, what happens to r200 users who happen to use Debian? Using dri
> cvs snapshots
If Debian is currently shipping a buggy driver then Debian needs to ship
a working driver. Same as anyone else. You'll also need the newest
dri driver for Radeon IGP (most ATI chipset laptops) and the newer
R2xx hardware.
Alan
Sorry, I don't know why we are cross posting and including subscribers in
CC. This belongs on the DRI list, as it is only with 3rd party DRI-client
code that the problem exists.
--- Dave Airlie <[email protected]> wrote:
> On Tue, 07 Sep 2004 09:07:11 +0200, Arjan van de Ven <[email protected]> wrote:
> > On Tue, 2004-09-07 at 08:54, Dave Airlie wrote:
> >
> > > Feel free to implement it and profile it, but there are so many ways
> > > to lock up a radeon chip it is scary, the above was just one example,
> > > some days if you look at it funny it can lockup :-), it is accepted
> > > that userland can crap out 3D chips, the Intel ones are fairly easy to
> > > hangup also..
> >
> >
> > hmmm.. I thought the entire reason for having part of DRM in the kernel
> > was to be able to prevent such events from happening....
>
> only one reason...
> http://dri.sourceforge.net/doc/drm_low_level.html
>
> But to be honest the chips are entirely capable of locking up on what
> the docs say are valid things, writing enough workarounds and test
> would bloat the drm considerably,
> at the moment we try and have it so a valid OpenGL application doesn't
> lock it up, but someone writing directly to the DRM would be able to
> lockup a fair few chips in many interesting ways....
>
> Dave.
>
--- Roland Scheidegger <[email protected]> wrote:
>
> I seriously doubt this is doable. Unless you put the whole driver in the
> kernel, which of course nobody wants. I frequently caused gpu lockups by
> experimental driver changes (for instance, wrong vertex setup). I think
> the consensus was that it's ok for the driver to lock up the gpu, but it
> should not lock up the kernel.
> It might be possible to prevent lockups by a watchdog, resetting the gpu
> if a lockup is detected. This is how ATI deals with lockups in windows
> (dubbed "VPU Recover"), and there is a patch floating around for DRI too
> (though it is not exactly for that, and doesn't always work).
>
> Roland
>
It's a simple matter of forcing 3rd-party clients (this means every DRM
user) to use DRI's *dialect or style*. If the DRM sees activity that is
not expected to be generated by pure DRI clients, action should be
taken to prevent a possible lockup. This means that even valid activity
should be treated as invalid IF the DRM can clearly detect a deviation
from pure DRI-client activity.
For example, pure DRI clients emit state changing commands in a very
specific order. The DRM could easily spot if these cmds were out of any
known/used order, or if any other cmds were inserted into the expected
order. This should be denied. Only clients (any client) using the
DRI-supplied order (the one used by pure DRI clients) should be allowed
to access the hardware.
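A rough sketch of such a check, assuming a made-up command set and order (the real commands and order are not listed anywhere in this thread). It only covers the ordering case; as others have pointed out, ordering is just one of many ways to hang these chips.

```python
# Hypothetical command names in a hypothetical "DRI-supplied" order.
KNOWN_ORDER = ("ctx", "vpt", "tex", "mtl")
RANK = {name: i for i, name in enumerate(KNOWN_ORDER)}


def validate(cmds):
    """Return None if the stream looks like a pure DRI client's stream,
    otherwise a short reason string for rejecting it."""
    last = -1
    for pos, cmd in enumerate(cmds):
        rank = RANK.get(cmd)
        if rank is None:
            return f"unknown command {cmd!r} at position {pos}"
        if rank < last:
            return f"command {cmd!r} out of order at position {pos}"
        last = rank
    return None


if __name__ == "__main__":
    for stream in (["ctx", "tex"], ["tex", "ctx"], ["ctx", "blit"]):
        print(stream, "->", validate(stream) or "accepted")
```

Rejecting instead of reordering keeps the check to a single pass and avoids guessing what a misbehaving client meant.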
Patrick McFarland wrote:
> On Mon, 06 Sep 2004 14:12:08 -0400, Michel Dänzer <[email protected]> wrote:
>
>>You can test the r200_dri.so from the snapshot with the DRM from the
>>kernel...
>
>
> And drum roll please...
Too early :(
> The dri cvs snapshot works fine on both it's own kernel module, and
> the one that comes
> with 2.6.8.1. So now what? (And does this mean it isn't a kernel bug?)
I have compiled both the kernel module and the replacement X server from
yesterday's CVS checkout of DRI (but according to "outdated"
instructions in the Wiki). It made a difference compared with XFree86
4.4.0 + in-kernel radeon.ko.
The difference is that, instead of just hanging, after several minutes
of run time applications (e.g. "really slick screensavers") print:
drmRadeonIrqWait: -16
and exit with status 1.
After that, 2D works, but the _next_ fullscreen OpenGL application hangs
the system immediately on start.
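For anyone decoding the number: assuming the value printed after "drmRadeonIrqWait:" is a negated Linux errno (an assumption, the message itself does not say so), it can be looked up like this.

```python
import errno
import os

# Value printed after "drmRadeonIrqWait:" in the report above.
# Treating it as -errno is an assumption, not something the log states.
ret = -16

name = errno.errorcode.get(-ret, "unknown")
print(name, "-", os.strerror(-ret))
```

On Linux, errno 16 is EBUSY, which would fit a wait for an interrupt that gave up.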
--
Alexander E. Patrakov