Hi.
I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
The symptoms are that load average goes up, X stops accepting keypresses
or mouse clicks, but the cursor still moves around the screen in
response to the mouse being moved. I can't switch to a VT but can ssh in
remotely to see that things are still running. I don't seem to be able
to kill X but "shutdown -r now" cleanly reboots.
gdb fails to attach - complains about an internal error. strace shows
lots of ioctls against the DRM device all returning EBUSY.
2.6.25 appears to work fine. I originally had PAT enabled under 2.6.26
but have seen a patch fixing that go into git, so disabled it for my
2.6.26 kernel to see if that was the issue; no change AFAICT.
Enabling DRM debug (echo 1 > /sys/module/drm/parameters/debug) gives
lots of output from radeon_freelist_get, after the following ioctl is
received:
Jul 25 10:11:14 meepok kernel: [drm:drm_ioctl] pid=3302, cmd=0xc0406429, nr=0x29 , dev 0xe200, auth=1
and then a returning NULL message.
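For anyone reading the log line above, the cmd value can be split into its fields by hand. This is only a sketch, assuming the usual Linux ioctl number layout (nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31); decode_ioctl is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Split a Linux ioctl command number into its fields.
# Assumed layout (the generic one used on x86):
#   bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
# decode_ioctl is a hypothetical helper written for this example.
decode_ioctl() {
    cmd=$(($1))
    printf 'dir=%d size=%d type=0x%x nr=0x%x\n' \
        $(( (cmd >> 30) & 0x3 )) \
        $(( (cmd >> 16) & 0x3fff )) \
        $(( (cmd >> 8) & 0xff )) \
        $(( cmd & 0xff ))
}

decode_ioctl 0xc0406429
# prints: dir=3 size=64 type=0x64 nr=0x29
```

dir=3 means read and write, type 0x64 is ASCII 'd' (the DRM ioctl base), and nr=0x29 agrees with the nr field in the log line. As far as I can tell from drm.h, nr 0x29 is the DMA buffer request ioctl, which would fit the radeon_freelist_get output, but treat that mapping as unverified.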
radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
but I've seen it with older revisions too.
It can take a couple of days for me to hit the problem, so a git bisect
could be a lengthy process. If anyone has any suggestions about faster
ways to track down the issue I'd like to hear them.
Machine is a dual core AMD64 with 4GB of RAM running Debian unstable,
card is:
01:05.0 VGA compatible controller [0300]: ATI Technologies Inc RS690 [Radeon X1200 Series] [1002:791e]
Kernel configs at:
http://the.earth.li/~noodles/radeon-2.6.26-hang/config-2.6.25
http://the.earth.li/~noodles/radeon-2.6.26-hang/config-2.6.26
Debug log from enabling drm debug:
http://the.earth.li/~noodles/radeon-2.6.26-hang/debug
Full dmesg (no obvious errors):
http://the.earth.li/~noodles/radeon-2.6.26-hang/meepok.dmesg
Xorg log file (no obvious errors):
http://the.earth.li/~noodles/radeon-2.6.26-hang/Xorg.0.log
J.
--
"I put it down to corrosive groin sweat myself." -- John Burnham, asr
This .sig brought to you by the letter N and the number 39
Product of the Republic of HuggieTag
> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
> The symptoms are that load average goes up, X stops accepting keypresses
> or mouse clicks, but the cursor still moves around the screen in
> response to the mouse being moved. I can't switch to a VT but can ssh in
> remotely to see that things are still running. I don't seem to be able
> to kill X but "shutdown -r now" cleanly reboots.
>
> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
> but I've seen it with older revisions too.
>
> It can take a couple of days for me to hit the problem, so a git bisect
> could be a lengthy process. If anyone has any suggestions about faster
> ways to track down the issue I'd like to hear them.
git log v2.6.25..v2.6.26 drivers/char/drm
5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
5cfb6956073a9e42d44a26790b7800980634d037
d396db321bcaec54345e7e9e87cea8482d6ae3a8
259434acccbc823ee8bc00b2d2689ccccd25e1fd
d7463eb41d88a39de2653fd41857c4ccddb8707b
45e519052e8f583a709edd442a23f59581d3fe42
2735977b12cb0f113aae24afff04747b6d0f5bf1
3722bfc607d46275369865c02fe8694486d640b5
fa0d71b967506031f7cb08ced6095d1c4f988594
9f18409ea3d778a171a9505c0a849d846f352bd0
not sure if you wanna try reverting some of those and seeing which is the
cause maybe..
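A rough sketch of how the reverts could be stacked up, assuming a tree checked out at v2.6.26. It is a dry run that only prints the commands; revert_plan is a made-up helper name and the commit message is arbitrary. Some of the reverts may well conflict and need fixing up by hand.

```shell
#!/bin/sh
# Dry run: print the commands for reverting the first N commits of a list,
# rather than executing them. revert_plan is a hypothetical helper.
revert_plan() {
    n=$1; shift
    i=0
    for sha in "$@"; do
        if [ "$i" -ge "$n" ]; then
            break
        fi
        # --no-commit stages each revert so several can be combined
        echo "git revert --no-commit $sha"
        i=$((i + 1))
    done
    echo "git commit -m 'test: revert first $i drm commits'"
}

# Example: plan the revert of the first two commits from the list above.
revert_plan 2 \
    5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113 \
    5cfb6956073a9e42d44a26790b7800980634d037 \
    d396db321bcaec54345e7e9e87cea8482d6ae3a8
```

Build and boot the result, and if the hang still shows up, raise the count by one and repeat.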
Dave.
On Fri, 25 Jul 2008 10:43:34 +0100
Jonathan McDowell <[email protected]> wrote:
> Hi.
>
> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
> The symptoms are that load average goes up, X stops accepting keypresses
> or mouse clicks, but the cursor still moves around the screen in
> response to the mouse being moved. I can't switch to a VT but can ssh in
> remotely to see that things are still running. I don't seem to be able
> to kill X but "shutdown -r now" cleanly reboots.
>
> gdb fails to attach - complains about an internal error. strace shows
> lots of ioctls against the DRM device all returning EBUSY.
>
> 2.6.25 appears to work fine. I originally had PAT enabled under 2.6.26
> but have seen a patch fixing that go into git, so disabled it for my
> 2.6.26 kernel to see if that was the issue; no change AFAICT.
>
> Enabling DRM debug (echo 1 > /sys/module/drm/parameters/debug) gives
> lots of output from radeon_freelist_get, after the following ioctl is
> received:
>
> Jul 25 10:11:14 meepok kernel: [drm:drm_ioctl] pid=3302, cmd=0xc0406429, nr=0x29 , dev 0xe200, auth=1
>
> and then a returning NULL message.
>
> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
> but I've seen it with older revisions too.
>
> It can take a couple of days for me to hit the problem, so a git bisect
> could be a lengthy process. If anyone has any suggestions about faster
> ways to track down the issue I'd like to hear them.
>
> Machine is a dual core AMD64 with 4GB of RAM running Debian unstable,
> card is:
>
> 01:05.0 VGA compatible controller [0300]: ATI Technologies Inc RS690 [Radeon X1200 Series] [1002:791e]
>
> Kernel configs at:
>
> http://the.earth.li/~noodles/radeon-2.6.26-hang/config-2.6.25
> http://the.earth.li/~noodles/radeon-2.6.26-hang/config-2.6.26
>
> Debug log from enabling drm debug:
>
> http://the.earth.li/~noodles/radeon-2.6.26-hang/debug
>
> Full dmesg (no obvious errors):
>
> http://the.earth.li/~noodles/radeon-2.6.26-hang/meepok.dmesg
>
> Xorg log file (no obvious errors):
>
> http://the.earth.li/~noodles/radeon-2.6.26-hang/Xorg.0.log
>
> J.
>
This looks like the usual engine lockup followed by a CP lockup, so
that the DMA buffer age never gets written and we run out of DMA
buffers; the freelist then keeps failing in an infinite loop.
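To illustrate the failure mode, here is a toy model of the freelist scan, not the driver code. It assumes a buffer becomes reusable once its age is less than or equal to the last age the CP wrote back; USEC_TIMEOUT, the age values and the freelist_get shell function are all made up for the example.

```shell
#!/bin/sh
# Toy model: a buffer is free once its age is <= the age the CP last
# reported as completed. All values here are invented for illustration.
USEC_TIMEOUT=5

freelist_get() {
    done_age=$1; shift        # age last written back by the CP
    t=0
    while [ "$t" -lt "$USEC_TIMEOUT" ]; do
        for age in "$@"; do
            if [ "$age" -le "$done_age" ]; then
                echo "got buffer with age $age"
                return 0
            fi
        done
        # A real driver would delay and re-read the completed age here;
        # with a wedged CP that value never changes, as modelled.
        t=$((t + 1))
    done
    echo "returning NULL"
    return 1
}

freelist_get 12 10 11 13          # healthy: ages 10 and 11 have completed
freelist_get 9 10 11 13 || true   # wedged CP: completed age stuck below all buffers
```

In the second call nothing ever becomes free, so every request runs to the timeout and fails, and a caller that simply retries sees the same failure each time.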
I think we now know all the reasons why we lock up. While a
fix could be made for the old ioctl, we believe the best plan is
to work on the new ioctl with this fix in mind.
So I don't think a bisect will help. There is certainly something
that made this lockup more likely to happen on your config,
but the best thing is to fix the lockup.
If you really have the time you can still do a bisect and find out
what makes these lockups more obvious on your config; this could
be helpful for checking that our theories are good.
Cheers,
Jerome Glisse
Am Freitag 25 Juli 2008 12:12:59 schrieb Jerome Glisse:
> This looks like usual engine lockup followed by CP lockup so
> that DMA buffer age never get written and we run out of DMA
> buffer thus freelist failing in infinite loop.
>
> I think we now know all the reason why we lockup, while a
> fix could be made for old ioctl we believe the best plan is
> to work on new ioctl with this fix in mind.
I can't help but feel uneasy with that kind of plan. After all, do "we"
*really* know what's going on? I always had the impression that we only knew
things along the lines of "perhaps it's better to submit 3D stuff in indirect
buffers".
If you *really* know what causes the lockups, could you please document that?
As in, what's the actual command processor sequence that is to blame? I know
that running e.g. a Nexuiz demo + glxgears window above it is apparently a
100% guaranteed lockup on my system (R420).
If you could share your progress in tracking down the sources of the lockups,
I'd happily try to write a patch against the current system.
cu,
Nicolai
On Fri, 25 Jul 2008 19:04:55 +0200
Nicolai Hähnle <[email protected]> wrote:
> Am Freitag 25 Juli 2008 12:12:59 schrieb Jerome Glisse:
> > This looks like usual engine lockup followed by CP lockup so
> > that DMA buffer age never get written and we run out of DMA
> > buffer thus freelist failing in infinite loop.
> >
> > I think we now know all the reason why we lockup, while a
> > fix could be made for old ioctl we believe the best plan is
> > to work on new ioctl with this fix in mind.
>
> I can't help but feel uneasy with that kind of plan. After all, do "we"
> *really* know what's going on? I always had the impression that we only knew
> things along the lines of "perhaps it's better to submit 3D stuff in indirect
> buffers".
>
> If you *really* know what causes the lockups, could you please document that?
> As in, what's the actual command processor sequence that is to blame? I know
> that running e.g. a Nexuiz demo + glxgears window above it is apparently a
> 100% guaranteed lockup on my system (R420).
>
> If you could share your progress in tracking down the sources of the lockups,
> I'd happily try to write a patch against the current system.
>
> cu,
> Nicolai
>
Here is a brief list off the top of my head, for the record:
- no RB3D_DSTCACHE twice in a row without a rendering cmd in between
- initialize all clip registers to default values and wait for engine
idle after setting them
- update wptr every 32 dwords (2 dwords seems enough but that one
is very hard to track)
- use indirect buffers
- RB3D_DSTCACHE is not pipelined if the free or sync bit is not set,
thus you have to fill the fifo and wait for idle before writing
it if neither of these bits is set
- flush & wait until 3d before 2d, and flush & wait for dma & 2d idle
after 2d; also fill the fifo with a dummy 2d reg so that an unpipelined
3d reg does not get executed before idle is asserted
- avoid emitting cliprects too often
- txinval before changing texture
- avoid writing RB3D_DSTCACHE & RB2D_DSTCACHE too often
- set ISYNC properly through the CP
- CP idle is wrong: we should wait for a tag and not
try to force the CP to go idle or inject a flush after
idle
- set vertex shader constants & inputs to default safe values
And there are other things to think about scattered through my drm.
Basically things should be set in some order to make sure the
engine will not be unhappy when faced with a cmd stream. Some of the
above might be wrong, but I use them because each one of them
somehow seems to give me a more stable drm. The latest drm
I have doesn't lock up in the case of a few glxgears on top
of another 3d app like celestia, and likely nexuiz too, though I
haven't tried that one.
Cheers,
Jerome Glisse <[email protected]>
On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
> > I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
> > The symptoms are that load average goes up, X stops accepting keypresses
> > or mouse clicks, but the cursor still moves around the screen in
> > response to the mouse being moved. I can't switch to a VT but can ssh in
> > remotely to see that things are still running. I don't seem to be able
> > to kill X but "shutdown -r now" cleanly reboots.
> >
> > radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
> > but I've seen it with older revisions too.
> >
> > It can take a couple of days for me to hit the problem, so a git bisect
> > could be a lengthy process. If anyone has any suggestions about faster
> > ways to track down the issue I'd like to hear them.
>
> git log v2.6.25..v2.6.26 drivers/char/drm
>
> 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
> 5cfb6956073a9e42d44a26790b7800980634d037
No joy.
> d396db321bcaec54345e7e9e87cea8482d6ae3a8
I thought this might be it; nearly 5 days of uptime rather than the
usual less than 2. But I got the same symptoms today so I'll continue
working down the list.
> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
> d7463eb41d88a39de2653fd41857c4ccddb8707b
> 45e519052e8f583a709edd442a23f59581d3fe42
> 2735977b12cb0f113aae24afff04747b6d0f5bf1
> 3722bfc607d46275369865c02fe8694486d640b5
> fa0d71b967506031f7cb08ced6095d1c4f988594
> 9f18409ea3d778a171a9505c0a849d846f352bd0
J.
--
Friends are God's apology for relations.
On Fri, 1 Aug 2008, Jonathan McDowell wrote:
> On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
>>> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
>>> The symptoms are that load average goes up, X stops accepting keypresses
>>> or mouse clicks, but the cursor still moves around the screen in
>>> response to the mouse being moved. I can't switch to a VT but can ssh in
>>> remotely to see that things are still running. I don't seem to be able
>>> to kill X but "shutdown -r now" cleanly reboots.
>>>
>>> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
>>> but I've seen it with older revisions too.
>>>
>>> It can take a couple of days for me to hit the problem, so a git bisect
>>> could be a lengthy process. If anyone has any suggestions about faster
>>> ways to track down the issue I'd like to hear them.
>>
>> git log v2.6.25..v2.6.26 drivers/char/drm
>>
>> 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
>> 5cfb6956073a9e42d44a26790b7800980634d037
>
> No joy.
>
>> d396db321bcaec54345e7e9e87cea8482d6ae3a8
>
> I thought this might be it; nearly 5 days of uptime rather than the
> usual less than 2. But I got the same symptoms today so I'll continue
> working down the list.
>
>> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
>> d7463eb41d88a39de2653fd41857c4ccddb8707b
>> 45e519052e8f583a709edd442a23f59581d3fe42
>> 2735977b12cb0f113aae24afff04747b6d0f5bf1
>> 3722bfc607d46275369865c02fe8694486d640b5
>> fa0d71b967506031f7cb08ced6095d1c4f988594
>> 9f18409ea3d778a171a9505c0a849d846f352bd0
Any joy ? I apparently have the same problem with my RS690. I
noticed it after upgrading from 2.6.25 to 2.6.26, alongside
xorg-server (1.4.99.904 to 1.4.99.905) and Mesa (7.1-rc1 to
7.1-rc3). The ATI driver is 6.9.0.
Here it always freezes within a few minutes, or at most within an
hour. When it happens, I'm not running any 3D application and the
CPU is idle. I may be just typing something in a shell. But it
works with DRI disabled.
Alt-SysRq-s/u/b is the only way. Trying with q freezes the
mouse cursor.
On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
> On Fri, 1 Aug 2008, Jonathan McDowell wrote:
> >On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
> >>>I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
> >>>The symptoms are that load average goes up, X stops accepting keypresses
> >>>or mouse clicks, but the cursor still moves around the screen in
> >>>response to the mouse being moved. I can't switch to a VT but can ssh in
> >>>remotely to see that things are still running. I don't seem to be able
> >>>to kill X but "shutdown -r now" cleanly reboots.
> >>>
> >>>radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
> >>>but I've seen it with older revisions too.
> >>>
> >>>It can take a couple of days for me to hit the problem, so a git bisect
> >>>could be a lengthy process. If anyone has any suggestions about faster
> >>>ways to track down the issue I'd like to hear them.
> >>
> >>git log v2.6.25..v2.6.26 drivers/char/drm
> >>
> >>5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
> >>5cfb6956073a9e42d44a26790b7800980634d037
> >
> >No joy.
> >
> >>d396db321bcaec54345e7e9e87cea8482d6ae3a8
> >
> >I thought this might be it; nearly 5 days of uptime rather than the
> >usual less than 2. But I got the same symptoms today so I'll continue
> >working down the list.
> >
> >>259434acccbc823ee8bc00b2d2689ccccd25e1fd
> >>d7463eb41d88a39de2653fd41857c4ccddb8707b
> >>45e519052e8f583a709edd442a23f59581d3fe42
> >>2735977b12cb0f113aae24afff04747b6d0f5bf1
> >>3722bfc607d46275369865c02fe8694486d640b5
> >>fa0d71b967506031f7cb08ced6095d1c4f988594
> >>9f18409ea3d778a171a9505c0a849d846f352bd0
>
> Any joy ?
259434acccbc823ee8bc00b2d2689ccccd25e1fd
d7463eb41d88a39de2653fd41857c4ccddb8707b
45e519052e8f583a709edd442a23f59581d3fe42
all don't seem to be the problem. It's getting harder to do the reverts
and I'm away this week so I haven't got any further yet.
> I apparently have the same problem with my RS690. I noticed it after
> upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
> 1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
>
> Here it always freezes in a few minutes or less than an hour. When it
> happens, I'm not running any 3D application and the CPU is idle. I may
> be just typing something in a shell. But it works disabling DRI.
Likewise, I'm not doing anything 3D related (at least, not consciously).
J.
--
] http://www.earth.li/~noodles/ [] No program done by a hacker will [
] PGP/GPG Key @ the.earth.li [] work unless he is on the system. [
] via keyserver, web or email. [] [
] RSA: 4DC4E7FD / DSA: 5B430367 [] [
On Sun, 10 Aug 2008, Jonathan McDowell wrote:
> On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
>> On Fri, 1 Aug 2008, Jonathan McDowell wrote:
>>> On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
>>>>> I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
>>>>> The symptoms are that load average goes up, X stops accepting keypresses
>>>>> or mouse clicks, but the cursor still moves around the screen in
>>>>> response to the mouse being moved. I can't switch to a VT but can ssh in
>>>>> remotely to see that things are still running. I don't seem to be able
>>>>> to kill X but "shutdown -r now" cleanly reboots.
>>>>>
>>>>> radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
>>>>> but I've seen it with older revisions too.
>>>>>
>>>>> It can take a couple of days for me to hit the problem, so a git bisect
>>>>> could be a lengthy process. If anyone has any suggestions about faster
>>>>> ways to track down the issue I'd like to hear them.
>>>>
>>>> git log v2.6.25..v2.6.26 drivers/char/drm
>>>>
>>>> 5e35eff13f7dd0f5c1d82b3b4708b2f7a5f44113
>>>> 5cfb6956073a9e42d44a26790b7800980634d037
>>>
>>> No joy.
>>>
>>>> d396db321bcaec54345e7e9e87cea8482d6ae3a8
>>>
>>> I thought this might be it; nearly 5 days of uptime rather than the
>>> usual less than 2. But I got the same symptoms today so I'll continue
>>> working down the list.
>>>
>>>> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
>>>> d7463eb41d88a39de2653fd41857c4ccddb8707b
>>>> 45e519052e8f583a709edd442a23f59581d3fe42
>>>> 2735977b12cb0f113aae24afff04747b6d0f5bf1
>>>> 3722bfc607d46275369865c02fe8694486d640b5
>>>> fa0d71b967506031f7cb08ced6095d1c4f988594
>>>> 9f18409ea3d778a171a9505c0a849d846f352bd0
>>
>> Any joy ?
>
> 259434acccbc823ee8bc00b2d2689ccccd25e1fd
> d7463eb41d88a39de2653fd41857c4ccddb8707b
> 45e519052e8f583a709edd442a23f59581d3fe42
>
> all don't seem to be the problem. It's getting harder to do the reverts
> and I'm away this week so I haven't got any further yet.
>
>> I apparently have the same problem with my RS690. I noticed it after
>> upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
>> 1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
>>
>> Here it always freezes in a few minutes or less than an hour. When it
>> happens, I'm not running any 3D application and the CPU is idle. I may
>> be just typing something in a shell. But it works disabling DRI.
>
> Likewise, I'm not doing anything 3D related (at least, not consciously).
BTW, I forgot to mention that the motherboard here is a
Gigabyte GA-MA69VM-S2. When the freeze happens and I use SysRq
to reboot, the machine doesn't POST to the BIOS screen. I have
to press reset.
On Sun, Aug 10, 2008 at 05:25:55PM -0300, Frédéric L. W. Meunier wrote:
> On Sun, 10 Aug 2008, Jonathan McDowell wrote:
> >On Sat, Aug 09, 2008 at 05:47:42AM -0300, Frédéric L. W. Meunier wrote:
> >>I apparently have the same problem with my RS690. I noticed it after
> >>upgrading from 2.6.25 to 2.6.26, alongside xorg-server (1.4.99.904 to
> >>1.4.99.905) and Mesa (7.1-rc1 to 7.1-rc3). The ATI driver is 6.9.0.
> >>
> >>Here it always freezes in a few minutes or less than an hour. When it
> >>happens, I'm not running any 3D application and the CPU is idle. I may
> >>be just typing something in a shell. But it works disabling DRI.
> >
> >Likewise, I'm not doing anything 3D related (at least, not consciously).
>
> BTW, I forgot to mention that. Here the motherboard is a
> Gigabyte GA-MA69VM-S2. When it happens and I use SysRq to
> reboot, it doesn't post in the BIOS screen. I have to press
> reset.
My mobo is an ASUS M2A-VM HDMI and a "shutdown -r now" when X is wedged
(done over ssh) results in a clean reboot; no need to hard reset.
J.
--
] http://www.earth.li/~noodles/ [] 101 things you can't have too much [
] PGP/GPG Key @ the.earth.li [] of : 38 - clean underwear. [
] via keyserver, web or email. [] [
] RSA: 4DC4E7FD / DSA: 5B430367 [] [
On Fri, Jul 25, 2008 at 11:10:07AM +0100, Dave Airlie wrote:
> > I've started to see "hangs" with X on an ATI RS690 with a 2.6.26 kernel.
> > The symptoms are that load average goes up, X stops accepting keypresses
> > or mouse clicks, but the cursor still moves around the screen in
> > response to the mouse being moved. I can't switch to a VT but can ssh in
> > remotely to see that things are still running. I don't seem to be able
> > to kill X but "shutdown -r now" cleanly reboots.
> >
> > radeon driver is recent git - 1c5858484da4fb1c9bc3ac3b4d7a97863ab99730
> > but I've seen it with older revisions too.
> >
> > It can take a couple of days for me to hit the problem, so a git bisect
> > could be a lengthy process. If anyone has any suggestions about faster
> > ways to track down the issue I'd like to hear them.
>
> git log v2.6.25..v2.6.26 drivers/char/drm
...
> not sure if you wanna try reverting some of those and seeing which is the
> cause maybe..
I never figured out which of these caused the issues, but as a further
data point for anyone else suffering from the issue 2.6.27-rc kernels
appear to fix (or at least significantly ease) the problem; I managed a
23 day uptime on 2.6.27-rc5 with I think one X freeze during that period
that cleaned up after a Ctrl-Alt-Backspace. Not seen the same thing at
all on 2.6.27-rc7 (though only ran it for 14 days before rebooting into
2.6.27 proper).
J.
--
Web [ Can I trade this job for what's behind door 2? ]
site: http:// [ ] Made by
http://www.earth.li/~noodles/ [ ] HuggieTag 0.0.23