2011-06-12 10:48:09

by Jaroslaw Fedewicz

Subject: drm-radeon failures on R600: patches still don't work

Hello,

There was a recent thread as found on
https://lkml.org/lkml/2011/6/8/17, started by Markus Trippelsdorf:

> The merge of the 'drm-radeon' branch by Linus yesterday breaks my setup
> (RS780). The mouse cursor is just a black block suddenly and I see an
> endless stream of:
> radeon 0000:01:05.0: r600_check_texture_resource:1338 texture invalid format 26
> [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !

The same happens on my laptop (ThinkPad Edge 13, AMD model), which has
a built-in Radeon HD3200 (RS780) and runs the most recent kernel from
git.

Unfortunately, none of the patches proposed so far worked. The patch
by Markus (https://lkml.org/lkml/2011/6/8/19) did work in the sense
that X actually started up, but all graphics, including mouse
movement, were so sluggish that the machine was unusable. The patch
proposed by Dave Airlie (https://lkml.org/lkml/2011/6/8/117) didn't
apply: it turns out the line it adds was already in the code.

I'm not a kernel hacker by any means, so sorry if I misunderstood
anything; if I need to supply any additional information, I'll
be glad to.

Also, sorry for not continuing that thread; I have just subscribed so
I can't "reply" to it.


2011-06-12 11:21:29

by Markus Trippelsdorf

Subject: Re: drm-radeon failures on R600: patches still don't work

On 2011.06.12 at 13:48 +0300, Jaroslaw Fedewicz wrote:
>
> There was a recent thread as found on
> https://lkml.org/lkml/2011/6/8/17, started by Markus Trippelsdorf:
>
> > The merge of the 'drm-radeon' branch by Linus yesterday breaks my setup
> > (RS780). The mouse cursor is just a black block suddenly and I see an
> > endless stream of:
> > radeon 0000:01:05.0: r600_check_texture_resource:1338 texture invalid format 26
> > [drm:radeon_cs_ioctl] *ERROR* Invalid command stream !
>
> The same happens on my laptop (ThinkPad Edge 13, AMD model), which has
> a built-in Radeon HD3200 (RS780) and runs the most recent kernel from
> git.
>
> Unfortunately, none of the patches proposed so far worked. The patch
> by Markus (https://lkml.org/lkml/2011/6/8/19) did work in the sense
> that X actually started up, but all graphics, including mouse
> movement, were so sluggish that the machine was unusable. The patch
> proposed by Dave Airlie (https://lkml.org/lkml/2011/6/8/117) didn't
> apply: it turns out the line it adds was already in the code.

Hmm, maybe you're seeing a different problem. The issue that I saw was
fixed by commit 428c6e3630 in the git tree (this is identical to the
patch by Dave you're referring to above).

Does a "git revert fe6f0bd03d697835e76dd18d232ba476c65b8282" solve your
problem?

> I'm not a kernel hacker by any means, so sorry if I misunderstood
> anything; if I need to supply any additional information, I'll
> be glad to.

It would be great if you could git-bisect the issue. Basically you just
need to run:

git bisect start
git bisect bad
git bisect good ecff4fcc7bbaf060646d2160123f8dc02605a047

Then build the new kernel, reboot, and run either "git bisect good" or
"git bisect bad", depending on whether the problem is gone or still
there. Repeat these three steps (build, reboot, mark) until git names
the first bad commit. (Please see also "man git-bisect".)
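
The steps above can be sketched as a pair of small shell helpers. This
is only an illustration: the function names bisect_begin and
bisect_verdict are made up here, the good commit is the one named
above, and the kernel build and reboot between rounds still have to be
done by hand.

```shell
#!/bin/sh
# Known-good commit, taken from the instructions above.
GOOD_COMMIT=ecff4fcc7bbaf060646d2160123f8dc02605a047

# Start a bisect session: the current HEAD is bad, GOOD_COMMIT is good.
# Must be run inside the kernel git tree.
bisect_begin() {
    git bisect start &&
    git bisect bad &&
    git bisect good "$GOOD_COMMIT"
}

# Record the verdict for the kernel that was just built, booted and
# tested. Anything other than "good" or "bad" is rejected so that a
# typo cannot mislead the bisection.
# Usage: bisect_verdict good|bad
bisect_verdict() {
    case "$1" in
        good|bad) git bisect "$1" ;;
        *) echo "usage: bisect_verdict good|bad" >&2; return 2 ;;
    esac
}
```

After each reboot, run "bisect_verdict good" or "bisect_verdict bad".
Once git reports the first bad commit, "git bisect log" shows the
recorded verdicts and "git bisect reset" returns to the original HEAD.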

> Also, sorry for not continuing that thread; I have just subscribed so
> I can't "reply" to it.

(That's no problem. But it is always a good idea to CC the people that
you're talking about and also the dri-devel list in this case)

--
Markus

2011-06-14 15:14:31

by Jaroslaw Fedewicz

Subject: Re: drm-radeon failures on R600: patches still don't work

On 12/06/11 14:21, Markus Trippelsdorf wrote:
> Hmm, maybe you're seeing a different problem. The issue that I saw was
> fixed by commit 428c6e3630 in the git tree (this is identical to the
> patch by Dave you're referring to above).
>
> Does a "git revert fe6f0bd03d697835e76dd18d232ba476c65b8282" solve your
> problem?

As for the log being flooded with "texture invalid format" errors, yes.
I bisected it twice to rule out any pilot error.

As for other things... it's getting rather strange. Before the revert,
starting X gave a garbled screen, but I could still switch to a text VT
to see the kernel messages. Now the machine just... hangs. The cursor
first freezes, then disappears after a few seconds, and that's it.
After 30 more seconds, the HDD stops spinning.

I thought there might be a kernel panic or something, but even if there
is one, the kernel doesn't get to report it. I set up netconsole, but
to no avail: no message looked like a clear precursor to the hang.
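
For reference, a netconsole setup of the kind described could look like
the sketch below. All addresses, ports and the interface name are
placeholders, not the values actually used on this machine, and
netconsole_param is a hypothetical helper that only assembles the
module parameter string in the documented format
src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac.

```shell
#!/bin/sh
# Assemble the value of the netconsole= module parameter.
# Arguments: src_port src_ip dev tgt_port tgt_ip [tgt_mac]
# An empty tgt_mac leaves the field blank, so the kernel falls back
# to the broadcast address.
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Placeholder values (assumptions; adjust to the local network).
PARAM=$(netconsole_param 6665 192.168.0.2 eth0 6666 192.168.0.1)

# Print the commands instead of running them, so the sketch is safe
# to execute as an unprivileged user.
echo "dmesg -n 8"                             # let all messages reach the console
echo "modprobe netconsole netconsole=$PARAM"  # start logging over UDP
```

On the receiving machine, a UDP listener on the target port (for
example a netcat variant) collects the output.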

I then thought that the kernel might manage to print at least something
on the console. So as soon as I saw the switch to graphics and the
mouse cursor, I switched to a text VT. After a while, the screen went
black and that was it. The hang did not coincide with any particular
service starting.

Even more curious, using Catalyst doesn't help either: it invariably
ends the same way. I boot to a pure text console, then GDM starts and
there's a black screen with a spinning busy cursor. I switch to the
text console, and after a few seconds I lose control over it: the boot
process freezes and the keyboard stops accepting input. Then the screen
goes black and that's it.

The only reliable prerequisite for this to happen is launching X. In
single-user mode, without X running, the machine works for days, with
or without KMS.

I'm really not sure how to debug this. This behaviour first appeared in
3.0.0-rc2 and is still there as of today's git (3.0.0-rc3).

Also, I'm not sure that DRI is to blame, because the hangs occur on both
open source and proprietary drivers.

As I said before, netconsole does not show anything suspicious.

I cannot tell if SysRq would work because my laptop hasn't got a SysRq key.
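
A missing SysRq key does not rule the mechanism out entirely: the same
requests can be written to /proc/sysrq-trigger as root. The sketch
below assumes the kernel was built with CONFIG_MAGIC_SYSRQ and that a
shell is still responsive (for instance over SSH), which may well not
be the case during this hang. The sysrq_send helper and the
SYSRQ_TRIGGER override are hypothetical names used only for
illustration.

```shell
#!/bin/sh
# File that receives SysRq command letters; overridable for dry runs.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Send one SysRq command letter, e.g. "w" (dump blocked tasks),
# "t" (dump all tasks) or "s" (sync filesystems).
# Usage: sysrq_send <letter>
sysrq_send() {
    case "$1" in
        [a-z]) printf '%s\n' "$1" > "$SYSRQ_TRIGGER" ;;
        *) echo "usage: sysrq_send <letter>" >&2; return 2 ;;
    esac
}
```

For example, "sysrq_send w" run as root would ask the kernel to log
tasks stuck in uninterruptible sleep, provided the kernel is still
alive enough to process the request.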

I wonder if the Gallium Mesa libs could somehow cause this. By the
way, they work fine on 2.6.39...

The machine on which this happens is a ThinkPad Edge 13, AMD model,
BIOS v1.12, AMD RS780 (Radeon HD3200). What else do I need to supply?
Which debugging options should I turn on?

I'm completely lost here and feel like a complete idiot.

Next I'm going to nuke my Gallium Mesa libraries and try to boot without
them, but I'd still be interested to hear whether anyone has seen
similar symptoms.