2005-10-25 20:35:18

by Alessandro Suardi

Subject: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

Happened to me the third time now in the last
couple of months, always with different Linus
kernels (plus ACX100 wireless module from
Denis Vlasenko's snapshots) all hanging off
an up-to-date FC4, all built with latest stock
GCC; this last is:

[root@incident ~]# cat /proc/version
Linux version 2.6.14-rc5-git5 (asuardi@incident) (gcc version 4.0.2)
#1 PREEMPT Tue Oct 25 14:32:46 CEST 2005

Symptoms: startx at the command prompt gets
the blank screen, then... nothing. Keyboard is
dead (CapsLock doesn't get its led lit), no VT
switching works. Box is still reachable via ssh
through its wireless network card, all looks OK
except for X running and piling up CPU time,
and apparently untraceable (pstack, strace
hang trying to attach it) and unkillable (kill -9
doesn't kill it).
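
For anyone wanting to confirm the same symptom on their own box, here is a minimal sketch of reading a process state letter from /proc. The helper name proc_state is hypothetical, and it assumes the usual "State:" line in /proc/<pid>/status.

```shell
# Sketch only: print the one-letter scheduler state (R, S, D, ...) of a task.
# $1 is the path to a /proc/<pid>/status file; proc_state is a made-up name.
proc_state() {
    # The status file has a line such as "State:  R (running)";
    # the second whitespace-separated field is the state letter.
    awk '/^State:/ { print $2; exit }' "$1"
}
```

Something like `proc_state /proc/$(pidof X)/status` printing R on every call, while the CPU time shown by ps keeps growing, would match the behaviour described above.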

I took a couple of SysRQ 't' dumps via the
/proc/sysrq-trigger facility and the outcome is
attached - actually it's a full messages log,
startup to reboot.
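
A rough sketch of how such a dump can be taken over ssh follows. It assumes root and a kernel with magic SysRq support; the function name and the default output file are hypothetical, and the trigger path is a parameter only so the function can be tried against a scratch file.

```shell
# Sketch only: request a SysRq 't' task dump and save the kernel log.
# sysrq_task_dump and /tmp/sysrq-t.log are illustrative names.
sysrq_task_dump() {
    trigger="${1:-/proc/sysrq-trigger}"   # file that accepts SysRq commands
    out="${2:-/tmp/sysrq-t.log}"          # where to save the kernel ring buffer
    # Writing 't' asks the kernel to log the state of every task.
    echo t > "$trigger" || return 1
    # Save the ring buffer; a large dump may overflow it, in which case
    # the syslog messages file is the place to look for the full text.
    dmesg > "$out" 2>/dev/null || true
    echo "$out"
}
```

Run as root with no arguments, it writes to the real trigger and prints the name of the saved log file.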

Exact sequence this time:
1. boot
2. insmod 0.3.17 acx driver
-> notice I can't contact my wireless AP
3. rmmod it, insmod 0.3.16 driver
-> notice I am an idiot, plug in the wireless AP power cord
4. rmmod 0.3.16, insmod 0.3.17, iwconfig wlan0 up
5. switch to VT2 (Alt-F2), login as non-root, run startx
<blank screen, dead keyboard>
6. ssh in from remote box, take SysRQ dumps, reboot

If anyone has an idea on what to do to try and reproduce
and/or debug further, that'd be cool.

Box is a Dell Latitude C640 laptop, [email protected],
1GB RAM, with a USR2210 802.11b wireless
PC Card; video card is a Radeon 7500 M7 LW.

Thanks in advance, ciao,

--alessandro

"All it takes is one decision
A lot of guts, a little vision to wave
Your worries, and cares goodbye"

(Placebo - "Slave To The Wage")


Attachments:
(No filename) (1.73 kB)
messages.bz2 (7.00 B)

2005-10-25 21:27:24

by Alessandro Suardi

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

On 10/25/05, Alessandro Suardi <[email protected]> wrote:
> Happened to me the third time now in the last
> couple of months, always with different Linus
> kernels (plus ACX100 wireless module from
> Denis Vlasenko's snapshots) all hanging off
> an up-to-date FC4, all built with latest stock
> GCC; this last is:
>
> [root@incident ~]# cat /proc/version
> Linux version 2.6.14-rc5-git5 (asuardi@incident) (gcc version 4.0.2)
> #1 PREEMPT Tue Oct 25 14:32:46 CEST 2005
>
> Symptoms: startx at the command prompt gets
> the blank screen, then... nothing. Keyboard is
> dead (CapsLock doesn't get its led lit), no VT
> switching works. Box is still reachable via ssh
> through its wireless network card, all looks OK
> except for X running and piling up CPU time,
> and apparently untraceable (pstack, strace
> hang trying to attach it) and unkillable (kill -9
> doesn't kill it).
>
> I took a couple of SysRQ 't' dumps via the
> /proc/sysrq-trigger facility and the outcome is
> attached - actually it's a full messages log,
> startup to reboot.
>
> Exact sequence this time:
> 1. boot
> 2. insmod 0.3.17 acx driver
> -> notice I can't contact my wireless AP
> 3. rmmod it, insmod 0.3.16 driver
> -> notice I am an idiot, plug in the wireless AP power cord
> 4. rmmod 0.3.16, insmod 0.3.17, iwconfig wlan0 up
> 5. switch to VT2 (Alt-F2), login as non-root, run startx
> <blank screen, dead keyboard>
> 6. ssh in from remote box, take SysRQ dumps, reboot
>
> If anyone has an idea on what to do to try and reproduce
> and/or debug further, that'd be cool.
>
> Box is a Dell Latitude C640 laptop, [email protected],
> 1GB RAM, with a USR2210 802.11b wireless
> PC Card; video card is a Radeon 7500 M7 LW.

OK, let's forget about attachments (100+ KB), the
full messages file can be found here:
http://xoomer.virgilio.it/incident/messages

Thanks again, ciao,

--alessandro

"All it takes is one decision
A lot of guts, a little vision to wave
Your worries, and cares goodbye"

(Placebo - "Slave To The Wage")

2005-10-26 10:25:15

by Dave Airlie

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

> If anyone has an idea on what to do to try and reproduce
> and/or debug further, that'd be cool.
>
> Box is a Dell Latitude C640 laptop, [email protected],
> 1GB RAM, with a USR2210 802.11b wireless
> PC Card; video card is a Radeon 7500 M7 LW.
>

You're getting an X hang, which is usually a DRM/AGP or X configuration problem..

Please send me your /etc/X11/xorg.conf and /var/log/Xorg.0.log after a hang..

did it work with any kernel before? and suddenly break recently?

Dave.

2005-10-26 12:22:20

by Alessandro Suardi

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

On 10/26/05, Dave Airlie <[email protected]> wrote:
> > If anyone has an idea on what to do to try and reproduce
> > and/or debug further, that'd be cool.
> >
> > Box is a Dell Latitude C640 laptop, [email protected],
> > 1GB RAM, with a USR2210 802.11b wireless
> > PC Card; video card is a Radeon 7500 M7 LW.
> >
>
> You're getting an X hang, which is usually a DRM/AGP or X configuration problem..
>
> Please send me your /etc/X11/xorg.conf and /var/log/Xorg.0.log after a hang..
>
> did it work with any kernel before? and suddenly break recently?

It's intermittent. It looks like more recent kernels have a tendency
to trigger it more easily - in fact I just happened to have another
occurrence, this time without even loading the acx driver, just

1. boot
2. login as non-root
3. startx

but it works most times.

I lied however - the keyboard is _not_ dead, despite not lighting
the CapsLock led, and I can Alt-SysRQ-<x>.

Luckily, I have both the current working Xorg.0.log and the one
coming from the hang. I'm attaching both, and my xorg.conf.

Thanks :)

--alessandro

"All it takes is one decision
A lot of guts, a little vision to wave
Your worries, and cares goodbye"

(Placebo - "Slave To The Wage")


Attachments:
(No filename) (1.20 kB)
xorg.conf (3.00 kB)
Xorg.0.log.good (34.21 kB)
Xorg.0.log.bad (34.21 kB)

2005-10-26 12:29:00

by Dave Airlie

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

> > >
> > > Box is a Dell Latitude C640 laptop, [email protected],
> > > 1GB RAM, with a USR2210 802.11b wireless
> > > PC Card; video card is a Radeon 7500 M7 LW.
> > >
> >
> > You're getting an X hang, which is usually a DRM/AGP or X configuration problem..
> >
> > Please send me your /etc/X11/xorg.conf and /var/log/Xorg.0.log after a hang..
> >
> > did it work with any kernel before? and suddenly break recently?
>
> It's intermittent. It looks like more recent kernels have a tendency
> to trigger it more easily - in fact I just happened to have another
> occurrence, this time without even loading the acx driver, just
>
> 1. boot
> 2. login as non-root
> 3. startx
>
> but it works most times.
>
> I lied however - the keyboard is _not_ dead, despite not lighting
> the CapsLock led, and I can Alt-SysRQ-<x>.
>
> Luckily, I have both the current working Xorg.0.log and the one
> coming from the hang. I'm attaching both, and my xorg.conf.
>

Weird, it all looks okay to me (diffing the Xorg logs gives nothing
majorly different...)...
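
A minimal sketch of that kind of comparison, assuming the two attached logs are saved locally; the function names are hypothetical. It relies only on the standard Xorg log markers, where (EE) prefixes errors and (WW) prefixes warnings.

```shell
# Sketch only: compare the log from a working start with the one from a hang.
xorg_log_problems() {
    # Print only the error (EE) and warning (WW) lines of the log in $1.
    grep -E '^\((EE|WW)\)' "$1"
}

xorg_log_diff() {
    # $1: log from a good start, $2: log from the hang.
    # diff exits with 1 when the files differ, which is not a failure here.
    diff -u "$1" "$2" || [ $? -eq 1 ]
}
```

For example, `xorg_log_diff Xorg.0.log.good Xorg.0.log.bad` shows every differing line, while running `xorg_log_problems` on each file narrows the view to warnings and errors.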

What desktop are you running on top of X? Does it have any 3D or OpenGL
components in it?

if you just run X, does it always start to the X cursor without hanging..

Try disabling pre-empt also.. if you get a chance..

I've got the same chip on an evaluation card in my PC at the moment,
and I've been running the same X 6.8.2 on it for a while with no
issues on the latest kernels..

Dave.

2005-10-26 13:20:07

by Alessandro Suardi

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

On 10/26/05, Dave Airlie <[email protected]> wrote:
> > > >
> > > > Box is a Dell Latitude C640 laptop, [email protected],
> > > > 1GB RAM, with a USR2210 802.11b wireless
> > > > PC Card; video card is a Radeon 7500 M7 LW.
> > > >
> > >
> > > You're getting an X hang, which is usually a DRM/AGP or X configuration problem..
> > >
> > > Please send me your /etc/X11/xorg.conf and /var/log/Xorg.0.log after a hang..
> > >
> > > did it work with any kernel before? and suddenly break recently?
> >
> > It's intermittent. It looks like more recent kernels have a tendency
> > to trigger it more easily - in fact I just happened to have another
> > occurrence, this time without even loading the acx driver, just
> >
> > 1. boot
> > 2. login as non-root
> > 3. startx
> >
> > but it works most times.
> >
> > I lied however - the keyboard is _not_ dead, despite not lighting
> > the CapsLock led, and I can Alt-SysRQ-<x>.
> >
> > Luckily, I have both the current working Xorg.0.log and the one
> > coming from the hang. I'm attaching both, and my xorg.conf.
> >
>
> Weird, it all looks okay to me (diffing the Xorg logs gives nothing
> majorly different...)...
>
> What desktop are you running on top of X? Does it have any 3D or OpenGL
> components in it?

Desktop is FC4 gnome-desktop-2.10.0-5. I don't think I have
anything fancy in it, apart from the mini-icons on the top panel
which include oldies (Evolution, Mozilla) and actually used
stuff (Firefox, gnome-terminal, Thunderbird, Gaim, and the
volume control).

> if you just run X, does it always start to the X cursor without hanging..

Will try. Note however, when I experience the problem X doesn't
really "hang" - it spins in CPU.

> Try disabling pre-empt also.. if you get a chance..

Will also build no-preempt kernels in the near future.

> I've got the same chip on an evaluation card in my PC at the moment,
> and I've been running the same X 6.8.2 on it for a while with no
> issues on the latest kernels..

For that matter, I'm running it now without issues... it
seems to get in the weird state only on startup.

Thanks ! Ciao,

--alessandro

"All it takes is one decision
A lot of guts, a little vision to wave
Your worries, and cares goodbye"

(Placebo - "Slave To The Wage")

2005-10-26 19:25:29

by Dave Airlie

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

>
> > if you just run X, does it always start to the X cursor without hanging..
>
> Will try. Note however, when I experience the problem X doesn't
> really "hang" - it spins in CPU.

That's a hang from the graphics developer's point of view, your
graphics card has crashed and X is spinning waiting for the card to
come back and say it is okay.. something it never does...

>
> For that matter, I'm running it now without issues... it
> seems to get in the weird state only on startup.

I probably restart X about 5-10 times per working session and I've
never seen this yet; I'll do a few more reboots. We have a known issue
with a bug fix that went into X, and I'm not sure if it is in your X
packages, but it probably is.. Can you tell me the FC4 xorg rpm titles
so I can check? If it is causing problems on AGP systems as well, I'll
be pushing RH to release new X packages with a proper fix, which benh
is working on at the moment..

Dave.

>
> Thanks ! Ciao,
>
> --alessandro
>
> "All it takes is one decision
> A lot of guts, a little vision to wave
> Your worries, and cares goodbye"
>
> (Placebo - "Slave To The Wage")
>

2005-10-26 20:55:01

by Alessandro Suardi

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

On 10/26/05, Dave Airlie <[email protected]> wrote:
> >
> > > if you just run X, does it always start to the X cursor without hanging..
> >
> > Will try. Note however, when I experience the problem X doesn't
> > really "hang" - it spins in CPU.
>
> That's a hang from the graphics developer's point of view, your
> graphics card has crashed and X is spinning waiting for the card to
> come back and say it is okay.. something it never does...

OK, thanks for the clarification.

> > For that matter, I'm running it now without issues... it
> > seems to get in the weird state only on startup.
>
> I probably restart X about 5-10 times per working session and I've
> never seen this yet, I'll do a few more reboots, we have a known issue
> with a bug fix that went into X and I'm not sure if it is in your X
> packages but it probably is.. can you tell me the FC4 xorg rpm titles
> so I can check it, if it is causing problems on AGP systems as well I'll
> be pushing RH to release new X packages with a proper fix, that benh
> is working on at the moment..

Sure, here you go:

[asuardi@incident ~]$ grep xorg /var/log/rpmpkgs
fonts-xorg-100dpi-6.8.2-1.noarch.rpm
fonts-xorg-75dpi-6.8.2-1.noarch.rpm
fonts-xorg-base-6.8.2-1.noarch.rpm
xorg-x11-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-deprecated-libs-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-devel-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-font-utils-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-Mesa-libGL-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-Mesa-libGLU-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-tools-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-twm-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-xauth-6.8.2-37.FC4.49.2.i386.rpm
xorg-x11-xfs-6.8.2-37.FC4.49.2.i386.rpm

It looks like I installed these packages around Sep 22,
but I honestly can't remember whether the first of these
occurrences happened after the installation or before.

Thanks a lot, ciao,

--alessandro

"All it takes is one decision
A lot of guts, a little vision to wave
Your worries, and cares goodbye"

(Placebo - "Slave To The Wage")

2005-10-26 20:58:56

by Knut Petersen

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

Two months ago I started a thread

BUG: fb_imageblit called before fb_check_var and fb_set_par function

in the Linux-fbdev-devel mailing list. I found that the accelerated imageblit
function of a framebuffer driver might be called before the graphics engine is
initialized ... normally that happens in the fb_set_par function. For cyblafb
I solved the problem by extending the fb_sync function to include a call to the
graphics engine init function after a short timeout, but the problem is still
present in all recent kernels. It might be argued that this is not a kernel bug
but a problem of X - have a look at the Linux-fbdev-devel thread.

Does X start reliably without a linux framebuffer driver?
Does X start reliably with vesafb?

If the answer is "yes", then have a look at the radeonfb sync function.
After a short timeout, assume that an erroneous X driver disabled mmio,
so (re)enable mmio and (re)init the graphics engine.
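
Before answering the two questions it helps to know which framebuffer driver, if any, is registered. A small sketch, assuming the usual /proc/fb layout of one "<index> <driver id>" line per device; the function name fb_drivers is hypothetical.

```shell
# Sketch only: list the id strings of the registered framebuffer drivers.
fb_drivers() {
    fb_list="${1:-/proc/fb}"        # path is a parameter for easy testing
    # No readable /proc/fb means no framebuffer support at all.
    [ -r "$fb_list" ] || return 0
    # Strip the leading device index, keep the driver id string.
    sed 's/^[0-9]* *//' "$fb_list"
}
```

Empty output would mean X is starting without a Linux framebuffer driver. The exact id strings depend on the driver and are not taken from this thread.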

cu,
knut

2005-10-26 21:05:11

by Dave Airlie

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

>
> BUG: fb_imageblit called before fb_check_var and fb_set_par function
>

> present in all recent kernels. It might be argued that this is not a
> kernel bug
> but a problem of X - have a look at the Linux-fbdevel thread.
>
> Does X start reliably without a linux framebuffer driver?
> Does X start reliably with vesafb?

Hmm I missed the fact that you are using radeonfb, this could point to
the X chipset initialisation code as well...

I'll try and track down if the evil patch made it into xorg in FC4..

Dave.

2005-10-28 14:57:55

by Nix

Subject: Re: X unkillable in R state sometimes on startx , /proc/sysrq-trigger T output attached

On 26 Oct 2005, Dave Airlie said:
> You're getting an X hang, which is usually a DRM/AGP or X configuration problem..

Indeed. As a random example, when I installed my new Radeon 9250 last
week, I flipped the AGPMode to 8 because the card said it was capable of
that... and X went CPU-mad within seconds of starting 3D rendering.
Looking at the kernel logs made the cause clear:

Oct 25 22:09:08 hades info: kernel: agpgart: Putting AGP V2 device at 0000:00:00.0 into 1x mode

Whether the cause was that X thought it was using 8x and the kernel
thought it was using 1x, I don't know, but changing it to 4 brought
everything into agreement and eliminated the hangs.
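
A quick way to spot this kind of disagreement is to pull both numbers out and compare them. The sketch below assumes the agpgart message format shown above and an xorg.conf line of the form Option "AGPMode" "N"; both function names are hypothetical.

```shell
# Sketch only: extract the AGP rate the kernel chose and the one X was told to use.
kernel_agp_mode() {
    # $1: kernel log file. Prints the rate from the last
    # "agpgart: Putting AGP V<n> device at <addr> into <rate>x mode" line.
    sed -n 's/.*agpgart: Putting AGP V[0-9]* device at .* into \([0-9]*\)x mode.*/\1/p' "$1" | tail -n 1
}

xorg_agp_mode() {
    # $1: xorg.conf. Prints N from an uncommented Option "AGPMode" "N" line.
    sed -n 's/^[[:space:]]*Option[[:space:]]*"AGPMode"[[:space:]]*"\([0-9]*\)".*/\1/p' "$1" | tail -n 1
}
```

If the two functions print different numbers for the same session, the configuration and the kernel disagree in the way described above. Whether that disagreement is the actual cause of a given hang is a separate question.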

(This was with X.org 6.8.99.901.)


So AGP is indeed one of those things whose misconfiguration can
cause all sorts of lockup-like problems. (Just like misconfiguring any
of the other buses in the system, I suppose.)

--
`"Gun-wielding recluse gunned down by local police" isn't the epitaph
I want. I am hoping for "Witnesses reported the sound up to two hundred
kilometers away" or "Last body part finally located".' --- James Nicoll