2002-09-21 10:05:20

by Helge Hafting

Subject: 2.5.37 won't run X?

X won't start on 2.5.37, but works with 2.5.36
The screen goes black as usual, but then nothing else happens.
ssh'ing in from another machine shows XFree86 using 50% CPU,
i.e. all of one of the two CPUs in this machine.

Killing the XFree86 process is impossible, even with kill -9
from root. SysRq SAK worked though, so I could recover
the machine. But I had to boot a different kernel to run X.

lspci
00:0f.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)

2.5.37 SMP kernel

XFree86 Version 4.1.0.1 / X Window System
(protocol Version 11, revision 0, vendor release 6510)
Release Date: 21 December 2001

Distribution debian testing

Helge Hafting


2002-09-21 13:37:01

by Florin Iucha

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 12:10:41PM +0200, Helge Hafting wrote:
> X won't start on 2.5.37, but works with 2.5.36
> The screen goes black as usual, but then nothing else happens.
> ssh'ing in from another machine shows XFree86 using 50% cpu,
> i.e. one of the two cpu's in this machine.
>
> killing the XFree86 process is impossible, even with kill -9
> from root. sysrq SAK worked though, so I could recover
> the machine. But I had to boot a different kernel to run X.
>
> lspci
> 00:0f.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
>
> 2.5.37 SMP kernel
>
> XFree86 Version 4.1.0.1 / X Window System
> (protocol Version 11, revision 0, vendor release 6510)
> Release Date: 21 December 2001
>
> Distribution debian testing
>
> Helge Hafting

I get the same behavior on two machines here (a desktop with Duron/SIS 735
chipset/ATI 8500 video and a laptop with PIII/BX chipset/S3 Savage MX video)
so it is not hardware specific.

I am running Debian testing; the laptop has XFree86 4.1.0 from Debian, the
desktop has XFree86 4.2.0 from xfree.org.

I am not running SMP or preempt. A preempt kernel on the laptop last
worked in 2.5.33. With preempt enabled I get an oops in 2.5.36, while
in 2.5.37 I get a stream of oopses, continuously scrolling off the
screen.

florin

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4



2002-09-21 14:50:47

by Martin J. Bligh

Subject: Re: 2.5.37 won't run X?

> X won't start on 2.5.37, but works with 2.5.36
> The screen goes black as usual, but then nothing else happens.
> ssh'ing in from another machine shows XFree86 using 50% cpu,
> i.e. one of the two cpu's in this machine.

Looks like Linus fixed this already in his BK tree ... want
to grab that and see if it fixes your problem?

M.

2002-09-21 16:12:04

by Florin Iucha

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 07:53:37AM -0700, Martin J. Bligh wrote:
> > X won't start on 2.5.37, but works with 2.5.36
> > The screen goes black as usual, but then nothing else happens.
> > ssh'ing in from another machine shows XFree86 using 50% cpu,
> > i.e. one of the two cpu's in this machine.
>
> Looks like Linus fixed this already in his BK tree ... want
> to grab that and see if it fixes your problem?

What changeset do you think fixed this?

Anyway, I grabbed ftp://nl.linux.org/pub/linux/bk2patch/tagged-to-head.v2.5
and patched it in. Same result: X eating 99% CPU time.

florin

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4



2002-09-21 16:22:26

by Martin J. Bligh

Subject: Re: 2.5.37 won't run X?

>> > X won't start on 2.5.37, but works with 2.5.36
>> > The screen goes black as usual, but then nothing else happens.
>> > ssh'ing in from another machine shows XFree86 using 50% cpu,
>> > i.e. one of the two cpu's in this machine.
>>
>> Looks like Linus fixed this already in his BK tree ... want
>> to grab that and see if it fixes your problem?
>
> What changeset do you think fixed this?

Well, this bit looked hopeful:

23 hours torvalds 1.575 Fix vm86 system call interface to entry.S.
This has been broken since the thread_info support went in (early July),
and can cause lockups at X startup etc.


2002-09-21 18:54:34

by Florin Iucha

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 09:25:18AM -0700, Martin J. Bligh wrote:
> >> > X won't start on 2.5.37, but works with 2.5.36
> >> > The screen goes black as usual, but then nothing else happens.
> >> > ssh'ing in from another machine shows XFree86 using 50% cpu,
> >> > i.e. one of the two cpu's in this machine.
> >>
> >> Looks like Linus fixed this already in his BK tree ... want
> >> to grab that and see if it fixes your problem?
> >
> > What changeset do you think fixed this?
>
> Well, this bit looked hopeful:
>
> 23 hours torvalds 1.575 Fix vm86 system call interface to entry.S.
> This has been broken since the thread_info support went in (early July),
> and can cause lockups at X startup etc.

X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.

florin

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4



2002-09-21 20:18:49

by Andries Brouwer

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 01:59:39PM -0500, Florin Iucha wrote:

> X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.

I noticed that the pgrp-related behaviour of some programs changed.
Some programs hang, some programs loop. The hang occurs when they
are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
(and the same signal is sent immediately again when they leave the
signal routine).
Have not yet investigated details.
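
The two outcomes described above (stopped versus handler re-entered) follow
from the two ways a process can treat SIGTTOU. The sketch below is only an
illustration under stated assumptions: the function name sigttou_outcome is
made up, and the signal is sent with os.kill() as a stand-in for the tty
layer, so it shows the dispositions and not the 2.5.37 regression itself.

```python
import os
import signal
import time


def sigttou_outcome(catch: bool) -> str:
    """Deliver SIGTTOU to a forked child and report how it reacted.

    Returns "stopped" if the default action stopped the child, "handled"
    if the child's handler ran and the child then exited normally, and
    "other" for anything else.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child. It moves into its own process group so that the group is
        # not orphaned (the parent is in another group of the same
        # session); an orphaned group would not be stopped by SIGTTOU.
        try:
            os.close(ready_r)
            os.setpgid(0, 0)
            hits = []
            if catch:
                signal.signal(signal.SIGTTOU,
                              lambda signo, frame: hits.append(signo))
            else:
                signal.signal(signal.SIGTTOU, signal.SIG_DFL)
            os.write(ready_w, b"x")  # tell the parent we are set up
            deadline = time.monotonic() + 5.0
            while not hits and time.monotonic() < deadline:
                time.sleep(0.01)
            os._exit(0 if hits else 1)
        except BaseException:
            os._exit(2)

    # Parent.
    os.close(ready_w)
    os.read(ready_r, 1)           # wait until the child is ready
    os.close(ready_r)
    os.kill(pid, signal.SIGTTOU)  # stand-in for the tty layer's signal
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status):
        stopped_by = os.WSTOPSIG(status)
        os.kill(pid, signal.SIGKILL)  # nobody will send SIGCONT; clean up
        os.waitpid(pid, 0)
        return "stopped" if stopped_by == signal.SIGTTOU else "other"
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return "handled"
    return "other"


if __name__ == "__main__":
    print(sigttou_outcome(catch=False))
    print(sigttou_outcome(catch=True))
```

With the default disposition the child stays stopped until someone sends
SIGCONT, which looks like a hang. With a handler installed the child simply
continues; if the interrupted tty operation is then retried and the kernel
still considers the process a background writer, the signal arrives again,
which would produce the loop.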

2002-09-21 20:43:28

by William Lee Irwin III

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 01:59:39PM -0500, Florin Iucha wrote:
>> X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.

On Sat, Sep 21, 2002 at 10:23:53PM +0200, Andries Brouwer wrote:
> I noticed that the pgrp-related behaviour of some programs changed.
> Some programs hang, some programs loop. The hang occurs when they
> are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
> (and the same signal is sent immediately again when they leave the
> signal routine).
> Have not yet investigated details.

I'm looking into it.


Thanks,
Bill

2002-09-22 04:32:42

by William Lee Irwin III

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 01:59:39PM -0500, Florin Iucha wrote:
>> X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.

On Sat, Sep 21, 2002 at 10:23:53PM +0200, Andries Brouwer wrote:
> I noticed that the pgrp-related behaviour of some programs changed.
> Some programs hang, some programs loop. The hang occurs when they
> are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
> (and the same signal is sent immediately again when they leave the
> signal routine).
> Have not yet investigated details.

Linus seems to have put out 2.5.38 with some X lockup fixes. Can you
still reproduce this? If so, are there non-X-related testcases where
you can trigger this? My T21 Thinkpad doesn't see this at all.

I'm still prodding the SIGTTOU path trying to trigger it until then.


Thanks,
Bill

2002-09-22 06:21:58

by Florin Iucha

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 09:30:50PM -0700, William Lee Irwin III wrote:
> On Sat, Sep 21, 2002 at 01:59:39PM -0500, Florin Iucha wrote:
> >> X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.
>
> On Sat, Sep 21, 2002 at 10:23:53PM +0200, Andries Brouwer wrote:
> > I noticed that the pgrp-related behaviour of some programs changed.
> > Some programs hang, some programs loop. The hang occurs when they
> > are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
> > (and the same signal is sent immediately again when they leave the
> > signal routine).
> > Have not yet investigated details.
>
> Linus seems to have put out 2.5.38 with some X lockup fixes. Can you
> still reproduce this? If so, are there non-X-related testcases where
> you can trigger this? My T21 Thinkpad doesn't see this at all.
>
> I'm still prodding the SIGTTOU path trying to trigger it until then.

Weird. 2.5.38 works just fine, but the head from a few hours ago (which
supposedly had the fix) doesn't. Oh well, it works fine now on both the
desktop and the laptop.

florin

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4



2002-09-22 12:13:36

by Andries Brouwer

Subject: Re: 2.5.37 won't run X?

On Sat, Sep 21, 2002 at 09:30:50PM -0700, William Lee Irwin III wrote:

> On Sat, Sep 21, 2002 at 10:23:53PM +0200, Andries Brouwer wrote:
> > I noticed that the pgrp-related behaviour of some programs changed.
> > Some programs hang, some programs loop. The hang occurs when they
> > are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
> > (and the same signal is sent immediately again when they leave the
> > signal routine).
> > Have not yet investigated details.
>
> Linus seems to have put out 2.5.38 with some X lockup fixes. Can you
> still reproduce this? If so, are there non-X-related testcases where
> you can trigger this? My T21 Thinkpad doesn't see this at all.
>
> I'm still prodding the SIGTTOU path trying to trigger it until then.

Yes, 2.5.38 behaves differently again, but the statement that
pgrp-related behaviour of some programs changed is still true.

For example: "emacs -nw foo.c" in an xterm window
will start emacs fine. Now put this line in a shell script:
#!/bin/sh
emacs -nw "$@"
so that pid and pgrp of the started emacs differ. Under 2.5.33
this works, but under 2.5.3[78] this hangs.
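
Why the wrapper changes pid versus pgrp can be checked without emacs. The
sketch below is an illustration, not part of the original report: the helper
name pid_and_pgrp is made up, a small Python child stands in for emacs, and
start_new_session=True stands in for the interactive shell making the
command it launches a process group leader.

```python
import subprocess
import sys

# The child prints its own pid and process group id.
REPORT = "import os; print(os.getpid(), os.getpgrp())"


def pid_and_pgrp(via_wrapper: bool):
    """Run the reporting child directly or through an sh wrapper.

    Returns the (pid, pgrp) pair printed by the child.
    """
    target = [sys.executable, "-c", REPORT]
    if via_wrapper:
        # Like the two-line script: sh forks the program instead of
        # becoming it. The trailing "exit" keeps sh from exec'ing the
        # command in place as an optimisation.
        argv = ["sh", "-c", '"$@"; exit $?', "sh"] + target
    else:
        argv = target
    out = subprocess.run(
        argv,
        start_new_session=True,  # launched process leads its own group
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    pid, pgrp = (int(field) for field in out.split())
    return pid, pgrp


if __name__ == "__main__":
    print("direct: ", pid_and_pgrp(via_wrapper=False))
    print("wrapped:", pid_and_pgrp(via_wrapper=True))
```

Run directly, the child is the group leader, so pid and pgrp are equal. Run
through the wrapper, sh is the leader and the child only inherits its group,
so the two numbers differ, which is the situation the emacs script creates.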

Andries

2002-09-23 04:51:10

by Florin Iucha

Subject: Re: 2.5.37 won't run X?

On Sun, Sep 22, 2002 at 01:27:02AM -0500, Florin Iucha wrote:
> On Sat, Sep 21, 2002 at 09:30:50PM -0700, William Lee Irwin III wrote:
> > On Sat, Sep 21, 2002 at 01:59:39PM -0500, Florin Iucha wrote:
> > >> X is not locked up, as it eats all the CPU. And 2.5.36 works just fine.
> >
> > On Sat, Sep 21, 2002 at 10:23:53PM +0200, Andries Brouwer wrote:
> > > I noticed that the pgrp-related behaviour of some programs changed.
> > > Some programs hang, some programs loop. The hang occurs when they
> > > are stopped by SIGTTOU. The infinite loop occurs when they catch SIGTTOU
> > > (and the same signal is sent immediately again when they leave the
> > > signal routine).
> > > Have not yet investigated details.
> >
> > Linus seems to have put out 2.5.38 with some X lockup fixes. Can you
> > still reproduce this? If so, are there non-X-related testcases where
> > you can trigger this? My T21 Thinkpad doesn't see this at all.
> >
> > I'm still prodding the SIGTTOU path trying to trigger it until then.
>
> Weird. 2.5.38 works just fine, but the head from a few hours ago (which
> supposedly had the fix) doesn't. Oh well, it works fine now on both the
> desktop and the laptop.

I take that back. 2.5.38 works fine on the laptop. On the desktop the
situation is tricky:
* I compiled 2.5.38 under 2.5.34+xfs,
* rebooted into 2.5.38,
* spent all day in 2.5.38,
* rebooted temporarily into Windows,
* and since then every boot into 2.5.38 has resulted in a lockup.
The lockup happens with all kernels since 2.5.35 and it is random. It
happens in xdm waiting for login, in starting up KDE, in starting up
daemons at boot up.

Even when X hangs, Alt-SysRq still works for S, U, B (sync, remount
read-only, reboot).

2.5.34+xfs (from the SGI CVS) works fine. I will try a recent snapshot
from them, with a more recent kernel.

florin

--

"If it's not broken, let's fix it till it is."

41A9 2BDE 8E11 F1C5 87A6 03EE 34B3 E075 3B90 DFE4



2002-09-23 05:00:00

by William Lee Irwin III

Subject: Re: 2.5.37 won't run X?

On Sun, Sep 22, 2002 at 11:56:17PM -0500, Florin Iucha wrote:
> The lockup happens with all kernels since 2.5.35 and it is random. It
> happens in xdm waiting for login, in starting up KDE, in starting up
> daemons at boot up.
> Even when hanging in X, the Alt-SysRq still works to SUB.
> 2.5.34+xfs (from the SGI CVS) works fine. I will try a recent snapshot
> from them, with a more recent kernel.
> florin

This is different for me. I'm seeing a false negative from
is_orphaned_pgrp(). I'm poking around for the race as I go,
though it isn't in is_orphaned_pgrp() itself.
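
For readers unfamiliar with the term: POSIX calls a process group orphaned
when the parent of every member is either in the group itself or outside the
group's session. The sketch below is a toy model of that definition over a
hypothetical in-memory process table; Proc and is_orphaned are invented
names, and this is not the kernel's is_orphaned_pgrp() code.

```python
from typing import Dict, NamedTuple


class Proc(NamedTuple):
    ppid: int     # parent pid
    pgrp: int     # process group id
    session: int  # session id


def is_orphaned(pgrp: int, table: Dict[int, Proc]) -> bool:
    """Return True if no member of pgrp has a parent that is in the same
    session but in a different process group (the POSIX definition)."""
    for proc in table.values():
        if proc.pgrp != pgrp:
            continue
        parent = table.get(proc.ppid)
        if parent is None:
            # Assumption of this model: a parent missing from the table
            # (init, for instance) counts as outside the session.
            continue
        if parent.pgrp != pgrp and parent.session == proc.session:
            return False  # this parent keeps the group from being orphaned
    return True


if __name__ == "__main__":
    # A shell (pid 100) running a two-process job in group 200.
    table = {
        100: Proc(ppid=1, pgrp=100, session=100),
        200: Proc(ppid=100, pgrp=200, session=100),
        201: Proc(ppid=200, pgrp=200, session=100),
    }
    print(is_orphaned(200, table))

    # The shell exits and pid 200 is reparented to init.
    del table[100]
    table[200] = table[200]._replace(ppid=1)
    print(is_orphaned(200, table))
```

The answer matters for the SIGTTOU path because POSIX specifies that a
background write from an orphaned group fails with EIO instead of raising
the signal, so a wrong answer changes which of the two the program sees.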


Cheers,
Bill