2006-01-14 06:52:48

by Dave Jones

Subject: 2.6.15-git breaks Xorg on em64t

Andi,
Sometime in the last week something was introduced to Linus'
tree which makes my dual EM64T go nuts when X tries to start.
By "go nuts", I mean it does various random things, seen so
far..
- Machine check. (I'm convinced this isn't a hardware problem
despite the new addition telling me otherwise :)
- Reboot
- Total lockup
- NMI watchdog firing, and then lockup

I've tried backing out a handful of the x86-64 patches, but
didn't get too far: some of them are dependent on others, and
it quickly became a real mess to try to bisect where exactly it broke.

Any ideas for potential candidates to try & back out ?

Dave


2006-01-14 14:49:57

by Alessandro Suardi

Subject: Re: 2.6.15-git breaks Xorg on em64t

On 1/14/06, Dave Jones <[email protected]> wrote:
> Andi,
> Sometime in the last week something was introduced to Linus'
> tree which makes my dual EM64T go nuts when X tries to start.
> By "go nuts", I mean it does various random things, seen so
> far..
> - Machine check. (I'm convinced this isn't a hardware problem
> despite the new addition telling me otherwise :)
> - Reboot
> - Total lockup
> - NMI watchdog firing, and then lockup
>
> I've tried backing out a handful of the x86-64 patches, and
> didn't get too far, as some of them are dependent on others,
> it quickly became a real mess to try to bisect where exactly it broke.
>
> Any ideas for potential candidates to try & back out ?

Did you perhaps take a look at my report ? -git{6,7} were
bad for me, and the netconsole stack was, uhm, interesting.

http://www.ussg.iu.edu/hypermail/linux/kernel/0601.1/2130.html

-git8 made the problem disappear. Haven't tested more recent
snapshots.

Ciao,

--alessandro

"Somehow all you ever need is, never really quite enough, you know"

(Bruce Springsteen - "Reno")

2006-01-14 18:45:18

by Andi Kleen

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Saturday 14 January 2006 07:52, Dave Jones wrote:
> Andi,
> Sometime in the last week something was introduced to Linus'
> tree which makes my dual EM64T go nuts when X tries to start.
> By "go nuts", I mean it does various random things, seen so
> far..
> - Machine check. (I'm convinced this isn't a hardware problem
> despite the new addition telling me otherwise :)

Normally it should be impossible to cause machine checks from software
on Intel systems.

> - Reboot
> - Total lockup
> - NMI watchdog firing, and then lockup
>
> I've tried backing out a handful of the x86-64 patches, and
> didn't get too far, as some of them are dependent on others,
> it quickly became a real mess to try to bisect where exactly it broke.

Shouldn't be too bad - I did a binary search for something else and it worked
pretty well.
>
> Any ideas for potential candidates to try & back out ?

Does it work when you revert all x86-64 changes?

-Andi

2006-01-14 22:51:57

by Dave Jones

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > Andi,
> > Sometime in the last week something was introduced to Linus'
> > tree which makes my dual EM64T go nuts when X tries to start.
> > By "go nuts", I mean it does various random things, seen so
> > far..
> > - Machine check. (I'm convinced this isn't a hardware problem
> > despite the new addition telling me otherwise :)
>
> Normally it should be impossible to cause machine checks from software
> on Intel systems.

-git7+ is the only time I've ever seen one on this box.

> > - Reboot
> > - Total lockup
> > - NMI watchdog firing, and then lockup
> >
> > I've tried backing out a handful of the x86-64 patches, and
> > didn't get too far, as some of them are dependent on others,
> > it quickly became a real mess to try to bisect where exactly it broke.
>
> Shouldn't be too bad - I did a binary search for something else and it worked
> pretty well.

The patches reverted, but I hit problems like modprobe segfaulting during
boot, or reboots during APIC init.

> > Any ideas for potential candidates to try & back out ?
> Does it work when you revert all x86-64 changes?

-git6, which was the last one not to include the x86-64 changes, boots and runs fine.

I'll try a latest -git with x86-64 backed out when I get a chance.

Dave
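
For reference, the binary search over a dependent patch series discussed
above can be sketched as follows. This is only an illustration under
stated assumptions, not what anyone in the thread actually ran: the patch
names and the is_good() callback are hypothetical, and the series is
always applied as a prefix so that later patches never lose the earlier
ones they depend on.

```python
from typing import Callable, Sequence


def first_bad(patches: Sequence[str],
              is_good: Callable[[Sequence[str]], bool]) -> str:
    """Return the first patch whose inclusion makes the tree bad.

    Assumes the tree with no patches applied is good, the tree with the
    whole series applied is bad, and every prefix of the series builds.
    """
    lo, hi = 0, len(patches)  # prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(patches[:mid]):  # build and boot with the first `mid` patches
            lo = mid
        else:
            hi = mid
    return patches[hi - 1]


# Hypothetical series; in practice is_good() would mean "X starts cleanly".
series = [f"p{i}" for i in range(1, 9)]
culprit = first_bad(series, lambda applied: "p5" not in applied)
print(culprit)
```

With eight patches this needs three build-and-boot cycles instead of
eight, and it never tests a patch without its predecessors.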

2006-01-15 00:06:46

by Andi Kleen

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Saturday 14 January 2006 23:51, Dave Jones wrote:
> On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > Andi,
> > > Sometime in the last week something was introduced to Linus'
> > > tree which makes my dual EM64T go nuts when X tries to start.
> > > By "go nuts", I mean it does various random things, seen so
> > > far..
> > > - Machine check. (I'm convinced this isn't a hardware problem
> > > despite the new addition telling me otherwise :)
> >
> > Normally it should be impossible to cause machine checks from software
> > on Intel systems.
>
> -git7+ is the only time I've ever seen one on this box.

What happens when you apply

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup

?

Anyway, since there doesn't seem to be much interest in pretesting x86-64
patches before merge anymore, such things happen occasionally.

-Andi

2006-01-15 07:07:05

by Dave Jones

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > Andi,
> > > > Sometime in the last week something was introduced to Linus'
> > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > By "go nuts", I mean it does various random things, seen so
> > > > far..
> > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > despite the new addition telling me otherwise :)
> > >
> > > Normally it should be impossible to cause machine checks from software
> > > on Intel systems.
> >
> > -git7+ is the only time I've ever seen one on this box.
>
> What happens when you apply
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup

What does this apply to ? Is it dependent on something else not
merged yet ? I get rejects when I apply it to 2.6.15-git10.

I don't have time to fix it up by hand right now (I was going to set
a build going before I turned in for the night..)

I'll poke at it again tomorrow.

Another datapoint btw: I've another EM64T that works just fine.
The one that fails is the only one that isn't using onboard VGA,
this one has a PCIE Radeon. Given it happens when X is starting up,
it could be that the X radeon driver does something special which
is why others aren't seeing this.

Dave

2006-01-15 08:35:23

by Anton Altaparmakov

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sun, 15 Jan 2006, Dave Jones wrote:
> On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > Andi,
> > > > > Sometime in the last week something was introduced to Linus'
> > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > By "go nuts", I mean it does various random things, seen so
> > > > > far..
> > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > despite the new addition telling me otherwise :)
> > > >
> > > > Normally it should be impossible to cause machine checks from software
> > > > on Intel systems.
> > >
> > > -git7+ is the only time I've ever seen one on this box.
> >
> > What happens when you apply
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
>
> What does this apply to ? Is it dependent on something else not
> merged yet ? I get rejects when I apply it to 2.6.15-git10
>
> I don't have time to fix it up by hand right now (I was going to set
> a build going before I turned in for the night..)
>
> I'll poke at it again tomorrow.
>
> Another datapoint btw: I've another EM64T that works just fine.
> The one that fails is the only one that isn't using onboard VGA,
> this one has a PCIE Radeon. Given it happens when X is starting up,
> it could be that the X radeon driver does something special which
> is why others aren't seeing this.

On my i386 box with an ATI Radeon 9600 AGP card of sorts, trying to use the
radeon driver (ever since I installed SUSE 10.0) causes an immediate hard
lockup on startup of X. Sounds like your problem... Setting the driver
to VESA or using the ATI proprietary driver works fine (but only the ATI
driver is actually usable for me).

You could try that and if the others work for you like for me then it is
likely the problem is indeed 100% in the radeon driver somewhere...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2006-01-15 09:36:08

by Dave Airlie

Subject: Re: 2.6.15-git breaks Xorg on em64t

>
> Another datapoint btw: I've another EM64T that works just fine.
> The one that fails is the only one that isn't using onboard VGA,
> this one has a PCIE Radeon. Given it happens when X is starting up,
> it could be that the X radeon driver does something special which
> is why others aren't seeing this.
>

It might be due to the DRM update that went through, but I can't think
what might have caused it. If you back out the DRM merge, does it help
any?

Did the previous kernel have DRM support for that card?

Dave.

2006-01-15 15:48:35

by Andi Kleen

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sunday 15 January 2006 08:06, Dave Jones wrote:
> On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > Andi,
> > > > > Sometime in the last week something was introduced to Linus'
> > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > By "go nuts", I mean it does various random things, seen so
> > > > > far..
> > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > despite the new addition telling me otherwise :)
> > > >
> > > > Normally it should be impossible to cause machine checks from
> > > > software on Intel systems.
> > >
> > > -git7+ is the only time I've ever seen one on this box.
> >
> > What happens when you apply
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
>
> What does this apply to ? Is it dependent on something else not
> merged yet ? I get rejects when I apply it to 2.6.15-git10

To the other patches in the quilt queue (that is the x86-64 pending
tree that everybody is supposed to test, but nobody does)

I see there was a one liner reject with the memory hot add patch in there. I
reordered it now to come first. It was for -git9.

-Andi

2006-01-16 04:29:50

by Dave Jones

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sun, Jan 15, 2006 at 04:47:11PM +0100, Andi Kleen wrote:
> On Sunday 15 January 2006 08:06, Dave Jones wrote:
> > On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > > Andi,
> > > > > > Sometime in the last week something was introduced to Linus'
> > > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > > By "go nuts", I mean it does various random things, seen so
> > > > > > far..
> > > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > > despite the new addition telling me otherwise :)
> > > > >
> > > > > Normally it should be impossible to cause machine checks from
> > > > > software on Intel systems.
> > > >
> > > > -git7+ is the only time I've ever seen one on this box.
> > >
> > > What happens when you apply
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
> >
> > What does this apply to ? Is it dependent on something else not
> > merged yet ? I get rejects when I apply it to 2.6.15-git10
>
> To the other patches in the quilt queue (that is the x86-64 pending
> tree that everybody is supposed to test, but nobody does)
>
> I see there was a one liner reject with the memory hot add patch in there. I
> reordered it now to come first. It was for -git9.

It didn't solve my problem. It hung at X startup again, and then
the NMI watchdog triggered. I'll try reverting DaveA's DRM bits,
but I'm not optimistic, as a) this card isn't supported yet, and b) whilst
it's compiled in, it was disabled by X at startup due to me using dual-head.

Dave

2006-01-16 06:37:04

by Dave Jones

Subject: Re: 2.6.15-git breaks Xorg on em64t

On Sun, Jan 15, 2006 at 08:36:05PM +1100, Dave Airlie wrote:
> >
> > Another datapoint btw: I've another EM64T that works just fine.
> > The one that fails is the only one that isn't using onboard VGA,
> > this one has a PCIE Radeon. Given it happens when X is starting up,
> > it could be that the X radeon driver does something special which
> > is why others aren't seeing this.
> >
>
> It might be due to the DRM update that went through, but I can't think
> what might have caused it. If you back out the DRM merge, does it help
> any?

As it turns out, -git11 with all the DRM bits backed out gives me
a working X again.

> Did the previous kernel have DRM support for that card?

No. This is a 1002:5b60 / 1002:5b70 based card.

I had previously missed the 5b60 part in lspci output, so thinking
there was no 5b70 addition, I hadn't considered this as a suspect.
Mea Culpa. Looks like Andi is off the hook :-)

Any ideas for any debugging I can add ?

Dave
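
A card like this exposes two PCI functions, so it is easy to miss one of
the device IDs when reading lspci output by eye. A minimal sketch of
pulling every ID for a given vendor out of `lspci -n` style text is
below. The sample listing is made up for illustration (only the two
1002: IDs come from this thread), and device_ids() is a hypothetical
helper, not part of any existing tool.

```python
import re
from typing import List

ATI_VENDOR = "1002"  # PCI vendor ID used by ATI
# Matches a vendor:device pair such as "1002:5b60".
ID_PAIR = re.compile(r"\b([0-9a-f]{4}):([0-9a-f]{4})\b")


def device_ids(lspci_n_output: str, vendor: str = ATI_VENDOR) -> List[str]:
    """Return the device IDs of every function belonging to `vendor`."""
    ids = []
    for line in lspci_n_output.splitlines():
        match = ID_PAIR.search(line.lower())
        if match and match.group(1) == vendor:
            ids.append(match.group(2))
    return ids


# Illustrative input in the shape of `lspci -n` output (assumed, not captured).
SAMPLE = """\
00:00.0 0600: 8086:2580
01:00.0 0300: 1002:5b60
01:00.1 0380: 1002:5b70
"""

print(device_ids(SAMPLE))
```

Each returned ID can then be checked against the driver's PCI ID table
to see which functions a given kernel will try to bind.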

2006-01-16 12:11:32

by Dave Airlie

Subject: Re: 2.6.15-git breaks Xorg on em64t

> > > Another datapoint btw: I've another EM64T that works just fine.
> > > The one that fails is the only one that isn't using onboard VGA,
> > > this one has a PCIE Radeon. Given it happens when X is starting up,
> > > it could be that the X radeon driver does something special which
> > > is why others aren't seeing this.
> > >
> >
> > It might be due to the DRM update that went through, but I can't think
> > what might have caused it. If you back out the DRM merge, does it help
> > any?
>
> As it turns out, -git11 with all the DRM bits backed out gives me
> a working X again.
>
> > Did the previous kernel have DRM support for that card?
>
> No. This is a 1002:5b60 / 1002:5b70 based card.
>
> I had previously missed the 5b60 part in lspci output, so thinking
> there was no 5b70 addition, I hadn't considered this as a suspect.
> Mea Culpa. Looks like Andi is off the hook :-)
>
> Any ideas for any debugging I can add ?

Disable DRI in your xorg.conf first (remove the Load "dri" line). If that
works, which it most likely will, then send me an Xorg.0.log and a DRM
debug trace (modprobe drm debug=1).

I'm going to be looking at a 64-bit machine with PCIE radeon in the
next day or two, I'll be getting one myself post LCA more than
likely...

I think more than likely we are hitting a problem where the DRM sets
up the radeon RAM controller and the X server sets it up
differently... benh has fixes for this but they need to go into the X
server driver....

Dave.
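
As a rough illustration of the first step above, the Load "dri" line can
be commented out rather than deleted, which makes it easy to restore
later. The sketch below works on the config text only. The sample
Module section is an assumed example rather than anyone's real
xorg.conf, and disable_dri() is a hypothetical helper.

```python
import re

# A line consisting only of: Load "dri" (any indentation, any case).
LOAD_DRI = re.compile(r'^\s*Load\s+"dri"\s*$', re.IGNORECASE)


def disable_dri(conf_text: str) -> str:
    """Return conf_text with every Load "dri" line commented out."""
    out = []
    for line in conf_text.splitlines():
        if LOAD_DRI.match(line):
            out.append("#" + line)  # xorg.conf treats '#' as a comment
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Assumed example of a Module section, for illustration only.
SAMPLE_CONF = '''Section "Module"
    Load "glx"
    Load "dri"
EndSection
'''

print(disable_dri(SAMPLE_CONF))
```

Writing the result back to the real config file and restarting X is left
to the reader. Keep a copy of the original file first.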
