Andi,
Sometime in the last week something was introduced to Linus'
tree which makes my dual EM64T go nuts when X tries to start.
By "go nuts", I mean it does various random things, seen so
far..
- Machine check. (I'm convinced this isn't a hardware problem
despite the new addition telling me otherwise :)
- Reboot
- Total lockup
- NMI watchdog firing, and then lockup
I've tried backing out a handful of the x86-64 patches, but
didn't get too far: some of them are dependent on others, and
it quickly became a real mess to try to bisect where exactly it broke.
Any ideas for potential candidates to try & back out ?
Dave
On 1/14/06, Dave Jones <[email protected]> wrote:
> Andi,
> Sometime in the last week something was introduced to Linus'
> tree which makes my dual EM64T go nuts when X tries to start.
> By "go nuts", I mean it does various random things, seen so
> far..
> - Machine check. (I'm convinced this isn't a hardware problem
> despite the new addition telling me otherwise :)
> - Reboot
> - Total lockup
> - NMI watchdog firing, and then lockup
>
> I've tried backing out a handful of the x86-64 patches, and
> didn't get too far, as some of them are dependent on others,
> it quickly became a real mess to try to bisect where exactly it broke.
>
> Any ideas for potential candidates to try & back out ?
Did you perhaps take a look at my report ? -git{6,7} were
bad for me, and the netconsole stack was, uhm, interesting.
http://www.ussg.iu.edu/hypermail/linux/kernel/0601.1/2130.html
-git8 made the problem disappear. Haven't tested more recent
snapshots.
Ciao,
--alessandro
"Somehow all you ever need is, never really quite enough, you know"
(Bruce Springsteen - "Reno")
On Saturday 14 January 2006 07:52, Dave Jones wrote:
> Andi,
> Sometime in the last week something was introduced to Linus'
> tree which makes my dual EM64T go nuts when X tries to start.
> By "go nuts", I mean it does various random things, seen so
> far..
> - Machine check. (I'm convinced this isn't a hardware problem
> despite the new addition telling me otherwise :)
Normally it should be impossible to cause machine checks from software
on Intel systems.
> - Reboot
> - Total lockup
> - NMI watchdog firing, and then lockup
>
> I've tried backing out a handful of the x86-64 patches, and
> didn't get too far, as some of them are dependent on others,
> it quickly became a real mess to try to bisect where exactly it broke.
Shouldn't be too bad - I did a binary search for something else and it worked
pretty well.
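
A minimal sketch of the kind of binary search meant here, assuming the
patches form an ordered series and that each test applies only a prefix of
that series (which keeps the inter-patch dependencies intact). The names
`first_bad_patch` and `is_bad` are hypothetical; `is_bad` stands in for
"build, boot, start X, and see whether the box goes nuts".

```python
from typing import Callable, Sequence


def first_bad_patch(patches: Sequence[str], is_bad: Callable[[int], bool]) -> int:
    """Return the index of the first patch that turns a good tree bad.

    is_bad(n) reports whether a tree with patches[0:n] applied misbehaves.
    Assumptions: is_bad(0) is False, is_bad(len(patches)) is True, and a
    tree that has gone bad stays bad as further patches are applied.
    """
    good, bad = 0, len(patches)  # 'good' patches applied works, 'bad' does not
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # breakage is already present with 'mid' patches applied
        else:
            good = mid   # the first 'mid' patches are fine
    return bad - 1       # patches[bad - 1] is the one that introduced the problem
```

With N patches this needs roughly log2(N) build-and-boot cycles, and because
only prefixes of the series are applied, no patch is ever tested without the
ones it depends on.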
>
> Any ideas for potential candidates to try & back out ?
Does it work when you revert all x86-64 changes?
-Andi
On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > Andi,
> > Sometime in the last week something was introduced to Linus'
> > tree which makes my dual EM64T go nuts when X tries to start.
> > By "go nuts", I mean it does various random things, seen so
> > far..
> > - Machine check. (I'm convinced this isn't a hardware problem
> > despite the new addition telling me otherwise :)
>
> Normally it should be impossible to cause machine checks from software
> on Intel systems.
-git7+ is the only time I've ever seen one on this box.
> > - Reboot
> > - Total lockup
> > - NMI watchdog firing, and then lockup
> >
> > I've tried backing out a handful of the x86-64 patches, and
> > didn't get too far, as some of them are dependent on others,
> > it quickly became a real mess to try to bisect where exactly it broke.
>
> Shouldn't be too bad - I did a binary search for something else and it worked
> pretty well.
The patches reverted, but I hit problems like modprobe segfaulting during
boot, or reboots during APIC init.
> > Any ideas for potential candidates to try & back out ?
> Does it work when you revert all x86-64 changes?
-git6, which was the last one not to include the x86-64 changes, boots and runs fine.
I'll try a latest -git with x86-64 backed out when I get a chance.
Dave
On Saturday 14 January 2006 23:51, Dave Jones wrote:
> On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > Andi,
> > > Sometime in the last week something was introduced to Linus'
> > > tree which makes my dual EM64T go nuts when X tries to start.
> > > By "go nuts", I mean it does various random things, seen so
> > > far..
> > > - Machine check. (I'm convinced this isn't a hardware problem
> > > despite the new addition telling me otherwise :)
> >
> > Normally it should be impossible to cause machine checks from software
> > on Intel systems.
>
> -git7+ is the only time I've ever seen one on this box.
What happens when you apply
ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
?
Anyway, since there doesn't seem to be much interest anymore in pretesting
x86-64 patches before merge, such things happen occasionally.
-Andi
On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > Andi,
> > > > Sometime in the last week something was introduced to Linus'
> > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > By "go nuts", I mean it does various random things, seen so
> > > > far..
> > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > despite the new addition telling me otherwise :)
> > >
> > > Normally it should be impossible to cause machine checks from software
> > > on Intel systems.
> >
> > -git7+ is the only time I've ever seen one on this box.
>
> What happens when you apply
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
What does this apply to ? Is it dependent on something else not
merged yet ? I get rejects when I apply it to 2.6.15-git10.
I don't have time to fix it up by hand right now (I was going to set
a build going before I turned in for the night..)
I'll poke at it again tomorrow.
Another datapoint btw: I've another EM64T that works just fine.
The one that fails is the only one that isn't using onboard VGA,
this one has a PCIE Radeon. Given it happens when X is starting up,
it could be that the X radeon driver does something special which
is why others aren't seeing this.
Dave
On Sun, 15 Jan 2006, Dave Jones wrote:
> On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > Andi,
> > > > > Sometime in the last week something was introduced to Linus'
> > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > By "go nuts", I mean it does various random things, seen so
> > > > > far..
> > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > despite the new addition telling me otherwise :)
> > > >
> > > > Normally it should be impossible to cause machine checks from software
> > > > on Intel systems.
> > >
> > > -git7+ is the only time I've ever seen one on this box.
> >
> > What happens when you apply
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
>
> What does this apply to ? Is it dependent on something else not
> merged yet ? I get rejects when I apply it to 2.6.15-git10
>
> I don't have time to fix it up by hand right now (I was going to set
> a build going before I turned in for the night..)
>
> I'll poke at it again tomorrow.
>
> Another datapoint btw: I've another EM64T that works just fine.
> The one that fails is the only one that isn't using onboard VGA,
> this one has a PCIE Radeon. Given it happens when X is starting up,
> it could be that the X radeon driver does something special which
> is why others aren't seeing this.
On my i386 box with an ATI Radeon 9600 AGP card of sorts, trying to use the
radeon driver (ever since I installed SUSE 10.0) causes an immediate hard
lockup on startup of X. Sounds like your problem... Setting the driver
to VESA or using the ATI proprietary driver works fine (but only the ATI
driver is actually usable for me).
You could try that, and if the others work for you like they do for me, then
it is likely the problem is indeed 100% in the radeon driver somewhere...
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
>
> Another datapoint btw: I've another EM64T that works just fine.
> The one that fails is the only one that isn't using onboard VGA,
> this one has a PCIE Radeon. Given it happens when X is starting up,
> it could be that the X radeon driver does something special which
> is why others aren't seeing this.
>
It might be due to the DRM update that went through, but I can't think
what might have caused it. If you back out the DRM merge, does it help
any?
Did the previous kernel have DRM support for that card?
Dave.
On Sunday 15 January 2006 08:06, Dave Jones wrote:
> On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > Andi,
> > > > > Sometime in the last week something was introduced to Linus'
> > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > By "go nuts", I mean it does various random things, seen so
> > > > > far..
> > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > despite the new addition telling me otherwise :)
> > > >
> > > > Normally it should be impossible to cause machine checks from
> > > > software on Intel systems.
> > >
> > > -git7+ is the only time I've ever seen one on this box.
> >
> > What happens when you apply
> >
> > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
>
> What does this apply to ? Is it dependent on something else not
> merged yet ? I get rejects when I apply it to 2.6.15-git10
To the other patches in the quilt queue (that is the x86-64 pending
tree that everybody is supposed to test, but nobody does).
I see there was a one-line reject with the memory hot add patch in there. I
reordered it now to come first. It was for -git9.
-Andi
On Sun, Jan 15, 2006 at 04:47:11PM +0100, Andi Kleen wrote:
> On Sunday 15 January 2006 08:06, Dave Jones wrote:
> > On Sun, Jan 15, 2006 at 01:05:07AM +0100, Andi Kleen wrote:
> > > On Saturday 14 January 2006 23:51, Dave Jones wrote:
> > > > On Sat, Jan 14, 2006 at 07:43:27PM +0100, Andi Kleen wrote:
> > > > > On Saturday 14 January 2006 07:52, Dave Jones wrote:
> > > > > > Andi,
> > > > > > Sometime in the last week something was introduced to Linus'
> > > > > > tree which makes my dual EM64T go nuts when X tries to start.
> > > > > > By "go nuts", I mean it does various random things, seen so
> > > > > > far..
> > > > > > - Machine check. (I'm convinced this isn't a hardware problem
> > > > > > despite the new addition telling me otherwise :)
> > > > >
> > > > > Normally it should be impossible to cause machine checks from
> > > > > software on Intel systems.
> > > >
> > > > -git7+ is the only time I've ever seen one on this box.
> > >
> > > What happens when you apply
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/page-table-setup
> >
> > What does this apply to ? Is it dependent on something else not
> > merged yet ? I get rejects when I apply it to 2.6.15-git10
>
> To the other patches in the quilt queue (that is the x86-64 pending
> tree that everybody is supposed to test, but nobody does)
>
> I see there was a one liner reject with the memory hot add patch in there. I
> reordered it now to come first. It was for -git9.
It didn't solve my problem. It hung at X startup again, and then
the NMI watchdog triggered. I'll try reverting DaveA's DRM bits,
but I'm not optimistic, as a) this card isn't supported yet, and b) whilst
it's compiled in, it was disabled by X at startup due to me using dual-head.
Dave
On Sun, Jan 15, 2006 at 08:36:05PM +1100, Dave Airlie wrote:
> >
> > Another datapoint btw: I've another EM64T that works just fine.
> > The one that fails is the only one that isn't using onboard VGA,
> > this one has a PCIE Radeon. Given it happens when X is starting up,
> > it could be that the X radeon driver does something special which
> > is why others aren't seeing this.
> >
>
> It might be due to the DRM update that went through, but I can't think
> what might have caused it, if you backout the DRM merge does it help
> any?
As it turns out, -git11 with all the DRM bits backed out gives me
a working X again.
> did the previous kernel have DRM support for that card?
No. This is a 1002:5b60 / 1002:5b70 based card.
I had previously missed the 5b60 part in lspci output, so thinking
there was no 5b70 addition, I hadn't considered this as a suspect.
Mea Culpa. Looks like Andi is off the hook :-)
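
For anyone wanting to avoid the same slip, here is a rough sketch that pulls
every vendor:device pair for one vendor out of `lspci -n` style output, so a
second function of the same card is harder to overlook. The helper name
`device_ids` and the sample lines are made up for illustration; only the two
IDs 1002:5b60 and 1002:5b70 come from this thread.

```python
import re
from typing import List

# Matches a "vendor:device" pair of four hex digits each, e.g. "1002:5b60".
# Slot addresses such as "01:00.0" do not match because their fields are shorter.
PCI_ID = re.compile(r"\b([0-9a-f]{4}):([0-9a-f]{4})\b", re.IGNORECASE)


def device_ids(lspci_n_output: str, vendor: str = "1002") -> List[str]:
    """Return all device IDs listed for the given vendor ID, in order of appearance."""
    found = []
    for line in lspci_n_output.splitlines():
        for ven, dev in PCI_ID.findall(line):
            if ven.lower() == vendor.lower():
                found.append(dev.lower())
    return found


# Illustrative input only: slot numbers and class codes are invented.
SAMPLE = (
    "00:1f.2 0101: 8086:2652\n"
    "01:00.0 0300: 1002:5b60\n"
    "01:00.1 0380: 1002:5b70\n"
)
ati_devices = device_ids(SAMPLE)
```

Each returned ID can then be checked against the list of IDs the DRM radeon
driver claims in the kernel being tested.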
Any ideas for any debugging I can add ?
Dave
> > > Another datapoint btw: I've another EM64T that works just fine.
> > > The one that fails is the only one that isn't using onboard VGA,
> > > this one has a PCIE Radeon. Given it happens when X is starting up,
> > > it could be that the X radeon driver does something special which
> > > is why others aren't seeing this.
> > >
> >
> > It might be due to the DRM update that went through, but I can't think
> > what might have caused it, if you backout the DRM merge does it help
> > any?
>
> As it turns out, -git11 with all the DRM bits backed out gives me
> a working X again.
>
> > did the previous kernel have DRM support for that card?
>
> No. This is 1002:5b60 / 1002:5b70 based card.
>
> I had previously missed the 5b60 part in lspci output, so thinking
> there was no 5b70 addition, I hadn't considered this as a suspect.
> Mea Culpa. Looks like Andi is off the hook :-)
>
> Any ideas for any debugging I can add ?
Disable DRI in your xorg.conf first (remove the Load "dri" line). If that
works, which it most likely will, then send me an Xorg.0.log and a DRM
debug trace (modprobe drm debug=1).
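
As a sketch of the first step only: the snippet below comments out the
Load "dri" line in the text of an xorg.conf rather than deleting it, so it is
easy to restore later. The function name `disable_dri` and the sample Module
section are assumptions for illustration, not taken from anyone's actual
config.

```python
import re

# A Load "dri" line, with optional leading indentation and trailing spaces.
LOAD_DRI = re.compile(r'^(\s*)(Load\s+"dri")\s*$', re.IGNORECASE)


def disable_dri(xorg_conf: str) -> str:
    """Return the config text with any Load "dri" line commented out with '#'."""
    out = []
    for line in xorg_conf.splitlines():
        match = LOAD_DRI.match(line)
        if match:
            # Keep the original indentation and prefix the directive with '#'.
            out.append(f"{match.group(1)}# {match.group(2)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical Module section used only to exercise the function.
SAMPLE_CONF = 'Section "Module"\n    Load "glx"\n    Load "dri"\nEndSection\n'
patched = disable_dri(SAMPLE_CONF)
```

Lines that are already commented out, and other Load lines, are left alone.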
I'm going to be looking at a 64-bit machine with a PCIE Radeon in the
next day or two, and I'll more than likely be getting one myself
post LCA...
I think more than likely we are hitting a problem where the DRM sets
up the radeon RAM controller and the X server sets it up
differently... benh has fixes for this but they need to go into the X
server driver....
Dave.