2008-01-08 11:41:16

by Matt

Subject: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

Hi everyone,

sorry for the long delay

- I first had to get home & set up my rig to reproduce this hardlock
(repeatedly hardlocking / shutting down the laptop doesn't do too good
to the new hdd ;) )

and fortunately I was successful :)

sorry for the bad quality of the pics (they were taken with my phone):

http://omploader.org/vYWU1/moto_0025.jpg
http://omploader.org/vYWU2/moto_0026.jpg

steps to reproduce:
1.) log on
2.) startx
3.) opening some pure 64bit apps == working, no locks
4.) opening 32bit-apps (such as firefox-bin, thunderbird-bin) == hard
lock, only pulling power cord (on laptop) or reset button (rig) works,
magic sysrq key doesn't (keyboard & mouse == dead)
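
A quick way to tell which binaries are 32bit and which are 64bit is to look at the ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32bit and 2 for 64bit objects. A minimal sketch in Python follows; the function names are made up for illustration and are not part of any tool mentioned in this thread.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF specification: ELFCLASS32 = 1, ELFCLASS64 = 2
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}


def elf_class(header):
    """Classify a file from its first bytes; 'not ELF' if the magic is missing."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    return EI_CLASS_NAMES.get(header[4], "unknown")


def elf_class_of_file(path):
    """Read just enough of the file to inspect the ELF identification bytes."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {elf_class_of_file(path)}")
```

Launchers that are really shell wrappers will show up as "not ELF", so the binary they end up executing has to be checked instead.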

I'm currently writing from my "rescue system" (winxp ;) )
so if you need my kernel-config or some more info about the system, please tell me

Cheers

Mat


2008-01-09 01:05:38

by H. Peter Anvin

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

Matthew wrote:
> Hi everyone,
>
> sorry for the long delay
>
> - I first had to get home & set up my rig to reproduce this hardlock
> (repeatedly hardlocking / shutting down the laptop doesn't do too good
> to the new hdd ;) )
>
> and fortunately I was successful :)
>
> sorry for the bad quality of the pics (they were taken with my phone):
>
> http://omploader.org/vYWU1/moto_0025.jpg
> http://omploader.org/vYWU2/moto_0026.jpg
>
> steps to reproduce:
> 1.) log on
> 2.) startx
> 3.) opening some pure 64bit apps == working, no locks
> 4.) opening 32bit-apps (such as firefox-bin, thunderbird-bin) == hard
> lock, only pulling power cord (on laptop) or reset button (rig) works,
> magic sysrq key doesn't (keyboard & mouse == dead)
>
> I'm currently writing from my "rescue system" (winxp ;) )
> so if you need my kernel-config or some more info of the system please tell
>

I have been unable to reproduce your problem here, and I notice you have
the proprietary, highly invasive and closed-source Nvidia driver
installed in your kernel.

Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
reproduce the problem that way?

If you *do* reproduce the problem that way, it would be extremely
helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
(not vmlinuz/bzImage) file that goes with the crash dump screenshot.
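
Some background on why vmlinux is requested: it is the uncompressed ELF kernel image, and when built with CONFIG_DEBUG_INFO it carries the debug data needed to map an address from an oops to a function and source line. A rough sketch of doing that lookup with binutils' addr2line, driven from Python; the helper names are hypothetical and addr2line must be installed.

```python
import subprocess


def addr2line_command(vmlinux, addresses):
    """Build the addr2line invocation for a list of hex address strings."""
    # -e: image with debug info, -f: print function names, -i: show inlined frames
    return ["addr2line", "-e", vmlinux, "-f", "-i", *addresses]


def resolve(vmlinux, addresses):
    """Run addr2line and return its textual output (function and file:line pairs)."""
    result = subprocess.run(
        addr2line_command(vmlinux, addresses),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The addresses would be taken from the RIP line or the call trace in the screenshot, and the lookup is only meaningful if the vmlinux is exactly the build that produced the crash.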

Thanks!

-hpa

2008-01-10 09:06:15

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> I have been unable to reproduce your problem here, and I notice you have
> the proprietary, highly invasive and closed-source Nvidia driver
> installed in your kernel.
>
> Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
> reproduce the problem that way?
>
> If you *do* reproduce the problem that way, it would be extremely
> helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
> (not vmlinuz/bzImage) file that goes with the crash dump screenshot.
>
> Thanks!

I was able to reproduce it with removed nvidia module (rmmod nvidia) &
nv driver, and will post the pictures later if I find some time (it
was the same function if I recall right)
do you also need: CONFIG_DEBUG_BUGVERBOSE enabled ?

>
> -hpa
>

Mat

2008-01-10 09:42:48

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Matthew <[email protected]> wrote:

> > I have been unable to reproduce your problem here, and I notice you have
> > the proprietary, highly invasive and closed-source Nvidia driver
> > installed in your kernel.
> >
> > Can you try using the "nv" or "vesa" (unaccelerated) Xorg drivers and
> > reproduce the problem that way?
> >
> > If you *do* reproduce the problem that way, it would be extremely
> > helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
> > (not vmlinuz/bzImage) file that goes with the crash dump screenshot.
> >
> > Thanks!
>
> I was able to reproduce it with removed nvidia module (rmmod nvidia) &
> nv driver, and will post the pictures later if I find some time (it
> was the same function if I recall right) do you also need:
> CONFIG_DEBUG_BUGVERBOSE enabled ?

really, that module does all sorts of nasty stuff when inserted (and
then removed), so just to make sure (because you are about to crash your
box again to take a picture), could you try to boot up without ever
loading the nvidia module, even once?

Ingo

2008-01-10 12:44:08

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> really, that module does all sorts of nasty stuff when inserted (and
> then removed), so just to make sure (because you are about to crash your
> box again to take a picture), could you try to boot up without ever
> loading the nvidia module, even once?

and it still happens ;(
I un-emerged nvidia-drivers & checked via dmesg | grep nv -> it wasn't
loaded, but the box still hung
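
Grepping dmesg only shows messages that are still in the kernel ring buffer. Listing /proc/modules is a more direct check of what is loaded right now. A small sketch, assuming the usual whitespace-separated /proc/modules layout with the module name in the first column (the helper names are illustrative only):

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def is_module_loaded(name, proc_modules_text):
    """True if a module with exactly this name is currently listed."""
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    with open("/proc/modules") as f:
        text = f.read()
    print("nvidia loaded:", is_module_loaded("nvidia", text))
```

This only answers whether the module is loaded at the moment. Whether a proprietary module was loaded at any point since boot shows up in the taint flags instead, which are not cleared when the module is removed.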

it's a little tricky to reproduce:
- as root: firefox-bin and thunderbird-bin wouldn't trigger it
- as a regular user (with the already-used profile directories of both
  apps): thunderbird-bin triggers it more reliably
maybe it has to do with the x86 compatibility apps of gentoo?
gentoo amd64 users with 32bit firefox & thunderbird - is anyone able to
reproduce it?
it seemingly is being caused by softirq (see pictures; the zen-sources
is also using parts of rt-kernel); approx 1 minute later there also
was a spinlock lockup by syslog-ng (?)

I'll recompile the newest git-sources and see if it's still triggered
with hardirq & softirq disabled ...

http://www.kerneloftruth.neucode.org/other/crash_ia32_64/ (<--
omploader is down so I'll host the picture somewhere else)
hope everything relevant can be seen / read there ...

I'll recompile the kernel in question with debug-info probably this
evening - if I find some time, you guys also need frame-pointers set ?

this also happens with rc7-based kernels, btw

> Ingo
>

Mat

2008-01-10 12:48:32

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Matthew <[email protected]> wrote:

> this also happens with rc7-based kernels, btw

hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you? Or
any patches ontop of it? (x86.git perhaps?)

Ingo

2008-01-10 12:59:45

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

>
> > this also happens with rc7-based kernels, btw
>
> hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you? Or
> any patches ontop of it? (x86.git perhaps?)

see first post / mail (there are a few additional patches / trees
included: badram, wireless, alsa, tuxonice, madwifi, reiser4,
sched-devel, realtime-lsm, powertop, mactel)

>since yesterday my laptop kept on hard-locking when launching 32bit
>binaries / apps
>I didn't know what to do but

>Miguel Botón was the one pointing me in the right direction, namely bisect :)

>kudos to him & the others involved in his zen-sources project:
>http://repo.or.cz/w/linux-2.6/zen-sources.git

>bisect said the following is the culprit:

so I guess I need to counter-check it against your realtime-tree:
is it the following ?
http://git.eu.kernel.org/?p=linux/kernel/git/cloos/rt-2.6.git;a=summary
(it's currently at rc5 ?)

or is hardirq / softirq also included in your sched-devel tree ?
http://git.eu.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=summary

> Ingo

Mat

2008-01-10 13:15:36

by Ed Tomlinson

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

On January 10, 2008, Ingo Molnar wrote:
>
> * Matthew <[email protected]> wrote:
>
> > this also happens with rc7-based kernels, btw
>
> hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you? Or
> any patches ontop of it? (x86.git perhaps?)

Matthew is not alone with this problem. I have it too. It's not new here. It's
been happening as long as I have had gentoo amd64 installed. It can be hard
to reproduce but eventually, when 32 bit apps are used, my box bricks. There is
nothing in the logs (nor on a serial console) - the box just freezes.

My kernel is _not_ tainted. The kernel is currently 2.6.23-gentoo-r5-crc with
the latest cfs backport applied; the exact kernel does not seem to matter though, as it has
happened with all kernels I have tried (mm, linux and gentoo variants).

The processor is:

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 2800+
stepping : 10
cpu MHz : 1808.802
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good
bogomips : 3620.77
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

I asked about this on lkml before and was told it was probably a cpu/hardware issue... It's
interesting that Matthew is also running gentoo.

Thanks,
Ed Tomlinson

2008-01-10 13:25:14

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Matthew <[email protected]> wrote:

> > > this also happens with rc7-based kernels, btw
> >
> > hm, exactly what rc7 based kernel? Vanilla 2.6.24-rc7, built by you?
> > Or any patches ontop of it? (x86.git perhaps?)
>
> see first post / mail (there are a few additional patches / trees
> included: badram, wireless, alsa, tuxonice, madwifi, reiser4,
> sched-devel, realtime-lsm, powertop, mactel)

problem being, that the bad patch that was identified in the first post:

Author: Roland McGrath <[email protected]>
Date: Sun Dec 23 12:47:41 2007 +0100

x86 user_regset math_emu

This converts the ptrace/signal accessors for i387 math_emu
state to the user_regset interface style, and calls these
from the old interfaces.

is only included in x86.git AFAIK. Maybe this commit is not really the
culprit?

Ingo

2008-01-10 13:33:18

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Ed Tomlinson <[email protected]> wrote:

> Matthew is not alone with this problem. I have it too. It's not new
> here. It's been happening as long as I have had gentoo amd64
> installed. It can be hard to reproduce but eventually, when 32 bit
> apps are used, my box bricks. There is nothing in the logs (nor on a
> serial console) - the box just freezes.
>
> My kernel is _not_ tainted. [...]

ok, good. A series of questions:

- can you reproduce it from the VGA console?

- if yes, does booting with "nmi_watchdog=2 idle=poll" give you a
working NMI watchdog? (working NMI watchdog means the NMI counts
increase for all cores in /proc/interrupts).

if still 'yes', then try to reproduce the hard hang on the VGA text
console - do you perhaps get an NMI backtrace printed within 1-2 minutes
after the hard hang happens? If yes then take a photo of that or write
it down.
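
The "NMI counts increase for all cores" check can be automated by comparing two snapshots of /proc/interrupts. A minimal sketch, assuming the NMI counters sit on a line starting with "NMI:" followed by one number per CPU; the function names and the 10 second interval are arbitrary choices:

```python
import time


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # trailing description, if the kernel prints one
                counts.append(int(field))
            return counts
    return []  # no NMI line at all


def watchdog_ticking(before, after):
    """True only if every CPU's NMI count increased between the snapshots."""
    return bool(before) and len(before) == len(after) and all(
        b < a for b, a in zip(before, after)
    )


def snapshot():
    with open("/proc/interrupts") as f:
        return nmi_counts(f.read())


if __name__ == "__main__":
    first = snapshot()
    time.sleep(10)
    print("NMI watchdog ticking:", watchdog_ticking(first, snapshot()))
```

An empty result from nmi_counts means there is no NMI line to compare at all, which is reported as "not ticking".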

Ingo

2008-01-10 15:56:56

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> > My kernel is _not_ tainted. [...]
>

this time my kernel isn't tainted either (comm: thunderbird-bin Not
tainted; http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/moto_0041.jpg)
but still hardlocks:
http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/


> ok, good. A series of questions:
>
> - can you reproduce it from the VGA console?
>

yes

> - if yes, does booting with "nmi_watchdog=2 idle=poll" give you a
> working NMI watchdog? (working NMI watchdog means the NMI counts
> increase for all cores in /proc/interrupts).
>

no, I get err=-16
(watchdog is broken: cat /proc/interrupts | grep NMI reveals nothing)

> if still 'yes', then try to reproduce the hard hang on the VGA text
> console

yes, despite broken watchdog

steps:
1) startx
2) change to tty2, log in; DISPLAY=:0 thunderbird-bin
3) wait until it hardlocks

known apps to trigger that locking:
- realplayer
- thunderbird-bin (2.0.0.9)
- mozilla-firefox-bin (2.0.0.11)
(all included in portage-tree)

apps not triggering:
- skype (not tested that thoroughly (yet))
- ...

> - do you perhaps get an NMI backtrace printed within 1-2 minutes
> after the hard hang happens? If yes then take a photo of that or write
> it down.
>

only backtrace so far (once):
http://kerneloftruth.neucode.org/other/crash_ia32_64/moto_0040.jpg

I'll tar the whole kernel-directory & modules so that you'll be able
to reproduce it more easily (if wanted), is there a place where I
could upload it (it weighs around 300-400 MBs so that'll take some
time ;) )
I got work to do so that'll be all for now, I hope you'll be able to
find the culprit soon ...

> Ingo
>

Mat

2008-01-10 16:38:40

by Ed Tomlinson

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

On January 10, 2008, Ingo Molnar wrote:
>
> * Ed Tomlinson <[email protected]> wrote:
>
> > Matthew is not alone with this problem. I have it too. It's not new
> > here. It's been happening as long as I have had gentoo amd64
> > installed. It can be hard to reproduce but eventually, when 32 bit
> > apps are used, my box bricks. There is nothing in the logs (nor on a
> > serial console) - the box just freezes.
> >
> > My kernel is _not_ tainted. [...]
>
> ok, good. A series of questions:
>
> - can you reproduce it from the VGA console?

No - though I do have a serial console to see logs.

> - if yes, does booting with "nmi_watchdog=2 idle=poll" give you a
> working NMI watchdog? (working NMI watchdog means the NMI counts
> increase for all cores in /proc/interrupts).

booting with the above gives me an incrementing NMI counter in /proc/interrupts

> if still 'yes', then try to reproduce the hard hang on the VGA text
> console - do you perhaps get an NMI backtrace printed within 1-2 minutes
> after the hard hang happens? If yes then take a photo of that or write
> it down.

I am booted with the NMI watchdog and serial consoles active running apps that
eventually will trigger a hang...

Ed Tomlinson

2008-01-10 21:10:47

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> If you *do* reproduce the problem that way, it would be extremely
> helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
> (not vmlinuz/bzImage) file that goes with the crash dump screenshot.

I *did* reproduce it that way and enabled the above mentioned option
and the CONFIG_DEBUG_BUGVERBOSE=y thingy and CONFIG_FRAME_POINTER=y,
too
hope that's enough

here you go: http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/vmlinux
(I hope it was uploaded completely):

md5sum: 0be124557bafebaebd69be2138329ef6
sha256sum: 638d7a0dc36caa8eedd77e2ebeae6e8b54db74466f9d28f769c9cacf2ace0e0e
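
Anyone downloading the image can check that it arrived completely by recomputing both digests and comparing them with the values above. A minimal sketch using Python's hashlib; the function names and the local file name "vmlinux" are assumptions:

```python
import hashlib

# Digests as posted in this message
EXPECTED_MD5 = "0be124557bafebaebd69be2138329ef6"
EXPECTED_SHA256 = "638d7a0dc36caa8eedd77e2ebeae6e8b54db74466f9d28f769c9cacf2ace0e0e"


def digests(chunks):
    """Return (md5, sha256) hex digests of an iterable of byte chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    for chunk in chunks:
        md5.update(chunk)
        sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def file_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in pieces so a large image is not read at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


if __name__ == "__main__":
    ok = digests(file_chunks("vmlinux")) == (EXPECTED_MD5, EXPECTED_SHA256)
    print("download complete and intact:", ok)
```

A mismatch most likely means a truncated upload or download, in which case fetching the file again is the first thing to try.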

updated kernel-config:
http://omploader.org/vYWhw/2.6.24-rc6-zen0_bisect%20(latest_config)

> -hpa
>

Mat

2008-01-10 21:18:40

by H. Peter Anvin

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

Matthew wrote:
>> If you *do* reproduce the problem that way, it would be extremely
>> helpful if you could enable CONFIG_DEBUG_INFO and provide the vmlinux
>> (not vmlinuz/bzImage) file that goes with the crash dump screenshot.
>
> I *did* reproduce it that way and enabled the above mentioned option
> and the CONFIG_DEBUG_BUGVERBOSE=y thingy and CONFIG_FRAME_POINTER=y,
> too
> hope that's enough
>
> here you go: http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/vmlinux
> (I hope it was uploaded completely):
>
> md5sum: 0be124557bafebaebd69be2138329ef6
> sha256sum: 638d7a0dc36caa8eedd77e2ebeae6e8b54db74466f9d28f769c9cacf2ace0e0e
>
> updated kernel-config:
> http://omploader.org/vYWhw/2.6.24-rc6-zen0_bisect%20(latest_config)
>

Great!! Downloading now.

Do you have the error dump output to go along with this, too?

Huge thanks,

-hpa

2008-01-10 21:52:42

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> Do you have the error dump output to go along with this, too?
>

no, unfortunately no kernel crash dump on disk ;( (I hope I understood
it right, I'm pretty noobish concerning collection of error data ;) )

I only have the console-output of the hardlock in:
http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/

and a call trace from an earlier crash but I don't know if it's worth anything:
http://kerneloftruth.neucode.org/other/crash_ia32_64/moto_0040.jpg

just FYI: the crash also occurs with preemptible rcu disabled (classic
rcu) (just saw that I had it enabled in that kernel ...)


> Huge thanks,
>

:)

> -hpa
>

Mat

2008-01-10 21:58:23

by H. Peter Anvin

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

Matthew wrote:
>> Do you have the error dump output to go along with this, too?
>>
>
> no, unfortunately no kernel crash dump on disk ;( (I hope I understood
> it right, I'm pretty noobish concerning collection of error data ;) )
>
> I only have the console-output of the hardlock in:
> http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/

That's fine, but that was collected with the vmlinux image you sent me,
right?

-hpa

2008-01-10 22:28:39

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

>
> That's fine, but that was collected with the vmlinux image you sent me,
> right?
>

no, but now it is:
http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/latest/
(the other one was taken before I added/selected the demanded features)

what puzzles me is that it doesn't say "tainted" (not tainted) in the
upper part & "tainted" in the lower part
I apologize for the bad quality of the pictures ...

luckily there was also an additional call trace this time (syslog-ng
Tainted: G D) (don't know what it means), nvidia-module wasn't
loaded and no other additional proprietary modules were loaded AFAIK
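
On the "Tainted: G D" question: the letters are per-flag markers. "G" (as opposed to "P") means no proprietary module has been loaded, and "D" is set once the kernel has already oopsed, which would explain why the first trace says "Not tainted" and a later one from the same boot does not. A small decoder sketch; the table reflects the flag letters as I understand them for kernels of this era, and the function name is made up:

```python
# Taint letters as printed after "Tainted:" in an oops header
TAINT_FLAGS = {
    "P": "a proprietary module has been loaded",
    "G": "no proprietary module has been loaded",
    "F": "a module was force-loaded",
    "S": "SMP kernel running on hardware not rated for SMP",
    "R": "a module was force-unloaded",
    "M": "a machine check exception occurred",
    "B": "a bad page was encountered",
    "U": "taint was requested from userspace",
    "D": "the kernel has already oopsed or hit a BUG",
}


def decode_taint(flags):
    """Translate the letters after 'Tainted:' into human-readable descriptions."""
    return [
        TAINT_FLAGS.get(ch, "unknown flag " + ch)
        for ch in flags
        if not ch.isspace()
    ]


if __name__ == "__main__":
    for description in decode_taint("G D"):
        print(description)
```

Read this way, "G D" is consistent with the nvidia module never having been loaded in that boot.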

> -hpa
>

Mat

2008-01-10 22:31:33

by H. Peter Anvin

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

Matthew wrote:
>> That's fine, but that was collected with the vmlinux image you sent me,
>> right?
>>
>
> no, but now it is:
> http://kerneloftruth.neucode.org/other/crash_ia32_64/not_tainted/latest/
> (the other one was taken before I added/selected the demanded features)
>
> what puzzles me is that it doesn't say "tainted" (not tainted) in the
> upper part & "tainted" in the lower part
> I apologize for the bad quality of the pictures ...
>
> luckily there was also an additional call trace this time (syslog-ng
> Tainted: G D) (don't know what it means), nvidia-module wasn't
> loaded and no other additional proprietary modules were loaded AFAIK
>

I just managed to reproduce the bug in simulation. I believe we should
be able to resolve this.

-hpa

2008-01-10 22:53:30

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> I just managed to reproduce the bug in simulation. I believe we should
> be able to resolve this.

That's great news! Keep up the good work :)

I hope that you guys'll be able to do so since it (indirectly) more or
less leads to data corruption (at least with thunderbird-bin &
firefox-bin -> both profile directories didn't work after the crash
anymore == data loss)
if I didn't have a backup aside they would have been lost ...

> -hpa
>

Mat

2008-01-10 23:35:40

by Zan Lynx

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


On Thu, 2008-01-10 at 13:43 +0100, Matthew wrote:

> it's a little tricky to reproduce it:
> I tried it with root-account: firefox-bin, thunderbird-bin wouldn't trigger
> user-account (with used account-directory of both apps):
> thunderbird-bin triggers it more reliably
> probably it has to do with the x86 compatibility apps of gentoo ?
> gentoo amd64-users with 32bit firefox & thunderbird - anyone able to
> reproduce it ?

I believe that I *have* seen this happen to me. I'm using a Compaq
R3000 laptop with AMD-64 CPU and Gentoo with kernel 2.6.24-rc6-mm1.

A few days ago I crashed it twice in a row by trying to load my Comics
bookmarks as tabs (87 entries) in 32-bit Firefox.

I only got 1 and 3/4ths lines output by netconsole and it mentioned a
function with 32-something in the name.

Later tonight I can try loading the tabs in Firefox again to see if it
will reproduce for a 3rd time.

I also have to say that the NMI watchdog, supposedly Non-Maskable,
hardly *ever* works for me. I don't believe that whatever events reset
the dog actually matter to the end user. Perhaps it's still processing
interrupts and running a timer loop but if nothing can read or write
disk, net, netlink or other device IO, I don't believe the system can
actually claim to be working.
--
Zan Lynx <[email protected]>



2008-01-11 02:09:00

by Ed Tomlinson

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

>> - if yes, does booting with "nmi_watchdog=2 idle=poll" give you a
>> working NMI watchdog? (working NMI watchdog means the NMI counts
>> increase for all cores in /proc/interrupts).

> booting with the above gives me an incrementing NMI counter in /proc/interrupts

Ingo,

Is there anything else that needs to be set in the kernel config for the nmi watchdog to trigger?

I ask because I just had a hang but nothing showed on the _serial_ console - I waited a couple
of minutes before rebooting.... Is there any other way to verify the watchdog is working?

I seem to need X active with a mix of 32 and 64 bit applications running to get a hang here. A massively
threaded 64 bit java app along with 32 bit firefox and wine running will eventually trigger things here.
If I had to guess I would say that it is the switch from 32 to 64 (or vice versa) that triggers the issue.

TIA & test/debug patches welcome,
Ed Tomlinson

2008-01-14 16:14:25

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Matthew <[email protected]> wrote:

> > I just managed to reproduce the bug in simulation. I believe we should
> > be able to resolve this.
>
> That's great news! Keep up the good work :)

FYI, latest x86.git should have this fix included. So if your box still
hangs there must be some other bug lurking as well.

Ingo

2008-01-14 16:16:31

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Ed Tomlinson <[email protected]> wrote:

> >> - if yes, does booting with "nmi_watchdog=2 idle=poll" give you a
> >> working NMI watchdog? (working NMI watchdog means the NMI counts
> >> increase for all cores in /proc/interrupts).
>
> > booting with the above gives me an incrementing NMI counter in
> > /proc/interrupts
>
> Ingo,
>
> Is there anything else that needs to be set in the kernel config for
> the nmi watchdog to trigger?
>
> I ask because I just had a hang but nothing showed on the _serial_
> console - I waited a couple of minutes before rebooting.... Is there
> any other way to verify the watchdog is working?

if you cause a hard lockup intentionally via an infinite irqs-off loop:

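# cat > lockupcli.c
#include <sys/io.h>	/* iopl() */

int main(void)
{
	/* raise the I/O privilege level so that "cli" is permitted; needs root */
	iopl(3);
	/* spin forever with interrupts disabled on this CPU */
	for (;;)
		asm volatile("cli");
}
Ctrl-D
# make lockupcli
# ./lockupcli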

does the NMI watchdog properly trigger? If not, does booting with
idle=poll change the situation?

Ingo

2008-01-14 16:47:18

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

>
> FYI, latest x86.git should have this fix included. So if your box still
> hangs there must be some other bug lurking as well.


the fix from Roland ?: http://lkml.org/lkml/2008/1/11/108
http://forums.gentoo.org/viewtopic-p-4719206.html#4719206 (+ following posts)

works like a charm :)
wine-problems should be solved,

64bit firefox & 32bit flash, 32bit firefox, 32bit thunderbird,
realplayer work fine again without hardlocking so far (at least for
me)

Thanks to everyone involved

>
> Ingo
>

Regards
Mat

2008-01-14 17:01:09

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Matthew <[email protected]> wrote:

> > FYI, latest x86.git should have this fix included. So if your box
> > still hangs there must be some other bug lurking as well.
>
>
> the fix from Roland ?: http://lkml.org/lkml/2008/1/11/108
> http://forums.gentoo.org/viewtopic-p-4719206.html#4719206 (+ following posts)
>
> works like a charm :)
> wine-problems should be solved,
>
> 64bit firefox & 32bit flash, 32bit firefox, 32bit thunderbird,
> realplayer work fine again without hardlocking so far (at least for
> me)

great - thanks for following through with this, this was an important
regression to get fixed! I've added:

Tested-by: Matthew <[email protected]>

Ingo

2008-01-14 22:20:16

by Ed Tomlinson

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

On January 14, 2008, Ingo Molnar wrote:
>
> * Matthew <[email protected]> wrote:
>
> > > FYI, latest x86.git should have this fix included. So if your box
> > > still hangs there must be some other bug lurking as well.
> >
> >
> > the fix from Roland ?: http://lkml.org/lkml/2008/1/11/108
> > http://forums.gentoo.org/viewtopic-p-4719206.html#4719206 (+ following posts)
> >
> > works like a charm :)
> > wine-problems should be solved,
> >
> > 64bit firefox & 32bit flash, 32bit firefox, 32bit thunderbird,
> > realplayer work fine again without hardlocking so far (at least for
> > me)
>
> great - thanks for following through with this, this was an important
> regression to get fixed! I've added:
>
> Tested-by: Matthew <[email protected]>

Ingo,

This is _not_ a regression. This has been occurring for ages here. A backport of this fix to 2.6.23 would be a
very good thing - IMHO it's something that should go into stable asap.

Thanks,
Ed Tomlinson

2008-01-15 17:11:43

by Matt

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

> Ingo,
>
> This is _not_ a regression. This has been occurring for ages here. A backport of this fix to 2.6.23 would be a
> very good thing - IMHO it's something that should go into stable asap.
>
> Thanks,
> Ed Tomlinson
>
>
>

++

Ingo,
this probably has something to do with the random, unexplained
hardlocks which I suffered from with 2.6.23

when I used 2.6.23-based kernels my rig from time to time (sometimes 2
times a day) just locked and wouldn't react to keyboard-input or magic
sysrq key anymore
if I have a more precise look at my memory (in my head) it probably
happened more often around usage with realplayer (<-- at least that
app; 32bit on amd64) [perhaps also with 32bit thunderbird]
the next suspect is nvidia-drivers: with earlier versions this
happened more often for me (that's at least the "feeling" I have)

so you guys might want to start with those 3 points (thunderbird,
realplayer, nvidia-drivers);
considering a test with the latest cfs-backports would also be a good
idea; with early backports I had some problems, whereas it's now
perfectly stable (fair group scheduling not enabled)

unfortunately I can't / couldn't reproduce it
in addition to that I'm pretty busy right now so I can't investigate
any further ...

hope you also find the culprit for that bug ;)

Regards
Mat

2008-01-15 22:10:23

by Ingo Molnar

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)


* Ed Tomlinson <[email protected]> wrote:

> This is _not_ a regression. This has been occurring for ages here. A
> backport of this fix to 2.6.23 would be a very good thing - IMHO it's
> something that should go into stable asap.

the problem is that this bug was only present in x86.git. I.e. neither
2.6.24 nor 2.6.23 has this particular bug.

perhaps something else in x86.git fixed your box, but this
x86.git-specific hang 'took over its place', and now that it got fixed,
you've got a working box? In any case, please monitor your box, it might
still lock up the same way it did previously ...

Ingo

2008-01-15 23:41:47

by Ed Tomlinson

Subject: Re: Fwd: Fwd: laptop / computer hardlocks during execution of 32bit applications(binaries) on 64bit system (Gentoo)

On January 15, 2008, Ingo Molnar wrote:
>
> * Ed Tomlinson <[email protected]> wrote:
>
> > This is _not_ a regression. This has been occurring for ages here. A
> > backport of this fix to 2.6.23 would be a very good thing - IMHO it's
> > something that should go into stable asap.
>
> the problem is that this bug was only present in x86.git. I.e. neither
> 2.6.24 nor 2.6.23 has this particular bug.
>
> perhaps something else in x86.git fixed your box, but this
> x86.git-specific hang 'took over its place', and now that it got fixed,
> you've got a working box? In any case, please monitor your box, it might
> still lock up the same way it did previously ...

I am now testing with a .24-rc7+fix kernel. So far so good. Running gentoo's 32 bit
firefox with flash 9 is a good way to trigger the problem here as is running Delftship (freeship)
under wine. The problem is usually worst with a fully preemptive kernel. I have been using both on
a kernel with preempt and have an uptime of 22 hours - this is really good. I have rarely been able
to get this much uptime using these apps. If it manages to run for a few more days without a lockup
it would really be worth trying to figure out what in .24 fixes the problem...

THANKS!
Ed Tomlinson