2011-05-20 08:39:33

by Justin Piszcz

Subject: 2.6.39: crash w/threadirqs option enabled

Hi,

I tried this in 2.6.39 and experienced crashes for the first time, when
using: threadirqs

user pts/1 Fri May 20 04:31 still logged in
reboot system boot 2.6.39 Fri May 20 04:28 - 04:36 (00:07)
user pts/2 Thu May 19 20:01 - down (08:24)
reboot system boot 2.6.39 Thu May 19 20:01 - 04:26 (08:25)
user pts/28 Thu May 19 15:57 - crash (04:03)

Not sure I can give any useful output as there is nothing in the logs,
even with netconsole enabled, no useful output was produced.

I've removed the threadirqs option for now, will see if there are any
further stability issues.

Justin.


2011-05-20 12:48:10

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 20 May 2011, Justin Piszcz wrote:

> Hi,
>
> I tried this in 2.6.39 and experienced crashes for the first time, when
> using: threadirqs
>
> user pts/1 Fri May 20 04:31 still logged in
> reboot system boot 2.6.39 Fri May 20 04:28 - 04:36 (00:07)
> user pts/2 Thu May 19 20:01 - down (08:24)
> reboot system boot 2.6.39 Thu May 19 20:01 - 04:26 (08:25)
> user pts/28 Thu May 19 15:57 - crash (04:03)
>
> Not sure I can give any useful output as there is nothing in the logs, even
> with netconsole enabled, no useful output was produced.

Hmm. Machine lacks serial console, right? Can you send me your .config
please?

Thanks,

tglx

2011-05-20 12:51:06

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 20 May 2011, Thomas Gleixner wrote:

> On Fri, 20 May 2011, Justin Piszcz wrote:
>
>> Hi,
>>
>> I tried this in 2.6.39 and experienced crashes for the first time, when
>> using: threadirqs
>>
>> user pts/1 Fri May 20 04:31 still logged in
>> reboot system boot 2.6.39 Fri May 20 04:28 - 04:36 (00:07)
>> user pts/2 Thu May 19 20:01 - down (08:24)
>> reboot system boot 2.6.39 Thu May 19 20:01 - 04:26 (08:25)
>> user pts/28 Thu May 19 15:57 - crash (04:03)
>>
>> Not sure I can give any useful output as there is nothing in the logs, even
>> with netconsole enabled, no useful output was produced.
>
> Hmm. Machine lacks serial console, right? Can you send me your .config
> please?
>
> Thanks,
>
> tglx
>

Hi,

Correct, no serial port or header, config:
http://home.comcast.net/~jpiszcz/20110520/config-2.6.39-3.txt

Justin.

2011-05-20 13:30:04

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 20 May 2011, Justin Piszcz wrote:
> On Fri, 20 May 2011, Thomas Gleixner wrote:
>
> Correct, no serial port or header, config:
> http://home.comcast.net/~jpiszcz/20110520/config-2.6.39-3.txt

Does it crash right away or just when doing something particular? Is
the box fully dead after the crash ?

Thanks,

tglx

2011-05-20 13:49:12

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 20 May 2011, Thomas Gleixner wrote:

> On Fri, 20 May 2011, Justin Piszcz wrote:
>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>>
>> Correct, no serial port or header, config:
>> http://home.comcast.net/~jpiszcz/20110520/config-2.6.39-3.txt
>

Hello Thomas,

> Does it crash right away or just when doing something particular?
It crashed at 2100, this is when I run a few I/O intensive processes:
- backup (dump ext4 filesystem -> to a separate raid device)
- backup (dump ext4 on remote host -> to separate raid device)
- backup (dump xfs on remote host -> to separate raid device)

This looks like it is what caused it to crash.

> Is the box fully dead after the crash ?
The host was online and I went away for awhile, when I came back the system
had rebooted on its own (as I lost all of my X windows/etc).

Justin.

2011-05-20 15:17:26

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 20 May 2011, Justin Piszcz wrote:
> On Fri, 20 May 2011, Thomas Gleixner wrote:
> > Does it crash right away or just when doing something particular?
> It crashed at 2100, this is when I run a few I/O intensive processes:
> - backup (dump ext4 filesystem -> to a separate raid device)
> - backup (dump ext4 on remote host -> to separate raid device)
> - backup (dump xfs on remote host -> to separate raid device)
>
> This looks like it is what caused it to crash.

That narrows it down somewhat, but does not give us a clue at all :(

> > Is the box fully dead after the crash ?
> The host was online and I went away for awhile, when I came back the system
> had rebooted on its own (as I lost all of my X windows/etc).

Hmm. Did you have panic_timeout set ?

Thanks,

tglx

2011-05-20 16:10:51

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 20 May 2011, Thomas Gleixner wrote:

> On Fri, 20 May 2011, Justin Piszcz wrote:
>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>>> Does it crash right away or just when doing something particular?
>> It crashed at 2100, this is when I run a few I/O intensive processes:
>> - backup (dump ext4 filesystem -> to a separate raid device)
>> - backup (dump ext4 on remote host -> to separate raid device)
>> - backup (dump xfs on remote host -> to separate raid device)
>>
>> This looks like it is what caused it to crash.
>
> That narrows it down somewhat, but does not give us a clue at all :(
>
>>> Is the box fully dead after the crash ?
>> The host was online and I went away for awhile, when I came back the system
>> had rebooted on its own (as I lost all of my X windows/etc).
>
> Hmm. Did you have panic_timeout set ?

Hi,

No, I do not use panic_timeout or any type of watchdog that would reboot
the system upon a lockup/crash.
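
For reference, the kernel's panic_timeout is exposed at run time as the kernel.panic sysctl, i.e. the file /proc/sys/kernel/panic: 0 means no automatic reboot after a panic, a positive value means reboot after that many seconds. The following is only a minimal sketch of how that value could be checked from user space; the helper names parse_panic_timeout() and read_panic_timeout() are made up for illustration and are not from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text content of the sysctl file.
 * Returns 0 and stores the value in *out on success, -1 if no number is found.
 */
int parse_panic_timeout(const char *text, long *out)
{
	char *end;
	long value = strtol(text, &end, 10);

	if (end == text)
		return -1;	/* nothing numeric at the start of the text */
	*out = value;
	return 0;
}

/*
 * Hypothetical helper: read the timeout from a file such as
 * /proc/sys/kernel/panic. Returns 0 on success, -1 on any error.
 */
int read_panic_timeout(const char *path, long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_panic_timeout(buf, out);
}

#ifdef SKETCH_MAIN
int main(void)
{
	long timeout;

	if (read_panic_timeout("/proc/sys/kernel/panic", &timeout) != 0) {
		fprintf(stderr, "could not read panic timeout\n");
		return 1;
	}
	printf("panic timeout: %ld (0 = no automatic reboot)\n", timeout);
	return 0;
}
#endif
```

Built with -DSKETCH_MAIN this prints the current setting; a value of 0 would match the "not set" answer given above.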

Justin.

2011-05-20 16:22:38

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 20 May 2011, Justin Piszcz wrote:
> On Fri, 20 May 2011, Thomas Gleixner wrote:
>
> > On Fri, 20 May 2011, Justin Piszcz wrote:
> > > On Fri, 20 May 2011, Thomas Gleixner wrote:
> > > > Does it crash right away or just when doing something particular?
> > > It crashed at 2100, this is when I run a few I/O intensive processes:
> > > - backup (dump ext4 filesystem -> to a separate raid device)
> > > - backup (dump ext4 on remote host -> to separate raid device)
> > > - backup (dump xfs on remote host -> to separate raid device)
> > >
> > > This looks like it is what caused it to crash.
> >
> > That narrows it down somewhat, but does not give us a clue at all :(
> >
> > > > Is the box fully dead after the crash ?
> > > The host was online and I went away for awhile, when I came back the
> > > system
> > > had rebooted on its own (as I lost all of my X windows/etc).
> >
> > Hmm. Did you have panic_timeout set ?
>
> Hi,
>
> No, I do not use panic_timeout or any type of watchdog that would reboot
> the system upon a lockup/crash.

Yuck, that means it ran into a triple fault. Nasty. I have no idea how
to debug that at the moment and I was not able to reproduce on one of
my test systems. Maybe I need to try harder.

Thanks,

tglx



2011-05-21 00:00:40

by Uwaysi Bin Kareem

Subject: Re: 2.6.39: crash w/threadirqs option enabled

2.6.39 first observations: The framerate jitter in games that seem to
depend on kernel for low-jitter, like doom 3, is almost completely gone
now. (Tested with latest nvidia driver something 75 something)
I decided to try threadirqs from grub, but that did not work. Looks like
the same error as I get when I tried the rt kernels, which was this one:

http://www.paradoxuncreated.com/tmp/rterror.jpg

Peace Be With You,
Uwaysi.


http://www.paradoxuncreated.com/tmp/.config39

Pentium(R) Dual-Core CPU E5200 @ 2.50GHz

uwaysi@Millennium:~$ lspci -v
00:00.0 Host bridge: nVidia Corporation C55 Host Bridge (rev a2)
Flags: bus master, 66MHz, fast devsel, latency 0
Capabilities: <access denied>

00:00.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:00.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:00.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: bus master, 66MHz, fast devsel, latency 0

00:00.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: bus master, 66MHz, fast devsel, latency 0

00:00.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a2)
Flags: bus master, 66MHz, fast devsel, latency 0

00:00.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:00.7 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.3 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.4 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.5 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:01.6 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:02.0 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:02.1 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: bus master, 66MHz, fast devsel, latency 0

00:02.2 RAM memory: nVidia Corporation C55 Memory Controller (rev a1)
Flags: 66MHz, fast devsel

00:03.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=05, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa000000-feafffff
Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:06.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:07.0 PCI bridge: nVidia Corporation C55 PCI Express bridge (rev a1)
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0
Capabilities: <access denied>

00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a3)
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0
I/O ports at 4f00 [size=128]

00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a3)
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: 66MHz, fast devsel, IRQ 11
I/O ports at 5000 [size=64]
I/O ports at 6000 [size=64]
Capabilities: <access denied>
Kernel driver in use: nForce2_smbus
Kernel modules: i2c-nforce2

00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a3)
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: 66MHz, fast devsel

00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
(prog-if 10 [OHCI])
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 20
Memory at f9fff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: ohci_hcd

00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3)
(prog-if 20 [EHCI])
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 21
Memory at f9ffec00 (32-bit, non-prefetchable) [size=256]
Capabilities: <access denied>
Kernel driver in use: ehci_hcd

00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1) (prog-if 8a
[Master SecP PriP])
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0
[virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
[virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
[virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
[virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
I/O ports at ffa0 [size=16]
Capabilities: <access denied>
Kernel driver in use: pata_amd
Kernel modules: pata_amd

00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev
a1) (prog-if 85 [Master SecO PriO])
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 23
I/O ports at c800 [size=8]
I/O ports at c480 [size=4]
I/O ports at c400 [size=8]
I/O ports at c080 [size=4]
I/O ports at c000 [size=16]
Memory at f9ffd000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: sata_nv
Kernel modules: sata_nv

00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev
a1) (prog-if 85 [Master SecO PriO])
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22
I/O ports at bc00 [size=8]
I/O ports at b880 [size=4]
I/O ports at b800 [size=8]
I/O ports at b480 [size=4]
I/O ports at b400 [size=16]
Memory at f9ffc000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: sata_nv
Kernel modules: sata_nv

00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2) (prog-if
01 [Subtractive decode])
Flags: bus master, 66MHz, fast devsel, latency 0
Bus: primary=00, secondary=08, subordinate=08, sec-latency=32
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: feb00000-febfffff
Capabilities: <access denied>

00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev
a2)
Subsystem: Micro-Star International Co., Ltd. Device 7380
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22
Memory at f9ff8000 (32-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel

00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a3)
Subsystem: Micro-Star International Co., Ltd. Device 380c
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 23
Memory at f9ff7000 (32-bit, non-prefetchable) [size=4K]
I/O ports at b080 [size=8]
Capabilities: <access denied>
Kernel driver in use: forcedeth
Kernel modules: forcedeth

01:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for
mainboards (rev a2) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=01, secondary=02, subordinate=05, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa000000-feafffff
Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

02:00.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for
mainboards (rev a2) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fa000000-feafffff
Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

02:02.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for
mainboards (rev a2) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=02, secondary=04, subordinate=04, sec-latency=0
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

02:03.0 PCI bridge: nVidia Corporation NF200 PCIe 2.0 switch for
mainboards (rev a2) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp

03:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce GTX
280] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga.com. Corp. Device 1280
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at dc00 [size=128]
[virtual] Expansion ROM at fea80000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidia, nvidiafb

08:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire
II(M)] IEEE 1394 OHCI Controller (rev c0) (prog-if 10 [OHCI])
Subsystem: Micro-Star International Co., Ltd. Device 380d
Flags: bus master, medium devsel, latency 32, IRQ 10
Memory at febff800 (32-bit, non-prefetchable) [size=2K]
I/O ports at ec00 [size=128]
Capabilities: <access denied>

2011-05-26 16:30:16

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 20 May 2011, Justin Piszcz wrote:

>
>
> On Fri, 20 May 2011, Thomas Gleixner wrote:
>
>> On Fri, 20 May 2011, Justin Piszcz wrote:
>>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>>>> Does it crash right away or just when doing something particular?
>>> It crashed at 2100, this is when I run a few I/O intensive processes:
>>> - backup (dump ext4 filesystem -> to a separate raid device)
>>> - backup (dump ext4 on remote host -> to separate raid device)
>>> - backup (dump xfs on remote host -> to separate raid device)
>>>
>>> This looks like it is what caused it to crash.
>>
>> That narrows it down somewhat, but does not give us a clue at all :(
>>
>>>> Is the box fully dead after the crash ?
>>> The host was online and I went away for awhile, when I came back the
>>> system
>>> had rebooted on its own (as I lost all of my X windows/etc).
>>
>> Hmm. Did you have panic_timeout set ?
>
> Hi,
>
> No, I do not use panic_timeout or any type of watchdog that would reboot
> the system upon a lockup/crash.
>
> Justin.
>
>

Hi,

I like to be as accurate as possible: since this occurred, I've removed
threadirqs.

Please disregard this report for now; I also updated the BIOS on the same
day (BIOS update + kernel update). I'm re-running with threadirqs enabled
again, and I'll update you if there are any issues, thanks.

(I've set the bios to factory defaults -> tweaked), we'll see what
happens.

Justin.

2011-06-10 08:17:34

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 20 May 2011, Thomas Gleixner wrote:

> On Fri, 20 May 2011, Justin Piszcz wrote:
>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>>
>>> On Fri, 20 May 2011, Justin Piszcz wrote:
>>>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>>>>> Does it crash right away or just when doing something particular?
>>>> It crashed at 2100, this is when I run a few I/O intensive processes:
>>>> - backup (dump ext4 filesystem -> to a separate raid device)
>>>> - backup (dump ext4 on remote host -> to separate raid device)
>>>> - backup (dump xfs on remote host -> to separate raid device)
>>>>
>>>> This looks like it is what caused it to crash.
>>>
>>> That narrows it down somewhat, but does not give us a clue at all :(
>>>
>>>>> Is the box fully dead after the crash ?
>>>> The host was online and I went away for awhile, when I came back the
>>>> system
>>>> had rebooted on its own (as I lost all of my X windows/etc).
>>>
>>> Hmm. Did you have panic_timeout set ?
>>
>> Hi,
>>
>> No, I do not use panic_timeout or any type of watchdog that would reboot
>> the system upon a lockup/crash.
>
> Yuck, that means it ran into a triple fault. Nasty. I have no idea how
> to debug that at the moment and I was not able to reproduce on one of
> my test systems. Maybe I need to try harder.
>
> Thanks,
>
> tglx
>
>
>
>

Hi,

Crashed again and it rebooted too:

reboot system boot 2.6.39 Thu Jun 9 23:58 - 04:05 (04:06)
user1 pts/0 X Thu Jun 9 19:25 - 19:30 (00:04)
user1 pts/10 X Thu Jun 9 18:23 - crash (05:35)

Any thoughts on what could be causing this?
Should I go back to 2.6.38?

Justin.

2011-06-10 08:24:18

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 10 Jun 2011, Justin Piszcz wrote:
>
> Hi,
>
> Crashed again and it rebooted too:
>
> reboot system boot 2.6.39 Thu Jun 9 23:58 - 04:05 (04:06)
> user1 pts/0 X Thu Jun 9 19:25 - 19:30 (00:04)
> user1 pts/10 X Thu Jun 9 18:23 - crash (05:35)
>
> Any thoughts on what could be causing this?
> Should I go back to 2.6.38?
>

I have a lot of USB devices attached, perhaps that is what's causing the
crashes, I will disconnect some of them and see if there is another crash.

Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 003: ID 2001:f103 D-Link Corp. DUB-H7 7-port USB 2.0 hub
Bus 001 Device 004: ID 2001:f103 D-Link Corp. DUB-H7 7-port USB 2.0 hub
Bus 001 Device 005: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
Bus 001 Device 006: ID 413c:1002 Dell Computer Corp. Keyboard Hub
Bus 002 Device 003: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
Bus 001 Device 007: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port
Bus 001 Device 008: ID 0424:2502 Standard Microsystems Corp.
Bus 001 Device 009: ID 054c:002c Sony Corp. USB Floppy Disk Drive
Bus 001 Device 010: ID 413c:2002 Dell Computer Corp. SK-8125 Keyboard
Bus 001 Device 011: ID 0461:4d15 Primax Electronics, Ltd Dell Optical Mouse
Bus 001 Device 012: ID 0424:2602 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 013: ID 093b:0027 Plextor Corp.
Bus 001 Device 014: ID 0424:2228 Standard Microsystems Corp. 9-in-2 Card Reader

Justin.

2011-06-10 12:52:28

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 10 Jun 2011, Justin Piszcz wrote:
> On Fri, 20 May 2011, Thomas Gleixner wrote:
> Crashed again and it rebooted too:
>
> reboot system boot 2.6.39 Thu Jun 9 23:58 - 04:05 (04:06)
> user1 pts/0 X Thu Jun 9 19:25 - 19:30 (00:04)
> user1 pts/10 X Thu Jun 9 18:23 - crash (05:35)
>
> Any thoughts on what could be causing this?
> Should I go back to 2.6.38?

If you remove the threadirqs option from the commandline it does not
happen, right?

Can you try the following patch ?

Thanks,

tglx
---
commit fd8a7de177b6f56a0fc59ad211c197a7df06b1ad
Author: Thomas Gleixner <[email protected]>
Date: Tue Jul 20 14:34:50 2010 +0200

x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU

After a newly plugged CPU sets the cpu_online bit it enables
interrupts and goes idle. The cpu which brought up the new cpu waits
for the cpu_online bit and when it observes it, it sets the cpu_active
bit for this cpu. The cpu_active bit is the relevant one for the
scheduler to consider the cpu as a viable target.

With forced threaded interrupt handlers which imply forced threaded
softirqs we observed the following race:

cpu 0                           cpu 1

bringup(cpu1);
                                set_cpu_online(smp_processor_id(), true);
                                local_irq_enable();
while (!cpu_online(cpu1));
                                timer_interrupt()
                                -> wake_up(softirq_thread_cpu1);
                                   -> enqueue_on(softirq_thread_cpu1, cpu0);

                                                                      ^^^^

cpu_notify(CPU_ONLINE, cpu1);
-> sched_cpu_active(cpu1)
   -> set_cpu_active(cpu1, true);

When an interrupt happens before the cpu_active bit is set by the cpu
which brought up the newly onlined cpu, then the scheduler refuses to
enqueue the woken thread which is bound to that newly onlined cpu on
that newly onlined cpu due to the not yet set cpu_active bit and
selects a fallback runqueue. Not really an expected and desirable
behaviour.

So far this has only been observed with forced hard/softirq threading,
but in theory this could happen without forced threaded hard/softirqs
as well. It's probably unobservable as it would take a massive
interrupt storm on the newly onlined cpu which causes the softirq loop
to wake up the softirq thread and an even longer delay of the cpu
which waits for the cpu_online bit.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
Cc: [email protected] # 2.6.39

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33a0c11..9fd3137 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -285,6 +285,19 @@ notrace static void __cpuinit start_secondary(void *unused)
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();

+	/*
+	 * Wait until the cpu which brought this one up marked it
+	 * online before enabling interrupts. If we don't do that then
+	 * we can end up waking up the softirq thread before this cpu
+	 * reached the active state, which makes the scheduler unhappy
+	 * and schedule the softirq thread on the wrong cpu. This is
+	 * only observable with forced threaded interrupts, but in
+	 * theory it could also happen w/o them. It's just way harder
+	 * to achieve.
+	 */
+	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
+		cpu_relax();
+
 	/* enable local interrupts */
 	local_irq_enable();
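
The ordering the patch enforces can be illustrated with a small user-space analogue. This is only a sketch and not part of the patch: two pthreads stand in for the two cpus, the atomic flags online_bit and active_bit stand in for the cpu_online and cpu_active bits, sched_yield() stands in for cpu_relax(), and simulate_bringup() is a made-up name. The "secondary" thread does not proceed to its stand-in for local_irq_enable() until the bringup side has set the active flag.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the cpu_online / cpu_active bits of "cpu 1". */
static atomic_bool online_bit;
static atomic_bool active_bit;
/* Value of active_bit seen by the secondary when it "enabled interrupts". */
static atomic_bool active_when_irqs_enabled;

/* Plays the role of start_secondary() on the newly onlined cpu. */
static void *secondary(void *unused)
{
	(void)unused;
	atomic_store(&online_bit, true);

	/* The idea of the fix: spin until the bringup side marked us active. */
	while (!atomic_load(&active_bit))
		sched_yield();	/* user-space analogue of cpu_relax() */

	/* Stand-in for local_irq_enable(): record the state observed here. */
	atomic_store(&active_when_irqs_enabled, atomic_load(&active_bit));
	return NULL;
}

/*
 * Plays the role of the cpu doing the bringup.
 * Returns 1 if the secondary was already active when it "enabled
 * interrupts", 0 if not, -1 if the thread could not be created.
 */
int simulate_bringup(void)
{
	pthread_t thread;

	atomic_store(&online_bit, false);
	atomic_store(&active_bit, false);
	atomic_store(&active_when_irqs_enabled, false);

	if (pthread_create(&thread, NULL, secondary, NULL) != 0)
		return -1;

	/* Wait for the online bit, then mark the new "cpu" active. */
	while (!atomic_load(&online_bit))
		sched_yield();
	atomic_store(&active_bit, true);

	pthread_join(thread, NULL);
	return atomic_load(&active_when_irqs_enabled) ? 1 : 0;
}

#ifdef SKETCH_MAIN
int main(void)
{
	assert(simulate_bringup() == 1);
	puts("secondary was active before enabling interrupts");
	return 0;
}
#endif
```

Without the spin loop in secondary(), the recorded flag could be false, which corresponds to the window in which the softirq thread gets enqueued on the wrong cpu.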

2011-06-10 13:07:43

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 10 Jun 2011, Thomas Gleixner wrote:

> On Fri, 10 Jun 2011, Justin Piszcz wrote:
>> On Fri, 20 May 2011, Thomas Gleixner wrote:
>> Crashed again and it rebooted too:
>>
>> reboot system boot 2.6.39 Thu Jun 9 23:58 - 04:05 (04:06)
>> user1 pts/0 X Thu Jun 9 19:25 - 19:30 (00:04)
>> user1 pts/10 X Thu Jun 9 18:23 - crash (05:35)
>>
>> Any thoughts on what could be causing this?
>> Should I go back to 2.6.38?
>
> If you remove the threadirqs option from the commandline it does not
> happen, right?
Yes, most often anyhow (still using that option)

>
> Can you try the following patch ?
Yup, patched:

# patch -p1 < ../patch-for-cpu
patching file arch/x86/kernel/smpboot.c
#

New kernel running now; I also plugged my USB devices back in. We'll see what happens.

Justin.

2011-06-10 13:10:38

by Thomas Gleixner

Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 10 Jun 2011, Justin Piszcz wrote:
> On Fri, 10 Jun 2011, Thomas Gleixner wrote:
> > If you remove the threadirqs option from the commandline it does not
> > happen, right?
> Yes, most often anyhow (still using that option)

-ENOPARSE

2011-06-12 12:04:47

by Justin Piszcz

Subject: Re: 2.6.39: crash w/threadirqs option enabled



On Fri, 10 Jun 2011, Justin Piszcz wrote:

>
>

Hi,

It crashed again with the patch, so it must be something else. When it
happened, I was not able to get any output since I was in X.

Will leave X off, console/monitor on and disconnect those USB devices
I mentioned earlier and see if the problem persists.

The call trace in the picture the last time I was able to get a screen
shot was @ blk_peek_request and scsi_request_fn.
http://home.comcast.net/~jpiszcz/20110528/2639-ss1.jpg

I will disconnect the USB devices I mentioned earlier and see if the issues
persist.

Justin.