2003-11-17 13:33:00

by Dan Creswell

Subject: Hard lock on 2.6-test9 (More Info)

Both SMP kernels (2.4 and 2.6) appear to have the same interrupt map
(cat /proc/interrupts):

Kernel 2.4:

           CPU0       CPU1
  0:      10942       8009    IO-APIC-edge  timer
  1:        199        208    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
 12:       2275       1615    IO-APIC-edge  PS/2 Mouse
 14:         57         56    IO-APIC-edge  ide0
 16:        118          0   IO-APIC-level  usb-uhci, eth0
 17:       5267       3420   IO-APIC-level  ohci1394, Intel ICH4, nvidia
 18:          0          0   IO-APIC-level  usb-uhci
 19:          0          0   IO-APIC-level  usb-uhci
 23:          0          0   IO-APIC-level  ehci_hcd
 24:       9083       3661   IO-APIC-level  ioc0
 25:         42          0   IO-APIC-level  ioc1
NMI:          0          0
LOC:      18878      18807
ERR:          0
MIS:          0

Kernel 2.6:

           CPU0       CPU1
  0:      75737      51102    IO-APIC-edge  timer
  1:        152        278    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
 12:         61          0    IO-APIC-edge  i8042
 14:         22          1    IO-APIC-edge  ide0
 16:         92          0   IO-APIC-level  eth0
 17:          3          0   IO-APIC-level  ohci1394, Intel 82801DB-ICH4
 23:          0          0   IO-APIC-level  ehci_hcd
 24:       2635        715   IO-APIC-level  ioc0
 25:         42          0   IO-APIC-level  ioc1
NMI:          0          0
LOC:     126582     126581
ERR:          0
MIS:          0

When I boot X under kernel 2.6, I see the additional nvidia interrupt
path as per the 2.4 output (which was taken whilst I was running X).

But, within seconds of this additional interrupt assignment appearing,
2.6 dies a horrible death whilst 2.4 just keeps on rolling.
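
For anyone wanting to compare interrupt maps like the ones above, here is a minimal sketch that lists the IRQ lines shared by more than one driver. The function name `shared_irqs`, the `n_cpus` parameter and the `SAMPLE_24` constant are made up for illustration, and the parser assumes the two-CPU column layout shown in this post (per-CPU counts, controller type, then a comma-separated device list). The sample rows are copied from the 2.4 output.

```python
# Rows copied from the 2.4 /proc/interrupts output quoted in this post.
SAMPLE_24 = """\
CPU0 CPU1
0: 10942 8009 IO-APIC-edge timer
12: 2275 1615 IO-APIC-edge PS/2 Mouse
16: 118 0 IO-APIC-level usb-uhci, eth0
17: 5267 3420 IO-APIC-level ohci1394, Intel ICH4, nvidia
24: 9083 3661 IO-APIC-level ioc0
NMI: 0 0
ERR: 0
"""


def shared_irqs(text, n_cpus=2):
    """Return {irq: [device, ...]} for every IRQ with more than one device."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header row and the NMI/LOC/ERR/MIS summary rows.
        if not sep or not head.strip().isdigit():
            continue
        fields = rest.split()
        # Layout: one count per CPU, the controller type, then the devices.
        devices = " ".join(fields[n_cpus + 1:]).split(", ")
        if len(devices) > 1:
            shared[int(head)] = devices
    return shared


for irq, devices in shared_irqs(SAMPLE_24).items():
    print(irq, devices)
```

On a live system the same function could be fed the contents of /proc/interrupts, with `n_cpus` set to the number of CPU columns in the header.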

Cheers,

Dan.



2003-11-17 16:32:37

by Linus Torvalds

Subject: Re: Hard lock on 2.6-test9 (More Info)


On Mon, 17 Nov 2003, Dan Creswell wrote:
>
> 17: 5267 3420 IO-APIC-level ohci1394, Intel ICH4, nvidia
>
> When I boot X under kernel 2.6, I see the additional nvidia interrupt
> path as per the 2.4 output (which was taken whilst I was running X).
>
> But, within seconds of this additional interrupt assignment appearing,
> 2.6 dies a horrible death whilst 2.4 just keeps on rolling.

Two potential reasons:
 - the nvidia driver is just broken under 2.6.x
 - or a driver bug in ohci1394 _or_ the Intel ICH4 driver, which could
   become unhappy if they see a lot of interrupts that aren't for them
   (maybe it uncovers a race).

You can test for the latter by just disabling those drivers, and seeing
what happens. If it still breaks, it's nvidia. If the crashes stop, it
might _still_ be nvidia, but at that point somebody else might start being
interested in it.

Linus


2003-11-17 16:46:44

by Dan Creswell

Subject: Re: Hard lock on 2.6-test9 (More Info)

Thanks for the advice Linus - I'll give that a go.

As an aside, I've tried both the nvidia drivers and the out-of-the-box
XFree86 ones with the same results (2.4 stable, 2.6 not).

I'll get back to the list with more info once I've done the testing.

Thanks again for your time,

Dan.


2003-11-18 08:18:14

by Thomas Meyer

Subject: Re: Re: Hard lock on 2.6-test9 (More Info)

Hi,

This is really strange.
IRQ 5 is shared between the yenta_socket driver, uhci_hcd and ohci1394.
Loading the modules ohci1394 and uhci_hcd -> system runs stable.
Loading the modules yenta_socket and uhci_hcd -> system runs stable.
Loading all 3 modules -> system hangs.
No devices are connected to the USB controller or to the 1394 controller.
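
The module combinations reported above can be tracked with a small script. This is only a sketch: the names `MODULES`, `observed` and `untested` are made up for illustration, and the recorded outcomes are just the three results listed in this message. It lists the combinations with no recorded outcome, which include the yenta_socket + ohci1394 pair.

```python
from itertools import combinations

MODULES = ("yenta_socket", "uhci_hcd", "ohci1394")

# Outcomes as reported in this message: True means the system hung.
observed = {
    frozenset({"ohci1394", "uhci_hcd"}): False,
    frozenset({"yenta_socket", "uhci_hcd"}): False,
    frozenset(MODULES): True,
}


def untested(modules, results):
    """List every non-empty module combination with no recorded outcome."""
    combos = []
    for size in range(1, len(modules) + 1):
        for combo in combinations(modules, size):
            if frozenset(combo) not in results:
                combos.append(combo)
    return combos


for combo in untested(MODULES, observed):
    print(" + ".join(combo))
```

Recording a result for each printed combination would show whether the hang needs all three drivers on the shared line or only a particular pair.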

with kind regards
Thomas Meyer

----- Original Message -----
From: Dan Creswell <[email protected]>
To: Thomas Meyer <[email protected]>
Date: 17.11.2003 22:48
Subject: Re: Hard lock on 2.6-test9 (More Info)

Thanks for the info. Removing it certainly seems to help in my case as
well, but I've still got a few problems: another culprit is hiding
somewhere...

Cheers,

Dan.

Thomas Meyer wrote:

> Hi,
>
> I have got the same type of hard lock. I first thought it was the
> yenta_socket driver that caused this error, but it seems to be the
> ohci1394 driver.
>
> On my computer IRQ 5 is shared between the yenta_socket driver, uhci_hcd
> and ohci1394. When the ohci1394 module is not loaded, the system runs stable.
>
> with kind regards
> Thomas Meyer