2004-09-11 23:16:59

by Sean Neakums

Subject: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

irq 16: nobody cared!
[<c0106864>] __report_bad_irq+0x24/0x90
[<c0106ad2>] note_interrupt+0x92/0x160
[<c0106f12>] do_IRQ+0x162/0x1a0
[<c010491c>] common_interrupt+0x18/0x20
[<c0101f80>] default_idle+0x0/0x40
[<c0101fac>] default_idle+0x2c/0x40
[<c0102034>] cpu_idle+0x34/0x50
handlers:
[<c02a5470>] (rtl8169_interrupt+0x0/0x1d0)
Disabling IRQ #16
NETDEV WATCHDOG: eth2: transmit timed out
eth2: TX Timeout

CONFIG_R8169_NAPI=y

I downed and upped the interface and it started working again.


2004-09-12 11:07:56

by Francois Romieu

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Sean Neakums <[email protected]> :
[r8169 irq delivery/Tx timeout issue]
> I downed and upped the interface and it started working again.

There is a gross error in the 2.6.9-rc1-mm4 version of the r8169 driver
which could be related to your bug.

A few patches have been posted on netdev, the first of which should
make things better (see [PATCH 2.6.9-rc1-mm4 x/4] on netdev, 10
September 2004).

Can you apply the patch below on top of 2.6.9-rc1-mm4 and report
if it makes things better:
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.9-rc1-mm4/r8169/r8169-130.patch

--
Ueimor

2004-09-12 18:14:45

by Sean Neakums

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Francois Romieu <[email protected]> writes:

> Sean Neakums <[email protected]> :
> [r8169 irq delivery/Tx timeout issue]
>> I downed and upped the interface and it started working again.
>
> There is a gross error in the 2.6.9-rc1-mm4 version of the r8169 driver
> which could be related to your bug.
>
> A few patches have been posted on netdev, the first of which should
> make things better (see [PATCH 2.6.9-rc1-mm4 x/4] on netdev, 10
> September 2004).
>
> Can you apply the patch below on top of 2.6.9-rc1-mm4 and report
> if it makes things better:
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.9-rc1-mm4/r8169/r8169-130.patch

Running 2.6.9-rc1-mm4 with the above patch.

I'm a bit unsure of the timing, but I got this either before or
during the transfer I set up to generate some Tx activity (a repeated
wget of a 35M file):

irq 10: nobody cared!
[__report_bad_irq+36/144] __report_bad_irq+0x24/0x90
[note_interrupt+146/352] note_interrupt+0x92/0x160
[do_IRQ+354/416] do_IRQ+0x162/0x1a0
[common_interrupt+24/32] common_interrupt+0x18/0x20
[default_idle+0/64] default_idle+0x0/0x40
[default_idle+44/64] default_idle+0x2c/0x40
[cpu_idle+52/80] cpu_idle+0x34/0x50
[start_kernel+347/384] start_kernel+0x15b/0x180
[unknown_bootoption+0/368] unknown_bootoption+0x0/0x170
handlers:
[usb_hcd_irq+0/112] (usb_hcd_irq+0x0/0x70)
[usb_hcd_irq+0/112] (usb_hcd_irq+0x0/0x70)
Disabling IRQ #10

I killed the transfer and started X, getting this immediately:

irq 16: nobody cared!
[__report_bad_irq+36/144] __report_bad_irq+0x24/0x90
[note_interrupt+146/352] note_interrupt+0x92/0x160
[do_IRQ+354/416] do_IRQ+0x162/0x1a0
[common_interrupt+24/32] common_interrupt+0x18/0x20
[default_idle+0/64] default_idle+0x0/0x40
[default_idle+44/64] default_idle+0x2c/0x40
[cpu_idle+52/80] cpu_idle+0x34/0x50
handlers:
[rtl8169_interrupt+0/464] (rtl8169_interrupt+0x0/0x1d0)
Disabling IRQ #16

This also happened during the originally-reported incident, which I
forgot to mention. Both times, downing and then upping the interface
resulted in what seemed like a solid hang, although possibly it was
just X.

I rebooted and started X again, and again got the above. If I boot
with acpi=noirq, I don't get that message upon starting X. Here's
/proc/interrupts before and after starting X, without passing
acpi=noirq:

           CPU0       CPU1
  0:      18810      52561   IO-APIC-edge   timer
  1:        142          8   IO-APIC-edge   i8042
  5:          0          0   IO-APIC-level  acpi
  8:          2          2   IO-APIC-edge   rtc
 10:       3651       3367   IO-APIC-level  uhci_hcd, uhci_hcd
 11:          0          0   IO-APIC-level  VIA686A
 14:          2         13   IO-APIC-edge   ide0
 16:         10          8   IO-APIC-level  eth2
 17:         12          7   IO-APIC-level  eth1
 19:       2989       2564   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:      71037      71036
ERR:          0
MIS:          0

           CPU0       CPU1
  0:      42718      64701   IO-APIC-edge   timer
  1:        247         33   IO-APIC-edge   i8042
  5:          0          0   IO-APIC-level  acpi
  8:          2          2   IO-APIC-edge   rtc
 10:       4928       3367   IO-APIC-level  uhci_hcd, uhci_hcd
 11:          0          0   IO-APIC-level  VIA686A, radeon@PCI:1:0:0
 14:          2         13   IO-APIC-edge   ide0
 16:         10      99990   IO-APIC-level  eth2
 17:         30          7   IO-APIC-level  eth1
 19:       3927       2564   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     107084     107083
ERR:          0
MIS:          0


I don't know if this is significant, but with acpi=noirq,
/proc/interrupts looks like this:


           CPU0       CPU1
  0:     196902      25653   IO-APIC-edge   timer
  1:         64       1189   IO-APIC-edge   i8042
  2:          0          0   XT-PIC         cascade
  5:          0          0   IO-APIC-edge   acpi
  8:          2          2   IO-APIC-edge   rtc
  9:         93          4   IO-APIC-level  eth1
 10:       2438       4609   IO-APIC-level  aic7xxx, uhci_hcd, uhci_hcd
 11:         10      15775   IO-APIC-level  eth2, radeon@PCI:1:0:0
 12:          0          0   IO-APIC-level  VIA686A
 14:         10          5   IO-APIC-edge   ide0
NMI:          0          0
LOC:     222226     222225
ERR:          0
MIS:          0

eth2 being the 8169.
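
The before/after snapshots above can also be compared mechanically
rather than by eye. What follows is only a minimal sketch in Python,
not something used in this thread: the helper names parse_interrupts
and interrupt_deltas are made up for illustration, and the two sample
strings are abbreviated copies of the first two snapshots (the run
without acpi=noirq).

```python
# Hypothetical helpers (not from the thread) for comparing two
# /proc/interrupts snapshots and ranking IRQ lines by how much they grew.

BEFORE = """
CPU0 CPU1
0: 18810 52561 IO-APIC-edge timer
10: 3651 3367 IO-APIC-level uhci_hcd, uhci_hcd
11: 0 0 IO-APIC-level VIA686A
16: 10 8 IO-APIC-level eth2
17: 12 7 IO-APIC-level eth1
"""

AFTER = """
CPU0 CPU1
0: 42718 64701 IO-APIC-edge timer
10: 4928 3367 IO-APIC-level uhci_hcd, uhci_hcd
11: 0 0 IO-APIC-level VIA686A, radeon@PCI:1:0:0
16: 10 99990 IO-APIC-level eth2
17: 30 7 IO-APIC-level eth1
"""


def parse_interrupts(text):
    """Map each IRQ label to (count summed over CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        # Split on the first colon only; descriptions may contain colons.
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        rows[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return rows


def interrupt_deltas(before, after):
    """Return (delta, label, description) tuples, largest delta first."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas = []
    for label, (total, desc) in new.items():
        previous = old.get(label, (0, ""))[0]
        deltas.append((total - previous, label, desc))
    return sorted(deltas, reverse=True)


for delta, label, desc in interrupt_deltas(BEFORE, AFTER):
    print(f"{label:>3}: +{delta:<8} {desc}")
```

On this sample IRQ 16 (eth2) grows by far the most, and its total
after starting X is 10 + 99990 = 100000.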

Unfortunately after tonight I won't have access to this machine until
Friday evening. I'll grab the netdev patchset and try those next.

2004-09-12 20:47:07

by Francois Romieu

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Sean Neakums <[email protected]> :
[...]
> Unfortunately after tonight I won't have access to this machine until
> Friday evening. I'll grab the netdev patchset and try those next.

via686a based multiprocessor board and acpi...

Can you try the vanilla 2.6.8 r8169 driver with 2.6.9-rc1-mm4?

--
Ueimor

2004-09-12 21:06:18

by Sean Neakums

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Francois Romieu <[email protected]> writes:

> Sean Neakums <[email protected]> :
> [...]
>> Unfortunately after tonight I won't have access to this machine until
>> Friday evening. I'll grab the netdev patchset and try those next.
>
> via686a based multiprocessor board and acpi...
>
> Can you try the vanilla 2.6.8 r8169 driver with 2.6.9-rc1-mm4?

Same result on starting X:

irq 16: nobody cared!
[__report_bad_irq+36/144] __report_bad_irq+0x24/0x90
[note_interrupt+146/352] note_interrupt+0x92/0x160
[do_IRQ+354/416] do_IRQ+0x162/0x1a0
[common_interrupt+24/32] common_interrupt+0x18/0x20
[default_idle+0/64] default_idle+0x0/0x40
[default_idle+44/64] default_idle+0x2c/0x40
[cpu_idle+52/80] cpu_idle+0x34/0x50
handlers:
[rtl8169_interrupt+0/272] (rtl8169_interrupt+0x0/0x110)
Disabling IRQ #16
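
For readers unfamiliar with the "nobody cared" report: it comes from
the note_interrupt() path visible in the traces. The Python model
below is only a sketch of that accounting as I understand it for 2.6
kernels of this era. The window of 100000 interrupts and the threshold
of 99900 unhandled are assumptions taken from memory of the kernel
source, not from this thread, and the class and function names are
invented for the example.

```python
# Toy model (assumption-laden sketch, not kernel code) of how a stuck
# IRQ line ends up disabled with "irq N: nobody cared!".

class IrqDesc:
    """Hypothetical stand-in for the kernel's per-IRQ descriptor."""

    WINDOW = 100_000     # assumed: interrupts per accounting window
    THRESHOLD = 99_900   # assumed: unhandled count that marks the line stuck

    def __init__(self):
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Account one interrupt; 'handled' is False if no handler claimed it."""
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < self.WINDOW:
            return
        # End of window: decide whether the line is stuck, then reset.
        self.irq_count = 0
        if self.irqs_unhandled > self.THRESHOLD:
            self.disabled = True  # the kernel would print "Disabling IRQ #N"
        self.irqs_unhandled = 0


def interrupts_until_disabled(handled_first, limit=1_000_000):
    """Deliver 'handled_first' claimed interrupts, then only unclaimed ones.

    Returns how many interrupts were delivered when the line was disabled,
    or None if that did not happen within 'limit' interrupts.
    """
    desc = IrqDesc()
    for n in range(1, limit + 1):
        desc.note_interrupt(handled=(n <= handled_first))
        if desc.disabled:
            return n
    return None


print(interrupts_until_disabled(18))
```

Under these assumptions, a line that sees a handful of claimed
interrupts followed by a flood of unclaimed ones is shut off at the end
of the first 100000-interrupt window, which is at least consistent with
the IRQ 16 counts in the /proc/interrupts snapshot earlier in the
thread.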

2004-09-12 22:03:02

by Francois Romieu

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Sean Neakums <[email protected]> :
> Francois Romieu <[email protected]> writes:
> > Sean Neakums <[email protected]> :
> > [...]
> >> Unfortunately after tonight I won't have access to this machine until
> >> Friday evening. I'll grab the netdev patchset and try those next.
> >
> > via686a based multiprocessor board and acpi...
> >
> > Can you try the vanilla 2.6.8 r8169 driver with 2.6.9-rc1-mm4?
>
> Same result on starting X:
>
> irq 16: nobody cared!

It sounds somewhat like broken irq routing.

Any takers for the hot potato?

--
Ueimor

2004-09-13 12:41:46

by Alan

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

On Sun, 2004-09-12 at 22:59, Francois Romieu wrote:
> > Same result on starting X:
> >
> > irq 16: nobody cared!
>
> It sounds somewhat like broken irq routing.
>
> Any takers for the hot potato?

Try booting the -mm kernel with "irqpoll" as a boot option and see if it
survives but struggles. At least I think mm4 has the irqpoll hack in. If
so, you can then work backwards and see whether things like acpi=off
work.

2004-09-13 12:49:03

by Sean Neakums

Subject: Re: 2.6.9-rc1-mm4: r8169: irq 16: nobody cared!/TX Timeout

Alan Cox <[email protected]> writes:

> On Sun, 2004-09-12 at 22:59, Francois Romieu wrote:
>> > Same result on starting X:
>> >
>> > irq 16: nobody cared!
>>
>> It sounds somewhat like broken irq routing.
>>
>> Any takers for the hot potato?
>
> Try booting the -mm kernel with "irqpoll" as a boot option and see if it
> survives but struggles. At least I think mm4 has the irqpoll hack in. If
> so, you can then work backwards and see whether things like acpi=off
> work.

Not sure if you caught the earlier context or if this is relevant, but
acpi=noirq does work.