2003-05-02 17:50:19

by Kimmo Sundqvist

Subject: 2.5.68-mm4 and 3c900 is a horror

2.5.68-mm4, but something less extreme from now on

There are many kinds of these... the trouble seems to be with the Ethernet
card, a 3com 3c900.

May 2 20:34:10 minjami kernel: irq 19: nobody cared!
May 2 20:34:10 minjami kernel: Call Trace:
May 2 20:34:10 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:10 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:10 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:10 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:10 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:10 minjami kernel: [default_idle+43/52] default_idle+0x2b/0x34
May 2 20:34:10 minjami kernel: [cpu_idle+55/72] cpu_idle+0x37/0x48
May 2 20:34:10 minjami kernel: [start_secondary+114/116]
start_secondary+0x72/0x74
May 2 20:34:11 minjami kernel: [release_console_sem+103/228]
release_console_sem+0x67/0xe4
May 2 20:34:11 minjami kernel: [printk+363/400] printk+0x16b/0x190

Another:

May 2 20:34:11 minjami kernel: handlers:
May 2 20:34:11 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:13 minjami kernel: irq 19: nobody cared!
May 2 20:34:13 minjami kernel: Call Trace:
May 2 20:34:13 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:13 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:13 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:13 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:13 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:13 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:13 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:13 minjami kernel: [default_idle+44/52] default_idle+0x2c/0x34
May 2 20:34:13 minjami kernel: [cpu_idle+55/72] cpu_idle+0x37/0x48
May 2 20:34:13 minjami kernel: [rest_init+101/104] _stext+0x65/0x68
May 2 20:34:13 minjami kernel: [start_kernel+330/336]
start_kernel+0x14a/0x150

Another:

May 2 20:34:14 minjami kernel: irq 19: nobody cared!
May 2 20:34:14 minjami kernel: Call Trace:
May 2 20:34:14 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:14 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:14 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20

Another:

May 2 20:34:14 minjami kernel: handlers:
May 2 20:34:14 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:15 minjami kernel: irq 19: nobody cared!
May 2 20:34:15 minjami kernel: Call Trace:
May 2 20:34:15 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:15 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:15 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:15 minjami kernel: handlers:
May 2 20:34:15 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:16 minjami kernel: irq 19: nobody cared!
May 2 20:34:16 minjami kernel: Call Trace:
May 2 20:34:16 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:16 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:16 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:16 minjami kernel: [do_softirq+91/200] do_softirq+0x5b/0xc8
May 2 20:34:16 minjami kernel: [do_softirq+93/200] do_softirq+0x5d/0xc8
May 2 20:34:16 minjami kernel: [smp_apic_timer_interrupt+322/340]
smp_apic_timer_interrupt+0x142/0x154
May 2 20:34:16 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:16 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:16 minjami kernel: [apic_timer_interrupt+26/32]
apic_timer_interrupt+0x1a/0x20
May 2 20:34:16 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:16 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:16 minjami kernel: [schedule+0/1344] schedule+0x0/0x540
May 2 20:34:16 minjami kernel: [need_resched+39/50] need_resched+0x27/0x32
May 2 20:34:16 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:16 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:16 minjami kernel: [default_idle+44/52] default_idle+0x2c/0x34
May 2 20:34:16 minjami kernel: [cpu_idle+55/72] cpu_idle+0x37/0x48
May 2 20:34:16 minjami kernel: [rest_init+101/104] _stext+0x65/0x68
May 2 20:34:16 minjami kernel: [start_kernel+330/336]
start_kernel+0x14a/0x150

Another:

May 2 20:34:16 minjami kernel: handlers:
May 2 20:34:16 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:17 minjami kernel: irq 19: nobody cared!
May 2 20:34:17 minjami kernel: Call Trace:
May 2 20:34:17 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:17 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:17 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20

Another:

May 2 20:34:18 minjami kernel: handlers:
May 2 20:34:18 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:19 minjami kernel: irq 19: nobody cared!
May 2 20:34:19 minjami kernel: Call Trace:
May 2 20:34:19 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:19 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:19 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:19 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:19 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:19 minjami kernel: [default_idle+43/52] default_idle+0x2b/0x34
May 2 20:34:19 minjami kernel: [cpu_idle+55/72] cpu_idle+0x37/0x48
May 2 20:34:19 minjami kernel: [start_secondary+114/116]
start_secondary+0x72/0x74
May 2 20:34:19 minjami kernel: [release_console_sem+103/228]
release_console_sem+0x67/0xe4
May 2 20:34:19 minjami kernel: [printk+363/400] printk+0x16b/0x190

Another:

May 2 20:34:33 minjami kernel: handlers:
May 2 20:34:33 minjami kernel: [<f894cdc0>] (boomerang_interrupt+0x0/0x424
[3c59x])
May 2 20:34:34 minjami kernel: irq 19: nobody cared!
May 2 20:34:34 minjami kernel: Call Trace:
May 2 20:34:34 minjami kernel: [handle_IRQ_event+151/244]
handle_IRQ_event+0x97/0xf4
May 2 20:34:34 minjami kernel: [do_IRQ+190/344] do_IRQ+0xbe/0x158
May 2 20:34:34 minjami kernel: [common_interrupt+24/32]
common_interrupt+0x18/0x20
May 2 20:34:34 minjami kernel: [do_softirq+91/200] do_softirq+0x5b/0xc8
May 2 20:34:34 minjami kernel: [do_softirq+93/200] do_softirq+0x5d/0xc8
May 2 20:34:34 minjami kernel: [smp_apic_timer_interrupt+322/340]
smp_apic_timer_interrupt+0x142/0x154
May 2 20:34:34 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:34 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:34 minjami kernel: [apic_timer_interrupt+26/32]
apic_timer_interrupt+0x1a/0x20
May 2 20:34:34 minjami kernel: [default_idle+0/52] default_idle+0x0/0x34
May 2 20:34:34 minjami kernel: [rest_init+0/104] _stext+0x0/0x68
May 2 20:34:34 minjami kernel: [default_idle+44/52] default_idle+0x2c/0x34
May 2 20:34:34 minjami kernel: [cpu_idle+55/72] cpu_idle+0x37/0x48
May 2 20:34:34 minjami kernel: [rest_init+101/104] _stext+0x65/0x68
May 2 20:34:34 minjami kernel: [start_kernel+330/336]
start_kernel+0x14a/0x150

22 perfectly good errors left un-copypasted.

Then my kern.log contained some screenfuls of
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

(copied from "cat kern.log | less")

And then a list of files from the openoffice.org source directory
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/

Some 200 filenames, like:
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/invadp_description.obj
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/invadp_version.o
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/invocation.o
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/invocation.obj
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/inv_description.o
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/inv_description.obj
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/inv_version.o
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/javaloader.o
/stg/src/openoffice.org-1.0.1/build-tree/oo_1.0.1_src/stoc/unxlngi4.pro/slo/javaloader.obj

(stg stands for storage, a 20GB partition with miscellaneous stuff)

After that it seems I got tired of waiting (the machine acted as if it was
doing nothing, i.e. locked, reacting to nothing) and pushed the reset button.

How should I refine my error reporting? I'll now go and see if all my
ReiserFS partitions are ok.

Debian 3.0r1, gcc version 2.95.4 20011002 (Debian prerelease), Abit VP6 SMP
motherboard with dual pIII 933MHz.

00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x]
(rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x
AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev
40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0b.0 Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 02)
00:0c.0 Ethernet controller: 3Com Corporation 3c900 Combo [Boomerang]
00:0e.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 /
HPT370 (rev 03)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200 AGP (rev 03)

-Kimmo S.


2003-05-02 21:55:02

by Andrew Morton

Subject: Re: 2.5.68-mm4 and 3c900 is a horror

Kimmo Sundqvist <[email protected]> wrote:
>
> 2.5.68-mm4, but something less extreme from now on
>
> There are many kinds of these... the trouble seems to be with the Ethernet
> card, a 3com 3c900.
>
> May 2 20:34:10 minjami kernel: irq 19: nobody cared!

Very odd. How often does this happen? Once a minute? Once a second?
Every packet?
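
One way to answer the "how often" question from a saved log is to count the "nobody cared!" lines and compare the first and last timestamps. The sketch below is a hypothetical userspace helper, not something posted in this thread. It assumes the syslog line format shown in the report ("May 2 20:34:10 host kernel: ...") and ignores midnight rollover; the names syslog_seconds, irq_report_stats and count_nobody_cared are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: seconds since midnight from a syslog-style line
 * such as "May 2 20:34:10 minjami kernel: ...". Returns -1 if the line
 * does not start with a "Mon D HH:MM:SS" timestamp. */
static int syslog_seconds(const char *line)
{
    char month[4];
    int day, h, m, s;

    if (sscanf(line, "%3s %d %d:%d:%d", month, &day, &h, &m, &s) != 5)
        return -1;
    return h * 3600 + m * 60 + s;
}

/* Summary of the "nobody cared!" reports found in a set of log lines. */
struct irq_report_stats {
    int count;      /* number of matching lines */
    int first_sec;  /* time of the first match, seconds since midnight */
    int last_sec;   /* time of the last match, seconds since midnight */
};

/* Scan the lines in order and record how many reports there were and
 * when the first and last ones were logged. */
static struct irq_report_stats
count_nobody_cared(const char *const *lines, size_t n)
{
    struct irq_report_stats st = { 0, -1, -1 };
    size_t i;

    assert(lines != NULL);
    for (i = 0; i < n; i++) {
        int t;

        if (strstr(lines[i], "nobody cared!") == NULL)
            continue;
        t = syslog_seconds(lines[i]);
        if (t < 0)
            continue;   /* skip lines without a parsable timestamp */
        if (st.count == 0)
            st.first_sec = t;
        st.last_sec = t;
        st.count++;
    }
    return st;
}
```

Dividing count by (last_sec - first_sec) gives a rough rate. Run over an excerpt with reports left out, such as the one in the original post, it would understate the real frequency.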

2003-05-03 19:14:50

by Kimmo Sundqvist

Subject: Re: 2.5.68-mm4 and 3c900 is a horror

> Kimmo Sundqvist <[email protected]> wrote:

>> May 2 20:34:10 minjami kernel: irq 19: nobody cared!

> Very odd. How often does this happen?

Actually I was booting up for the first time with -mm4 as it happened.
The first 8 errors I copied verbatim, and the 22 that happened after the
8th but before the machine locked up I left out. So my guess is that
whatever happened there, happened every time.
The system is set so that it runs kdm on boot, and opens a PPPoE link from
/etc/ppp/ppp_on_boot with
pppd pty "pppoe -I eth0 -m 1412" debug defaultroute

The screen went black, which is normal in itself, but the X background
didn't appear. It also wouldn't switch virtual consoles, beep, or do
anything else while I tried. I can't remember whether my display was in
sync or not, but either way, pressing any (many) keys made no difference.
The kernel I was running in my previous boot was 2.5.68-osdl2, and
2.5.68-mm4 was also compiled while 2.5.68-osdl2 was running. All
userspace stuff is neatly set up, and with ordinary 2.5.68 (and
2.5.68-osdl2) everything works fine, with the exception of ALSA. OSS
works.
The ReiserFS partitions didn't complain on next bootup.

-Kimmo Sundqvist


2003-05-03 21:28:14

by Andrew Morton

Subject: Re: 2.5.68-mm4 and 3c900 is a horror

Kimmo Sundqvist <[email protected]> wrote:
>
> > Kimmo Sundqvist <[email protected]> wrote:
>
> >> May 2 20:34:10 minjami kernel: irq 19: nobody cared!
>
> > Very odd. How often does this happen?
>
> Actually I was booting up for the first time with -mm4 as it happened.
> The first 8 errors I copied verbatim, and the 22 that happened after the
> 8th but before the machine locked up I left out. So my guess is that
> whatever happened there, happened every time.

Well I don't see anything wrong in there, but this should shut it up for you.

diff -puN drivers/net/3c59x.c~3c59x-irq-fix drivers/net/3c59x.c
--- 25/drivers/net/3c59x.c~3c59x-irq-fix 2003-05-03 14:39:17.000000000 -0700
+++ 25-akpm/drivers/net/3c59x.c 2003-05-03 14:39:50.000000000 -0700
@@ -2321,7 +2321,6 @@ boomerang_interrupt(int irq, void *dev_i
 	long ioaddr;
 	int status;
 	int work_done = max_interrupt_work;
-	int handled;
 
 	ioaddr = dev->base_addr;
 
@@ -2336,18 +2335,14 @@ boomerang_interrupt(int irq, void *dev_i
 	if (vortex_debug > 6)
 		printk(KERN_DEBUG "boomerang_interrupt. status=0x%4x\n", status);
 
-	if ((status & IntLatch) == 0) {
-		handled = 0;
+	if ((status & IntLatch) == 0)
 		goto handler_exit; /* No interrupt: shared IRQs can cause this */
-	}
 
 	if (status == 0xffff) { /* h/w no longer present (hotplug)? */
 		if (vortex_debug > 1)
 			printk(KERN_DEBUG "boomerang_interrupt(1): status = 0xffff\n");
-		handled = 0;
 		goto handler_exit;
 	}
-	handled = 1;
 
 	if (status & IntReq) {
 		status |= vp->deferred;
@@ -2442,7 +2437,7 @@ boomerang_interrupt(int irq, void *dev_i
 			dev->name, status);
 handler_exit:
 	spin_unlock(&vp->lock);
-	return IRQ_RETVAL(handled);
+	return IRQ_HANDLED;
 }
 
 static int vortex_rx(struct net_device *dev)

_
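
The effect of the patch above can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: the SIM_ names, the status bit value and the function names are all hypothetical stand-ins. It assumes only what the diff shows, namely that the old handler reported "not handled" when the latch bit was clear or the status read back as 0xffff, and that the patched handler always returns IRQ_HANDLED.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for illustration only; not the kernel's definitions. */
enum sim_irqreturn { SIM_IRQ_NONE = 0, SIM_IRQ_HANDLED = 1 };

#define SIM_INT_LATCH 0x0001u  /* hypothetical "interrupt pending" bit */

typedef enum sim_irqreturn (*sim_handler_t)(unsigned int status);

/* Before the patch: disclaim the interrupt when the latch bit is clear
 * or when the hardware appears to be gone (status reads as 0xffff). */
static enum sim_irqreturn handler_before(unsigned int status)
{
    if ((status & SIM_INT_LATCH) == 0)
        return SIM_IRQ_NONE;
    if (status == 0xffffu)
        return SIM_IRQ_NONE;
    return SIM_IRQ_HANDLED;
}

/* After the patch: always claim the interrupt, whatever the status. */
static enum sim_irqreturn handler_after(unsigned int status)
{
    (void)status;
    return SIM_IRQ_HANDLED;
}

/* Count the simulated interrupts that the handler did not claim. An
 * unclaimed interrupt is the condition behind "nobody cared!". */
static int count_unclaimed(sim_handler_t handler,
                           const unsigned int *statuses, size_t n)
{
    int unclaimed = 0;
    size_t i;

    assert(handler != NULL);
    for (i = 0; i < n; i++)
        if (handler(statuses[i]) == SIM_IRQ_NONE)
            unclaimed++;
    return unclaimed;
}
```

With a mix of status values, handler_before leaves some interrupts unclaimed while handler_after leaves none. That is why the patch silences the message; it also means the handler would claim interrupts on a shared line that were not its own.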