2001-11-16 06:49:19

by Philipp Matthias Hahn

[permalink] [raw]
Subject: [OOPS] net/8139too

Hello LKML!

Since linux-2.4.15-pre[14]+kdb+freeswan I get an oops when stopping my
8139too network:

# ifdown eth0
eth0: unable to signal thread
# rmmod -a
<<< Unplug the network kable >>>
Inable to handle kernel paging request at virtual address c904b600
printing eip:
c904b600
*pde = 0127d067
*pte = 00000000

Entering kdb (current=0xc792a000, pid 59) Oops: Oops
due to oops @ 0xc904b600
eax = 0xc904b600 ebx = 0xc217aaa0 ecx = 0xc9065000 edx = 0x00000020
esi = 0x04000001 edi = 0x00000009 esp = 0xc792beae eip = 0xc904b600
ebp = 0xc792bef6 xss = 0x00000018 xcs = 0x00000010 eflegs = 0x00010202
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc792be7a
kdb> bt
EBP EIP Function(args)
0xc792bef6 0xc904b600 <unknown>+0xc904b600
kernel <unknown> 0x0 0x0 0x0

# ksymoops -A 0xc904b600
Adhoc c904b600 <[8139too]rtl8139_interrupt+0/dc>

Quoting from Documentation/networking/8139too.txt:
> Version 0.9.22 - November 8, 2001
> ...
> Version 0.9.21 - November 1, 2001
> ...
> * Fix problems with kernel thread exit.
an from drivers/net/8139too.c:
> Robert Kuebel - Save kernel thread from dying on any signal.

Hope that helps.

BYtE
Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]


2001-11-16 07:18:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [OOPS] net/8139too

Philipp Matthias Hahn wrote:
>
> Hello LKML!
>
> Since linux-2.4.15-pre[14]+kdb+freeswan I get an oops when stopping my
> 8139too network:
>
> # ifdown eth0
> eth0: unable to signal thread

Oh gawd. What now?

Could you please tell us what the return value is from kill_proc()?


--- linux-2.4.15-pre4/drivers/net/8139too.c Mon Nov 12 11:16:11 2001
+++ linux-akpm/drivers/net/8139too.c Thu Nov 15 23:14:14 2001
@@ -2064,7 +2064,7 @@ static int rtl8139_close (struct net_dev
wmb();
ret = kill_proc (tp->thr_pid, SIGTERM, 1);
if (ret) {
- printk (KERN_ERR "%s: unable to signal thread\n", dev->name);
+ printk (KERN_ERR "%s: unable to signal thread: %d\n", dev->name, ret);
return ret;
}
wait_for_completion (&tp->thr_exited);

-

2001-11-16 08:10:16

by FD Cami

[permalink] [raw]
Subject: 3C905x - too much work in interrupt, follow-up


Hi Andrew !

Here are the promised stats (I'm sorry it took so
long):

testing procedure :
each PC was rebooted at midnight on sundays, and
data was collected for 4 days and then averaged.

Network : 700 networked PCs, running different
windows versions or linux, usually with 10MBits/s
boards, some with 100MB. Network is partially switched
with 3COM 1100s and 3300s ; fiber network (100MB/s I
think [not too sure 'bout that]) between two stacks.

Machine 1
ASUS P2B-DS / Dual PII350 / 512MB RAM / 3*18GB 10KT IBM / 3C905C
[it got an upgrade] ; distro is slackware 8, kernel 2.2.19
serves as a proxy server running squid. Normal network load
during the day is around 30MBits/s or so ; the machine
transfers 1GB of data daily, which is not too much i think.
the PC uses IO-APIC.
cat /proc/interrupts always shows that the NIC pushes
3-10 times more interrupts than the timer.
aic7xxx pushes 10 times less interrupts than the NIC.
ifconfig shows that RX is 7 times less that TX

max_interrupt_work set to 20 :
eth0 : too much work in interrupt appears 21 times a day

max_interrupt_work set to 32 :
eth0 : too much work in interrupt appears 8 times a day

max_interrupt_work set to 64 :
eth0 : too much work in interrupt appears around 2 times a day

max_interrupt_work set to 128 :
eth0 : too much work in interrupt never appears in the log.


Machine 2
ABIT LX6 / PII300 / 128MB RAM / 3C905C
hard drives [all ide] :
IBM 8GB as hda
Maxtor 80GB 5400T as hdb
Maxtor 60GB 5400T as hdc
distro is slackware 8, kernel 2.4.4
serves as an ftp server running proftpd ; sometimes uses
samba to send data to a W2K-server.
Normal network load is 500MByte per day for RX, and the same
for TX.

max_interrupt_work set to 20 :
eth0 : too much work in interrupt appears 17 times a day

max_interrupt_work set to 32 :
eth0 : too much work in interrupt appears 7 times a day

max_interrupt_work set to 64 :
eth0 : too much work in interrupt appears around 2 times a day

max_interrupt_work set to 128 :
eth0 : too much work in interrupt never appears in the log.


Machine 3
ABIT BH6 / PII400 / 128MB RAM / 3C905C / Tekram DC-390F
hard drives :
IBM 9GB as sda
IBM 4GB as sdb
distro is slackware 8, kernel 2.2.19
tested as a proxy server instead of the dual PII350

max_interrupt_work set to 20 :
eth0 : too much work in interrupt appears 5 times a day

max_interrupt_work set to 32 :
eth0 : too much work in interrupt appears 4 times a day

max_interrupt_work set to 64 :
eth0 : too much work in interrupt never appears in the log.

I hope that helps... keep me informed, please.

Fran?ois Cami


Andrew Morton wrote [20 April 2001]:
>
> Vibol Hou wrote:
> ...
>
> > Apr 17 16:10:12 omega kernel: eth0: Too much work in interrupt,
status e401.
>
> I got that one too, PC is ASUS P2B-DS with two PII-350, 384MB RAM,
> 3C905B.
If you were getting this message occasionally, and if increasing the
max_interrupt_work module parm makes it stop, and everything
is always working fine, then it's an OK thing to do.
Question is: why is it happening? We're failing to get out
of the interrupt loop after 32 loops. Each loop can reap
up to 16 transmitted packets and 32 received packets.
That's a lot.
My suspicion is that something else in the system is
causing the NIC interrupt routine to get held up for long
periods of time. It has to be another interrupt.
All reporters of this problem (ie: both of them) were using
aic7xx SCSI. I wonder if that driver can sometimes spend a
long time in its interrupt routine. Many times. Rapidly.
Very odd.
Ah. SMP. Perhaps the other CPU is generating the transmit
load, some other interrupt source is slowing down *this*
CPU.
Could you test something for me? Try *decreasing* the
value of max_interrupt_work. See if that increases
the frequency of the message. Then, it if does, try to
correlate the occurence of the message with some other
form of system activity (especially disk I/O).
Thanks.

2001-11-16 10:27:36

by Philipp Matthias Hahn

[permalink] [raw]
Subject: Re: [OOPS] net/8139too

Hello Andrew, Jedd, LKML!

On 2001.11.16 08:17 Andrew Morton wrote:
> > Since linux-2.4.15-pre[14]+kdb+freeswan I get an oops when stopping my
> > 8139too network:
> >
> > # ifdown eth0
> > eth0: unable to signal thread
>
> Could you please tell us what the return value is from kill_proc()?
Now running 2.4.15-pre5 with your patch and kill_proc returns -3.

Willing to test more patches.

BYtE
Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]

2001-11-24 19:03:33

by Philipp Matthias Hahn

[permalink] [raw]
Subject: Re: [PATCH] net/8139too

On Fri, 16 Nov 2001, Philipp Matthias Hahn wrote:

> Hello Andrew, Jedd, LKML!
>
> On 2001.11.16 08:17 Andrew Morton wrote:
> > > Since linux-2.4.15-pre[14]+kdb+freeswan I get an oops when stopping my
> > > 8139too network:
> > >
> > > # ifdown eth0
> > > eth0: unable to signal thread
> >
> > Could you please tell us what the return value is from kill_proc()?
> Now running 2.4.15-pre5 with your patch and kill_proc returns -3.

Found it! Here's what happend:

modprobe 8139too:
rtl8139_init_board() zeros "struct rtl8139_private"

ifconfig eth0 up:
rtl8139_open() is called, which starts rtl8139_thread()

ifconfig eth0 down:
rtl8139_close() sets "time_to_die = 1"
rtl8139_thread() exits

ifconfig eth0 up 'again':
rtl8139_open() is called, which starts rtl8139_thread()
time_to_die is still 1
rtl8139_thread() exits immediately

ifconfig eth0 down:
rtl8139_close() tries to signal a nonexistent thread -> ESRCH

rmmod 8139too
OOPS

Resetting time_to_die=0 in rtl8139_open() should fix the problem:

--- linux-2.4.15/drivers/net/8139too.c.orig Sat Nov 24 19:48:00 2001
+++ linux-2.4.15/drivers/net/8139too.c Sat Nov 24 19:48:49 2001
@@ -1270,6 +1270,7 @@
tp->full_duplex = tp->duplex_lock;
tp->tx_flag = (TX_FIFO_THRESH << 11) & 0x003f0000;
tp->twistie = 1;
+ tp->time_to_die = 0;

rtl8139_init_ring (dev);
rtl8139_hw_start (dev);

BYtE
Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]

2001-11-24 19:07:21

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] net/8139too

thanks, patch applied
--
Jeff Garzik | Only so many songs can be sung
Building 1024 | with two lips, two lungs, and one tongue.
MandrakeSoft | - nomeansno