2003-03-11 09:05:16

by Abraham van der Merwe

[permalink] [raw]
Subject: kernel panic: bug in sch_sfq.c

Hi!

I have a box that crashed today. Below is the decoded kernel panic. If you
track down the bug PLEASE send me a patch.

------------< snip <------< snip <------< snip <------------
ksymoops 2.4.8 on i686 2.4.20-rc1. Options used
-v vmlinux-2.4.21-pre5 (specified)
-K (specified)
-L (specified)
-O (specified)
-m System.map-2.4.21-pre5 (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000004
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c01a5399>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000000 ebx: c7b9a9e8 ecx: 0000007f edx: c7a8eef8
esi: c7b9ab08 edi: 000007f0 ebp: c7a8e060 esp: c021deb8
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c021d000)
Stack: c7b9a9e8 c7b9ab08 c7f7ee00 c7b9a860 c7b893c0 c7f7ee00 c7b9a860 00000000
c01a3507 c7b5c680 7fb9a9f0 c01a339e c7a8e000 ffffffff 00000018 00000006
c7b9a800 00000018 00000006 c7b9a800 c7b9a9e8 c7b9ab08 c7f7ee00 c01a371c
Call Trace: [<c01a3507>] [<c01a339e>] [<c01a371c>] [<c019f7a3>] [>c019949d>]
[<c0115a6a>] [<c01082bd>] [<c0105240>] [<c0105240>] [<c010a528>] [<c0105240>]
[<c0105240>] [<c0105263>] [<c01052d2>] [<c0105000>] [<c0105027>]
Code: 89 50 04 89 02 8b 5c 24 24 c7 03 00 00 00 00 c7 43 04 00 00


>>EIP; c01a5399 <sfq_dequeue+59/1b0> <=====

>>esp; c021deb8 <init_task_union+1eb8/2000>

Trace; c01a3507 <htb_dequeue_tree+217/230>
Trace; c01a339e <htb_dequeue_tree+ae/230>
Trace; c01a371c <htb_dequeue+16c/250>
Trace; c019f7a3 <qdisc_restart+13/d0>
Trace; c0115a6a <do_softirq+5a/b0>
Trace; c01082bd <do_IRQ+9d/b0>
Trace; c0105240 <default_idle+0/30>
Trace; c0105240 <default_idle+0/30>
Trace; c010a528 <call_do_IRQ+5/d>
Trace; c0105240 <default_idle+0/30>
Trace; c0105240 <default_idle+0/30>
Trace; c0105263 <default_idle+23/30>
Trace; c01052d2 <cpu_idle+42/60>
Trace; c0105000 <_stext+0/0>
Trace; c0105027 <rest_init+27/30>

Code; c01a5399 <sfq_dequeue+59/1b0>
00000000 <_EIP>:
Code; c01a5399 <sfq_dequeue+59/1b0> <=====
0: 89 50 04 mov %edx,0x4(%eax) <=====
Code; c01a539c <sfq_dequeue+5c/1b0>
3: 89 02 mov %eax,(%edx)
Code; c01a539e <sfq_dequeue+5e/1b0>
5: 8b 5c 24 24 mov 0x24(%esp,1),%ebx
Code; c01a53a2 <sfq_dequeue+62/1b0>
9: c7 03 00 00 00 00 movl $0x0,(%ebx)
Code; c01a53a8 <sfq_dequeue+68/1b0>
f: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)

<0>Kernel panic: Aiee, killing interrupt handler!
------------< snip <------< snip <------< snip <------------

Below are the rules that were installed on the system:

------------< snip <------< snip <------< snip <------------
/sbin/tc qdisc del dev eth0 root
/sbin/tc qdisc del dev eth1 root
/sbin/iptables -t mangle -F qos
/sbin/iptables -t mangle -Z qos
/sbin/tc qdisc add dev eth0 root handle 1: htb default 5 r2q 1
/sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 96kbit
/sbin/tc class add dev eth0 parent 1:1 classid 1:2 htb rate 96kbit ceil 96kbit
/sbin/tc class add dev eth0 parent 1:2 classid 1:3 htb rate 48kbit ceil 96kbit prio 1
/sbin/tc qdisc add dev eth0 handle 3: parent 1:3 sfq perturb 10 limit 31
/sbin/tc class add dev eth0 parent 1:2 classid 1:4 htb rate 24kbit ceil 96kbit prio 1
/sbin/tc qdisc add dev eth0 handle 4: parent 1:4 sfq perturb 10 limit 31
/sbin/tc class add dev eth0 parent 1:2 classid 1:5 htb rate 16kbit ceil 96kbit prio 2
/sbin/tc qdisc add dev eth0 handle 5: parent 1:5 sfq perturb 10 limit 31
/sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.0/28 -j CLASSIFY --set-class 1:3
/sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.80/28 -j CLASSIFY --set-class 1:4
/sbin/iptables -t mangle -A qos -o eth0 -s 192.116.106.192/29 -j CLASSIFY --set-class 1:0
/sbin/iptables -t mangle -A qos -o eth0 -s 66.8.28.48/29 -j CLASSIFY --set-class 1:0
/sbin/tc qdisc add dev eth1 root handle 1: htb default 5 r2q 2
/sbin/tc class add dev eth1 parent 1: classid 1:1 htb rate 512kbit
/sbin/tc class add dev eth1 parent 1:1 classid 1:2 htb rate 256kbit ceil 512kbit
/sbin/tc class add dev eth1 parent 1:2 classid 1:3 htb rate 128kbit ceil 512kbit prio 1
/sbin/tc qdisc add dev eth1 handle 3: parent 1:3 sfq perturb 10 limit 169
/sbin/tc class add dev eth1 parent 1:2 classid 1:4 htb rate 64kbit ceil 512kbit prio 1
/sbin/tc qdisc add dev eth1 handle 4: parent 1:4 sfq perturb 10 limit 169
/sbin/tc class add dev eth1 parent 1:2 classid 1:5 htb rate 32kbit ceil 512kbit prio 2
/sbin/tc qdisc add dev eth1 handle 5: parent 1:5 sfq perturb 10 limit 169
/sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.0/28 -j CLASSIFY --set-class 1:3
/sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.80/28 -j CLASSIFY --set-class 1:4
/sbin/iptables -t mangle -A qos -o eth1 -d 192.116.106.192/29 -j CLASSIFY --set-class 1:0
/sbin/iptables -t mangle -A qos -o eth1 -d 66.8.28.48/29 -j CLASSIFY --set-class 1:0
------------< snip <------< snip <------< snip <------------

I've made tons of info available on my home page for you to look at (proc
files, vmlinux, System.map, original panic message, etc.

http://oasis.frogfoot.net/sfq/

--

Regards
Abraham

I saw what you did and I know who you are.

___________________________________________________
Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
P.O. Box 3472, Matieland, Stellenbosch, 7602
Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
Email: [email protected]


2003-03-11 09:33:46

by bert hubert

[permalink] [raw]
Subject: Re: kernel panic: bug in sch_sfq.c

On Tue, Mar 11, 2003 at 11:14:09AM +0200, Abraham van der Merwe wrote:
> Hi!
>
> I have a box that crashed today. Below is the decoded kernel panic. If you
> track down the bug PLEASE send me a patch.

Weird, Alexeys code is normally very very solid. Perhaps HTB is also
involved. Devik?

>
> ------------< snip <------< snip <------< snip <------------
> ksymoops 2.4.8 on i686 2.4.20-rc1. Options used
> -v vmlinux-2.4.21-pre5 (specified)
> -K (specified)
> -L (specified)
> -O (specified)
> -m System.map-2.4.21-pre5 (specified)
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> *pde = 00000000
> Oops: 0002
> CPU: 0
> EIP: 0010:[<c01a5399>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010202
> eax: 00000000 ebx: c7b9a9e8 ecx: 0000007f edx: c7a8eef8
> esi: c7b9ab08 edi: 000007f0 ebp: c7a8e060 esp: c021deb8
> ds: 0018 es: 0018 ss: 0018
> Process swapper (pid: 0, stackpage=c021d000)
> Stack: c7b9a9e8 c7b9ab08 c7f7ee00 c7b9a860 c7b893c0 c7f7ee00 c7b9a860 00000000
> c01a3507 c7b5c680 7fb9a9f0 c01a339e c7a8e000 ffffffff 00000018 00000006
> c7b9a800 00000018 00000006 c7b9a800 c7b9a9e8 c7b9ab08 c7f7ee00 c01a371c
> Call Trace: [<c01a3507>] [<c01a339e>] [<c01a371c>] [<c019f7a3>] [>c019949d>]
> [<c0115a6a>] [<c01082bd>] [<c0105240>] [<c0105240>] [<c010a528>] [<c0105240>]
> [<c0105240>] [<c0105263>] [<c01052d2>] [<c0105000>] [<c0105027>]
> Code: 89 50 04 89 02 8b 5c 24 24 c7 03 00 00 00 00 c7 43 04 00 00
>
>
> >>EIP; c01a5399 <sfq_dequeue+59/1b0> <=====
>
> >>esp; c021deb8 <init_task_union+1eb8/2000>
>
> Trace; c01a3507 <htb_dequeue_tree+217/230>
> Trace; c01a339e <htb_dequeue_tree+ae/230>
> Trace; c01a371c <htb_dequeue+16c/250>
> Trace; c019f7a3 <qdisc_restart+13/d0>
> Trace; c0115a6a <do_softirq+5a/b0>
> Trace; c01082bd <do_IRQ+9d/b0>
> Trace; c0105240 <default_idle+0/30>
> Trace; c0105240 <default_idle+0/30>
> Trace; c010a528 <call_do_IRQ+5/d>
> Trace; c0105240 <default_idle+0/30>
> Trace; c0105240 <default_idle+0/30>
> Trace; c0105263 <default_idle+23/30>
> Trace; c01052d2 <cpu_idle+42/60>
> Trace; c0105000 <_stext+0/0>
> Trace; c0105027 <rest_init+27/30>
>
> Code; c01a5399 <sfq_dequeue+59/1b0>
> 00000000 <_EIP>:
> Code; c01a5399 <sfq_dequeue+59/1b0> <=====
> 0: 89 50 04 mov %edx,0x4(%eax) <=====
> Code; c01a539c <sfq_dequeue+5c/1b0>
> 3: 89 02 mov %eax,(%edx)
> Code; c01a539e <sfq_dequeue+5e/1b0>
> 5: 8b 5c 24 24 mov 0x24(%esp,1),%ebx
> Code; c01a53a2 <sfq_dequeue+62/1b0>
> 9: c7 03 00 00 00 00 movl $0x0,(%ebx)
> Code; c01a53a8 <sfq_dequeue+68/1b0>
> f: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
>
> <0>Kernel panic: Aiee, killing interrupt handler!
> ------------< snip <------< snip <------< snip <------------
>
> Below are the rules that were installed on the system:
>
> ------------< snip <------< snip <------< snip <------------
> /sbin/tc qdisc del dev eth0 root
> /sbin/tc qdisc del dev eth1 root
> /sbin/iptables -t mangle -F qos
> /sbin/iptables -t mangle -Z qos
> /sbin/tc qdisc add dev eth0 root handle 1: htb default 5 r2q 1
> /sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 96kbit
> /sbin/tc class add dev eth0 parent 1:1 classid 1:2 htb rate 96kbit ceil 96kbit
> /sbin/tc class add dev eth0 parent 1:2 classid 1:3 htb rate 48kbit ceil 96kbit prio 1
> /sbin/tc qdisc add dev eth0 handle 3: parent 1:3 sfq perturb 10 limit 31
> /sbin/tc class add dev eth0 parent 1:2 classid 1:4 htb rate 24kbit ceil 96kbit prio 1
> /sbin/tc qdisc add dev eth0 handle 4: parent 1:4 sfq perturb 10 limit 31
> /sbin/tc class add dev eth0 parent 1:2 classid 1:5 htb rate 16kbit ceil 96kbit prio 2
> /sbin/tc qdisc add dev eth0 handle 5: parent 1:5 sfq perturb 10 limit 31
> /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> /sbin/iptables -t mangle -A qos -o eth0 -s 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> /sbin/tc qdisc add dev eth1 root handle 1: htb default 5 r2q 2
> /sbin/tc class add dev eth1 parent 1: classid 1:1 htb rate 512kbit
> /sbin/tc class add dev eth1 parent 1:1 classid 1:2 htb rate 256kbit ceil 512kbit
> /sbin/tc class add dev eth1 parent 1:2 classid 1:3 htb rate 128kbit ceil 512kbit prio 1
> /sbin/tc qdisc add dev eth1 handle 3: parent 1:3 sfq perturb 10 limit 169
> /sbin/tc class add dev eth1 parent 1:2 classid 1:4 htb rate 64kbit ceil 512kbit prio 1
> /sbin/tc qdisc add dev eth1 handle 4: parent 1:4 sfq perturb 10 limit 169
> /sbin/tc class add dev eth1 parent 1:2 classid 1:5 htb rate 32kbit ceil 512kbit prio 2
> /sbin/tc qdisc add dev eth1 handle 5: parent 1:5 sfq perturb 10 limit 169
> /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> /sbin/iptables -t mangle -A qos -o eth1 -d 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> ------------< snip <------< snip <------< snip <------------
>
> I've made tons of info available on my home page for you to look at (proc
> files, vmlinux, System.map, original panic message, etc.
>
> http://oasis.frogfoot.net/sfq/
>
> --
>
> Regards
> Abraham
>
> I saw what you did and I know who you are.
>
> ___________________________________________________
> Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
> P.O. Box 3472, Matieland, Stellenbosch, 7602
> Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
> Email: [email protected]
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
http://netherlabs.nl Consulting

2003-03-11 11:54:00

by Martin Devera

[permalink] [raw]
Subject: Re: kernel panic: bug in sch_sfq.c

Hmm,
I looked at it. It seems that skb linked list was corrupted
(containing NULL pointer). It could be because of two problems,
either someone (maybe htb too)'ve overwritten memory or HTB
called dequeue with wrong argument.
Latter is unlikely because I call q->dequeue and sfq's
dequeue was really called. Thus pointer is ok.
Let's examine how could HTB mungle with qdisc internals. If
htb would think that leaf is inner node, inner.feed[0] is
pointer equal to leaf.q. I examined code but there is no
way to make this mistake.

Last 3 days I got 3 bugreports. Each crashes in different
place and all seem unrelated. Each is NULL pointer dereference
though.
I think there is some place in some code which writes in
bad random places in memory :-\

To ask all people whose have seen such Oops, have you used
dynamic tc classes changes ? Like creating/deleting/changing/viewving
classes offten at runtime ? (I'm trying to find common trigger).

thanks, devik
HTB maintainer

On Tue, 11 Mar 2003, bert hubert wrote:

> On Tue, Mar 11, 2003 at 11:14:09AM +0200, Abraham van der Merwe wrote:
> > Hi!
> >
> > I have a box that crashed today. Below is the decoded kernel panic. If you
> > track down the bug PLEASE send me a patch.
>
> Weird, Alexeys code is normally very very solid. Perhaps HTB is also
> involved. Devik?
>
> >
> > ------------< snip <------< snip <------< snip <------------
> > ksymoops 2.4.8 on i686 2.4.20-rc1. Options used
> > -v vmlinux-2.4.21-pre5 (specified)
> > -K (specified)
> > -L (specified)
> > -O (specified)
> > -m System.map-2.4.21-pre5 (specified)
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > *pde = 00000000
> > Oops: 0002
> > CPU: 0
> > EIP: 0010:[<c01a5399>] Not tainted
> > Using defaults from ksymoops -t elf32-i386 -a i386
> > EFLAGS: 00010202
> > eax: 00000000 ebx: c7b9a9e8 ecx: 0000007f edx: c7a8eef8
> > esi: c7b9ab08 edi: 000007f0 ebp: c7a8e060 esp: c021deb8
> > ds: 0018 es: 0018 ss: 0018
> > Process swapper (pid: 0, stackpage=c021d000)
> > Stack: c7b9a9e8 c7b9ab08 c7f7ee00 c7b9a860 c7b893c0 c7f7ee00 c7b9a860 00000000
> > c01a3507 c7b5c680 7fb9a9f0 c01a339e c7a8e000 ffffffff 00000018 00000006
> > c7b9a800 00000018 00000006 c7b9a800 c7b9a9e8 c7b9ab08 c7f7ee00 c01a371c
> > Call Trace: [<c01a3507>] [<c01a339e>] [<c01a371c>] [<c019f7a3>] [>c019949d>]
> > [<c0115a6a>] [<c01082bd>] [<c0105240>] [<c0105240>] [<c010a528>] [<c0105240>]
> > [<c0105240>] [<c0105263>] [<c01052d2>] [<c0105000>] [<c0105027>]
> > Code: 89 50 04 89 02 8b 5c 24 24 c7 03 00 00 00 00 c7 43 04 00 00
> >
> >
> > >>EIP; c01a5399 <sfq_dequeue+59/1b0> <=====
> >
> > >>esp; c021deb8 <init_task_union+1eb8/2000>
> >
> > Trace; c01a3507 <htb_dequeue_tree+217/230>
> > Trace; c01a339e <htb_dequeue_tree+ae/230>
> > Trace; c01a371c <htb_dequeue+16c/250>
> > Trace; c019f7a3 <qdisc_restart+13/d0>
> > Trace; c0115a6a <do_softirq+5a/b0>
> > Trace; c01082bd <do_IRQ+9d/b0>
> > Trace; c0105240 <default_idle+0/30>
> > Trace; c0105240 <default_idle+0/30>
> > Trace; c010a528 <call_do_IRQ+5/d>
> > Trace; c0105240 <default_idle+0/30>
> > Trace; c0105240 <default_idle+0/30>
> > Trace; c0105263 <default_idle+23/30>
> > Trace; c01052d2 <cpu_idle+42/60>
> > Trace; c0105000 <_stext+0/0>
> > Trace; c0105027 <rest_init+27/30>
> >
> > Code; c01a5399 <sfq_dequeue+59/1b0>
> > 00000000 <_EIP>:
> > Code; c01a5399 <sfq_dequeue+59/1b0> <=====
> > 0: 89 50 04 mov %edx,0x4(%eax) <=====
> > Code; c01a539c <sfq_dequeue+5c/1b0>
> > 3: 89 02 mov %eax,(%edx)
> > Code; c01a539e <sfq_dequeue+5e/1b0>
> > 5: 8b 5c 24 24 mov 0x24(%esp,1),%ebx
> > Code; c01a53a2 <sfq_dequeue+62/1b0>
> > 9: c7 03 00 00 00 00 movl $0x0,(%ebx)
> > Code; c01a53a8 <sfq_dequeue+68/1b0>
> > f: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
> >
> > <0>Kernel panic: Aiee, killing interrupt handler!
> > ------------< snip <------< snip <------< snip <------------
> >
> > Below are the rules that were installed on the system:
> >
> > ------------< snip <------< snip <------< snip <------------
> > /sbin/tc qdisc del dev eth0 root
> > /sbin/tc qdisc del dev eth1 root
> > /sbin/iptables -t mangle -F qos
> > /sbin/iptables -t mangle -Z qos
> > /sbin/tc qdisc add dev eth0 root handle 1: htb default 5 r2q 1
> > /sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 96kbit
> > /sbin/tc class add dev eth0 parent 1:1 classid 1:2 htb rate 96kbit ceil 96kbit
> > /sbin/tc class add dev eth0 parent 1:2 classid 1:3 htb rate 48kbit ceil 96kbit prio 1
> > /sbin/tc qdisc add dev eth0 handle 3: parent 1:3 sfq perturb 10 limit 31
> > /sbin/tc class add dev eth0 parent 1:2 classid 1:4 htb rate 24kbit ceil 96kbit prio 1
> > /sbin/tc qdisc add dev eth0 handle 4: parent 1:4 sfq perturb 10 limit 31
> > /sbin/tc class add dev eth0 parent 1:2 classid 1:5 htb rate 16kbit ceil 96kbit prio 2
> > /sbin/tc qdisc add dev eth0 handle 5: parent 1:5 sfq perturb 10 limit 31
> > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> > /sbin/iptables -t mangle -A qos -o eth0 -s 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> > /sbin/tc qdisc add dev eth1 root handle 1: htb default 5 r2q 2
> > /sbin/tc class add dev eth1 parent 1: classid 1:1 htb rate 512kbit
> > /sbin/tc class add dev eth1 parent 1:1 classid 1:2 htb rate 256kbit ceil 512kbit
> > /sbin/tc class add dev eth1 parent 1:2 classid 1:3 htb rate 128kbit ceil 512kbit prio 1
> > /sbin/tc qdisc add dev eth1 handle 3: parent 1:3 sfq perturb 10 limit 169
> > /sbin/tc class add dev eth1 parent 1:2 classid 1:4 htb rate 64kbit ceil 512kbit prio 1
> > /sbin/tc qdisc add dev eth1 handle 4: parent 1:4 sfq perturb 10 limit 169
> > /sbin/tc class add dev eth1 parent 1:2 classid 1:5 htb rate 32kbit ceil 512kbit prio 2
> > /sbin/tc qdisc add dev eth1 handle 5: parent 1:5 sfq perturb 10 limit 169
> > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> > /sbin/iptables -t mangle -A qos -o eth1 -d 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> > ------------< snip <------< snip <------< snip <------------
> >
> > I've made tons of info available on my home page for you to look at (proc
> > files, vmlinux, System.map, original panic message, etc.
> >
> > http://oasis.frogfoot.net/sfq/
> >
> > --
> >
> > Regards
> > Abraham
> >
> > I saw what you did and I know who you are.
> >
> > ___________________________________________________
> > Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
> > P.O. Box 3472, Matieland, Stellenbosch, 7602
> > Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
> > Email: [email protected]
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
> --
> http://www.PowerDNS.com Open source, database driven DNS Software
> http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
> http://netherlabs.nl Consulting
>

2003-03-11 15:44:58

by Abraham van der Merwe

[permalink] [raw]
Subject: Re: kernel panic: bug in sch_sfq.c

Hi devik!

In this case, I added the rules the morning, but there was no traffic
flowing through the machine. That evening we redirected traffic through the
box (so HTB obviously kicked in). About 10 minutes later it crashed.

I think the only common thing I've seen so far is that it crashes when htb
is actually doing a lot of shaping/prioritizing - that doesn't help much.

Also, if I compile the kernel with all debugging enabled (CONFIG_DEBUG_SLAB,
etc) I can reliably trigger the BUG() on line 1263 in mm/slab.c - don't know
if this is related to the HTB problem - I've tested this on many different
machines with many different kernels. If I disable CONFIG_DEBUG_SLAB, I
don't get that (obviously). The first crash since I've started disabling
CONFIG_DEBUG_SLAB on our Linux QoS boxes have been this one.

> Hmm,
> I looked at it. It seems that skb linked list was corrupted
> (containing NULL pointer). It could be because of two problems,
> either someone (maybe htb too)'ve overwritten memory or HTB
> called dequeue with wrong argument.
> Latter is unlikely because I call q->dequeue and sfq's
> dequeue was really called. Thus pointer is ok.
> Let's examine how could HTB mungle with qdisc internals. If
> htb would think that leaf is inner node, inner.feed[0] is
> pointer equal to leaf.q. I examined code but there is no
> way to make this mistake.
>
> Last 3 days I got 3 bugreports. Each crashes in different
> place and all seem unrelated. Each is NULL pointer dereference
> though.
> I think there is some place in some code which writes in
> bad random places in memory :-\
>
> To ask all people whose have seen such Oops, have you used
> dynamic tc classes changes ? Like creating/deleting/changing/viewving
> classes offten at runtime ? (I'm trying to find common trigger).
>
> thanks, devik
> HTB maintainer
>
> On Tue, 11 Mar 2003, bert hubert wrote:
>
> > On Tue, Mar 11, 2003 at 11:14:09AM +0200, Abraham van der Merwe wrote:
> > > Hi!
> > >
> > > I have a box that crashed today. Below is the decoded kernel panic. If you
> > > track down the bug PLEASE send me a patch.
> >
> > Weird, Alexeys code is normally very very solid. Perhaps HTB is also
> > involved. Devik?
> >
> > >
> > > ------------< snip <------< snip <------< snip <------------
> > > ksymoops 2.4.8 on i686 2.4.20-rc1. Options used
> > > -v vmlinux-2.4.21-pre5 (specified)
> > > -K (specified)
> > > -L (specified)
> > > -O (specified)
> > > -m System.map-2.4.21-pre5 (specified)
> > >
> > > Unable to handle kernel NULL pointer dereference at virtual address 00000004
> > > *pde = 00000000
> > > Oops: 0002
> > > CPU: 0
> > > EIP: 0010:[<c01a5399>] Not tainted
> > > Using defaults from ksymoops -t elf32-i386 -a i386
> > > EFLAGS: 00010202
> > > eax: 00000000 ebx: c7b9a9e8 ecx: 0000007f edx: c7a8eef8
> > > esi: c7b9ab08 edi: 000007f0 ebp: c7a8e060 esp: c021deb8
> > > ds: 0018 es: 0018 ss: 0018
> > > Process swapper (pid: 0, stackpage=c021d000)
> > > Stack: c7b9a9e8 c7b9ab08 c7f7ee00 c7b9a860 c7b893c0 c7f7ee00 c7b9a860 00000000
> > > c01a3507 c7b5c680 7fb9a9f0 c01a339e c7a8e000 ffffffff 00000018 00000006
> > > c7b9a800 00000018 00000006 c7b9a800 c7b9a9e8 c7b9ab08 c7f7ee00 c01a371c
> > > Call Trace: [<c01a3507>] [<c01a339e>] [<c01a371c>] [<c019f7a3>] [>c019949d>]
> > > [<c0115a6a>] [<c01082bd>] [<c0105240>] [<c0105240>] [<c010a528>] [<c0105240>]
> > > [<c0105240>] [<c0105263>] [<c01052d2>] [<c0105000>] [<c0105027>]
> > > Code: 89 50 04 89 02 8b 5c 24 24 c7 03 00 00 00 00 c7 43 04 00 00
> > >
> > >
> > > >>EIP; c01a5399 <sfq_dequeue+59/1b0> <=====
> > >
> > > >>esp; c021deb8 <init_task_union+1eb8/2000>
> > >
> > > Trace; c01a3507 <htb_dequeue_tree+217/230>
> > > Trace; c01a339e <htb_dequeue_tree+ae/230>
> > > Trace; c01a371c <htb_dequeue+16c/250>
> > > Trace; c019f7a3 <qdisc_restart+13/d0>
> > > Trace; c0115a6a <do_softirq+5a/b0>
> > > Trace; c01082bd <do_IRQ+9d/b0>
> > > Trace; c0105240 <default_idle+0/30>
> > > Trace; c0105240 <default_idle+0/30>
> > > Trace; c010a528 <call_do_IRQ+5/d>
> > > Trace; c0105240 <default_idle+0/30>
> > > Trace; c0105240 <default_idle+0/30>
> > > Trace; c0105263 <default_idle+23/30>
> > > Trace; c01052d2 <cpu_idle+42/60>
> > > Trace; c0105000 <_stext+0/0>
> > > Trace; c0105027 <rest_init+27/30>
> > >
> > > Code; c01a5399 <sfq_dequeue+59/1b0>
> > > 00000000 <_EIP>:
> > > Code; c01a5399 <sfq_dequeue+59/1b0> <=====
> > > 0: 89 50 04 mov %edx,0x4(%eax) <=====
> > > Code; c01a539c <sfq_dequeue+5c/1b0>
> > > 3: 89 02 mov %eax,(%edx)
> > > Code; c01a539e <sfq_dequeue+5e/1b0>
> > > 5: 8b 5c 24 24 mov 0x24(%esp,1),%ebx
> > > Code; c01a53a2 <sfq_dequeue+62/1b0>
> > > 9: c7 03 00 00 00 00 movl $0x0,(%ebx)
> > > Code; c01a53a8 <sfq_dequeue+68/1b0>
> > > f: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
> > >
> > > <0>Kernel panic: Aiee, killing interrupt handler!
> > > ------------< snip <------< snip <------< snip <------------
> > >
> > > Below are the rules that were installed on the system:
> > >
> > > ------------< snip <------< snip <------< snip <------------
> > > /sbin/tc qdisc del dev eth0 root
> > > /sbin/tc qdisc del dev eth1 root
> > > /sbin/iptables -t mangle -F qos
> > > /sbin/iptables -t mangle -Z qos
> > > /sbin/tc qdisc add dev eth0 root handle 1: htb default 5 r2q 1
> > > /sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 96kbit
> > > /sbin/tc class add dev eth0 parent 1:1 classid 1:2 htb rate 96kbit ceil 96kbit
> > > /sbin/tc class add dev eth0 parent 1:2 classid 1:3 htb rate 48kbit ceil 96kbit prio 1
> > > /sbin/tc qdisc add dev eth0 handle 3: parent 1:3 sfq perturb 10 limit 31
> > > /sbin/tc class add dev eth0 parent 1:2 classid 1:4 htb rate 24kbit ceil 96kbit prio 1
> > > /sbin/tc qdisc add dev eth0 handle 4: parent 1:4 sfq perturb 10 limit 31
> > > /sbin/tc class add dev eth0 parent 1:2 classid 1:5 htb rate 16kbit ceil 96kbit prio 2
> > > /sbin/tc qdisc add dev eth0 handle 5: parent 1:5 sfq perturb 10 limit 31
> > > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> > > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> > > /sbin/iptables -t mangle -A qos -o eth0 -s 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> > > /sbin/iptables -t mangle -A qos -o eth0 -s 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> > > /sbin/tc qdisc add dev eth1 root handle 1: htb default 5 r2q 2
> > > /sbin/tc class add dev eth1 parent 1: classid 1:1 htb rate 512kbit
> > > /sbin/tc class add dev eth1 parent 1:1 classid 1:2 htb rate 256kbit ceil 512kbit
> > > /sbin/tc class add dev eth1 parent 1:2 classid 1:3 htb rate 128kbit ceil 512kbit prio 1
> > > /sbin/tc qdisc add dev eth1 handle 3: parent 1:3 sfq perturb 10 limit 169
> > > /sbin/tc class add dev eth1 parent 1:2 classid 1:4 htb rate 64kbit ceil 512kbit prio 1
> > > /sbin/tc qdisc add dev eth1 handle 4: parent 1:4 sfq perturb 10 limit 169
> > > /sbin/tc class add dev eth1 parent 1:2 classid 1:5 htb rate 32kbit ceil 512kbit prio 2
> > > /sbin/tc qdisc add dev eth1 handle 5: parent 1:5 sfq perturb 10 limit 169
> > > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.0/28 -j CLASSIFY --set-class 1:3
> > > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.85.80/28 -j CLASSIFY --set-class 1:4
> > > /sbin/iptables -t mangle -A qos -o eth1 -d 192.116.106.192/29 -j CLASSIFY --set-class 1:0
> > > /sbin/iptables -t mangle -A qos -o eth1 -d 66.8.28.48/29 -j CLASSIFY --set-class 1:0
> > > ------------< snip <------< snip <------< snip <------------
> > >
> > > I've made tons of info available on my home page for you to look at (proc
> > > files, vmlinux, System.map, original panic message, etc.
> > >
> > > http://oasis.frogfoot.net/sfq/

> >
> > --
> > http://www.PowerDNS.com Open source, database driven DNS Software
> > http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
> > http://netherlabs.nl Consulting
> >
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--

Regards
Abraham

Reporter (to Mahatma Gandhi): Mr Gandhi, what do you think of Western
Civilization?
Gandhi: I think it would be a good idea.

___________________________________________________
Abraham vd Merwe - Frogfoot Networks CC
9 Kinnaird Court, 33 Main Street, Newlands, 7700
Phone: +27 21 686 1674 Cell: +27 82 565 4451
Http: http://www.frogfoot.net/ Email: [email protected]

2003-03-11 15:58:11

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: kernel panic: bug in sch_sfq.c

Hello!

> Also, if I compile the kernel with all debugging enabled (CONFIG_DEBUG_SLAB,
> etc) I can reliably trigger the BUG() on line 1263 in mm/slab.c

How does backtrace oops look?

Alexey

2003-03-11 21:09:52

by Abraham van der Merwe

[permalink] [raw]
Subject: Re: kernel panic: bug in sch_sfq.c

Hi kuznet!

> > Also, if I compile the kernel with all debugging enabled (CONFIG_DEBUG_SLAB,
> > etc) I can reliably trigger the BUG() on line 1263 in mm/slab.c
>
> How does backtrace oops look?

I didn't write down most of the BUG() panics, but here is one (unfortunately
it doesn't have any QoS code in the stack trace):

------------< snip <------< snip <------< snip <------------
root@trillian:~/uni-qos# cat panic.txt
c0192eb1
c0176c8c
c017718c
c0176afa
c010810f
c01082b3
c0105240
c0105240
c0105240
c0105240
c0105263
c01052d2
c0105000
c0105027

0f 0b ef 04 e0 87 1e c0 f7 c5 00 04 00 00 74 36 b8 a5 c2 0f

EIP: 0010:c012642e
ESP: c0221eb4

KERNEL BUG slab.c:1263
root@trillian:~/uni-qos#
------------< snip <------< snip <------< snip <------------

A quick objdump through the kernel's vmlinux image reveals, that the stack
trace above looks as follows:

------------< snip <------< snip <------< snip <------------
c0192eb1 alloc_skb
c0176c8c speedo_refill_rx_buf
c017718c speedo_rx
c0176afa speedo_interrupt
c010810f handle_IRQ_event
c01082b3 do_IRQ
c0105240 default_idle
c0105240 default_idle
c0105240 default_idle
c0105240
c0105263 default_idle
c01052d2 cpu_idle
c0105000 rest_init
c0105027 rest_init
------------< snip <------< snip <------< snip <------------

It crashes when it hits BUG(); in slab.c:

------------< snip <------< snip <------< snip <------------
#if DEBUG
if (cachep->flags & SLAB_POISON)
if (kmem_check_poison_obj(cachep, objp))
BUG();
------------< snip <------< snip <------< snip <------------

--

Regards
Abraham

Nothing is so often irretrievably missed as a daily opportunity.
-- Ebner-Eschenbach

___________________________________________________
Abraham vd Merwe - Frogfoot Networks CC
9 Kinnaird Court, 33 Main Street, Newlands, 7700
Phone: +27 21 686 1674 Cell: +27 82 565 4451
Http: http://www.frogfoot.net/ Email: [email protected]


Attachments:
(No filename) (1.91 kB)
(No filename) (232.00 B)
Download all attachments