It's almost 100% reproducible for me (sometimes the machine just hangs
instead), although in a somewhat strange situation. I have to run the
Quagga/Zebra routing suite with the zebra and ospfd daemons running. A
networking restart script (removing 60 VLANs, creating them again and
assigning IPs to them) leads to a panic. The process isn't always
swapper; I have seen ip and kupdated as well, but the trace is always
the same. I can't reproduce it with the 2.4.20 kernel.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
c0118743
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0118743>] Not tainted
EFLAGS: 00010082
eax: c02da4c4 ebx: c02da3c4 ecx: c02da4c4 edx: 00000000
esi: c02e7a20 edi: c02da2a0 ebp: c028df40 esp: c028df14
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c028d000)
Stack: 00000000 c02d1b80 00000000 00002400 00000000 00002400 00000004 00000000
       00000001 c028df38 c028df38 c028df48 c0115898 c028df60 c01157c9 00000000
       00000001 c02d1ba0 fffffffe c028df7c c01155ab c02d1ba0 00000000 c02d1900
Call Trace: [<c0115898>] [<c01157c9>] [<c01155ab>] [<c0108072>] [<c0105260>]
  [<c0105260>] [<c010a228>] [<c0105260>] [<c0105260>] [<c0105286>] [<c01052f9>]
[<c0105000>] [<c010502a>]
Code: 8b 02 89 48 04 89 01 89 51 04 89 0a 89 f1 39 d9 0f 85 37 ff
>>EIP; c0118743 <timer_bh+1b7/35c> <=====
>>eax; c02da4c4 <tv1+4/804>
>>ebx; c02da3c4 <tv2+124/220>
>>ecx; c02da4c4 <tv1+4/804>
>>esi; c02e7a20 <serial_timer+0/20>
>>edi; c02da2a0 <tv2+0/220>
>>ebp; c028df40 <init_task_union+1f40/2000>
>>esp; c028df14 <init_task_union+1f14/2000>
Trace; c0115898 <bh_action+1c/4c>
Trace; c01157c9 <tasklet_hi_action+49/70>
Trace; c01155ab <do_softirq+4b/a0>
Trace; c0108072 <do_IRQ+96/a8>
Trace; c0105260 <default_idle+0/30>
Trace; c0105260 <default_idle+0/30>
Trace; c010a228 <call_do_IRQ+5/d>
Trace; c0105260 <default_idle+0/30>
Trace; c0105260 <default_idle+0/30>
Trace; c0105286 <default_idle+26/30>
Trace; c01052f9 <cpu_idle+41/54>
Trace; c0105000 <_stext+0/0>
Trace; c010502a <rest_init+2a/30>
Code; c0118743 <timer_bh+1b7/35c>
00000000 <_EIP>:
Code; c0118743 <timer_bh+1b7/35c> <=====
0: 8b 02 mov (%edx),%eax <=====
Code; c0118745 <timer_bh+1b9/35c>
2: 89 48 04 mov %ecx,0x4(%eax)
Code; c0118748 <timer_bh+1bc/35c>
5: 89 01 mov %eax,(%ecx)
Code; c011874a <timer_bh+1be/35c>
7: 89 51 04 mov %edx,0x4(%ecx)
Code; c011874d <timer_bh+1c1/35c>
a: 89 0a mov %ecx,(%edx)
Code; c011874f <timer_bh+1c3/35c>
c: 89 f1 mov %esi,%ecx
Code; c0118751 <timer_bh+1c5/35c>
e: 39 d9 cmp %ebx,%ecx
Code; c0118753 <timer_bh+1c7/35c>
10: 0f 85 37 ff 00 00 jne ff4d <_EIP+0xff4d> c0128690 <swap_entry_free+18/3c>
<0>Kernel panic: Aiee, killing interrupt handler!
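The decoded instructions at the faulting EIP (timer_bh+1b7) follow the pattern of an insertion into a doubly linked list: load head->next, then rewrite four link fields. The userspace sketch below mirrors that sequence. It assumes the 2.4 i386 struct list_head layout (next at offset 0, prev at offset 4) and is an interpretation of the disassembly, not code taken from the kernel; list_insert_after and list_count are hypothetical names. Read this way, %edx held the list pointer being inserted after and was NULL, so the very first load faulted at address 00000000, while the other registers point into the timer vectors tv1 and tv2.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, as on 2.4/i386: next at offset 0, prev at offset 4. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. Each line is annotated with the
 * instruction from the oops that appears to perform the same step
 * (%edx = head, %ecx = entry, %eax = old head->next). */
static void list_insert_after(struct list_head *entry, struct list_head *head)
{
    struct list_head *next = head->next; /* mov (%edx),%eax: faults if head == NULL */
    next->prev = entry;                  /* mov %ecx,0x4(%eax) */
    entry->next = next;                  /* mov %eax,(%ecx)    */
    entry->prev = head;                  /* mov %edx,0x4(%ecx) */
    head->next = entry;                  /* mov %ecx,(%edx)    */
}

/* Count entries, asserting that forward and backward links agree;
 * a list with a NULL link like the one in the oops would trip this. */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;
    for (const struct list_head *pos = head->next; pos != head; pos = pos->next) {
        assert(pos != NULL);
        assert(pos->next != NULL && pos->next->prev == pos);
        n++;
    }
    return n;
}
```

With valid pointers the insertion keeps the list consistent; calling list_insert_after(&entry, NULL) would fault on the first load, which matches the "NULL pointer dereference at virtual address 00000000" above.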
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
Hasso Tepper wrote:
> It's almost 100% (sometimes it just hangs) reproducible for me
> although in somewhat strange situation. I have to run Quagga/Zebra
> routing suite with zebra and ospfd daemons running. Networking
> restart script (removing 60 vlans, creating them again and
> assigning IPs to them) leads to panic. Process isn't always
> swapper, I have seen ip and kupdated as well, but trace is always
> same. I can't reproduce it with 2.4.20 kernel.
It was introduced in 2.4.25-rc1 (2.4.25-pre8 is OK), and it's still
there in 2.4.26-rc1.
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
I'm having similar problems, and wonder:
1) Is the box also serving as a firewall (iptables)?
2) Are you using Intel E100 NICs?
Phil Oester
On Sun, Mar 28, 2004 at 07:11:06PM +0300, Hasso Tepper wrote:
> Hasso Tepper wrote:
> > It's almost 100% (sometimes it just hangs) reproducible for me
> > although in somewhat strange situation. I have to run Quagga/Zebra
> > routing suite with zebra and ospfd daemons running. Networking
> > restart script (removing 60 vlans, creating them again and
> > assigning IPs to them) leads to panic. Process isn't always
> > swapper, I have seen ip and kupdated as well, but trace is always
> > same. I can't reproduce it with 2.4.20 kernel.
>
> It's introduced with 2.4.25-rc1 (2.4.25-pre8 is OK). And it's still
> there in 2.4.26-rc1.
If you are using E100s, can you back out (patch -R) this patch from
vanilla 2.4.25:
http://linux.bkbits.net:8080/linux-2.4/gnupatch@401f2442_DogKaCRsaoMURrjvGCz4w
If you are using E1000s, can you back out this patch:
http://linux.bkbits.net:8080/linux-2.4/gnupatch@402212c1G2hqO92c1xCHTWJ4AFBFPQ
and see if this stops the panics?
Phil Oester
Phil Oester wrote:
> If you are using E100's, can you backout (patch -R) this patch from
> 2.4.25 vanilla:
>
> http://linux.bkbits.net:8080/linux-2.4/gnupatch@401f2442_DogKaCRsaoMURrjvGCz4w
>
> if using E1000's, can you backout this patch:
Yes.
> http://linux.bkbits.net:8080/linux-2.4/gnupatch@402212c1G2hqO92c1xCHTWJ4AFBFPQ
But this patch isn't in 2.4.25-rc1, which already has the problem for me.
> And see if this stops the panics?
Now that linux.bkbits.net is working again, I walked through the patches
between 2.4.25-pre8 and 2.4.25-rc1 and reverted this one:
http://linux.bkbits.net:8080/linux-2.4/cset%401.1290.17.1?nav=index.html|ChangeSet@-9M
That seems to have solved the problem for me (tested only with 2.4.25-rc1
so far). I suspected multicast from the beginning, because I couldn't
reproduce the panic without the ospfd daemon running (OSPF uses
multicast).
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
Do you have CONFIG_IP_MULTICAST enabled in your .config? I don't, and
a couple of the changes in this changeset depend on it.
I also run ospfd, so maybe you've hit upon something here... cc'ing
linux-net for comment.
Phil
P.S. Here's a bookmarkable link to that changeset:
http://linux.bkbits.net:8080/linux-2.4/cset@401ee07fZyaInErbsMYxlCIQSlevFQ?nav=index.html|ChangeSet@-9M
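Answering the CONFIG_IP_MULTICAST question means looking for an exact "CONFIG_IP_MULTICAST=y" line in the kernel .config (usually done with grep). A minimal C sketch of that check follows; config_option_enabled is a hypothetical helper, and it assumes the usual .config conventions ("NAME=y" or "NAME=m" when enabled, "# NAME is not set" when disabled).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if NAME=y or NAME=m appears in the .config at PATH,
 * 0 if the option is absent or "# NAME is not set",
 * -1 if PATH cannot be opened. */
static int config_option_enabled(const char *path, const char *name)
{
    char line[512];
    size_t len;
    int found = 0;
    FILE *fp;

    assert(path != NULL && name != NULL);
    len = strlen(name);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Require "NAME=" exactly, so a longer option name that merely
         * shares the prefix does not count as a match. */
        if (strncmp(line, name, len) == 0 && line[len] == '=' &&
            (line[len + 1] == 'y' || line[len + 1] == 'm')) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For example, config_option_enabled("/usr/src/linux/.config", "CONFIG_IP_MULTICAST") returns 1 on a kernel built the way Hasso's was; the path is only an example.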
On Sun, Mar 28, 2004 at 08:33:16PM +0300, Hasso Tepper wrote:
> Now when linux.bkbits.net is working again, I walked through patches
> between 2.4.25-pre8 and 2.4.25-rc1 and reverted this patch -
> http://linux.bkbits.net:8080/linux-2.4/cset%401.1290.17.1?nav=index.html|ChangeSet@-9M
>
> This solved problem for me, seems (tested only with 2.4.25-rc1 for
> now). I had feeling that it's related to multicast from the beginning
> because I couldn't reproduce panic when I hadn't ospfd daemon running
> (ospf uses multicast).
Phil Oester wrote:
> Do you have CONFIG_IP_MULTICAST enabled in your .config?
Yes.
> I don't, and a couple of the changes in this changeset depend upon
> it.
>
> I also run ospfd, so maybe you've hit upon something here...cc'ing
> linux-net for comment
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
IIRC, a similar problem existed in v2.6.
I'll dig it up.
On Sun, Mar 28, 2004 at 10:17:54PM +0300, Hasso Tepper wrote:
> Phil Oester wrote:
> > Do you have CONFIG_IP_MULTICAST enabled in your .config?
>
> Yes.
>
> > I don't, and a couple of the changes in this changeset depend upon
> > it.
> >
> > I also run ospfd, so maybe you've hit upon something here...cc'ing
> > linux-net for comment
> IIRC a similar problem was in v2.6.
>
> I'll dig it up.
Was any progress made on this problem?
I am seeing the same panic as originally reported with both 2.4.25 and
2.4.26-rc1, and I can easily reproduce it under the same conditions
Hasso described in the original email.
With quagga/ospfd running, I simply execute
ifconfig eth0 down
ifconfig eth0 up
in quick succession, and a panic follows within 20 seconds.
The panic does not occur if ospfd is not running, or if I pause for at
least 10 seconds between the two commands.
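The down/up reproduction can also be driven from a program. A minimal sketch, assuming Linux and the SIOCGIFFLAGS/SIOCSIFFLAGS ioctls, clears and then sets IFF_UP on an interface, which is roughly what ifconfig IFACE down/up does; set_if_up and bounce_if are hypothetical helpers. Changing flags needs root (CAP_NET_ADMIN), and running this against a live interface really does take it down.

```c
#define _DEFAULT_SOURCE /* struct ifreq and IFF_* under strict -std modes */
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

/* Set (up != 0) or clear IFF_UP on IFNAME. Returns 0 on success,
 * -1 on any failure (errno is left as set by the failing call). */
static int set_if_up(const char *ifname, int up)
{
    struct ifreq ifr;
    int fd, ret = -1;

    assert(ifname != NULL);
    if (strlen(ifname) >= IFNAMSIZ)
        return -1; /* name would not fit in ifr_name */
    fd = socket(AF_INET, SOCK_DGRAM, 0); /* any socket works for these ioctls */
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) { /* read current flags */
        if (up)
            ifr.ifr_flags |= IFF_UP;
        else
            ifr.ifr_flags &= ~IFF_UP;
        if (ioctl(fd, SIOCSIFFLAGS, &ifr) == 0) /* write them back */
            ret = 0;
    }
    close(fd);
    return ret;
}

/* Bring IFNAME down, wait PAUSE_SECS seconds, then bring it back up. */
static int bounce_if(const char *ifname, unsigned int pause_secs)
{
    if (set_if_up(ifname, 0) != 0)
        return -1;
    if (pause_secs > 0)
        sleep(pause_secs);
    return set_if_up(ifname, 1);
}
```

With ospfd running, bounce_if("eth0", 0) corresponds to the quick down/up that triggered the panic, and bounce_if("eth0", 10) to the 10-second pause that avoided it.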
Let me know if I can provide any more information that would be helpful
in solving this problem.
Regards
--
Matt Brown
GSM : 021 611 544
Matt Brown wrote:
> Was any progress made on this problem?
>
> I am seeing the same panic as was originally reported using both kernel
> 2.4.25 and 2.4.26-rc1, I can easily reproduce it under the same
> conditions as Hasso described in the original email.
>
> With quagga/ospfd running I simply execute
> ifconfig eth0 down
> ifconfig eth0 up
> in quick succession and a panic follows within 20 seconds.
>
> The panic does not occur if ospfd is not running, or if I pause for at
> least 10 seconds between the two commands.
>
> Let me know if I can provide any more information that would be helpful
> in solving this problem.
Could you try applying the following patch:
http://marc.theaimsgroup.com/?l=linux-netdev&m=108079992001559&w=2
thanks,
Nivedita
On Mon, 2004-04-05 at 14:02, Nivedita Singhvi wrote:
> Could you try applying the following patch:
>
> http://marc.theaimsgroup.com/?l=linux-netdev&m=108079992001559&w=2
Works perfectly. I will test more extensively over the next day or two.
This patch will be in 2.4.26 when it is released, I assume?
Regards
--
Matt Brown
GSM : 021 611 544
Matt Brown wrote:
>>http://marc.theaimsgroup.com/?l=linux-netdev&m=108079992001559&w=2
>
>
> Works perfectly. I will test more extensively over the next day or two.
>
> This patch will be in 2.4.26 when it is released I assume?
Great, thanks! Yep, DaveM has checked in the patch
from Dave Stevens to both the 2.4 and 2.6 trees.
thanks,
Nivedita
On Mon, Apr 05, 2004 at 01:42:34PM +1200, Matt Brown wrote:
> > IIRC a similar problem was in v2.6.
> >
> > I'll dig it up.
>
> Was any progress made on this problem?
>
> I am seeing the same panic as was originally reported using both kernel
> 2.4.25 and 2.4.26-rc1, I can easily reproduce it under the same
> conditions as Hasso described in the original email.
>
> With quagga/ospfd running I simply execute
> ifconfig eth0 down
> ifconfig eth0 up
> in quick succession and a panic follows within 20 seconds.
>
> The panic does not occur if ospfd is not running, or if I pause for at
> least 10 seconds between the two commands.
>
> Let me know if I can provide any more information that would be helpful
> in solving this problem.
Matt,
This oops should be fixed by
http://linux.bkbits.net:8080/linux-2.4/[email protected]?nav=index.html|ChangeSet@-7d|[email protected]
which will be part of 2.4.26-rc2. Please try it.
Marcelo Tosatti wrote:
> This oops should be fixed by
>
> http://linux.bkbits.net:8080/linux-2.4/[email protected]?nav=index.html|ChangeSet@-7d|[email protected]
>
> Which will be part of 2.4.26-rc2. Please try it.
Seems so. Thanks.
--
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
On Tue, 2004-04-06 at 21:00, Hasso Tepper wrote:
> > Which will be part of 2.4.26-rc2. Please try it.
>
> Seems so. Thanks.
Yes, it solves our problem as well.
Many thanks.
--
Matt Brown
GSM : 021 611 544