+Hardware Specs
Dual Xeon 800FSB
Intel Server Board
4GB ECC DDR
3ware 9500 SATA RAID card
5x200 GB SATA drives in a RAID 10 config (1 hot spare)
Dual NIC
+OS Specs
CentOS 3.4 running a custom 2.6.x kernel patched with the UML SKAS patch
eth0 is 0.0.0.0 promisc and assigned to a bridge (br0)
tuntap devices up
ebtables is enabled and loaded with rules
My problem is that every other week or so the machine crashes. It never
dumps the error to the logs, so all I have is a screenshot of the console.
I have put some serious stress on this machine and have been unable to
reproduce the problem (running 20 guest UMLs, half running va-ctcs and the
other half compiling a 2.6 kernel). Below are links to the 2 screenshots I
have (taken about 2 weeks apart). I was using a 2.6.10 kernel when the
problem started. After the last crash I built a 2.6.11.5 kernel
with APM and ACPI disabled in the kernel config. Does anybody know what's
going on here?
http://www.unix-scripts.com/shaun/host-screenshot-1.png
http://www.unix-scripts.com/shaun/host-screenshot-2.png
Kernel Config... http://www.unix-scripts.com/shaun/2.6.11.5-hr1_.config
--
Best Regards,
Shaun Reitan
On Tue, 5 Apr 2005, shaun wrote:
> +Hardware Specs
> Dual Xeon 800FSB
> Intel Server Board
> 4GB ECC DDR
> 3ware 9500 Sata Raid Card
> 5x200 GB sata drives in a raid 10 Config (1 hot spare)
> Dual Nic
>
> +OS Specs
> CentOS 3.4 running a custom 2.6.x kernel patched with UML SKA's Patch
> eth0 is 0.0.0.0 promisc and assigned to a bridge (br0)
> tuntap devices up
> ebtables is enabled and loaded with rules
Is it possible to run without the bridge for testing purposes, while
making sure to apply the normal networking load?
No, sorry, I have to run with bridging support; otherwise the guests (UMLs)
won't be able to communicate with the outside world.
Best Regards,
Shaun R
----- Original Message -----
From: "Zwane Mwaikambo" <[email protected]>
To: "shaun" <[email protected]>
Cc: <[email protected]>
Sent: Wednesday, April 06, 2005 1:10 AM
Subject: Re: kernel panic - not syncing: Fatal exception in interupt
On Wed, 6 Apr 2005, [email protected] wrote:
> No, sorry, i have to run with bridging support other wise the guests(UML's)
> wont be able to communicate with the outside world.
Ok in that case, can you connect a serial console so that you can capture
the entire output?
Thanks,
Zwane
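
For anyone following the same suggestion, here is a minimal sketch of the capture side. It assumes a second Linux machine wired to the crashing host with a null-modem cable, and a host kernel booted with something like console=ttyS0,115200n8 console=tty0. The device path, the speed, the log file name and the helper names are all assumptions for illustration; none of them come from this thread.

```python
import os
import termios
import time
import tty


def timestamped(lines, clock=time.time):
    """Yield each raw console line as text, prefixed with the capture time."""
    for raw in lines:
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield f"[{stamp}] {text}"


def open_serial(path="/dev/ttyS0", baud=termios.B115200):
    """Open a serial port read-only in raw mode (path and speed are assumed)."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    tty.setraw(fd)                 # no line editing, no character translation
    attrs = termios.tcgetattr(fd)
    attrs[4] = baud                # input speed
    attrs[5] = baud                # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return os.fdopen(fd, "rb", buffering=0)


def capture(port, log_path):
    """Append every line read from the port to log_path until the port closes."""
    with open(log_path, "a", buffering=1) as log:   # line-buffered log file
        for line in timestamped(port):
            log.write(line + "\n")
```

On the logging machine this would be started with capture(open_serial(), "host1-console.log") and left running, so that whatever the host prints before it panics ends up in the log file.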
After the last crash I changed the console resolution, so I'm hoping it will
give me the whole dump this time. I will see if I can get a serial console
on it.
Best Regards,
Shaun Reitan
Account Specialist
http://www.NDCHost.com
http://www.cPlicensing.net
----- Original Message -----
From: "Zwane Mwaikambo" <[email protected]>
To: "[email protected]" <[email protected]>
Cc: <[email protected]>
Sent: Thursday, April 07, 2005 1:09 AM
Subject: Re: kernel panic - not syncing: Fatal exception in interupt
The machine crashed again twice today. I have vga=791 set, so I caught a bit
more of the crash. I enabled serial redirection in the BIOS, so I'm hoping to
catch the full dump next time.
The first screenshot was taken at the old resolution, so it didn't catch much
more...
http://www.unix-scripts.com/shaun/host1-2005-04-12-01.png
But this screenshot got a nice chunk and looks a bit different.
http://www.unix-scripts.com/shaun/host1-2005-04-12-02.png
It still looks like there is a lot more that I'm missing, but glancing at that
dump, it definitely seems to me like bridging is causing this. I'm going to
post this to the ebtables list tomorrow as well.
Best Regards,
Shaun R.
On Tue, 12 Apr 2005, [email protected] wrote:
> The machine crashed again twice today. I have vga=791 so i caugh a bit more
> of the crash. i enabled serial redirection in the bios so i'm hoping to
> catch the full dump next time.
Cool, can you also try Cc'ing [email protected]?
Thanks,
Zwane
OK, finally got a full dump from the serial console! Here it is!
----------------------------------------------------------------------------
Unable to handle kernel paging request at virtual address f8b6f02c
printing eip:
f88b0078
*pde = 031f6067
Oops: 0000 [#1]
SMP
Modules linked in: loop tun ebt_ip ebt_arp ebtable_filter ebtables autofs4 ipv6 bridge e1000 ohci1394 ieee1394 floppy sg parport_pc parport ext3 jbd 3w_9xxx sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f88b0078>] Not tainted VLI
EFLAGS: 00010286 (2.6.11.5-hr1)
EIP is at ebt_do_table+0x78/0x6e0 [ebtables]
eax: f8b6f000 ebx: f88b7000 ecx: f88b7080 edx: f88b7080
esi: c03e8d20 edi: c0438f08 ebp: 80000000 esp: c03e8c7c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03e8000 task=c0333b60)
Stack: c0438dd8 80000000 c0294362 f7728580 c02a9010 00000003 00000003 c85c6012
f8816610 f8b6f000 f8b6f000 f8b39000 00000000 f88b7080 f88b7080 0000000a
0000000a f6646800 c03e8d1c 00000001 f8816620 c03e8d20 c0438f08 80000000
Call Trace:
[<c0294362>] nf_iterate+0x72/0xb0
[<c02a9010>] dst_output+0x0/0x20
[<c0294362>] nf_iterate+0x72/0xb0
[<f889e6d0>] br_pass_frame_up_finish+0x0/0x10 [bridge]
[<c0294708>] nf_hook_slow+0x68/0xf0
[<f889e6d0>] br_pass_frame_up_finish+0x0/0x10 [bridge]
[<f889e73a>] br_pass_frame_up+0x5a/0x60 [bridge]
[<f889e6d0>] br_pass_frame_up_finish+0x0/0x10 [bridge]
[<f889e7b9>] br_handle_frame_finish+0x79/0x110 [bridge]
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<c0294752>] nf_hook_slow+0xb2/0xf0
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<f88a231c>] br_nf_pre_routing_finish+0xfc/0x290 [bridge]
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<c0113d20>] try_to_wake_up+0x220/0x250
[<c012dbeb>] autoremove_wake_function+0x1b/0x50
[<c0115737>] __wake_up_common+0x37/0x60
[<c0122386>] __mod_timer+0xd6/0x120
[<c0294362>] nf_iterate+0x72/0xb0
[<f88a2220>] br_nf_pre_routing_finish+0x0/0x290 [bridge]
[<f88a2220>] br_nf_pre_routing_finish+0x0/0x290 [bridge]
[<c0294752>] nf_hook_slow+0xb2/0xf0
[<f88a2220>] br_nf_pre_routing_finish+0x0/0x290 [bridge]
[<f88a29f7>] br_nf_pre_routing+0x257/0x410 [bridge]
[<f88a2220>] br_nf_pre_routing_finish+0x0/0x290 [bridge]
[<c0294362>] nf_iterate+0x72/0xb0
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<c0294708>] nf_hook_slow+0x68/0xf0
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<f889e94a>] br_handle_frame+0xfa/0x1e0 [bridge]
[<f889e740>] br_handle_frame_finish+0x0/0x110 [bridge]
[<c028aa3d>] netif_receive_skb+0x13d/0x2c0
[<f888b42e>] e1000_clean_rx_irq+0x15e/0x4a0 [e1000]
[<c02849d9>] __kfree_skb+0xa9/0x150
[<f888b06a>] e1000_clean+0xba/0xf0 [e1000]
[<c028ad4f>] net_rx_action+0x6f/0x100
[<c011e909>] __do_softirq+0xb9/0xd0
[<c0104a8a>] do_softirq+0x4a/0x60
=======================
[<c0104953>] do_IRQ+0x63/0xb0
[<c010e7d0>] smp_apic_timer_interrupt+0xd0/0xe0
[<c01030c2>] common_interrupt+0x1a/0x20
[<c0100812>] mwait_idle+0x52/0x80
[<c0100650>] default_idle+0x0/0x30
[<c010071b>] cpu_idle+0x5b/0x70
[<c03ad911>] start_kernel+0x161/0x1a0
[<c03ad350>] unknown_bootoption+0x0/0x1e0
Code: 00 89 54 24 34 8b 43 20 c7 44 24 2c 00 00 00 00 85 c0 74 07 8b 0c 88 89 4c 24 2c 8b 44 24 4c 8b 4c 24 34 8b 44 83 08 89 44 24 28 <8b> 50 2c 89 c5 83 c5 30 89 54 24 3c 8b 40 24 c1 e0 04 01 c8 89
<0>Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------------------------------
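
As a rough aid to reading the dump above, the sketch below does two things: it counts call-trace frames per module, and it decodes the three bytes starting at the marked <8b> in the Code: line. The helper names (frames_by_module, decode_mov_disp8) and the SAMPLE excerpt are hypothetical and exist only for this illustration. The decoder handles just the one instruction form seen here. Under those assumptions, the bytes 8b 50 2c decode to a load from eax+0x2c, and eax (f8b6f000) plus 0x2c equals the reported faulting address f8b6f02c. That only says which pointer was bad at the time of the fault, not why it was bad.

```python
import re
from collections import Counter

# A call-trace entry looks like:
#   [<f889e73a>] br_pass_frame_up+0x5a/0x60 [bridge]
# Entries without a trailing [module] belong to the core kernel image.
TRACE_RE = re.compile(
    r"\[<[0-9a-f]{8}>\][ \t]+[\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Four lines copied from the trace above, used as a small excerpt.
SAMPLE = """\
[<c0294362>] nf_iterate+0x72/0xb0
[<f889e73a>] br_pass_frame_up+0x5a/0x60 [bridge]
[<f889e7b9>] br_handle_frame_finish+0x79/0x110 [bridge]
[<f888b42e>] e1000_clean_rx_irq+0x15e/0x4a0 [e1000]
"""


def frames_by_module(oops_text):
    """Count call-trace frames per module ('vmlinux' for the core kernel)."""
    counts = Counter()
    for match in TRACE_RE.finditer(oops_text):
        counts[match.group("mod") or "vmlinux"] += 1
    return counts


def decode_mov_disp8(code):
    """Decode opcode 8b with ModRM mod=01: mov r32, [r32 + disp8].

    Returns (destination register, base register, signed displacement).
    Any other encoding, including one that needs a SIB byte, is rejected.
    """
    opcode, modrm, disp = code
    if opcode != 0x8B or modrm >> 6 != 0b01 or modrm & 7 == 0b100:
        raise ValueError("not a simple mov r32, [r32+disp8]")
    if disp >= 0x80:               # disp8 is a signed byte
        disp -= 0x100
    return REG32[(modrm >> 3) & 7], REG32[modrm & 7], disp


# Register value taken from the dump above.
registers = {"eax": 0xF8B6F000}
dest, base, disp = decode_mov_disp8(bytes.fromhex("8b502c"))
fault_address = (registers[base] + disp) & 0xFFFFFFFF
```

Running frames_by_module over the whole call trace instead of SAMPLE gives a quick view of how much of the path into ebt_do_table runs through the bridge module.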
--
Best Regards,
Shaun Reitan
On Sun, Apr 17, 2005 at 08:32:42PM +0000, Shaun Reitan wrote:
> OK, finally got a full dump from the serial console! Here is it!
This was fixed about a month ago. Here is the patch that did it.
Perhaps it's time to include this in 2.6.11.*?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
I actually ran across that patch this morning on the ebtables list, but I
wasn't sure it would fix the problem I was having. I will get this patched
into the kernel and see what happens. Thanks for the response; if this
went on much longer, this $4,000 machine was going to become a paperweight
:)
Best Regards,
Shaun Reitan
----- Original Message -----
From: "Herbert Xu" <[email protected]>
To: "Shaun Reitan" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Sunday, April 17, 2005 11:07 PM
Subject: Re: kernel panic - not syncing: Fatal exception in interupt
Is there a way I can trigger this bug rather than waiting for it to happen?
I would like to make sure it's not going to crash on me again.
--
Shaun Reitan
On Mon, Apr 18, 2005 at 04:07:44PM +1000, Herbert Xu wrote:
> On Sun, Apr 17, 2005 at 08:32:42PM +0000, Shaun Reitan wrote:
> > OK, finally got a full dump from the serial console! Here is it!
>
> This was fixed about a month ago. Here is the patch that did it.
>
> Perhaps it's time to include this in 2.6.11.*?
Applied to the -stable queue, thanks.
greg k-h