Today our mailserver froze after just one day of uptime. I was able to
capture the Oops on the screen using my digital camera:
http://www.stahl.bau.tu-bs.de/~hildeb/bugreport/
Keywords: EIP is at journal_commit_transaction, process kjournald
# mount
/dev/cciss/c0d0p6 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/cciss/c0d0p5 on /boot type ext3 (rw)
/dev/shm on /var/amavis type tmpfs (rw,noatime,size=200m,mode=770,uid=104,gid=108)
--
Ralf Hildebrandt (i.A. des IT-Zentrum) [email protected]
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin Fax. +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to [email protected]
Hello,
> Today our mailserver froze after just one day of uptime. I was able to
> capture the Oops on the screen using my digital camera:
>
> http://www.stahl.bau.tu-bs.de/~hildeb/bugreport/
>
> Keywords: EIP is at journal_commit_transaction, process kjournald
I guess the system is SMP... Sadly a few lines at the beginning of the
report are missing (probably scrolled off the screen), but it looks
similar to several other oopses I've seen reported recently. Is this
the first time you hit this bug?
> # mount
> /dev/cciss/c0d0p6 on / type ext3 (rw,errors=remount-ro)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> tmpfs on /dev/shm type tmpfs (rw)
> /dev/cciss/c0d0p5 on /boot type ext3 (rw)
> /dev/shm on /var/amavis type tmpfs (rw,noatime,size=200m,mode=770,uid=104,gid=108)
Honza
--
Jan Kara <[email protected]>
SuSE CR Labs
* Jan Kara <[email protected]>:
> I guess the system is SMP...
Indeed it is. Dual Xeon with SMP.
> Sadly a few lines in the beginning of the
> report are missing (probably scrolled off the screen)
Yes, this sucks. I rebooted with vesafb active, so now I do have 50 lines :)
> but it looks similar to several other oopses I've seen reported
> recently. Is this the first time you hit this bug?
It's actually the second time. The first time it hit the SAME box but
with kernel-2.6.10 (vanilla) after 30 days of uptime. Nobody had a
camera at hand, so I couldn't take a photo.
Any suggestions are welcome. One difference between the two setups:
with 2.6.10 I had no in-kernel IRQ balancing, while with 2.6.10-ac12
I activated it.
On Wed, 2005-02-16 at 21:04 +0100, Ralf Hildebrandt wrote:
> * Jan Kara <[email protected]>:
>
> > I guess the system is SMP...
>
> Indeed it is. Dual Xeon with SMP.
>
This looks very similar (at least to me) to an OOPS I posted with 2.6.9
on 12/03/2004.
http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2
My system is also a dual Xeon using SMP and Hyperthreading
(/proc/cpuinfo shows 4 cpus).
Mine, like Ralf's, is also a mail server running postfix using ext3 for
the spool directory.
> > but it looks similar to several other oopses I've seen reported
> > recently. Is this the first time you hit this bug?
>
> It's actually the second time. The first time it hit the SAME box but
> with kernel-2.6.10 (vanilla) after 30 days of uptime. Nobody had a
> camera at hand, so I couldn't take a photo.
>
I've actually hit this bug (assuming it's the same) with 2.6.10 also. I
had to power cycle remotely and unfortunately didn't have the serial
console logging enabled when it happened with 2.6.10. I upgraded from
2.4.23 to 2.6.8.1 and crashed within a week, and continued to crash at
least monthly after that. It had been running 2.4.23 for 200+ days with
no problems.
Hope this helps trace it back.
Dale
* Dale Blount <[email protected]>:
> This looks very similar (at least to me) to an OOPS I posted with 2.6.9
> on 12/03/2004.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2
Could be.
> My system is also a dual Xeon using SMP and Hyperthreading
> (/proc/cpuinfo shows 4 cpus).
Same system here.
> Mine, like Ralf's, is also a mail server running postfix using ext3 for
> the spool directory.
Same here.
> I've actually hit this bug (assuming it's the same) with 2.6.10 also. I
> had to power cycle remotely and unfortunately didn't have the serial
> console logging enabled when it happened with 2.6.10. I upgraded from
> 2.4.23 to 2.6.8.1 and crashed within a week, and continued to crash at
> least monthly after that. It had been running 2.4.23 for 200+ days with
> no problems.
>
> Hope this helps trace it back.
Me too
Dale Blount <[email protected]> wrote:
>
> This looks very similar (at least to me) to an OOPS I posted with 2.6.9
> on 12/03/2004.
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110210705504716&w=2
There have been a handful of reports - there's surely a race in there.
Unfortunately I've yet to see a report from which we can identify the
offending line in the very large journal_commit_transaction() function.
The best way to do that is to ensure that the kernel was built with
CONFIG_DEBUG_INFO, note the offending EIP value, then do
# gdb vmlinux
(gdb) l *0xc0<whatever>
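
The lookup above can also be scripted. What follows is only a minimal sketch, assuming the oops text is available as plain text (for example typed in from the photo) and that the EIP line has the usual i386 form `EIP:    0060:[<address>]`. The helper names `extract_eip` and `gdb_list_cmd` are made up for illustration, and the address in the comments is a placeholder, not a value from this report. `l` is gdb's abbreviation for `list`; the `-batch` and `-ex` options assume a reasonably recent gdb.

```shell
#!/bin/sh
# Sketch: pull the faulting address out of an i386 oops line and print the
# gdb command that would list the matching source line.

# Read oops text on stdin and print the hex address from the first line like
#   EIP:    0060:[<c0123456>]    Not tainted
# (c0123456 is a made-up placeholder address).
extract_eip() {
    sed -n 's/^EIP:[[:space:]]*[0-9a-fA-F]*:\[<\([0-9a-fA-F]*\)>].*/\1/p' | head -n 1
}

# Print a non-interactive gdb invocation for a vmlinux path ($1) and a
# hex address without the 0x prefix ($2).
gdb_list_cmd() {
    vmlinux=$1
    addr=$2
    printf "gdb -batch -ex 'list *0x%s' %s\n" "$addr" "$vmlinux"
}
```

Possible usage, with `oops.txt` as a hypothetical file holding the transcribed oops: `gdb_list_cmd vmlinux "$(extract_eip < oops.txt)"` prints the command to run in the kernel build tree. The vmlinux must come from exactly the build that crashed, otherwise the address will map to the wrong line.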
* Andrew Morton <[email protected]>:
> There have been a handful of reports - there's surely a race in there.
>
> Unfortunately I've yet to see a report from which we can identify the
> offending line in the very large journal_commit_transaction() function.
:(
>
> The best way to do that is to ensure that the kernel was built with
> CONFIG_DEBUG_INFO, note the offending EIP value, then do
>
> # gdb vmlinux
> (gdb) l *0xc0<whatever>
I'm rebuilding the ac12 kernel which crashed on me after just one day
and will reboot it today.
* Ralf Hildebrandt <[email protected]>:
> > The best way to do that is to ensure that the kernel was built with
> > CONFIG_DEBUG_INFO, note the offending EIP value, then do
> >
> > # gdb vmlinux
> > (gdb) l *0xc0<whatever>
>
> I'm rebuilding the ac12 kernel which crashed on me after just one day
> and will reboot it today.
Is it normal that the kernel with debugging enabled is not larger than
the normal kernel?
Ralf Hildebrandt wrote:
> * Ralf Hildebrandt <[email protected]>:
>
>
>>>The best way to do that is to ensure that the kernel was built with
>>>CONFIG_DEBUG_INFO, note the offending EIP value, then do
>>>
>>># gdb vmlinux
>>>(gdb) l *0xc0<whatever>
>>
>>I'm rebuilding the ac12 kernel which crashed on me after just one day
>>and will reboot it today.
>
>
> Is it normal that the kernel with debugging enabled is not larger than
> the normal kernel?
No, it should be much larger. Recheck the .config file
for CONFIG_DEBUG_INFO=y. Maybe you need to do 'make clean'
first.
--
~Randy
* Randy.Dunlap <[email protected]>:
> >Is it normal that the kernel with debugging enabled is not larger than
> >the normal kernel?
>
> No, it should be much larger. Recheck the .config file
> for CONFIG_DEBUG_INFO=y. Maybe you need to do 'make clean'
> first.
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
CONFIG_DEBUG_INFO=y
# CONFIG_FRAME_POINTER is not set
CONFIG_EARLY_PRINTK=y
I built that using "make-kpkg"
make-kpkg clean
CONCURRENCY_LEVEL=4 MAKEFLAGS="CC=gcc-3.4" make-kpkg --revision=20050217 kernel_image
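
Since the config above already shows CONFIG_DEBUG_INFO=y, one way to check whether the debug info really ended up in the build is to look at the uncompressed vmlinux in the build tree rather than at the installed image. The sketch below assumes that the packaged boot image is compressed and stripped, so its size may not change noticeably even when debug info is on. The function names are hypothetical, and `readelf` comes from binutils.

```shell
#!/bin/sh
# Sketch: confirm that debug info was requested in .config and that the
# uncompressed vmlinux actually carries it.

# Succeeds if the config file ($1) enables CONFIG_DEBUG_INFO.
config_has_debug_info() {
    grep -q '^CONFIG_DEBUG_INFO=y$' "$1"
}

# Succeeds if the ELF file ($1) has a .debug_info section.
# Fails if readelf is missing or the file is not an ELF image.
elf_has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}
```

Run from the top of the kernel source tree after the build, this could be used as `config_has_debug_info .config && elf_has_debug_info vmlinux && echo "debug info present"`. If the second check fails although the first succeeds, a stale build is a likely cause, which fits the 'make clean' suggestion.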