2006-02-19 19:09:53

by Maurice Volaski

[permalink] [raw]
Subject: ext3 involved in kernel panic in 2.6.13?

Dual Opteron system running ext3 atop drbd (network RAID) devices,
which, in turn, are atop LVM logical volumes. The underlying device
is hardware SCSI RAID via a LSILogic HBA. The kernel is vanilla
2.6.13 on a Gentoo-based system.

A panic occurred, which contains references to ext3 code.

I'm not sure how others manage to get these typed out, but I'm
manually typing it from what's on the monitor:

Call Trace: <IRQ> <ffffffff802820df>{i8042_interrupt+111}
<ffffffff80200080>{commit_timeout+0}
<ffffffff8013f143>{run_timer_softirq+387} <ffffffff8013b111>{__do_softirq+113}
<ffffffff8010ee63>{call_softirq+31} <ffffffff80110a55>{do_softirq+53}
<ffffffff8010e5c8>{apic_timer_interrupt+132} <EOI>
<ffffffff801fb8a6>{do_get_write_access+118}
<ffffffff801fb88e>{do_get_write_access+94} <ffffffff80185d1f>{__getblk+47}
<ffffffff80195170>{filldir+0} <ffffffff801fbf69>{journal_get_write_access+41}
<ffffffff801ec41c>{ext3_reserve_inode+write+76} <ffffffff80195170>{filldir+0}
<ffffffff801ec4d8>{ext3_mark_inode_dirty+56}
<ffffffff801fa9e5>{journal_start_229}
<ffffffff801ee571>{ext3_dirty_inode+113}
<ffffffff801a5604>{__mark_inode_dirty+52}
<ffffffff8019bd2b>{update_atime+123} <ffffffff80195016>{vfs_readdir+166}
<ffffffff801952e2>{syst_getdents+130} <ffffffff8019465e>{sys_fcntl+830}
<ffffffff8010dc46>{system_call+126}

Code: 8b 40 18 48 c1 e0 07 48 8b 98 08 58 5b 80 4c 01 e3 48 89 df
RIP <ffffffff8012f369>{try_to_wake_up+57} RSP <ffff810004827e88>
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
--

Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


2006-02-20 09:25:48

by Jan Kara

[permalink] [raw]
Subject: Re: ext3 involved in kernel panic in 2.6.13?

> Dual Opteron system running ext3 atop drbd (network RAID) devices,
> which, in turn, are atop LVM logical volumes. The underlying device
> is hardware SCSI RAID via a LSILogic HBA. The kernel is vanilla
> 2.6.13 on a Gentoo-based system.
>
> A panic occurred, which contains references to ext3 code.
>
> I'm not sure how others manage to get these typed out, but I'm
> manually typing it from what's on the monitor:

There should be more in the logs (just before the Call Trace:). Didn't
you capture also that information? Without it it is rather hard to find
out what was happening.

> Call Trace: <IRQ> <ffffffff802820df>{i8042_interrupt+111}
> <ffffffff80200080>{commit_timeout+0}
> <ffffffff8013f143>{run_timer_softirq+387}
> <ffffffff8013b111>{__do_softirq+113}
> <ffffffff8010ee63>{call_softirq+31} <ffffffff80110a55>{do_softirq+53}
> <ffffffff8010e5c8>{apic_timer_interrupt+132} <EOI>
> <ffffffff801fb8a6>{do_get_write_access+118}
> <ffffffff801fb88e>{do_get_write_access+94} <ffffffff80185d1f>{__getblk+47}
> <ffffffff80195170>{filldir+0}
> <ffffffff801fbf69>{journal_get_write_access+41}
> <ffffffff801ec41c>{ext3_reserve_inode+write+76}
> <ffffffff80195170>{filldir+0}
> <ffffffff801ec4d8>{ext3_mark_inode_dirty+56}
> <ffffffff801fa9e5>{journal_start_229}
> <ffffffff801ee571>{ext3_dirty_inode+113}
> <ffffffff801a5604>{__mark_inode_dirty+52}
> <ffffffff8019bd2b>{update_atime+123} <ffffffff80195016>{vfs_readdir+166}
> <ffffffff801952e2>{syst_getdents+130} <ffffffff8019465e>{sys_fcntl+830}
> <ffffffff8010dc46>{system_call+126}
>
> Code: 8b 40 18 48 c1 e0 07 48 8b 98 08 58 5b 80 4c 01 e3 48 89 df
> RIP <ffffffff8012f369>{try_to_wake_up+57} RSP <ffff810004827e88>
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

Bye
Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2006-02-20 17:06:32

by Maurice Volaski

[permalink] [raw]
Subject: Re: ext3 involved in kernel panic in 2.6.13?

> > Dual Opteron system running ext3 atop drbd (network RAID) devices,
>> which, in turn, are atop LVM logical volumes. The underlying device
>> is hardware SCSI RAID via a LSILogic HBA. The kernel is vanilla
>> 2.6.13 on a Gentoo-based system.
>>
>> A panic occurred, which contains references to ext3 code.
>>
>> I'm not sure how others manage to get these typed out, but I'm
>> manually typing it from what's on the monitor:
>
> There should be more in the logs (just before the Call Trace:). Didn't
>you capture also that information? Without it it is rather hard to find
>out what was happening.

Unfortunately, crash information never appears in regular system log,
at least using the metalog logging program. I think I would have to
configure the netconsole to do that.

> > Call Trace: <IRQ> <ffffffff802820df>{i8042_interrupt+111}
>> <ffffffff80200080>{commit_timeout+0}
>> <ffffffff8013f143>{run_timer_softirq+387}
>> <ffffffff8013b111>{__do_softirq+113}
>> <ffffffff8010ee63>{call_softirq+31} <ffffffff80110a55>{do_softirq+53}
>> <ffffffff8010e5c8>{apic_timer_interrupt+132} <EOI>
>> <ffffffff801fb8a6>{do_get_write_access+118}
>> <ffffffff801fb88e>{do_get_write_access+94} <ffffffff80185d1f>{__getblk+47}
>> <ffffffff80195170>{filldir+0}
>> <ffffffff801fbf69>{journal_get_write_access+41}
>> <ffffffff801ec41c>{ext3_reserve_inode+write+76}
>> <ffffffff80195170>{filldir+0}
>> <ffffffff801ec4d8>{ext3_mark_inode_dirty+56}
>> <ffffffff801fa9e5>{journal_start_229}
>> <ffffffff801ee571>{ext3_dirty_inode+113}
>> <ffffffff801a5604>{__mark_inode_dirty+52}
>> <ffffffff8019bd2b>{update_atime+123} <ffffffff80195016>{vfs_readdir+166}
>> <ffffffff801952e2>{syst_getdents+130} <ffffffff8019465e>{sys_fcntl+830}
>> <ffffffff8010dc46>{system_call+126}
>>
>> Code: 8b 40 18 48 c1 e0 07 48 8b 98 08 58 5b 80 4c 01 e3 48 89 df
>> RIP <ffffffff8012f369>{try_to_wake_up+57} RSP <ffff810004827e88>
>> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>
> Bye
> Honza
>--
>Jan Kara <[email protected]>
>SuSE CR Labs


--

Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University

2006-02-21 05:07:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: ext3 involved in kernel panic in 2.6.13?

On Feb 19, 2006 14:09 -0500, Maurice Volaski wrote:
> A panic occurred, which contains references to ext3 code.
>
> I'm not sure how others manage to get these typed out,

Normally a serial console is best, and if you have at least 2 machines
you can cross-connect the serial ports with a NULL-modem cable and run
a terminal emulator (e.g. minicom) to log it to disk on the other system.
Having netdump is also a good choice, though maybe not quite as reliable
as a real serial console.

> but I'm manually typing it from what's on the monitor:
>
> Call Trace: <IRQ> <ffffffff802820df>{i8042_interrupt+111}
> <ffffffff80200080>{commit_timeout+0}
> <ffffffff8013f143>{run_timer_softirq+387}
> <ffffffff8013b111>{__do_softirq+113}
> <ffffffff8010ee63>{call_softirq+31} <ffffffff80110a55>{do_softirq+53}
> <ffffffff8010e5c8>{apic_timer_interrupt+132} <EOI>

At this point (and above) the process is in an IRQ handler, so it is likely
that the problem exists somewhere at that level. However, the critical
part of the oops is missing - what actually went wrong? It could be a BUG,
which is a kernel assertion, or it could be a bad pointer dereference, or
anything really. There is nothing here which indicates what the problem is.

> <ffffffff801fb8a6>{do_get_write_access+118}
> <ffffffff801fb88e>{do_get_write_access+94} <ffffffff80185d1f>{__getblk+47}
> <ffffffff80195170>{filldir+0}
> <ffffffff801fbf69>{journal_get_write_access+41}
> <ffffffff801ec41c>{ext3_reserve_inode+write+76}
> <ffffffff80195170>{filldir+0}
> <ffffffff801ec4d8>{ext3_mark_inode_dirty+56}
> <ffffffff801fa9e5>{journal_start_229}
> <ffffffff801ee571>{ext3_dirty_inode+113}
> <ffffffff801a5604>{__mark_inode_dirty+52}
> <ffffffff8019bd2b>{update_atime+123} <ffffffff80195016>{vfs_readdir+166}
> <ffffffff801952e2>{syst_getdents+130} <ffffffff8019465e>{sys_fcntl+830}
> <ffffffff8010dc46>{system_call+126}

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.