2000-11-10 14:16:08

by David Ford

Subject: [bug] kernel panic related to reiserfs, 2.4.0-test11-pre1 and 3.6.18

Over the last three weeks my box has been locking up w/ a black screen
of death. This time I had kdb patched in and got the following:

Entering kdb (current=0xcf906000, pid 16808) Panic: invalid operand due to panic @ 0xc0163d7a
eax = 0x0000001a ebx = 0xcf907d8c ecx = 0xcf906000 edx = 0xcd3cde00
esi = 0xc36fc494 edi = 0xcf907cd4 esp = 0xcf907c78 eip = 0xc0163d7a
ebp = 0xcf907cd4 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010246
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xcf907c44
kdb> bt
EBP        EIP        Function(args)
0xcf907cd4 0xc0163d7a de_still_valid+0x3e (0xcf4dcf80, 0xa, 0xcf907d8c)
                      kernel .text 0xc0100000 0xc0163d3c 0xc0163e18
0xcf907cf0 0xc0163e31 entry_points_to_object+0x19 (0xcf4dcf80, 0xa, 0xcf907d8c, 0xc6f053e0, 0xce3a39a8)
                      kernel .text 0xc0100000 0xc0163e18 0xc0163e98
0xcf907f04 0xc01642fa reiserfs_rename+0x432 (0xce3a3940, 0xc974d2c0, 0xce3a3940, 0xcf4dcf20)
                      kernel .text 0xc0100000 0xc0163ec8 0xc0164610
0xcf907f2c 0xc0139c94 vfs_rename_other+0x26c (0xce3a3940, 0xc974d2c0, 0xce3a3940, 0xcf4dcf20)
                      kernel .text 0xc0100000 0xc0139a28 0xc0139cec
0xcf907f50 0xc0139d25 vfs_rename+0x39 (0xce3a3940, 0xc974d2c0, 0xce3a3940, 0xcf4dcf20)
                      kernel .text 0xc0100000 0xc0139cec 0xc0139d7c
0xcf907fbc 0xc0139ef9 sys_rename+0x17d (0xbfffd824, 0x809c160, 0x809c160, 0xbfffd824, 0x811b048)
                      kernel .text 0xc0100000 0xc0139d7c 0xc0139f80
           0xc010aadb system_call+0x33
                      kernel .text 0xc0100000 0xc010aaa8 0xc010aae0
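
As a cross-check on the backtrace, the symbol offset kdb prints can be recomputed by hand. The sketch below assumes the last two addresses on each "kernel .text" line are the start and end of the enclosing function; that reading is an assumption, although the arithmetic agrees with the +0x3e shown for de_still_valid. The helper name `symbol_offset` is made up for illustration.

```python
def symbol_offset(eip, start, end):
    """Return (offset, size) of eip inside the function spanning [start, end)."""
    if not start <= eip < end:
        raise ValueError("eip is outside the function")
    return eip - start, end - start

# Values copied from the first backtrace frame above.
offset, size = symbol_offset(0xc0163d7a, 0xc0163d3c, 0xc0163e18)
print(f"de_still_valid+{offset:#x}/{size:#x}")  # de_still_valid+0x3e/0xdc
```

The same subtraction works for the other frames, for example reiserfs_rename: 0xc01642fa - 0xc0163ec8 = 0x432.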

kdb> sr s
SysRq: Emergency Sync
kdb> g
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0163d7a>]
EFLAGS: 00010246
eax: 0000001a ebx: cf907d8c ecx: cf906000 edx: cd3cde00
esi: c36fc494 edi: cf907cd4 ebp: cf907cd4 esp: cf907c78
ds: 0018 es: 0018 ss: 0018
Process sendmail (pid: 16808, stackpage=cf907000)
Stack: c02ba2e5 c02ba37d 0000004d cf907d8c c6f053e0 c6f053e0 c0cd73c0 00000012
       c36fc1c8 00000005 c36fc444 00000010 0000000a c36fc6e4 00000000 0000057b
       00002b2e 0000056c 0000057b 32e1fd80 000001f4 00000000 ce8fe472 cf907cf0
Call Trace: [<c02ba2e5>] [<c02ba37d>] [<c0163e31>] [<c01642fa>] [<c02bf60f>] [<c0139c94>] [<c0139d25>]
       [<c0139ef9>] [<c010aadb>]
Code: 0f 0b 8b 7d c4 8b 45 bc 8b 4d c8 0f b7 57 14 03 50 34 89 c8


I have been writing code on it for the last two days straight. It was
fully functional until I left for 15 minutes for a shower. I came back
and the system is hosed. Everything is quickly going to D state. I can
move and type until I attempt to execute or reference anything. It's
all downhill from there.

It is 2.4.0-test11-pre1 with reiserfs 3.6.18.

With kdb, after the panic happens, I can hit 'sr s' then 'g', it will
OOPS (process sendmail) then continue. Without kdb, I am SOL and have
to hit the power button. sysrq won't react.

-d

--
"The difference between 'involvement' and 'commitment' is like an
eggs-and-ham breakfast: the chicken was 'involved' - the pig was
'committed'."



Attachments:
david.vcf (176.00 B)
Card for David Ford

2000-11-10 14:20:09

by Michael Rothwell

Subject: Re: [bug] kernel panic related to reiserfs, 2.4.0-test11-pre1 and 3.6.18

David Ford wrote:

> With kdb, after the panic happens, I can hit 'sr s' then 'g', it will
> OOPS (process sendmail) then continue. Without kdb, I am SOL and have
> to hit the power button. sysrq won't react.

Debugger good.

2000-11-10 15:32:11

by Chris Mason

Subject: Re: [reiserfs-list] [bug] kernel panic related to reiserfs, 2.4.0-test11-pre1 and 3.6.18



On Friday, November 10, 2000 06:15:40 -0800 David Ford wrote:

> Over the last three weeks my box has been locking up w/ a black screen
> of death. This time I had kdb patched in and got the following:
>
> Entering kdb (current=0xcf906000, pid 16808) Panic: invalid operand
> due to panic @ 0xc0163d7a

Odd. There is nothing in de_still_valid that should panic, unless there is
some major memory corruption going on. Do you always get the same trace?

[ ... ]

> I have been writing code on it for the last two days straight. It was
> fully functional until I left for 15 minutes for a shower. I came back
> and the system is hosed. Everything is quickly going to D state. I can
> move and type until I attempt to execute or reference anything. It's
> all downhill from there.
>
> It is 2.4.0-test11-pre1 with reiserfs 3.6.18.
>
> With kdb, after the panic happens, I can hit 'sr s' then 'g', it will
> OOPS (process sendmail) then continue. Without kdb, I am SOL and have
> to hit the power button. sysrq won't react.
>

Once you get inside reiserfs_rename, you've started a transaction. If you
oops inside there, the transaction never finishes, and all the other
transactions end up waiting on it. So, if you can continue without hanging
the box, the oops didn't happen in reiserfs_rename ;-) Could you send a
decoded version?

-chris
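
The point about the stuck transaction can be illustrated with a toy model. This is not reiserfs code: `ToyJournal`, `begin`, `end`, and `rename_that_oopses` are hypothetical names, and the real journal is considerably more involved. The sketch assumes a journal that allows one open transaction at a time and only shows why a task that dies between begin and end leaves every later writer waiting.

```python
import threading

class ToyJournal:
    """Hypothetical one-transaction-at-a-time journal; not the reiserfs API."""

    def __init__(self):
        self._idle = threading.Semaphore(1)

    def begin(self, timeout=None):
        # True if the transaction started, False if we gave up waiting.
        return self._idle.acquire(timeout=timeout)

    def end(self):
        self._idle.release()

def rename_that_oopses(journal):
    journal.begin()
    # Simulated oops: the task dies here, so journal.end() never runs.
    raise RuntimeError("simulated oops inside rename")

def run_demo():
    journal = ToyJournal()

    def crasher():
        try:
            rename_that_oopses(journal)
        except RuntimeError:
            pass  # the task is gone; nothing ends its transaction

    t = threading.Thread(target=crasher)
    t.start()
    t.join()
    # A later writer now waits. The timeout only exists so the demo returns;
    # without it this call would block forever, like a task stuck in D state.
    return journal.begin(timeout=0.2)

print(run_demo())  # False: the second writer never gets its transaction
```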

2000-11-10 23:39:15

by David Ford

Subject: Re: [reiserfs-list] [bug] kernel panic related to reiserfs, 2.4.0-test11-pre1 and 3.6.18

This is the first time I had kdb running. All other times I lost the console
from the deadlock. The deadlock is bad enough to prevent any more access to
basically everything. I'm still running the same kernel and will do so for
several more days. It usually happens every two to three days.

Chris Mason wrote:

> On Friday, November 10, 2000 06:15:40 -0800 David Ford wrote:
>
> > Over the last three weeks my box has been locking up w/ a black screen
> > of death. This time I had kdb patched in and got the following:
> >
> > Entering kdb (current=0xcf906000, pid 16808) Panic: invalid operand
> > due to panic @ 0xc0163d7a
>
> Odd. There is nothing in de_still_valid that should panic, unless there is
> some major memory corruption going on. Do you always get the same trace?
>
> [ ... ]
>
> > I have been writing code on it for the last two days straight. It was
> > fully functional until I left for 15 minutes for a shower. I came back
> > and the system is hosed. Everything is quickly going to D state. I can
> > move and type until I attempt to execute or reference anything. It's
> > all downhill from there.
> >
> > It is 2.4.0-test11-pre1 with reiserfs 3.6.18.
> >
> > With kdb, after the panic happens, I can hit 'sr s' then 'g', it will
> > OOPS (process sendmail) then continue. Without kdb, I am SOL and have
> > to hit the power button. sysrq won't react.
> >
>
> Once you get inside reiserfs_rename, you've started a transaction. If you
> oops inside there, the transaction never finishes, and all the other
> transactions end up waiting on it. So, if you can continue without hanging
> the box, the oops didn't happen in reiserfs_rename ;-) Could you send a
> decoded version?
>
> -chris

No, I can't continue. Everything is halted beyond that. I can type into an
rxvt only until I hit enter; then that rxvt is hosed. I cannot even switch from
X to the console. Even input via the serial console was dead. Luckily the
keyboard remained alive, which I suspect was only by the grace of kdb.
Normally it is dead too.

>>EIP; c0163d7a <de_still_valid+3e/dc> <=====
Trace; c02ba2e5 <devfsd_buf_size+1c45/8e58>
Trace; c02ba37d <devfsd_buf_size+1cdd/8e58>
Trace; c0163e31 <entry_points_to_object+19/80>
Trace; c01642fa <reiserfs_rename+432/748>
Trace; c02bf60f <devfsd_buf_size+6f6f/8e58>
Trace; c0139c94 <vfs_rename_other+26c/2c4>
Trace; c0139d25 <vfs_rename+39/90>
Trace; c0139ef9 <sys_rename+17d/204>
Trace; c010aadb <system_call+33/38>
Code; c0163d7a <de_still_valid+3e/dc>
00000000 <_EIP>:
Code; c0163d7a <de_still_valid+3e/dc> <=====
0: 0f 0b ud2a <=====
Code; c0163d7c <de_still_valid+40/dc>
2: 8b 7d c4 mov 0xffffffc4(%ebp),%edi
Code; c0163d7f <de_still_valid+43/dc>
5: 8b 45 bc mov 0xffffffbc(%ebp),%eax
Code; c0163d82 <de_still_valid+46/dc>
8: 8b 4d c8 mov 0xffffffc8(%ebp),%ecx
Code; c0163d85 <de_still_valid+49/dc>
b: 0f b7 57 14 movzwl 0x14(%edi),%edx
Code; c0163d89 <de_still_valid+4d/dc>
f: 03 50 34 add 0x34(%eax),%edx
Code; c0163d8c <de_still_valid+50/dc>
12: 89 c8 mov %ecx,%eax

-d


--
"The difference between 'involvement' and 'commitment' is like an
eggs-and-ham breakfast: the chicken was 'involved' - the pig was
'committed'."


