2003-07-11 13:53:29

by Peter Lojkin

Subject: 2.4.22-pre3 and reiserfs problem (not boot)

Hello,

I am not on the list so please CC me if replying...

After a few hours of work with 2.4.22-pre3 (patched to solve the mount problem) we got this (ksyms was unavailable):

Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
Jul 10 06:25:41 host kernel: invalid operand: 0000
Jul 10 06:25:41 host kernel: CPU: 1
Jul 10 06:25:41 host kernel: EIP: 0010:[reiserfs_panic+41/96] Not tainted
Jul 10 06:25:41 host kernel: EFLAGS: 00010286
Jul 10 06:25:41 host kernel: eax: 00000024 ebx: c02da700 ecx: 00000097 edx: 01000000
Jul 10 06:25:41 host kernel: esi: f7e57000 edi: 00000000 ebp: f7e57000 esp: f7ed7ecc
Jul 10 06:25:41 host kernel: ds: 0018 es: 0018 ss: 0018
Jul 10 06:25:41 host kernel: Process kupdated (pid: 7, stackpage=f7ed7000)
Jul 10 06:25:41 host kernel: Stack: c02d89fa c03bf920 c02da700 f7ed7ef0 f8b10110 00000073 c01bb1df f7e57000
Jul 10 06:25:41 host kernel: c02da700 00000001 00000012 00000010 00000000 f8b10144 f8b10138 00000074
Jul 10 06:25:41 host kernel: 00000000 00000002 eea13500 c01beacb f7e57000 f8b10110 00000001 f7ed7f8c
Jul 10 06:25:41 host kernel: Call Trace: [flush_commit_list+675/920] [do_journal_end+1955/2668] [flush_old_commits+286/308] [reiserfs_write_super+56/104] [sync_supers+250/340]
Jul 10 06:25:41 host kernel: Code: 0f 0b 4e 01 00 8a 2d c0 68 20 f9 3b c0 85 f6 74 16 0f b7 46
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c02da700 <tails+ea4/1ff4>
>>edx; 01000000 Before first symbol
>>esi; f7e57000 <END_OF_CODE+37a5e204/????>
>>ebp; f7e57000 <END_OF_CODE+37a5e204/????>
>>esp; f7ed7ecc <END_OF_CODE+37adf0d0/????>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 4e dec %esi
Code; 00000003 Before first symbol
3: 01 00 add %eax,(%eax)
Code; 00000005 Before first symbol
5: 8a 2d c0 68 20 f9 mov 0xf92068c0,%ch
Code; 0000000b Before first symbol
b: 3b c0 cmp %eax,%eax
Code; 0000000d Before first symbol
d: 85 f6 test %esi,%esi
Code; 0000000f Before first symbol
f: 74 16 je 27 <_EIP+0x27> 00000027 Before first symbol
Code; 00000011 Before first symbol
11: 0f b7 46 00 movzwl 0x0(%esi),%eax

Jul 11 06:25:41 host kernel: kernel BUG at prints.c:341!
Jul 11 06:25:41 host kernel: invalid operand: 0000
Jul 11 06:25:41 host kernel: CPU: 0
Jul 11 06:25:41 host kernel: EIP: 0010:[reiserfs_panic+52/104] Not tainted
Jul 11 06:25:41 host kernel: EFLAGS: 00010286
Jul 11 06:25:41 host kernel: eax: 00000037 ebx: c02dc6a0 ecx: 00000002 edx: 02000000
Jul 11 06:25:41 host kernel: esi: f7e57000 edi: 00000000 ebp: f7e57000 esp: f7ed7eb8
Jul 11 06:25:41 host kernel: ds: 0018 es: 0018 ss: 0018
Jul 11 06:25:41 host kernel: Process kupdated (pid: 7, stackpage=f7ed7000)
Jul 11 06:25:41 host kernel: Stack: c02da97f c03c5c20 c03c1b80 00000841 c02dc6a0 f7ed7ee4 f8b1017c f7a10000
Jul 11 06:25:41 host kernel: c01bc1fe f7e57000 c02dc6a0 00000002 00000012 00000010 00000000 f8b100b4
Jul 11 06:25:41 host kernel: f8b101b0 f8b101a4 00000077 00000000 00000003 eefcf7a0 c01c01ad f7e57000
Jul 11 06:25:41 host kernel: Call Trace: [flush_commit_list+658/904] [do_journal_end+1989/2764] [journal_mark_dirty+490/792] [flush_old_commits+295/320] [reiserfs_write_super+56/108]
Jul 11 06:25:41 host kernel: Code: 0f 0b 55 01 92 a9 2d c0 68 20 5c 3c c0 85 f6 74 13 0f b7 46


>>ebx; c02dc6a0 <MAX_KEY+e40/3fb8>
>>edx; 02000000 Before first symbol
>>esi; f7e57000 <END_OF_CODE+37a5e204/????>
>>ebp; f7e57000 <END_OF_CODE+37a5e204/????>
>>esp; f7ed7eb8 <END_OF_CODE+37adf0bc/????>

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 55 push %ebp
Code; 00000003 Before first symbol
3: 01 92 a9 2d c0 68 add %edx,0x68c02da9(%edx)
Code; 00000009 Before first symbol
9: 20 5c 3c c0 and %bl,0xffffffc0(%esp,%edi,1)
Code; 0000000d Before first symbol
d: 85 f6 test %esi,%esi
Code; 0000000f Before first symbol
f: 74 13 je 24 <_EIP+0x24> 00000024 Before first symbol
Code; 00000011 Before first symbol
11: 0f b7 46 00 movzwl 0x0(%esi),%eax

1 warning and 1 error issued. Results may not be reliable.



2003-07-11 14:18:10

by Oleg Drokin

Subject: Re: 2.4.22-pre3 and reiserfs problem (not boot)

Hello!

On Fri, Jul 11, 2003 at 06:08:08PM +0400, "Peter Lojkin" wrote:

> After a few hours of work with 2.4.22-pre3 (patched to solve the mount problem) we got this (ksyms was unavailable):

There was one more reiserfs message in kernel log just before this line, can you please include it?

> Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
> Jul 10 06:25:41 host kernel: invalid operand: 0000
> Jul 10 06:25:41 host kernel: CPU: 1
> Jul 10 06:25:41 host kernel: EIP: 0010:[reiserfs_panic+41/96] Not tainted

Bye,
Oleg

2003-07-11 15:26:47

by Peter Lojkin

Subject: Re[2]: 2.4.22-pre3 and reiserfs problem (not boot)

Hello,

> There was one more reiserfs message in kernel log just before this
> line, can you please include it?
>
> > Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!
> > Jul 10 06:25:41 host kernel: invalid operand: 0000
> > Jul 10 06:25:41 host kernel: CPU: 1
> > Jul 10 06:25:41 host kernel: EIP: 0010:[reiserfs_panic+41/96] Not tainted

right. ksymoops cut it out so i missed it.

Jul 10 06:25:10 host kernel: journal-601, buffer write failed

another thing to note: both oopses happened at exactly 06:25:41 (Jul 10 and 11), and both times there was a "journal-601, buffer write failed"
message shortly before it.
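
A quick way to check this kind of correlation across a whole log is to pair each "journal-601" line with the next "kernel BUG" line and look at the gap. The sketch below is only an illustration: it assumes standard syslog timestamps with English month names (they carry no year, so one is passed in), and the function names and the two sample lines are chosen here for the example.

```python
import re
from datetime import datetime

# Syslog-style prefix, e.g. "Jul 10 06:25:10 host kernel: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$")


def parse_line(line, year=2003):
    """Return (timestamp, message) for a kernel syslog line, or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    # Syslog timestamps omit the year, so it is an assumption supplied here.
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, m.group(2)


def pair_failures_with_bugs(lines, year=2003):
    """Pair each 'journal-601' line with the first 'kernel BUG' line after it.

    Returns a list of (failure_time, bug_time, gap_in_seconds).
    """
    pairs = []
    pending = None  # time of the most recent unmatched journal-601 line
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, message = parsed
        if "journal-601" in message:
            pending = stamp
        elif message.startswith("kernel BUG at") and pending is not None:
            pairs.append((pending, stamp, (stamp - pending).total_seconds()))
            pending = None
    return pairs


if __name__ == "__main__":
    sample = [
        "Jul 10 06:25:10 host kernel: journal-601, buffer write failed",
        "Jul 10 06:25:41 host kernel: kernel BUG at prints.c:334!",
    ]
    for failed, bug, gap in pair_failures_with_bugs(sample):
        print(f"write failed {failed:%b %d %H:%M:%S}, BUG {gap:.0f}s later")
```

Feeding it the full kernel log instead of the two sample lines would show whether every BUG is preceded by a failed journal write.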

i left a lot of info out of the original message, sorry.
here it is:

the box is dual p3, serverworks le chipset, 1gb memory, integrated
dual-channel adaptec 7899a, intel e1000, 4 scsi disks, scsi promise
ide-raid box attached.

local disks form md0 with raid5 with total size of ~130gb.
promise box also in raid5 mode with total size of ~1.3tb.
both use reiserfs.

in the logs we often get messages like:
Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied

_but_ with the 2.4.21-rc? kernel it causes no problems and no data loss.
promise box itself doesn't detect any errors.
i've checked the list and found a couple of messages about such "parity
errors" in recent kernels, but no solution or any info about it
causing problems.
hoping to get rid of these messages i've tried 2.4.22-pre3 and got
oopses...


2003-07-11 15:36:17

by Oleg Drokin

Subject: Re: 2.4.22-pre3 and reiserfs problem (not boot)

Hello!

On Fri, Jul 11, 2003 at 07:41:03PM +0400, "Peter Lojkin" wrote:

> > There was one more reiserfs message in kernel log just before this
> > line, can you please include it?
> right. ksymoops cut it out so i missed it.
> Jul 10 06:25:10 host kernel: journal-601, buffer write failed

Well, the write to the journal failed. Reiserfs panics in such an event as it does not
know what to do in such a case (there is some work at SuSE by Jeff Mahoney to
remount r/o if such an event happens).
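
To make the difference between the two policies concrete, here is a toy model. It is not reiserfs code: the class, the policy names and the failing write are all made up for illustration, and it only shows "stop everything on a failed journal write" versus "switch to read-only and keep going".

```python
class JournalWriteError(Exception):
    """Raised by the toy block device when a write fails."""


class ToyFilesystem:
    """Toy model only; none of this is reiserfs code."""

    def __init__(self, on_error="panic"):
        if on_error not in ("panic", "remount-ro"):
            raise ValueError(f"unknown policy: {on_error}")
        self.on_error = on_error
        self.read_only = False

    def commit(self, write_block):
        """Try to write a journal block; apply the policy on failure."""
        if self.read_only:
            return False  # refuse new writes once degraded
        try:
            write_block()
        except JournalWriteError as err:
            if self.on_error == "panic":
                # Models the panic behaviour: stop everything.
                raise SystemExit(f"panic: {err}")
            # Models the remount-read-only alternative: reads stay possible.
            self.read_only = True
            return False
        return True


def failing_write():
    """Hypothetical write that always fails, standing in for a bad device."""
    raise JournalWriteError("journal-601, buffer write failed")


if __name__ == "__main__":
    fs = ToyFilesystem(on_error="remount-ro")
    print(fs.commit(failing_write), fs.read_only)
```

Either way the failed write is not recovered; the read-only policy only changes how the failure is reported and what keeps running afterwards.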

> another thing to note: both oopses happened at exactly 06:25:41 (Jul 10 and 11), and both times there was a "journal-601, buffer write failed"
> message shortly before it.

Well, are there any i/o error messages from the block device drivers?

> in the logs we often get messages like:
> Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
> Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied

Hm, can that lead to an i/o error being propagated up to reiserfs? If yes, then that's the problem.

> _but_ with the 2.4.21-rc? kernel it causes no problems and no data loss.
> promise box itself doesn't detect any errors.
> i've checked the list and found a couple of messages about such "parity
> errors" in recent kernels, but no solution or any info about it
> causing problems.
> hoping to get rid of these messages i've tried 2.4.22-pre3 and got
> oopses...

Hm, I guess you need to stop the driver from propagating i/o errors upstream
(perhaps find the recent change that started doing this).
There is nothing to do from the reiserfs perspective (except for better error handling,
which would not do you any good anyway).

Bye,
Oleg

2003-07-11 15:52:26

by Peter Lojkin

Subject: Re: 2.4.22-pre3 and reiserfs problem (not boot)

> > Jul 10 06:25:10 host kernel: journal-601, buffer write failed
>
> Well, the write to the journal failed. Reiserfs panics in such an event as it does not
> know what to do in such a case (there is some work at SuSE by Jeff Mahoney to
> remount r/o if such an event happens).
yes, once i found the "buffer write failed" i knew it wasn't a random
reiserfs oops. just missed it the first time, sorry.

and i think the close timing of the oopses was caused by a cron job
that starts at this time, the one that searches through the entire fs tree...

> Well, are there any i/o error messages from the block device drivers?
>
> > in the logs we often get messages like:
> > Jul 11 14:25:59 host kernel: (scsi0:A:10:0): parity error detected in Data-out phase. SEQADDR(0x55) SCSIRATE(0xc2)
> > Jul 11 14:25:59 host kernel: ^INo terminal CRC packet recevied
>
> Hm, can that lead to an i/o error being propagated up to reiserfs? If yes,
> then that's the problem.
sure, if there were real errors, but with earlier kernels we get
these errors in the logs but no problems or data loss. strange...

> Hm, I guess you need to stop the driver from propagating i/o errors upstream
> (perhaps find the recent change that started doing this).
> There is nothing to do from the reiserfs perspective (except for better error handling,
> which would not do you any good anyway).
well, i cannot do a lot of reboots on this box, so i guess i'll just
try moving the promise box to another host with another scsi hba and see if
it works...

Big thanks for the quick reply!