2004-01-17 01:04:39

by Brad Tilley

Subject: Re: [Re: Possible Bug in 2.4.24???]



Marcelo Tosatti <[email protected]> wrote:

>
>
> On Fri, 16 Jan 2004, Brad Tilley wrote:
>
> > While running a script that recursively changes permissions on a ftp
> > directory, I received an error to the term window where the script was
> > running. I then checked out /var/log/messages and saw the below kernel errors.
> > The machine was generally unresponsive and had to be physically rebooted at
> > the power switch. It worked fine upon reboot and fsck ran w/o producing any
> > error... the script ran fine too. This is a HP XW4100 with a P4, 1.5GB DDR RAM
> > and two very fast (15,000 RPM), very large (140GB) SCSI HDDs. It had been up
> > for 9 days (since compiling and installing 2.4.24) and has worked fine until
> > this point. Could someone tell me if this is or isn't a kernel bug?
> >
> >
> > Jan 16 11:50:43 athop1 kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> > Jan 16 11:50:43 athop1 kernel: Info fld=0x2cd1bd9, Current sd08:15: sense key Hardware Error
> > Jan 16 11:50:43 athop1 kernel: Additional sense indicates Internal target failure
> > Jan 16 11:50:43 athop1 kernel: I/O error: dev 08:15, sector 54128
> > Jan 16 11:50:43 athop1 kernel: journal-601, buffer write failed
> > Jan 16 11:50:43 athop1 kernel: (device sd(8,21))
> > Jan 16 11:50:43 athop1 kernel: kernel BUG at prints.c:341!
> > Jan 16 11:50:43 athop1 kernel: invalid operand: 0000
> > Jan 16 11:50:43 athop1 kernel: CPU: 0
> > Jan 16 11:50:43 athop1 kernel: EIP: 0010:[<c0189878>] Tainted: P
>
> Brad,
>
> A device error happened (you see the "SCSI disk error : " message and
> "Additional sense indicates Internal target failure") which reiserfs
> could not handle.
>
> kernel BUG at prints.c:341 == reiserfs_panic().
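
The log names the failing device twice, as "dev 08:15" and as "sd(8,21)". The first form is major:minor in hexadecimal, the second is the same pair in decimal. A minimal sketch of the decoding follows, assuming the conventional allocation of 16 minor numbers per disk on SCSI disk major 8; the helper names are made up for illustration and are not kernel functions.

```python
def decode_dev(dev):
    """Split a 'MM:mm' device string (hex major:minor) into two integers."""
    major_hex, minor_hex = dev.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def scsi_disk_name(major, minor):
    """Map (major, minor) on SCSI disk major 8 to an sdXN style name.

    Assumption: 16 minors per disk, minor % 16 == 0 being the whole disk.
    """
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    if not 0 <= minor <= 255:
        raise ValueError("minor out of range for major 8")
    disk, part = divmod(minor, 16)
    letter = chr(ord("a") + disk)
    return "sd" + letter + (str(part) if part else "")


major, minor = decode_dev("08:15")
print(major, minor)                   # the pair shown as sd(8,21) in the log
print(scsi_disk_name(major, minor))   # second disk, fifth partition
```

Under that assumption the device is partition 5 on the second SCSI disk, which fits the "id 1" in the SCSI error line.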

Thanks for the reply Marcelo,

Does this mean that there is a physical or mechanical problem with the drive
itself? I do use reiserfs as it's the best fs available for my purposes. Could
the drive attempt to write outside its physical bounds? Move the arm right when
it was instructed to go left? I don't understand how the drive could have an
error w/o affecting the filesystem.




2004-01-17 05:11:51

by Oleg Drokin

Subject: Re: [Re: Possible Bug in 2.4.24???]

Hello!

On Fri, Jan 16, 2004 at 08:04:28PM -0500, Brad Tilley wrote:
> > > Jan 16 11:50:43 athop1 kernel: I/O error: dev 08:15, sector 54128
> > > Jan 16 11:50:43 athop1 kernel: journal-601, buffer write failed
> > > Jan 16 11:50:43 athop1 kernel: (device sd(8,21))
> > A device error happened (you see the "SCSI disk error : " message and
> > "Additional sense indicates Internal target failure") which reiserfs
> > could not handle.
> > kernel BUG at prints.c:341 == reiserfs_panic().
> Does this mean that there is a physical or mechanical problem with the drive
> itself? I do use

Yes it does.

> reiserfs as it's the best fs available for my purposes. Could the drive
> attempt to write
> outside its physical bounds? Move the arm right when it was instructed to go

The sector for the I/O error is 54128, which is somewhere within the journal (at
the beginning of the disk). What the problem inside the drive was is not very
clear, as modern drives are sort of black boxes.
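
The arithmetic behind "somewhere within the journal" can be sketched as below. Every constant here is an assumption rather than something stated in this thread: 512-byte sectors, a sector number relative to the start of the partition, a 4096-byte block size, and the default reiserfs layout with the journal starting at block 18 and spanning 8192 blocks plus one header block.

```python
SECTOR_SIZE = 512          # assumption: 512-byte sectors
BLOCK_SIZE = 4096          # assumption: default reiserfs block size
JOURNAL_START_BLOCK = 18   # assumption: default layout, journal follows first bitmap block
JOURNAL_BLOCKS = 8192      # assumption: default journal size (one header block follows)


def sector_to_byte_offset(sector):
    """Byte offset of a sector from the start of the partition."""
    return sector * SECTOR_SIZE


def sector_to_block(sector):
    """Filesystem block that contains the given sector."""
    return sector_to_byte_offset(sector) // BLOCK_SIZE


def in_default_journal(sector):
    """True if the sector falls inside the assumed default journal area."""
    block = sector_to_block(sector)
    return JOURNAL_START_BLOCK <= block <= JOURNAL_START_BLOCK + JOURNAL_BLOCKS


sector = 54128
print(sector_to_byte_offset(sector))  # bytes from partition start
print(sector_to_block(sector))        # filesystem block number
print(in_default_journal(sector))     # whether it lies in the journal area
```

With those assumptions the failed sector sits about 26 MiB into the partition, in block 6766, which is inside the assumed journal range of blocks 18 through 8210.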

> left? I don't
> understand how the drive could have an error w/o affecting the filesystem.

Well, there was an effect on the filesystem: the write failed.
Also, maybe that block was remapped later, or it was a failure of the drive's
internal logic, or something else like that.
This journal block won't be used on a subsequent mount (because the transaction
was not closed), but will just be overwritten. So even if its content was
corrupted, reiserfs does not care.

Bye,
Oleg

2004-01-17 06:24:10

by Mike Fedyk

Subject: Re: [Re: Possible Bug in 2.4.24???]

On Sat, Jan 17, 2004 at 07:11:07AM +0200, Oleg Drokin wrote:
> Well, there was an effect on the filesystem: the write failed.
> Also, maybe that block was remapped later, or it was a failure of the drive's
> internal logic, or something else like that.
> This journal block won't be used on a subsequent mount (because the transaction
> was not closed), but will just be overwritten. So even if its content was
> corrupted, reiserfs does not care.

I'd also suggest to Brad that he replace the drive ASAP.

Mike