2007-12-07 13:15:54

by linux-os (Dick Johnson)

Subject: Possible EXT2 race



On linux-2.6.22.1, executing the following script
while the mailer is writing to /var/spool/mail/linux-os...


#!/bin/bash
# Truncate the mail spool file once per second, forever.
while true; do
    > /var/spool/mail/linux-os
    sleep 1
done

...will cause the following errors to occur.

Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Sense Key : No Sense [deferred]
Dec 7 04:05:55 chaos kernel: Info fld=0x1980240
Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687515944, limit=33736437
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656387: block -584027804 read error
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940964, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940980, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710940980, count = 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: bit already cleared for block 1
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_blocks: Freeing blocks not in datazone - block = 3710941012, count = 1
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687528104, limit=33736437
Dec 7 04:08:13 chaos kernel: EXT2-fs error (device sdb1): ext2_free_branches: Read failure, inode=656399, block=-584026284
Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687529288, limit=33736437
Dec 7 04:08:15 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656400: block -584026136 read error
Dec 7 04:08:18 chaos kernel: EXT2-fs error (device sdb1): ext2_xattr_delete_inode: inode 656403: bad block 30188
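
The negative block numbers and the "want=" sector values in the log above appear to be the same garbage values printed in two ways. The sketch below reinterprets each negative block number as an unsigned 32-bit value and converts it to the sector just past the end of that block. It assumes a 4 KiB filesystem block (8 sectors of 512 bytes), which the report does not state, and that "want=" is the end sector of a one-block request. The helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for reading the log; not part of the original report.
# Assumption: 4 KiB filesystem blocks, i.e. 8 sectors of 512 bytes per block.
SECTORS_PER_BLOCK=8

# Reinterpret a block number printed as signed 32-bit as unsigned 32-bit.
to_unsigned32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# Sector just past the end of the given block, i.e. the value expected in
# "want=" if the rejected request covered exactly that one block.
block_to_want() {
    local ublock
    ublock=$(to_unsigned32 "$1")
    echo $(( (ublock + 1) * SECTORS_PER_BLOCK ))
}

# The three negative block numbers reported in the log.
for block in -584027804 -584026284 -584026136; do
    echo "block=$block unsigned=$(to_unsigned32 "$block") want=$(block_to_want "$block")"
done
```

Under those assumptions the three results are 29687515944, 29687528104 and 29687529288, the same as the three "want=" values in the log, and all far beyond limit=33736437. The middle block, read as unsigned, is 3710941012, which also appears in one of the ext2_free_blocks messages. This suggests the block pointers themselves were already garbage when they were read, but it does not by itself say what corrupted them.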


Caution is advised when testing because this destroyed a filesystem,
making it unfixable by `fsck`.
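
Given that risk, a bounded variant of the test that works on a scratch file rather than a live mail spool may be safer to experiment with. This is only a sketch: the scratch path, the five iterations and the 0.1 second interval are illustrative assumptions, and a background loop stands in for the mailer. To exercise ext2, TMPDIR would need to point at a disposable ext2 filesystem.

```shell
#!/bin/bash
# Bounded variant of the reproduction: one task appends, another truncates.
# Assumptions (not from the original report): the scratch location, the
# iteration count and the sleep interval are all illustrative.
scratch_dir=$(mktemp -d)      # honours TMPDIR; use a disposable filesystem
target="$scratch_dir/spool"
: > "$target"

# Writer: append to the file until killed, standing in for the mailer.
(
    while true; do
        echo "filler line to grow the file" >> "$target"
    done
) &
writer_pid=$!

# Truncator: empty the file a fixed number of times while the writer runs.
truncations=0
for _ in 1 2 3 4 5; do
    : > "$target"
    truncations=$((truncations + 1))
    sleep 0.1
done

# Stop the writer and report the final state of the file.
kill "$writer_pid"
wait "$writer_pid" 2>/dev/null || true
echo "truncated $truncations times; final size: $(wc -c < "$target") bytes"
rm -rf "$scratch_dir"
```

On a healthy disk and filesystem this should finish quietly; kernel messages like the ones quoted above would show up in the system log, not in the script's output.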


Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.29 BogoMips).
My book : http://www.AbominableFirebug.com/
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.


2007-12-07 16:51:17

by Dave Jones

Subject: Re: Possible EXT2 race

On Fri, Dec 07, 2007 at 08:15:42AM -0500, linux-os (Dick Johnson) wrote:

> Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault

This sounds more like a hardware problem.

Dave

--
http://www.codemonkey.org.uk

2007-12-07 18:58:59

by linux-os (Dick Johnson)

Subject: Re: Possible EXT2 race


On Fri, 7 Dec 2007, Dave Jones wrote:

> On Fri, Dec 07, 2007 at 08:15:42AM -0500, linux-os (Dick Johnson) wrote:
>
> > Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault
>
> This sounds more like a hardware problem.
>
> Dave
>

There was an attempt to write beyond the end of the device because
everything in the file-system was getting trashed. I can read/write
the 5 year-old SCSI physical drive with no errors from both within
linux and through the Adaptec BIOS. This problem only occurs
when I attempt to truncate a file that is being written by another
task.

> --
> http://www.codemonkey.org.uk
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.29 BogoMips).
My book : http://www.AbominableFirebug.com/
_



2007-12-08 03:45:17

by Robert Hancock

Subject: Re: Possible EXT2 race

linux-os (Dick Johnson) wrote:
> On Fri, 7 Dec 2007, Dave Jones wrote:
>
>> On Fri, Dec 07, 2007 at 08:15:42AM -0500, linux-os (Dick Johnson) wrote:
>>
>>> Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault
>> This sounds more like a hardware problem.
>>
>> Dave
>>
>
> There was an attempt to write beyond the end of the device because
> everything in the file-system was getting trashed. I can read/write
> the 5 year-old SCSI physical drive with no errors from both within
> linux and through the Adaptec BIOS. This problem only occurs
> when I attempt to truncate a file that is being written by another
> task.

That SCSI error code doesn't sound like a reasonable one for the drive
getting a bad block address. The more typical one in that case would be
"Logical block address out of range", or maybe the catch-all "Invalid
field in CDB". "Peripheral device write fault", especially as a deferred
error (i.e. after the drive already returned a normal completion for the
data, and then is reporting the failure to actually write to the media
on the next command), really sounds like a drive problem.

And the kernel is supposed to trap out-of-range accesses at the disk
layer, as these messages show it doing, _after_ that error occurred:

Dec 7 04:08:13 chaos kernel: attempt to access beyond end of device
Dec 7 04:08:13 chaos kernel: sdb1: rw=0, want=29687515944, limit=33736437

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-12-08 08:58:39

by Jon Masters

Subject: Re: Possible EXT2 race


On Fri, 2007-12-07 at 13:58 -0500, linux-os (Dick Johnson) wrote:
> On Fri, 7 Dec 2007, Dave Jones wrote:
>
> > On Fri, Dec 07, 2007 at 08:15:42AM -0500, linux-os (Dick Johnson) wrote:
> >
> > > Dec 7 04:05:55 chaos kernel: sd 0:0:1:0: [sdb] Add. Sense: Peripheral device write fault
> >
> > This sounds more like a hardware problem.

> There was an attempt to write beyond the end of the device because
> everything in the file-system was getting trashed. I can read/write
> the 5 year-old SCSI physical drive with no errors from both within
> linux and through the Adaptec BIOS. This problem only occurs
> when I attempt to truncate a file that is being written by another
> task.

Well, that might be how you can reproduce it, but this is almost
certainly a hardware problem and not EXT2 at fault - the filesystem can
only do so much once its data has been corrupted by an old disk.

Jon.