2006-02-03 17:47:39

by Mark Salyzyn

Subject: RE: RAID5 unusably unstable through 2.6.14

Martin Drab [mailto:[email protected]] sez:
> no access was possible at all to that block device entirely.

Then 'we' are missing an offline message (from SCSI/block or from a
check of the controller's array status).

bd_claim locking out access?

-- Mark


2006-02-03 18:07:11

by Martin Drab

Subject: RE: RAID5 unusably unstable through 2.6.14

On Fri, 3 Feb 2006, Salyzyn, Mark wrote:

> Martin Drab [mailto:[email protected]] sez:
> > no access was possible at all to that block device entirely.
>
> Then 'we' are missing an offline message (from SCSI/block or from a
> check of the controller's array status).

Besides, when the disk goes offline, which is what happened to me before
due to the bad setting of the AAC_MAX_32BIT_SGBCOUNT constant in
aacraid.h, the kernel responds appropriately with messages like
this:

[ 278.705813] scsi0 (0:0): rejecting I/O to offline device
[ 278.708685] Buffer I/O error on device sda2, logical block 1
[ 278.711589] lost page write due to I/O error on sda2

You can see these in my first report, from when I witnessed the array
actually going offline. The whole report is here:

http://lkml.org/lkml/2005/7/5/194

However, this time it was different. I am 100% positive that no such
messages appeared at all. Only these:

sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Hardware Error
Additional sense: Internal target failure
Info fld=0x0
end_request: I/O error, dev sda, sector <some sector number>

Nothing else.
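
For reference, the return code in the first of those lines can be split
into its component bytes. This is only a minimal sketch in Python, assuming
the usual 2.6-era layout of the SCSI result word (status byte in bits 0-7,
message byte in bits 8-15, host byte in bits 16-23, driver byte in bits
24-31). The function name decode_scsi_result is hypothetical and not part
of any kernel or driver interface.

```python
# Assumed layout of the 32-bit SCSI result word (2.6-era kernels):
#   bits  0-7   status byte (raw SAM status from the target)
#   bits  8-15  message byte
#   bits 16-23  host byte (set by the low-level driver)
#   bits 24-31  driver byte (set by the mid-layer)

DRIVER_SENSE = 0x08      # driver byte: sense data is available
DID_OK = 0x00            # host byte: no host-level error reported
CHECK_CONDITION = 0x02   # raw SAM status: target has sense data to report


def decode_scsi_result(result):
    """Split a SCSI result word into its four component bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


# The value from the log above; the kernel prints it without the leading zero.
fields = decode_scsi_result(0x8000002)
for name in ("driver", "host", "msg", "status"):
    print("%-6s = 0x%02x" % (name, fields[name]))
```

Under that assumption, 0x8000002 decodes to driver byte 0x08
(DRIVER_SENSE), host byte 0x00 (DID_OK) and status byte 0x02 (CHECK
CONDITION), which is consistent with the sense data printed on the lines
that follow it.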

Martin