2002-09-11 21:14:15

by Russell King

[permalink] [raw]
Subject: 2.4.19 SCSI core bug?

Hi,

I've just been running some of my tests on 2.4.19 on ARM, and came
across the following bug, which seems to be somewhere in the bowls of
the SCSI layer.

I'm using drivers/acorn/scsi/powertec.c, which uses fas216.c as the
core chipset driver (I authored both of these files) which contains
some pretty rigorous sanity checking.

Basically, what seems to be happening is that scsi can go wrong after
a medium error, and repeatedly pass invalid requests back down to the
HBA. My HBA driver dislikes invalid requests thusly:

scsi0.2: bad request buffer length 38400, should be 130048

38400 is SCpnt->request_bufflen.
130048 is the total number of bytes in the SG list.

Dumping out the offending command:
Read (10) 00 00 00 46 00 00 00 fe 00

Ok, so we were asking for 0xfe 512-byte sectors, which is 130048.
So why did SCSI tell me that it wanted 38400 bytes in
SCpnt->request_bufflen?

My workaround for this is to forcefully set request_bufflen
correctly, which appears to work. However, the SCSI subsystem
completely stops making progress at this point. (It just
repeats the above command indefinitely.)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html


2002-09-12 08:56:55

by Russell King

[permalink] [raw]
Subject: Re: 2.4.19 SCSI core bug?

On Wed, Sep 11, 2002 at 10:19:00PM +0100, Russell King wrote:
> Ok, so we were asking for 0xfe 512-byte sectors, which is 130048.
> So why did SCSI tell me that it wanted 38400 bytes in
> SCpnt->request_bufflen?

Ok, problem found.

There's a nice loop in scsi_send_eh_cmnd() which just loops endlessly
trying to retry a SCpnt command on medium error without restoring it
to its pristine state before giving it back to the host, or limiting
the number of retries.

The end result is that the SCSI command says N sectors, the request
list may say N * sector size bytes, but SCpnt->request_bufflen may
be complete rubbish, since the HBA driver is allowed to modify this
field during the processing of a command.

We do have a specific function to restore the SCSI command data to
a pristine state (scsi_eh_retry_command).

Ok, so there's two bugs here:

1. It's possible for SCSI to completely bring a box to a complete
standstill when it encounters a SCSI error.

2. Not retrying the command in its correct state.

I'll look at cooking up a patch for both of these in the next few days
or so.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-09-12 15:33:47

by Mike Anderson

[permalink] [raw]
Subject: Re: 2.4.19 SCSI core bug?

Russell King [[email protected]] wrote:
> On Wed, Sep 11, 2002 at 10:19:00PM +0100, Russell King wrote:
> > Ok, so we were asking for 0xfe 512-byte sectors, which is 130048.
> > So why did SCSI tell me that it wanted 38400 bytes in
> > SCpnt->request_bufflen?
>
> Ok, problem found.
>
> There's a nice loop in scsi_send_eh_cmnd() which just loops endlessly
> trying to retry a SCpnt command on medium error without restoring it
> to its pristine state before giving it back to the host, or limiting
> the number of retries.


The scsi_eh_completed_normally function should be limiting your retries.
So you should not be looping endlessly unless the problem of sending
down a dirty command is causing another issue.

I have a cleanup patch for 2.5 scsi_error I will add this fix in.
scsi_send_eh_cmnd should not be retrying the command it should return to
the caller the status and let them decide. We also should create a
?restore_scsi_cb? function that is shared so that it is done
consistently.

Eventually the retry policy is going to be changed, but until then we
should fix this problem.

-andmike
--
Michael Anderson
[email protected]

2002-09-12 15:37:15

by Russell King

[permalink] [raw]
Subject: Re: 2.4.19 SCSI core bug?

On Thu, Sep 12, 2002 at 08:39:23AM -0700, Mike Anderson wrote:
> I have a cleanup patch for 2.5 scsi_error I will add this fix in.
> scsi_send_eh_cmnd should not be retrying the command it should return to
> the caller the status and let them decide. We also should create a
> ?restore_scsi_cb? function that is shared so that it is done
> consistently.

I've got a patch for both of these now. I'm working through some other
issues at the moment though (like, why the hell requests for sectors
after the sector in error don't have a data phase, and the drive returns
status = GOOD, message = COMMAND COMPLETE, leading the kernel to believe
that it did transfer data.)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-09-12 16:40:45

by Russell King

[permalink] [raw]
Subject: Re: 2.4.19 SCSI core bug?

On Thu, Sep 12, 2002 at 04:42:01PM +0100, Russell King wrote:
> On Thu, Sep 12, 2002 at 08:39:23AM -0700, Mike Anderson wrote:
> > I have a cleanup patch for 2.5 scsi_error I will add this fix in.
> > scsi_send_eh_cmnd should not be retrying the command it should return to
> > the caller the status and let them decide. We also should create a
> > ?restore_scsi_cb? function that is shared so that it is done
> > consistently.
>
> I've got a patch for both of these now. I'm working through some other
> issues at the moment though (like, why the hell requests for sectors
> after the sector in error don't have a data phase, and the drive returns
> status = GOOD, message = COMMAND COMPLETE, leading the kernel to believe
> that it did transfer data.)

<mode=fact>
Ok, I've found the cause of this one. Lets assume we're issuing a
READ10 command (transfer 1) for blocks N to N + L, and N + B is bad.
( L >= 0, B >= 0, B < L )

We successfully transfer blocks N to N + (B - 1) inclusive to the host
without problem. We think B is bad, so we then request N + B + 1 to
N + L blocks from the device using a READ10 command (transfer 2.)

The device immediately comes back without any data, but with GOOD status
and COMMAND COMPLETED message. In other words, for transfer 2's command,
it goes through the states:

Message Out -> Command Out -> Status In -> Message In

Now, the interesting thing is that the READ10 commands have a special
bit - FUA or Force Unit Access. When I set this bit in _all_ READ10
commands, I _do not_ see this behaviour.

<mode=theory>
It turns out that my drive appears to cache the request for transfer 1
(blocks N to N + L), and tries to satisfy the request from its cache.
However, since block N + B was in error, it appears fill its cache
only for N to N + (B - 1), but reserve N to N + L.

When a request with the FUA bit clear for a block between N + B and
N + L is received, it tries to satisfy it from its cache, which
doesn't contain the data. It thus responds without a data phase,
indicating that the command was successful and without error.

<mode=fact>
This means that, using READ10, it is possible to issue READ commands
to the drive, and have the drive respond without any error, yet NOT
actually read _or_ transfer any data to the host. Think about what
can happen to your ext2fs superblocks / bitmaps / inode tables in
this situation, and what happens if something writes back the page to
the drive...

I think we have two options:

1. we need to come up with a way of reasonably handing SCSI medium
errors if we are going to use the READ10 command with the FUA
bit clear.

2. we could just set the FUA bit and bypass the drives on-board
cache completely.

Note that the only reason I've found this is because my HBA drives
are _really_ pedantic about checking that all expected data does
in fact get transferred by the drive.

I wonder how many other drives out there are buggy like this. 8/

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-09-12 21:18:22

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.4.19 SCSI core bug?

>1. we need to come up with a way of reasonably handing SCSI medium
> errors if we are going to use the READ10 command with the FUA
> bit clear.
>
>2. we could just set the FUA bit and bypass the drives on-board
> cache completely.
>
>Note that the only reason I've found this is because my HBA drives
>are _really_ pedantic about checking that all expected data does
>in fact get transferred by the drive.
>
>I wonder how many other drives out there are buggy like this. 8/

What about setting FUA on the next command after an error ?

Would this work or may the disk really keep stale data around ?

I don't have a recent SCSI spec at hand, but is there a command
we can send after an error to force a cache flush & invalidate
or will it only flush dirty datas to platter and not invalidate ?

Ben.