I'm running 2.4.19, using AIC7XXX 6.2.8.
SCSI devices are 0:0:0 hard disk and 0:6:0 CDR.
During CD burning, errors sometimes occur and the aic7xxx driver
sets the CDR offline. Is there a way to reset the device and
set it online again _without_rebooting_ ?
I tried different utilities from "scsitools" and "sg3-utils"
packages (also sg_reset) without any luck. Any suggestions?
Boot:
Aug 31 22:48:51 jerolim kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
Aug 31 22:48:51 jerolim kernel: <Adaptec aic7896/97 Ultra2 SCSI adapter>
Aug 31 22:48:51 jerolim kernel: aic7896/97: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs
Aug 31 22:48:51 jerolim kernel: Vendor: IBM Model: DNES-309170W Rev: SA30
Aug 31 22:48:51 jerolim kernel: Type: Direct-Access ANSI SCSI revision: 03
Aug 31 22:48:51 jerolim kernel: Vendor: SONY Model: CD-R CDU948S Rev: 1.0j
Aug 31 22:48:51 jerolim kernel: Type: CD-ROM ANSI SCSI revision: 02
Aug 31 22:48:51 jerolim kernel: scsi0:A:0:0: Tagged Queuing enabled. Depth 8
SCSI abort, CDR offline:
Sep 2 11:26:26 jerolim kernel: DevQ(0:0:0): 0 waiting
Sep 2 11:26:26 jerolim kernel: DevQ(0:6:0): 0 waiting
Sep 2 11:26:26 jerolim kernel: (scsi0:A:6:0): Queuing a recovery SCB
Sep 2 11:26:26 jerolim kernel: scsi0:0:6:0: Device is disconnected, re-queuing SCB
Sep 2 11:26:26 jerolim kernel: Recovery code sleeping
Sep 2 11:26:26 jerolim kernel: (scsi0:A:6:0): Abort Message Sent
Sep 2 11:26:26 jerolim kernel: Recovery code awake
Sep 2 11:26:26 jerolim kernel: Timer Expired
Sep 2 11:26:26 jerolim kernel: aic7xxx_abort returns 0x2003
Sep 2 11:26:26 jerolim kernel: scsi0:0:6:0: Attempting to queue a TARGET RESET message
Sep 2 11:26:26 jerolim kernel: aic7xxx_dev_reset returns 0x2003
Sep 2 11:26:26 jerolim kernel: Recovery SCB completes
Sep 2 11:26:26 jerolim kernel: scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 6 lun 0
Sep 2 11:26:26 jerolim kernel: SCSI cdrom error : host 0 channel 0 id 6 lun 0 return code = 10000
-Igor Mozetic
> I'm running 2.4.19, using AIC7XXX 6.2.8.
> SCSI devices are 0:0:0 hard disk and 0:6:0 CDR.
> During CD burning, errors sometimes occur and the aic7xxx driver
> sets the CDR offline. Is there a way to reset the device and
> set it online again _without_rebooting_ ?
I don't know that any mechanism currently exists. It shouldn't be
too hard to create one, though. Just modify the proc handler in
drivers/scsi/scsi.c.
While you're looking at that, I would like to better understand why the
device is being set offline. The message listing you've provided is
not complete. If you send me all of the messages output by the driver
from boot through failure I will try to diagnose your problem.
--
Justin
On Mon, Sep 02, 2002 at 09:50:11AM -0600, Justin T. Gibbs wrote:
> > I'm running 2.4.19, using AIC7XXX 6.2.8.
> > SCSI devices are 0:0:0 hard disk and 0:6:0 CDR.
> > During CD burning, errors sometimes occur and the aic7xxx driver
> > sets the CDR offline. Is there a way to reset the device and
> > set it online again _without_rebooting_ ?
>
> I don't know that any mechanism currently exists. It shouldn't be
> too hard to create one, though. Just modify the proc handler in
> drivers/scsi/scsi.c.
>
> While you're looking at that, I would like to better understand why the
> device is being set offline. The message listing you've provided is
> not complete. If you send me all of the messages output by the driver
> from boot through failure I will try to diagnose your problem.
Actually, it looked to me like there was a bus hang, a device reset, the
driver returned a complete reset to the error handler thread, then the
error handler thread kicked the queue before the CD was ready to accept
commands again and as a result of sense info saying as much the mid layer
took the device off line. So, in short, the mid layer isn't waiting long
enough, or when it gets sense indicated not ready it needs to implement a
waiting queue with a timeout to try rekicking things a few times and don't
actually mark the device off line until a longer period of time has
elapsed without the device coming back.
As for getting it to be not off line without rebooting, just do this:
echo "scsi-remove-single-device 0 0 6 0" > /proc/scsi/scsi
echo "scsi-add-single-device 0 0 6 0" > /proc/scsi/scsi
That'll remove the device and then rescan it. Assuming it's had enough
time to complete the reset by the time you do this and it's once again
ready to accept commands, this should get your CD back working.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
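A minimal user-space sketch of that remove/re-add sequence, using the exact
command strings from the message above; the 20-second settle delay and the
hard-coded 0 0 6 0 host/channel/id/lun are assumptions for this particular
setup, not general values:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one command string to /proc/scsi/scsi, as the echo lines above do. */
static int proc_scsi_write(const char *cmd)
{
        FILE *f = fopen("/proc/scsi/scsi", "w");

        if (f == NULL) {
                perror("/proc/scsi/scsi");
                return -1;
        }
        fprintf(f, "%s\n", cmd);
        return fclose(f);
}

int main(void)
{
        if (proc_scsi_write("scsi-remove-single-device 0 0 6 0") != 0)
                return EXIT_FAILURE;

        /* Assumed settle time: give the drive a chance to finish its reset
         * before it is rescanned and has to answer INQUIRY again. */
        sleep(20);

        if (proc_scsi_write("scsi-add-single-device 0 0 6 0") != 0)
                return EXIT_FAILURE;
        return EXIT_SUCCESS;
}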
Doug Ledford writes:
> took the device off line. So, in short, the mid layer isn't waiting long
> enough, or when it gets sense indicated not ready it needs to implement a
> waiting queue with a timeout to try rekicking things a few times and don't
> actually mark the device off line until a longer period of time has
> elapsed without the device coming back.
There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s).
Would increasing it help?
> As for getting it to be not off line without rebooting, just do this:
>
> echo "scsi-remove-single-device 0 0 6 0" > /proc/scsi/scsi
> echo "scsi-add-single-device 0 0 6 0" > /proc/scsi/scsi
>
> That'll remove the device and then rescan it. Assuming it's had enough
> time to complete the reset by the time you do this and it's once again
> ready to accept commands, this should get your CD back working.
This works fine, thanks a lot.
-Igor Mozetic
> Actually, it looked to me like there was a bus hang,
This is the actual part I want to understand. The problems with the
mid-layer are already well known. 8-)
From the more complete logs I've now received it appears that the cdrom
drive is just not responding within the time allowed, which sends us into
recovery.
--
Justin
> Doug Ledford writes:
>
> > took the device off line. So, in short, the mid layer isn't waiting long
> > enough, or when it gets sense indicated not ready it needs to implement a
> > waiting queue with a timeout to try rekicking things a few times and don't
> > actually mark the device off line until a longer period of time has
> > elapsed without the device coming back.
>
> There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s).
> Would increasing it help?
This currently only affects the initial bus reset delay. If the driver
holds off commands after subsequent bus resets, it can cause undeserved
timeouts on the commands it has intentionally deferred. The mid-layer has
a 5 second delay after bus resets, but I haven't verified that this is
honored correctly during error recovery.
--
Justin
> Doug Ledford writes:
>
> > took the device off line. So, in short, the mid layer isn't waiting long
> > enough, or when it gets sense indicated not ready it needs to implement a
> > waiting queue with a timeout to try rekicking things a few times and don't
> > actually mark the device off line until a longer period of time has
> > elapsed without the device coming back.
>
> There is a kernel config CONFIG_AIC7XXX_RESET_DELAY_MS (default 15s).
> Would increasing it help?
Justin Gibbs writes:
> This currently only affects the initial bus reset delay. If the
> driver holds off commands after subsequent bus resets, it can cause
> undeserved timeouts on the commands it has intentionally deferred.
> The mid-layer has a 5 second delay after bus resets, but I haven't
> verified that this is honored correctly during error recovery.
I'm planning a major re-write of this area in the error handler. The way I
think it should go is:
1) Quiesce host (set in_recovery flag)
2) Suspend active timers on this host
3) Proceed down the error correction track (eliminate abort and go down
device, bus and host resets and finally set the device offline).
4) On each error recovery wait out a recovery timer for the device to become
active before talking to it again. Send all affected commands back to the
block layer to await reissue (note: it would now be illegal for commands to
lie to the mid layer and say they've done the reset when they haven't).
5) Issue a TUR using a command allocated to the eh for that purpose. Process
the return code (in particular, if the device says NOT READY, wait some more).
Only if the TUR definitively fails proceed up the recovery chain all the way
to taking the device offline.
I also plan to expose the suspend and resume timers API in some form for FC
drivers to use.
James
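A minimal sketch in C of the escalation loop outlined above; every helper
name and the eh_level enum are assumptions made up for illustration, not
existing 2.4 interfaces:

struct scsi_host;                /* opaque stand-ins for the real structures */
struct scsi_device;

enum eh_level { EH_DEVICE_RESET, EH_BUS_RESET, EH_HOST_RESET, EH_GIVE_UP };

/* Assumed helpers, one per step in the plan above. */
extern void quiesce_host(struct scsi_host *h);               /* step 1 */
extern void unquiesce_host(struct scsi_host *h);
extern void suspend_host_timers(struct scsi_host *h);        /* step 2 */
extern void resume_host_timers(struct scsi_host *h);
extern int  perform_reset(struct scsi_device *d, int level); /* step 3 */
extern void requeue_to_block_layer(struct scsi_device *d);   /* step 4 */
extern void wait_recovery_timer(struct scsi_device *d);
extern int  tur_probe(struct scsi_device *d);                /* step 5: TEST UNIT READY,
                                                               * retried while NOT READY */
extern void set_device_offline(struct scsi_device *d);

static int recover_device(struct scsi_host *host, struct scsi_device *dev)
{
        int level;

        quiesce_host(host);
        suspend_host_timers(host);

        for (level = EH_DEVICE_RESET; level < EH_GIVE_UP; level++) {
                if (!perform_reset(dev, level))
                        continue;                /* this level failed; escalate */
                requeue_to_block_layer(dev);     /* commands reissued in order later */
                wait_recovery_timer(dev);        /* let the device come back */
                if (tur_probe(dev) == 0)
                        break;                   /* device answers again; done */
        }
        if (level == EH_GIVE_UP)
                set_device_offline(dev);         /* every level failed definitively */

        resume_host_timers(host);
        unquiesce_host(host);
        return level < EH_GIVE_UP ? 0 : -1;
}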
On Tue, Sep 03, 2002 at 09:35:02AM -0500, James Bottomley wrote:
>
> 1) Quiesce host (set in_recovery flag)
Right.
> 2) Suspend active timers on this host
Right.
> 3) Proceed down the error correction track (eliminate abort and go down
> device, bus and host resets and finally set the device offline).
Leave abort active. It does actually work in certain scenarios. The CD
burner scenario that started this thread is an example of somewhere that
an abort should actually do the job.
> 4) On each error recovery wait out a recovery timer for the device to become
> active before talking to it again. Send all affected commands back to the
> block layer to await reissue (note: it would now be illegal for commands to
> lie to the mid layer and say they've done the reset when they haven't).
> 5) Issue a TUR using a command allocated to the eh for that purpose. Process
> the return code (in particular, if the device says NOT READY, wait some more).
> Only if the TUR definitively fails proceed up the recovery chain all the way
> to taking the device offline.
Right.
> I also plan to expose the suspend and resume timers API in some form for FC
> drivers to use.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
[email protected] said:
> Leave abort active. It does actually work in certain scenarios. The
> CD burner scenario that started this thread is an example of
> somewhere that an abort should actually do the job.
Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
At best, abort probably causes a command to overtake a barrier it shouldn't,
at worst we abort the ordered tag that is the barrier and transactional
integrity is lost.
When error correction is needed, we have to return all the commands for that
device to the block layer so that ordering and barrier issues can be taken
care of in the reissue. This makes LUN RESET (for those that support it) the
minimum level of error correction we can apply.
James
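A minimal sketch of what "return all the commands for that device to the
block layer" could look like, requeueing newest-first at the head of the
queue so the original issue order (and any barrier position) survives the
reissue; the structures and the requeue helper are assumptions, not existing
2.4 interfaces:

#include <linux/list.h>

struct request;                          /* block-layer request, used opaquely here */

struct pending_cmd {
        struct list_head list;           /* linked on active_list in issue order */
        struct request *rq;              /* the request this command came from */
};

struct device_sketch {
        struct list_head active_list;    /* oldest at the head, newest at the tail */
};

extern void requeue_request_head(struct request *rq);   /* assumed: push back to the
                                                          * head of the block queue */
extern void free_pending(struct pending_cmd *cmd);

/* Walk newest-first: after each requeue-at-head, the oldest command ends up
 * first in the queue, so the reissue happens in the original order. */
static void return_all_to_block_layer(struct device_sketch *dev)
{
        while (!list_empty(&dev->active_list)) {
                struct pending_cmd *cmd =
                        list_entry(dev->active_list.prev, struct pending_cmd, list);

                list_del(&cmd->list);
                requeue_request_head(cmd->rq);
                free_pending(cmd);
        }
}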
On Tue, 2002-09-03 at 20:09, James Bottomley wrote:
> [email protected] said:
> > Leave abort active. It does actually work in certain scenarios. The
> > CD burner scenario that started this thread is an example of
> > somewhere that an abort should actually do the job.
>
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
> At best, abort probably causes a command to overtake a barrier it shouldn't,
> at worst we abort the ordered tag that is the barrier and transactional
> integrity is lost.
What do we plan to do for the cases where reset is disabled because we
have shared disk scsi so don't want to reset and hose the reservations ?
If your error correction always requires all commands return to the
block layer then the block layer is IMHO broken. It's messy enough doing
that before you hit the fun situations where drivers insert scsi commands of
their own that the block layer never initiated.
Next you only need to return stuff if commands have been issued between
the aborting command and a barrier. Since most sane systems will never
be causing REQ_BARRIER that should mean the general case for an abort is
going to be fine. The CD burner example is also true for this. If we
track barrier sequences then we will know the barrier count for the
command we are aborting and the top barrier count for commands issued to
the device. Finally you only need to go to the large hammer approach
when you are dealing with a media changing command (ie WRITE*) - if we
abort a read then knowing we don't queue overlapping read then write to
disk we already know that the read will not break down the tag ordering
as I understand it ?
If we get to the point we need an abort we don't want to issue a reset.
Not every device comes back sane from a reset and in some cases we have
to issue a whole sequence of commands to get the state of the device
back (door locking, power management, ..)
On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> [email protected] said:
> > Leave abort active. It does actually work in certain scenarios. The
> > CD burner scenario that started this thread is an example of
> > somewhere that an abort should actually do the job.
>
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
REQ_BARRIER is used for filesystem devices (disks mainly) while the
devices that might benefit most from working aborts would likely be other
devices. But, regardless, the REQ_BARRIER ordering *can* be preserved
while using abort processing. Since the command that needs aborting is,
as you are hypothesizing, before the REQ_BARRIER command, and since it
hasn't completed, then the REQ_BARRIER command can not be complete and
neither can any of the commands behind the REQ_BARRIER. On direct access
devices you are only concerned about ordering around the barrier, not
ordering of the actual tagged commands, so for abort you can actually call
abort on all the commands past the REQ_BARRIER command first, then the
REQ_BARRIER command, then the hung command. That would do the job and
preserve REQ_BARRIER ordering while still using aborts.
> At best, abort probably causes a command to overtake a barrier it shouldn't,
> at worst we abort the ordered tag that is the barrier and transactional
> integrity is lost.
>
> When error correction is needed, we have to return all the commands for that
> device to the block layer so that ordering and barrier issues can be taken
> care of in the reissue.
Not really, this would be easily enough done in the ML_QUEUE area of the
scsi layer, but it matters not to me. However, if you throw a BDR, then
you have cancelled all outstanding commands and (typically) initiated a
hard reset of the device which then requires a device settle time. All of
this is more drastic and typically takes longer than the individual aborts
which are completed in a single connect->disconnect cycle without ever
hitting a data phase and without triggering a full device reset and
requiring a settle time.
> This makes LUN RESET (for those that support it) the
> minimum level of error correction we can apply.
Not true.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
James -
On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> [email protected] said:
> > Leave abort active. It does actually work in certain scenarios. The
> > CD burner scenario that started this thread is an example of
> > somewhere that an abort should actually do the job.
>
> Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
> At best, abort probably causes a command to overtake a barrier it shouldn't,
> at worst we abort the ordered tag that is the barrier and transactional
> integrity is lost.
>
> When error correction is needed, we have to return all the commands for that
> device to the block layer so that ordering and barrier issues can be taken
> care of in the reissue. This makes LUN RESET (for those that support it) the
> minimum level of error correction we can apply.
>
> James
If we only send an abort or reset after a quiesce I don't see why one
is better than the other.
Not specific to reset or abort - if a single command gets an error, we
wait for outstanding commands to complete before starting up the error
handler thread. If all the commands (error one and outstanding) have
barriers, those that do not error out will complete out of order from
the errored command.
How is this properly handled?
And what happens if one command gets some sort of check condition (like
medium error, or aborted command) that causes a retry? Will IO's still
be correctly ordered?
The abort could also be useful in handling the locking/ownership of the
scsi_cmnd - the abort at the LLD layer can be used by the LLD to cancel
any software timeouts, as well as to flush the command from the hardware.
After the abort, the mid-layer could assume that it once again "owned"
the scsi_cmnd, especially if the LLD abort were a required function.
I would like to see error handling occur without quiescing the entire
adapter before taking any action. Stopping all adapter IO for a timeout
can be a bit expensive - imagine a tape drive and multiple disks on an
adapter: any disk IO timeout or failure will wait for the tape IO to
complete before allowing any other IO, and if the tape operation is long or
is going to time out this could be minutes or hours.
-- Patrick Mansfield
[email protected] said:
> What do we plan to do for the cases where reset is disabled because we
> have shared disk scsi so don't want to reset and hose the reservations
The reset gets issued and the reservation gets broken. Good HA or other
software knows the reservation may be lost and takes this into account in the
cluster algorithm.
With SCSI-2 reservations, there's no way to preserve the reservation and have
the reset be effective (I know, in theory, that this can be circumvented by
the soft reset alternative, but I've never seen a device that implements it
correctly). I suppose we hope SCSI-3 Persistent Group Reservations come along
quickly.
> If your error correction always requires all commands return to the
> block layer then the block layer is IMHO broken. It's messy enough
> doing that before you hit the fun situations where drivers insert scsi
> commands of their own that the block layer never initiated.
This is part of the slim SCSI down approach. The block layer already has
handling for tag errors like this. Inserted SCSI commands should now work
correctly since we're deprecating the scsi_do_cmnd() in favour of scsi_do_req,
which means the command is always associated with a request and goes into the
block queue just like any other request.
I think the block layer, which already knows about the barrier ordering, is
the appropriate place for this. If you think the scsi error handler is a
hairy wart now, just watch it grow into a stonking great carbuncle as I try to
introduce it to the concept of command queue ordering and appropriate recovery.
> Next you only need to return stuff if commands have been issued
> between the aborting command and a barrier. Since most sane systems
> will never be causing REQ_BARRIER that should mean the general case
> for an abort is going to be fine. The CD burner example is also true
> for this. If we track barrier sequences then we will know the barrier
> count for the command we are aborting and the top barrier count for
> commands issued to the device. Finally you only need to go to the
> large hammer approach when you are dealing with a media changing
> command (ie WRITE*) - if we abort a read then knowing we don't queue
> overlapping read then write to disk we already know that the read will
> not break down the tag ordering as I understand it ?
I agree with your reasoning. However, errors occur infrequently enough (I
hope) so that it's just not worth the extra code complexity to make the error
handler look for that case.
However, in all honesty, I have to say that I just don't believe ABORTs are
ever particularly effective. As part of error recovery, If a device is
tipping over into failure, adding another message isn't a good way to pull it
back. ABORT is really part of the I/O cancellation API, and, like all
cancellation implementations, it's potentially full of holes. The only uses
it might have---like oops I didn't mean to fixate that CD, give it back to me
now---aren't clearly defined in the SPEC to produce the desired effect (stop
the fixation so the drive door can be opened).
> If we get to the point we need an abort we don't want to issue a
> reset. Not every device comes back sane from a reset and in some cases
> we have to issue a whole sequence of commands to get the state of the
> device back (door locking, power management, ..)
Well, this is SCSI---the first thing most controllers do for parallel SCSI at
least is reset the BUS. Some FC drivers do the FC equivalent as well (not
that they should, but that's another issue).
The pain of coming back from a reset (and I grant, it isn't trivial) is well
known and well implemented in SCSI. It also, from error handlings point of
view, sets the device back to a known point in the state model.
James
[email protected] said:
> But, regardless, the REQ_BARRIER ordering *can* be preserved while
> using abort processing. Since the command that needs aborting is, as
> you are hypothesizing, before the REQ_BARRIER command, and since it
> hasn't completed, then the REQ_BARRIER command can not be complete and
> neither can any of the commands behind the REQ_BARRIER.
You are correct. However, as soon as you abort the problem command (assuming
the device recovers from this), it will go on its merry way processing the
remaining commands in the queue. Assuming one of these is the barrier, you've
no way now of re-queueing the aborted command so that it comes before the
ordered tag barrier. You can try using a head of queue tag, but it's still a
nasty race.
> On direct access devices you are only concerned about ordering around
> the barrier, not ordering of the actual tagged commands, so for abort
> you can actually call abort on all the commands past the REQ_BARRIER
> command first, then the REQ_BARRIER command, then the hung command.
> That would do the job and preserve REQ_BARRIER ordering while still
> using aborts.
I agree, but the most likely scenario is that now you're trying to abort
almost every tag for that device in the system. Isn't reset a simpler
alternative to this?
> > At best, abort probably causes a command to overtake a barrier it shouldn't,
> > at worst we abort the ordered tag that is the barrier and transactional
> > integrity is lost.
> >
> > When error correction is needed, we have to return all the commands for that
> > device to the block layer so that ordering and barrier issues can be taken
> > care of in the reissue.
> Not really, this would be easily enough done in the ML_QUEUE area of
> the scsi layer, but it matters not to me. However, if you throw a
> BDR, then you have cancelled all outstanding commands and (typically)
> initiated a hard reset of the device which then requires a device
> settle time. All of this is more drastic and typically takes longer
> than the individual aborts which are completed in a single connect->
> disconnect cycle without ever hitting a data phase and without
> triggering a full device reset and requiring a settle time.
I agree. I certainly could do it. I'm just a lazy so-and-so. However, think
what it does. Apart from me having to do more work, the code becomes longer
and the error recovery path more convoluted and difficult to follow. The
benefit? well, error recovery might be faster in certain circumstances. I
just don't see that it's a cost effective change. If you're hitting error
recovery so often that whether it recovers in half a second or several
seconds makes a difference, I'd say there's something else wrong.
James
On Tue, 2002-09-03 at 22:32, James Bottomley wrote:
> I think the block layer, which already knows about the barrier ordering, is
> the appropriate place for this. If you think the scsi error handler is a
> hairy wart now, just watch it grow into a stonking great carbuncle as I try to
> introduce it to the concept of command queue ordering and appropriate recovery.
Point taken
> I agree with your reasoning. However, errors occur infrequently enough (I
> hope) so that it's just not worth the extra code complexity to make the error
> handler look for that case.
When you are handling CD-ROM problems they can be quite frequent. Not
helped by the fact that some ancient CDs seem to like to take a long walk in
the park when told to reset.
> cancellation implementations, it's potentially full of holes. The only uses
> it might have---like oops I didn't mean to fixate that CD, give it back to me
> now---aren't clearly defined in the SPEC to produce the desired effect (stop
> the fixation so the drive door can be opened).
Yes, but does Windows assume it? There are two specs 8)
> The pain of coming back from a reset (and I grant, it isn't trivial) is well
> known and well implemented in SCSI. It also, from error handlings point of
> view, sets the device back to a known point in the state model.
>
Resetting the bus is antisocial to say the least. We had lots of hangs
with the old eh handler that went away when the new one didn't keep
resetting the bus. If we do reset the bus then we have to handle "Sorry
can't reset the bus" (we have cards that can't or won't) as well as
being conservative about timings, making sure we don't reset the bus
during the seconds while a device rattles back into online state and so
on.
[email protected] said:
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.
Quiesce means from the top (no more commands go from the block layer to the
device) not from the bottom (commands can still be completed for the device).
> Not specific to reset or abort - if a single command gets an error, we
> wait for outstanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command.
We don't wait. The commands may possibly be completing in parallel with
recovery. To address your point, though, that's why the device needs to be
reset as fast as possible: to preserve what's left of the command order. I
accept that for a misbehaving device, this may be a race.
> And what happens if one command gets some sort of check condition
> (like medium error, or aborted command) that causes a retry? Will IO's
> still be correctly ordered?
Retries get eliminated. It should be up to the upper layers (sd or beyond) to
say whether a retry is done. Since, as you point out, retries automatically
violate any barrier, it is probably up to the block layer to judge what should
be done about the problem.
> I would like to see error handling occur without quiescing the entire
> adapter before taking any action. Stopping all adapter IO for a
> timeout can be a bit expensive - imagine a tape drive and multiple
> disks on an adapter: any disk IO timeout or failure will wait for the
> tape IO to complete before allowing any other IO, and if the tape
> operation is long or is going to time out this could be minutes or
> hours.
Actually, I do think that quiescing has an important role to play. A lot of
drive errors can be self-inflicted indigestion from the drive accepting more
tags into its queue than it can process. Quiescing the queue lets us see if
the drive can
digest what it currently has or whether we need to apply a strong emetic.
I'd actually like to add tag starvation recovery to the error handler's list
of things to do. In that case, all you do on entry to the error handler is
quiesce the upper queue and wait a while to see if the commands continue
returning. You only begin more drastic measures if nothing comes back in the
designated time period.
James
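A minimal sketch of the tag-starvation idea above: quiesce only the upper
queue, then watch the outstanding count for a while before escalating (the
counters and helper names are assumptions):

struct scsi_device;

extern void block_upper_queue(struct scsi_device *d);    /* stop new commands only */
extern void unblock_upper_queue(struct scsi_device *d);
extern int  outstanding_count(struct scsi_device *d);    /* commands still at the device */
extern void eh_sleep(int seconds);

/* Returns 1 if the drive digested its queue on its own within the deadline,
 * 0 if nothing came back and more drastic measures are needed. */
static int starvation_recovery(struct scsi_device *dev, int deadline, int poll)
{
        int waited = 0;
        int last = outstanding_count(dev);

        block_upper_queue(dev);
        while (waited < deadline && outstanding_count(dev) > 0) {
                eh_sleep(poll);
                waited += poll;
                if (outstanding_count(dev) < last) {
                        last = outstanding_count(dev);
                        waited = 0;              /* still making progress; keep waiting */
                }
        }
        unblock_upper_queue(dev);
        return outstanding_count(dev) == 0;
}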
On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:
> [email protected] said:
> You are correct. However, as soon as you abort the problem command (assuming
> the device recovers from this), it will go on its merry way processing the
> remaining commands in the queue. Assuming one of these is the barrier, you've
> no way now of re-queueing the aborted command so that it comes before the
> ordered tag barrier. You can try using a head of queue tag, but it's still a
> nasty race.
(Solution to this race was in my next paragraph as you found ;-)
> > On direct access devices you are only concerned about ordering around
> > the barrier, not ordering of the actual tagged commands, so for abort
> > you can actually call abort on all the commands past the REQ_BARRIER
> > command first, then the REQ_BARRIER command, then the hung command.
> > That would do the job and preserve REQ_BARRIER ordering while still
> > using aborts.
>
> I agree, but the most likely scenario is that now you're trying to abort
> almost every tag for that device in the system. Isn't reset a simpler
> alternative to this?
Not really. It hasn't been done yet, but one of my goals is to change the
scsi commands over to reasonable list usage (finally) so that we can avoid
all these horrible linear scans it does now looking for an available
command (it also means things like SCSI_OWNER_MID_LAYER can go away
because ownership is defined implicitly by list membership). So,
basically, you have a list item struct on each command. When you build
the commands, you add them to SDpnt->free_list. When you need a command,
instead of searching for a free one, you just grab the head of
SDpnt->free_list and use it. Once you've built the command and are ready
to hand it off to the lldd, you put the command on the tail of the
SDpnt->active_list. When a command completes, you list_remove() it from
the SDpnt->active_list and put it on the SDpnt->complete_list to be
handled by the tasklet. When the tasklet actually completes the command,
it frees the scsi command struct by simply putting it back on the
SDpnt->free_list. Now, if you do things that way, your reset vs. abort
code is actually pretty trivial.
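Before the two cases below, a minimal sketch of the free/active/complete
list transitions just described, using the stock list_head primitives; the
list member on the command and the three per-device lists are assumptions
(they are not existing 2.4 fields), and locking is omitted:

#include <linux/list.h>

struct cmnd_sketch {                     /* stand-in for Scsi_Cmnd with a list hook */
        struct list_head list;
        /* ... the rest of the command ... */
};

struct sdev_sketch {                     /* stand-in for the per-device lists */
        struct list_head free_list;
        struct list_head active_list;
        struct list_head complete_list;
};

/* Need a command?  No linear scan: just take the head of free_list. */
static struct cmnd_sketch *get_command(struct sdev_sketch *sdev)
{
        struct cmnd_sketch *cmd;

        if (list_empty(&sdev->free_list))
                return NULL;
        cmd = list_entry(sdev->free_list.next, struct cmnd_sketch, list);
        list_del(&cmd->list);
        return cmd;
}

/* Hand-off to the low-level driver: the command becomes "active". */
static void start_command(struct sdev_sketch *sdev, struct cmnd_sketch *cmd)
{
        list_add_tail(&cmd->list, &sdev->active_list);
}

/* Interrupt-time completion: move it to the list the tasklet drains. */
static void command_done(struct sdev_sketch *sdev, struct cmnd_sketch *cmd)
{
        list_del(&cmd->list);
        list_add_tail(&cmd->list, &sdev->complete_list);
}

/* Final completion in the tasklet: the command is simply free again. */
static void release_command(struct sdev_sketch *sdev, struct cmnd_sketch *cmd)
{
        list_del(&cmd->list);
        list_add_tail(&cmd->list, &sdev->free_list);
}

Ownership falls out of list membership: whichever list a command sits on says
who may touch it, which is the point made above.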
Case 1: you want to throw a BDR. Sample code might end up looking like
this,
[ oops we timed out ]
hostt->bus_device_reset(cmd);
if(!list_empty(cmd->device->active_list)) {
        [ our commands haven't all been returned, spew chunks! ]
}
[ do post reset processing ]
Case 2: you want to do an abort, but you need to preserve ordering around
any possible REQ_BARRIERs on the bus. This requires that we keep a
REQ_BARRIER count for the device, it is after all possible that we could
have multiple barriers active at once, so as each command is put on the
active_list, if it is a barrier, then we increment SDpnt->barrier_count
and as we complete commands (at the interrupt context completion, not the
final completion) if it is a barrier command we decrement the count.
[ oops we timed out ]
while(SDpnt->barrier_count && cmd) {
        // when the aborted command is returned via the done()
        // it will remove it from the active_list, so don't remove
        // it here
        abort_cmd = list_get_tail(SDpnt->active_list);
        if(hostt->abort(abort_cmd) != SUCCESS) {
                [ oops, go on to more drastic action ]
        } else {
                if(abort_cmd->type == BARRIER)
                        SDpnt->barrier_count--;
                if(abort_cmd == cmd)
                        cmd = NULL;
        }
}
if(cmd) {
        if(hostt->abort(cmd) != SUCCESS)
                [ oops, go on to more drastic action ]
}
Now, granted, that is more complex than going straight to a BDR, but I
have to argue that it *isn't* that complex. It certainly isn't the
nightmare you make it sound like ;-)
> > > At best, abort probably causes a command to overtake a barrier it shouldn't,
> > > at worst we abort the ordered tag that is the barrier and transactional
> > > integrity is lost.
> > >
> > > When error correction is needed, we have to return all the commands for that
> > > device to the block layer so that ordering and barrier issues can be taken
> > > care of in the reissue.
>
> > Not really, this would be easily enough done in the ML_QUEUE area of
> > the scsi layer, but it matters not to me. However, if you throw a
> > BDR, then you have cancelled all outstanding commands and (typically)
> > initiated a hard reset of the device which then requires a device
> > settle time. All of this is more drastic and typically takes longer
> > than the individual aborts which are completed in a single connect->
> > disconnect cycle without ever hitting a data phase and without
> > triggering a full device reset and requiring a settle time.
>
> I agree. I certainly could do it. I'm just a lazy so-and-so. However, think
> what it does. Apart from me having to do more work, the code becomes longer
> and the error recovery path more convoluted and difficult to follow. The
> benefit? well, error recovery might be faster in certain circumstances.
Well, as I've laid it out above, I don't really think it's all that much
to implement ;-) At least not in the mid layer. The low level device
drivers are doing *far* more work to support aborts than the mid layer has
to do.
> I
> just don't see that it's a cost effective change.
Matter of some question, I'm sure. I don't see it as all that much work,
so it seems reasonably cost effective to me ;-)
> If you're hitting error
> recovery so often that whether it recovers in half a second or several
> seconds makes a difference, I'd say there's something else wrong.
Hehehe, if you are hitting error recovery at all then something else is
wrong by definition, the only difference is in how you handle it :-P
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
> [email protected] said:
> > Next you only need to return stuff if commands have been issued
> > between the aborting command and a barrier. Since most sane systems
> > will never be causing REQ_BARRIER
Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to
support more robust journaling with order requirement verification. If
that's true, then REQ_BARRIER commands could become quite common on disks
using ext3.
On Tue, Sep 03, 2002 at 04:32:38PM -0500, James Bottomley wrote:
> However, in all honesty, I have to say that I just don't believe ABORTs are
> ever particularly effective. As part of error recovery, If a device is
> tipping over into failure, adding another message isn't a good way to pull it
^^^^^^^^^^^^^^^^^^^^^^
Then you might as well skip device resets since they are implemented using
messages and go straight to bus resets. Shot deflected, no score.
> back. ABORT is really part of the I/O cancellation API, and, like all
> cancellation implementations, it's potentially full of holes. The only uses
> it might have---like oops I didn't mean to fixate that CD, give it back to me
> now---aren't clearly defined in the SPEC to produce the desired effect (stop
> the fixation so the drive door can be opened).
In my experience, aborts have always actually worked fairly well in any
scenario where a bus device reset will work. Generally speaking, the
problems I've always ran into with SCSI busses have been either A) this
command is screwing up but it isn't confusing the drive so we can abort it
or BDR it because the drive still responds to us or B) the bus is hung
hard and no transfers or messages of any kind can make it through. In the
B case, a full bus reset is the only thing that works. In the A case,
aborts work just as often as anything else.
> The pain of coming back from a reset (and I grant, it isn't trivial) is well
> known and well implemented in SCSI. It also, from error handlings point of
> view, sets the device back to a known point in the state model.
So does a successful abort.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
On Tue, Sep 03, 2002 at 06:42:16PM -0400, Doug Ledford wrote:
>
> Case 2: you want to do an abort, but you need to preserve ordering around
> any possible REQ_BARRIERs on the bus. This requires that we keep a
> REQ_BARRIER count for the device, it is after all possible that we could
> have multiple barriers active at once, so as each command is put on the
> active_list, if it is a barrier, then we increment SDpnt->barrier_count
> and as we complete commands (at the interrupt context completion, not the
> final completion) if it is a barrier command we decrement the count.
>
> [ oops we timed out ]
> while(SDpnt->barrier_count && cmd) {
> // when the aborted command is returned via the done()
> // it will remove it from the active_list, so don't remove
> // it here
> abort_cmd = list_get_tail(SDpnt->active_list);
> if(hostt->abort(abort_cmd) != SUCCESS) {
> [ oops, go on to more drastic action ]
> } else {
> if(abort_cmd->type == BARRIER)
> SDpnt->barrier_count--;
Oops, delete those last two lines....the done() function decrements the
barrier_count for us.
> if(abort_cmd == cmd)
> cmd = NULL;
> }
> }
> if(cmd) {
> if(hostt->abort(cmd) != SUCCESS)
> [ oops, go on to more drastic action ]
> }
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
On Tue, 2002-09-03 at 23:02, James Bottomley wrote:
> > And what happens if one command gets some sort of check condition
> > (like medium error, or aborted command) that causes a retry? Will IO's
> > still be correctly ordered?
>
> Retries get eliminated. It should be up to the upper layers (sd or beyond) to
> say whether a retry is done. Since, as you point out, retries automatically
> violate any barrier, it is probably up to the block layer to judge what should
> be done about the problem.
Then we need to give the block layer a lot more information about what
kind of a problem occurred.
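One hedged sketch of what that extra information could look like: a coarse
failure classification attached to each completed request, so the block layer
can choose between reissue, ordered reissue, and hard failure (the names below
are made up for illustration, not an existing interface):

/* Hypothetical classification returned with a failed request instead of a
 * bare error flag. */
enum scsi_failure_class {
        SCSI_FAIL_NONE,          /* completed normally */
        SCSI_FAIL_RETRYABLE,     /* transient: unit attention, queue full, ... */
        SCSI_FAIL_ORDER_LOST,    /* completed, but recovery may have broken
                                  * barrier ordering; block layer must decide */
        SCSI_FAIL_MEDIUM,        /* medium error: retrying is a policy decision */
        SCSI_FAIL_FATAL          /* device offline / will never work */
};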
On Tue, 2002-09-03 at 23:42, Doug Ledford wrote:
> Not really. It hasn't been done yet, but one of my goals is to change the
> scsi commands over to reasonable list usage (finally) so that we can avoid
> all these horrible linear scans it does now looking for an available
> command (it also means things like SCSI_OWNER_MID_LAYER can go away
At least partly done in 2.4-ac and waiting to be pushed to Marcelo. I believe
Hugh contributed the O(1) command finder.
On Tue, 2002-09-03 at 23:50, Doug Ledford wrote:
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to
> support more robust journaling with order requirement verification. If
> that's true, then REQ_BARRIER commands could become quite common on disks
> using ext3.
I doubt it very much because the only way to implement barriers on IDE
is fantastically expensive. I'm dubious it makes sense for ext3 to use
barriers in that way. All it needs, as I understand the rules, is that an
I/O isn't reported as completed by the device driver before it is on the
medium or in a non-volatile cache.
On Sep 3, 6:50pm, Doug Ledford wrote:
>
> > [email protected] said:
> > > Next you only need to return stuff if commands have been issued
> > > between the aborting command and a barrier. Since most sane systems
> > > will never be causing REQ_BARRIER
>
> Hmmm...I thought a big reason for adding REQ_BARRIER was to be able to
> support more robust journaling with order requirement verification. If
> that's true, then REQ_BARRIER commands could become quite common on disks
> using ext3.
Hmm. There do seem to be a lot of loopholes/race conditions where the
barrier just won't work right in the face of error recovery. I wouldn't
want to use barriers on any system where data integrity was crucial.
For example, in Fibrechannel using class 3 (the usual)
send command (command frame corrupted; device does not receive)
send barrier (completes normally)
... (lots of time goes by, many more commands are processed)
timeout original command whose command frame was corrupted
The only safe way to run such a filesystem is to hold the barriers in
the driver until all previous commands are successfully completed.
There was also the problem of the queue full to the barrier command,
etc.
Did I miss the answer to these? I don't recall seeing an answer to
Patrick Mansfield's questions either (original message edited cutting
out a couple of paragraphs):
> On Tue, Sep 03, 2002 at 02:09:44PM -0500, James Bottomley wrote:
> > [email protected] said:
> > > Leave abort active. It does actually work in certain scenarios. The
> > > CD burner scenario that started this thread is an example of
> > > somewhere that an abort should actually do the job.
> >
> > Unfortunately, it would destroy the REQ_BARRIER approach in the block layer.
> > At best, abort probably causes a command to overtake a barrier it shouldn't,
> > at worst we abort the ordered tag that is the barrier and transactional
> > integrity is lost.
> >
> > When error correction is needed, we have to return all the commands for that
> > device to the block layer so that ordering and barrier issues can be taken
> > care of in the reissue. This makes LUN RESET (for those that support it) the
> > minimum level of error correction we can apply.
> >
> > James
>
> If we only send an abort or reset after a quiesce I don't see why one
> is better than the other.
>
> Not specific to reset or abort - if a single command gets an error, we
> wait for outstanding commands to complete before starting up the error
> handler thread. If all the commands (error one and outstanding) have
> barriers, those that do not error out will complete out of order from
> the errored command.
>
> How is this properly handled?
>
> And what happens if one command gets some sort of check condition (like
> medium error, or aborted command) that causes a retry? Will IO's still
> be correctly ordered?
jeremy
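A minimal sketch of the "hold the barrier in the driver" idea mentioned
above, assuming a per-device outstanding count and room for a single parked
barrier (a real version would also have to park everything that arrives
behind a held barrier; all names are assumptions):

struct request;                          /* opaque block-layer request */

struct barrier_gate {
        int outstanding;                 /* commands sent but not yet completed */
        struct request *held_barrier;    /* barrier waiting for the queue to drain */
};

extern int  request_is_barrier(struct request *rq);
extern void issue_to_device(struct barrier_gate *g, struct request *rq);

/* Only let a barrier out when nothing it must order against is in flight. */
static void submit(struct barrier_gate *gate, struct request *rq)
{
        if (request_is_barrier(rq) && gate->outstanding > 0) {
                gate->held_barrier = rq;         /* park it */
                return;
        }
        gate->outstanding++;
        issue_to_device(gate, rq);
}

/* Called on every completion; releases the parked barrier once idle. */
static void complete_one(struct barrier_gate *gate)
{
        if (--gate->outstanding == 0 && gate->held_barrier != NULL) {
                struct request *rq = gate->held_barrier;

                gate->held_barrier = NULL;
                gate->outstanding++;
                issue_to_device(gate, rq);
        }
}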
On Tue, Sep 03, 2002 at 04:48:39PM -0500, James Bottomley wrote:
> If you're hitting error recovery so often that whether it recovers
> in half a second or several seconds makes a difference, I'd say
> there's something else wrong.
Not that I want to contradict, but here is an example.
Without my sd.c patch from yesterday or so (fixing MODE SENSE calls)
an "insmod usb-storage.o" would take 14 minutes and 6 seconds for me.
[One USB device, with 3 subdevices, gets into a bad state when
presented with a MODE SENSE command that asks for more than the
56 bytes it has available. For each of the three subdevices we
get a long sequence of retries, abort, reset, host reset, bus reset
before it is taken off-line.]
The scsi error recovery has many bad properties, but one is its slowness.
Once it gets triggered on a machine with SCSI disks it is common to
have a dead system for several minutes. I have not yet met a situation
in which rebooting was not preferable above scsi error recovery,
especially since the attempt to recover often fails.
Andries
On Wed, Sep 04, 2002 at 12:37:37PM +0200, Andries Brouwer wrote:
>
> The scsi error recovery has many bad properties, but one is its slowness.
This does not have to be this way. It is a solvable problem.
> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes.
Yes, well, this too is solvable. It, in fact, reminds me that one of the
things I think needs to be added to the scsi host settings is a default timeout
value for typical devices. Something like adding a default_timeout value
to each Scsi_Device struct and allowing the scsi driver to modify the
value during the slave_attach() call. Then we can set the default timeout
on non-intelligent controllers to something sane while things like
MegaRAID controllers can keep their sky high timeout values.
> I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.
This, too, is solvable. It just requires that the scsi subsystem start
paying attention to *how* things fail and make the error handling code
smart enough to know when to retry things and when to just fail.
--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
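A minimal sketch of the default_timeout idea, with stand-in structures
(Scsi_Device has no such field today, the hook is only the slave_attach()
mentioned above, and the values are guesses, not recommendations):

/* Timeouts in seconds; both values are assumptions for illustration. */
#define DEFAULT_CMD_TIMEOUT_SECS    30
#define MEGARAID_CMD_TIMEOUT_SECS   (10 * 60)

struct sdev_timeout_sketch {             /* stand-in for Scsi_Device plus the new field */
        int default_timeout;
        /* ... */
};

/* Mid-layer side: seed the field when the device is first attached. */
static void seed_default_timeout(struct sdev_timeout_sketch *sdev)
{
        sdev->default_timeout = DEFAULT_CMD_TIMEOUT_SECS;
}

/* Host-driver side: a slave_attach()-style hook raising it for an
 * intelligent controller that performs its own long retries internally. */
static int megaraid_style_slave_attach(struct sdev_timeout_sketch *sdev)
{
        sdev->default_timeout = MEGARAID_CMD_TIMEOUT_SECS;
        return 0;
}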
On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> The scsi error recovery has many bad properties, but one is its slowness.
> Once it gets triggered on a machine with SCSI disks it is common to
> have a dead system for several minutes. I have not yet met a situation
> in which rebooting was not preferable above scsi error recovery,
> especially since the attempt to recover often fails.
Well I for one prefer the scsi timeout/abort sequence on a CD getting
confused badly by a bad block (as at least some of my drives do) to a
reboot every time I get bad media.
[email protected] said:
> Now, granted, that is more complex than going straight to a BDR, but I
> have to argue that it *isn't* that complex. It certainly isn't the
> nightmare you make it sound like ;-)
It's three times longer even in pseudocode...
However, assume we do this (because we must for barrier preservation). The
chances are that for a failing device we're aborting a significant number of
the tags. This is quite a big increase in the message load over what we do
now---particularly for the AIC driver which can have hundreds of tags
outstanding (Murphy's law says it's usually the earliest tag which times out).
I'm not convinced that a BDR, which is a single message and has roughly the
same effect, isn't preferable.
However, what about a compromise? We can count outstanding commands, so what
about doing abort *if* the number of outstanding commands is exactly one (the
one we're trying to abort). This means for devices that don't do TCQ (like
old CD-ROMs) we'll always try abort first. For large numbers of outstanding
tags, we skip over abort and move straight to BDR. The code to implement this
will be clean and simple because abort no longer has to pay attention to the
barrier.
James
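A minimal sketch of that compromise, assuming the error handler can see how
many commands are currently outstanding at the device (the names are
assumptions):

enum eh_action { EH_TRY_ABORT, EH_TRY_BDR };

struct eh_device_sketch {
        int outstanding;                 /* commands currently at the device */
};

/* Abort only when the timed-out command is the sole command outstanding
 * (the untagged CD-ROM case); with a queue full of tags, skip straight to
 * a bus device reset and let everything be reissued in order. */
static enum eh_action pick_recovery_action(const struct eh_device_sketch *dev)
{
        return dev->outstanding == 1 ? EH_TRY_ABORT : EH_TRY_BDR;
}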
[email protected] said:
> For example, in Fibrechannel using class 3 (the usual)
> send command (command frame corrupted; device does not receive)
> send barrier (completes normally)
> ... (lots of time goes by, many more commands are processed)
> timeout original command whose command frame was corrupted
This doesn't look right to me from the SCSI angle. I don't see how you can get
a successful disconnect on a command the device doesn't receive (I take it
this is some type of Fibre magic?). Of course, if the device (or its proxy)
does receive the command then the ordered queue tag implementation requires
that the corrupted frame command be processed prior to the barrier, this
isn't optional if you obey the spec. Thus, assuming the processor does no
integrity checking of the command until it does processing (this should be a
big if), then we still must get notification of the failed command before the
barrier tag is begun. Obviously, from that notification we do then race to
eliminate the overtaking tags.
> There was also the problem of the queue full to the barrier command,
> etc.
The queue full problem still exists. I've used this argument against the
filesystem people many times at the various fora where it has been discussed.
The situation is that everyone agrees that it's a theoretical problem, but
no-one is convinced that it will actually occur in practice (I think it falls
into the "risk we're willing to take" category).
James
On Wed, Sep 04, 2002 at 12:23:13PM +0100, Alan Cox wrote:
> On Wed, 2002-09-04 at 11:37, Andries Brouwer wrote:
> > The scsi error recovery has many bad properties, but one is its slowness.
> > Once it gets triggered on a machine with SCSI disks it is common to
> > have a dead system for several minutes. I have not yet met a situation
> > in which rebooting was not preferable above scsi error recovery,
> > especially since the attempt to recover often fails.
>
> Well I for one prefer the scsi timeout/abort sequence on a CD getting
> confused badly by a bad block (as at least some of my drives do) to a
> reboot every time I get bad media.
Reboot is bad. Retries are bad.
Errors should be returned to an upper layer, with an error code: "may
retry", or "will never work". (Like in SMTP)
I will most likely set the "retry count" to 0: Never retry. Almost
never works anyway. And the disk already retried many times, so
retrying in software is only "taking time".
We do data recovery around here. We get bad disks on a daily basis. We
are currently reading a drive that gets over 10Mb per second while
spitting out bad block reports!
Thing is: those blocks that didn't work the first time may work on our
second retry. However, we need userspace control over that retry. We
prefer to get the 18G worth of data off the disk first, and only then
retry the blocks that happened to be bad first time around.
Roger.
--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currenly in such an *
* excursion: The stable situation does not include humans. ***************
> [email protected] said:
>> Now, granted, that is more complex than going straight to a BDR, but I
>> have to argue that it *isn't* that complex. It certainly isn't the
>> nightmare you make it sound like ;-)
>
> It's three times longer even in pseudocode...
To make this work, you really need to use the QErr bit in the
disconnect/reconnect page and/or ECA or ACA. QErr I believe is
well supported in devices, but ECA (pre SCSI-3) and ACA most
likely receive very little testing.
I will also voice my opinion (again) that watchdog timer recovery
is in the wrong place in Linux. It belongs in the controller drivers:
1) Only the controller driver knows when to start the timeout
2) Only the controller driver knows the current status of the bus/transport
3) Only the controller can close timeout/just completed races
4) Only the controller driver knows the true transport type
(SPI/FC/ATA/USB/1394/IP) and what recovery actions are appropriate
for that transport type given the capabilities of the controller.
5) The algorithm for recovery and maintaining order becomes quite simple:
1) Freeze the input queue for the controller
2) Return all transactions unqueued to a device to the mid-layer
3) Perform the recovery actions required
4) Unfreeze the controller's queue
5) Device type driver (sd, cd, tape, etc) decides what errors
at what rates should cause the failure of a device. The
controller driver just needs to have the error codes so
it can honestly and fully report to the type driver what
really happens so it can make good decisions.
This of course assumes that all transactions have a serial number and
that requeuing transactions orders them by serial number. With QErr
set, the device closes the rest of the barrier race for you, but even
without barriers, full transaction ordering is required if you don't
want a read to inadvertently pass a write to the same location during
recovery.
For prior art, take a look at FreeBSD. In the worst case, where
escalation to a bus reset is required, recovery takes 5 seconds.
--
Justin
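A minimal sketch of the controller-driver-owned sequence in the numbered
list above, including the serial-number-ordered requeue; every helper here
is an assumption made for illustration, not an existing driver API:

struct cmnd_sketch2 {
        unsigned long serial_number;     /* issue-order serial for ordered requeue */
        /* ... */
};

struct controller_sketch {
        int queue_frozen;
        /* ... */
};

extern void freeze_controller_queue(struct controller_sketch *c);      /* step 1 */
extern void return_unqueued_by_serial(struct controller_sketch *c);    /* step 2: back to
                                                                         * the mid-layer,
                                                                         * ordered by serial */
extern int  perform_transport_recovery(struct controller_sketch *c);   /* step 3 */
extern void unfreeze_controller_queue(struct controller_sketch *c);    /* step 4 */
extern void report_to_type_driver(struct controller_sketch *c,
                                  struct cmnd_sketch2 *victim, int error); /* step 5 */

static int controller_timeout_recovery(struct controller_sketch *c,
                                       struct cmnd_sketch2 *victim)
{
        int error;

        freeze_controller_queue(c);
        return_unqueued_by_serial(c);
        error = perform_transport_recovery(c);
        unfreeze_controller_queue(c);
        report_to_type_driver(c, victim, error);  /* sd/cd/tape decides about offlining */
        return error;
}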
Hi,
On Wed, 4 Sep 2002, Rogier Wolff wrote:
> I will most likely set the "retry count" to 0: Never retry. Almost
> never works anyway. And the disk already retried manytimes, so
> retrying in software is only "taking time".
fd = open(pathname, O_NORETRY); ?
Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-
James Bottomley [[email protected]] wrote:
> [email protected] said:
> > For example, in Fibrechannel using class 3 (the usual)
>
> > send command (command frame corrupted; device does not receive)
> > send barrier (completes normally)
> > ... (lots of time goes by, many more commands are processed)
> > timeout original command whose command frame was corrupted
>
> This doesn't look right to me from the SCSI angle. I don't see how you can get
> a successful disconnect on a command the device doesn't receive (I take it
> this is some type of Fibre magic?). Of course, if the device (or its proxy)
> does receive the command then the ordered queue tag implementation requires
> that the corrupted frame command be processed prior to the barrier, this
> isn't optional if you obey the spec. Thus, assuming the processor does no
> integrity checking of the command until it does processing (this should be a
> big if), then we still must get notification of the failed command before the
> barrier tag is begun. Obviously, from that notification we do then race to
> eliminate the overtaking tags.
In FC class 3 if you are logged into a port then notice of this loss
doesn't happen until an upper level timeout occurs (ULTP?). The loss can
happen prior to the command reaching the device (i.e. the switch can
drop the frame). If a corrupted frame makes it to the device it will be
discarded as there is not much it can do with a frame containing unreliable
data. In FC class 2 frames are ack'd so the recovery can be much more
responsive.
-Mike
--
Michael Anderson
[email protected]
Doug Ledford wrote:
>
> Not really. It hasn't been done yet, but one of my goals is to change the
> scsi commands over to reasonable list usage (finally) so that we can avoid
> all these horrible linear scans it does now looking for an available
> command
Using the struct list_head for this will literally allow you to do _magic_.
Avoiding the linear scan is the least of what this will fix.
It would allow for a lot better/simpler/sound design of all of
the mid layer/SCSI core. Things will be/become easier as you
point out below. Currently the mid-layer queuing is hairy at best.
I'm all for it.
> So,
> basically, you have a list item struct on each command. When you build
> the commands, you add them to SDpnt->free_list. When you need a command,
> instead of searching for a free one, you just grab the head of
> SDpnt->free_list and use it. Once you've built the command and are ready
> to hand it off to the lldd, you put the command on the tail of the
> SDpnt->active_list. When a command completes, you list_remove() it from
> the SDpnt->active_list and put it on the SDpnt->complete_list to be
> handled by the tasklet. When the tasklet actually completes the command,
> it frees the scsi command struct by simply putting it back on the
> SDpnt->free_list.
Great!
Once you're on that train, you may want to rethink the whole queuing
mechanism of the mid-layer (straight from sd/etc and internally down to LLDD)
for an improved design.
There'd be problems like making cmd movement between lists atomic, only cmd
movers being able to actually cancel a command, moving before calling
queuecommand(), etc., but nothing extraordinary.
> Now, granted, that is more complex than going straight to a BDR, but I
> have to argue that it *isn't* that complex. It certainly isn't the
> nightmare you make it sound like ;-)
No, it certainly is NOT!
Granted, by looking at the code it will not be overly clear
who moves what and when, but a 2 page commentary on the design
would only leave one exclaiming ``Aaaaah... such simplicity, so great!''
> Well, as I've laid it out above, I don't really think it's all that much
> to implement ;-) At least not in the mid layer.
Right, it's not. This type of queuing mechanism would only make things
more consistent and easy to manipulate.
There'd be logistical issues, but those are easy to figure out
with pen and paper.
--
Luben
``Perfection is achieved not when there is nothing more to add
but when there is nothing left to take away.''
Antoine de Saint Exupery
On Sep 4, 10:50am, Justin T. Gibbs wrote:
>
> This of course assumes that all transactions have a serial number and
> that requeuing transactions orders them by serial number. With QErr
> set, the device closes the rest of the barrier race for you, but even
> without barriers, full transaction ordering is required if you don't
> want a read to inadvertently pass a write to the same location during
> recovery.
The original FCP (SCSI commands over Fibre) profile specified that QERR=1
was not available. Unless that is changed, it would appear that you cannot
count on being able to set Qerr.
Qerr was one of those annoying little things in SCSI that forces host
adapter drivers to know a mode page setting.
jeremy
On Sep 4, 11:24am, James Bottomley wrote:
>
> [email protected] said:
> > For example, in Fibrechannel using class 3 (the usual)
>
> > send command (command frame corrupted; device does not receive)
> > send barrier (completes normally)
> > ... (lots of time goes by, many more commands are processed)
> > timeout original command whose command frame was corrupted
>
> > This doesn't look right to me from the SCSI angle. I don't see how you can get
> a successful disconnect on a command the device doesn't receive (I take it
> this is some type of Fibre magic?). Of course, if the device (or its proxy)
> does receive the command then the ordered queue tag implementation requires
> that the corrupted frame command be processed prior to the barrier, this
> isn't optional if you obey the spec. Thus, assuming the processor does no
> integrity checking of the command until it does processing (this should be a
> big if), then we still must get notification of the failed command before the
> barrier tag is begun. Obviously, from that notification we do then race to
> eliminate the overtaking tags.
You don't have disconnect per se with Fibre (or with other packet based
interfaces). A command is sent as a frame. If the frame is corrupted
or lost on the way to the target, then the target will never know about
the command having been sent. There are obvious problems if that command
has an ordered tag, preceded and followed by simple tags dependent on
the ordered semantics being followed, and where the simple tagged commands
following don't wait for the ordered tag to finish. There are other
failure modes too, of course.
> > There was also the problem of the queue full to the barrier command,
> > etc.
>
> The queue full problem still exists. I've used this argument against the
> filesystem people many times at the various fora where it has been discussed.
> The situation is that everyone agrees that it's a theoretical problem, but
> no-one is convinced that it will actually occur in practice (I think it falls
> into the "risk we're willing to take" category).
I guess it all depends on how sensitive one is to getting it wrong. In
any case, it looks as though you are doing what you can . . . .
> James
thanks
jeremy
> On Sep 4, 10:50am, Justin T. Gibbs wrote:
>>
>> This of course assumes that all transactions have a serial number and
>> that requeuing transactions orders them by serial number. With QErr
>> set, the device closes the rest of the barrier race for you, but even
>> without barriers, full transaction ordering is required if you don't
>> want a read to inadvertently pass a write to the same location during
>> recovery.
>
>
> The original FCP (SCSI commands over Fibre) profile specified that QERR=1
> was not available. Unless that is changed, it would appear that you
> cannot count on being able to set Qerr.
>
> Qerr was one of those annoying little things in SCSI that forces host
> adapter drivers to know a mode page setting.
It is not the controllers that know, but the type drivers. The controller
drivers should be conduits for commands, nothing more. With the proper
events and error codes, the type drivers can maintain mode parameters
and only the type drivers know what parameters are appropriate for their
type of service.
For FC, you need to use ECA/ACA anyway as that is the only way to deal with
inflight commands at the time of an error.
--
Justin