On Tue, Dec 06 2011 at 5:42pm -0500,
Mike Snitzer <[email protected]> wrote:
> On Tue, Dec 06 2011 at 5:03pm -0500,
> Martin K. Petersen <[email protected]> wrote:
>
> > >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
> >
> > Mike> Regardless, shouldn't the SCSI midlayer classify such
> > Mike> ILLEGAL_REQUEST sense, with an add. sense I listed in the patch,
> > Mike> as a target error?
> >
> > Well, even SUCCESS should cause the I/O to be aborted.
>
> As I replied to James, yes SUCCESS does cause the IO to fail, but the
> discard gets retried by multipath.
>
> Returning TARGET_ERROR enables the block layer to return -EREMOTEIO
> which multipath will immediately pass up (rather than the normal fail
> path and retry).
>
> > I assume this is the RHEL6 kernel? Did you backport my provisioning
> > updates that brings the heuristics in sync with SBC-3 (#c98a0e)?
>
> Yes, that update was pulled in to RHEL6.2 (released today). But this
> issue is a concern for both upstream and RHEL6 (and any other distro
> with a recent kernel).
Hey Martin and James,
(for the benefit of others, original proposed fix is here):
http://www.spinics.net/lists/linux-scsi/msg55792.html
I've had 2 additional reports from different storage vendors that they
(or their customers) are having Linux (RHEL6) installation failures.
The reason is their storage is broken:
- they set TPE in the storage target's READ CAPACITY response
- but they only support UNMAP (not WRITE SAME w/ UNMAP bit set)
- but they don't properly populate the BLOCK LIMITS VPD; so Linux
defaults to using WRITE SAME w/ UNMAP bit set.
So that makes 3 different _prominent_ storage vendors, that I am aware
of, that are bitten by their broken storage (relative to discard and
properly advertising which variant they actually support). I'd much
rather deal with the storage vendors (or their customers) reporting that
discards aren't working than mutual customers reporting that they cannot
even install to the storage.
The ultimate fix is clear: storage vendors need to fix their storage
(2 of the 3 have, 1 is working on it). But a Linux-only workaround for
this series of unfortunate events (particularly as it happens with
multipath in the mix) is to have SCSI classify certain ILLEGAL_REQUEST
as the TARGET_ERROR that they are.
I would very much appreciate this fix making its way to Linux 3.3 (even
though we're past the merge window this is a pure fix).
It will allow Linux to cope with vendors' storage that is broken.
Please advise, Ack, submit for upstream 3.3 inclusion, etc... thanks! :)
Mike
>>>>> "Mike" == Mike Snitzer <[email protected]> writes:
Mike> So that makes 3 different _prominent_ storage vendors, that I am
Mike> aware of, that are bitten by their broken storage (relative to
Mike> discard and properly advertising which variant they actually
Mike> support). I'd much rather deal with the storage vendors (or their
Mike> customers) reporting that discards aren't working than mutual
Mike> customers reporting that they cannot even install to the storage.
More graceful handling of the sense data aside, we do have a couple of
options:
1. Now that the provisioning portion seems to be stable in SBC-3 we can
nuke the interim spec heuristics and only support devices that
report the right thing. This may disable provisioning for some
existing users whose arrays run non-compliant firmware.
2. We can add another layer of heuristics based on the RSOC wrapper I
introduced for write same. Maybe you could send me sg_opcodes output
for the arrays in question?
Mike> The ultimate fix is clear: storage vendors need to fix their
Mike> storage (2 of the 3 have, 1 is working on it). But a Linux-only
Mike> workaround for this series of unfortunate events (particularly as
Mike> it happens with multipath in the mix) is to have SCSI classify
Mike> certain ILLEGAL_REQUEST as the TARGET_ERROR that they are.
I don't have a fundamental problem with your patch. But since we
explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
what's broken? We should disable discard support if the WRITE SAME w/
UNMAP fails.
I'll see if I can figure out what's going on unless you beat me to it...
--
Martin K. Petersen Oracle Linux Engineering
On Mon, Feb 13 2012 at 12:53pm -0500,
Martin K. Petersen <[email protected]> wrote:
> >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>
> Mike> So that makes 3 different _prominent_ storage vendors, that I am
> Mike> aware of, that are bitten by their broken storage (relative to
> Mike> discard and properly advertising which variant they actually
> Mike> support). I'd much rather deal with the storage vendors (or their
> Mike> customers) reporting that discards aren't working than mutual
> Mike> customers reporting that they cannot even install to the storage.
>
> More graceful handling of the sense data aside, we do have a couple of
> options:
>
> 1. Now that the provisioning portion seems to be stable in SBC-3 we can
> nuke the interim spec heuristics and only support devices that
> report the right thing. This may disable provisioning for some
> existing users whose arrays run non-compliant firmware.
>
> 2. We can add another layer of heuristics based on the RSOC wrapper I
> introduced for write same. Maybe you could send me sg_opcodes output
> for the arrays in question?
Yeah, I think that would be welcomed evolution (but as you say,
independent of improving additional ILLEGAL REQUEST processing).
> Mike> The ultimate fix is clear: storage vendors need to fix their
> Mike> storage (2 of the 3 have, 1 is working on it). But a Linux-only
> Mike> workaround for this series of unfortunate events (particularly as
> Mike> it happens with multipath in the mix) is to have SCSI classify
> Mike> certain ILLEGAL_REQUEST as the TARGET_ERROR that they are.
>
> I don't have a fundamental problem with your patch. But since we
> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> what's broken? We should disable discard support if the WRITE SAME w/
> UNMAP fails.
Yeah, I thought the disabling would be sufficient too. But
unfortunately multipath doesn't inspect the request it is retrying
(after it fails the path the request just failed on). So even though
discards get disabled: the first discard (which caused discards to
become disabled) is still in-flight and keeps getting retried
indefinitely by the multipath layer (if the paths recover quickly).
Mike
On Mon, Feb 13 2012 at 1:13pm -0500,
Mike Snitzer <[email protected]> wrote:
> On Mon, Feb 13 2012 at 12:53pm -0500,
> Martin K. Petersen <[email protected]> wrote:
>
> > >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
> >
> > Mike> So that makes 3 different _prominent_ storage vendors, that I am
> > Mike> aware of, that are bitten by their broken storage (relative to
> > Mike> discard and properly advertising which variant they actually
> > Mike> support). I'd much rather deal with the storage vendors (or their
> > Mike> customers) reporting that discards aren't working than mutual
> > Mike> customers reporting that they cannot even install to the storage.
> >
> > More graceful handling of the sense data aside, we do have a couple of
> > options:
> >
> > 1. Now that the provisioning portion seems to be stable in SBC-3 we can
> > nuke the interim spec heuristics and only support devices that
> > report the right thing. This may disable provisioning for some
> > existing users whose arrays run non-compliant firmware.
> >
> > 2. We can add another layer of heuristics based on the RSOC wrapper I
> > introduced for write same. Maybe you could send me sg_opcodes output
> > for the arrays in question?
>
> Yeah, I think that would be welcomed evolution (but as you say,
> independent of improving additional ILLEGAL REQUEST processing).
That was a response to 1 above.
I don't have direct access to the arrays in question to get sg_opcodes.
But I can work on getting them.
Mike
>>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>> I don't have a fundamental problem with your patch. But since we
>> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
>> what's broken? We should disable discard support if the WRITE SAME w/
>> UNMAP fails.
Mike> Yeah, I thought the disabling would be sufficient too. But
Mike> unfortunately multipath doesn't inspect the request it is retrying
Mike> (after it fails the path the request just failed on).
Well, we shouldn't be returning something that multipath should ever act
on.
I think I understand what's going on. Can you try the following patch?
--
Martin K. Petersen Oracle Linux Engineering
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b2c95db..4e8d0b6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -879,6 +879,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
cmd->cmnd[0] == WRITE_SAME_16 ||
cmd->cmnd[0] == WRITE_SAME)) {
description = "Discard failure";
+ error = -EREMOTEIO;
action = ACTION_FAIL;
} else
action = ACTION_FAIL;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c691fb5..5f0d383 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -497,6 +497,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
max(sdkp->physical_block_size,
sdkp->unmap_granularity * logical_block_size);
+ sdkp->provisioning_mode = mode;
+
switch (mode) {
case SD_LBP_DISABLE:
@@ -524,8 +526,6 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9);
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
-
- sdkp->provisioning_mode = mode;
}
/**
On Mon, Feb 13 2012 at 2:16pm -0500,
Martin K. Petersen <[email protected]> wrote:
> >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
>
> >> I don't have a fundamental problem with your patch. But since we
> >> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> >> what's broken? We should disable discard support if the WRITE SAME w/
> >> UNMAP fails.
>
> Mike> Yeah, I thought the disabling would be sufficient too. But
> Mike> unfortunately multipath doesn't inspect the request it is retrying
> Mike> (after it fails the path the request just failed on).
>
> Well, we shouldn't be returning something that multipath should ever act
> on.
>
> I think I understand what's going on. Can you try the following patch?
Looks good to me (small nit below), it'll solve the immediate problem,
I'll pass it on. Please add my:
Acked-by: Mike Snitzer <[email protected]>
But I also think establishing a baseline of TARGET_ERROR for certain
ILLEGAL REQUEST is still sane and should go in too...
Thanks,
Mike
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b2c95db..4e8d0b6 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -879,6 +879,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> cmd->cmnd[0] == WRITE_SAME_16 ||
> cmd->cmnd[0] == WRITE_SAME)) {
> description = "Discard failure";
> + error = -EREMOTEIO;
> action = ACTION_FAIL;
Previous DIX -EILSEQ code block sets error after action. Should follow
that order here? Purely an aesthetics thing.
On Mon, 2012-02-13 at 14:36 -0500, Mike Snitzer wrote:
> On Mon, Feb 13 2012 at 2:16pm -0500,
> Martin K. Petersen <[email protected]> wrote:
>
> > >>>>> "Mike" == Mike Snitzer <[email protected]> writes:
> >
> > >> I don't have a fundamental problem with your patch. But since we
> > >> explicitly handle ILLEGAL REQUEST with 0x20 and 0x24 in sd.c I wonder
> > >> what's broken? We should disable discard support if the WRITE SAME w/
> > >> UNMAP fails.
> >
> > Mike> Yeah, I thought the disabling would be sufficient too. But
> > Mike> unfortunately multipath doesn't inspect the request it is retrying
> > Mike> (after it fails the path the request just failed on).
> >
> > Well, we shouldn't be returning something that multipath should ever act
> > on.
> >
> > I think I understand what's going on. Can you try the following patch?
>
> Looks good to me (small nit below), it'll solve the immediate problem,
> I'll pass it on. Please add my:
>
> Acked-by: Mike Snitzer <[email protected]>
>
> But I also think establishing a baseline of TARGET_ERROR for certain
> ILLEGAL REQUEST is still sane and should go in too...
So someone still needs to package up the final agreed version with a
nice changelog and send it to the list ...
James