LinuxLists.cc - [PATCH 0/8] Drivers: scsi: storvsc: Bug fixes and improvements

[permalink] [raw]

Subject: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

Implement a simple abort handler. The host does not support "Abort"; just
ensure that all inflight I/Os have been accounted for.

Signed-off-by: K. Y. Srinivasan <[email protected]>
---
drivers/scsi/storvsc_drv.c | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8f1b263..82fb590 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1524,6 +1524,27 @@ static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
return SUCCESS;
}

+static int storvsc_host_abort_handler(struct scsi_cmnd *scmnd)
+{
+ struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
+ struct hv_device *device = host_dev->dev;
+
+ struct storvsc_device *stor_device;
+
+
+ stor_device = get_out_stor_device(device);
+ if (!stor_device)
+ return FAILED;
+
+ /*
+ * Just wait for all in flight I/O's to complete.
+ */
+
+ storvsc_wait_to_drain(stor_device);
+
+ return SUCCESS;
+}
+
static bool storvsc_scsi_cmd_ok(struct scsi_cmnd *scmnd)
{
bool allowed = true;
@@ -1699,6 +1720,7 @@ static struct scsi_host_template scsi_driver = {
.bios_param = storvsc_get_chs,
.queuecommand = storvsc_queuecommand,
.eh_host_reset_handler = storvsc_host_reset_handler,
+ .eh_abort_handler = storvsc_host_abort_handler,
.slave_alloc = storvsc_device_alloc,
.slave_destroy = storvsc_device_destroy,
.slave_configure = storvsc_device_configure,
--
1.7.4.1

2014-07-08 23:46:43

[permalink] [raw]

Subject: [PATCH 8/8] drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure

On some Windows hosts on FC SANs, TEST_UNIT_READY can return SRB_STATUS_ERROR.
Correctly handle this.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 6540fb6..9afdd6d 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1023,6 +1023,13 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
case ATA_12:
set_host_byte(scmnd, DID_PASSTHROUGH);
break;
+ /*
+ * On Some Windows hosts TEST_UNIT_READY command can return
+ * SRB_STATUS_ERROR, let the upper level code deal with it
+ * based on the sense information.
+ */
+ case TEST_UNIT_READY:
+ break;
default:
set_host_byte(scmnd, DID_TARGET_FAILURE);
}
--
1.7.4.1

2014-07-08 23:46:42

[permalink] [raw]

Subject: [PATCH 5/8] Drivers: scsi: storvsc: Fix a bug in the handling of SRB status flags

SRB status can have additional information. Mask these out before processing SRB status.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 428dbda..8f1b263 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -299,11 +299,14 @@ enum storvsc_request_type {
*/

#define SRB_STATUS_AUTOSENSE_VALID 0x80
+#define SRB_STATUS_QUEUE_FROZEN 0x40
#define SRB_STATUS_INVALID_LUN 0x20
#define SRB_STATUS_SUCCESS 0x01
#define SRB_STATUS_ABORTED 0x02
#define SRB_STATUS_ERROR 0x04

+#define SRB_STATUS(status) \
+ (status & ~(SRB_STATUS_AUTOSENSE_VALID | SRB_STATUS_QUEUE_FROZEN))
/*
* This is the end of Protocol specific defines.
*/
@@ -1007,7 +1010,7 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
void (*process_err_fn)(struct work_struct *work);
bool do_work = false;

- switch (vm_srb->srb_status) {
+ switch (SRB_STATUS(vm_srb->srb_status)) {
case SRB_STATUS_ERROR:
/*
* If there is an error; offline the device since all
--
1.7.4.1

2014-07-08 23:46:41

[permalink] [raw]

Subject: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the values on the host

Hyper-V hosts can support multiple targets and multiple channels and larger number of
LUNs per target. Update the code to reflect this. With this patch we can correctly
enumerate all the paths in a multi-path storage environment.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 53 ++++++++++++++++++++++++++++++++-----------
1 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 9969fa1..2e4131c 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -331,16 +331,19 @@ static int storvsc_timeout = 180;
static void storvsc_on_channel_callback(void *context);

/*
- * In Hyper-V, each port/path/target maps to 1 scsi host adapter. In
- * reality, the path/target is not used (ie always set to 0) so our
- * scsi host adapter essentially has 1 bus with 1 target that contains
- * up to 256 luns.
+ * In Hyper-V, each port/path/target maps to 1 scsi host adapter.
*/
-#define STORVSC_MAX_LUNS_PER_TARGET 64
-#define STORVSC_MAX_TARGETS 1
-#define STORVSC_MAX_CHANNELS 1
+#define STORVSC_MAX_LUNS_PER_TARGET 255
+#define STORVSC_MAX_TARGETS 2
+#define STORVSC_MAX_CHANNELS 8

+#define STORVSC_FC_MAX_LUNS_PER_TARGET 255
+#define STORVSC_FC_MAX_TARGETS 128
+#define STORVSC_FC_MAX_CHANNELS 8

+#define STORVSC_IDE_MAX_LUNS_PER_TARGET 64
+#define STORVSC_IDE_MAX_TARGETS 1
+#define STORVSC_IDE_MAX_CHANNELS 1

struct storvsc_cmd_request {
struct list_head entry;
@@ -1690,7 +1693,7 @@ static struct scsi_host_template scsi_driver = {
.slave_alloc = storvsc_device_alloc,
.slave_destroy = storvsc_device_destroy,
.slave_configure = storvsc_device_configure,
- .cmd_per_lun = 1,
+ .cmd_per_lun = 255,
/* 64 max_queue * 1 target */
.can_queue = STORVSC_MAX_IO_REQUESTS*STORVSC_MAX_TARGETS,
.this_id = -1,
@@ -1789,12 +1792,34 @@ static int storvsc_probe(struct hv_device *device,
host_dev->path = stor_device->path_id;
host_dev->target = stor_device->target_id;

- /* max # of devices per target */
- host->max_lun = STORVSC_MAX_LUNS_PER_TARGET;
- /* max # of targets per channel */
- host->max_id = STORVSC_MAX_TARGETS;
- /* max # of channels */
- host->max_channel = STORVSC_MAX_CHANNELS - 1;
+ switch (dev_id->driver_data) {
+ case SFC_GUID:
+ /* max # of devices per target */
+ host->max_lun = STORVSC_FC_MAX_LUNS_PER_TARGET;
+ /* max # of targets per channel */
+ host->max_id = STORVSC_FC_MAX_TARGETS;
+ /* max # of channels */
+ host->max_channel = STORVSC_FC_MAX_CHANNELS - 1;
+ break;
+
+ case SCSI_GUID:
+ /* max # of devices per target */
+ host->max_lun = STORVSC_MAX_LUNS_PER_TARGET;
+ /* max # of targets per channel */
+ host->max_id = STORVSC_MAX_TARGETS;
+ /* max # of channels */
+ host->max_channel = STORVSC_MAX_CHANNELS - 1;
+ break;
+
+ default:
+ /* max # of devices per target */
+ host->max_lun = STORVSC_IDE_MAX_LUNS_PER_TARGET;
+ /* max # of targets per channel */
+ host->max_id = STORVSC_IDE_MAX_TARGETS;
+ /* max # of channels */
+ host->max_channel = STORVSC_IDE_MAX_CHANNELS - 1;
+ break;
+ }
/* max cmd length */
host->max_cmd_len = STORVSC_MAX_CMD_LEN;

--
1.7.4.1

2014-07-08 23:46:40

[permalink] [raw]

Subject: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

Correctly set SRB flags for all valid I/O directions. Some IHV drivers on the
Windows host require this.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 82fb590..6540fb6 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1613,26 +1613,24 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
vm_srb = &cmd_request->vstor_packet.vm_srb;
vm_srb->win8_extension.time_out_value = 60;

+ vm_srb->win8_extension.srb_flags |=
+ (SRB_FLAGS_QUEUE_ACTION_ENABLE |
+ SRB_FLAGS_DISABLE_SYNCH_TRANSFER);

/* Build the SRB */
switch (scmnd->sc_data_direction) {
case DMA_TO_DEVICE:
vm_srb->data_in = WRITE_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_OUT;
- vm_srb->win8_extension.srb_flags |=
- (SRB_FLAGS_QUEUE_ACTION_ENABLE |
- SRB_FLAGS_DISABLE_SYNCH_TRANSFER);
break;
case DMA_FROM_DEVICE:
vm_srb->data_in = READ_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_IN;
- vm_srb->win8_extension.srb_flags |=
- (SRB_FLAGS_QUEUE_ACTION_ENABLE |
- SRB_FLAGS_DISABLE_SYNCH_TRANSFER);
break;
default:
vm_srb->data_in = UNKNOWN_TYPE;
- vm_srb->win8_extension.srb_flags = 0;
+ vm_srb->win8_extension.srb_flags |= (SRB_FLAGS_DATA_IN |
+ SRB_FLAGS_DATA_OUT);
break;
}

--
1.7.4.1

2014-07-08 23:46:39

[permalink] [raw]

Subject: [PATCH 3/8] Drivers: scsi: storvsc: Fix a bug in handling VMBUS protocol version

Based on the negotiated VMBUS protocol version, we adjust the size of the storage
protocol messages. The two sizes we currently handle or pre-win8 and post-win8.
Win WS2012 R2, we are negotiating higher VMBUS protocol version than the win8
version. Make adjustments to correctly handle this.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 759853c..d9d8051 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1751,19 +1751,19 @@ static int storvsc_probe(struct hv_device *device,
* set state to properly communicate with the host.
*/

- if (vmbus_proto_version == VERSION_WIN8) {
- sense_buffer_size = POST_WIN7_STORVSC_SENSE_BUFFER_SIZE;
- vmscsi_size_delta = 0;
- vmstor_current_major = VMSTOR_WIN8_MAJOR;
- vmstor_current_minor = VMSTOR_WIN8_MINOR;
- } else {
+ if ((vmbus_proto_version == VERSION_WS2008) ||
+ (vmbus_proto_version == VERSION_WIN7)) {
sense_buffer_size = PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE;
vmscsi_size_delta = sizeof(struct vmscsi_win8_extension);
vmstor_current_major = VMSTOR_WIN7_MAJOR;
vmstor_current_minor = VMSTOR_WIN7_MINOR;
+ } else {
+ sense_buffer_size = POST_WIN7_STORVSC_SENSE_BUFFER_SIZE;
+ vmscsi_size_delta = 0;
+ vmstor_current_major = VMSTOR_WIN8_MAJOR;
+ vmstor_current_minor = VMSTOR_WIN8_MINOR;
}

-
host = scsi_host_alloc(&scsi_driver,
sizeof(struct hv_host_device));
if (!host)
--
1.7.4.1

2014-07-08 23:46:38

[permalink] [raw]

Subject: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

Host does not handle WRITE_SAME_16; filter this command out. This patch
is required to handle large devices (greater than 2 TB disks).

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index d9d8051..428dbda 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1529,6 +1529,7 @@ static bool storvsc_scsi_cmd_ok(struct scsi_cmnd *scmnd)
switch (scsi_op) {
/* the host does not handle WRITE_SAME, log accident usage */
case WRITE_SAME:
+ case WRITE_SAME_16:
/*
* smartd sends this command and the host does not handle
* this. So, don't send it.
--
1.7.4.1

2014-07-08 23:46:36

[permalink] [raw]

Subject: [PATCH 2/8] Drivers: scsi: storvsc: Filter commands based on the storage protocol version

Going forward it is possible that some of the commands that are not currently
implemented will be implemented on future Windows hosts. Make command filtering
depend on the host version.

Signed-off-by: K. Y. Srinivasan <[email protected]>
Cc: <[email protected]>
---
drivers/scsi/storvsc_drv.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4131c..759853c 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1556,9 +1556,14 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct vmscsi_request *vm_srb;
struct stor_mem_pools *memp = scmnd->device->hostdata;

- if (!storvsc_scsi_cmd_ok(scmnd)) {
- scmnd->scsi_done(scmnd);
- return 0;
+ if (vmstor_current_major <= VMSTOR_WIN8_MAJOR) {
+ /*
+ * On legacy hosts filter unimplemented commands.
+ */
+ if (!storvsc_scsi_cmd_ok(scmnd)) {
+ scmnd->scsi_done(scmnd);
+ return 0;
+ }
}

request_size = sizeof(struct storvsc_cmd_request);
--
1.7.4.1

2014-07-09 08:40:00

[permalink] [raw]

Subject: Re: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the values on the host

On Tue, Jul 08, 2014 at 05:46:45PM -0700, K. Y. Srinivasan wrote:
> + * In Hyper-V, each port/path/target maps to 1 scsi host adapter.

Does it still? The STORVSC_FC_MAX_TARGETS define suggests otherwise.

> - .cmd_per_lun = 1,
> + .cmd_per_lun = 255,

This looks like an unrelated change.

> + /* max # of devices per target */
> + host->max_lun = STORVSC_FC_MAX_LUNS_PER_TARGET;
> + /* max # of targets per channel */
> + host->max_id = STORVSC_FC_MAX_TARGETS;
> + /* max # of channels */
> + host->max_channel = STORVSC_FC_MAX_CHANNELS - 1;

I don't think these comments add any value..

Also any reason you use off by one defines for max_channel, but not the
others?

2014-07-09 08:40:57

[permalink] [raw]

Subject: Re: [PATCH 2/8] Drivers: scsi: storvsc: Filter commands based on the storage protocol version

On Tue, Jul 08, 2014 at 05:46:46PM -0700, K. Y. Srinivasan wrote:
> Going forward it is possible that some of the commands that are not currently
> implemented will be implemented on future Windows hosts. Make command filtering
> depend on the host version.

> + if (vmstor_current_major <= VMSTOR_WIN8_MAJOR) {
> + /*
> + * On legacy hosts filter unimplemented commands.
> + */
> + if (!storvsc_scsi_cmd_ok(scmnd)) {
> + scmnd->scsi_done(scmnd);
> + return 0;
> + }

So post-Win8 versions don't need command filering as they can properly
reject commands they don't understand? If that's the case please state
it in the patch description.

2014-07-09 08:41:56

[permalink] [raw]

Subject: Re: [PATCH 3/8] Drivers: scsi: storvsc: Fix a bug in handling VMBUS protocol version

> + if ((vmbus_proto_version == VERSION_WS2008) ||
> + (vmbus_proto_version == VERSION_WIN7)) {

This has superflous braces and doesn't use proper Linux indentation.

But I think simply using a switch here might be cleaner anyway.

2014-07-09 08:43:05

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> Host does not handle WRITE_SAME_16; filter this command out. This patch
> is required to handle large devices (greater than 2 TB disks).

Storvsc already sets the no_write_same flag, where is the command coming
from?

2014-07-09 08:43:27

[permalink] [raw]

Subject: Re: [PATCH 5/8] Drivers: scsi: storvsc: Fix a bug in the handling of SRB status flags

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2014-07-09 08:44:17

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> Implement a simple abort handler. The host does not support "Abort"; just
> ensure that all inflight I/Os have been accounted for.

The abort handler should abort a single command, not wait for all of
them. What issue do you see that this tries to address?

2014-07-09 08:44:56

[permalink] [raw]

Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

On Tue, Jul 08, 2014 at 05:46:51PM -0700, K. Y. Srinivasan wrote:
> Correctly set SRB flags for all valid I/O directions. Some IHV drivers on the
> Windows host require this.

What are IHV drivers?

Otherwise looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

2014-07-09 08:46:47

[permalink] [raw]

Subject: Re: [PATCH 8/8] drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure

On Tue, Jul 08, 2014 at 05:46:52PM -0700, K. Y. Srinivasan wrote:
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1023,6 +1023,13 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
> case ATA_12:
> set_host_byte(scmnd, DID_PASSTHROUGH);
> break;
> + /*
> + * On Some Windows hosts TEST_UNIT_READY command can return
> + * SRB_STATUS_ERROR, let the upper level code deal with it
> + * based on the sense information.
> + */
> + case TEST_UNIT_READY:
> + break;

Don't we need to set an error in the command for the error handler to
take action? Or is this propagated elsewhere?

2014-07-09 18:35:52

[permalink] [raw]

Subject: RE: [PATCH 8/8] drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:47 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 8/8] drivers: scsi: storvsc: Correctly handle
> TEST_UNIT_READY failure
>
> On Tue, Jul 08, 2014 at 05:46:52PM -0700, K. Y. Srinivasan wrote:
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -1023,6 +1023,13 @@ static void storvsc_handle_error(struct
> vmscsi_request *vm_srb,
> > case ATA_12:
> > set_host_byte(scmnd, DID_PASSTHROUGH);
> > break;
> > + /*
> > + * On Some Windows hosts TEST_UNIT_READY command can
> return
> > + * SRB_STATUS_ERROR, let the upper level code deal with it
> > + * based on the sense information.
> > + */
> > + case TEST_UNIT_READY:
> > + break;
>
> Don't we need to set an error in the command for the error handler to take
> action? Or is this propagated elsewhere?

The host sets the appropriate scsi response and sense information that allows the upper-level scsi stack to appropriately recover. We are just making sure that we won't mark the target as failed which is what would happen in the absence of this patch since the host has set a very generic SRB error code that indicates failure.

Regards,

K. Y

2014-07-09 18:40:45

[permalink] [raw]

Subject: RE: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:45 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases
>
> On Tue, Jul 08, 2014 at 05:46:51PM -0700, K. Y. Srinivasan wrote:
> > Correctly set SRB flags for all valid I/O directions. Some IHV drivers
> > on the Windows host require this.
>
> What are IHV drivers?

If the target is a SAN device, the host simply passes the request over to the native driver stack on the host. Some specific hardware (IHV - independent hadware vendor) drivers on Windows require that SRB flags be correctly set in all cases.

Regards,

K. Y
>
>
> Otherwise looks fine,
>
> Reviewed-by: Christoph Hellwig <[email protected]>

2014-07-09 18:51:45

[permalink] [raw]

Subject: RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:44 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>
> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> > Implement a simple abort handler. The host does not support "Abort";
> > just ensure that all inflight I/Os have been accounted for.
>
> The abort handler should abort a single command, not wait for all of them.
> What issue do you see that this tries to address?

On Azure, we sometimes have unbounded I/O latencies and some distributions (such as SLES12) based on recent kernels are invoking
the "Abort Handler". Unfortunately, our scsi emulation on the host does not support aborting a command.
The issue I have seen is that the upper level scsi code attempts error recovery when the command times out and finally frees up the command.
The host subsequently responds to the command that has timed out and since the memory has been freed up, we end up touching freed memory
in this driver. Since the host is also doing error recovery, by just delaying the error handler in the guest until we can account for all the in-flight commands,
we can get around the problem.

Regards,

K. Y

2014-07-09 19:53:00

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:43 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > Host does not handle WRITE_SAME_16; filter this command out. This
> > patch is required to handle large devices (greater than 2 TB disks).
>
> Storvsc already sets the no_write_same flag, where is the command coming
> from?

In spite of this flag, I see WRITE_SAME_16 being issued when I format a device bigger than 2 TB;
I tried both xfs and ext4. Windows hosts currently do not handle unsupported commands correctly -
The information returned is not sufficient to effect recovery in the Linux guest. While this may be addressed in
future hosts, this patch fixes the problem.

K. Y

2014-07-09 19:56:38

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: Christoph Hellwig [mailto:[email protected]]
> > Sent: Wednesday, July 9, 2014 1:43 AM
> > To: KY Srinivasan
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
> >
> > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > Host does not handle WRITE_SAME_16; filter this command out. This
> > > patch is required to handle large devices (greater than 2 TB disks).
> >
> > Storvsc already sets the no_write_same flag, where is the command coming
> > from?
>
> In spite of this flag, I see WRITE_SAME_16 being issued when I format a device bigger than 2 TB;
> I tried both xfs and ext4. Windows hosts currently do not handle unsupported commands correctly -
> The information returned is not sufficient to effect recovery in the Linux guest. While this may be addressed in
> future hosts, this patch fixes the problem.

What Christoph means is that this looks like a bug somewhere in SCSI
itself. That means we need to find it and kill it, not add workarounds
to every driver that sets no_write_same ...

James

2014-07-09 20:01:35

[permalink] [raw]

Subject: RE: [PATCH 3/8] Drivers: scsi: storvsc: Fix a bug in handling VMBUS protocol version

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:42 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 3/8] Drivers: scsi: storvsc: Fix a bug in handling VMBUS
> protocol version
>
> > + if ((vmbus_proto_version == VERSION_WS2008) ||
> > + (vmbus_proto_version == VERSION_WIN7)) {
>
> This has superflous braces and doesn't use proper Linux indentation.
>
> But I think simply using a switch here might be cleaner anyway.
I will do as you have suggested.

K. Y

2014-07-09 20:03:07

[permalink] [raw]

Subject: RE: [PATCH 2/8] Drivers: scsi: storvsc: Filter commands based on the storage protocol version

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:41 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 2/8] Drivers: scsi: storvsc: Filter commands based on the
> storage protocol version
>
> On Tue, Jul 08, 2014 at 05:46:46PM -0700, K. Y. Srinivasan wrote:
> > Going forward it is possible that some of the commands that are not
> > currently implemented will be implemented on future Windows hosts.
> > Make command filtering depend on the host version.
>
> > + if (vmstor_current_major <= VMSTOR_WIN8_MAJOR) {
> > + /*
> > + * On legacy hosts filter unimplemented commands.
> > + */
> > + if (!storvsc_scsi_cmd_ok(scmnd)) {
> > + scmnd->scsi_done(scmnd);
> > + return 0;
> > + }
>
> So post-Win8 versions don't need command filering as they can properly
> reject commands they don't understand? If that's the case please state it in
> the patch description.

I will add the comments you have suggested.

K. Y

2014-07-09 20:07:30

[permalink] [raw]

Subject: RE: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the values on the host

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 1:40 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the
> values on the host
>
> On Tue, Jul 08, 2014 at 05:46:45PM -0700, K. Y. Srinivasan wrote:
> > + * In Hyper-V, each port/path/target maps to 1 scsi host adapter.
>
> Does it still? The STORVSC_FC_MAX_TARGETS define suggests otherwise.

I will fix the comments and get rid of unnecessary comments.

>
> > - .cmd_per_lun = 1,
> > + .cmd_per_lun = 255,
>
> This looks like an unrelated change.

I will have a separate patch for this.
>
> > + /* max # of devices per target */
> > + host->max_lun = STORVSC_FC_MAX_LUNS_PER_TARGET;
> > + /* max # of targets per channel */
> > + host->max_id = STORVSC_FC_MAX_TARGETS;
> > + /* max # of channels */
> > + host->max_channel = STORVSC_FC_MAX_CHANNELS - 1;
>
> I don't think these comments add any value..

I will get rid of the comments.

>
> Also any reason you use off by one defines for max_channel, but not the
> others?

No particular reason; I will clean this up.

Thanks Christoph for the detailed comments. I will re-spin these after I address your comments.

Regards,

K. Y

2014-07-09 21:14:07

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: James Bottomley [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 12:57 PM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: Christoph Hellwig [mailto:[email protected]]
> > > Sent: Wednesday, July 9, 2014 1:43 AM
> > > To: KY Srinivasan
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]
> > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > WRITE_SAME_16
> > >
> > > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > > Host does not handle WRITE_SAME_16; filter this command out. This
> > > > patch is required to handle large devices (greater than 2 TB disks).
> > >
> > > Storvsc already sets the no_write_same flag, where is the command
> > > coming from?
> >
> > In spite of this flag, I see WRITE_SAME_16 being issued when I format
> > a device bigger than 2 TB; I tried both xfs and ext4. Windows hosts
> > currently do not handle unsupported commands correctly - The
> > information returned is not sufficient to effect recovery in the Linux guest.
> While this may be addressed in future hosts, this patch fixes the problem.
>
> What Christoph means is that this looks like a bug somewhere in SCSI itself.
> That means we need to find it and kill it, not add workarounds to every driver
> that sets no_write_same ...

James,

I will try to isolate this issue in the SCSI stack. If it is ok with you guys, I would still want to filter WRITE_SAME_16
(as we currently do WRITE_SAME) in our driver since this would address the problem for a large number of
customers on our platform.

Regards,

K. Y

2014-07-09 22:27:31

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-09 at 21:14 +0000, KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: James Bottomley [mailto:[email protected]]
> > Sent: Wednesday, July 9, 2014 12:57 PM
> > To: KY Srinivasan
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
> >
> > On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
> > >
> > > > -----Original Message-----
> > > > From: Christoph Hellwig [mailto:[email protected]]
> > > > Sent: Wednesday, July 9, 2014 1:43 AM
> > > > To: KY Srinivasan
> > > > Cc: [email protected]; [email protected];
> > > > [email protected]; [email protected]; [email protected];
> > > > [email protected]; [email protected];
> > > > [email protected]
> > > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > > WRITE_SAME_16
> > > >
> > > > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > > > Host does not handle WRITE_SAME_16; filter this command out. This
> > > > > patch is required to handle large devices (greater than 2 TB disks).
> > > >
> > > > Storvsc already sets the no_write_same flag, where is the command
> > > > coming from?
> > >
> > > In spite of this flag, I see WRITE_SAME_16 being issued when I format
> > > a device bigger than 2 TB; I tried both xfs and ext4. Windows hosts
> > > currently do not handle unsupported commands correctly - The
> > > information returned is not sufficient to effect recovery in the Linux guest.
> > While this may be addressed in future hosts, this patch fixes the problem.
> >
> > What Christoph means is that this looks like a bug somewhere in SCSI itself.
> > That means we need to find it and kill it, not add workarounds to every driver
> > that sets no_write_same ...
>
> James,
>
> I will try to isolate this issue in the SCSI stack. If it is ok with you guys, I would still want to filter WRITE_SAME_16
> (as we currently do WRITE_SAME) in our driver since this would address the problem for a large number of
> customers on our platform.

If we fix it at source, why would there be any need to filter? That's
the reason the no_write_same flag was introduced. If we can find and
fix the bug, it can go back into the stable trees as a bug fix, hence
nothing should ever emit write_same(10 or 16) and additional driver code
is redundant (and counter productive, since if this ever breaks again
you're our best canary).

This looks like it might be the problem but Martin should confirm (I
think the problem comes to us from the RC16 code which unconditionally
sets WS16).

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 6825eda..8353a4c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -634,6 +634,23 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
max(sdkp->physical_block_size,
sdkp->unmap_granularity * logical_block_size);

+ if (sdkp->device->host->no_write_same) {
+ switch(mode) {
+
+ case SD_LBP_WS16:
+ case SD_LBP_WS10:
+ case SD_LBP_ZERO:
+ /*
+ * filter out all the WRITE SAME modes and map them
+ * directly to UNMAP
+ */
+ mode = SD_LBP_UNMAP;
+ /* fall through */
+ default:
+ /* everything else is OK */
+ break;
+ }
+ }
sdkp->provisioning_mode = mode;

switch (mode) {

2014-07-09 22:36:32

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: James Bottomley [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 3:27 PM
> To: KY Srinivasan
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Wed, 2014-07-09 at 21:14 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: James Bottomley [mailto:[email protected]]
> > > Sent: Wednesday, July 9, 2014 12:57 PM
> > > To: KY Srinivasan
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > WRITE_SAME_16
> > >
> > > On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Christoph Hellwig [mailto:[email protected]]
> > > > > Sent: Wednesday, July 9, 2014 1:43 AM
> > > > > To: KY Srinivasan
> > > > > Cc: [email protected]; [email protected];
> > > > > [email protected]; [email protected];
> [email protected];
> > > > > [email protected]; [email protected];
> > > > > [email protected]
> > > > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > > > WRITE_SAME_16
> > > > >
> > > > > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > > > > Host does not handle WRITE_SAME_16; filter this command out.
> > > > > > This patch is required to handle large devices (greater than 2 TB
> disks).
> > > > >
> > > > > Storvsc already sets the no_write_same flag, where is the
> > > > > command coming from?
> > > >
> > > > In spite of this flag, I see WRITE_SAME_16 being issued when I
> > > > format a device bigger than 2 TB; I tried both xfs and ext4.
> > > > Windows hosts currently do not handle unsupported commands
> > > > correctly - The information returned is not sufficient to effect recovery
> in the Linux guest.
> > > While this may be addressed in future hosts, this patch fixes the problem.
> > >
> > > What Christoph means is that this looks like a bug somewhere in SCSI
> itself.
> > > That means we need to find it and kill it, not add workarounds to
> > > every driver that sets no_write_same ...
> >
> > James,
> >
> > I will try to isolate this issue in the SCSI stack. If it is ok with
> > you guys, I would still want to filter WRITE_SAME_16 (as we currently
> > do WRITE_SAME) in our driver since this would address the problem for a
> large number of customers on our platform.
>
> If we fix it at source, why would there be any need to filter? That's the
> reason the no_write_same flag was introduced. If we can find and fix the
> bug, it can go back into the stable trees as a bug fix, hence nothing should
> ever emit write_same(10 or 16) and additional driver code is redundant (and
> counter productive, since if this ever breaks again you're our best canary).
>
> This looks like it might be the problem but Martin should confirm (I think the
> problem comes to us from the RC16 code which unconditionally sets WS16).

Ok; I am concerned about older kernels that do not have no_write_same flag.
I suppose I can work directly with these Distros and give them a choice: either implement
the no_write_same flag or filter the command in our driver. I will not include this patch when
I resubmit this patch-set.

Thanks,

K. Y

2014-07-09 23:39:47

[permalink] [raw]

Subject: RE: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the values on the host

> -----Original Message-----
> From: [email protected] [mailto:driverdev-
> [email protected]] On Behalf Of KY Srinivasan
> Sent: Wednesday, July 9, 2014 1:07 PM
> To: Christoph Hellwig
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to reflect the
> values on the host
>
>
>
> > -----Original Message-----
> > From: Christoph Hellwig [mailto:[email protected]]
> > Sent: Wednesday, July 9, 2014 1:40 AM
> > To: KY Srinivasan
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 1/8] Drivers: scsi: storvsc: Change the limits to
> > reflect the values on the host
> >
> > On Tue, Jul 08, 2014 at 05:46:45PM -0700, K. Y. Srinivasan wrote:
> > > + * In Hyper-V, each port/path/target maps to 1 scsi host adapter.
> >
> > Does it still? The STORVSC_FC_MAX_TARGETS define suggests otherwise.
>
> I will fix the comments and get rid of unnecessary comments.
>
> >
> > > - .cmd_per_lun = 1,
> > > + .cmd_per_lun = 255,
> >
> > This looks like an unrelated change.
>
> I will have a separate patch for this.
> >
> > > + /* max # of devices per target */
> > > + host->max_lun = STORVSC_FC_MAX_LUNS_PER_TARGET;
> > > + /* max # of targets per channel */
> > > + host->max_id = STORVSC_FC_MAX_TARGETS;
> > > + /* max # of channels */
> > > + host->max_channel = STORVSC_FC_MAX_CHANNELS - 1;
> >
> > I don't think these comments add any value..
>
> I will get rid of the comments.
>
> >
> > Also any reason you use off by one defines for max_channel, but not
> > the others?
>
> No particular reason; I will clean this up.

On further examination max_channel is the maximum number of channels including channel 0. Thus the value set for
max_channel is correct. max_id appears to indicate the limit. In scsi_scan_channel the loop control is (id < max_id) and
hence the value I have here is correct. max_lun is also used like max_id to indicate the limit. In scsi_sequential_lun_scan()
the loop control is (lun < max_dev_lun) and hence I think the value I have here is fine.

Regards,

K. Y

2014-07-10 10:07:24

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, Jul 09, 2014 at 10:36:26PM +0000, KY Srinivasan wrote:
> Ok; I am concerned about older kernels that do not have no_write_same flag.
> I suppose I can work directly with these Distros and give them a choice: either implement
> the no_write_same flag or filter the command in our driver. I will not include this patch when

Exactly. In general they should be interested in the flag as various
raid controllers react badly to it as well.

2014-07-10 10:13:24

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On Wed, Jul 09, 2014 at 06:51:38PM +0000, KY Srinivasan wrote:
> On Azure, we sometimes have unbounded I/O latencies and some distributions
> (such as SLES12) based on recent kernels are invoking the "Abort Handler".

Any kernel will invoke the abort handler if present, and then escalate
to the various resets.

> Unfortunately, our scsi emulation on the host does not support aborting
> a command. The issue I have seen is that the upper level scsi code attempts
> error recovery when the command times out and finally frees up the command.
> The host subsequently responds to the command that has timed out and since
> the memory has been freed up, we end up touching freed memory in this
> driver. Since the host is also doing error recovery, by just delaying the
> error handler in the guest until we can account for all the in-flight
> commands, we can get around the problem.

The storvsc driver does implement an bus reset error handler, and
after that completes successfully the midlayer frees the commands,
and the driver has to deal with this and not call scsi_done after
the reset finished (normally you'd expect the hardware to not complete
requests after an reset).

Note that you could increase the timeout and/or implement an
eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
completion takes too long the expectation is that a command will
eventually finish instead of beeing delayed by an unmound amount.

2014-07-10 10:17:16

[permalink] [raw]

Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

On Wed, Jul 09, 2014 at 06:40:09PM +0000, KY Srinivasan wrote:
> > On Tue, Jul 08, 2014 at 05:46:51PM -0700, K. Y. Srinivasan wrote:
> > > Correctly set SRB flags for all valid I/O directions. Some IHV drivers
> > > on the Windows host require this.
> >
> > What are IHV drivers?
>
> If the target is a SAN device, the host simply passes the request over to the native driver stack on the host. Some specific hardware (IHV - independent hadware vendor) drivers on Windows require that SRB flags be correctly set in all cases.
>

I'm fine with putting this in, but treating I/O request from guests as
trusted in a hypervisor doesn't seem like a good idea in general..

2014-07-10 10:18:54

[permalink] [raw]

Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

> default:
> vm_srb->data_in = UNKNOWN_TYPE;
> - vm_srb->win8_extension.srb_flags = 0;
> + vm_srb->win8_extension.srb_flags |= (SRB_FLAGS_DATA_IN |
> + SRB_FLAGS_DATA_OUT);

This would usually be a command that doesn't transfer data (e.g.
TEST_UNIT_READY or SYNCHRONIZE_CACHE), do you really want to set the in
and out flags here?

2014-07-10 10:33:09

by Richard Weinberger

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan <[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: Christoph Hellwig [mailto:[email protected]]
>> Sent: Wednesday, July 9, 2014 1:44 AM
>> To: KY Srinivasan
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>
>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
>> > Implement a simple abort handler. The host does not support "Abort";
>> > just ensure that all inflight I/Os have been accounted for.
>>
>> The abort handler should abort a single command, not wait for all of them.
>> What issue do you see that this tries to address?
>
> On Azure, we sometimes have unbounded I/O latencies and some distributions (such as SLES12) based on recent kernels are invoking
> the "Abort Handler". Unfortunately, our scsi emulation on the host does not support aborting a command.
> The issue I have seen is that the upper level scsi code attempts error recovery when the command times out and finally frees up the command.
> The host subsequently responds to the command that has timed out and since the memory has been freed up, we end up touching freed memory
> in this driver. Since the host is also doing error recovery, by just delaying the error handler in the guest until we can account for all the in-flight commands,
> we can get around the problem.

I see strange issues in Azure and maybe they are related to this.
Some Linux machines crash in a way that no disk IO is possible (thus,
no SSH for me) but they still respond to
ping. It happens rather seldom (every few weeks).

Do you see similar symptoms?

--
Thanks,
//richard

2014-07-10 21:02:50

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: James Bottomley [mailto:[email protected]]
> Sent: Wednesday, July 9, 2014 3:27 PM
> To: KY Srinivasan
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Wed, 2014-07-09 at 21:14 +0000, KY Srinivasan wrote:
> >
> > > -----Original Message-----
> > > From: James Bottomley [mailto:[email protected]]
> > > Sent: Wednesday, July 9, 2014 12:57 PM
> > > To: KY Srinivasan
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > WRITE_SAME_16
> > >
> > > On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Christoph Hellwig [mailto:[email protected]]
> > > > > Sent: Wednesday, July 9, 2014 1:43 AM
> > > > > To: KY Srinivasan
> > > > > Cc: [email protected]; [email protected];
> > > > > [email protected]; [email protected];
> [email protected];
> > > > > [email protected]; [email protected];
> > > > > [email protected]
> > > > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > > > WRITE_SAME_16
> > > > >
> > > > > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > > > > Host does not handle WRITE_SAME_16; filter this command out.
> > > > > > This patch is required to handle large devices (greater than 2 TB
> disks).
> > > > >
> > > > > Storvsc already sets the no_write_same flag, where is the
> > > > > command coming from?
> > > >
> > > > In spite of this flag, I see WRITE_SAME_16 being issued when I
> > > > format a device bigger than 2 TB; I tried both xfs and ext4.
> > > > Windows hosts currently do not handle unsupported commands
> > > > correctly - The information returned is not sufficient to effect recovery
> in the Linux guest.
> > > While this may be addressed in future hosts, this patch fixes the problem.
> > >
> > > What Christoph means is that this looks like a bug somewhere in SCSI
> itself.
> > > That means we need to find it and kill it, not add workarounds to
> > > every driver that sets no_write_same ...
> >
> > James,
> >
> > I will try to isolate this issue in the SCSI stack. If it is ok with
> > you guys, I would still want to filter WRITE_SAME_16 (as we currently
> > do WRITE_SAME) in our driver since this would address the problem for a
> large number of customers on our platform.
>
> If we fix it at source, why would there be any need to filter? That's the
> reason the no_write_same flag was introduced. If we can find and fix the
> bug, it can go back into the stable trees as a bug fix, hence nothing should
> ever emit write_same(10 or 16) and additional driver code is redundant (and
> counter productive, since if this ever breaks again you're our best canary).
>
> This looks like it might be the problem but Martin should confirm (I think the
> problem comes to us from the RC16 code which unconditionally sets WS16).
>
> James

James,

This patch works for me; are you planning on committing this patch.

Regards,

K. Y
>
> ---
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 6825eda..8353a4c
> 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -634,6 +634,23 @@ static void sd_config_discard(struct scsi_disk *sdkp,
> unsigned int mode)
> max(sdkp->physical_block_size,
> sdkp->unmap_granularity * logical_block_size);
>
> + if (sdkp->device->host->no_write_same) {
> + switch(mode) {
> +
> + case SD_LBP_WS16:
> + case SD_LBP_WS10:
> + case SD_LBP_ZERO:
> + /*
> + * filter out all the WRITE SAME modes and map them
> + * directly to UNMAP
> + */
> + mode = SD_LBP_UNMAP;
> + /* fall through */
> + default:
> + /* everything else is OK */
> + break;
> + }
> + }
> sdkp->provisioning_mode = mode;
>
> switch (mode) {

2014-07-10 22:12:57

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Thu, 2014-07-10 at 21:02 +0000, KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: James Bottomley [mailto:[email protected]]
> > Sent: Wednesday, July 9, 2014 3:27 PM
> > To: KY Srinivasan
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
> >
> > On Wed, 2014-07-09 at 21:14 +0000, KY Srinivasan wrote:
> > >
> > > > -----Original Message-----
> > > > From: James Bottomley [mailto:[email protected]]
> > > > Sent: Wednesday, July 9, 2014 12:57 PM
> > > > To: KY Srinivasan
> > > > Cc: [email protected]; [email protected];
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected];
> > > > [email protected]; [email protected]
> > > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > > WRITE_SAME_16
> > > >
> > > > On Wed, 2014-07-09 at 19:52 +0000, KY Srinivasan wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Christoph Hellwig [mailto:[email protected]]
> > > > > > Sent: Wednesday, July 9, 2014 1:43 AM
> > > > > > To: KY Srinivasan
> > > > > > Cc: [email protected]; [email protected];
> > > > > > [email protected]; [email protected];
> > [email protected];
> > > > > > [email protected]; [email protected];
> > > > > > [email protected]
> > > > > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > > > > WRITE_SAME_16
> > > > > >
> > > > > > On Tue, Jul 08, 2014 at 05:46:48PM -0700, K. Y. Srinivasan wrote:
> > > > > > > Host does not handle WRITE_SAME_16; filter this command out.
> > > > > > > This patch is required to handle large devices (greater than 2 TB
> > disks).
> > > > > >
> > > > > > Storvsc already sets the no_write_same flag, where is the
> > > > > > command coming from?
> > > > >
> > > > > In spite of this flag, I see WRITE_SAME_16 being issued when I
> > > > > format a device bigger than 2 TB; I tried both xfs and ext4.
> > > > > Windows hosts currently do not handle unsupported commands
> > > > > correctly - The information returned is not sufficient to effect recovery
> > in the Linux guest.
> > > > While this may be addressed in future hosts, this patch fixes the problem.
> > > >
> > > > What Christoph means is that this looks like a bug somewhere in SCSI
> > itself.
> > > > That means we need to find it and kill it, not add workarounds to
> > > > every driver that sets no_write_same ...
> > >
> > > James,
> > >
> > > I will try to isolate this issue in the SCSI stack. If it is ok with
> > > you guys, I would still want to filter WRITE_SAME_16 (as we currently
> > > do WRITE_SAME) in our driver since this would address the problem for a
> > large number of customers on our platform.
> >
> > If we fix it at source, why would there be any need to filter? That's the
> > reason the no_write_same flag was introduced. If we can find and fix the
> > bug, it can go back into the stable trees as a bug fix, hence nothing should
> > ever emit write_same(10 or 16) and additional driver code is redundant (and
> > counter productive, since if this ever breaks again you're our best canary).
> >
> > This looks like it might be the problem but Martin should confirm (I think the
> > problem comes to us from the RC16 code which unconditionally sets WS16).
> >
> > James
>
> James,
>
> This patch works for me; are you planning on committing this patch.

OK, if we go with this we can add your tested by.

It's not going in until Martin takes a look because there may be a
better way of doing this.

James

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2014-07-10 22:23:03

[permalink] [raw]

Subject: RE: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Thursday, July 10, 2014 3:17 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases
>
> On Wed, Jul 09, 2014 at 06:40:09PM +0000, KY Srinivasan wrote:
> > > On Tue, Jul 08, 2014 at 05:46:51PM -0700, K. Y. Srinivasan wrote:
> > > > Correctly set SRB flags for all valid I/O directions. Some IHV
> > > > drivers on the Windows host require this.
> > >
> > > What are IHV drivers?
> >
> > If the target is a SAN device, the host simply passes the request over to the
> native driver stack on the host. Some specific hardware (IHV - independent
> hadware vendor) drivers on Windows require that SRB flags be correctly set
> in all cases.
> >
>
> I'm fine with putting this in, but treating I/O request from guests as trusted in
> a hypervisor doesn't seem like a good idea in general..

The host does validate the guest request before forwarding the request to the appropriate target.

K. Y

2014-07-10 22:26:19

[permalink] [raw]

Subject: RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Thursday, July 10, 2014 3:13 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>
>
> Note that you could increase the timeout and/or implement an
> eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
> completion takes too long the expectation is that a command will eventually
> finish instead of beeing delayed by an unmound amount.

I like this idea; I will implement a eh_timed_out_handler.

K. Y

2014-07-11 06:32:20

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, Jul 09, 2014 at 10:27:24PM +0000, James Bottomley wrote:
> If we fix it at source, why would there be any need to filter? That's
> the reason the no_write_same flag was introduced. If we can find and
> fix the bug, it can go back into the stable trees as a bug fix, hence
> nothing should ever emit write_same(10 or 16) and additional driver code
> is redundant (and counter productive, since if this ever breaks again
> you're our best canary).
>
> This looks like it might be the problem but Martin should confirm (I
> think the problem comes to us from the RC16 code which unconditionally
> sets WS16).

I think the problem is a differnet one. If we have the logical
provisioning EVPD it configures what method to use, but if we don't have
one we simply check for a max unmap blocks field, and if that's not
present use WRITE SAME.

The patch checks the no_write_same flag before doing that, for which
we also have to do the write_same setup before the discard setup
in sd_revalidate_disk.

Ky: does hyperv support UNMAP? If so any idea why it doesn't set
the maximum unmap block count field in the EVPD?

If we want to enable UNMAP in this case I'd prefer a blacklist entry
than trying UNMAP despite the device not advertising it.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ba756b1..fbccfd2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2614,9 +2614,10 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)

if (sdkp->max_unmap_blocks)
sd_config_discard(sdkp, SD_LBP_UNMAP);
- else
+ else if (!sdkp->device->no_write_same)
sd_config_discard(sdkp, SD_LBP_WS16);
-
+ else
+ sd_config_discard(sdkp, SD_LBP_DISABLE);
} else { /* LBP VPD page tells us what to use */

if (sdkp->lbpu && sdkp->max_unmap_blocks)
@@ -2766,6 +2767,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
*/
if (sdkp->media_present) {
sd_read_capacity(sdkp, buffer);
+ sd_read_write_same(sdkp, buffer);

if (sd_try_extended_inquiry(sdp)) {
sd_read_block_provisioning(sdkp);
@@ -2776,7 +2778,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_read_write_protect_flag(sdkp, buffer);
sd_read_cache_type(sdkp, buffer);
sd_read_app_tag_own(sdkp, buffer);
- sd_read_write_same(sdkp, buffer);
}

sdkp->first_scan = 0;

2014-07-11 09:52:59

by Hannes Reinecke

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On 07/11/2014 12:26 AM, KY Srinivasan wrote:
>
>
>> -----Original Message-----
>> From: Christoph Hellwig [mailto:[email protected]]
>> Sent: Thursday, July 10, 2014 3:13 AM
>> To: KY Srinivasan
>> Cc: Christoph Hellwig; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>
>>
>> Note that you could increase the timeout and/or implement an
>> eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
>> completion takes too long the expectation is that a command will eventually
>> finish instead of beeing delayed by an unmound amount.
>
> I like this idea; I will implement a eh_timed_out_handler.
>
Something like this should be sufficient:

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index e71a0d7..630ae81 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1468,6 +1468,12 @@ static int storvsc_get_chs(struct scsi_device
*sdev, stru
ct block_device * bdev,
return 0;
}

+static enum blk_eh_timer_return
+storvsc_timed_out_handler(struct scsi_cmnd *scmd)
+{
+ return BLK_EH_RESET_TIMER;
+}
+
static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
{
struct hv_host_device *host_dev =
shost_priv(scmnd->device->host);
@@ -1687,6 +1693,7 @@ static struct scsi_host_template scsi_driver = {
.name = "storvsc_host_t",
.bios_param = storvsc_get_chs,
.queuecommand = storvsc_queuecommand,
+ .eh_timed_out = storvsc_timed_out_handler,
.eh_host_reset_handler = storvsc_host_reset_handler,
.slave_alloc = storvsc_device_alloc,
.slave_destroy = storvsc_device_destroy,

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: J. Hawn, J. Guild, F. Imend?rffer, HRB 16746 (AG N?rnberg)

2014-07-11 09:54:11

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On Fri, Jul 11, 2014 at 11:52:55AM +0200, Hannes Reinecke wrote:
> Something like this should be sufficient:

Right. That plus a detailed comment explaining why it's there..

2014-07-11 12:55:07

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "hch" == hch@infradead org <[email protected]> writes:

(Back from vacation: Bear with me while I'm catching up on two weeks of
linux-scsi stuff...)

hch> I think the problem is a differnet one. If we have the logical
hch> provisioning EVPD it configures what method to use, but if we don't
hch> have one we simply check for a max unmap blocks field, and if
hch> that's not present use WRITE SAME.

Yeah, that was done to accommodate the devices out there that predate
the LBP VPD. There sadly are several still. And it's hard to motivate
people to update their storage array firmware even when updates are
readily available.

hch> The patch checks the no_write_same flag before doing that, for
hch> which we also have to do the write_same setup before the discard
hch> setup in sd_revalidate_disk.

The no_write_same was introduced to disable the REQ_WRITE_SAME use case
where we have no INQUIRY/READ CAPACITY/VPD flags to key off of to
determine whether it is safe to send the commands.

no_write_same does not preclude the REQ_DISCARD variants of
WRITE_SAME. Those are gated by the discard heuristics exclusively.

So first of all I'd like to know whether it's a WRITE SAME(16) or a
WRITE SAME(16) with the UNMAP bit set that's coming down.

hch> Ky: does hyperv support UNMAP? If so any idea why it doesn't set
hch> the maximum unmap block count field in the EVPD?

We shouldn't be sending down WRITE SAME(16) with the UNMAP bit set
unless the device sets LBPME=1 in the READ CAPACITY(16) response.

So what does the storsvc report as its thin provisioning capabilities?

--
Martin K. Petersen Oracle Linux Engineering

2014-07-11 14:49:23

[permalink] [raw]

Subject: RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

> -----Original Message-----
> From: Hannes Reinecke [mailto:[email protected]]
> Sent: Friday, July 11, 2014 2:53 AM
> To: KY Srinivasan; Christoph Hellwig
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>
> On 07/11/2014 12:26 AM, KY Srinivasan wrote:
> >
> >
> >> -----Original Message-----
> >> From: Christoph Hellwig [mailto:[email protected]]
> >> Sent: Thursday, July 10, 2014 3:13 AM
> >> To: KY Srinivasan
> >> Cc: Christoph Hellwig; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort
> >> handler
> >>
> >>
> >> Note that you could increase the timeout and/or implement an
> >> eh_timed_out handler that just returns BLK_EH_RESET_TIMER, but if the
> >> completion takes too long the expectation is that a command will
> >> eventually finish instead of beeing delayed by an unmound amount.
> >
> > I like this idea; I will implement a eh_timed_out_handler.
> >
> Something like this should be sufficient:
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index
> e71a0d7..630ae81 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1468,6 +1468,12 @@ static int storvsc_get_chs(struct scsi_device
> *sdev, stru ct block_device * bdev,
> return 0;
> }
>
> +static enum blk_eh_timer_return
> +storvsc_timed_out_handler(struct scsi_cmnd *scmd) {
> + return BLK_EH_RESET_TIMER;
> +}
> +
> static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
> {
> struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
> @@ -1687,6 +1693,7 @@ static struct scsi_host_template scsi_driver = {
> .name = "storvsc_host_t",
> .bios_param = storvsc_get_chs,
> .queuecommand = storvsc_queuecommand,
> + .eh_timed_out = storvsc_timed_out_handler,
> .eh_host_reset_handler = storvsc_host_reset_handler,
> .slave_alloc = storvsc_device_alloc,
> .slave_destroy = storvsc_device_destroy,
>
> Cheers,

Thanks Hannes. Based on Christoph's feedback I have implemented exactly this patch. It is under test on Azure.
I will be re-spinning and posting the patches shortly.

K. Y
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> [email protected] +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N?rnberg
> GF: J. Hawn, J. Guild, F. Imend?rffer, HRB 16746 (AG N?rnberg)

2014-07-11 21:04:17

[permalink] [raw]

Subject: RE: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Thursday, July 10, 2014 3:19 AM
> To: KY Srinivasan
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 7/8] drivers: scsi: storvsc: Set srb_flags in all cases
>
> > default:
> > vm_srb->data_in = UNKNOWN_TYPE;
> > - vm_srb->win8_extension.srb_flags = 0;
> > + vm_srb->win8_extension.srb_flags |=
> (SRB_FLAGS_DATA_IN |
> > + SRB_FLAGS_DATA_OUT);
>
> This would usually be a command that doesn't transfer data (e.g.
> TEST_UNIT_READY or SYNCHRONIZE_CACHE), do you really want to set the in
> and out flags here?

On the host, before they forward the command to the native driver stack, I am told they validate that the flags be correctly set because of some bugs in the Emulex driver.

K. Y

2014-07-12 02:50:25

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Thursday, July 10, 2014 11:32 PM
> To: James Bottomley
> Cc: KY Srinivasan; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Wed, Jul 09, 2014 at 10:27:24PM +0000, James Bottomley wrote:
> > If we fix it at source, why would there be any need to filter? That's
> > the reason the no_write_same flag was introduced. If we can find and
> > fix the bug, it can go back into the stable trees as a bug fix, hence
> > nothing should ever emit write_same(10 or 16) and additional driver
> > code is redundant (and counter productive, since if this ever breaks
> > again you're our best canary).
> >
> > This looks like it might be the problem but Martin should confirm (I
> > think the problem comes to us from the RC16 code which unconditionally
> > sets WS16).
>
> I think the problem is a differnet one. If we have the logical provisioning
> EVPD it configures what method to use, but if we don't have one we simply
> check for a max unmap blocks field, and if that's not present use WRITE
> SAME.
>
> The patch checks the no_write_same flag before doing that, for which we
> also have to do the write_same setup before the discard setup in
> sd_revalidate_disk.
>
> Ky: does hyperv support UNMAP? If so any idea why it doesn't set the
> maximum unmap block count field in the EVPD?

Windows hosts do support UNMAP and set the field in the EVPD. However, since the host
advertises SPC-2 compliance, Linux does not even query the VPD page.

>
> If we want to enable UNMAP in this case I'd prefer a blacklist entry than
> trying UNMAP despite the device not advertising it.
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index ba756b1..fbccfd2 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2614,9 +2614,10 @@ static void sd_read_block_limits(struct scsi_disk
> *sdkp)
>
> if (sdkp->max_unmap_blocks)
> sd_config_discard(sdkp, SD_LBP_UNMAP);
> - else
> + else if (!sdkp->device->no_write_same)
> sd_config_discard(sdkp, SD_LBP_WS16);
> -
> + else
> + sd_config_discard(sdkp, SD_LBP_DISABLE);
> } else { /* LBP VPD page tells us what to use */
>
> if (sdkp->lbpu && sdkp->max_unmap_blocks) @@ -
> 2766,6 +2767,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> */
> if (sdkp->media_present) {
> sd_read_capacity(sdkp, buffer);
> + sd_read_write_same(sdkp, buffer);
>
> if (sd_try_extended_inquiry(sdp)) {
> sd_read_block_provisioning(sdkp);
> @@ -2776,7 +2778,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
> sd_read_write_protect_flag(sdkp, buffer);
> sd_read_cache_type(sdkp, buffer);
> sd_read_app_tag_own(sdkp, buffer);
> - sd_read_write_same(sdkp, buffer);
> }
>
> sdkp->first_scan = 0;

2014-07-12 02:54:14

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: Martin K. Petersen [mailto:[email protected]]
> Sent: Friday, July 11, 2014 5:54 AM
> To: [email protected]
> Cc: James Bottomley; KY Srinivasan; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> >>>>> "hch" == hch@infradead org <[email protected]> writes:
>
> (Back from vacation: Bear with me while I'm catching up on two weeks of
> linux-scsi stuff...)
>
> hch> I think the problem is a differnet one. If we have the logical
> hch> provisioning EVPD it configures what method to use, but if we don't
> hch> have one we simply check for a max unmap blocks field, and if
> hch> that's not present use WRITE SAME.
>
> Yeah, that was done to accommodate the devices out there that predate the
> LBP VPD. There sadly are several still. And it's hard to motivate people to
> update their storage array firmware even when updates are readily available.
>
> hch> The patch checks the no_write_same flag before doing that, for
> hch> which we also have to do the write_same setup before the discard
> hch> setup in sd_revalidate_disk.
>
> The no_write_same was introduced to disable the REQ_WRITE_SAME use
> case where we have no INQUIRY/READ CAPACITY/VPD flags to key off of to
> determine whether it is safe to send the commands.
>
> no_write_same does not preclude the REQ_DISCARD variants of
> WRITE_SAME. Those are gated by the discard heuristics exclusively.
>
> So first of all I'd like to know whether it's a WRITE SAME(16) or a WRITE
> SAME(16) with the UNMAP bit set that's coming down.
>
> hch> Ky: does hyperv support UNMAP? If so any idea why it doesn't set
> hch> the maximum unmap block count field in the EVPD?
>
> We shouldn't be sending down WRITE SAME(16) with the UNMAP bit set
> unless the device sets LBPME=1 in the READ CAPACITY(16) response.
>
> So what does the storsvc report as its thin provisioning capabilities?
Given that our current Host (ws2012 r2) advertises SPC-2 compliance, Linux does not even query this page. We support UNMAP.
We just prototyped a host where we advertised SPC-3 compliance and Linux queries the EVPD page and does not use WRITE_SAME_16

K. Y

2014-07-12 16:16:40

by Richard Weinberger

[permalink] [raw]

Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

On Thu, Jul 10, 2014 at 12:33 PM, Richard Weinberger
<[email protected]> wrote:
> On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan <[email protected]> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Christoph Hellwig [mailto:[email protected]]
>>> Sent: Wednesday, July 9, 2014 1:44 AM
>>> To: KY Srinivasan
>>> Cc: [email protected]; [email protected];
>>> [email protected]; [email protected]; [email protected];
>>> [email protected]; [email protected]
>>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>>>
>>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
>>> > Implement a simple abort handler. The host does not support "Abort";
>>> > just ensure that all inflight I/Os have been accounted for.
>>>
>>> The abort handler should abort a single command, not wait for all of them.
>>> What issue do you see that this tries to address?
>>
>> On Azure, we sometimes have unbounded I/O latencies and some distributions (such as SLES12) based on recent kernels are invoking
>> the "Abort Handler". Unfortunately, our scsi emulation on the host does not support aborting a command.
>> The issue I have seen is that the upper level scsi code attempts error recovery when the command times out and finally frees up the command.
>> The host subsequently responds to the command that has timed out and since the memory has been freed up, we end up touching freed memory
>> in this driver. Since the host is also doing error recovery, by just delaying the error handler in the guest until we can account for all the in-flight commands,
>> we can get around the problem.
>
> I see strange issues in Azure and maybe they are related to this.
> Some Linux machines crash in a way that no disk IO is possible (thus,
> no SSH for me) but they still respond to
> ping. It happens rather seldom (every few weeks).
>
> Do you see similar symptoms?

ping?

--
Thanks,
//richard

2014-07-12 16:35:17

[permalink] [raw]

Subject: RE: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler

> -----Original Message-----
> From: Richard Weinberger [mailto:[email protected]]
> Sent: Saturday, July 12, 2014 9:17 AM
> To: KY Srinivasan
> Cc: Christoph Hellwig; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort handler
>
> On Thu, Jul 10, 2014 at 12:33 PM, Richard Weinberger
> <[email protected]> wrote:
> > On Wed, Jul 9, 2014 at 8:51 PM, KY Srinivasan <[email protected]> wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Christoph Hellwig [mailto:[email protected]]
> >>> Sent: Wednesday, July 9, 2014 1:44 AM
> >>> To: KY Srinivasan
> >>> Cc: [email protected]; [email protected];
> >>> [email protected]; [email protected]; [email protected];
> >>> [email protected]; [email protected]
> >>> Subject: Re: [PATCH 6/8] Drivers: scsi: storvsc: Implement an abort
> >>> handler
> >>>
> >>> On Tue, Jul 08, 2014 at 05:46:50PM -0700, K. Y. Srinivasan wrote:
> >>> > Implement a simple abort handler. The host does not support
> >>> > "Abort"; just ensure that all inflight I/Os have been accounted for.
> >>>
> >>> The abort handler should abort a single command, not wait for all of
> them.
> >>> What issue do you see that this tries to address?
> >>
> >> On Azure, we sometimes have unbounded I/O latencies and some
> >> distributions (such as SLES12) based on recent kernels are invoking the
> "Abort Handler". Unfortunately, our scsi emulation on the host does not
> support aborting a command.
> >> The issue I have seen is that the upper level scsi code attempts error
> recovery when the command times out and finally frees up the command.
> >> The host subsequently responds to the command that has timed out and
> >> since the memory has been freed up, we end up touching freed memory
> >> in this driver. Since the host is also doing error recovery, by just delaying
> the error handler in the guest until we can account for all the in-flight
> commands, we can get around the problem.
> >
> > I see strange issues in Azure and maybe they are related to this.
> > Some Linux machines crash in a way that no disk IO is possible (thus,
> > no SSH for me) but they still respond to ping. It happens rather
> > seldom (every few weeks).
> >
> > Do you see similar symptoms?
>
> ping?

Sorry for the delayed response. Yes we have seen resets and potentially the file system mounted
Read-only because of the I/O timeouts. We have increased the standard scsi timeouts. Implementing the
Timedout handler as we have done now should solve this problem.

K. Y
>
> --
> Thanks,
> //richard
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2014-07-13 12:59:35

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "KY" == KY Srinivasan <[email protected]> writes:

KY> Windows hosts do support UNMAP and set the field in the
KY> EVPD. However, since the host advertises SPC-2 compliance, Linux
KY> does not even query the VPD page.

>> If we want to enable UNMAP in this case I'd prefer a blacklist entry
>> than trying UNMAP despite the device not advertising it.

I agree with that. We could do something like the patch below.

However, I do think it's a good idea that you guys are looking into
reporting SPC-3.

SCSI: Add a blacklist flag which enables VPD page inquiries

Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.

Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.

Reported-by: KY Srinivasan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 4a6e4ba5a400..a5b1a224628a 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -949,7 +949,9 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,

sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;

- if (*bflags & BLIST_SKIP_VPD_PAGES)
+ if (*bflags & BLIST_TRY_VPD_PAGES)
+ sdev->try_vpd_pages = 1;
+ else if (*bflags & BLIST_SKIP_VPD_PAGES)
sdev->skip_vpd_pages = 1;

transport_configure_device(&sdev->sdev_gendev);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 87566b51fcf7..31d32b9077ca 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2701,6 +2701,11 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)

static int sd_try_extended_inquiry(struct scsi_device *sdp)
{
+ /* Attempt VPD inquiry if the device blacklist explicitly calls
+ * for it.
+ */
+ if (sdp->try_vpd_pages)
+ return 1;
/*
* Although VPD inquiries can go to SCSI-2 type devices,
* some USB ones crash on receiving them, and the pages
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9aa38f7b303b..f579408620f0 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -155,6 +155,7 @@ struct scsi_device {
unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08 */
unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */
unsigned skip_vpd_pages:1; /* do not read VPD pages */
+ unsigned try_vpd_pages:1; /* attempt to read VPD pages */
unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page 0x3f */
unsigned no_start_on_add:1; /* do not issue start on add */
unsigned allow_restart:1; /* issue START_UNIT in error handler */
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 8670c04e199e..1fdd6fc5492b 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -34,4 +34,5 @@
#define BLIST_SKIP_VPD_PAGES 0x4000000 /* Ignore SBC-3 VPD pages */
#define BLIST_SCSI3LUN 0x8000000 /* Scan more than 256 LUNs
for sequential scan */
+#define BLIST_TRY_VPD_PAGES 0x10000000 /* Attempt to read VPD pages */
#endif

2014-07-13 18:49:54

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: Martin K. Petersen [mailto:[email protected]]
> Sent: Sunday, July 13, 2014 5:59 AM
> To: KY Srinivasan
> Cc: [email protected]; James Bottomley; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> >>>>> "KY" == KY Srinivasan <[email protected]> writes:
>
> KY> Windows hosts do support UNMAP and set the field in the EVPD.
> KY> However, since the host advertises SPC-2 compliance, Linux does not
> KY> even query the VPD page.
>
> >> If we want to enable UNMAP in this case I'd prefer a blacklist entry
> >> than trying UNMAP despite the device not advertising it.
>
> I agree with that. We could do something like the patch below.
>
> However, I do think it's a good idea that you guys are looking into reporting
> SPC-3.

Thanks Martin; this patch would address our present issues. I will get some testing done
and report back.

Regards,

K. Y
>
>
> SCSI: Add a blacklist flag which enables VPD page inquiries
>
> Despite supporting modern SCSI features some storage devices continue to
> claim conformance to an older version of the SPC spec. This is done for
> compatibility with legacy operating systems.
>
> Linux by default will not attempt to read VPD pages on devices that claim
> SPC-2 or older. Introduce a blacklist flag that can be used to trigger VPD page
> inquiries on devices that are known to support them.
>
> Reported-by: KY Srinivasan <[email protected]>
> Signed-off-by: Martin K. Petersen <[email protected]>
>
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index
> 4a6e4ba5a400..a5b1a224628a 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -949,7 +949,9 @@ static int scsi_add_lun(struct scsi_device *sdev,
> unsigned char *inq_result,
>
> sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
>
> - if (*bflags & BLIST_SKIP_VPD_PAGES)
> + if (*bflags & BLIST_TRY_VPD_PAGES)
> + sdev->try_vpd_pages = 1;
> + else if (*bflags & BLIST_SKIP_VPD_PAGES)
> sdev->skip_vpd_pages = 1;
>
> transport_configure_device(&sdev->sdev_gendev);
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index
> 87566b51fcf7..31d32b9077ca 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2701,6 +2701,11 @@ static void sd_read_write_same(struct scsi_disk
> *sdkp, unsigned char *buffer)
>
> static int sd_try_extended_inquiry(struct scsi_device *sdp) {
> + /* Attempt VPD inquiry if the device blacklist explicitly calls
> + * for it.
> + */
> + if (sdp->try_vpd_pages)
> + return 1;
> /*
> * Although VPD inquiries can go to SCSI-2 type devices,
> * some USB ones crash on receiving them, and the pages diff --git
> a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index
> 9aa38f7b303b..f579408620f0 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -155,6 +155,7 @@ struct scsi_device {
> unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08
> */
> unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f
> */
> unsigned skip_vpd_pages:1; /* do not read VPD pages */
> + unsigned try_vpd_pages:1; /* attempt to read VPD pages */
> unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page
> 0x3f */
> unsigned no_start_on_add:1; /* do not issue start on add */
> unsigned allow_restart:1; /* issue START_UNIT in error handler */
> diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h index
> 8670c04e199e..1fdd6fc5492b 100644
> --- a/include/scsi/scsi_devinfo.h
> +++ b/include/scsi/scsi_devinfo.h
> @@ -34,4 +34,5 @@
> #define BLIST_SKIP_VPD_PAGES 0x4000000 /* Ignore SBC-3 VPD pages
> */
> #define BLIST_SCSI3LUN 0x8000000 /* Scan more than 256
> LUNs
> for sequential scan */
> +#define BLIST_TRY_VPD_PAGES 0x10000000 /* Attempt to read VPD
> pages */
> #endif

2014-07-14 02:38:00

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: KY Srinivasan
> Sent: Sunday, July 13, 2014 11:50 AM
> To: 'Martin K. Petersen'
> Cc: [email protected]; James Bottomley; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
>
>
> > -----Original Message-----
> > From: Martin K. Petersen [mailto:[email protected]]
> > Sent: Sunday, July 13, 2014 5:59 AM
> > To: KY Srinivasan
> > Cc: [email protected]; James Bottomley; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]
> > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
> >
> > >>>>> "KY" == KY Srinivasan <[email protected]> writes:
> >
> > KY> Windows hosts do support UNMAP and set the field in the EVPD.
> > KY> However, since the host advertises SPC-2 compliance, Linux does
> > KY> not even query the VPD page.
> >
> > >> If we want to enable UNMAP in this case I'd prefer a blacklist
> > >> entry than trying UNMAP despite the device not advertising it.
> >
> > I agree with that. We could do something like the patch below.
> >
> > However, I do think it's a good idea that you guys are looking into
> > reporting SPC-3.
>
> Thanks Martin; this patch would address our present issues. I will get some
> testing done and report back.

How should we force the flag (BLIST_TRY_VPD_PAGES) to be set.

K. Y
>
> Regards,
>
> K. Y
> >
> >
> > SCSI: Add a blacklist flag which enables VPD page inquiries
> >
> > Despite supporting modern SCSI features some storage devices continue
> > to claim conformance to an older version of the SPC spec. This is done
> > for compatibility with legacy operating systems.
> >
> > Linux by default will not attempt to read VPD pages on devices that
> > claim
> > SPC-2 or older. Introduce a blacklist flag that can be used to trigger
> > VPD page inquiries on devices that are known to support them.
> >
> > Reported-by: KY Srinivasan <[email protected]>
> > Signed-off-by: Martin K. Petersen <[email protected]>
> >
> > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index
> > 4a6e4ba5a400..a5b1a224628a 100644
> > --- a/drivers/scsi/scsi_scan.c
> > +++ b/drivers/scsi/scsi_scan.c
> > @@ -949,7 +949,9 @@ static int scsi_add_lun(struct scsi_device *sdev,
> > unsigned char *inq_result,
> >
> > sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
> >
> > - if (*bflags & BLIST_SKIP_VPD_PAGES)
> > + if (*bflags & BLIST_TRY_VPD_PAGES)
> > + sdev->try_vpd_pages = 1;
> > + else if (*bflags & BLIST_SKIP_VPD_PAGES)
> > sdev->skip_vpd_pages = 1;
> >
> > transport_configure_device(&sdev->sdev_gendev);
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index
> > 87566b51fcf7..31d32b9077ca 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -2701,6 +2701,11 @@ static void sd_read_write_same(struct scsi_disk
> > *sdkp, unsigned char *buffer)
> >
> > static int sd_try_extended_inquiry(struct scsi_device *sdp) {
> > + /* Attempt VPD inquiry if the device blacklist explicitly calls
> > + * for it.
> > + */
> > + if (sdp->try_vpd_pages)
> > + return 1;
> > /*
> > * Although VPD inquiries can go to SCSI-2 type devices,
> > * some USB ones crash on receiving them, and the pages diff --git
> > a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index
> > 9aa38f7b303b..f579408620f0 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -155,6 +155,7 @@ struct scsi_device {
> > unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08
> > */
> > unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f
> > */
> > unsigned skip_vpd_pages:1; /* do not read VPD pages */
> > + unsigned try_vpd_pages:1; /* attempt to read VPD pages */
> > unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page
> 0x3f
> > */
> > unsigned no_start_on_add:1; /* do not issue start on add */
> > unsigned allow_restart:1; /* issue START_UNIT in error handler */
> > diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
> > index 8670c04e199e..1fdd6fc5492b 100644
> > --- a/include/scsi/scsi_devinfo.h
> > +++ b/include/scsi/scsi_devinfo.h
> > @@ -34,4 +34,5 @@
> > #define BLIST_SKIP_VPD_PAGES 0x4000000 /* Ignore SBC-3 VPD pages
> > */
> > #define BLIST_SCSI3LUN 0x8000000 /* Scan more than 256
> > LUNs
> > for sequential scan */
> > +#define BLIST_TRY_VPD_PAGES 0x10000000 /* Attempt to read VPD
> > pages */
> > #endif

2014-07-14 03:22:35

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: [email protected] [mailto:driverdev-
> [email protected]] On Behalf Of KY Srinivasan
> Sent: Sunday, July 13, 2014 7:38 PM
> To: Martin K. Petersen
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; James Bottomley; [email protected];
> [email protected]
> Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
>
>
> > -----Original Message-----
> > From: KY Srinivasan
> > Sent: Sunday, July 13, 2014 11:50 AM
> > To: 'Martin K. Petersen'
> > Cc: [email protected]; James Bottomley; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]
> > Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
> >
> >
> >
> > > -----Original Message-----
> > > From: Martin K. Petersen [mailto:[email protected]]
> > > Sent: Sunday, July 13, 2014 5:59 AM
> > > To: KY Srinivasan
> > > Cc: [email protected]; James Bottomley;
> > > [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected]
> > > Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter
> > > WRITE_SAME_16
> > >
> > > >>>>> "KY" == KY Srinivasan <[email protected]> writes:
> > >
> > > KY> Windows hosts do support UNMAP and set the field in the EVPD.
> > > KY> However, since the host advertises SPC-2 compliance, Linux does
> > > KY> not even query the VPD page.
> > >
> > > >> If we want to enable UNMAP in this case I'd prefer a blacklist
> > > >> entry than trying UNMAP despite the device not advertising it.
> > >
> > > I agree with that. We could do something like the patch below.
> > >
> > > However, I do think it's a good idea that you guys are looking into
> > > reporting SPC-3.
> >
> > Thanks Martin; this patch would address our present issues. I will get
> > some testing done and report back.
>
> How should we force the flag (BLIST_TRY_VPD_PAGES) to be set.

Once I force the flag set, it works. Thanks Martin.

K. Y
>
> K. Y
> >
> > Regards,
> >
> > K. Y
> > >
> > >
> > > SCSI: Add a blacklist flag which enables VPD page inquiries
> > >
> > > Despite supporting modern SCSI features some storage devices
> > > continue to claim conformance to an older version of the SPC spec.
> > > This is done for compatibility with legacy operating systems.
> > >
> > > Linux by default will not attempt to read VPD pages on devices that
> > > claim
> > > SPC-2 or older. Introduce a blacklist flag that can be used to
> > > trigger VPD page inquiries on devices that are known to support them.
> > >
> > > Reported-by: KY Srinivasan <[email protected]>
> > > Signed-off-by: Martin K. Petersen <[email protected]>
> > >
> > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> > > index 4a6e4ba5a400..a5b1a224628a 100644
> > > --- a/drivers/scsi/scsi_scan.c
> > > +++ b/drivers/scsi/scsi_scan.c
> > > @@ -949,7 +949,9 @@ static int scsi_add_lun(struct scsi_device
> > > *sdev, unsigned char *inq_result,
> > >
> > > sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
> > >
> > > - if (*bflags & BLIST_SKIP_VPD_PAGES)
> > > + if (*bflags & BLIST_TRY_VPD_PAGES)
> > > + sdev->try_vpd_pages = 1;
> > > + else if (*bflags & BLIST_SKIP_VPD_PAGES)
> > > sdev->skip_vpd_pages = 1;
> > >
> > > transport_configure_device(&sdev->sdev_gendev);
> > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index
> > > 87566b51fcf7..31d32b9077ca 100644
> > > --- a/drivers/scsi/sd.c
> > > +++ b/drivers/scsi/sd.c
> > > @@ -2701,6 +2701,11 @@ static void sd_read_write_same(struct
> > > scsi_disk *sdkp, unsigned char *buffer)
> > >
> > > static int sd_try_extended_inquiry(struct scsi_device *sdp) {
> > > + /* Attempt VPD inquiry if the device blacklist explicitly calls
> > > + * for it.
> > > + */
> > > + if (sdp->try_vpd_pages)
> > > + return 1;
> > > /*
> > > * Although VPD inquiries can go to SCSI-2 type devices,
> > > * some USB ones crash on receiving them, and the pages diff --git
> > > a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index
> > > 9aa38f7b303b..f579408620f0 100644
> > > --- a/include/scsi/scsi_device.h
> > > +++ b/include/scsi/scsi_device.h
> > > @@ -155,6 +155,7 @@ struct scsi_device {
> > > unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08
> > > */
> > > unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f
> > > */
> > > unsigned skip_vpd_pages:1; /* do not read VPD pages */
> > > + unsigned try_vpd_pages:1; /* attempt to read VPD pages */
> > > unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page
> > 0x3f
> > > */
> > > unsigned no_start_on_add:1; /* do not issue start on add */
> > > unsigned allow_restart:1; /* issue START_UNIT in error handler */
> > > diff --git a/include/scsi/scsi_devinfo.h
> > > b/include/scsi/scsi_devinfo.h index 8670c04e199e..1fdd6fc5492b
> > > 100644
> > > --- a/include/scsi/scsi_devinfo.h
> > > +++ b/include/scsi/scsi_devinfo.h
> > > @@ -34,4 +34,5 @@
> > > #define BLIST_SKIP_VPD_PAGES 0x4000000 /* Ignore SBC-3 VPD pages
> > > */
> > > #define BLIST_SCSI3LUN 0x8000000 /* Scan more than 256
> > > LUNs
> > > for sequential scan */
> > > +#define BLIST_TRY_VPD_PAGES 0x10000000 /* Attempt to read VPD
> > > pages */
> > > #endif
> _______________________________________________
> devel mailing list
> [email protected]
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

2014-07-16 11:01:17

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Sun, Jul 13, 2014 at 08:58:34AM -0400, Martin K. Petersen wrote:
> >>>>> "KY" == KY Srinivasan <[email protected]> writes:
>
> KY> Windows hosts do support UNMAP and set the field in the
> KY> EVPD. However, since the host advertises SPC-2 compliance, Linux
> KY> does not even query the VPD page.
>
> >> If we want to enable UNMAP in this case I'd prefer a blacklist entry
> >> than trying UNMAP despite the device not advertising it.
>
> I agree with that. We could do something like the patch below.
>
> However, I do think it's a good idea that you guys are looking into
> reporting SPC-3.

KY mentioned that they have a prototype for that now.

Btw, I looked over sd.c a bit more, and I think I understand why they
get the WRITE SAME commands now:

read_capacity_16 calls sd_config_discard(sdkp, SD_LBP_WS16) if the LPBME
bit is set. At least older SBC drafts left it wide open if a target
supports WRITE SAME with UNMAP or UNMAP in this case. So I think we'd
still want a patch to use UNMAP instead of WRITE SAME for this case,
which should also fix hyperv. Below is the quick hack version of that
that just checks the host no_write_same flag, as the one on the device
isn't set yet - I guess we need to refactor some of that logic.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 87566b5..4480fdf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2035,7 +2035,10 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
if (buffer[14] & 0x40) /* LBPRZ */
sdkp->lbprz = 1;

- sd_config_discard(sdkp, SD_LBP_WS16);
+ if (sdp->host->no_write_same)
+ sd_config_discard(sdkp, SD_LBP_UNMAP);
+ else
+ sd_config_discard(sdkp, SD_LBP_WS16);
}

sdkp->capacity = lba + 1;

2014-07-16 14:53:54

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-16 at 04:01 -0700, [email protected] wrote:
> On Sun, Jul 13, 2014 at 08:58:34AM -0400, Martin K. Petersen wrote:
> > >>>>> "KY" == KY Srinivasan <[email protected]> writes:
> >
> > KY> Windows hosts do support UNMAP and set the field in the
> > KY> EVPD. However, since the host advertises SPC-2 compliance, Linux
> > KY> does not even query the VPD page.
> >
> > >> If we want to enable UNMAP in this case I'd prefer a blacklist entry
> > >> than trying UNMAP despite the device not advertising it.
> >
> > I agree with that. We could do something like the patch below.
> >
> > However, I do think it's a good idea that you guys are looking into
> > reporting SPC-3.
>
> KY mentioned that they have a prototype for that now.
>
> Btw, I looked over sd.c a bit more, and I think I understand why they
> get the WRITE SAME commands now:
>
> read_capacity_16 calls sd_config_discard(sdkp, SD_LBP_WS16) if the LPBME
> bit is set. At least older SBC drafts left it wide open if a target
> supports WRITE SAME with UNMAP or UNMAP in this case. So I think we'd
> still want a patch to use UNMAP instead of WRITE SAME for this case,
> which should also fix hyperv. Below is the quick hack version of that
> that just checks the host no_write_same flag, as the one on the device
> isn't set yet - I guess we need to refactor some of that logic.
>
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 87566b5..4480fdf 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2035,7 +2035,10 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
> if (buffer[14] & 0x40) /* LBPRZ */
> sdkp->lbprz = 1;
>
> - sd_config_discard(sdkp, SD_LBP_WS16);
> + if (sdp->host->no_write_same)
> + sd_config_discard(sdkp, SD_LBP_UNMAP);
> + else
> + sd_config_discard(sdkp, SD_LBP_WS16);

Right, I already said this was the problem: that was the reason for my
patch. However, there are a couple of other cases (including the /sys
entry) which is why I patched sd_config_discard instead.

James

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2014-07-16 15:45:05

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "hch" == hch@infradead org <[email protected]> writes:

hch> read_capacity_16 calls sd_config_discard(sdkp, SD_LBP_WS16) if the
hch> LPBME bit is set. At least older SBC drafts left it wide open if a
hch> target supports WRITE SAME with UNMAP or UNMAP in this case.

Correct.

hch> So I think we'd still want a patch to use UNMAP instead of WRITE
hch> SAME for this case, which should also fix hyperv.

There are lots of devices out there that support WRITE SAME(10) or (16)
without the UNMAP bit. And there are devices that support WRITE SAME w/
UNMAP functionality but not "regular" WRITE SAME.

no_write_same is there to prevent the REQ_WRITE_SAME use case (for which
we have really weak heuristics). Your patch overloads no_write_same so
it also governs a REQ_DISCARD use case.

My proposed black list patch fixes the hyperv discard issue. So I don't
see why we'd need to overload no_write_same which was meant for an
entirely different purpose.

--
Martin K. Petersen Oracle Linux Engineering

2014-07-16 17:38:39

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, Jul 16, 2014 at 11:44:18AM -0400, Martin K. Petersen wrote:
> There are lots of devices out there that support WRITE SAME(10) or (16)
> without the UNMAP bit. And there are devices that support WRITE SAME w/
> UNMAP functionality but not "regular" WRITE SAME.

Oh, we actually have devices that support WRITE SAME with unmap, but not
without? That's defintively a little strange.

> no_write_same is there to prevent the REQ_WRITE_SAME use case (for which
> we have really weak heuristics). Your patch overloads no_write_same so
> it also governs a REQ_DISCARD use case.

Yes, and it did this intentionally. I really wouldn't expect devices
to support WRITE SAME with UNMAP but blow up on a WRITE SAME without
it (and not just simple fail it in an orderly way).

> My proposed black list patch fixes the hyperv discard issue. So I don't
> see why we'd need to overload no_write_same which was meant for an
> entirely different purpose.

It definitively seems odd to default to trying WRITE SAME for unmap
for a device that explicitly tells us that it doesn't support WRITE
SAME.

Note that I'm not against your patch - I suspect forcing us to read
EVPD pages even for devices that claim to be SPC-2 will come in useful
in various scenarios.

2014-07-16 17:48:23

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "Christoph" == hch@infradead org <[email protected]> writes:

Christoph> Oh, we actually have devices that support WRITE SAME with
Christoph> unmap, but not without? That's defintively a little strange.

Yep :(

There were several SSDs that did not want to support wearing out flash
by writing gobs of zeroes and only support the UNMAP case.

Christoph> Yes, and it did this intentionally. I really wouldn't expect
Christoph> devices to support WRITE SAME with UNMAP but blow up on a
Christoph> WRITE SAME without it (and not just simple fail it in an
Christoph> orderly way).

*sigh*

Christoph> It definitively seems odd to default to trying WRITE SAME for
Christoph> unmap for a device that explicitly tells us that it doesn't
Christoph> support WRITE SAME.

Maybe it's just a naming thing. I was really trying to convey
no_req_write_same support, not no_write_same_10_or_16.

Christoph> Note that I'm not against your patch - I suspect forcing us
Christoph> to read EVPD pages even for devices that claim to be SPC-2
Christoph> will come in useful in various scenarios.

I don't have a problem with a BLIST_PREFER_UNMAP flag or something like
that. But BLIST_TRY_VPD_PAGES seems more generally useful and it does
fix the problem at hand. That's why I went that route.

--
Martin K. Petersen Oracle Linux Engineering

2014-07-16 17:55:31

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, Jul 16, 2014 at 01:47:35PM -0400, Martin K. Petersen wrote:
> There were several SSDs that did not want to support wearing out flash
> by writing gobs of zeroes and only support the UNMAP case.

Given that SSDs usually aren't hard provisioned anyway that seems like
an odd decision. But SAS SSD would be something I'd at least expect
to properly fail these..

> I don't have a problem with a BLIST_PREFER_UNMAP flag or something like
> that. But BLIST_TRY_VPD_PAGES seems more generally useful and it does
> fix the problem at hand. That's why I went that route.

As I said I'm perfectly fine with your patch and I think we'll find more
uses for it. I'll apply it as soon as I get a second review.

2014-07-16 18:02:18

by Elliott, Robert (Server Storage)

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-16 at 13:47 -0400, Martin K. Petersen wrote:
> >>>>> "Christoph" == hch@infradead org <[email protected]> writes:
>
> Christoph> Oh, we actually have devices that support WRITE SAME with
> Christoph> unmap, but not without? That's defintively a little strange.
>
> Yep :(
>
> There were several SSDs that did not want to support wearing out flash
> by writing gobs of zeroes and only support the UNMAP case.
>
> Christoph> Yes, and it did this intentionally. I really wouldn't expect
> Christoph> devices to support WRITE SAME with UNMAP but blow up on a
> Christoph> WRITE SAME without it (and not just simple fail it in an
> Christoph> orderly way).
>
> *sigh*
>
> Christoph> It definitively seems odd to default to trying WRITE SAME for
> Christoph> unmap for a device that explicitly tells us that it doesn't
> Christoph> support WRITE SAME.
>
> Maybe it's just a naming thing. I was really trying to convey
> no_req_write_same support, not no_write_same_10_or_16.
>
> Christoph> Note that I'm not against your patch - I suspect forcing us
> Christoph> to read EVPD pages even for devices that claim to be SPC-2
> Christoph> will come in useful in various scenarios.
>
> I don't have a problem with a BLIST_PREFER_UNMAP flag or something like
> that. But BLIST_TRY_VPD_PAGES seems more generally useful and it does
> fix the problem at hand. That's why I went that route.

Hang on ... unless we apply Christoph or my fix, we'll get the same
issue with every raid driver (that's about 10 I think) that set
no_write_same when they hit a >2TB RAID volume, so I think we need both
fixes.

James

2014-07-16 18:45:14

[permalink] [raw]

Subject: RE: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

> -----Original Message-----
> From: [email protected] [mailto:linux-scsi-
> [email protected]] On Behalf Of James Bottomley
> Sent: Wednesday, 16 July, 2014 1:02 PM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16
>
> On Wed, 2014-07-16 at 13:47 -0400, Martin K. Petersen wrote:
> > >>>>> "Christoph" == hch@infradead org <[email protected]> writes:
> >
> > Christoph> Oh, we actually have devices that support WRITE SAME with
> > Christoph> unmap, but not without? That's defintively a little strange.
> >
> > Yep :(
> >
> > There were several SSDs that did not want to support wearing out flash
> > by writing gobs of zeroes and only support the UNMAP case.
> >
> > Christoph> Yes, and it did this intentionally. I really wouldn't expect
> > Christoph> devices to support WRITE SAME with UNMAP but blow up on a
> > Christoph> WRITE SAME without it (and not just simple fail it in an
> > Christoph> orderly way).
> >
> > *sigh*
> >
> > Christoph> It definitively seems odd to default to trying WRITE SAME for
> > Christoph> unmap for a device that explicitly tells us that it doesn't
> > Christoph> support WRITE SAME.
> >
> > Maybe it's just a naming thing. I was really trying to convey
> > no_req_write_same support, not no_write_same_10_or_16.
> >
> > Christoph> Note that I'm not against your patch - I suspect forcing us
> > Christoph> to read EVPD pages even for devices that claim to be SPC-2
> > Christoph> will come in useful in various scenarios.
> >
> > I don't have a problem with a BLIST_PREFER_UNMAP flag or something like
> > that. But BLIST_TRY_VPD_PAGES seems more generally useful and it does
> > fix the problem at hand. That's why I went that route.
>
> Hang on ... unless we apply Christoph or my fix, we'll get the same
> issue with every raid driver (that's about 10 I think) that set
> no_write_same when they hit a >2TB RAID volume, so I think we need both
> fixes.
>
> James
>

WRITE SAME with the UNMAP bit set to one (and a few other
conditions) guarantees that the data is zeroed out, while
the UNMAP command is just a hint. They're not fully
interchangeable. Which semantics are implied by REQ_DISCARD
and these functions?

One benefit of UNMAP is that UNMAP supports a list of
discontiguous LBA ranges, whereas WRITE SAME just supports
one LBA range. sd_setup_discard_cmnd is not taking
advantage of this feature, though. Ideally, the block
layer would merge multiple discards into one UNMAP command
if they're stuck on the request queue for a while, like
it merges adjacent reads and writes. That would pave the way
for building up WRITE SCATTERED and READ GATHERED commands.

2014-07-16 18:46:38

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "James" == James Bottomley <[email protected]> writes:

>> I don't have a problem with a BLIST_PREFER_UNMAP flag or something
>> like that. But BLIST_TRY_VPD_PAGES seems more generally useful and it
>> does fix the problem at hand. That's why I went that route.

James> Hang on ... unless we apply Christoph or my fix, we'll get the
James> same issue with every raid driver (that's about 10 I think) that
James> set no_write_same when they hit a >2TB RAID volume,

no_write_same is set for all the RAID controller drivers (54b2b50c20a6).

If a WRITE SAME(10/16) command fails and the UNMAP bit is not set we'll
set no_write_same=1 and disable REQ_WRITE_SAME support.

If a WRITE SAME(10/16) command fails and the UNMAP bit is set we'll
disable REQ_DISCARD support.

Not sure where the 10 vs. 16 byte 2TB limitation comes into play here?

--
Martin K. Petersen Oracle Linux Engineering

2014-07-16 18:50:41

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-16 at 14:45 -0400, Martin K. Petersen wrote:
> >>>>> "James" == James Bottomley <[email protected]> writes:
>
> >> I don't have a problem with a BLIST_PREFER_UNMAP flag or something
> >> like that. But BLIST_TRY_VPD_PAGES seems more generally useful and it
> >> does fix the problem at hand. That's why I went that route.
>
> James> Hang on ... unless we apply Christoph or my fix, we'll get the
> James> same issue with every raid driver (that's about 10 I think) that
> James> set no_write_same when they hit a >2TB RAID volume,
>
> no_write_same is set for all the RAID controller drivers (54b2b50c20a6).
>
> If a WRITE SAME(10/16) command fails and the UNMAP bit is not set we'll
> set no_write_same=1 and disable REQ_WRITE_SAME support.
>
> If a WRITE SAME(10/16) command fails and the UNMAP bit is set we'll
> disable REQ_DISCARD support.
>
> Not sure where the 10 vs. 16 byte 2TB limitation comes into play here?

It's the code we identified in sd.c:read_capacity_16():

if (buffer[14] & 0x80) { /* LBPME */
sdkp->lbpme = 1;

if (buffer[14] & 0x40) /* LBPRZ */
sdkp->lbprz = 1;

sd_config_discard(sdkp, SD_LBP_WS16);
}

If the inquiry shows LBPME set, we'll do write same regardless of the
no_write_same bit. That's why for SPC-2 devices it only shows up on
>2TB devices because they force RC16.

James

2014-07-16 19:09:00

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "James" == James Bottomley <[email protected]> writes:

James> It's the code we identified in sd.c:read_capacity_16():

That's there to support devices that implement thin provisioning but
which predate the LBP VPD page. And thus have no way to tell us their
preferred command variant.

James> If the inquiry shows LBPME set, we'll do write same regardless of
James> the no_write_same bit. That's why for SPC-2 devices it only
James> shows up on >> 2TB devices because they force RC16.

But we only do so if they have LBPME set. Generally they don't.

The problem for hyperv was that LBPME was set. And because it claimed
SBC-2 we did not check the LBP VPD page to see that it prefers UNMAP.

I believe we did consider unconditionally checking for VPDs when LBPME
was set but I seem to recall that it broke some device that had garbage
in the READ CAPACITY(16) response. I'll see if I can locate the details.

Otherwise I'm willing to entertain that idea.

--
Martin K. Petersen Oracle Linux Engineering

2014-07-16 19:14:23

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, 2014-07-16 at 15:08 -0400, Martin K. Petersen wrote:
> >>>>> "James" == James Bottomley <[email protected]> writes:
>
> James> It's the code we identified in sd.c:read_capacity_16():
>
> That's there to support devices that implement thin provisioning but
> which predate the LBP VPD page. And thus have no way to tell us their
> preferred command variant.
>
> James> If the inquiry shows LBPME set, we'll do write same regardless of
> James> the no_write_same bit. That's why for SPC-2 devices it only
> James> shows up on >> 2TB devices because they force RC16.
>
> But we only do so if they have LBPME set. Generally they don't.
>
> The problem for hyperv was that LBPME was set. And because it claimed
> SBC-2 we did not check the LBP VPD page to see that it prefers UNMAP.
>
> I believe we did consider unconditionally checking for VPDs when LBPME
> was set but I seem to recall that it broke some device that had garbage
> in the READ CAPACITY(16) response. I'll see if I can locate the details.
>
> Otherwise I'm willing to entertain that idea.

Well, your judgement: is this situation (support for UNMAP but not for
WRITE_SAME) in what is effectively a RAID driver (hv drivers count as
RAID) just a silly Microsoft one off? If so, we can go with your force
VPD page patch. However, if we get any RAID drivers with strange
discard support, we'll likely get the same problem ... I've got to say
in the long run, it sounds likely we will given that RAID vendors are
only marginally more competent than USB bridge vendors when it comes to
following the spec.

James

2014-07-16 19:21:20

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "Rob" == Elliott, Robert (Server Storage) <[email protected]> writes:

Rob> WRITE SAME with the UNMAP bit set to one (and a few other
Rob> conditions) guarantees that the data is zeroed out, while the UNMAP
Rob> command is just a hint. They're not fully interchangeable. Which
Rob> semantics are implied by REQ_DISCARD and these functions?

Voluntary deprovisioning of a block range.

Rob> One benefit of UNMAP is that UNMAP supports a list of discontiguous
Rob> LBA ranges, whereas WRITE SAME just supports one LBA range.
Rob> sd_setup_discard_cmnd is not taking advantage of this feature,
Rob> though.

The block layer can only describe one contiguous block range in a
request. My copy offload patches introduces the bi_special field that
allows us to attach additional information to an I/O. I have
experimented with doing that for discards to overcome the suck of DSM
TRIM. Splitting and merging requests in MD/DM gets much more cumbersome,
though.

Rob> Ideally, the block layer would merge multiple discards into one
Rob> UNMAP command if they're stuck on the request queue for a while,
Rob> like it merges adjacent reads and writes.

We support merging contiguous smaller discard requests.

We did discuss having a (separate) I/O scheduler to coalesce
discontiguous discard requests. However, the attempts at this turned out
to be pretty hideous.

It also wasn't evident that it was worth the hassle. While UNMAP allows
us to express large regions, DSM TRIM on the SATA side is limited to 32
MB per range. So in many cases we end up maxing out the payload capacity
even with a single contiguous range.

We expect LBP SCSI devices to queue commands. Being able to express
multiple ranges in one shot is less critical in that case.

--
Martin K. Petersen Oracle Linux Engineering

2014-07-16 19:44:56

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

>>>>> "James" == James Bottomley <[email protected]> writes:

James> Well, your judgement: is this situation (support for UNMAP but
James> not for WRITE_SAME) in what is effectively a RAID driver (hv
James> drivers count as RAID) just a silly Microsoft one off?

I only recall seeing one or two devices that supported LBP but not WRITE
SAME w/UNMAP.

James> However, if we get any RAID drivers with strange discard support,
James> we'll likely get the same problem

The LBP VPD page is mandatory now. It wasn't for the first couple of
generations of devices that we still have to support. I think that if a
vendor were to support LBP, adding the mandatory VPD page would be a
given. And so far nobody has messed up the LBP VPD page contents.

My main gripe about linking no_write_same and discard functionality is
that the heuristics for the latter are already excessively complex
thanks to having to support devices that predate the spec. I'm wary of
adding another dimension to that.

Also, linking the two use cases we can get into inconsistent states
where no_write_same is set but the device does not support UNMAP and has
LBPWS=1 and LBPWS10=1 set in the LBP VPD.

I'll contemplate the LBPME => mandatory VPD lookup thing for bit.

--
Martin K. Petersen Oracle Linux Engineering

2014-07-17 07:42:51

[permalink] [raw]

Subject: Re: [PATCH 4/8] Drivers: scsi: storvsc: Filter WRITE_SAME_16

On Wed, Jul 16, 2014 at 03:20:00PM -0400, Martin K. Petersen wrote:
> The block layer can only describe one contiguous block range in a
> request. My copy offload patches introduces the bi_special field that
> allows us to attach additional information to an I/O. I have
> experimented with doing that for discards to overcome the suck of DSM
> TRIM. Splitting and merging requests in MD/DM gets much more cumbersome,
> though.

I had a bunch of prototypes for this years ago that didn't really work
out. I always made ranged didscards something that driver had to opt
in for. In my case md and dm never opted in, although for mirroring or
multipath it should of course handle it fairly easily.

> It also wasn't evident that it was worth the hassle. While UNMAP allows
> us to express large regions, DSM TRIM on the SATA side is limited to 32
> MB per range. So in many cases we end up maxing out the payload capacity
> even with a single contiguous range.

That's mostly because we don't support larger than 512 byte TRIM payloads
yet..

2014-07-17 12:41:58