Some kernel transport drivers unconditionally disable
retrieval of the Caching mode page. One such for example is
the BBB/CBI transport over USB. Such a restraint is too
harsh as some devices do support the Caching mode
page. Unconditionally enabling the retrieval of this mode
page over those transports at their transport code level may
result in some devices failing and becoming unusable.
This patch implements a method of retrieving the Caching
mode page without unconditionally enabling it in the
transports which unconditionally disable it. The idea is to
ask for all supported pages, page code 0x3F, and then search
for the Caching mode page in the mode parameter data
returned. The sd driver already asks for all the mode pages
supported by the attached device by setting the page code to
0x3F in order to find out if the media is write protected by
reading the WP bit in the Device Specific Parameter
field. It then attempts to retrieve only the Caching mode
page by setting the page code to 8 and actually attempting
to retrieve it if and only if the transport allows it.
The method implemented here is that if the transport doesn't
allow retrieval of the Caching mode page and the device is
not RBC, then we ask for all pages supported by setting the
page code to 0x3F (similarly to how the WP bit is retrieved
above), and then we search for the Caching mode page in the
mode parameter data returned.
With this patch, devices over SATA, report this (no change):
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 22 18:45:58 localhost kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Smart devices report their Caching mode page. This is a
change where we'd previously see the kernel making
assumption about the device's cache being write-through:
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] 610472646 4096-byte logical blocks: (2.50 TB/2.27 TiB)
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write Protect is off
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Mode Sense: 47 00 10 08
Oct 22 18:45:58 localhost kernel: sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
And "dumb" devices over BBB, are correctly shown not to
support reporting the Caching mode page:
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] 15663104 512-byte logical blocks: (8.01 GB/7.46 GiB)
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Write Protect is off
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] No Caching mode page present
Oct 22 18:49:06 localhost kernel: sd 7:0:0:0: [sdc] Assuming drive cache: write through
Signed-off-by: Luben Tuikov <[email protected]>
---
drivers/scsi/sd.c | 63 +++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 47 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ffa0689..81e83d7 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1867,10 +1867,14 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
int old_rcd = sdkp->RCD;
int old_dpofua = sdkp->DPOFUA;
- if (sdp->skip_ms_page_8)
- goto defaults;
-
- if (sdp->type == TYPE_RBC) {
+ if (sdp->skip_ms_page_8) {
+ if (sdp->type == TYPE_RBC)
+ goto defaults;
+ else {
+ modepage = 0x3F;
+ dbd = 0;
+ }
+ } else if (sdp->type == TYPE_RBC) {
modepage = 6;
dbd = 8;
} else {
@@ -1898,13 +1902,11 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
*/
if (len < 3)
goto bad_sense;
- if (len > 20)
- len = 20;
-
- /* Take headers and block descriptors into account */
- len += data.header_length + data.block_descriptor_length;
- if (len > SD_BUF_SIZE)
- goto bad_sense;
+ else if (len > SD_BUF_SIZE) {
+ sd_printk(KERN_NOTICE, sdkp, "Truncating mode parameter "
+ "data from %d to %d bytes\n", len, SD_BUF_SIZE);
+ len = SD_BUF_SIZE;
+ }
/* Get the data */
res = sd_do_mode_sense(sdp, dbd, modepage, buffer, len, &data, &sshdr);
@@ -1912,16 +1914,45 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
if (scsi_status_is_good(res)) {
int offset = data.header_length + data.block_descriptor_length;
- if (offset >= SD_BUF_SIZE - 2) {
- sd_printk(KERN_ERR, sdkp, "Malformed MODE SENSE response\n");
- goto defaults;
+ while (offset < len) {
+ u8 page_code = buffer[offset] & 0x3F;
+ u8 spf = buffer[offset] & 0x40;
+
+ if (page_code == 8 || page_code == 6) {
+ /* We're interested only in the first 3 bytes.
+ */
+ if (len - offset <= 2) {
+ sd_printk(KERN_ERR, sdkp, "Incomplete "
+ "mode parameter data\n");
+ goto defaults;
+ } else {
+ modepage = page_code;
+ goto Page_found;
+ }
+ } else {
+ /* Go to the next page */
+ if (spf && len - offset > 3)
+ offset += 4 + (buffer[offset+2] << 8) +
+ buffer[offset+3];
+ else if (!spf && len - offset > 1)
+ offset += 2 + buffer[offset+1];
+ else {
+ sd_printk(KERN_ERR, sdkp, "Incomplete "
+ "mode parameter data\n");
+ goto defaults;
+ }
+ }
}
- if ((buffer[offset] & 0x3f) != modepage) {
+ if (modepage == 0x3F) {
+ sd_printk(KERN_ERR, sdkp, "No Caching mode page "
+ "present\n");
+ goto defaults;
+ } else if ((buffer[offset] & 0x3f) != modepage) {
sd_printk(KERN_ERR, sdkp, "Got wrong page\n");
goto defaults;
}
-
+ Page_found:
if (modepage == 8) {
sdkp->WCE = ((buffer[offset + 2] & 0x04) != 0);
sdkp->RCD = ((buffer[offset + 2] & 0x01) != 0);
--
1.7.2.2.165.gbc382
On Mon, Nov 22, 2010 at 8:56 AM, Luben Tuikov <[email protected]> wrote:
>
> Some kernel transport drivers unconditionally disable
> retrieval of the Caching mode page. One such for example is
> the BBB/CBI transport over USB.
One reason for that is that historically we've seen devices that
simply go crazy - to the point of simply stopping to respond to
anything - when you ask for pages that Windows doesn't ask for.
It's especially common on USB storage, but it happens elsewhere too.
The device firmware simply hasn't ever been tested in that situation,
and it's buggy.
So I don't mind the patch per se, but I think it's potentially way
more dangerous than it looks.
Linus
On 10-11-22 12:07 PM, Linus Torvalds wrote:
> On Mon, Nov 22, 2010 at 8:56 AM, Luben Tuikov<[email protected]> wrote:
>>
>> Some kernel transport drivers unconditionally disable
>> retrieval of the Caching mode page. One such for example is
>> the BBB/CBI transport over USB.
>
> One reason for that is that historically we've seen devices that
> simply go crazy - to the point of simply stopping to respond to
> anything - when you ask for pages that Windows doesn't ask for.
>
> It's especially common on USB storage, but it happens elsewhere too.
> The device firmware simply hasn't ever been tested in that situation,
> and it's buggy.
>
> So I don't mind the patch per se, but I think it's potentially way
> more dangerous than it looks.
The vast majority of USB mass storage devices are based
on SCSI-2 (1994) or a particularly nasty standard
called RBC (Reduced Block Commands, 1999) which is a
partial subset of the block commands (i.e. disk related).
We are all aware of the quality of most of the device
end implementations out in the wild.
I believe what Luben is working with is a new standard
called UAS (soon to be ratified) which is based on
http://www.t10.org work in the last few years. It seems to be
an attempt to throw out the older USB mass storage
transport and do it again, properly.
In the USB domain the UAS transport is identified in
an interface as mass storage class (8), SCSI transparent
command set subclass (6) and protocol 0x62. I would
think that the USB and SCSI layers could be modified to
remove some or all of its mass storage hacks (e.g. disabling
retrieval of the Caching mode page) when the transport
is UAS.
Doug Gilbert
--- On Mon, 11/22/10, Linus Torvalds <[email protected]> wrote:
> > Some kernel transport drivers unconditionally disable
> > retrieval of the Caching mode page. One such for
> example is
> > the BBB/CBI transport over USB.
>
> One reason for that is that historically we've seen devices
> that
> simply go crazy - to the point of simply stopping to
> respond to
> anything - when you ask for pages that Windows doesn't ask
> for.
>
> It's especially common on USB storage, but it happens
> elsewhere too.
> The device firmware simply hasn't ever been tested in that
> situation,
> and it's buggy.
>
> So I don't mind the patch per se, but I think it's
> potentially way
> more dangerous than it looks.
This patch does not ask for pages that Windows doesn't ask for. The sd driver already asks for all pages (page code 0x3F) and then checks if the device is write protected. Here is the present code:
2217 sd_read_write_protect_flag(sdkp, buffer);
2218 sd_read_cache_type(sdkp, buffer);
2219 sd_read_app_tag_own(sdkp, buffer);
Line 2217 asks for page code 0x3F, meaning all mode pages.
Line 2218 asks for the Caching mode page or not at all if the device flags forbid it (all USB storage devices in the Linux kernel).
This patch adds processing of the data returned when all mode pages are asked for (0x3F) in the function on line 2218. It then parses the data to find out if the Caching mode page is present.
Luben
On Mon, Nov 22, 2010 at 02:02:06PM -0500, Douglas Gilbert wrote:
> On 10-11-22 12:07 PM, Linus Torvalds wrote:
> >On Mon, Nov 22, 2010 at 8:56 AM, Luben Tuikov<[email protected]> wrote:
> >>
> >>Some kernel transport drivers unconditionally disable
> >>retrieval of the Caching mode page. One such for example is
> >>the BBB/CBI transport over USB.
> >
> >One reason for that is that historically we've seen devices that
> >simply go crazy - to the point of simply stopping to respond to
> >anything - when you ask for pages that Windows doesn't ask for.
> >
> >It's especially common on USB storage, but it happens elsewhere too.
> >The device firmware simply hasn't ever been tested in that situation,
> >and it's buggy.
> >
> >So I don't mind the patch per se, but I think it's potentially way
> >more dangerous than it looks.
>
> The vast majority of USB mass storage devices are based
> on SCSI-2 (1994) or a particularly nasty standard
> called RBC (Reduced Block Commands, 1999) which is a
> partial subset of the block commands (i.e. disk related).
> We are all aware of the quality of most of the device
> end implementations out in the wild.
Not true. The vast majority of USB mass storage devices adhere to no SCSI
or T10 specification at all. The official definition of the "Transparent
SCSI" operating mode of the USB Mass Storage Class is "that which
microsoft/sun/other major vendors use". Of course, that's not written down
anywhere, but it was the intent of the committee and that is the reason why
the committee rejected all attempts by people like me to implement a
command-based compliance test.
> I believe what Luben is working with is a new standard
> called UAS (soon to be ratified) which is based on
> http://www.t10.org work in the last few years. It seems to be
> an attempt to throw out the older USB mass storage
> transport and do it again, properly.
Luben's patch changes the behavior of sd-mod, which would affect both
usb-storage and UAS.
The primary thing that UAS adds is support for USB 3.0 streams and TCQ,
which are both designed to improve performance. I consider it likely that
the quality of device firmware will be equally poor as older devices.
That said, Luben's patch is kinda slick. sd-mod already asks for a lot of
mode page data; it does this so that it's request matches the observed
behavior of a "popular" Redmond, WA-based OS. Luben's patch just searches
through that data to see if it can find what it is looking for, rather than
rqeuesting a specific mode page later (which may not be supported).
That said, I still consider this somewhat risky. We've seen devices with
grossly inaccurate data before.
But, to my thinking, as long as the devices don't see any change in the
command stream, then I think the risk has been properly mitigated. To my
understanding, this is the case.
Matt
--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver
Sir, for the hundreth time, we do NOT carry 600-round boxes of belt-fed
suction darts!
-- Salesperson to Greg
User Friendly, 12/30/1997
On Mon, Nov 22, 2010 at 12:02:26PM -0800, Luben Tuikov wrote:
> --- On Mon, 11/22/10, Linus Torvalds <[email protected]>
> wrote:
> > So I don't mind the patch per se, but I think it's potentially way more
> > dangerous than it looks.
>
> This patch does not ask for pages that Windows doesn't ask for. The sd
> driver already asks for all pages (page code 0x3F) and then checks if the
> device is write protected. Here is the present code:
>
> 2217 sd_read_write_protect_flag(sdkp, buffer); 2218
> sd_read_cache_type(sdkp, buffer); 2219
> sd_read_app_tag_own(sdkp, buffer);
>
> Line 2217 asks for page code 0x3F, meaning all mode pages. Line 2218
> asks for the Caching mode page or not at all if the device flags forbid
> it (all USB storage devices in the Linux kernel).
>
> This patch adds processing of the data returned when all mode pages are
> asked for (0x3F) in the function on line 2218. It then parses the data to
> find out if the Caching mode page is present.
As long as you are not changing the command stream that the devices see, I
think this is a reasonable risk to take.
What are the consequences if the device returns what appear to be a Caching
Mode Page, but it is actually filled with garbage or otherwise inaccurate
data?
Matt
--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver
You suck Stef.
-- Greg
User Friendly, 11/29/97
--- On Mon, 11/22/10, Matthew Dharm <[email protected]> wrote:
> That said, Luben's patch is kinda slick.
Thank you.
> sd-modalready asks for a lot of
> mode page data; it does this so that it's request matches
> the observed
> behavior of a "popular" Redmond, WA-based OS.? Luben's
> patch just searches
> through that data to see if it can find what it is looking
> for, rather than
> rqeuesting a specific mode page later (which may not be
> supported).
Yes, devices wouldn't sense a change in requests. The patch preserves this quality of device discovery.
Luben
--- On Mon, 11/22/10, Matthew Dharm <[email protected]> wrote:
> As long as you are not changing the command stream that the
> devices see, I
> think this is a reasonable risk to take.
No, not at all.
> What are the consequences if the device returns what appear
> to be a Caching
> Mode Page, but it is actually filled with garbage or
> otherwise inaccurate
> data?
Data corruption would be no different than when determining whether the media is write protected or determining other parameters. A similar question would be "What would the consequences be if the device misreported the media write-protect bit?" I suppose such a device would be quite broken and inoperative.
Luben
On Tue, Nov 23, 2010 at 01:25:23AM -0800, Luben Tuikov wrote:
> --- On Mon, 11/22/10, Matthew Dharm <[email protected]>
> wrote:
>
> > What are the consequences if the device returns what appear to be a
> > Caching Mode Page, but it is actually filled with garbage or otherwise
> > inaccurate data?
>
> Data corruption would be no different than when determining whether the
> media is write protected or determining other parameters. A similar
> question would be "What would the consequences be if the device
> misreported the media write-protect bit?" I suppose such a device would
> be quite broken and inoperative.
Not really. In your example case of the write-protect bit, either the
device would be improperly marked as write-protected when it is not, or we
would improperly send write commands to a protected device. Neither is a
great tradgedy; commands will fail, errors will be generated, and life will
go on.
However, I don't know what is in the Caching Mode Page. I don't know how
those bits are used to determine the behavior of the rest of the kernel.
Maybe one of those bits means "only store data in Japanese" or something
equally disturbing.
The old code used to make a "safe" default choice if the caching mode page
was not available (for whatever reason). What are the implications of not
making the "safe" choice improperly (due to mode page corruption)?
Matt
--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver
I need a computer?
-- Customer
User Friendly, 2/19/1998
On 10-11-22 11:59 PM, Matthew Dharm wrote:
> On Mon, Nov 22, 2010 at 02:02:06PM -0500, Douglas Gilbert wrote:
>> On 10-11-22 12:07 PM, Linus Torvalds wrote:
>>> On Mon, Nov 22, 2010 at 8:56 AM, Luben Tuikov<[email protected]> wrote:
>>>>
>>>> Some kernel transport drivers unconditionally disable
>>>> retrieval of the Caching mode page. One such for example is
>>>> the BBB/CBI transport over USB.
>>>
>>> One reason for that is that historically we've seen devices that
>>> simply go crazy - to the point of simply stopping to respond to
>>> anything - when you ask for pages that Windows doesn't ask for.
>>>
>>> It's especially common on USB storage, but it happens elsewhere too.
>>> The device firmware simply hasn't ever been tested in that situation,
>>> and it's buggy.
>>>
>>> So I don't mind the patch per se, but I think it's potentially way
>>> more dangerous than it looks.
>>
>> The vast majority of USB mass storage devices are based
>> on SCSI-2 (1994) or a particularly nasty standard
>> called RBC (Reduced Block Commands, 1999) which is a
>> partial subset of the block commands (i.e. disk related).
>> We are all aware of the quality of most of the device
>> end implementations out in the wild.
>
> Not true. The vast majority of USB mass storage devices adhere to no SCSI
> or T10 specification at all. The official definition of the "Transparent
> SCSI" operating mode of the USB Mass Storage Class is "that which
> microsoft/sun/other major vendors use". Of course, that's not written down
> anywhere, but it was the intent of the committee and that is the reason why
> the committee rejected all attempts by people like me to implement a
> command-based compliance test.
Okay.
>> I believe what Luben is working with is a new standard
>> called UAS (soon to be ratified) which is based on
>> http://www.t10.org work in the last few years. It seems to be
>> an attempt to throw out the older USB mass storage
>> transport and do it again, properly.
>
> Luben's patch changes the behavior of sd-mod, which would affect both
> usb-storage and UAS.
>
> The primary thing that UAS adds is support for USB 3.0 streams and TCQ,
> which are both designed to improve performance. I consider it likely that
> the quality of device firmware will be equally poor as older devices.
That depends on how low we, and particularly W7, set
the bar. And W7 has a similar set of problems to
us. For example, how to detect an SSD supporting
trim behind a USB transport.
Doug Gilbert
> That said, Luben's patch is kinda slick. sd-mod already asks for a lot of
> mode page data; it does this so that it's request matches the observed
> behavior of a "popular" Redmond, WA-based OS. Luben's patch just searches
> through that data to see if it can find what it is looking for, rather than
> rqeuesting a specific mode page later (which may not be supported).
>
> That said, I still consider this somewhat risky. We've seen devices with
> grossly inaccurate data before.
>
> But, to my thinking, as long as the devices don't see any change in the
> command stream, then I think the risk has been properly mitigated. To my
> understanding, this is the case.
>
> Matt
>
--- On Tue, 11/23/10, Matthew Dharm <[email protected]> wrote:
> Not really.? In your example case of the write-protect
> bit, either the
> device would be improperly marked as write-protected when
> it is not, or we
> would improperly send write commands to a protected
> device.? Neither is a
> great tradgedy; commands will fail, errors will be
> generated, and life will
> go on.
Yes, indeed.
> However, I don't know what is in the Caching Mode
> Page.
It's described in SBC-3.
> I don't know how
> those bits are used to determine the behavior of the rest
> of the kernel.
I see.
> Maybe one of those bits means "only store data in Japanese"
> or something
> equally disturbing.
I doubt it. (And I don't find that disturbing.)
> The old code used to make a "safe" default choice if the
> caching mode page
> was not available (for whatever
> reason).???What are the implications of not
> making the "safe" choice improperly (due to mode page
> corruption)?
Let's see... the old code made the ``"safe" default choice'' of believing that the write cache is disabled and will thus never synchronize the cache, i.e. send SYNCHRONIZE CACHE command, reading drivers/scsi/sd.c. It also believes that the read cache is enabled, harmless as it is not used by the kernel. It also assumes that DPOFUA is not supported and will thus never set FUA in the CDB, in sd_revalidate_disk() (blk dev ops).
Quoting you, s/write commands/cdbs with FUA set or SYNCHRONIZE CACHE command/g:
> we
> would improperly send write commands to a protected
> device. Neither is a
> great tradgedy; commands will fail, errors will be
> generated, and life will
> go on.
But I believe the discussion has gone astray. Bringing it back to the original topic:
I doubt this as very unlikely. Has anyone actually seen a device that sends mode parameter data with faux Caching mode page or corrupted data that is in fact interpreted as a Caching mode page? Is such a device fully operational sans the faux Caching mode page, or does it just not work? Is it common to have devices having a faux Caching mode page or corrupted mode parameter data resulting in a Caching mode page with random data?
Undoubtedly, as the usb-storage maintainer, you must have variety of devices, some broken some not. Could you apply this patch to your tree and test some of the devices you have? My tests indicate a stable behavior.
Luben
On Wed, 2010-11-24 at 01:02 -0800, Luben Tuikov wrote:
> I doubt this as very unlikely. Has anyone actually seen a device that
> sends mode parameter data with faux Caching mode page or corrupted
> data that is in fact interpreted as a Caching mode page? Is such a
> device fully operational sans the faux Caching mode page, or does it
> just not work? Is it common to have devices having a faux Caching mode
> page or corrupted mode parameter data resulting in a Caching mode page
> with random data?
>
> Undoubtedly, as the usb-storage maintainer, you must have variety of
> devices, some broken some not. Could you apply this patch to your tree
> and test some of the devices you have? My tests indicate a stable
> behavior.
The basic problem isn't devices lying ... the worst we'll do is current
behaviour (not SYNC when we should). The problem is devices that get
confused (or worse simply crash the firmware). The best way to avoid
the crashing firmware problem ... if we can assume that modern USB
devices are better is to key off the SCSI version. Unfortunately, in
spite of several attempts, we've never managed to stop usbstorage lying
about this:
/* Some devices report a SCSI revision level above 2 but are
* unable to handle the REPORT LUNS command (for which
* support is mandatory at level 3). Since we already have
* a Get-Max-LUN request, we won't lose much by setting the
* revision level down to 2. The only devices that would be
* affected are those with sparse LUNs. */
if (sdev->scsi_level > SCSI_2)
sdev->sdev_target->scsi_level =
sdev->scsi_level = SCSI_2;
Untangling all of this would be rather complex, I fear.
The final question is is it worth it? Since USB devices are supposed to
be hot unpluggable, surely a USB device with a write back cache would be
a disaster: no-one will SYNC the cache on a surprise unplug anyway ...
therefore there shouldn't really be any of them surviving in the wild
(famous last words, I suppose).
James
On Wed, Nov 24, 2010 at 10:10:29AM +0000, James Bottomley wrote:
> The final question is is it worth it? Since USB devices are supposed to
> be hot unpluggable, surely a USB device with a write back cache would be
> a disaster: no-one will SYNC the cache on a surprise unplug anyway ...
> therefore there shouldn't really be any of them surviving in the wild
> (famous last words, I suppose).
There's tons of USB devices with writeback caches. Recent windows
version tend to use the FUA bit a lot because of this. And the linux
usb storage target implementation thus has a hook to ignore the FUA bit,
and it's probably not the only one..
On Wed, 24 Nov 2010, James Bottomley wrote:
> On Wed, 2010-11-24 at 01:02 -0800, Luben Tuikov wrote:
> > I doubt this as very unlikely. Has anyone actually seen a device that
> > sends mode parameter data with faux Caching mode page or corrupted
> > data that is in fact interpreted as a Caching mode page? Is such a
> > device fully operational sans the faux Caching mode page, or does it
> > just not work? Is it common to have devices having a faux Caching mode
> > page or corrupted mode parameter data resulting in a Caching mode page
> > with random data?
> >
> > Undoubtedly, as the usb-storage maintainer, you must have variety of
> > devices, some broken some not. Could you apply this patch to your tree
> > and test some of the devices you have? My tests indicate a stable
> > behavior.
>
> The basic problem isn't devices lying ... the worst we'll do is current
> behaviour (not SYNC when we should). The problem is devices that get
> confused (or worse simply crash the firmware). The best way to avoid
> the crashing firmware problem ... if we can assume that modern USB
> devices are better is to key off the SCSI version. Unfortunately, in
> spite of several attempts, we've never managed to stop usbstorage lying
> about this:
>
> /* Some devices report a SCSI revision level above 2 but are
> * unable to handle the REPORT LUNS command (for which
> * support is mandatory at level 3). Since we already have
> * a Get-Max-LUN request, we won't lose much by setting the
> * revision level down to 2. The only devices that would be
> * affected are those with sparse LUNs. */
> if (sdev->scsi_level > SCSI_2)
> sdev->sdev_target->scsi_level =
> sdev->scsi_level = SCSI_2;
>
> Untangling all of this would be rather complex, I fear.
Quite likely.
> The final question is is it worth it? Since USB devices are supposed to
> be hot unpluggable, surely a USB device with a write back cache would be
> a disaster: no-one will SYNC the cache on a surprise unplug anyway ...
> therefore there shouldn't really be any of them surviving in the wild
> (famous last words, I suppose).
Well, hot unpluggable doesn't mean it's okay to unplug the device at
any time. For example, under Windows you're not supposed to unplug a
USB drive without first going through the "Safely remove hardware"
applet. And of course, you can easily guess what command that applet
sends to the device...
On the whole, I'm with Luben on this. The likelihood of introducing
bad behavior because of devices sending incorrect cache-page
information seems very small.
Alan Stern
On Wed, 2010-11-24 at 09:49 -0500, Alan Stern wrote:
> On Wed, 24 Nov 2010, James Bottomley wrote:
>
> > On Wed, 2010-11-24 at 01:02 -0800, Luben Tuikov wrote:
> > > I doubt this as very unlikely. Has anyone actually seen a device that
> > > sends mode parameter data with faux Caching mode page or corrupted
> > > data that is in fact interpreted as a Caching mode page? Is such a
> > > device fully operational sans the faux Caching mode page, or does it
> > > just not work? Is it common to have devices having a faux Caching mode
> > > page or corrupted mode parameter data resulting in a Caching mode page
> > > with random data?
> > >
> > > Undoubtedly, as the usb-storage maintainer, you must have variety of
> > > devices, some broken some not. Could you apply this patch to your tree
> > > and test some of the devices you have? My tests indicate a stable
> > > behavior.
> >
> > The basic problem isn't devices lying ... the worst we'll do is current
> > behaviour (not SYNC when we should). The problem is devices that get
> > confused (or worse simply crash the firmware). The best way to avoid
> > the crashing firmware problem ... if we can assume that modern USB
> > devices are better is to key off the SCSI version. Unfortunately, in
> > spite of several attempts, we've never managed to stop usbstorage lying
> > about this:
> >
> > /* Some devices report a SCSI revision level above 2 but are
> > * unable to handle the REPORT LUNS command (for which
> > * support is mandatory at level 3). Since we already have
> > * a Get-Max-LUN request, we won't lose much by setting the
> > * revision level down to 2. The only devices that would be
> > * affected are those with sparse LUNs. */
> > if (sdev->scsi_level > SCSI_2)
> > sdev->sdev_target->scsi_level =
> > sdev->scsi_level = SCSI_2;
> >
> > Untangling all of this would be rather complex, I fear.
>
> Quite likely.
>
> > The final question is is it worth it? Since USB devices are supposed to
> > be hot unpluggable, surely a USB device with a write back cache would be
> > a disaster: no-one will SYNC the cache on a surprise unplug anyway ...
> > therefore there shouldn't really be any of them surviving in the wild
> > (famous last words, I suppose).
>
> Well, hot unpluggable doesn't mean it's okay to unplug the device at
> any time. For example, under Windows you're not supposed to unplug a
> USB drive without first going through the "Safely remove hardware"
> applet. And of course, you can easily guess what command that applet
> sends to the device...
>
> On the whole, I'm with Luben on this. The likelihood of introducing
> bad behavior because of devices sending incorrect cache-page
> information seems very small.
Yes, just assure me that sending the mode page request won't kill
anything and I'm fine with applying it. As I said, as long as we get a
reply, whatever it is it can't make use behave any worse than we do now.
James
--- On Wed, 11/24/10, James Bottomley <[email protected]> wrote:
> The basic problem isn't devices lying ... the worst we'll
> do is current
> behaviour (not SYNC when we should).? The problem is
> devices that get
> confused (or worse simply crash the firmware).? The
> best way to avoid
> the crashing firmware problem ... if we can assume that
> modern USB
> devices are better is to key off the SCSI version.?
> Unfortunately, in
> spite of several attempts, we've never managed to stop
> usbstorage lying
> about this:
>
> ??? ??? /* Some devices
> report a SCSI revision level above 2 but are
> ??? ?????* unable
> to handle the REPORT LUNS command (for which
> ??? ?????* support
> is mandatory at level 3).? Since we already have
> ??? ?????* a
> Get-Max-LUN request, we won't lose much by setting the
> ??? ?????* revision
> level down to 2.? The only devices that would be
> ??? ?????* affected
> are those with sparse LUNs. */
> ??? ??? if
> (sdev->scsi_level > SCSI_2)
> ??? ??? ???
> sdev->sdev_target->scsi_level =
> ??? ??? ???
> ??? ??? sdev->scsi_level =
> SCSI_2;
>
> Untangling all of this would be rather complex, I fear.
CBI/BBB isn't supposed to be, nor is designed to support SAM-modern devices. So while REQUEST LUN /may/ work on some devices which implement it in their firmware, it is NOT a requirement for those devices as they are not required to adhere to any SAM version. Those transport protocols define a class-specific request to get the maximum LUN, and another to reset the target port (instead of I_T Reset or LU Reset). They also do not support SCSI Status completion of the command, nor Autosense. They also do not provide TMFs. They provide none of the SCSI transport protocol services in support of the Execute Command procedure call. The SCSI layer shouldn't be trying to guess their "SCSI version", and or treat them as real SCSI devices sending REPORT LUNs, etc. commands.
Newer, modern transport protocols over USB, are part of SAM, and it is devices who connect via those protocols that are being disadvantaged, due to the adoption (assumption) of CBI/BBB well into the SCSI layer.
To this effect, the transport protocol can tell upper layers if the device is true SCSI (new usb transports or other) or hybrid (usb-storage). In the former case, the device is a SCSI device, in the latter, only basic commands should be attempted.
This isn't to say that firmware for those devices wouldn't be buggy. Of course it will, and most will probably port their legacy FW over to the new SPTL, but the protocol requirements are there by design (i.e. there is no longer Get Max Lun class-specific request, the application client has to send REPORT LUNS, and FW has to answer it) and we have to accommodate that.
It is in this spirit that this patch doesn't change wire behavior, but simply parses data returned by a command already supported by older protocols.
Luben
--- On Wed, 11/24/10, Luben Tuikov <[email protected]> wrote:
>
> CBI/BBB isn't supposed to be, nor is designed to support
> SAM-modern devices. So while REQUEST LUN /may/ work on some
> devices which implement it in their firmware, it is NOT a
> requirement for those devices as they are not required to
> adhere to any SAM version. Those transport protocols define
> a class-specific request to get the maximum LUN, and another
> to reset the target port (instead of I_T Reset or LU Reset).
> They also do not support SCSI Status completion of the
> command, nor Autosense. They also do not provide TMFs. They
> provide none of the SCSI transport protocol services in
> support of the Execute Command procedure call. The SCSI
> layer shouldn't be trying to guess their "SCSI version", and
> or treat them as real SCSI devices sending REPORT LUNs, etc.
> commands.
>
> Newer, modern transport protocols over USB, are part of
> SAM, and it is devices who connect via those protocols that
> are being disadvantaged, due to the adoption (assumption) of
> CBI/BBB well into the SCSI layer.
>
> To this effect, the transport protocol can tell upper
> layers if the device is true SCSI (new usb transports or
> other) or hybrid (usb-storage). In the former case, the
> device is a SCSI device, in the latter, only basic commands
> should be attempted.
>
> This isn't to say that firmware for those devices wouldn't
> be buggy. Of course it will, and most will probably port
> their legacy FW over to the new SPTL, but the protocol
> requirements are there by design (i.e. there is no longer
> Get Max Lun class-specific request, the application client
> has to send REPORT LUNS, and FW has to answer it) and we
> have to accommodate that.
>
> It is in this spirit that this patch doesn't change wire
> behavior, but simply parses data returned by a command
> already supported by older protocols.
Did anyone pick up this patch?
Luben
--- On Sun, 12/5/10, Luben Tuikov <[email protected]> wrote:
> --- On Wed, 11/24/10, Luben Tuikov
> <[email protected]>
> wrote:
> >
> > CBI/BBB isn't supposed to be, nor is designed to
> support
> > SAM-modern devices. So while REQUEST LUN /may/ work on
> some
> > devices which implement it in their firmware, it is
> NOT a
> > requirement for those devices as they are not required
> to
> > adhere to any SAM version. Those transport protocols
> define
> > a class-specific request to get the maximum LUN, and
> another
> > to reset the target port (instead of I_T Reset or LU
> Reset).
> > They also do not support SCSI Status completion of
> the
> > command, nor Autosense. They also do not provide TMFs.
> They
> > provide none of the SCSI transport protocol services
> in
> > support of the Execute Command procedure call. The
> SCSI
> > layer shouldn't be trying to guess their "SCSI
> version", and
> > or treat them as real SCSI devices sending REPORT
> LUNs, etc.
> > commands.
> >
> > Newer, modern transport protocols over USB, are part
> of
> > SAM, and it is devices who connect via those protocols
> that
> > are being disadvantaged, due to the adoption
> (assumption) of
> > CBI/BBB well into the SCSI layer.
> >
> > To this effect, the transport protocol can tell upper
> > layers if the device is true SCSI (new usb transports
> or
> > other) or hybrid (usb-storage). In the former case,
> the
> > device is a SCSI device, in the latter, only basic
> commands
> > should be attempted.
> >
> > This isn't to say that firmware for those devices
> wouldn't
> > be buggy. Of course it will, and most will probably
> port
> > their legacy FW over to the new SPTL, but the
> protocol
> > requirements are there by design (i.e. there is no
> longer
> > Get Max Lun class-specific request, the application
> client
> > has to send REPORT LUNS, and FW has to answer it) and
> we
> > have to accommodate that.
> >
> > It is in this spirit that this patch doesn't change
> wire
> > behavior, but simply parses data returned by a
> command
> > already supported by older protocols.
>
> Did anyone pick up this patch?
It's been over 6 weeks now that this patch's been in these mailing lists.
Will anyone pick up this patch, or should I stop posting it every week? Please let me know--it's been posted here 6 times in the last 6 weeks.
Thanks,
Luben
On Tue, Dec 07, 2010 at 04:02:26PM -0800, Luben Tuikov wrote:
> --- On Sun, 12/5/10, Luben Tuikov <[email protected]> wrote:
> > --- On Wed, 11/24/10, Luben Tuikov
> > <[email protected]>
> > wrote:
> > >
> > > CBI/BBB isn't supposed to be, nor is designed to
> > support
> > > SAM-modern devices. So while REQUEST LUN /may/ work on
> > some
> > > devices which implement it in their firmware, it is
> > NOT a
> > > requirement for those devices as they are not required
> > to
> > > adhere to any SAM version. Those transport protocols
> > define
> > > a class-specific request to get the maximum LUN, and
> > another
> > > to reset the target port (instead of I_T Reset or LU
> > Reset).
> > > They also do not support SCSI Status completion of
> > the
> > > command, nor Autosense. They also do not provide TMFs.
> > They
> > > provide none of the SCSI transport protocol services
> > in
> > > support of the Execute Command procedure call. The
> > SCSI
> > > layer shouldn't be trying to guess their "SCSI
> > version", and
> > > or treat them as real SCSI devices sending REPORT
> > LUNs, etc.
> > > commands.
> > >
> > > Newer, modern transport protocols over USB, are part
> > of
> > > SAM, and it is devices who connect via those protocols
> > that
> > > are being disadvantaged, due to the adoption
> > (assumption) of
> > > CBI/BBB well into the SCSI layer.
> > >
> > > To this effect, the transport protocol can tell upper
> > > layers if the device is true SCSI (new usb transports
> > or
> > > other) or hybrid (usb-storage). In the former case,
> > the
> > > device is a SCSI device, in the latter, only basic
> > commands
> > > should be attempted.
> > >
> > > This isn't to say that firmware for those devices
> > wouldn't
> > > be buggy. Of course it will, and most will probably
> > port
> > > their legacy FW over to the new SPTL, but the
> > protocol
> > > requirements are there by design (i.e. there is no
> > longer
> > > Get Max Lun class-specific request, the application
> > client
> > > has to send REPORT LUNS, and FW has to answer it) and
> > we
> > > have to accommodate that.
> > >
> > > It is in this spirit that this patch doesn't change
> > wire
> > > behavior, but simply parses data returned by a
> > command
> > > already supported by older protocols.
> >
> > Did anyone pick up this patch?
>
> It's been over 6 weeks now that this patch's been in these mailing lists.
> Will anyone pick up this patch, or should I stop posting it every
> week? Please let me know--it's been posted here 6 times in the last 6
> weeks.
James, this is all you. Any thoughts?
thanks,
greg k-h
On Tue, 2010-12-07 at 16:12 -0800, Greg KH wrote:
> On Tue, Dec 07, 2010 at 04:02:26PM -0800, Luben Tuikov wrote:
> > --- On Sun, 12/5/10, Luben Tuikov <[email protected]> wrote:
> > > --- On Wed, 11/24/10, Luben Tuikov
> > > <[email protected]>
> > > wrote:
> > > >
> > > > CBI/BBB isn't supposed to be, nor is designed to
> > > support
> > > > SAM-modern devices. So while REQUEST LUN /may/ work on
> > > some
> > > > devices which implement it in their firmware, it is
> > > NOT a
> > > > requirement for those devices as they are not required
> > > to
> > > > adhere to any SAM version. Those transport protocols
> > > define
> > > > a class-specific request to get the maximum LUN, and
> > > another
> > > > to reset the target port (instead of I_T Reset or LU
> > > Reset).
> > > > They also do not support SCSI Status completion of
> > > the
> > > > command, nor Autosense. They also do not provide TMFs.
> > > They
> > > > provide none of the SCSI transport protocol services
> > > in
> > > > support of the Execute Command procedure call. The
> > > SCSI
> > > > layer shouldn't be trying to guess their "SCSI
> > > version", and
> > > > or treat them as real SCSI devices sending REPORT
> > > LUNs, etc.
> > > > commands.
> > > >
> > > > Newer, modern transport protocols over USB, are part
> > > of
> > > > SAM, and it is devices who connect via those protocols
> > > that
> > > > are being disadvantaged, due to the adoption
> > > (assumption) of
> > > > CBI/BBB well into the SCSI layer.
> > > >
> > > > To this effect, the transport protocol can tell upper
> > > > layers if the device is true SCSI (new usb transports
> > > or
> > > > other) or hybrid (usb-storage). In the former case,
> > > the
> > > > device is a SCSI device, in the latter, only basic
> > > commands
> > > > should be attempted.
> > > >
> > > > This isn't to say that firmware for those devices
> > > wouldn't
> > > > be buggy. Of course it will, and most will probably
> > > port
> > > > their legacy FW over to the new SPTL, but the
> > > protocol
> > > > requirements are there by design (i.e. there is no
> > > longer
> > > > Get Max Lun class-specific request, the application
> > > client
> > > > has to send REPORT LUNS, and FW has to answer it) and
> > > we
> > > > have to accommodate that.
> > > >
> > > > It is in this spirit that this patch doesn't change
> > > wire
> > > > behavior, but simply parses data returned by a
> > > command
> > > > already supported by older protocols.
> > >
> > > Did anyone pick up this patch?
> >
> > It's been over 6 weeks now that this patch's been in these mailing lists.
> > Will anyone pick up this patch, or should I stop posting it every
> > week? Please let me know--it's been posted here 6 times in the last 6
> > weeks.
>
> James, this is all you. Any thoughts?
Well, not other than I've already said: I think it looks OK, so I
marked it for my merge window queue. I'd still rather like USB people
to confirm that the original reason why this was done (to prevent device
crashes on the mode sense) is no longer an issue ... but it's only USB
stuff, so suppose I'm OK with finding out in the field.
James
--- On Tue, 12/7/10, James Bottomley <[email protected]> wrote:
> Greg KH wrote:
> > On Tue, Dec 07, 2010 at 04:02:26PM -0800, Luben Tuikov
> wrote:
> > > --- On Sun, 12/5/10, Luben Tuikov <[email protected]>
> wrote:
> > > > --- On Wed, 11/24/10, Luben Tuikov
> > > > <[email protected]>
> > > > wrote:
> > > > >
> > > > > CBI/BBB isn't supposed to be, nor is
> designed to
> > > > support
> > > > > SAM-modern devices. So while REQUEST
> LUN /may/ work on
> > > > some
> > > > > devices which implement it in their
> firmware, it is
> > > > NOT a
> > > > > requirement for those devices as they
> are not required
> > > > to
> > > > > adhere to any SAM version. Those
> transport protocols
> > > > define
> > > > > a class-specific request to get the
> maximum LUN, and
> > > > another
> > > > > to reset the target port (instead of
> I_T Reset or LU
> > > > Reset).
> > > > > They also do not support SCSI Status
> completion of
> > > > the
> > > > > command, nor Autosense. They also do
> not provide TMFs.
> > > > They
> > > > > provide none of the SCSI transport
> protocol services
> > > > in
> > > > > support of the Execute Command
> procedure call. The
> > > > SCSI
> > > > > layer shouldn't be trying to guess
> their "SCSI
> > > > version", and
> > > > > or treat them as real SCSI devices
> sending REPORT
> > > > LUNs, etc.
> > > > > commands.
> > > > >
> > > > > Newer, modern transport protocols over
> USB, are part
> > > > of
> > > > > SAM, and it is devices who connect via
> those protocols
> > > > that
> > > > > are being disadvantaged, due to the
> adoption
> > > > (assumption) of
> > > > > CBI/BBB well into the SCSI layer.
> > > > >
> > > > > To this effect, the transport protocol
> can tell upper
> > > > > layers if the device is true SCSI (new
> usb transports
> > > > or
> > > > > other) or hybrid (usb-storage). In the
> former case,
> > > > the
> > > > > device is a SCSI device, in the latter,
> only basic
> > > > commands
> > > > > should be attempted.
> > > > >
> > > > > This isn't to say that firmware for
> those devices
> > > > wouldn't
> > > > > be buggy. Of course it will, and most
> will probably
> > > > port
> > > > > their legacy FW over to the new SPTL,
> but the
> > > > protocol
> > > > > requirements are there by design (i.e.
> there is no
> > > > longer
> > > > > Get Max Lun class-specific request, the
> application
> > > > client
> > > > > has to send REPORT LUNS, and FW has to
> answer it) and
> > > > we
> > > > > have to accommodate that.
> > > > >
> > > > > It is in this spirit that this patch
> doesn't change
> > > > wire
> > > > > behavior, but simply parses data
> returned by a
> > > > command
> > > > > already supported by older protocols.
> > > >
> > > > Did anyone pick up this patch?
> > >
> > > It's been over 6 weeks now that this patch's been
> in these mailing lists.
> > > Will anyone pick up this patch, or should I stop
> posting it every
> > > week? Please let me know--it's been posted here 6
> times in the last 6
> > > weeks.
> >
> > James, this is all you.? Any thoughts?
>
> Well, not other than I've already said:? I think it
> looks OK, so I
> marked it for my merge window queue.? I'd still rather
> like USB people
> to confirm that the original reason why this was done (to
> prevent device
> crashes on the mode sense) is no longer an issue ... but
> it's only USB
> stuff, so suppose I'm OK with finding out in the field.
Check their responses above in this thread. The USB people seem to be okay with it. The thread is archived here: http://marc.info/?t=129044508400003&r=1&w=2.
Luben
On Tue, 7 Dec 2010, James Bottomley wrote:
> Well, not other than I've already said: I think it looks OK, so I
> marked it for my merge window queue. I'd still rather like USB people
> to confirm that the original reason why this was done (to prevent device
> crashes on the mode sense) is no longer an issue
The original reason for adding the skip_ms_page_8 flag still applies.
To assume it is no longer an issue would not be safe -- there's no
reason to believe that the buggy devices it was meant for have all been
retired.
> ... but it's only USB
> stuff, so suppose I'm OK with finding out in the field.
With USB there's often no other choice.
Alan Stern
On Wed, 2010-12-08 at 10:16 -0500, Alan Stern wrote:
> On Tue, 7 Dec 2010, James Bottomley wrote:
>
> > Well, not other than I've already said: I think it looks OK, so I
> > marked it for my merge window queue. I'd still rather like USB people
> > to confirm that the original reason why this was done (to prevent device
> > crashes on the mode sense) is no longer an issue
>
> The original reason for adding the skip_ms_page_8 flag still applies.
> To assume it is no longer an issue would not be safe -- there's no
> reason to believe that the buggy devices it was meant for have all been
> retired.
>
> > ... but it's only USB
> > stuff, so suppose I'm OK with finding out in the field.
>
> With USB there's often no other choice.
So the translation is that there's a possibility it will crash USB
devices but the only way to find out is to release it and see.
My problem is that it only takes one bug report from one failing device
(which I'm sure some kind soul will dig out of the attic or wherever
they threw it) for this to be a regression and then I get to revert the
patch ... unless we have a working backup plan?
I think it's easy to put in and easy to revert ... we'll pick up a bit
of flack if the failing device doesn't appear for some time, but I'm OK
with risking that.
James
On Wed, 8 Dec 2010, James Bottomley wrote:
> On Wed, 2010-12-08 at 10:16 -0500, Alan Stern wrote:
> > On Tue, 7 Dec 2010, James Bottomley wrote:
> >
> > > Well, not other than I've already said: I think it looks OK, so I
> > > marked it for my merge window queue. I'd still rather like USB people
> > > to confirm that the original reason why this was done (to prevent device
> > > crashes on the mode sense) is no longer an issue
> >
> > The original reason for adding the skip_ms_page_8 flag still applies.
> > To assume it is no longer an issue would not be safe -- there's no
> > reason to believe that the buggy devices it was meant for have all been
> > retired.
> >
> > > ... but it's only USB
> > > stuff, so suppose I'm OK with finding out in the field.
> >
> > With USB there's often no other choice.
>
> So the translation is that there's a possibility it will crash USB
> devices but the only way to find out is to release it and see.
In the strictest sense, there's always a possibility that any change
will crash _some_ device somewhere. In this case I believe the
probability is very low. Luben's patch does not change the commands
sent to a USB device; it only changes the kernel's interpretation of
the data sent back. Unless things are terribly badly broken, this
won't hurt.
> My problem is that it only takes one bug report from one failing device
> (which I'm sure some kind soul will dig out of the attic or wherever
> they threw it) for this to be a regression and then I get to revert the
> patch ... unless we have a working backup plan?
>
> I think it's easy to put in and easy to revert ... we'll pick up a bit
> of flack if the failing device doesn't appear for some time, but I'm OK
> with risking that.
IMO the risk is extremely small.
Alan Stern
On Wed, Dec 08, 2010 at 10:57:46AM -0500, Alan Stern wrote:
> In the strictest sense, there's always a possibility that any change
> will crash _some_ device somewhere. In this case I believe the
> probability is very low. Luben's patch does not change the commands
> sent to a USB device; it only changes the kernel's interpretation of
> the data sent back. Unless things are terribly badly broken, this
> won't hurt.
It doesn't change the _discovery_ commands sent to the device, but (I
think ...) it will change the subsequent commands sent to the device;
eg we'll now send it SYNCHRONISE CACHE when we wouldn't have before.
I think it's low-risk too, and am in favour of seeing this patch applied.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."