Hello,
An external HDD, usb-encased, works fine under 2.6.26.5, but under
2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
On connection, under 2.6.26.5, I get the following:
[ 84.503949] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
[ 84.696861] usb 1-2.1.1: configuration #1 chosen from 1 choice
[ 84.740856] usbcore: registered new interface driver libusual
[ 84.817625] Initializing USB Mass Storage driver...
[ 84.817625] scsi2 : SCSI emulation for USB Mass Storage devices
[ 84.818820] usbcore: registered new interface driver usb-storage
[ 84.818823] USB Mass Storage support registered.
[ 89.819567] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
[ 89.857627] Driver 'sd' needs updating - please use bus_type methods
[ 89.857837] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 89.858754] sd 2:0:0:0: [sda] Write Protect is off
[ 89.858754] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 89.858869] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 89.858869] sd 2:0:0:0: [sda] Write Protect is off
[ 89.858869] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 89.858869] sda: sda1
[ 89.879851] sd 2:0:0:0: [sda] Attached SCSI disk
[ 89.901630] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 89.963632] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 89.963632] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 106.913629] kjournald starting. Commit interval 5 seconds
[ 106.913629] EXT3 FS on dm-5, internal journal
On 2.6.27.2, I get:
[ 551.484341] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
[ 551.677467] usb 1-2.1.1: configuration #1 chosen from 1 choice
[ 551.718201] usbcore: registered new interface driver libusual
[ 551.799287] Initializing USB Mass Storage driver...
[ 551.800330] scsi2 : SCSI emulation for USB Mass Storage devices
[ 551.800694] usbcore: registered new interface driver usb-storage
[ 551.800698] USB Mass Storage support registered.
[ 556.800702] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
[ 556.835190] Driver 'sd' needs updating - please use bus_type methods
[ 556.835928] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 556.836809] sd 2:0:0:0: [sda] Write Protect is off
[ 556.836816] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 556.837548] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 556.838296] sd 2:0:0:0: [sda] Write Protect is off
[ 556.838307] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 556.838310] sda: sda1
[ 556.859169] sd 2:0:0:0: [sda] Attached SCSI disk
[ 556.878553] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 556.946044] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.946055] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.947290] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.947298] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.948539] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.948547] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.949790] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.949798] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.951045] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.951053] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.952290] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.952297] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.954665] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.954673] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.957665] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.957673] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.960791] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.960799] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.963790] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 556.963798] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 556.966915] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
These repeat forever, till I disconnect.
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
[Adding CCs]
On Wednesday, 22 of October 2008, Luciano Rocha wrote:
>
> Hello,
>
> An external HDD, usb-encased, works fine under 2.6.26.5, but under
> 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
>
> On connection, under 2.6.26.5, I get the following:
>
> [ 84.503949] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
> [ 84.696861] usb 1-2.1.1: configuration #1 chosen from 1 choice
> [ 84.740856] usbcore: registered new interface driver libusual
> [ 84.817625] Initializing USB Mass Storage driver...
> [ 84.817625] scsi2 : SCSI emulation for USB Mass Storage devices
> [ 84.818820] usbcore: registered new interface driver usb-storage
> [ 84.818823] USB Mass Storage support registered.
> [ 89.819567] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
> [ 89.857627] Driver 'sd' needs updating - please use bus_type methods
> [ 89.857837] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 89.858754] sd 2:0:0:0: [sda] Write Protect is off
> [ 89.858754] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 89.858869] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 89.858869] sd 2:0:0:0: [sda] Write Protect is off
> [ 89.858869] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 89.858869] sda: sda1
> [ 89.879851] sd 2:0:0:0: [sda] Attached SCSI disk
> [ 89.901630] sd 2:0:0:0: Attached scsi generic sg0 type 0
> [ 89.963632] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 89.963632] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 106.913629] kjournald starting. Commit interval 5 seconds
> [ 106.913629] EXT3 FS on dm-5, internal journal
>
> On 2.6.27.2, I get:
>
> [ 551.484341] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
> [ 551.677467] usb 1-2.1.1: configuration #1 chosen from 1 choice
> [ 551.718201] usbcore: registered new interface driver libusual
> [ 551.799287] Initializing USB Mass Storage driver...
> [ 551.800330] scsi2 : SCSI emulation for USB Mass Storage devices
> [ 551.800694] usbcore: registered new interface driver usb-storage
> [ 551.800698] USB Mass Storage support registered.
> [ 556.800702] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
> [ 556.835190] Driver 'sd' needs updating - please use bus_type methods
> [ 556.835928] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 556.836809] sd 2:0:0:0: [sda] Write Protect is off
> [ 556.836816] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 556.837548] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 556.838296] sd 2:0:0:0: [sda] Write Protect is off
> [ 556.838307] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 556.838310] sda: sda1
> [ 556.859169] sd 2:0:0:0: [sda] Attached SCSI disk
> [ 556.878553] sd 2:0:0:0: Attached scsi generic sg0 type 0
> [ 556.946044] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.946055] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.947290] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.947298] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.948539] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.948547] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.949790] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.949798] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.951045] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.951053] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.952290] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.952297] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.954665] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.954673] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.957665] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.957673] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.960791] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.960799] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.963790] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 556.963798] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 556.966915] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
>
> These repeat forever, till I disconnect.
>
> Regards,
> Luciano Rocha
>
On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
> [Adding CCs]
>
> On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> >
> > Hello,
> >
> > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
You can use usbmon to capture the details of what happens when you plug
in the drive. Instructions are in the kernel source file
Documentation/usb/usbmon.txt.
Alan Stern
On Sat, 2008-10-25 at 15:50 -0400, Alan Stern wrote:
> On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
>
> > [Adding CCs]
> >
> > On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> > >
> > > Hello,
> > >
> > > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
>
> You can use usbmon to capture the details of what happens when you plug
> in the drive. Instructions are in the kernel source file
> Documentation/usb/usbmon.txt.
Try reversing this patch (apply with patch -p1 -R)
James
---
commit 10dab22664914505dcb804d9ad09cad6bc94d349
Author: Jamie Wellnitz <[email protected]>
Date: Thu Sep 11 21:39:36 2008 -0400
[SCSI] sd: Fix handling of NO_SENSE check condition
The current handling of NO_SENSE check condition is the same as
RECOVERED_ERROR, and assumes that in both cases, the I/O was fully
transferred.
We have seen cases of arrays returning with NO_SENSE (no error), but
the I/O was not completely transferred, thus residual set. Thus,
rather than return good_bytes as the entire transfer, set good_bytes
to 0, so that the midlayer then applies the residual in calculating
the transfer, and for sd, will fail the I/O and fall into a retry
path.
Signed-off-by: Jamie Wellnitz <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7c4d2e6..55e6ed4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1054,7 +1054,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = sd_completed_bytes(SCpnt);
break;
case RECOVERED_ERROR:
- case NO_SENSE:
/* Inform the user, but make sure that it's not treated
* as a hard error.
*/
@@ -1063,6 +1062,15 @@ static int sd_done(struct scsi_cmnd *SCpnt)
memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
good_bytes = scsi_bufflen(SCpnt);
break;
+ case NO_SENSE:
+ /* This indicates a false check condition, so ignore it. An
+ * unknown amount of data was transferred so treat it as an
+ * error.
+ */
+ scsi_print_sense("sd", SCpnt);
+ SCpnt->result = 0;
+ memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ break;
case ABORTED_COMMAND:
if (sshdr.asc == 0x10) { /* DIF: Disk detected corruption */
scsi_print_result(SCpnt);
On Sat, Oct 25, 2008 at 03:11:33PM -0500, James Bottomley wrote:
> On Sat, 2008-10-25 at 15:50 -0400, Alan Stern wrote:
> > On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
> >
> > > [Adding CCs]
> > >
> > > On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> > > >
> > > > Hello,
> > > >
> > > > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > > > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
> >
> > You can use usbmon to capture the details of what happens when you plug
> > in the drive. Instructions are in the kernel source file
> > Documentation/usb/usbmon.txt.
>
> Try reversing this patch (apply with patch -p1 -R)
I'll try it tomorrow.
Thanks,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Sat, Oct 25, 2008 at 03:11:33PM -0500, James Bottomley wrote:
> On Sat, 2008-10-25 at 15:50 -0400, Alan Stern wrote:
> > On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
> >
> > > [Adding CCs]
> > >
> > > On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> > > >
> > > > Hello,
> > > >
> > > > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > > > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
> >
> > You can use usbmon to capture the details of what happens when you plug
> > in the drive. Instructions are in the kernel source file
> > Documentation/usb/usbmon.txt.
>
> Try reversing this patch (apply with patch -p1 -R)
>
> James
>
> ---
>
> commit 10dab22664914505dcb804d9ad09cad6bc94d349
> Author: Jamie Wellnitz <[email protected]>
> Date: Thu Sep 11 21:39:36 2008 -0400
>
> [SCSI] sd: Fix handling of NO_SENSE check condition
>
> The current handling of NO_SENSE check condition is the same as
> RECOVERED_ERROR, and assumes that in both cases, the I/O was fully
> transferred.
<snip>
2.6.27.4 comes with it reverted. But I still get the same error.
Reapplied it as one more test, with the same error.
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Mon, 27 Oct 2008, Luciano Rocha wrote:
> On Sat, Oct 25, 2008 at 03:50:07PM -0400, Alan Stern wrote:
> > On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
> >
> > > [Adding CCs]
> > >
> > > On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> > > >
> > > > Hello,
> > > >
> > > > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > > > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
> >
> > You can use usbmon to capture the details of what happens when you plug
> > in the drive. Instructions are in the kernel source file
> > Documentation/usb/usbmon.txt.
> >
>
> Now in 2.6.27.4, same problem. The usb traffic is:
This looks exactly like the "infinite retry" problem I warned about
earlier. Here are the important parts of the log. For people who
don't know how to interpret these messages, the CDB starts in the 16th
byte of the 31-byte messages. For example, the first command here
starts with 0x25 and so it is READ CAPACITY:
> f21e7cc0 3570408174 S Bo:1:008:1 -115 31 = 55534243 06000000 08000000 80000a25 00000000 00000000 00000000 000000
> f21e7cc0 3570408264 C Bo:1:008:1 0 31 >
> f21e72c0 3570408280 S Bi:1:008:2 -115 8 <
> f21e72c0 3570408389 C Bi:1:008:2 0 8 = 2e9390b0 00000200
> f21e7cc0 3570408400 S Bi:1:008:2 -115 13 <
> f21e7cc0 3570408513 C Bi:1:008:2 0 13 = 55534253 06000000 00000000 00
The response is 0x2e9390b0. In typical broken fashion, that is
undoubtedly the total number of sectors rather than the highest sector
number.
Later on the system tries to read the contents of what it thinks is the
last sector:
> f21e7cc0 3570515635 S Bo:1:008:1 -115 31 = 55534243 0c000000 00020000 80000a28 002e9390 b0000001 00000000 000000
> f21e7cc0 3570515762 C Bo:1:008:1 0 31 >
> f21e76c0 3570515776 S Bi:1:008:2 -115 512 <
> f21e76c0 3570516261 C Bi:1:008:2 -32 0
> f21e7cc0 3570516281 S Co:1:008:0 s 02 01 0000 0082 0000 0
> f21e7cc0 3570516387 C Co:1:008:0 0 0
> f21e7cc0 3570516399 S Bi:1:008:2 -115 13 <
> f21e7cc0 3570516511 C Bi:1:008:2 0 13 = 55534253 0c000000 00020000 01
There's no data in the response, and the 01 on the line above
indicates Check Condition status.
> f21e7cc0 3570516524 S Bo:1:008:1 -115 31 = 55534243 0d000000 12000000 80000603 00000012 00000000 00000000 000000
> f21e7cc0 3570516636 C Bo:1:008:1 0 31 >
> f21e76c0 3570516649 S Bi:1:008:2 -115 18 <
> f21e76c0 3570516762 C Bi:1:008:2 0 18 = 70000000 0000000a 00000000 00000000 0000
> f21e7cc0 3570516779 S Bi:1:008:2 -115 13 <
> f21e7cc0 3570516886 C Bi:1:008:2 0 13 = 55534253 0d000000 00000000 00
The automatically-generated REQUEST SENSE gets the 18-byte response you
see above. It is entirely empty (No Sense).
The remainder of the trace shows the same command being repeated over
and over again, with the same result each time.
> f21e7cc0 3570516936 S Bo:1:008:1 -115 31 = 55534243 0e000000 00020000 80000a28 002e9390 b0000001 00000000 000000
> f21e7cc0 3570517012 C Bo:1:008:1 0 31 >
> f21e76c0 3570517031 S Bi:1:008:2 -115 512 <
> f21e76c0 3570517511 C Bi:1:008:2 -32 0
> f21e7cc0 3570517533 S Co:1:008:0 s 02 01 0000 0082 0000 0
> f21e7cc0 3570517637 C Co:1:008:0 0 0
> f21e7cc0 3570517648 S Bi:1:008:2 -115 13 <
> f21e7cc0 3570517762 C Bi:1:008:2 0 13 = 55534253 0e000000 00020000 01
> f21e7cc0 3570517775 S Bo:1:008:1 -115 31 = 55534243 0f000000 12000000 80000603 00000012 00000000 00000000 000000
> f21e7cc0 3570517886 C Bo:1:008:1 0 31 >
> f21e76c0 3570517897 S Bi:1:008:2 -115 18 <
> f21e76c0 3570518011 C Bi:1:008:2 0 18 = 70000000 0000000a 00000000 00000000 0000
> f21e7cc0 3570518027 S Bi:1:008:2 -115 13 <
> f21e7cc0 3570518136 C Bi:1:008:2 0 13 = 55534253 0f000000 00000000 00
etc...
There's a patch which might help resolve this problem:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8bfa24727087d7252f9ecfb5fea2dfc92d797fbd
It is already present in 2.6.28-rc1, so it's worth a try. If it does
fix things, let me know so I can submit it for a future 2.6.27.stable
release.
Alan Stern
Alan Stern wrote:
> On Mon, 27 Oct 2008, Luciano Rocha wrote:
>
>> On Sat, Oct 25, 2008 at 03:50:07PM -0400, Alan Stern wrote:
>>> On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
>>>
>>>> [Adding CCs]
>>>>
>>>> On Wednesday, 22 of October 2008, Luciano Rocha wrote:
>>>>> Hello,
>>>>>
>>>>> An external HDD, usb-encased, works fine under 2.6.26.5, but under
>>>>> 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
>>> You can use usbmon to capture the details of what happens when you plug
>>> in the drive. Instructions are in the kernel source file
>>> Documentation/usb/usbmon.txt.
>>>
>> Now in 2.6.27.4, same problem. The usb traffic is:
>
> This looks exactly like the "infinite retry" problem I warned about
> earlier. Here are the important parts of the log. For people who
> don't know how to interpret these messages, the CDB starts in the 16th
> byte of the 31-byte messages. For example, the first command here
> starts with 0x25 and so it is READ CAPACITY:
>
>> f21e7cc0 3570408174 S Bo:1:008:1 -115 31 = 55534243 06000000 08000000 80000a25 00000000 00000000 00000000 000000
>> f21e7cc0 3570408264 C Bo:1:008:1 0 31 >
>> f21e72c0 3570408280 S Bi:1:008:2 -115 8 <
>> f21e72c0 3570408389 C Bi:1:008:2 0 8 = 2e9390b0 00000200
>> f21e7cc0 3570408400 S Bi:1:008:2 -115 13 <
>> f21e7cc0 3570408513 C Bi:1:008:2 0 13 = 55534253 06000000 00000000 00
>
> The response is 0x2e9390b0. In typical broken fashion, that is
> undoubtedly the total number of sectors rather than the highest sector
> number.
Since the READ CAPACITY "off by one" error is so common,
perhaps drivers such as usb-storage could have a hook to
do a pseudo READ CAPACITY. Then if the capacity value
looked odd (in both senses) the driver could do an IO to
the suspect block and if that failed decrement the capacity
value passed back to the mid level.
Put another way, why don't these defective devices trip up
another OS?
BTW a single disk in RAID 0 (seen on a HP E200 controller)
has a shortened capacity value seen in the midlevel on the
corresponding logical drive. That missing chunk is probably
where the RAID controller puts its control information.
Anyway, playing with the capacity value returned by READ
CAPACITY certainly has a precedent.
> Later on the system tries to read the contents of what it thinks is the
> last sector:
I know that happens but it seems strange that upper levels
are reading a block that has never been written to. Read ahead?
Doug Gilbert
>> f21e7cc0 3570515635 S Bo:1:008:1 -115 31 = 55534243 0c000000 00020000 80000a28 002e9390 b0000001 00000000 000000
>> f21e7cc0 3570515762 C Bo:1:008:1 0 31 >
>> f21e76c0 3570515776 S Bi:1:008:2 -115 512 <
>> f21e76c0 3570516261 C Bi:1:008:2 -32 0
>> f21e7cc0 3570516281 S Co:1:008:0 s 02 01 0000 0082 0000 0
>> f21e7cc0 3570516387 C Co:1:008:0 0 0
>> f21e7cc0 3570516399 S Bi:1:008:2 -115 13 <
>> f21e7cc0 3570516511 C Bi:1:008:2 0 13 = 55534253 0c000000 00020000 01
>
> There's no data in the response, and the 01 on the line above
> indicates Check Condition status.
>
>> f21e7cc0 3570516524 S Bo:1:008:1 -115 31 = 55534243 0d000000 12000000 80000603 00000012 00000000 00000000 000000
>> f21e7cc0 3570516636 C Bo:1:008:1 0 31 >
>> f21e76c0 3570516649 S Bi:1:008:2 -115 18 <
>> f21e76c0 3570516762 C Bi:1:008:2 0 18 = 70000000 0000000a 00000000 00000000 0000
>> f21e7cc0 3570516779 S Bi:1:008:2 -115 13 <
>> f21e7cc0 3570516886 C Bi:1:008:2 0 13 = 55534253 0d000000 00000000 00
>
> The automatically-generated REQUEST SENSE gets the 18-byte response you
> see above. It is entirely empty (No Sense).
>
> The remainder of the trace shows the same command being repeated over
> and over again, with the same result each time.
>
>> f21e7cc0 3570516936 S Bo:1:008:1 -115 31 = 55534243 0e000000 00020000 80000a28 002e9390 b0000001 00000000 000000
>> f21e7cc0 3570517012 C Bo:1:008:1 0 31 >
>> f21e76c0 3570517031 S Bi:1:008:2 -115 512 <
>> f21e76c0 3570517511 C Bi:1:008:2 -32 0
>> f21e7cc0 3570517533 S Co:1:008:0 s 02 01 0000 0082 0000 0
>> f21e7cc0 3570517637 C Co:1:008:0 0 0
>> f21e7cc0 3570517648 S Bi:1:008:2 -115 13 <
>> f21e7cc0 3570517762 C Bi:1:008:2 0 13 = 55534253 0e000000 00020000 01
>> f21e7cc0 3570517775 S Bo:1:008:1 -115 31 = 55534243 0f000000 12000000 80000603 00000012 00000000 00000000 000000
>> f21e7cc0 3570517886 C Bo:1:008:1 0 31 >
>> f21e76c0 3570517897 S Bi:1:008:2 -115 18 <
>> f21e76c0 3570518011 C Bi:1:008:2 0 18 = 70000000 0000000a 00000000 00000000 0000
>> f21e7cc0 3570518027 S Bi:1:008:2 -115 13 <
>> f21e7cc0 3570518136 C Bi:1:008:2 0 13 = 55534253 0f000000 00000000 00
> etc...
>
> There's a patch which might help resolve this problem:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8bfa24727087d7252f9ecfb5fea2dfc92d797fbd
>
> It is already present in 2.6.28-rc1, so it's worth a try. If it does
> fix things, let me know so I can submit it for a future 2.6.27.stable
> release.
>
> Alan Stern
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Douglas Gilbert wrote:
>
> Since the READ CAPACITY "off by one" error is so common,
> perhaps drivers such as usb-storage could have a hook to
> do a pseudo READ CAPACITY. Then if the capacity value
> looked odd (in both senses) the driver could do an IO to
> the suspect block and if that failed decrement the capacity
> value passed back to the mid level.
> Put another way, why don't these defective devices trip up
> another OS?
>
Window$ never reads the last sector unless it is actually written
to. I had such a device it got stuck when I wrote to the
last sector.
> BTW a single disk in RAID 0 (seen on a HP E200 controller)
> has a shortened capacity value seen in the midlevel on the
> corresponding logical drive. That missing chunk is probably
> where the RAID controller puts its control information.
> Anyway, playing with the capacity value returned by READ
> CAPACITY certainly has a precedent.
>
>> Later on the system tries to read the contents of what it thinks is the
>> last sector:
>
> I know that happens but it seems strange that upper levels
> are reading a block that has never been written to. Read ahead?
>
That would be udev or hald, I can't remember which. It is a special Linux
fixture. ;)
> Doug Gilbert
>
>
Boaz
On Mon, Oct 27, 2008 at 10:24:14AM -0400, Alan Stern wrote:
> On Mon, 27 Oct 2008, Luciano Rocha wrote:
>
> > On Sat, Oct 25, 2008 at 03:50:07PM -0400, Alan Stern wrote:
> > > On Sat, 25 Oct 2008, Rafael J. Wysocki wrote:
> > >
> > > > [Adding CCs]
> > > >
> > > > On Wednesday, 22 of October 2008, Luciano Rocha wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > An external HDD, usb-encased, works fine under 2.6.26.5, but under
> > > > > 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'.
> > >
> > > You can use usbmon to capture the details of what happens when you plug
> > > in the drive. Instructions are in the kernel source file
> > > Documentation/usb/usbmon.txt.
> > >
> >
> > Now in 2.6.27.4, same problem. The usb traffic is:
>
> This looks exactly like the "infinite retry" problem I warned about
> earlier. Here are the important parts of the log. For people who
> don't know how to interpret these messages, the CDB starts in the 16th
> byte of the 31-byte messages. For example, the first command here
> starts with 0x25 and so it is READ CAPACITY:
>
> > f21e7cc0 3570408174 S Bo:1:008:1 -115 31 = 55534243 06000000 08000000 80000a25 00000000 00000000 00000000 000000
> > f21e7cc0 3570408264 C Bo:1:008:1 0 31 >
> > f21e72c0 3570408280 S Bi:1:008:2 -115 8 <
> > f21e72c0 3570408389 C Bi:1:008:2 0 8 = 2e9390b0 00000200
> > f21e7cc0 3570408400 S Bi:1:008:2 -115 13 <
> > f21e7cc0 3570408513 C Bi:1:008:2 0 13 = 55534253 06000000 00000000 00
>
> The response is 0x2e9390b0. In typical broken fashion, that is
> undoubtedly the total number of sectors rather than the highest sector
> number.
>
> Later on the system tries to read the contents of what it thinks is the
> last sector:
Hm, where in the code can I change it to return less that what it thinks
is the last sector?
<snip>
>
> There's a patch which might help resolve this problem:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8bfa24727087d7252f9ecfb5fea2dfc92d797fbd
>
No luck. I'm going to try 2.6.28-rc2 now.
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Mon, 27 Oct 2008, Douglas Gilbert wrote:
> > This looks exactly like the "infinite retry" problem I warned about
> > earlier. Here are the important parts of the log. For people who
> > don't know how to interpret these messages, the CDB starts in the 16th
> > byte of the 31-byte messages. For example, the first command here
> > starts with 0x25 and so it is READ CAPACITY:
> >
> >> f21e7cc0 3570408174 S Bo:1:008:1 -115 31 = 55534243 06000000 08000000 80000a25 00000000 00000000 00000000 000000
> >> f21e7cc0 3570408264 C Bo:1:008:1 0 31 >
> >> f21e72c0 3570408280 S Bi:1:008:2 -115 8 <
> >> f21e72c0 3570408389 C Bi:1:008:2 0 8 = 2e9390b0 00000200
> >> f21e7cc0 3570408400 S Bi:1:008:2 -115 13 <
> >> f21e7cc0 3570408513 C Bi:1:008:2 0 13 = 55534253 06000000 00000000 00
> >
> > The response is 0x2e9390b0. In typical broken fashion, that is
> > undoubtedly the total number of sectors rather than the highest sector
> > number.
>
> Since the READ CAPACITY "off by one" error is so common,
> perhaps drivers such as usb-storage could have a hook to
> do a pseudo READ CAPACITY. Then if the capacity value
> looked odd (in both senses) the driver could do an IO to
> the suspect block and if that failed decrement the capacity
> value passed back to the mid level.
We thought of that years ago. Unfortunately there is no reliable way
of telling when a capacity value is wrong. There definitely do exist
disks with an odd number of sectors.
Furthermore, doing I/O to a suspect block is not a good idea. There
are plenty of devices which simply crash when you try to access a
nonexistent sector.
> Put another way, why don't these defective devices trip up
> another OS?
I imagine they do. However Linux has partition code that stores
information in the last sector of a partition (EFI GUID and md, for
example). Other OS's apparently do not try to access the medium's last
sector under most circumstances.
> BTW a single disk in RAID 0 (seen on a HP E200 controller)
> has a shortened capacity value seen in the midlevel on the
> corresponding logical drive. That missing chunk is probably
> where the RAID controller puts its control information.
> Anyway, playing with the capacity value returned by READ
> CAPACITY certainly has a precedent.
usb-storage isn't in the business of altering the data it gets from a
device. It's just a transport. That's why the sdev->fix_capacity flag
exists; we tell the upper layer that the data it gets is going to be
wrong and let the upper layer worry about fixing things up.
> > Later on the system tries to read the contents of what it thinks is the
> > last sector:
>
> I know that happens but it seems strange that upper levels
> are reading a block that has never been written to. Read ahead?
No, partition scanning. Also maybe /lib/udev/vol_id, which seems to
read an inordinate number of irrelevant sectors.
Alan Stern
On Mon, 2008-10-27 at 11:25 -0400, Alan Stern wrote:
> On Mon, 27 Oct 2008, Douglas Gilbert wrote:
> > BTW a single disk in RAID 0 (seen on a HP E200 controller)
> > has a shortened capacity value seen in the midlevel on the
> > corresponding logical drive. That missing chunk is probably
> > where the RAID controller puts its control information.
> > Anyway, playing with the capacity value returned by READ
> > CAPACITY certainly has a precedent.
>
> usb-storage isn't in the business of altering the data it gets from a
> device. It's just a transport. That's why the sdev->fix_capacity flag
> exists; we tell the upper layer that the data it gets is going to be
> wrong and let the upper layer worry about fixing things up.
And you do this by setting US_FL_FIX_CAPACITY in unusual_devs.h
James
On Mon, 27 Oct 2008, Luciano Rocha wrote:
> Hm, where in the code can I change it to return less that what it thinks
> is the last sector?
I can write a patch to do that for you. However I'm more interested in
solving the "infinite-retry" problem first.
I did write some patches which might help. They weren't meant as bug
fixes exactly, more as infrastructure cleanup. But you ought to try
them out, because they do affect the logic in this area.
The patches are here:
http://marc.info/?l=linux-scsi&m=122443015406309&w=2
They are based on 2.6.27, not 2.6.28-rc.
Alan Stern
On Mon, Oct 27, 2008 at 16:08, Boaz Harrosh <[email protected]> wrote:
> Douglas Gilbert wrote:
>>
>> Since the READ CAPACITY "off by one" error is so common,
>> perhaps drivers such as usb-storage could have a hook to
>> do a pseudo READ CAPACITY. Then if the capacity value
>> looked odd (in both senses) the driver could do an IO to
>> the suspect block and if that failed decrement the capacity
>> value passed back to the mid level.
>> Put another way, why don't these defective devices trip up
>> another OS?
>>
>
> Window$ never reads the last sector unless it is actually written
> to. I had such a device it got stuck when I wrote to the
> last sector.
>
>> BTW a single disk in RAID 0 (seen on a HP E200 controller)
>> has a shortened capacity value seen in the midlevel on the
>> corresponding logical drive. That missing chunk is probably
>> where the RAID controller puts its control information.
>> Anyway, playing with the capacity value returned by READ
>> CAPACITY certainly has a precedent.
>>
>>> Later on the system tries to read the contents of what it thinks is the
>>> last sector:
>>
>> I know that happens but it seems strange that upper levels
>> are reading a block that has never been written to. Read ahead?
>>
>
> That would be udev or hald, I can't remember which. It is a special Linux
> fixture. ;)
It's vol_id from udev. To auto-detect raid setups, LVM, MD metadata, ...
Such devices carry their magic sequence at the "end" of the device,
usually the last sector or a few sectors before the last sector. The
"last" sector is obviously determined by the capacity the device
reports.
There is no way around looking at all block devices' last sector on a
Linux box, as long as people expect auto-setup, auto-mounting,
anything that is not put in /etc/fstab to work, as long as any "meta
device" format puts a magic number at the end of the device.
Kay
On Mon, Oct 27, 2008 at 11:38:13AM -0400, Alan Stern wrote:
> On Mon, 27 Oct 2008, Luciano Rocha wrote:
>
> > Hm, where in the code can I change it to return less that what it thinks
> > is the last sector?
>
> I can write a patch to do that for you. However I'm more interested in
> solving the "infinite-retry" problem first.
>
> I did write some patches which might help. They weren't meant as bug
> fixes exactly, more as infrastructure cleanup. But you ought to try
> them out, because they do affect the logic in this area.
>
> The patches are here:
>
> http://marc.info/?l=linux-scsi&m=122443015406309&w=2
>
> They are based on 2.6.27, not 2.6.28-rc.
I've tested them (on 2.6.27.4), but I still get the bug. That email mentions:
"Neither patch addresses the infinite-retry problem; I wanted
to keep the issues separate."
So am I missing some other patch?
2.6.28-rc2 is also buggy.
Thanks,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Mon, 27 Oct 2008, Luciano Rocha wrote:
> > The patches are here:
> >
> > http://marc.info/?l=linux-scsi&m=122443015406309&w=2
> >
> > They are based on 2.6.27, not 2.6.28-rc.
>
> I've tested them (on 2.6.27.4), but I still get the bug. That email mentions:
> "Neither patch addresses the infinite-retry problem; I wanted
> to keep the issues separate."
>
> So am I missing some other patch?
>
> 2.6.28-rc2 is also buggy.
Okay, I tried the same sort of experiment in 2.6.27.4 and got the same
result as you did. I was able to fix it by applying the 8bfa24727
commit mentioned earlier together with the patch below.
Now, I'm pretty sure this is not the ideal solution. Fixing the block
core would be better. Still, it does indicate for sure that there's a
real problem.
It's worth noting that the test added to scsi_requeue_command()
succeeds every time, that is, req->retries is always 0. Clearly more
work is needed.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
On Mon, 27 Oct 2008, Luciano Rocha wrote:
> On Mon, Oct 27, 2008 at 11:38:13AM -0400, Alan Stern wrote:
> > On Mon, 27 Oct 2008, Luciano Rocha wrote:
> >
> > > Hm, where in the code can I change it to return less that what it thinks
> > > is the last sector?
> >
> > I can write a patch to do that for you. However I'm more interested in
> > solving the "infinite-retry" problem first.
Here's the promised patch.
Alan Stern
Index: usb-2.6/drivers/usb/storage/unusual_devs.h
===================================================================
--- usb-2.6.orig/drivers/usb/storage/unusual_devs.h
+++ usb-2.6/drivers/usb/storage/unusual_devs.h
@@ -1251,6 +1251,13 @@ UNUSUAL_DEV( 0x0839, 0x000a, 0x0001, 0x0
US_SC_DEVICE, US_PR_DEVICE, NULL,
US_FL_FIX_INQUIRY),
+/* Reported by Luciano Rocha <[email protected]> */
+UNUSUAL_DEV( 0x0840, 0x0082, 0x0001, 0x0001,
+ "Argosy",
+ "Storage",
+ US_SC_DEVICE, US_PR_DEVICE, NULL,
+ US_FL_FIX_CAPACITY),
+
/* Entry and supporting patch by Theodore Kilgore <[email protected]>.
* Flag will support Bulk devices which use a standards-violating 32-byte
* Command Block Wrapper. Here, the "DC2MEGA" cameras (several brands) with
On Mon, Oct 27, 2008 at 04:10:02PM -0400, Alan Stern wrote:
> On Mon, 27 Oct 2008, Luciano Rocha wrote:
>
> > > The patches are here:
> > >
> > > http://marc.info/?l=linux-scsi&m=122443015406309&w=2
> > >
> > > They are based on 2.6.27, not 2.6.28-rc.
> >
> > I've tested them (on 2.6.27.4), but I still get the bug. That email mentions:
> > "Neither patch addresses the infinite-retry problem; I wanted
> > to keep the issues separate."
> >
> > So am I missing some other patch?
> >
> > 2.6.28-rc2 is also buggy.
>
> Okay, I tried the same sort of experiment in 2.6.27.4 and got the same
> result as you did. I was able to fix it by applying the 8bfa24727
> commit mentioned earlier together with the patch below.
I tried it, but I get some I/O errors with it.
With 8bfa24727 and this patch, I get:
[ 94.289819] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
[ 94.493891] usb 1-2.1.1: configuration #1 chosen from 1 choice
[ 94.537884] usbcore: registered new interface driver libusual
[ 94.611888] Initializing USB Mass Storage driver...
[ 94.612704] scsi2 : SCSI emulation for USB Mass Storage devices
[ 94.619835] usbcore: registered new interface driver usb-storage
[ 94.619891] USB Mass Storage support registered.
[ 94.622190] usb-storage: device found at 8
[ 94.622232] usb-storage: waiting for device to settle before scanning
[ 99.622933] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
[ 99.627569] usb-storage: device scan complete
[ 99.657746] Driver 'sd' needs updating - please use bus_type methods
[ 99.661428] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 99.662551] sd 2:0:0:0: [sda] Write Protect is off
[ 99.662662] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 99.662702] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 99.666582] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
[ 99.672877] sd 2:0:0:0: [sda] Write Protect is off
[ 99.672884] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 99.672886] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 99.672890] sda: sda1
[ 99.683592] sd 2:0:0:0: [sda] Attached SCSI disk
[ 99.704160] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 99.808029] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.808040] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.808051] end_request: I/O error, dev sda, sector 781422768
[ 99.808056] Buffer I/O error on device sda, logical block 781422768
[ 99.837771] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.837780] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.837790] end_request: I/O error, dev sda, sector 781422768
[ 99.837794] Buffer I/O error on device sda, logical block 781422768
[ 99.838775] end_request: I/O error, dev sda, sector 781422761
[ 99.838781] Buffer I/O error on device sda, logical block 781422761
[ 99.838784] Buffer I/O error on device sda, logical block 781422762
[ 99.838786] Buffer I/O error on device sda, logical block 781422763
[ 99.838789] Buffer I/O error on device sda, logical block 781422764
[ 99.838791] Buffer I/O error on device sda, logical block 781422765
[ 99.838793] Buffer I/O error on device sda, logical block 781422766
[ 99.838796] Buffer I/O error on device sda, logical block 781422767
[ 99.839525] end_request: I/O error, dev sda, sector 781422762
[ 99.839530] Buffer I/O error on device sda, logical block 781422762
[ 99.847527] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.847539] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.847554] end_request: I/O error, dev sda, sector 781422768
[ 99.864923] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.864942] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.864967] end_request: I/O error, dev sda, sector 781422768
[ 99.883038] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.883047] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.883058] end_request: I/O error, dev sda, sector 781422768
[ 99.884239] end_request: I/O error, dev sda, sector 781422763
[ 99.901645] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.901655] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.901665] end_request: I/O error, dev sda, sector 781422768
[ 99.925027] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
[ 99.925036] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
[ 99.925046] end_request: I/O error, dev sda, sector 781422768
[ 119.957384] kjournald starting. Commit interval 5 seconds
[ 119.962246] EXT3 FS on dm-5, internal journal
[ 119.962267] EXT3-fs: mounted filesystem with ordered data mode.
Oddly enough, with the other patch you sent me fixing the reported
number of sectors (*), on top of this one, I also get I/O errors:
[ 127.047323] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 13
[ 127.240694] usb 1-2.1.1: configuration #1 chosen from 1 choice
[ 127.241178] scsi3 : SCSI emulation for USB Mass Storage devices
[ 127.241752] usb-storage: device found at 13
[ 127.241924] usb-storage: waiting for device to settle before scanning
[ 132.242710] scsi 3:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
[ 132.244195] sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
[ 132.245070] sd 3:0:0:0: [sda] Write Protect is off
[ 132.245162] sd 3:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 132.245190] sd 3:0:0:0: [sda] Assuming drive cache: write through
[ 132.245929] sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
[ 132.246686] sd 3:0:0:0: [sda] Write Protect is off
[ 132.246691] sd 3:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 132.246693] sd 3:0:0:0: [sda] Assuming drive cache: write through
[ 132.246696] sda: sda1
[ 132.272924] sd 3:0:0:0: [sda] Attached SCSI disk
[ 132.273058] sd 3:0:0:0: Attached scsi generic sg0 type 0
[ 132.273297] usb-storage: device scan complete
[ 132.325802] end_request: I/O error, dev sda, sector 781422761
[ 132.325815] __ratelimit: 6 callbacks suppressed
[ 132.325818] Buffer I/O error on device sda, logical block 97677845
[ 132.327676] end_request: I/O error, dev sda, sector 781422761
[ 132.327689] Buffer I/O error on device sda, logical block 97677845
[ 132.328302] end_request: I/O error, dev sda, sector 781422761
[ 132.328314] Buffer I/O error on device sda, logical block 97677845
[ 132.328926] end_request: I/O error, dev sda, sector 781422761
[ 132.328938] Buffer I/O error on device sda, logical block 97677845
[ 132.329551] end_request: I/O error, dev sda, sector 781422761
[ 132.329557] Buffer I/O error on device sda, logical block 97677845
[ 132.330176] end_request: I/O error, dev sda, sector 781422761
[ 132.330188] Buffer I/O error on device sda, logical block 97677845
[ 132.330800] end_request: I/O error, dev sda, sector 781422761
[ 132.330807] Buffer I/O error on device sda, logical block 97677845
[ 132.345053] end_request: I/O error, dev sda, sector 781422761
[ 132.345065] Buffer I/O error on device sda, logical block 97677845
[ 132.345926] end_request: I/O error, dev sda, sector 781422761
[ 132.345938] Buffer I/O error on device sda, logical block 97677845
[ 136.743629] kjournald starting. Commit interval 5 seconds
[ 136.744951] EXT3 FS on dm-5, internal journal
[ 136.745039] EXT3-fs: mounted filesystem with ordered data mode.
Without this patch, but still with the (*) one, everything seems fine:
[ 90.680299] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
[ 90.873674] usb 1-2.1.1: configuration #1 chosen from 1 choice
[ 91.493054] usbcore: registered new interface driver libusual
[ 91.582207] Initializing USB Mass Storage driver...
[ 91.582696] scsi2 : SCSI emulation for USB Mass Storage devices
[ 91.584473] usbcore: registered new interface driver usb-storage
[ 91.584582] USB Mass Storage support registered.
[ 91.585032] usb-storage: device found at 8
[ 91.585039] usb-storage: waiting for device to settle before scanning
[ 96.584772] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
[ 96.585058] usb-storage: device scan complete
[ 96.623106] Driver 'sd' needs updating - please use bus_type methods
[ 96.624120] sd 2:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
[ 96.624869] sd 2:0:0:0: [sda] Write Protect is off
[ 96.624876] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 96.624880] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 96.625615] sd 2:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
[ 96.626365] sd 2:0:0:0: [sda] Write Protect is off
[ 96.626371] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 96.626375] sd 2:0:0:0: [sda] Assuming drive cache: write through
[ 96.626379] sda: sda1
[ 96.638218] sd 2:0:0:0: [sda] Attached SCSI disk
[ 96.662312] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 107.078870] kjournald starting. Commit interval 5 seconds
[ 107.080071] EXT3 FS on dm-5, internal journal
[ 107.080082] EXT3-fs: mounted filesystem with ordered data mode.
Do you wish me to test anything else?
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Tue, 28 Oct 2008, Luciano Rocha wrote:
> >
> > Okay, I tried the same sort of experiment in 2.6.27.4 and got the same
> > result as you did. I was able to fix it by applying the 8bfa24727
> > commit mentioned earlier together with the patch below.
>
> I tried it, but I get some I/O errors with it.
That's to be expected. These patches aren't supposed to prevent I/O
errors, they are merely supposed to prevent the errors from continuing
indefinitely.
> With 8bfa24727 and this patch, I get:
>
> [ 94.289819] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
> [ 94.493891] usb 1-2.1.1: configuration #1 chosen from 1 choice
> [ 94.537884] usbcore: registered new interface driver libusual
> [ 94.611888] Initializing USB Mass Storage driver...
> [ 94.612704] scsi2 : SCSI emulation for USB Mass Storage devices
> [ 94.619835] usbcore: registered new interface driver usb-storage
> [ 94.619891] USB Mass Storage support registered.
> [ 94.622190] usb-storage: device found at 8
> [ 94.622232] usb-storage: waiting for device to settle before scanning
> [ 99.622933] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
> [ 99.627569] usb-storage: device scan complete
> [ 99.657746] Driver 'sd' needs updating - please use bus_type methods
> [ 99.661428] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 99.662551] sd 2:0:0:0: [sda] Write Protect is off
> [ 99.662662] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 99.662702] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 99.666582] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB)
> [ 99.672877] sd 2:0:0:0: [sda] Write Protect is off
> [ 99.672884] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 99.672886] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 99.672890] sda: sda1
> [ 99.683592] sd 2:0:0:0: [sda] Attached SCSI disk
> [ 99.704160] sd 2:0:0:0: Attached scsi generic sg0 type 0
> [ 99.808029] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.808040] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.808051] end_request: I/O error, dev sda, sector 781422768
> [ 99.808056] Buffer I/O error on device sda, logical block 781422768
> [ 99.837771] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.837780] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.837790] end_request: I/O error, dev sda, sector 781422768
> [ 99.837794] Buffer I/O error on device sda, logical block 781422768
> [ 99.838775] end_request: I/O error, dev sda, sector 781422761
> [ 99.838781] Buffer I/O error on device sda, logical block 781422761
> [ 99.838784] Buffer I/O error on device sda, logical block 781422762
> [ 99.838786] Buffer I/O error on device sda, logical block 781422763
> [ 99.838789] Buffer I/O error on device sda, logical block 781422764
> [ 99.838791] Buffer I/O error on device sda, logical block 781422765
> [ 99.838793] Buffer I/O error on device sda, logical block 781422766
> [ 99.838796] Buffer I/O error on device sda, logical block 781422767
> [ 99.839525] end_request: I/O error, dev sda, sector 781422762
> [ 99.839530] Buffer I/O error on device sda, logical block 781422762
> [ 99.847527] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.847539] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.847554] end_request: I/O error, dev sda, sector 781422768
> [ 99.864923] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.864942] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.864967] end_request: I/O error, dev sda, sector 781422768
> [ 99.883038] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.883047] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.883058] end_request: I/O error, dev sda, sector 781422768
> [ 99.884239] end_request: I/O error, dev sda, sector 781422763
> [ 99.901645] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.901655] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.901665] end_request: I/O error, dev sda, sector 781422768
> [ 99.925027] sd 2:0:0:0: [sda] Sense Key : No Sense [current]
> [ 99.925036] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> [ 99.925046] end_request: I/O error, dev sda, sector 781422768
> [ 119.957384] kjournald starting. Commit interval 5 seconds
> [ 119.962246] EXT3 FS on dm-5, internal journal
> [ 119.962267] EXT3-fs: mounted filesystem with ordered data mode.
And the errors did stop, so the patch did what it was supposed to do.
> Oddly enough, with the other patch you sent me fixing the reported
> number of sectors (*), on top of this one, I also get I/O errors:
>
>
> [ 127.047323] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 13
> [ 127.240694] usb 1-2.1.1: configuration #1 chosen from 1 choice
> [ 127.241178] scsi3 : SCSI emulation for USB Mass Storage devices
> [ 127.241752] usb-storage: device found at 13
> [ 127.241924] usb-storage: waiting for device to settle before scanning
> [ 132.242710] scsi 3:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
> [ 132.244195] sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
> [ 132.245070] sd 3:0:0:0: [sda] Write Protect is off
> [ 132.245162] sd 3:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 132.245190] sd 3:0:0:0: [sda] Assuming drive cache: write through
> [ 132.245929] sd 3:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
> [ 132.246686] sd 3:0:0:0: [sda] Write Protect is off
> [ 132.246691] sd 3:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 132.246693] sd 3:0:0:0: [sda] Assuming drive cache: write through
> [ 132.246696] sda: sda1
> [ 132.272924] sd 3:0:0:0: [sda] Attached SCSI disk
> [ 132.273058] sd 3:0:0:0: Attached scsi generic sg0 type 0
> [ 132.273297] usb-storage: device scan complete
> [ 132.325802] end_request: I/O error, dev sda, sector 781422761
> [ 132.325815] __ratelimit: 6 callbacks suppressed
> [ 132.325818] Buffer I/O error on device sda, logical block 97677845
> [ 132.327676] end_request: I/O error, dev sda, sector 781422761
> [ 132.327689] Buffer I/O error on device sda, logical block 97677845
> [ 132.328302] end_request: I/O error, dev sda, sector 781422761
> [ 132.328314] Buffer I/O error on device sda, logical block 97677845
> [ 132.328926] end_request: I/O error, dev sda, sector 781422761
> [ 132.328938] Buffer I/O error on device sda, logical block 97677845
> [ 132.329551] end_request: I/O error, dev sda, sector 781422761
> [ 132.329557] Buffer I/O error on device sda, logical block 97677845
> [ 132.330176] end_request: I/O error, dev sda, sector 781422761
> [ 132.330188] Buffer I/O error on device sda, logical block 97677845
> [ 132.330800] end_request: I/O error, dev sda, sector 781422761
> [ 132.330807] Buffer I/O error on device sda, logical block 97677845
> [ 132.345053] end_request: I/O error, dev sda, sector 781422761
> [ 132.345065] Buffer I/O error on device sda, logical block 97677845
> [ 132.345926] end_request: I/O error, dev sda, sector 781422761
> [ 132.345938] Buffer I/O error on device sda, logical block 97677845
> [ 136.743629] kjournald starting. Commit interval 5 seconds
> [ 136.744951] EXT3 FS on dm-5, internal journal
> [ 136.745039] EXT3-fs: mounted filesystem with ordered data mode.
That's a little strange, but not unheard of.
> Without this patch, but still with the (*) one, everything seems fine:
>
> [ 90.680299] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8
> [ 90.873674] usb 1-2.1.1: configuration #1 chosen from 1 choice
> [ 91.493054] usbcore: registered new interface driver libusual
> [ 91.582207] Initializing USB Mass Storage driver...
> [ 91.582696] scsi2 : SCSI emulation for USB Mass Storage devices
> [ 91.584473] usbcore: registered new interface driver usb-storage
> [ 91.584582] USB Mass Storage support registered.
> [ 91.585032] usb-storage: device found at 8
> [ 91.585039] usb-storage: waiting for device to settle before scanning
> [ 96.584772] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
> [ 96.585058] usb-storage: device scan complete
> [ 96.623106] Driver 'sd' needs updating - please use bus_type methods
> [ 96.624120] sd 2:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
> [ 96.624869] sd 2:0:0:0: [sda] Write Protect is off
> [ 96.624876] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 96.624880] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 96.625615] sd 2:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
> [ 96.626365] sd 2:0:0:0: [sda] Write Protect is off
> [ 96.626371] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00
> [ 96.626375] sd 2:0:0:0: [sda] Assuming drive cache: write through
> [ 96.626379] sda: sda1
> [ 96.638218] sd 2:0:0:0: [sda] Attached SCSI disk
> [ 96.662312] sd 2:0:0:0: Attached scsi generic sg0 type 0
> [ 107.078870] kjournald starting. Commit interval 5 seconds
> [ 107.080071] EXT3 FS on dm-5, internal journal
> [ 107.080082] EXT3-fs: mounted filesystem with ordered data mode.
Now that is truly bizarre. It means that the drive responded
differently even though it received the same sequence of commands as
before.
> Do you wish me to test anything else?
Yes. Please post a usbmon trace for each of the last two tests. That
is, with the unusual_devs update only, and with the unusual_devs update
plus the invoke_transport update and the scsi_requeue_command update.
Alan Stern
On Tue, Oct 28, 2008 at 01:38:11PM -0400, Alan Stern wrote:
> > Do you wish me to test anything else?
>
> Yes. Please post a usbmon trace for each of the last two tests. That
> is, with the unusual_devs update only, and with the unusual_devs update
> plus the invoke_transport update and the scsi_requeue_command update.
I've updated the bugzilla entry with traces for some combinations of the
discussed patches:
http://bugzilla.kernel.org/show_bug.cgi?id=11843
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Mon, 3 Nov 2008, Luciano Rocha wrote:
> On Tue, Oct 28, 2008 at 01:38:11PM -0400, Alan Stern wrote:
> > > Do you wish me to test anything else?
> >
> > Yes. Please post a usbmon trace for each of the last two tests. That
> > is, with the unusual_devs update only, and with the unusual_devs update
> > plus the invoke_transport update and the scsi_requeue_command update.
>
> I've updated the bugzilla entry with traces for some combinations of the
> discussed patches:
> http://bugzilla.kernel.org/show_bug.cgi?id=11843
Okay, thanks. I see the problem. This whole area is a mess, and it's
going to be different in 2.6.28. But for now...
Please rerun the A+B and A+B+C tests, but this time using the B' patch
below instead of B.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
On Mon, Nov 03, 2008 at 02:46:38PM -0500, Alan Stern wrote:
> On Mon, 3 Nov 2008, Luciano Rocha wrote:
>
> > On Tue, Oct 28, 2008 at 01:38:11PM -0400, Alan Stern wrote:
> > > > Do you wish me to test anything else?
> > >
> > > Yes. Please post a usbmon trace for each of the last two tests. That
> > > is, with the unusual_devs update only, and with the unusual_devs update
> > > plus the invoke_transport update and the scsi_requeue_command update.
> >
> > I've updated the bugzilla entry with traces for some combinations of the
> > discussed patches:
> > http://bugzilla.kernel.org/show_bug.cgi?id=11843
>
> Okay, thanks. I see the problem. This whole area is a mess, and it's
> going to be different in 2.6.28. But for now...
>
> Please rerun the A+B and A+B+C tests, but this time using the B' patch
> below instead of B.
New test data in the bugzilla.
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>
On Wed, 5 Nov 2008, Luciano Rocha wrote:
> > Please rerun the A+B and A+B+C tests, but this time using the B' patch
> > below instead of B.
>
> New test data in the bugzilla.
Sorry, dumb mistake on my part. B'' is below. It is the same as B'
except for one character.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries < 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
On Wed, Nov 05, 2008 at 11:51:41AM -0500, Alan Stern wrote:
> On Wed, 5 Nov 2008, Luciano Rocha wrote:
>
> > > Please rerun the A+B and A+B+C tests, but this time using the B' patch
> > > below instead of B.
> >
> > New test data in the bugzilla.
>
> Sorry, dumb mistake on my part. B'' is below. It is the same as B'
> except for one character.
A+B''+C seems to have done it. The bugzilla entry was updated with the
kernel messages and usb traffic.
http://bugzilla.kernel.org/show_bug.cgi?id=11843
Regards,
Luciano Rocha
--
Luciano Rocha <[email protected]>
Eurotux Inform?tica, S.A. <http://www.eurotux.com/>