2003-09-30 17:10:28

by linas

Subject: [2.4 PATCH:] Lengthen SCSI timeouts to deal with broken hardware



[PATCH: 2.4] Lengthen SCSI timeouts to deal with broken hardware

Hi,

The following patch lengthens some SCSI timeouts to deal with some
misbehaving hardware. The device in question is an Achip ARC765-based
IDE-to-SCSI converter that is used to attach IDE DVD-ROMs to SCSI
chains. This device can take a very long time (10-15 seconds) to
get back to a normal state after a bus reset, causing the SCSI layer
to time out on commands issued after the reset.

We measured these delays using a SCSI bus analyzer. Our guess is
that the slow IDE CD-ROM behind this device is helping make the response
times exceptionally large.
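
To put rough numbers on the mismatch, here is a small standalone sketch
(not kernel code). It assumes HZ is 100 purely for illustration, and the
helper names ticks_to_ms() and covers_recovery() are invented for this
example. The 5*HZ/10, 20*HZ and 25*HZ values are the ones that appear in
the patch below.

```c
/* Assumed tick rate, for illustration only; real kernels define HZ. */
#define HZ 100

/* Convert a timeout expressed in ticks (jiffies) to milliseconds. */
static unsigned long ticks_to_ms(unsigned long ticks)
{
    return ticks * 1000 / HZ;
}

/* Nonzero if a timeout of 'ticks' is at least as long as the
 * device's recovery time after a bus reset, given in milliseconds. */
static int covers_recovery(unsigned long ticks, unsigned long recovery_ms)
{
    return ticks_to_ms(ticks) >= recovery_ms;
}
```

With these assumptions, ticks_to_ms(5*HZ/10) is 500 ms, so
covers_recovery(5*HZ/10, 15000) is false, while the lengthened 20*HZ and
25*HZ values both cover a 15-second recovery.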

The patch below is minimalistic, and not at all fancy. It's also marked
as a __powerpc64__ patch, because this device commonly ships with ppc64
machines. Strictly speaking, though, this is not an architecture-specific
problem: anyone with this Achip/Acard device will see the (hardware)
bug. I suspect this is a rare, obscure device on non-ppc64 machines.
I've made the patch __powerpc64__-only so as to limit its impact, and
with it the complaints and reasons for rejecting this patch.


--linas


--- drivers/scsi/scsi_obsolete.c.orig 2003-09-29 17:47:26.000000000 -0500
+++ drivers/scsi/scsi_obsolete.c 2003-09-29 17:51:40.000000000 -0500
@@ -118,10 +118,19 @@ static void scsi_dump_status(void);
#define ABORT_TIMEOUT SCSI_TIMEOUT
#define RESET_TIMEOUT SCSI_TIMEOUT
#else
+#if defined(__powerpc64__)
+/* Some Achip ARC765-based DVD-ROM's can take 15 seconds or more to reset.
+ * All commands (sense, abort) will not get a response until the reset
+ * completes. Lengthen timeouts to make up for this. */
+#define SENSE_TIMEOUT (20*HZ)
+#define RESET_TIMEOUT (2*HZ)
+#define ABORT_TIMEOUT (25*HZ)
+#else
#define SENSE_TIMEOUT (5*HZ/10)
#define RESET_TIMEOUT (5*HZ/10)
#define ABORT_TIMEOUT (5*HZ/10)
#endif
+#endif


/* Do not call reset on error if we just did a reset within 15 sec. */
--- drivers/scsi/scsi_scan.c.orig 2003-09-29 17:52:11.000000000 -0500
+++ drivers/scsi/scsi_scan.c 2003-09-29 17:56:58.000000000 -0500
@@ -23,6 +23,15 @@
#include <linux/kmod.h>
#endif

+#if defined(__powerpc64__)
+/* Some Achip ARC765-based DVD-ROM's can take 15 seconds or more to reset.
+ * All commands (sense, abort) will not get a response until the reset
+ * completes. Lengthen timeouts to make up for this. */
+#define SCSI_INQ_TIMEOUT (SCSI_TIMEOUT+25*HZ)
+#else
+#define SCSI_INQ_TIMEOUT (SCSI_TIMEOUT+4*HZ)
+#endif
+
/*
* Flags for irregular SCSI devices that need special treatment
*/
@@ -602,7 +611,7 @@ static int scan_scsis_single(unsigned in

scsi_wait_req (SRpnt, (void *) scsi_cmd,
(void *) scsi_result,
- 256, SCSI_TIMEOUT+4*HZ, 3);
+ 256, SCSI_INQ_TIMEOUT, 3);

SCSI_LOG_SCAN_BUS(3, printk("scsi: INQUIRY %s with code 0x%x\n",
SRpnt->sr_result ? "failed" : "successful", SRpnt->sr_result));


2003-09-30 17:20:22

by Jeff Garzik

Subject: Re: [2.4 PATCH:] Lengthen SCSI timeouts to deal with broken hardware

On Tue, Sep 30, 2003 at 12:09:44PM -0500, [email protected] wrote:
>
>
> --- drivers/scsi/scsi_obsolete.c.orig 2003-09-29 17:47:26.000000000 -0500
> +++ drivers/scsi/scsi_obsolete.c 2003-09-29 17:51:40.000000000 -0500
> @@ -118,10 +118,19 @@ static void scsi_dump_status(void);
> #define ABORT_TIMEOUT SCSI_TIMEOUT
> #define RESET_TIMEOUT SCSI_TIMEOUT
> #else
> +#if defined(__powerpc64__)
> +/* Some Achip ARC765-based DVD-ROM's can take 15 seconds or more to reset.
> + * All commands (sense, abort) will not get a response until the reset
> + * completes. Lengthen timeouts to make up for this. */
> +#define SENSE_TIMEOUT (20*HZ)
> +#define RESET_TIMEOUT (2*HZ)
> +#define ABORT_TIMEOUT (25*HZ)

This should be device-dependent, not platform-dependent.

Jeff



2003-09-30 18:13:37

by linas

Subject: Re: [2.4 PATCH:] Lengthen SCSI timeouts to deal with broken hardware

On Tue, Sep 30, 2003 at 01:16:19PM -0400, Jeff Garzik wrote:
> On Tue, Sep 30, 2003 at 12:09:44PM -0500, [email protected] wrote:
> >
> >
> > --- drivers/scsi/scsi_obsolete.c.orig 2003-09-29 17:47:26.000000000 -0500
> > +++ drivers/scsi/scsi_obsolete.c 2003-09-29 17:51:40.000000000 -0500
> > @@ -118,10 +118,19 @@ static void scsi_dump_status(void);
> > #define ABORT_TIMEOUT SCSI_TIMEOUT
> > #define RESET_TIMEOUT SCSI_TIMEOUT
> > #else
> > +#if defined(__powerpc64__)
> > +/* Some Achip ARC765-based DVD-ROM's can take 15 seconds or more to reset.
> > + * All commands (sense, abort) will not get a response until the reset
> > + * completes. Lengthen timeouts to make up for this. */
> > +#define SENSE_TIMEOUT (20*HZ)
> > +#define RESET_TIMEOUT (2*HZ)
> > +#define ABORT_TIMEOUT (25*HZ)
>
> This should be device-dependent, not platform-dependent.


Hi Jeff,

Easier said than done. I could add a new bit to the device blacklist that
covers timeouts. However, one of the timeouts pops during the scsi
inquiry, before we've even identified the device type.

Let me review the changes needed to make this patch device-dependent:
-- add a new bitfield to the blacklist (in scsi_scan.c) for timeouts
-- add new fields to struct scsi_device that hold timeout values
-- make changes in various places to use the timeout value that
was stored in Scsi_Device, including, possibly, no longer passing
the timeout as a subroutine arg.
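
As a rough illustration of the first two items only, here is a standalone
sketch. Every name in it (BLIST_SLOWRESET, struct dev_timeouts,
struct blist_entry, blist_flags(), set_timeouts()) is invented for the
example and is not an existing 2.4 kernel symbol, the vendor/model strings
are placeholders, and HZ is fixed at 100 only so that it compiles on its
own. The timeout values are the ones from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed tick rate, for illustration only. */
#define HZ 100

/* Hypothetical blacklist flag: device is slow to recover from a reset. */
#define BLIST_SLOWRESET 0x01

/* Hypothetical per-device timeouts, in ticks. */
struct dev_timeouts {
    unsigned long sense;
    unsigned long abort;
};

/* Hypothetical blacklist entry keyed on INQUIRY vendor/model strings. */
struct blist_entry {
    const char *vendor;
    const char *model;
    unsigned flags;
};

/* Placeholder entry; a NULL vendor terminates the table. */
static const struct blist_entry blist[] = {
    { "EXAMPLE", "SLOW-DVD", BLIST_SLOWRESET },
    { NULL, NULL, 0 }
};

/* Return the flags for a matching entry, or 0 if the device is not listed. */
static unsigned blist_flags(const char *vendor, const char *model)
{
    const struct blist_entry *e;

    for (e = blist; e->vendor != NULL; e++)
        if (strcmp(e->vendor, vendor) == 0 && strcmp(e->model, model) == 0)
            return e->flags;
    return 0;
}

/* Fill in per-device timeouts: long ones for flagged devices,
 * the stock half-second values for everything else. */
static void set_timeouts(struct dev_timeouts *t, unsigned flags)
{
    if (flags & BLIST_SLOWRESET) {
        t->sense = 20 * HZ;
        t->abort = 25 * HZ;
    } else {
        t->sense = 5 * HZ / 10;
        t->abort = 5 * HZ / 10;
    }
}
```

Note that a lookup like this can only run once the INQUIRY data is in
hand, so it does nothing for the timeout on the INQUIRY itself.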

This would be a considerably larger, possibly controversial patch.
I suppose I can do it if that's what is needed, but I thought I'd opt
for the minimal, 'tactical' fix first, dirty as it is.

My goal is to get the fix in there, and make everybody happy in the
process. How should I proceed?

--linas