2001-07-09 18:25:13

by Cress, Andrew R

[permalink] [raw]
Subject: [PATCH-2.4.3] scsi logging


I'd like to propose the following patch to 3 SCSI mid-layer files from
kernel 2.4.3. I have tested this with 2.4.3, but it should be relevant to
other 2.4.x
kernels also.

It has the following changes/enhancements:
1) Log the disk serial number during scsi_scan() - scsi_scan.c.
Why: This is a requirement in some environments to ensure unambiguous
identification of a particular problem disk.
2) Interpret additional values in print_sense_internal() - constants.c.
Why: The detail wrt Illegal Requests is very useful, since it can
indicate
either an application bug or an incompatible feature of the device.
3) Don't skip logging sense errors for sg functions - sg.c.
Why: All sense errors should be logged so that a potential scsi device
hardware problem doesn't go unrecognized.

Andy Cress

--- linux-243.org/drivers/scsi/scsi_scan.c Sun Feb 4 13:05:30 2001
+++ linux-243/drivers/scsi/scsi_scan.c Tue Jun 19 15:33:38 2001
@@ -212,6 +212,16 @@
printk(" ");
}

+ if ((data[0] & 0x1f) == 0) { /* if type == disk */
+ printk(" Ser#: ");
+ for (i = 36; i < 44; i++) {
+ if (data[i] >= 0x20 && i < data[4] + 5)
+ printk("%c", data[i]);
+ else
+ printk(" ");
+ }
+ } /*endif*/
+
printk("\n");

i = data[0] & 0x1f;
--- linux-243.org/drivers/scsi/constants.c Mon Jan 15 16:08:15 2001
+++ linux-243/drivers/scsi/constants.c Wed Jun 27 16:35:49 2001
@@ -690,17 +690,27 @@
{
int i, s;
int sense_class, valid, code;
+ int key;
const char * error = NULL;

sense_class = (sense_buffer[0] >> 4) & 0x07;
code = sense_buffer[0] & 0xf;
valid = sense_buffer[0] & 0x80;
+ key = sense_buffer[2] & 0x0f;

if (sense_class == 7) { /* extended sense data */
s = sense_buffer[7] + 8;
if(s > SCSI_SENSE_BUFFERSIZE)
s = SCSI_SENSE_BUFFERSIZE;

+#if (CONSTANTS & CONST_SENSE)
+ printk( "%s%s: sense key %s\n", devclass,
+ kdevname(dev), snstext[key]);
+#else
+ printk("%s%s: sns = %2x %2x\n", devclass,
+ kdevname(dev), sense_buffer[0], sense_buffer[2]);
+#endif
+
if (!valid)
printk("[valid=0] ");
printk("Info fld=0x%x, ", (int)((sense_buffer[3] << 24) |
@@ -728,14 +738,6 @@

printk("%s ", error);

-#if (CONSTANTS & CONST_SENSE)
- printk( "%s%s: sense key %s\n", devclass,
- kdevname(dev), snstext[sense_buffer[2] & 0x0f]);
-#else
- printk("%s%s: sns = %2x %2x\n", devclass,
- kdevname(dev), sense_buffer[0], sense_buffer[2]);
-#endif
-
/* Check to see if additional sense information is available */
if(sense_buffer[7] + 7 < 13 ||
(sense_buffer[12] == 0 && sense_buffer[13] == 0)) goto done;
@@ -754,6 +756,28 @@
printk(additional2[i].text, sense_buffer[13]);
printk("\n");
};
+
+ /* Illegal Request & SKSV is set (Sense-Key Specific Field Valid
bit)*/
+ if (key == 0x05 && (sense_buffer[15] & 0x80) == 0x80) {
+ int bytenum;
+ /* Now find the invalid byte in parameter list or CDB */
+ (sense_buffer[15] & 0x40) == 0x40 ? printk("Error in CDB: ")
+ : printk("Error in Parameter List: ");
+ bytenum = sense_buffer[16] * 0x100 + sense_buffer[17];
+ printk("Byte #%d", bytenum);
+ if ((sense_buffer[15] & 0x08) == 0x08) /*BPV(Bit Pointer
Valid)?*/
+ printk(", Bit #%d is in error.\n", sense_buffer[15] & 0x07);
+ else printk(" is in error.\n");
+ }
+
+ /* If media error (bad spot), indicate where (LBA) */
+ if (key == MEDIUM_ERROR || key == RECOVERED_ERROR ||
+ key == HARDWARE_ERROR || key == VOLUME_OVERFLOW ||
+ key == MISCOMPARE ) {
+ printk("LBA=%02x%02x%02x%02x \n",
+ sense_buffer[3], sense_buffer[4],
+ sense_buffer[5], sense_buffer[6]);
+ }
#else
printk("ASC=%2x ASCQ=%2x\n", sense_buffer[12], sense_buffer[13]);
#endif
--- linux-243.org/drivers/scsi/sg.c Mon Jan 15 16:08:15 2001
+++ linux-243/drivers/scsi/sg.c Wed Jun 27 16:27:56 2001
@@ -1049,9 +1049,9 @@
srp->header.msg_status = msg_byte(SRpnt->sr_result);
srp->header.host_status = host_byte(SRpnt->sr_result);
srp->header.driver_status = driver_byte(SRpnt->sr_result);
- if ((sdp->sgdebug > 0) &&
- ((CHECK_CONDITION == srp->header.masked_status) ||
- (COMMAND_TERMINATED == srp->header.masked_status)))
+ if ((CHECK_CONDITION == srp->header.masked_status) ||
+ (COMMAND_TERMINATED == srp->header.masked_status) ||
+ (DRIVER_SENSE & srp->header.driver_status))
print_req_sense("sg_cmd_done_bh", SRpnt);

/* Following if statement is a patch supplied by Eric Youngdale */



2001-07-10 04:36:56

by Douglas Gilbert

[permalink] [raw]
Subject: Re: [PATCH-2.4.3] scsi logging

Andrew wrote:
> I'd like to propose the following patch to 3 SCSI mid-layer
> files from kernel 2.4.3. I have tested this with 2.4.3,
> but it should be relevant to other 2.4.x kernels also.
>
> It has the following changes/enhancements:
> 1) Log the disk serial number during scsi_scan()
> - scsi_scan.c.
> Why: This is a requirement in some environments to
> ensure unambiguous identification of a particular
> problem disk.
> 2) Interpret additional values in print_sense_internal()
> - constants.c. Why: The detail wrt Illegal Requests
> is very useful, since it can indicate either an
> application bug or an incompatible feature of the device.
> 3) Don't skip logging sense errors for sg functions - sg.c.
> Why: All sense errors should be logged so that a
> potential scsi device hardware problem doesn't go
> unrecognized.

Andrew,
I would object to point 3). SANE, and to a lesser extent
cdrecord, execute lots of commands that give SCSI check
conditions and would bloat the log and the console with
many serious looking messages. Those error
indications are conveyed back to the app via the sg
interface so the information is not lost. There is an
ioctl in the sg driver [SG_SET_DEBUG] to turn on that
output to the log/console [the default is off (to
stop the curious querying the maintainer about the
strange messages in their logs)].

Doug Gilbert

2001-07-10 12:41:30

by Cress, Andrew R

[permalink] [raw]
Subject: RE: [PATCH-2.4.3] scsi logging


DG> I would object to point 3). SANE, and to a lesser extent
DG> cdrecord, execute lots of commands that give SCSI check
DG> conditions and would bloat the log and the console with
DG> many serious looking messages. [...]

Doug,

Would it be possible to recognize & filter the specific errors output by
SANE and cdrecord rather than removing all sg sense errors? Or to ignore
all but certain sense codes unless SG_SET_DEBUG is set?
What I'm most concerned about is to make sure that any media problems on
fixed disks (03/11/00, etc.) don't get missed.

Perhaps something like:
if (CHECK_CONDITION ...) {
if ((type == disk && (key == MEDIUM_ERROR || key == HARDWARE_ERROR))
|| (sdp->sgdebug > 0))
print_req_sense(...)
}
How about that?

Andy


2001-07-10 13:35:47

by Eddie Williams

[permalink] [raw]
Subject: Re: [PATCH-2.4.3] scsi logging


Hi Andy, a name from the past... :-)

I agree with Doug. I think the behavior of the sg driver to not log errors or
do retries is appropriate. To put in context to yours and mine past lives,
NCR MP/RAS had similar functionality in the libscsi interface. That interface
also turned error logging and retries off by default. This is necessary for
an interface that is to be used for error diagnosis, recovery, probing and the
like.

LifeKeeper uses the sg driver to probe for devices and to get the serial
number that you listed in (1) in the original request. This can and will
generate errors that one would not want logged. The difficult part about
logging errors is the "thundering herd" problem. If you log a lot of
unnecessary or meaningless errors then the "real" errors are lost in the flow.
I think one of the responsibilities of using the sg driver is that an
application has to handle the errors that will occur and log or retry as
appropriate.

I think SANE and cdrecored are just two examples of programs that expect this
behavior (no logging of errors). LifeKeeper is another example. I would be
willing to bet there are others. So changing this behavior could have a wide
ranging affect.

Getting back to your first comment (sorry I have already deleted your initial
mail) did you provide the patch for getting the serial number? Note that
getting the serial number is not a required SCSI feature so you wont get 100%
of the devices.

Eddie Williams

> Andrew wrote:
> > I'd like to propose the following patch to 3 SCSI mid-layer
> > files from kernel 2.4.3. I have tested this with 2.4.3,
> > but it should be relevant to other 2.4.x kernels also.
> >
> > It has the following changes/enhancements:
> > 1) Log the disk serial number during scsi_scan()
> > - scsi_scan.c.
> > Why: This is a requirement in some environments to
> > ensure unambiguous identification of a particular
> > problem disk.
> > 2) Interpret additional values in print_sense_internal()
> > - constants.c. Why: The detail wrt Illegal Requests
> > is very useful, since it can indicate either an
> > application bug or an incompatible feature of the device.
> > 3) Don't skip logging sense errors for sg functions - sg.c.
> > Why: All sense errors should be logged so that a
> > potential scsi device hardware problem doesn't go
> > unrecognized.
>
> Andrew,
> I would object to point 3). SANE, and to a lesser extent
> cdrecord, execute lots of commands that give SCSI check
> conditions and would bloat the log and the console with
> many serious looking messages. Those error
> indications are conveyed back to the app via the sg
> interface so the information is not lost. There is an
> ioctl in the sg driver [SG_SET_DEBUG] to turn on that
> output to the log/console [the default is off (to
> stop the curious querying the maintainer about the
> strange messages in their logs)].
>
> Doug Gilbert
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]


2001-07-10 16:17:53

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH-2.4.3] scsi logging


While I would like to see scsi scan pick up / print serial numbers or a
form of device UUID I have to agree with Eddie that this patch would only
provide data for devices that decided to fill in the vendor specific area
of the standard inquiry page.

A more generic solution would involve two more inquiry commands to the
device using the EVPD bit. The first would be to request page 0 and then
working from highest to lowest use page 0x83, 0x80, stanard inquiry page
(vendor, product, serial). In some cases though with really old devices
this still might not generate useful information.

-Mike

Eddie Williams [[email protected]] wrote:
>
> .. Snip ...
> Getting back to your first comment (sorry I have already deleted your initial
> mail) did you provide the patch for getting the serial number? Note that
> getting the serial number is not a required SCSI feature so you wont get 100%
> of the devices.
>
> Eddie Williams
>
> > Andrew wrote:
> > > I'd like to propose the following patch to 3 SCSI mid-layer
> > > files from kernel 2.4.3. I have tested this with 2.4.3,
> > > but it should be relevant to other 2.4.x kernels also.
> > >
> > > It has the following changes/enhancements:
> > > 1) Log the disk serial number during scsi_scan()
> > > - scsi_scan.c.
> > > Why: This is a requirement in some environments to
> > > ensure unambiguous identification of a particular
> > > problem disk.
> > > 2) Interpret additional values in print_sense_internal()
> > > - constants.c. Why: The detail wrt Illegal Requests
> > > is very useful, since it can indicate either an
> > > application bug or an incompatible feature of the device.
> > > 3) Don't skip logging sense errors for sg functions - sg.c.
> > > Why: All sense errors should be logged so that a
> > > potential scsi device hardware problem doesn't go
> > > unrecognized.
> >
> > Andrew,
> > I would object to point 3). SANE, and to a lesser extent
> > cdrecord, execute lots of commands that give SCSI check
> > conditions and would bloat the log and the console with
> > many serious looking messages. Those error
> > indications are conveyed back to the app via the sg
> > interface so the information is not lost. There is an
> > ioctl in the sg driver [SG_SET_DEBUG] to turn on that
> > output to the log/console [the default is off (to
> > stop the curious querying the maintainer about the
> > strange messages in their logs)].
> >
> > Doug Gilbert
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to [email protected]
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Michael Anderson
[email protected]