This adds human-readable decoding of the ATA status and error registers (similar
to what drivers/ide does) as well as the SATA Serror register to libata error
handling output. This prevents the need to pore through standards documents
to figure out the meaning of the bits in these registers when looking at error
reports. Some bits that drivers/ide decoded are not decoded here, since the bits
are either command-dependent or obsolete, and properly parsing them would add
too much complexity.
Signed-off-by: Robert Hancock <[email protected]>
--- linux-2.6.21.1/drivers/ata/libata-eh.c 2007-04-27 15:49:26.000000000 -0600
+++ linux-2.6.21.1edit/drivers/ata/libata-eh.c 2007-05-09 19:47:53.000000000 -0600
@@ -1523,6 +1523,27 @@ static void ata_eh_report(struct ata_por
ata_port_printk(ap, KERN_ERR, "(%s)\n", desc);
}
+ if (ehc->i.serror)
+ ata_port_printk(ap, KERN_ERR,
+ "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n",
+ ehc->i.serror & SERR_DATA_RECOVERED ? "RecovDataErr " : "",
+ ehc->i.serror & SERR_COMM_RECOVERED ? "RecovCommErr " : "",
+ ehc->i.serror & SERR_DATA ? "UnrecovDataErr " : "",
+ ehc->i.serror & SERR_PERSISTENT ? "PersistErr " : "",
+ ehc->i.serror & SERR_PROTOCOL ? "ProtocolErr " : "",
+ ehc->i.serror & SERR_INTERNAL ? "HostInternalErr " : "",
+ ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "",
+ ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInternalErr " : "",
+ ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "",
+ ehc->i.serror & SERR_10B_8B_ERR ? "10B8BErr " : "",
+ ehc->i.serror & SERR_DISPARITY ? "Disparity " : "",
+ ehc->i.serror & SERR_CRC ? "CRCErr " : "",
+ ehc->i.serror & SERR_HANDSHAKE ? "HandshakeErr " : "",
+ ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeqErr " : "",
+ ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "",
+ ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
+ ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
+
for (tag = 0; tag < ATA_MAX_QUEUE; tag++) {
static const char *dma_str[] = {
[DMA_BIDIRECTIONAL] = "bidi",
@@ -1552,6 +1573,29 @@ static void ata_eh_report(struct ata_por
res->hob_feature, res->hob_nsect,
res->hob_lbal, res->hob_lbam, res->hob_lbah,
res->device, qc->err_mask, ata_err_string(qc->err_mask));
+
+ if (res->command & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ |
+ ATA_ERR) ) {
+ if (res->command & ATA_BUSY)
+ ata_dev_printk(qc->dev, KERN_ERR,
+ "status: {Busy}\n" );
+ else
+ ata_dev_printk(qc->dev, KERN_ERR,
+ "status: {%s%s%s%s}\n",
+ res->command & ATA_DRDY ? "DRDY " : "",
+ res->command & ATA_DF ? "DF " : "",
+ res->command & ATA_DRQ ? "DRQ " : "",
+ res->command & ATA_ERR ? "ERR " : "" );
+ }
+
+ if (cmd->command != ATA_CMD_PACKET &&
+ (res->feature & (ATA_ICRC | ATA_UNC | ATA_IDNF | ATA_ABORTED)))
+ ata_dev_printk(qc->dev, KERN_ERR,
+ "error: {%s%s%s%s}\n",
+ res->feature & ATA_ICRC ? "ICRC " : "",
+ res->feature & ATA_UNC ? "UNC " : "",
+ res->feature & ATA_IDNF ? "IDNF " : "",
+ res->feature & ATA_ABORTED ? "ABRT " : "" );
}
}
--- linux-2.6.21.1/include/linux/ata.h 2007-04-27 15:49:26.000000000 -0600
+++ linux-2.6.21.1edit/include/linux/ata.h 2007-05-09 19:25:54.000000000 -0600
@@ -223,6 +223,15 @@ enum {
SERR_PROTOCOL = (1 << 10), /* protocol violation */
SERR_INTERNAL = (1 << 11), /* host internal error */
SERR_PHYRDY_CHG = (1 << 16), /* PHY RDY changed */
+ SERR_PHY_INT_ERR = (1 << 17), /* PHY internal error */
+ SERR_COMM_WAKE = (1 << 18), /* Comm wake */
+ SERR_10B_8B_ERR = (1 << 19), /* 10b to 8b decode error */
+ SERR_DISPARITY = (1 << 20), /* Disparity */
+ SERR_CRC = (1 << 21), /* CRC error */
+ SERR_HANDSHAKE = (1 << 22), /* Handshake error */
+ SERR_LINK_SEQ_ERR = (1 << 23), /* Link sequence error */
+ SERR_TRANS_ST_ERROR = (1 << 24), /* Transport state transition error */
+ SERR_UNRECOG_FIS = (1 << 25), /* Unrecognized FIS */
SERR_DEV_XCHG = (1 << 26), /* device exchanged */
/* struct ata_taskfile flags */
Robert Hancock wrote:
> This adds human-readable decoding of the ATA status and error registers
> (similar
> to what drivers/ide does) as well as the SATA Serror register to libata
> error
> handling output. This prevents the need to pore through standards documents
> to figure out the meaning of the bits in these registers when looking at
> error
> reports. Some bits that drivers/ide decoded are not decoded here, since
> the bits
> are either command-dependent or obsolete, and properly parsing them
> would add
> too much complexity.
>
> Signed-off-by: Robert Hancock <[email protected]>
>
> --- linux-2.6.21.1/drivers/ata/libata-eh.c 2007-04-27
> 15:49:26.000000000 -0600
> +++ linux-2.6.21.1edit/drivers/ata/libata-eh.c 2007-05-09
> 19:47:53.000000000 -0600
> @@ -1523,6 +1523,27 @@ static void ata_eh_report(struct ata_por
> ata_port_printk(ap, KERN_ERR, "(%s)\n", desc);
> }
>
> + if (ehc->i.serror)
> + ata_port_printk(ap, KERN_ERR,
> + "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n",
> + ehc->i.serror & SERR_DATA_RECOVERED ? "RecovDataErr " : "",
> + ehc->i.serror & SERR_COMM_RECOVERED ? "RecovCommErr " : "",
> + ehc->i.serror & SERR_DATA ? "UnrecovDataErr " : "",
> + ehc->i.serror & SERR_PERSISTENT ? "PersistErr " : "",
> + ehc->i.serror & SERR_PROTOCOL ? "ProtocolErr " : "",
> + ehc->i.serror & SERR_INTERNAL ? "HostInternalErr " : "",
> + ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "",
> + ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInternalErr " : "",
> + ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "",
> + ehc->i.serror & SERR_10B_8B_ERR ? "10B8BErr " : "",
> + ehc->i.serror & SERR_DISPARITY ? "Disparity " : "",
> + ehc->i.serror & SERR_CRC ? "CRCErr " : "",
> + ehc->i.serror & SERR_HANDSHAKE ? "HandshakeErr " : "",
> + ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeqErr " : "",
> + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "",
> + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
> + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
I'm not really convinced whether this is necessary. The human readable
form is also a bit cryptic and can get quite long. So, mild NACK from me.
--
tejun
Tejun Heo wrote:
> Robert Hancock wrote:
>> This adds human-readable decoding of the ATA status and error registers
>> (similar
>> to what drivers/ide does) as well as the SATA Serror register to libata
...
> I'm not really convinced whether this is necessary. The human readable
> form is also a bit cryptic and can get quite long. So, mild NACK from me.
Same here, but I would like to see it in there under a CONFIG_DEBUG_LIBATA
kernel build option or something. Kind of like the "FANCY_STATUS_DUMPS" flag
that drivers/ide used to have for this kind of stuff.
Cheers
Mark Lord wrote:
> Same here, but I would like to see it in there under a CONFIG_DEBUG_LIBATA
> kernel build option or something. Kind of like the "FANCY_STATUS_DUMPS"
> flag
> that drivers/ide used to have for this kind of stuff.
The long term goal is to enable verbose output with a module option
and/or sysfs knob, rather than a compile-time switch.
That's what ata_msg_xxx is for. We just don't have the knob yet.
Patches welcome...
Jeff
Jeff Garzik wrote:
> Mark Lord wrote:
>> Same here, but I would like to see it in there under a
>> CONFIG_DEBUG_LIBATA
>> kernel build option or something. Kind of like the
>> "FANCY_STATUS_DUMPS" flag
>> that drivers/ide used to have for this kind of stuff.
>
>
> The long term goal is to enable verbose output with a module option
> and/or sysfs knob, rather than a compile-time switch.
If we're compiling the messages into the kernel regardless,
then it doesn't really make much sense to NOT show all of them
on the error paths.
This stuff (fancy status dumps) is mostly just for the error paths,
where more information is always a good thing.
Controlling the rest of the ata_msg_xxx stuff from sysctl is fine
for non-error paths.
Cheers
Mark Lord wrote:
> If we're compiling the messages into the kernel regardless,
> then it doesn't really make much sense to NOT show all of them
> on the error paths.
Not true. Uncontrolled message spewage inevitably results in critical
information scrolling off the screen, before a user can take a digital
photo of the output... Or of users being confused by subsequent error
fallout (i.e. multiple oopses reporting problem).
Moderation and restraint still have roles to play... :)
Jeff
Tejun Heo wrote:
>> + if (ehc->i.serror)
>> + ata_port_printk(ap, KERN_ERR,
>> + "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n",
>> + ehc->i.serror & SERR_DATA_RECOVERED ? "RecovDataErr " : "",
>> + ehc->i.serror & SERR_COMM_RECOVERED ? "RecovCommErr " : "",
>> + ehc->i.serror & SERR_DATA ? "UnrecovDataErr " : "",
>> + ehc->i.serror & SERR_PERSISTENT ? "PersistErr " : "",
>> + ehc->i.serror & SERR_PROTOCOL ? "ProtocolErr " : "",
>> + ehc->i.serror & SERR_INTERNAL ? "HostInternalErr " : "",
>> + ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "",
>> + ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInternalErr " : "",
>> + ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "",
>> + ehc->i.serror & SERR_10B_8B_ERR ? "10B8BErr " : "",
>> + ehc->i.serror & SERR_DISPARITY ? "Disparity " : "",
>> + ehc->i.serror & SERR_CRC ? "CRCErr " : "",
>> + ehc->i.serror & SERR_HANDSHAKE ? "HandshakeErr " : "",
>> + ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeqErr " : "",
>> + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "",
>> + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
>> + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
>
> I'm not really convinced whether this is necessary. The human readable
> form is also a bit cryptic and can get quite long. So, mild NACK from me.
>
It certainly seems useful when debugging hotplug issues or random SATA
problems which end up being caused by communication problems. Without
this output, Joe User stands no chance of figuring out what's going on,
and neither does Joe libata Developer unless they really care to dig
through the spec and count bits to figure out what they mean. At least
with this you can see that there was a CRC error, etc. and go from that..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Jeff Garzik wrote:
> Mark Lord wrote:
>> If we're compiling the messages into the kernel regardless,
>> then it doesn't really make much sense to NOT show all of them
>> on the error paths.
>
>
> Not true. Uncontrolled message spewage inevitably results in critical
> information scrolling off the screen, before a user can take a digital
> photo of the output... Or of users being confused by subsequent error
> fallout (i.e. multiple oopses reporting problem).
>
> Moderation and restraint still have roles to play... :)
>
> Jeff
I don't think this is as big of a deal here as in other cases, like oops
output. With libata errors, if they're at the console (which they'd have
to be to see these messages), unless something has actually caused a
panic the scrollback buffer should still be functional and they'd be
able to see the entire output..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Robert Hancock wrote:
> I don't think this is as big of a deal here as in other cases, like oops
> output. With libata errors, if they're at the console (which they'd have
> to be to see these messages), unless something has actually caused a
> panic the scrollback buffer should still be functional and they'd be
> able to see the entire output..
Scrollback rarely works as planned, for me. Overall, a balance must be
found.
More information is more helpful. But.
There are downsides to spewing everything possible, upon error. You
cause logging to the possibly problematic disk, you push older messages
out of the printk ring buffer, etc., etc.
Jeff
> Scrollback rarely works as planned, for me. Overall, a balance must be
> found.
>
> More information is more helpful. But.
>
> There are downsides to spewing everything possible, upon error. You
> cause logging to the possibly problematic disk, you push older messages
> out of the printk ring buffer, etc., etc.
Get yourself a Voodoo5 or similar card cheap off ebay. The firmware on
most of them doesn't clear the top 30MB of RAM on a reboot/PCI reset
which makes them excellent debug buffers providing you empty the buffer
before you run the X server.
Alan
Robert Hancock wrote:
>>> + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr "
>>> : "",
>>> + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
>>> + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
>>
>> I'm not really convinced whether this is necessary. The human readable
>> form is also a bit cryptic and can get quite long. So, mild NACK from
>> me.
>>
>
> It certainly seems useful when debugging hotplug issues or random SATA
> problems which end up being caused by communication problems. Without
> this output, Joe User stands no chance of figuring out what's going on,
> and neither does Joe libata Developer unless they really care to dig
> through the spec and count bits to figure out what they mean. At least
> with this you can see that there was a CRC error, etc. and go from that..
>
Why not just document the error messages?
And the scsi ones too, I can't seem to find what the sense codes mean.
Chuck Ebbert wrote:
> Robert Hancock wrote:
>>>> + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr "
>>>> : "",
>>>> + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
>>>> + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
>>> I'm not really convinced whether this is necessary. The human readable
>>> form is also a bit cryptic and can get quite long. So, mild NACK from
>>> me.
>>>
>> It certainly seems useful when debugging hotplug issues or random SATA
>> problems which end up being caused by communication problems. Without
>> this output, Joe User stands no chance of figuring out what's going on,
>> and neither does Joe libata Developer unless they really care to dig
>> through the spec and count bits to figure out what they mean. At least
>> with this you can see that there was a CRC error, etc. and go from that..
>>
>
> Why not just document the error messages?
>
> And the scsi ones too, I can't seem to find what the sense codes mean.
They are well documented elsewhere - the standard documents. For sense
codes, t10.org. For SError bits, t13.org. You can get drafts free of
charge.
--
tejun
Tejun Heo wrote:
> Chuck Ebbert wrote:
>> Robert Hancock wrote:
>>>>> + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr "
>>>>> : "",
>>>>> + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
>>>>> + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
>>>> I'm not really convinced whether this is necessary. The human readable
>>>> form is also a bit cryptic and can get quite long. So, mild NACK from
>>>> me.
>>>>
>>> It certainly seems useful when debugging hotplug issues or random SATA
>>> problems which end up being caused by communication problems. Without
>>> this output, Joe User stands no chance of figuring out what's going on,
>>> and neither does Joe libata Developer unless they really care to dig
>>> through the spec and count bits to figure out what they mean. At least
>>> with this you can see that there was a CRC error, etc. and go from that..
>>>
>> Why not just document the error messages?
>>
>> And the scsi ones too, I can't seem to find what the sense codes mean.
>
> They are well documented elsewhere - the standard documents. For sense
> codes, t10.org. For SError bits, t13.org. You can get drafts free of
> charge.
The ATA ones are more of a pain in that regard than SCSI though - SCSI
has all distinct error codes for different errors, whereas ATA has
bitmasks for everything..
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Robert Hancock wrote:
> The ATA ones are more of a pain in that regard than SCSI though - SCSI
> has all distinct error codes for different errors, whereas ATA has
> bitmasks for everything..
That should not affect implementation. Either way, a table-driven
approach can easily work.
I favor decoding the SError status bits, but your names were far too
long. "ProtocolErr" should be "Proto". "10B8BErr" should be "10b8b".
HostInternalErr to HostInt. PHYInternalErr to PHYInt. etc.
Jeff