2007-06-06 05:28:54

by Florin Iucha

[permalink] [raw]
Subject: spurious completions during NCQ?

Hello,

I was working on a I/O heavy workload (parsing 100K spam messages to
extract certain structures) when I got this in the kernel log:

[ 2320.132893] ata1.00: exception Emask 0x2 SAct 0x701fffff SErr 0x0 action 0x2 frozen
[ 2320.132899] ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x701fffff FIS=005040a1:08000000)
[ 2320.132905] ata1.00: cmd 61/10:00:59:fc:d0/00:00:07:00:00/40 tag 0 cdb 0x0 data 8192 out
[ 2320.132906] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132911] ata1.00: cmd 61/10:08:69:fc:d0/00:00:07:00:00/40 tag 1 cdb 0x0 data 8192 out
[ 2320.132913] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132918] ata1.00: cmd 61/08:10:19:4c:d1/00:00:07:00:00/40 tag 2 cdb 0x0 data 4096 out
[ 2320.132919] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132924] ata1.00: cmd 61/01:18:fb:27:0f/00:00:08:00:00/40 tag 3 cdb 0x0 data 512 out
[ 2320.132925] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132930] ata1.00: cmd 61/08:20:11:28:0f/00:00:08:00:00/40 tag 4 cdb 0x0 data 4096 out
[ 2320.132932] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132937] ata1.00: cmd 61/08:28:91:92:17/00:00:08:00:00/40 tag 5 cdb 0x0 data 4096 out
[ 2320.132938] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132943] ata1.00: cmd 61/08:30:99:b3:17/00:00:08:00:00/40 tag 6 cdb 0x0 data 4096 out
[ 2320.132944] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132949] ata1.00: cmd 61/01:38:5b:56:4b/00:00:05:00:00/40 tag 7 cdb 0x0 data 512 out
[ 2320.132950] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132956] ata1.00: cmd 61/08:40:71:56:4b/00:00:05:00:00/40 tag 8 cdb 0x0 data 4096 out
[ 2320.132957] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132962] ata1.00: cmd 61/08:48:09:cf:5a/00:00:05:00:00/40 tag 9 cdb 0x0 data 4096 out
[ 2320.132963] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132968] ata1.00: cmd 61/01:50:3b:af:8b/00:00:05:00:00/40 tag 10 cdb 0x0 data 512 out
[ 2320.132969] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132974] ata1.00: cmd 61/08:58:51:af:8b/00:00:05:00:00/40 tag 11 cdb 0x0 data 4096 out
[ 2320.132976] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132981] ata1.00: cmd 61/08:60:d9:f1:8d/00:00:05:00:00/40 tag 12 cdb 0x0 data 4096 out
[ 2320.132982] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132987] ata1.00: cmd 61/08:68:49:bd:8e/00:00:05:00:00/40 tag 13 cdb 0x0 data 4096 out
[ 2320.132988] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.132993] ata1.00: cmd 61/08:70:d9:63:d1/00:00:05:00:00/40 tag 14 cdb 0x0 data 4096 out
[ 2320.132995] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133000] ata1.00: cmd 61/08:78:91:39:0f/00:00:06:00:00/40 tag 15 cdb 0x0 data 4096 out
[ 2320.133001] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133006] ata1.00: cmd 61/08:80:49:30:97/00:00:06:00:00/40 tag 16 cdb 0x0 data 4096 out
[ 2320.133007] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133013] ata1.00: cmd 61/08:88:41:c0:d0/00:00:06:00:00/40 tag 17 cdb 0x0 data 4096 out
[ 2320.133014] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133019] ata1.00: cmd 61/08:90:b9:8c:d1/00:00:06:00:00/40 tag 18 cdb 0x0 data 4096 out
[ 2320.133020] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133025] ata1.00: cmd 61/01:98:1b:cf:ce/00:00:07:00:00/40 tag 19 cdb 0x0 data 512 out
[ 2320.133027] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133032] ata1.00: cmd 61/08:a0:31:cf:ce/00:00:07:00:00/40 tag 20 cdb 0x0 data 4096 out
[ 2320.133033] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133038] ata1.00: cmd 61/10:e0:41:af:8b/00:00:05:00:00/40 tag 28 cdb 0x0 data 8192 out
[ 2320.133039] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133044] ata1.00: cmd 61/01:e8:ba:12:8d/00:00:06:00:00/40 tag 29 cdb 0x0 data 512 out
[ 2320.133046] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.133051] ata1.00: cmd 61/10:f0:c1:12:8d/00:00:06:00:00/40 tag 30 cdb 0x0 data 8192 out
[ 2320.133052] res 50/00:08:31:cf:ce/00:00:07:00:00/40 Emask 0x2 (HSM violation)
[ 2320.378387] ata1: soft resetting port
[ 2320.442169] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2320.460012] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
[ 2320.461395] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
[ 2320.461402] ata1.00: configured for UDMA/100
[ 2320.461460] ata1: EH complete
[ 2320.461868] SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
[ 2320.462012] sda: Write Protect is off
[ 2320.462016] sda: Mode Sense: 00 3a 00 00
[ 2320.462203] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This is on a Thinkpad T60 with 2 GB RAM, running Ubuntu 7.04 (kernel
2.6.20-16-generic). No proprietary drivers (ok, maybe the Intel
Wi-Fi - but that should not count).

The laptop came with Windows but I blew that away - did I mess some
thing up regarding HPA and its ilk?

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163


Attachments:
(No filename) (5.60 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2007-06-06 14:28:17

by Robert Hancock

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

Florin Iucha wrote:
> Hello,
>
> I was working on a I/O heavy workload (parsing 100K spam messages to
> extract certain structures) when I got this in the kernel log:

..

>
> This is on a Thinkpad T60 with 2 GB RAM, running Ubuntu 7.04 (kernel
> 2.6.20-16-generic). No proprietary drivers (ok, maybe the Intel
> Wi-Fi - but that should not count).
>
> The laptop came with Windows but I blew that away - did I mess some
> thing up regarding HPA and its ilk?

It means the drive reported command tags were completed that were not
outstanding. What kind of drive is this?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-06-07 12:04:39

by Florin Iucha

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

On Wed, Jun 06, 2007 at 08:28:07AM -0600, Robert Hancock wrote:
> >This is on a Thinkpad T60 with 2 GB RAM, running Ubuntu 7.04 (kernel
> >2.6.20-16-generic). No proprietary drivers (ok, maybe the Intel
> >Wi-Fi - but that should not count).
> >
> >The laptop came with Windows but I blew that away - did I mess some
> >thing up regarding HPA and its ilk?
>
> It means the drive reported command tags were completed that were not
> outstanding. What kind of drive is this?

[ 29.033142] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
301488
[ 29.033146] ata1.00: ATA-7: HITACHI HTS541680J9SA00, SB2IC7EP, max UDMA/100
[ 29.033149] ata1.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 29.034230] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
301488

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163


Attachments:
(No filename) (918.00 B)
signature.asc (189.00 B)
Digital signature
Download all attachments

2007-06-08 12:56:19

by Tejun Heo

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

Florin Iucha wrote:
> On Wed, Jun 06, 2007 at 08:28:07AM -0600, Robert Hancock wrote:
>>> This is on a Thinkpad T60 with 2 GB RAM, running Ubuntu 7.04 (kernel
>>> 2.6.20-16-generic). No proprietary drivers (ok, maybe the Intel
>>> Wi-Fi - but that should not count).
>>>
>>> The laptop came with Windows but I blew that away - did I mess some
>>> thing up regarding HPA and its ilk?
>> It means the drive reported command tags were completed that were not
>> outstanding. What kind of drive is this?
>
> [ 29.033142] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
> 301488
> [ 29.033146] ata1.00: ATA-7: HITACHI HTS541680J9SA00, SB2IC7EP, max UDMA/100
> [ 29.033149] ata1.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 31/32)
> [ 29.034230] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
> 301488

Please report the result of 'hdparm -I /dev/sda'. We'll probably have
to blacklist it for NCQ.

--
tejun

2007-06-08 13:12:33

by Florin Iucha

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

On Fri, Jun 08, 2007 at 09:55:58PM +0900, Tejun Heo wrote:
> Florin Iucha wrote:
> >> It means the drive reported command tags were completed that were not
> >> outstanding. What kind of drive is this?
> >
> > [ 29.033142] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
> > 301488
> > [ 29.033146] ata1.00: ATA-7: HITACHI HTS541680J9SA00, SB2IC7EP, max UDMA/100
> > [ 29.033149] ata1.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 31/32)
> > [ 29.034230] ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156
> > 301488
>
> Please report the result of 'hdparm -I /dev/sda'. We'll probably have
> to blacklist it for NCQ.

Here it is: I only jumbled the serial number a bit.

/dev/sda:

ATA device, with non-removable media
Model Number: HITACHI HTS541680J9SA00
Serial Number: SB******I57L4A
Firmware Revision: SB2IC7EP
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 1
Supported: 7 6 5 4
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 156301488
LBA48 user addressable sectors: 156301488
device size with M = 1024*1024: 76319 MBytes
device size with M = 1000*1000: 80026 MBytes (80 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Vendor, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 128 (0x80)
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* SATA-I signaling speed (1.5Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
Non-Zero buffer offsets in DMA Setup FIS
DMA Setup Auto-Activate optimization
Device-initiated interface power management
In-order data delivery
* Software settings preservation
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
42min for SECURITY ERASE UNIT.
Checksum: correct

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163


Attachments:
(No filename) (3.14 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2007-06-08 13:21:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] libata: disable NCQ for HITACHI HTS541680J9SA00/SB21C7EP

HITACHI HTS541680J9SA00/SB21C7EP spuriously completes NCQ commands.
Blacklist it for NCQ.

Signed-off-by: Tejun Heo <[email protected]>
---
drivers/ata/libata-core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 636df4e..880cdd3 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3787,6 +3787,8 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
{ "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, },
{ "HTS541080G9SA00", "MB4OC60D", ATA_HORKAGE_NONCQ, },
{ "HTS541010G9SA00", "MBZOC60D", ATA_HORKAGE_NONCQ, },
+ /* Drives which do spurious command completion */
+ { "HTS541680J9SA00", "SB2IC7EP", ATA_HORKAGE_NONCQ, },

/* Devices with NCQ limits */

2007-06-08 15:23:29

by Alan

[permalink] [raw]
Subject: Re: spurious completions during NCQ?


> Model Number: HITACHI HTS541680J9SA00
> Serial Number: SB******I57L4A

This one is already blacklisted in the windows drivers.

2007-06-08 15:24:25

by Alan

[permalink] [raw]
Subject: Re: [PATCH] libata: disable NCQ for HITACHI HTS541680J9SA00/SB21C7EP

On Fri, 8 Jun 2007 22:20:59 +0900
Tejun Heo <[email protected]> wrote:

> HITACHI HTS541680J9SA00/SB21C7EP spuriously completes NCQ commands.
> Blacklist it for NCQ.
>
> Signed-off-by: Tejun Heo <[email protected]>

Acked-by: Alan Cox <[email protected]>

2007-06-10 02:42:40

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata: disable NCQ for HITACHI HTS541680J9SA00/SB21C7EP

Tejun Heo wrote:
> HITACHI HTS541680J9SA00/SB21C7EP spuriously completes NCQ commands.
> Blacklist it for NCQ.
>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> drivers/ata/libata-core.c | 2 ++
> 1 file changed, 2 insertions(+)

applied


2007-06-15 08:01:11

by Massimiliano Hofer

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

On Friday 8 June 2007, Alan Cox wrote:

> > Model Number: HITACHI HTS541680J9SA00
> > Serial Number: SB******I57L4A
>
> This one is already blacklisted in the windows drivers.

I have the same problem with this drive (shipped with a Thinkpad Z61m):

Model Number: HITACHI HTS541612J9SA00
Firmware Revision: SBDIC7JP

It reported several dozen spurious completions over a few days, but I can't
find any corruption. Is it really dangerous? I understand this clearly is a
firmware bug, but I'm not sure how to interpret the drive behavior.
I can perform any test you like, if you find it useful.

--
Saluti,
Massimiliano Hofer

2007-06-18 05:30:18

by Tejun Heo

[permalink] [raw]
Subject: Re: spurious completions during NCQ?

Massimiliano Hofer wrote:
> On Friday 8 June 2007, Alan Cox wrote:
>
>>> Model Number: HITACHI HTS541680J9SA00
>>> Serial Number: SB******I57L4A
>> This one is already blacklisted in the windows drivers.
>
> I have the same problem with this drive (shipped with a Thinkpad Z61m):
>
> Model Number: HITACHI HTS541612J9SA00
> Firmware Revision: SBDIC7JP
>
> It reported several dozen spurious completions over a few days, but I can't
> find any corruption. Is it really dangerous? I understand this clearly is a
> firmware bug, but I'm not sure how to interpret the drive behavior.
> I can perform any test you like, if you find it useful.

It has _slight_ change of being dangerous and is just in violation of
the protocol. No real data corruption case reported yet tho. Till now
most drives which report such problems had stinky NCQ implementation
anyway - slow, buggy, blacklisted in the other OS, etc... I'll submit a
patch to blacklist the drive.

Thanks.

--
tejun