2008-02-15 14:29:42

by Hugo Mills

Subject: Spurious completions during NCQ

I'm getting these on my Dell Latitude D830:

Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:00000002
Feb 15 13:06:00 willow kernel: ata1.00: cmd 61/10:10:26:fb:c4/00:00:02:00:00/40 tag 2 cdb 0x0 data 8192 out
Feb 15 13:06:00 willow kernel: res 40/00:10:26:fb:c4/00:00:02:00:00/40 Emask 0x2 (HSM violation)
Feb 15 13:06:00 willow kernel: ata1: soft resetting port
Feb 15 13:06:00 willow kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 15 13:06:00 willow kernel: ata1.00: configured for UDMA/133
Feb 15 13:06:00 willow kernel: ata1: EH complete
Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Write Protect is off
Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

In some cases, there are several cmd/res lines listed. It's
happening about once an hour or so (not correlated with any other
event that I can see). It doesn't seem to be affecting operation of
the machine, but it's making me nervous.

Can anyone set my mind at rest? (Or suggest a fix?)

uname -a reports:
Linux willow 2.6.23.1-hrt3 #1 SMP Sun Nov 4 14:51:20 GMT 2007 x86_64 GNU/Linux

It's a kernel.org kernel with the patch for tickless operation on
amd64.

hdparm -i reports:

/dev/sda:

Model=ST9160823AS , FwRev=3.ADC , SerialNo= 5NK0C448
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=?8?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

* signifies the current active mode


Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- w.w.w. : England's batting scorecard ---

2008-02-15 15:00:27

by Calvin Walton

Subject: Re: Spurious completions during NCQ

On Fri, 2008-02-15 at 13:46 +0000, Hugo Mills wrote:
> I'm getting these on my Dell Latitude D830:
>
> Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
> Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:00000002

> In some cases, there are several cmd/res lines listed. It's
> happening about once an hour or so (not correlated with any other
> event that I can see). It doesn't seem to be affecting operation of
> the machine, but it's making me nervous.
>
> Can anyone set my mind at rest? (Or suggest a fix?)

You didn't mention which SATA chipset your laptop has, but some quick
googling says that it's AHCI. Before 2.6.24, the AHCI driver had a
problem where it would report spurious NCQ completions due to a bug in
the driver logic.
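To make the numbers in the report above easier to read, here is a small Python sketch that decodes the `cmd` taskfile dump and the `SAct`/`FIS` values. It assumes the field order libata uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob fields/device), that for FPDMA QUEUED commands the sector count sits in the feature field and the NCQ tag in bits 7:3 of nsect, and that the two words after `FIS=` are dwords 0 and 1 of the last received Set Device Bits FIS. The helper names are hypothetical, and this is not the AHCI driver's own code.

```python
"""Decode the numbers in a libata NCQ error report (illustrative sketch).

Assumed taskfile dump layout:
  command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
"""

NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
SDB_FIS_TYPE = 0xA1  # FIS type byte of a Set Device Bits FIS


def decode_taskfile(tf: str) -> dict:
    """Decode a dump such as '61/10:10:26:fb:c4/00:00:02:00:00/40'."""
    cmd_s, low_s, hob_s, dev_s = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low_s.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob_s.split(":")
    )
    command = int(cmd_s, 16)
    # Assumption: for NCQ commands the sector count lives in the feature fields.
    sectors = (hob_feature << 8) | feature
    lba = 0
    for byte in (hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal):
        lba = (lba << 8) | byte  # 48-bit LBA, most significant byte first
    return {
        "command": NCQ_COMMANDS.get(command, hex(command)),
        "tag": nsect >> 3,  # assumption: NCQ tag is in bits 7:3 of nsect
        "sectors": sectors,
        "bytes": sectors * 512,  # 512-byte sectors, as the sd lines report
        "lba": lba,
        "device": int(dev_s, 16),
    }


def decode_sdb_fis(dword0: int) -> dict:
    """Split dword 0 of a Set Device Bits FIS into its fields."""
    return {
        "is_sdb": (dword0 & 0xFF) == SDB_FIS_TYPE,
        "interrupt": bool(dword0 & (1 << 14)),  # the 'I' bit
        "status": (dword0 >> 16) & 0xFF,  # status byte (some bits reserved)
        "error": (dword0 >> 24) & 0xFF,
    }


def tags(mask: int) -> list:
    """Return the NCQ tag numbers whose bits are set in a 32-bit mask."""
    return [t for t in range(32) if mask & (1 << t)]


if __name__ == "__main__":
    print(decode_taskfile("61/10:10:26:fb:c4/00:00:02:00:00/40"))
    sact, fis0, fis1 = 0x4, 0x004040A1, 0x00000002  # values from the log
    print(decode_sdb_fis(fis0))
    print("outstanding (SAct):", tags(sact))
    print("reported complete (FIS dword 1):", tags(fis1))
```

Under those assumptions, the log shows a single WRITE FPDMA QUEUED of 16 sectors (8192 bytes, matching `data 8192 out`) on tag 2, the only bit set in SAct=0x4, while the SDB FIS reports tag 1 (0x00000002) as done. Roughly speaking, a completion bit for a tag the host did not have outstanding is what the message calls spurious.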

> uname -a reports:
> Linux willow 2.6.23.1-hrt3 #1 SMP Sun Nov 4 14:51:20 GMT 2007 x86_64 GNU/Linux

The fix is simple: upgrade your kernel to 2.6.24 :)

> It's a kernel.org kernel with the patch for tickless operation on
> amd64.

Handily, the 2.6.24 kernel.org kernel includes amd64 tickless support
already.

--
Calvin Walton <[email protected]>

2008-02-15 15:29:41

by Hugo Mills

Subject: Re: Spurious completions during NCQ

On Fri, Feb 15, 2008 at 10:00:00AM -0500, Calvin Walton wrote:
> On Fri, 2008-02-15 at 13:46 +0000, Hugo Mills wrote:
> > I'm getting these on my Dell Latitude D830:
> >
> > Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
> > Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:00000002
>
> > In some cases, there are several cmd/res lines listed. It's
> > happening about once an hour or so (not correlated with any other
> > event that I can see). It doesn't seem to be affecting operation of
> > the machine, but it's making me nervous.
> >
> > Can anyone set my mind at rest? (Or suggest a fix?)
>
> You didn't mention which SATA chipset your laptop has, but some quick
> googling says that it's AHCI. Before 2.6.24, the AHCI driver had a
> problem where it would report spurious NCQ completions due to a bug in
> the driver logic.
>
> > uname -a reports:
> > Linux willow 2.6.23.1-hrt3 #1 SMP Sun Nov 4 14:51:20 GMT 2007 x86_64 GNU/Linux
>
> The fix is simple: upgrade your kernel to 2.6.24 :)

Excellent. Thank you for clearing this up for me. I'll head off and
do the upgrade now.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- All mushrooms are edible, but some are only edible once. ---

2008-02-15 15:37:22

by Michael Tokarev

Subject: Re: Spurious completions during NCQ

Hugo Mills wrote:
> On Fri, Feb 15, 2008 at 10:00:00AM -0500, Calvin Walton wrote:
>> On Fri, 2008-02-15 at 13:46 +0000, Hugo Mills wrote:
>>> I'm getting these on my Dell Latitude D830:
>>>
>>> Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
>>> Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:00000002
>>> In some cases, there are several cmd/res lines listed. It's
>>> happening about once an hour or so (not correlated with any other
>>> event that I can see). It doesn't seem to be affecting operation of
>>> the machine, but it's making me nervous.

JFYI: most probably it is correlated with smartd asking
the device for its SMART status.

/mjt

2008-02-16 14:08:31

by Mark Lord

Subject: Re: Spurious completions during NCQ

Hugo Mills wrote:
> I'm getting these on my Dell Latitude D830:
>
> Feb 15 13:06:00 willow kernel: ata1.00: exception Emask 0x2 SAct 0x4 SErr 0x0 action 0x2 frozen
> Feb 15 13:06:00 willow kernel: ata1.00: spurious completions during NCQ issue=0x0 SAct=0x4 FIS=004040a1:00000002
> Feb 15 13:06:00 willow kernel: ata1.00: cmd 61/10:10:26:fb:c4/00:00:02:00:00/40 tag 2 cdb 0x0 data 8192 out
> Feb 15 13:06:00 willow kernel: res 40/00:10:26:fb:c4/00:00:02:00:00/40 Emask 0x2 (HSM violation)
> Feb 15 13:06:00 willow kernel: ata1: soft resetting port
> Feb 15 13:06:00 willow kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Feb 15 13:06:00 willow kernel: ata1.00: configured for UDMA/133
> Feb 15 13:06:00 willow kernel: ata1: EH complete
> Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
> Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Write Protect is off
> Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Feb 15 13:06:00 willow kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> In some cases, there are several cmd/res lines listed. It's
> happening about once an hour or so (not correlated with any other
> event that I can see). It doesn't seem to be affecting operation of
> the machine, but it's making me nervous.
>
> Can anyone set my mind at rest? (Or suggest a fix?)
..

Tejun, have the spurious completion fixes been backported
to 2.6.23 / 2.6.22 yet ? Those kernels will be in common use
for some time to come, and this fix is more or less essential.

???

2008-03-06 05:56:50

by Tejun Heo

Subject: Re: Spurious completions during NCQ

Mark Lord wrote:
> Tejun, have the spurious completion fixes been backported
> to 2.6.23 / 2.6.22 yet ? Those kernels will be in common use
> for some time to come, and this fix is more or less essential.

Not that I know of. I'm prepping patches now.

Thanks.

--
tejun

2008-03-06 06:05:27

by Tejun Heo

Subject: Re: Spurious completions during NCQ

Tejun Heo wrote:
> Mark Lord wrote:
>> Tejun, have the spurious completion fixes been backported
>> to 2.6.23 / 2.6.22 yet ? Those kernels will be in common use
>> for some time to come, and this fix is more or less essential.
>
> Not that I know of. I'm prepping patches now.
>

Oh... it already happened in 2.6.22.15 and 2.6.23.10. Nice.
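As a rough aid for anyone checking their own machine, the Python sketch below compares a kernel release string (for example the output of `uname -r`) against the versions named in this thread: 2.6.22.15, 2.6.23.10, and 2.6.24 onwards. It is a heuristic under stated assumptions: it only knows those three data points, treats any series older than 2.6.22 as unfixed, and cannot see fixes that a distribution kernel may have backported under a different version number. The function names are hypothetical.

```python
"""Heuristic check for the spurious-completion fix by kernel version (sketch)."""

import re

# Per-series minimum versions, as reported in this thread.
FIXED_IN = {(2, 6, 22): (2, 6, 22, 15), (2, 6, 23): (2, 6, 23, 10)}
# Assumption: every series from 2.6.24 upwards carries the fix.
FIRST_FIXED_SERIES = (2, 6, 24)


def parse_release(release: str) -> tuple:
    """Turn '2.6.23.1-hrt3' into (2, 6, 23, 1); missing parts become 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def has_spurious_ncq_fix(release: str) -> bool:
    """True if the version alone suggests the fix is present."""
    version = parse_release(release)
    series = version[:3]
    if series >= FIRST_FIXED_SERIES:
        return True
    minimum = FIXED_IN.get(series)
    return minimum is not None and version >= minimum


if __name__ == "__main__":
    for release in ("2.6.23.1-hrt3", "2.6.23.10", "2.6.22.14", "2.6.24"):
        print(release, has_spurious_ncq_fix(release))
```

By this check, the 2.6.23.1-hrt3 kernel from the original report predates the fix, so moving to 2.6.23.10 or later (or to 2.6.24, as suggested earlier in the thread) should pick it up.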

--
tejun