2009-10-26 08:31:01

by martin f krafft

Subject: What are these ATA exceptions trying to tell me? [2.6.26] System Events]

Dear folks,

I have a new, high-quality, high-performance rack-mounted system, but
every now and then, the kernel spews a slew of messages like the
following to syslog:

ata3: EH in SWNCQ mode,QC:qc_active 0x7FFF sactive 0x7FFF
ata3: SWNCQ:qc_active 0x1 defer_bits 0x7FFE last_issue_tag 0x0
dhfis 0x1 dmafis 0x0 sdbfis 0x0
ata3: ATA_REG 0x40 ERR_REG 0x0
ata3: tag : dhfis dmafis sdbfis sacitve
ata3: tag 0x0: 1 0 0 1
ata3.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
ata3.00: cmd 61/18:00:4f:22:b3/00:00:05:00:00/40 tag 0 ncq 12288 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
[...]
ata3: hard resetting link
ata3: SRST failed (errno=-19)
ata3: SATA link down (SStatus 0 SControl 300)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting link
ata3: link is slow to respond, please be patient (ready=-19)
ata3: SRST failed (errno=-16)
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
ata3: hot plug
ata3.00: configured for UDMA/133
sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This happens for ata[34], but never for ata[12].
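
For reference, the Emask values in the exception lines above are bit
masks. Here is a minimal sketch for decoding them, assuming the AC_ERR_*
bit positions from the kernel's include/linux/libata.h. The function
name decode_emask is made up for illustration and is not part of any
kernel or userspace tool.

```python
# Assumption: bit positions of the AC_ERR_* flags as defined in
# include/linux/libata.h. Only the names are reproduced here.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",         # device reported error
    1: "AC_ERR_HSM",         # host state machine violation
    2: "AC_ERR_TIMEOUT",     # command timed out
    3: "AC_ERR_MEDIA",       # media error
    4: "AC_ERR_ATA_BUS",     # ATA bus error
    5: "AC_ERR_HOST_BUS",    # host bus error
    6: "AC_ERR_SYSTEM",      # system error
    7: "AC_ERR_INVALID",     # invalid argument
    8: "AC_ERR_OTHER",       # unknown
    9: "AC_ERR_NODEV_HINT",  # polling device detection hint
    10: "AC_ERR_NCQ",        # marker for offending NCQ command
}


def decode_emask(emask):
    """Return the names of all AC_ERR_* bits set in an Emask value."""
    return [name for bit, name in sorted(AC_ERR_BITS.items())
            if emask & (1 << bit)]


if __name__ == "__main__":
    # The three Emask values that appear in the log excerpt.
    for value in (0x0, 0x4, 0x10):
        print(hex(value), decode_emask(value))
```

With this table, the 0x4 on the "res" line decodes to AC_ERR_TIMEOUT,
which matches the "(timeout)" the kernel prints itself, and the 0x10 on
the later port-level exception decodes to AC_ERR_ATA_BUS.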

At the time of these messages, the machine was not loaded. In
particular, there was no SMART self-test running. I mention this
because I have set smartd to run short tests daily and extended tests
weekly, and those only rarely complete:

# 1 Extended offline Interrupted (host reset) 00% 4812 -
# 2 Short offline Interrupted (host reset) 00% 4684 -

while they run fine for ata[12].

The chipset is nVidia MCP55. Am I dealing with a broken controller?

Cheers,

--
martin | http://madduck.net/ | http://two.sentenc.es/

"i like young girls. their stories are shorter."
-- tom mcguane

spamtraps: [email protected]



2009-10-27 00:28:09

by Robert Hancock

Subject: Re: What are these ATA exceptions trying to tell me? [2.6.26] System Events]

On 10/26/2009 02:23 AM, martin f krafft wrote:
> Dear folks,
>
> I have a new, high-quality, high-performance rack-mounted system, but
> every now and then, the kernel spews a slew of messages like the
> following to syslog:
>
> ata3: EH in SWNCQ mode,QC:qc_active 0x7FFF sactive 0x7FFF
> ata3: SWNCQ:qc_active 0x1 defer_bits 0x7FFE last_issue_tag 0x0
> dhfis 0x1 dmafis 0x0 sdbfis 0x0
> ata3: ATA_REG 0x40 ERR_REG 0x0
> ata3: tag : dhfis dmafis sdbfis sacitve
> ata3: tag 0x0: 1 0 0 1
> ata3.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
> ata3.00: cmd 61/18:00:4f:22:b3/00:00:05:00:00/40 tag 0 ncq 12288 out
> res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata3.00: status: { DRDY }
> [...]
> ata3: hard resetting link
> ata3: SRST failed (errno=-19)
> ata3: SATA link down (SStatus 0 SControl 300)
> ata3: failed to recover some devices, retrying in 5 secs
> ata3: hard resetting link
> ata3: link is slow to respond, please be patient (ready=-19)
> ata3: SRST failed (errno=-16)
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.00: configured for UDMA/133
> ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
> ata3: hot plug
> ata3.00: configured for UDMA/133
> sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 2:0:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
> sd 2:0:0:0: [sdc] Write Protect is off
> sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> This happens for ata[34], but never for ata[12].

The "SATA link down" part is really quite abnormal; it seems like the
drive dropped off the SATA link. I am rather suspicious of some kind of
hardware problem.
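
To make the link state in those messages concrete, here is a minimal
sketch. It assumes that libata prints SStatus in hexadecimal and that
the field layout follows the SStatus register of the SATA specification
(DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function name
decode_sstatus is hypothetical.

```python
# Assumption: field encodings of the SATA SStatus register.
DET = {
    0x0: "no device detected",
    0x1: "device detected, phy communication not established",
    0x3: "device detected, phy communication established",
    0x4: "phy offline",
}
SPD = {
    0x0: "no speed negotiated",
    0x1: "1.5 Gbps",
    0x2: "3.0 Gbps",
    0x3: "6.0 Gbps",
}
IPM = {
    0x0: "no device or no communication",
    0x1: "active",
    0x2: "partial",
    0x6: "slumber",
}


def decode_sstatus(text):
    """Decode an SStatus value given as the hex string from the log."""
    value = int(text, 16)
    det = value & 0xF          # device detection, bits 0-3
    spd = (value >> 4) & 0xF   # negotiated speed, bits 4-7
    ipm = (value >> 8) & 0xF   # power management state, bits 8-11
    return {
        "det": DET.get(det, "reserved"),
        "spd": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
    }


if __name__ == "__main__":
    # The two SStatus values that appear in the log excerpt.
    for raw in ("0", "123"):
        print(raw, decode_sstatus(raw))
```

Under those assumptions, "SStatus 0" means no device was detected and
no link was established, while "SStatus 123" means a device is present,
the link negotiated 3.0 Gbps, and the interface is active.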

Are the two sets of disks the same model?

>
> At the time of these messages, the machine was not loaded. In
> particular, there was no SMART self-test running. I mention this
> because I have set smartd to run short tests daily and extended tests
> weekly, and those only rarely complete:
>
> # 1 Extended offline Interrupted (host reset) 00% 4812 -
> # 2 Short offline Interrupted (host reset) 00% 4684 -
>
> while they run fine for ata[12].
>
> The chipset is nVidia MCP55. Am I dealing with a broken controller?
>
> Cheers,
>

2009-10-27 06:43:04

by martin f krafft

Subject: Re: What are these ATA exceptions trying to tell me? [2.6.26] System Events]

also sprach Robert Hancock <[email protected]> [2009.10.27.0128 +0100]:
> The "SATA link down" part is really quite abnormal; it seems like the
> drive dropped off the SATA link. I am rather suspicious of some kind of
> hardware problem.
>
> Are the two sets of disks the same model?

Yes. All four disks are. And there are 4 identical MCP55 SATA ports
too. Either those two disks are broken by chance, or the controller
is broken.

Cheers,

--
martin | http://madduck.net/ | http://two.sentenc.es/

"a mathematician is a device for turning coffee into theorems."
-- paul erdős

spamtraps: [email protected]

