2010-06-23 17:08:36

by Ortwin Glück

Subject: ata link not reset properly

From time to time this nVidia SATA controller chokes on a FLUSH CACHE.

1. why does the kernel not try to HARD reset the link?
2. it would be nice to have the possibility to manually force a (hard) reset or
to re-initialize the device. Other than rebooting I mean :-)

Thanks.
Ortwin

The kernel is basically 2.6.32.8 with one cherry-picked fix:
http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=queue-2.6.33/enable-retries-for-syncronize_cache-commands-to-fix-i-o-error.patch;h=2401e54b05502803889d4ece2afefc3e2b64995f;hb=117d7c078957b2e200e3fcf06c182422366764b0

Jun 23 12:06:48 gollum kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 23 12:06:48 gollum kernel: ata2.00: failed command: FLUSH CACHE
Jun 23 12:06:48 gollum kernel: ata2.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 23 12:06:48 gollum kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jun 23 12:06:48 gollum kernel: ata2.00: status: { DRDY }
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: reset failed, giving up
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: end_request: I/O error, dev sdb, sector 58604962
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: raid1: Disk failure on sdb3, disabling device.
Jun 23 12:07:59 gollum kernel: raid1: Operation continuing on 1 devices.
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
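
For readers decoding the exception lines above, here is a minimal sketch in C that turns the three numeric fields (the result Status byte 0x40, Emask 0x4 and action 0x6) into names. The bit values are written from memory of the ATA Status register layout and of the AC_ERR_* and ATA_EH_* flags in include/linux/libata.h around 2.6.32, so treat them as assumptions and check them against your own tree. The helper names (flags_to_string, decode_status, decode_emask, decode_action) are made up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

struct flag_name {
    unsigned int bit;
    const char *name;
};

/* ATA Status register bits (assumed from the ATA/ATAPI register layout). */
static const struct flag_name status_bits[] = {
    { 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },
    { 0x08, "DRQ" }, { 0x01, "ERR" },
};

/* Subset of libata error-mask bits (assumed values of AC_ERR_*). */
static const struct flag_name emask_bits[] = {
    { 1u << 0, "AC_ERR_DEV" },     { 1u << 1, "AC_ERR_HSM" },
    { 1u << 2, "AC_ERR_TIMEOUT" }, { 1u << 3, "AC_ERR_MEDIA" },
    { 1u << 4, "AC_ERR_ATA_BUS" }, { 1u << 5, "AC_ERR_HOST_BUS" },
};

/* Subset of libata EH action bits (assumed values of ATA_EH_*). */
static const struct flag_name action_bits[] = {
    { 1u << 0, "REVALIDATE" }, { 1u << 1, "SOFTRESET" },
    { 1u << 2, "HARDRESET" },
};

/* Write the space-separated names of all set bits into buf. */
static const char *flags_to_string(unsigned int value,
                                   const struct flag_name *table, size_t n,
                                   char *buf, size_t len)
{
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(value & table[i].bit))
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", table[i].name);
        if (w < 0 || (size_t)w >= len - used)
            break; /* buffer too small: stop with truncated output */
        used += (size_t)w;
    }
    return buf;
}

static const char *decode_status(unsigned int v, char *buf, size_t len)
{
    return flags_to_string(v, status_bits, ARRAY_LEN(status_bits), buf, len);
}

static const char *decode_emask(unsigned int v, char *buf, size_t len)
{
    return flags_to_string(v, emask_bits, ARRAY_LEN(emask_bits), buf, len);
}

static const char *decode_action(unsigned int v, char *buf, size_t len)
{
    return flags_to_string(v, action_bits, ARRAY_LEN(action_bits), buf, len);
}
```

With the values from this log, the status byte decodes to DRDY only (matching the "status: { DRDY }" line), the Emask to a timeout, and the action to both reset bits. If the assumed bit values are right, the error handler requested a reset without restricting it to one kind.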


lspci:
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
00:1e.0 0604: 10de:01e8 (rev c1)

ATA initialization:
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x3f39f&0x3f39f->0x3f39f, BIOS=0x3f000 (0xc700c6c0) ACPI=0x3f01f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x739f&0x739f->0x739f, BIOS=0x7000 (0xc700c6c0) ACPI=0x701f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk


2010-06-23 17:28:19

by Tejun Heo

Subject: Re: ata link not reset properly

Hello,

On 06/23/2010 07:08 PM, Ortwin Glück wrote:
> From time to time this nVidia SATA controller chokes on a FLUSH CACHE.
>
> 1. why does the kernel not try to HARD reset the link?

Because hardreset sometimes brings the link completely offline on
sata_nv's. Hardreset on sata_nv controllers is quite fragile.

> 2. it would be nice to have the possibility to manually force a
> (hard) reset or to re-initialize the device. Other than rebooting I
> mean :-)

Maybe we can use hardreset as the last resort before ditching the
device. Something like the following. Can you please try it and post
the kernel log? Thanks.

diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 2116113..5105951 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -1587,7 +1587,7 @@ static int nv_hardreset(struct ata_link *link, unsigned int *class,
 	 * comment above port ops for details.
 	 */
 	if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
-	    !ata_dev_enabled(link->device))
+	    (!ata_dev_enabled(link->device) || ehc->tries[0] == 1))
 		sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
 				    NULL, NULL);
 	else {

--
tejun

2010-06-23 17:31:53

by Tejun Heo

Subject: Re: ata link not reset properly

On 06/23/2010 07:28 PM, Tejun Heo wrote:
> Maybe we can use hardreset as the last resort before ditching the
> device. Something like the following. Can you please try it and post
> the kernel log? Thanks.

Meh, it won't work. It's failing softreset so we should be checking
reset try counts. I'll try to write up something tomorrow.

Thanks.

--
tejun
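
The idea discussed above, soft reset first and a hard reset only as the last attempt before the device is given up, can be sketched as a small policy function. This is not libata code: the names (reset_kind, pick_reset) and the 0-based attempt counter are hypothetical, and the real error handler tracks its tries differently.

```c
#include <assert.h>

enum reset_kind {
    RESET_SOFT,     /* SRST-style soft reset */
    RESET_HARD,     /* hard reset, used only as a last resort */
    RESET_GIVE_UP   /* no attempts left: disable the device */
};

/*
 * Hypothetical policy: choose the reset type from the number of
 * attempts already made (0-based) and the total attempt budget.
 * Every attempt but the final one is a soft reset; the final one
 * escalates to a hard reset; after that the caller gives up.
 */
static enum reset_kind pick_reset(int attempt, int max_attempts)
{
    assert(attempt >= 0 && max_attempts > 0);

    if (attempt >= max_attempts)
        return RESET_GIVE_UP;
    if (attempt == max_attempts - 1)
        return RESET_HARD;
    return RESET_SOFT;
}
```

With a budget of four attempts, as in the log at the top of the thread, this gives three soft resets, then one hard reset, then giving up. Whether a hard reset is available or safe at all depends on the controller, which the policy above deliberately ignores.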

2010-06-24 07:14:06

by Ortwin Glück

Subject: Re: ata link not reset properly

On 23.06.2010 19:31, Tejun Heo wrote:
> Meh, it won't work. It's failing softreset so we should be checking
> reset try counts. I'll try to write up something tomorrow.

I am happy to try patches. The problem shows up maybe once a month only,
however. Interestingly it only occurs on ata2 with an IBM HDD, never on the first
channel with a Maxtor. I can also try and attach the DVD on the first channel
and see if that makes any difference.

ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33

Thanks.
Ortwin

2010-06-25 15:19:41

by Tejun Heo

Subject: Re: ata link not reset properly

Hello,

Patch attached, but please see below.

On 06/24/2010 09:13 AM, Ortwin Glück wrote:
> On 23.06.2010 19:31, Tejun Heo wrote:
>> Meh, it won't work. It's failing softreset so we should be checking
>> reset try counts. I'll try to write up something tomorrow.
>
> I am happy to try patches. The problem shows up maybe once a month
> only, however. Interestingly it only occurs on ata2 with an IBM HDD,
> never on the first channel with a Maxtor. I can also try and attach
> the DVD on the first channel and see if that makes any difference.

Problems like this definitely can depend on the specific drive.

> ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
> ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
> ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33

Is it PATA? Why do you have 2.01? Can you please attach full boot
log?

Thanks.

--
tejun


Attachments:
nv-last-reset.patch (2.06 kB)

2010-06-25 18:01:06

by Ortwin Glück

Subject: Re: ata link not reset properly



On 25.06.2010 17:19, Tejun Heo wrote:
> Patch attached, but please see below.

Cheers, I will give it a try and provide feedback if "it" happens again.

> Is it PATA?

Yes. This chipset doesn't have SATA, only PATA.

CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_SATA_NV=y
CONFIG_PATA_AMD=y

> Why do you have 2.01?

for the DVD drive?

> Can you please attach full boot
> log?

attached.

Thanks.
Ortwin


Attachments:
bootlog.txt (24.14 kB)

2010-06-26 04:07:19

by Robert Hancock

Subject: Re: ata link not reset properly

On 06/25/2010 12:00 PM, Ortwin Glück wrote:
>
>
> On 25.06.2010 17:19, Tejun Heo wrote:
>> Patch attached, but please see below.
>
> Cheers, I will give it a try and provide feedback if "it" happens again.
>
>> Is it PATA?
>
> Yes. This chipset doesn't have SATA, only PATA.

Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
totally different controller though.

PATA doesn't normally use hard-resets as a means of error recovery -
that would mean hitting the RESET line, which I don't think most
controllers can do on software command (it usually only gets asserted on
power up or hitting the reset button), unlike on SATA where there's a
defined way to trigger a COMRESET which is mostly equivalent. Also, that
resets both devices on the channel, unlike soft reset which is specific
to one device.

>
> CONFIG_ATA=y
> CONFIG_ATA_SFF=y
> CONFIG_SATA_NV=y
> CONFIG_PATA_AMD=y
>
>> Why do you have 2.01?
>
> for the DVD drive?
>
>> Can you please attach full boot
>> log?
>
> attached.
>
> Thanks.
> Ortwin

2010-06-26 08:03:51

by Tejun Heo

Subject: Re: ata link not reset properly

Hello,

On 06/26/2010 06:07 AM, Robert Hancock wrote:
> Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
> totally different controller though.
>
> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command (it usually only gets asserted on
> power up or hitting the reset button), unlike on SATA where there's a
> defined way to trigger a COMRESET which is mostly equivalent. Also, that
> resets both devices on the channel, unlike soft reset which is specific
> to one device.

Yeah, if it's pata_amd and SRST isn't recovering the device, there
isn't much else to do. :-(

Thanks.

--
tejun

2010-06-26 09:00:38

by Ortwin Glück

Subject: Re: ata link not reset properly

On 26.06.2010 06:07, Robert Hancock wrote:
> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command

Alright, no worries. I'll try to address the problem from the hardware side
then: cables, drives, etc.

Thanks.
Ortwin

2010-06-27 18:28:36

by Sergei Shtylyov

Subject: Re: ata link not reset properly

Hello.

Robert Hancock wrote:

>> On 25.06.2010 17:19, Tejun Heo wrote:
>>> Patch attached, but please see below.

>> Cheers, I will give it a try and provide feedback if "it" happens again.

>>> Is it PATA?

>> Yes. This chipset doesn't have SATA, only PATA.

> Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
> totally different controller though.

> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command (it usually only gets asserted on
> power up or hitting the reset button), unlike on SATA where there's a
> defined way to trigger a COMRESET which is mostly equivalent. Also, that
> resets both devices on the channel, unlike soft reset which is specific
> to one device.

PATA soft reset affects both devices too. Unless you mean ATAPI Device
Reset command.

MBR, Sergei
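
The distinction raised here can be illustrated with a toy model of one PATA channel carrying a disk and an ATAPI drive, like ata2.00 and ata2.01 in this thread. It is only a sketch under stated assumptions: SRST is taken to be bit 2 of the Device Control register, DEV bit 4 of the Device register, and DEVICE RESET opcode 0x08, implemented only by PACKET (ATAPI) devices. The sim_* types and functions are invented for the example and touch no real hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define ATA_SRST             0x04 /* Device Control register, bit 2 */
#define ATA_DEV1             0x10 /* Device register, bit 4: select device 1 */
#define ATA_CMD_DEVICE_RESET 0x08 /* PACKET feature set command */

struct sim_device {
    bool present;
    bool is_atapi;
    unsigned int resets;  /* how many times this device was reset */
};

struct sim_channel {
    struct sim_device dev[2];  /* device 0 and device 1 share the cable */
    unsigned char device_reg;  /* last value written to the Device register */
    unsigned char ctl;         /* last value written to Device Control */
};

/*
 * Device Control is a per-channel register, so setting SRST is seen by
 * every device on the cable regardless of which one is selected.
 */
static void sim_write_ctl(struct sim_channel *ch, unsigned char value)
{
    bool rising = (value & ATA_SRST) && !(ch->ctl & ATA_SRST);

    ch->ctl = value;
    if (!rising)
        return;
    for (int i = 0; i < 2; i++)
        if (ch->dev[i].present)
            ch->dev[i].resets++;
}

/* Select device 0 or 1 for subsequent commands. */
static void sim_select(struct sim_channel *ch, int devno)
{
    ch->device_reg = devno ? ATA_DEV1 : 0;
}

/*
 * DEVICE RESET is a command, so it goes only to the selected device.
 * Returns false if that device does not accept it (absent or not ATAPI).
 */
static bool sim_device_reset(struct sim_channel *ch)
{
    struct sim_device *d = &ch->dev[(ch->device_reg & ATA_DEV1) ? 1 : 0];

    if (!d->present || !d->is_atapi)
        return false;
    d->resets++;
    return true;
}
```

Pulsing SRST (writing ATA_SRST, then 0) bumps the reset count of both devices, while sim_device_reset() after sim_select(&ch, 1) affects only the ATAPI drive and is refused by the disk.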