From time to time this nVidia SATA controller chokes on a FLUSH CACHE.
1. why does the kernel not try to HARD reset the link?
2. it would be nice to have the possibility to manually force a (hard) reset or
to re-initialize the device. Other than rebooting I mean :-)
Thanks.
Ortwin
Kernel is basically a 2.6.32.8 with a cherry picked fix:
http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=queue-2.6.33/enable-retries-for-syncronize_cache-commands-to-fix-i-o-error.patch;h=2401e54b05502803889d4ece2afefc3e2b64995f;hb=117d7c078957b2e200e3fcf06c182422366764b0
Jun 23 12:06:48 gollum kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 23 12:06:48 gollum kernel: ata2.00: failed command: FLUSH CACHE
Jun 23 12:06:48 gollum kernel: ata2.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 23 12:06:48 gollum kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jun 23 12:06:48 gollum kernel: ata2.00: status: { DRDY }
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: reset failed, giving up
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: end_request: I/O error, dev sdb, sector 58604962
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: raid1: Disk failure on sdb3, disabling device.
Jun 23 12:07:59 gollum kernel: raid1: Operation continuing on 1 devices.
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
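For readers decoding the exception above: the first byte of the "res" field is the ATA status register (0x40, which the kernel prints as DRDY), and Emask 0x4 is libata's timeout bit. On Linux, errno=-16 is EBUSY and error=-5 is EIO. The following is a minimal sketch, not kernel code. The helper names (decode_bits, decode_status, decode_emask) and the label strings are assumptions made for illustration; the bit values follow the standard ATA status register layout and the AC_ERR_* flags in include/linux/libata.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pairs a bit mask with a human-readable label. */
struct bit_name {
	unsigned int mask;
	const char *name;
};

/* ATA status register bits (a subset; 0x10, 0x04 and 0x02 are ignored here). */
static const struct bit_name status_bits[] = {
	{ 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },
	{ 0x08, "DRQ" }, { 0x01, "ERR" },
};

/* libata AC_ERR_* bits; the labels are illustrative, not kernel output. */
static const struct bit_name emask_bits[] = {
	{ 1u << 0, "device error" },  { 1u << 1, "HSM violation" },
	{ 1u << 2, "timeout" },       { 1u << 3, "media error" },
	{ 1u << 4, "ATA bus error" }, { 1u << 5, "host bus error" },
	{ 1u << 6, "system error" },  { 1u << 7, "invalid argument" },
	{ 1u << 8, "other" },
};

/* Hypothetical helper: write the names of all set bits into out, comma-separated. */
static void decode_bits(unsigned int value, const struct bit_name *tbl,
			size_t n, char *out, size_t outlen)
{
	size_t used = 0;

	assert(outlen > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w;

		if (!(value & tbl[i].mask))
			continue;
		w = snprintf(out + used, outlen - used, "%s%s",
			     used ? ", " : "", tbl[i].name);
		if (w < 0 || (size_t)w >= outlen - used)
			break;	/* output truncated; stop appending */
		used += (size_t)w;
	}
}

static void decode_status(unsigned char status, char *out, size_t outlen)
{
	decode_bits(status, status_bits,
		    sizeof(status_bits) / sizeof(status_bits[0]), out, outlen);
}

static void decode_emask(unsigned int emask, char *out, size_t outlen)
{
	decode_bits(emask, emask_bits,
		    sizeof(emask_bits) / sizeof(emask_bits[0]), out, outlen);
}
```

With these tables, decode_status(0x40, ...) produces "DRDY" and decode_emask(0x4, ...) produces "timeout", which agrees with what the kernel printed for the failed FLUSH CACHE (command opcode e7).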
lspci:
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
00:1e.0 0604: 10de:01e8 (rev c1)
ATA initialization:
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x3f39f&0x3f39f->0x3f39f, BIOS=0x3f000 (0xc700c6c0) ACPI=0x3f01f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x739f&0x739f->0x739f, BIOS=0x7000 (0xc700c6c0) ACPI=0x701f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Hello,
On 06/23/2010 07:08 PM, Ortwin Glück wrote:
> From time to time this nVidia SATA controller chokes on a FLUSH CACHE.
>
> 1. why does the kernel not try to HARD reset the link?
Because hardreset sometimes brings the link completely offline on
sata_nv's. Hardreset on sata_nv controllers is quite fragile.
> 2. it would be nice to have the possibility to manually force a
> (hard) reset or to re-initialize the device. Other than rebooting I
> mean :-)
Maybe we can use hardreset as the last resort before ditching the
device. Something like the following. Can you please try it and post
the kernel log? Thanks.
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 2116113..5105951 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -1587,7 +1587,7 @@ static int nv_hardreset(struct ata_link *link, unsigned int *class,
 	 * comment above port ops for details.
 	 */
 	if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
-	    !ata_dev_enabled(link->device))
+	    (!ata_dev_enabled(link->device) || ehc->tries[0] == 1))
 		sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
 				    NULL, NULL);
 	else {
--
tejun
On 06/23/2010 07:28 PM, Tejun Heo wrote:
> Maybe we can use hardreset as the last resort before ditching the
> device. Something like the following. Can you please try it and post
> the kernel log? Thanks.
Meh, it won't work. It's failing softreset so we should be checking
reset try counts. I'll try to write up something tomorrow.
Thanks.
--
tejun
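The idea discussed above, keeping softreset for the early attempts and escalating to hardreset only on the final reset try, can be sketched roughly as follows. This is an illustration only and not libata code: pick_reset, its parameters and the reset_kind enum are hypothetical names, and the number of tries is whatever the caller passes in rather than a value taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum reset_kind { RESET_SOFT, RESET_HARD };

/*
 * Hypothetical policy: use softreset for every attempt except the last,
 * and fall back to hardreset on the final attempt if the driver has one.
 * attempt counts from 0; max_tries is the total number of reset tries.
 */
static enum reset_kind pick_reset(int attempt, int max_tries, bool has_hardreset)
{
	assert(max_tries > 0);
	assert(attempt >= 0 && attempt < max_tries);

	if (has_hardreset && attempt == max_tries - 1)
		return RESET_HARD;
	return RESET_SOFT;
}
```

With four tries and a driver that implements hardreset, attempts 0 through 2 would use softreset and attempt 3 would use hardreset. A driver without a usable hardreset would get softreset on every attempt.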
On 23.06.2010 19:31, Tejun Heo wrote:
> Meh, it won't work. It's failing softreset so we should be checking
> reset try counts. I'll try to write up something tomorrow.
I am happy to try patches. However, the problem shows up only about once a
month. Interestingly, it only occurs on ata2 with an IBM HDD, never on the first
channel with a Maxtor. I can also try attaching the DVD drive to the first
channel to see if that makes any difference.
ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Thanks.
Ortwin
Hello,
Patch attached, but please see below.
On 06/24/2010 09:13 AM, Ortwin Glück wrote:
> On 23.06.2010 19:31, Tejun Heo wrote:
>> Meh, it won't work. It's failing softreset so we should be checking
>> reset try counts. I'll try to write up something tomorrow.
>
> I am happy to try patches. However, the problem shows up only about
> once a month. Interestingly, it only occurs on ata2 with an IBM HDD,
> never on the first channel with a Maxtor. I can also try attaching
> the DVD drive to the first channel to see if that makes any difference.
Problems like this definitely can depend on the specific drive.
> ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
> ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
> ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Is it PATA? Why do you have 2.01? Can you please attach the full boot
log?
Thanks.
--
tejun
On 25.06.2010 17:19, Tejun Heo wrote:
> Patch attached, but please see below.
Cheers, I will give it a try and provide feedback if "it" happens again.
> Is it PATA?
Yes. This chipset doesn't have SATA, only PATA.
CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_SATA_NV=y
CONFIG_PATA_AMD=y
> Why do you have 2.01?
for the DVD drive?
> Can you please attach full boot
> log?
attached.
Thanks.
Ortwin
On 06/25/2010 12:00 PM, Ortwin Glück wrote:
>
>
> On 25.06.2010 17:19, Tejun Heo wrote:
>> Patch attached, but please see below.
>
> Cheers, I will give it a try and provide feedback if "it" happens again.
>
>> Is it PATA?
>
> Yes. This chipset doesn't have SATA, only PATA.
Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
totally different controller though.
PATA doesn't normally use hard-resets as a means of error recovery -
that would mean hitting the RESET line, which I don't think most
controllers can do on software command (it usually only gets asserted on
power up or hitting the reset button), unlike on SATA where there's a
defined way to trigger a COMRESET which is mostly equivalent. Also, that
resets both devices on the channel, unlike soft reset which is specific
to one device.
>
> CONFIG_ATA=y
> CONFIG_ATA_SFF=y
> CONFIG_SATA_NV=y
> CONFIG_PATA_AMD=y
>
>> Why do you have 2.01?
>
> for the DVD drive?
>
>> Can you please attach full boot
>> log?
>
> attached.
>
> Thanks.
> Ortwin
Hello,
On 06/26/2010 06:07 AM, Robert Hancock wrote:
> Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
> totally different controller though.
>
> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command (it usually only gets asserted on
> power up or hitting the reset button), unlike on SATA where there's a
> defined way to trigger a COMRESET which is mostly equivalent. Also, that
> resets both devices on the channel, unlike soft reset which is specific
> to one device.
Yeah, if it's pata_amd and SRST isn't recovering the device, there
isn't much else to do. :-(
Thanks.
--
tejun
On 26.06.2010 06:07, Robert Hancock wrote:
> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command
Alright, no worries. I'll try to address the problem from the hardware side
then: cables, drives, etc.
Thanks.
Ortwin
Hello.
Robert Hancock wrote:
>> On 25.06.2010 17:19, Tejun Heo wrote:
>>> Patch attached, but please see below.
>> Cheers, I will give it a try and provide feedback if "it" happens again.
>>> Is it PATA?
>> Yes. This chipset doesn't have SATA, only PATA.
> Ahh, ok, it's not sata_nv at all then, it's pata_amd. On the same chip,
> totally different controller though.
> PATA doesn't normally use hard-resets as a means of error recovery -
> that would mean hitting the RESET line, which I don't think most
> controllers can do on software command (it usually only gets asserted on
> power up or hitting the reset button), unlike on SATA where there's a
> defined way to trigger a COMRESET which is mostly equivalent. Also, that
> resets both devices on the channel, unlike soft reset which is specific
> to one device.
PATA soft reset affects both devices too. Unless you mean the ATAPI DEVICE
RESET command.
MBR, Sergei
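
The distinction drawn in this last reply can be illustrated with a small model. SRST is a bit in the channel's Device Control register, so setting it resets both devices on the cable, whereas DEVICE RESET (opcode 08h) is a command issued to whichever device is currently selected and applies to ATAPI devices. The sketch below is a toy simulation under those assumptions, not driver code: struct pata_channel and the three functions are hypothetical names, and only the two constants correspond to real ATA values (ATA_SRST and ATA_CMD_DEV_RESET in the kernel headers).

```c
#include <assert.h>
#include <stdbool.h>

#define ATA_SRST          0x04	/* SRST bit in the Device Control register */
#define ATA_CMD_DEV_RESET 0x08	/* DEVICE RESET command opcode */

/* Toy model of one PATA channel with a master (0) and a slave (1). */
struct pata_channel {
	bool was_reset[2];	/* set when the corresponding device is reset */
	int selected;		/* device chosen via the DEV bit */
};

/* Device Control is shared by the channel: SRST hits both devices. */
static void write_devctl(struct pata_channel *ch, unsigned char val)
{
	if (val & ATA_SRST) {
		ch->was_reset[0] = true;
		ch->was_reset[1] = true;
	}
}

/* Select the device that subsequent commands are addressed to. */
static void select_device(struct pata_channel *ch, int dev)
{
	assert(dev == 0 || dev == 1);
	ch->selected = dev;
}

/* Commands go to the selected device only. */
static void write_command(struct pata_channel *ch, unsigned char cmd)
{
	if (cmd == ATA_CMD_DEV_RESET)
		ch->was_reset[ch->selected] = true;
}
```

In this model, write_devctl(&ch, ATA_SRST) marks both devices as reset, while select_device(&ch, 1) followed by write_command(&ch, ATA_CMD_DEV_RESET) marks only the slave. On the setup in this thread, that would correspond to the DVD drive at ata2.01 and not the hard disk at ata2.00.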