LinuxLists.cc - sata_sil24 broken since 2.6.23-rc4-mm1

2007-09-26 20:26:53

Subject: sata_sil24 broken since 2.6.23-rc4-mm1

As reported in the "2.6.23-rc4-mm1" thread and the "What's in
linux-2.6-block.git for 2.6.24" thread, one drive on the SiI-3132
sometimes throws errors during bootup and becomes inaccessible.

The latest kernel on which I have seen this error is 2.6.23-rc7-mm1.
In 2 of 7 boots the following happened:

Sep 25 07:42:11 treogen [ 33.810000] md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
Sep 25 07:42:11 treogen [ 33.810000] created bitmap (145 pages) for device md1
Sep 25 07:42:11 treogen [ 63.910000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 25 07:42:11 treogen [ 63.910000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 25 07:42:11 treogen [ 63.910000] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 25 07:42:11 treogen [ 63.910000] ata1.00: status: {DRDY }
Sep 25 07:42:11 treogen [ 63.910000] ata1: hard resetting link
Sep 25 07:42:11 treogen [ 66.210000] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [ 66.210000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 25 07:42:11 treogen [ 73.910000] ata1: hard resetting link
Sep 25 07:42:11 treogen [ 76.210000] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [ 76.210000] ata1: reset failed (errno=-5), retrying in 8 secs
Sep 25 07:42:11 treogen [ 83.910000] ata1: hard resetting link
Sep 25 07:42:11 treogen [ 86.210000] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [ 86.210000] ata1: reset failed (errno=-5), retrying in 33 secs
Sep 25 07:42:11 treogen [ 118.910000] ata1: limiting SATA link speed to 1.5 Gbps
Sep 25 07:42:11 treogen [ 118.910000] ata1: hard resetting link
Sep 25 07:42:11 treogen [ 121.210000] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [ 121.210000] ata1: reset failed, giving up
Sep 25 07:42:11 treogen [ 121.210000] ata1.00: disabled
Sep 25 07:42:11 treogen [ 121.210000] ata1: EH complete
Sep 25 07:42:11 treogen [ 121.210000] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 25 07:42:11 treogen [ 121.210000] end_request: I/O error, dev sda, sector 625137161
Sep 25 07:42:11 treogen [ 121.210000] md: super_written gets error=-5, uptodate=0
Sep 25 07:42:11 treogen [ 121.210000] raid5: Disk failure on sda2, disabling device. Operation continuing on 2 devices

Comparing the drivers/ata directory from rc3-mm1 and rc4-mm1, the
following change looked the most suspicious to me:
http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=blobdiff;f=drivers/ata/sata_sil24.c;h=3dcb223117be9739ee04d70b6bfc776a4b839a3f;hp=e0cd31aa8002350add53ba6ff07493e503275244;hb=020bc1bd8d369a77bd9379cd9763ac0057651753;hpb=8d4bdf8087e682df98bdb856f6ad451bf6d597e7

sata_sil24.c has not changed again since rc4-mm1, which also matches
when the error occurs.

To confirm my theory I replaced the sata_sil24.c from rc8-mm1 with
the version from rc3-mm1.
I was able to boot the resulting kernel successfully 5 times, without
the error happening again.

Torsten

2007-09-27 04:57:01

by Tejun Heo


Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1

diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index 3831920..dc3ddcb 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -1117,6 +1117,7 @@ static int sil24_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)

host->ports[i]->ioaddr.cmd_addr = port;
host->ports[i]->ioaddr.scr_addr = port + PORT_SCONTROL;
+ ata_std_ports(&ap->ioaddr);

ata_port_pbar_desc(ap, SIL24_HOST_BAR, -1, "host");
ata_port_pbar_desc(ap, SIL24_PORT_BAR, offset, "port");

Attachments:

patch (511.00 B)

2007-09-27 04:59:33

by Tejun Heo


Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1

Tejun Heo wrote:
> Torsten Kaiser wrote:
>> Comparing the drivers/ata directory from rc3-mm1 and rc4-mm1, the
>> following change looked the most suspicious to me:
>> http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=blobdiff;f=drivers/ata/sata_sil24.c;h=3dcb223117be9739ee04d70b6bfc776a4b839a3f;hp=e0cd31aa8002350add53ba6ff07493e503275244;hb=020bc1bd8d369a77bd9379cd9763ac0057651753;hpb=8d4bdf8087e682df98bdb856f6ad451bf6d597e7
>>
>> sata_sil24.c has not changed again since rc4-mm1, which also matches
>> when the error occurs.
>>
>> To confirm my theory I replaced the sata_sil24.c from rc8-mm1 with
>> the version from rc3-mm1.
>> I was able to boot the resulting kernel successfully 5 times, without
>> the error happening again.
>
> Thanks a lot for chasing down the problem. The changed code is in the
> address initialization path, and it's weird that it causes intermittent
> failures rather than a consistent one.
>
> Anyways, does the attached patch fix the problem?

If not, can you add a printk of iomap[SIL24_PORT_BAR], offset, and the
initialized cmd_addr and scr_addr in the loop, and see whether anything
differs between the cases where the driver works and where it fails?
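
For illustration, the values in question could be computed and printed
roughly as below. This is only a userspace sketch, not driver code: the
struct and the helpers sil24_port_addrs() and format_port_addrs() are
hypothetical, and the PORT_REGS_SIZE and PORT_SCONTROL values are assumed
to match the constants in sata_sil24.c. In the driver itself the
equivalent would be a printk() inside the per-port loop of
sil24_init_one().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the constants in sata_sil24.c; verify against the tree. */
#define PORT_REGS_SIZE 0x2000
#define PORT_SCONTROL  0x1f00

/* Hypothetical container for the values the debug printk would show. */
struct port_addrs {
	size_t offset;       /* ap->port_no * PORT_REGS_SIZE */
	uintptr_t cmd_addr;  /* iomap[SIL24_PORT_BAR] + offset */
	uintptr_t scr_addr;  /* cmd_addr + PORT_SCONTROL */
};

/* Same arithmetic as the loop in sil24_init_one(), on plain integers. */
static struct port_addrs sil24_port_addrs(uintptr_t port_bar,
					  unsigned int port_no)
{
	struct port_addrs a;

	a.offset = (size_t)port_no * PORT_REGS_SIZE;
	a.cmd_addr = port_bar + a.offset;
	a.scr_addr = a.cmd_addr + PORT_SCONTROL;
	return a;
}

/* Format one debug line per port; returns the snprintf() result. */
static int format_port_addrs(char *buf, size_t len, uintptr_t port_bar,
			     unsigned int port_no)
{
	struct port_addrs a = sil24_port_addrs(port_bar, port_no);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"sil24 port %u: bar=0x%lx offset=0x%zx "
			"cmd_addr=0x%lx scr_addr=0x%lx",
			port_no, (unsigned long)port_bar, a.offset,
			(unsigned long)a.cmd_addr, (unsigned long)a.scr_addr);
}
```

Comparing these lines between a good boot and a failing one would show
whether the address setup itself differs, or whether the addresses are
identical and the problem lies elsewhere.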

Thanks.

--
tejun

2007-09-27 06:14:41