2011-05-11 22:26:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One

Hi,

There seems to be a problem on my Acer box which is related to DIPM
and switching the power source. Namely, when I detach the AC adapter from
the machine, the disk (which is an Intel SSD) freezes for a while and
something like this appears in dmesg:

[ 2177.808120] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0xd0002 action 0x6 frozen
[ 2177.808140] ata1: SError: { RecovComm PHYRdyChg CommWake 10B8B }
[ 2177.808153] ata1.00: failed command: WRITE FPDMA QUEUED
[ 2177.808171] ata1.00: cmd 61/08:00:08:ed:37/00:00:06:00:00/40 tag 0 ncq 4096 out
[ 2177.808174] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2177.808192] ata1.00: status: { DRDY }
[ 2177.808201] ata1.00: failed command: WRITE FPDMA QUEUED
[ 2177.808218] ata1.00: cmd 61/08:08:48:f4:c7/00:00:05:00:00/40 tag 1 ncq 4096 out
[ 2177.808220] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2177.808239] ata1.00: status: { DRDY }
[ 2177.808247] ata1.00: failed command: WRITE FPDMA QUEUED
[ 2177.808275] ata1.00: cmd 61/08:10:50:f4:c7/00:00:05:00:00/40 tag 2 ncq 4096 out
[ 2177.808277] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2177.808303] ata1.00: status: { DRDY }
[ 2177.808320] ata1: hard resetting link
[ 2187.812149] ata1: softreset failed (1st FIS failed)
[ 2187.812227] ata1: hard resetting link
[ 2188.304128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2188.304774] ata1.00: configured for UDMA/133
[ 2188.320333] ata1.00: device reported invalid CHS sector 0
[ 2188.320361] ata1.00: device reported invalid CHS sector 0
[ 2188.320394] ata1: EH complete

Then, when the AC adapter is attached again, the SSD freezes again and this
appears in dmesg:

[178523.064145] ata1.00: qc timeout (cmd 0xef)
[178523.064228] ata1.00: failed to disable DIPM, Emask 0x4
[178523.064248] ata1: hard resetting link
[178523.556096] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[178523.556793] ata1.00: configured for UDMA/133
[178523.572051] ata1: EH complete

That seems to be reproducible 100% of the time. Moreover, if the box is
suspended on AC power and then resume on battery power, something more serious
happens:

[178881.824160] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0xd0002 action 0x6 frozen
[178881.824203] ata1: SError: { RecovComm PHYRdyChg CommWake 10B8B }
[178881.824223] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824249] ata1.00: cmd 61/08:00:00:d0:9e/00:00:01:00:00/40 tag 0 ncq 4096 out
[178881.824252] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824288] ata1.00: status: { DRDY }
[178881.824304] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824328] ata1.00: cmd 61/08:08:48:d0:9e/00:00:01:00:00/40 tag 1 ncq 4096 out
[178881.824331] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824366] ata1.00: status: { DRDY }
[178881.824381] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824405] ata1.00: cmd 61/08:10:00:d2:9e/00:00:01:00:00/40 tag 2 ncq 4096 out
[178881.824408] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824443] ata1.00: status: { DRDY }
[178881.824457] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824481] ata1.00: cmd 61/08:18:20:d3:9e/00:00:01:00:00/40 tag 3 ncq 4096 out
[178881.824484] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824519] ata1.00: status: { DRDY }
[178881.824534] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824557] ata1.00: cmd 61/10:20:90:da:9e/00:00:01:00:00/40 tag 4 ncq 8192 out
[178881.824560] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824595] ata1.00: status: { DRDY }
[178881.824610] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824633] ata1.00: cmd 61/08:28:c0:5a:6c/00:00:01:00:00/40 tag 5 ncq 4096 out
[178881.824637] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824671] ata1.00: status: { DRDY }
[178881.824710] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824731] ata1.00: cmd 61/10:30:48:5f:98/00:00:01:00:00/40 tag 6 ncq 8192 out
[178881.824734] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824760] ata1.00: status: { DRDY }
[178881.824773] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824794] ata1.00: cmd 61/08:38:e0:1a:a4/00:00:01:00:00/40 tag 7 ncq 4096 out
[178881.824797] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824823] ata1.00: status: { DRDY }
[178881.824836] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824856] ata1.00: cmd 61/08:40:90:29:ac/00:00:01:00:00/40 tag 8 ncq 4096 out
[178881.824859] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824885] ata1.00: status: { DRDY }
[178881.824898] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824919] ata1.00: cmd 61/08:48:a0:33:a4/00:00:01:00:00/40 tag 9 ncq 4096 out
[178881.824922] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.824949] ata1.00: status: { DRDY }
[178881.824961] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.824982] ata1.00: cmd 61/08:50:c8:51:ac/00:00:01:00:00/40 tag 10 ncq 4096 out
[178881.824985] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825012] ata1.00: status: { DRDY }
[178881.825025] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825046] ata1.00: cmd 61/10:58:00:50:96/00:00:01:00:00/40 tag 11 ncq 8192 out
[178881.825049] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825075] ata1.00: status: { DRDY }
[178881.825087] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825108] ata1.00: cmd 61/08:60:20:50:96/00:00:01:00:00/40 tag 12 ncq 4096 out
[178881.825111] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825137] ata1.00: status: { DRDY }
[178881.825149] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825170] ata1.00: cmd 61/08:68:40:50:96/00:00:01:00:00/40 tag 13 ncq 4096 out
[178881.825173] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825199] ata1.00: status: { DRDY }
[178881.825212] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825232] ata1.00: cmd 61/10:70:e0:4c:94/00:00:01:00:00/40 tag 14 ncq 8192 out
[178881.825235] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825261] ata1.00: status: { DRDY }
[178881.825273] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825294] ata1.00: cmd 61/10:78:80:55:7b/00:00:01:00:00/40 tag 15 ncq 8192 out
[178881.825297] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825323] ata1.00: status: { DRDY }
[178881.825335] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825356] ata1.00: cmd 61/08:80:68:60:64/00:00:01:00:00/40 tag 16 ncq 4096 out
[178881.825359] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825385] ata1.00: status: { DRDY }
[178881.825398] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825419] ata1.00: cmd 61/08:88:c8:5a:6c/00:00:01:00:00/40 tag 17 ncq 4096 out
[178881.825422] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825449] ata1.00: status: { DRDY }
[178881.825461] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825461] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825482] ata1.00: cmd 61/08:90:b8:b6:64/00:00:01:00:00/40 tag 18 ncq 4096 out
[178881.825485] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825512] ata1.00: status: { DRDY }
[178881.825524] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825545] ata1.00: cmd 61/10:98:a8:a2:6d/00:00:01:00:00/40 tag 19 ncq 8192 out
[178881.825548] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825574] ata1.00: status: { DRDY }
[178881.825587] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825608] ata1.00: cmd 61/08:a0:58:f6:76/00:00:01:00:00/40 tag 20 ncq 4096 out
[178881.825611] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825637] ata1.00: status: { DRDY }
[178881.825649] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825670] ata1.00: cmd 61/08:a8:e0:28:3b/00:00:06:00:00/40 tag 21 ncq 4096 out
[178881.825673] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825699] ata1.00: status: { DRDY }
[178881.825712] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825732] ata1.00: cmd 61/48:b0:c0:1e:a5/01:00:04:00:00/40 tag 22 ncq 167936 out
[178881.825735] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825761] ata1.00: status: { DRDY }
[178881.825774] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825794] ata1.00: cmd 61/08:b8:78:c1:37/00:00:06:00:00/40 tag 23 ncq 4096 out
[178881.825798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825824] ata1.00: status: { DRDY }
[178881.825836] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825857] ata1.00: cmd 61/18:c0:98:c1:37/00:00:06:00:00/40 tag 24 ncq 12288 out
[178881.825860] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825886] ata1.00: status: { DRDY }
[178881.825899] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825919] ata1.00: cmd 61/08:c8:10:c1:37/00:00:06:00:00/40 tag 25 ncq 4096 out
[178881.825923] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.825949] ata1.00: status: { DRDY }
[178881.825961] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.825982] ata1.00: cmd 61/10:d0:30:c1:37/00:00:06:00:00/40 tag 26 ncq 8192 out
[178881.825985] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.826011] ata1.00: status: { DRDY }
[178881.826024] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.826045] ata1.00: cmd 61/18:d8:50:c1:37/00:00:06:00:00/40 tag 27 ncq 12288 out
[178881.826048] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.826074] ata1.00: status: { DRDY }
[178881.826086] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.826107] ata1.00: cmd 61/08:e0:d8:a2:35/00:00:06:00:00/40 tag 28 ncq 4096 out
[178881.826110] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.826137] ata1.00: status: { DRDY }
[178881.826149] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.826170] ata1.00: cmd 61/08:e8:d0:a3:35/00:00:06:00:00/40 tag 29 ncq 4096 out
[178881.826173] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.826199] ata1.00: status: { DRDY }
[178881.826211] ata1.00: failed command: WRITE FPDMA QUEUED
[178881.826232] ata1.00: cmd 61/08:f0:50:e5:64/00:00:06:00:00/40 tag 30 ncq 4096 out
[178881.826235] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[178881.826261] ata1.00: status: { DRDY }
[178881.826278] ata1: hard resetting link
[178891.824149] ata1: softreset failed (1st FIS failed)
[178891.824188] ata1: hard resetting link
[178892.320170] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[178892.320874] ata1.00: configured for UDMA/133
[178892.336156] ata1.00: device reported invalid CHS sector 0
[178892.336199] ata1.00: device reported invalid CHS sector 0
[178892.336218] ata1.00: device reported invalid CHS sector 0
[178892.336236] ata1.00: device reported invalid CHS sector 0
[178892.336254] ata1.00: device reported invalid CHS sector 0
[178892.336272] ata1.00: device reported invalid CHS sector 0
[178892.336290] ata1.00: device reported invalid CHS sector 0
[178892.336308] ata1.00: device reported invalid CHS sector 0
[178892.336327] ata1.00: device reported invalid CHS sector 0
[178892.336345] ata1.00: device reported invalid CHS sector 0
[178892.336363] ata1.00: device reported invalid CHS sector 0
[178892.336381] ata1.00: device reported invalid CHS sector 0
[178892.336399] ata1.00: device reported invalid CHS sector 0
[178892.336418] ata1.00: device reported invalid CHS sector 0
[178892.336436] ata1.00: device reported invalid CHS sector 0
[178892.336455] ata1.00: device reported invalid CHS sector 0
[178892.336474] ata1.00: device reported invalid CHS sector 0
[178892.336492] ata1.00: device reported invalid CHS sector 0
[178892.336512] ata1.00: device reported invalid CHS sector 0
[178892.336530] ata1.00: device reported invalid CHS sector 0
[178892.336548] ata1.00: device reported invalid CHS sector 0
[178892.336567] ata1.00: device reported invalid CHS sector 0
[178892.336586] ata1.00: device reported invalid CHS sector 0
[178892.336604] ata1.00: device reported invalid CHS sector 0
[178892.336622] ata1.00: device reported invalid CHS sector 0
[178892.336640] ata1.00: device reported invalid CHS sector 0
[178892.336658] ata1.00: device reported invalid CHS sector 0
[178892.336677] ata1.00: device reported invalid CHS sector 0
[178892.336694] ata1.00: device reported invalid CHS sector 0
[178892.336712] ata1.00: device reported invalid CHS sector 0
[178892.336731] ata1.00: device reported invalid CHS sector 0
[178892.336837] ata1: EH complete

This doesn't happen if the box is both suspended and resumed on battery
power.

I have verified that the issue described above doesn't happen with
the appended patch applied.

The hardware is:

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
Subsystem: Acer Incorporated [ALI] Device 029e
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
Interrupt: pin A routed to IRQ 22
Region 0: I/O ports at 8420 [size=8]
Region 1: I/O ports at 8414 [size=4]
Region 2: I/O ports at 8418 [size=8]
Region 3: I/O ports at 8410 [size=4]
Region 4: I/O ports at 8400 [size=16]
Region 5: Memory at f0508000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] SATA HBA v1.0 InCfgSpace
Kernel driver in use: ahci

[ 3.336087] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 3.340777] ata1.00: ATA-7: INTEL SSDSA2M080G2GC, 2CV102HD, max UDMA/133
[ 3.345233] ata1.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 3.350070] ata1.00: configured for UDMA/133
[ 3.354969] scsi 0:0:0:0: Direct-Access ATA INTEL SSDSA2M080 2CV1 PQ: 0 ANSI: 5

Please tell me if there's anything I can do as a permanent workaround.

Thanks,
Rafael


---
drivers/ata/ahci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/drivers/ata/ahci.c
===================================================================
--- linux-2.6.orig/drivers/ata/ahci.c
+++ linux-2.6/drivers/ata/ahci.c
@@ -193,7 +193,7 @@ static const struct ata_port_info ahci_p
[board_ahci_sb700] = /* for SB700 and SB800 */
{
AHCI_HFLAGS (AHCI_HFLAG_IGN_SERR_INTERNAL),
- .flags = AHCI_FLAG_COMMON,
+ .flags = AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
.pio_mask = ATA_PIO4,
.udma_mask = ATA_UDMA6,
.port_ops = &ahci_sb600_ops,


2011-05-13 13:07:35

by Michael Leun

[permalink] [raw]
Subject: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Thu, 12 May 2011 00:25:34 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> There seems to be a problem on my Acer box which is related to DIPM
> and switching the power source. Namely, when I detach the AC adapter
> from the machine, the disk (which is an Intel SSD) freezes for a
> while and something like this appears in dmesg:
[...]

Since updating to 2.6.39-rc7 (from 2.6.38.x) I have similar problems
with two notebooks using Intel ICH:

Acer PTZ1825
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI
Controller (rev 03) (prog-if 01 [AHCI 1.0]) Subsystem: Acer
Incorporated [ALI] Device 0300 Flags: bus master, 66MHz, medium devsel,
latency 0, IRQ 42 I/O ports at 30e8 [size=8]
I/O ports at 30fc [size=4]
I/O ports at 30e0 [size=8]
I/O ports at 30f8 [size=4]
I/O ports at 3020 [size=32]
Memory at d4504000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ahci


Dell Latitude E6510
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset
6 port SATA AHCI Cont roller (rev 05) (prog-if 01 [AHCI 1.0])
Subsystem: Dell Device 040b
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 41
I/O ports at 8090 [size=8]
I/O ports at 8080 [size=4]
I/O ports at 8070 [size=8]
I/O ports at 8060 [size=4]
I/O ports at 8020 [size=32]
Memory at e9640000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ahci

Patch not tried yet (do not know / not looked into yet where to change
for this chipset).

Seems to be a 2.6.39-rcX regression?


--
MfG,

Michael Leun

2011-05-13 16:56:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Friday, May 13, 2011, Michael Leun wrote:
> On Thu, 12 May 2011 00:25:34 +0200
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > There seems to be a problem on my Acer box which is related to DIPM
> > and switching the power source. Namely, when I detach the AC adapter
> > from the machine, the disk (which is an Intel SSD) freezes for a
> > while and something like this appears in dmesg:
> [...]
>
> Since updating to 2.6.39-rc7 (from 2.6.38.x) I have similar problems
> with two notebooks using Intel ICH:
>
> Acer PTZ1825
> 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI
> Controller (rev 03) (prog-if 01 [AHCI 1.0]) Subsystem: Acer
> Incorporated [ALI] Device 0300 Flags: bus master, 66MHz, medium devsel,
> latency 0, IRQ 42 I/O ports at 30e8 [size=8]
> I/O ports at 30fc [size=4]
> I/O ports at 30e0 [size=8]
> I/O ports at 30f8 [size=4]
> I/O ports at 3020 [size=32]
> Memory at d4504000 (32-bit, non-prefetchable) [size=2K]
> Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
> Capabilities: [70] Power Management version 3
> Capabilities: [a8] SATA HBA v1.0
> Capabilities: [b0] PCI Advanced Features
> Kernel driver in use: ahci
>
>
> Dell Latitude E6510
> 00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset
> 6 port SATA AHCI Cont roller (rev 05) (prog-if 01 [AHCI 1.0])
> Subsystem: Dell Device 040b
> Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 41
> I/O ports at 8090 [size=8]
> I/O ports at 8080 [size=4]
> I/O ports at 8070 [size=8]
> I/O ports at 8060 [size=4]
> I/O ports at 8020 [size=32]
> Memory at e9640000 (32-bit, non-prefetchable) [size=2K]
> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [70] Power Management version 3
> Capabilities: [a8] SATA HBA v1.0
> Capabilities: [b0] PCI Advanced Features
> Kernel driver in use: ahci
>
> Patch not tried yet (do not know / not looked into yet where to change
> for this chipset).
>
> Seems to be a 2.6.39-rcX regression?

Well, that very well may be the case.

Can you verify that it didn't happen with 2.6.38 and earlier?

Rafael

2011-05-13 17:39:54

by Tejun Heo

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

Hello,

Can you guys please take a look at bug#34692? It seems to be the same problem.

https://bugzilla.kernel.org/show_bug.cgi?id=34692

The offending commit seems to be 270dac35c2. Does reverting it
resolves the problem for you guys too?

Thanks.

--
tejun

2011-05-13 20:03:17

by Michael Leun

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Fri, 13 May 2011 19:39:51 +0200
Tejun Heo <[email protected]> wrote:

> Can you guys please take a look at bug#34692? It seems to be the
> same problem.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=34692
>
> The offending commit seems to be 270dac35c2. Does reverting it
> resolves the problem for you guys too?

- Does not happen with 2.6.38.6

- Happens 100% of cases with 2.6.39-rc7 (connect ac adapter or suspend
to ram with ac connected and resume with ac disconnected)

- Reverting 270dac35c26433d06a89150c51e75ca0181ca7e4 fixes issue

--
MfG,

Michael Leun

2011-05-13 20:19:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Friday, May 13, 2011, Michael Leun wrote:
> On Fri, 13 May 2011 19:39:51 +0200
> Tejun Heo <[email protected]> wrote:
>
> > Can you guys please take a look at bug#34692? It seems to be the
> > same problem.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=34692
> >
> > The offending commit seems to be 270dac35c2. Does reverting it
> > resolves the problem for you guys too?
>
> - Does not happen with 2.6.38.6
>
> - Happens 100% of cases with 2.6.39-rc7 (connect ac adapter or suspend
> to ram with ac connected and resume with ac disconnected)
>
> - Reverting 270dac35c26433d06a89150c51e75ca0181ca7e4 fixes issue

I can confirm that.

Thanks,
Rafael

2011-05-13 20:21:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Friday, May 13, 2011, Tejun Heo wrote:
> Hello,
>
> Can you guys please take a look at bug#34692? It seems to be the same problem.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=34692

Yes, same issue.

> The offending commit seems to be 270dac35c2. Does reverting it
> resolves the problem for you guys too?

Yes, it does.

Thanks,
Rafael

2011-05-13 20:43:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On Friday, May 13, 2011, Rafael J. Wysocki wrote:
> On Friday, May 13, 2011, Tejun Heo wrote:
> > Hello,
> >
> > Can you guys please take a look at bug#34692? It seems to be the same problem.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=34692
>
> Yes, same issue.
>
> > The offending commit seems to be 270dac35c2. Does reverting it
> > resolves the problem for you guys too?
>
> Yes, it does.

Would you object if I asked Linus to revert that commit?

Rafael

2011-05-14 02:08:39

by Robert Hancock

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

On 05/13/2011 02:44 PM, Rafael J. Wysocki wrote:
> On Friday, May 13, 2011, Rafael J. Wysocki wrote:
>> On Friday, May 13, 2011, Tejun Heo wrote:
>>> Hello,
>>>
>>> Can you guys please take a look at bug#34692? It seems to be the same problem.
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=34692
>>
>> Yes, same issue.
>>
>>> The offending commit seems to be 270dac35c2. Does reverting it
>>> resolves the problem for you guys too?
>>
>> Yes, it does.
>
> Would you object if I asked Linus to revert that commit?

Think that's likely the best solution for now. I'm guessing there's some
case where link power management trips up the check that was added. Most
or all existing released controllers don't need it so I think Jian could
likely try again after 2.6.39.

2011-05-14 10:18:16

by Tejun Heo

[permalink] [raw]
Subject: Re: AHCI driver problem on ICH (Re: AHCI driver problem on SB700/SB800 w/ Acer Ferrari One)

Hello,

On Fri, May 13, 2011 at 10:44:29PM +0200, Rafael J. Wysocki wrote:
> On Friday, May 13, 2011, Rafael J. Wysocki wrote:
> > On Friday, May 13, 2011, Tejun Heo wrote:
> > > Hello,
> > >
> > > Can you guys please take a look at bug#34692? It seems to be the same problem.
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=34692
> >
> > Yes, same issue.
> >
> > > The offending commit seems to be 270dac35c2. Does reverting it
> > > resolves the problem for you guys too?
> >
> > Yes, it does.
>
> Would you object if I asked Linus to revert that commit?

Not at all. I think that's the best option at this point. I'll send
revert patch right now.

Thanks.

--
tejun

2011-05-14 10:30:18

by Tejun Heo

[permalink] [raw]
Subject: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.

The commits causes command timeouts on AC plug/unplug. It isn't yet
clear why. As the commit was for a single rather obscure controller,
revert the change for now.

The problem was reported and bisected by Gu Rui in bug#34692.

https://bugzilla.kernel.org/show_bug.cgi?id=34692

Also, reported by Rafael and Michael in the following thread.

http://thread.gmane.org/gmane.linux.kernel/1138771

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Gu Rui <[email protected]>
Reported-by: Rafael J. Wysocki <[email protected]>
Reported-by: Michael Leun <[email protected]>
Cc: Jian Peng <[email protected]>
Cc: Jeff Garzik <[email protected]>
---
As we're already in -rc7, I'm sending the revert patch to both Jeff
and Linus.

Thank you.

drivers/ata/libahci.c | 21 ---------------------
1 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index ff9d832..d38c40f 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -561,27 +561,6 @@ void ahci_start_engine(struct ata_port *ap)
{
void __iomem *port_mmio = ahci_port_base(ap);
u32 tmp;
- u8 status;
-
- status = readl(port_mmio + PORT_TFDATA) & 0xFF;
-
- /*
- * At end of section 10.1 of AHCI spec (rev 1.3), it states
- * Software shall not set PxCMD.ST to 1 until it is determined
- * that a functoinal device is present on the port as determined by
- * PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
- *
- * Even though most AHCI host controllers work without this check,
- * specific controller will fail under this condition
- */
- if (status & (ATA_BUSY | ATA_DRQ))
- return;
- else {
- ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
-
- if ((tmp & 0xf) != 0x3)
- return;
- }

/* start DMA */
tmp = readl(port_mmio + PORT_CMD);
--
1.7.1

2011-05-14 17:48:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Sat, May 14, 2011 at 3:28 AM, Tejun Heo <[email protected]> wrote:
>
> As we're already in -rc7, I'm sending the revert patch to both Jeff
> and Linus.

Applied,

Linus

2011-05-14 18:49:38

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On 05/14/2011 01:47 PM, Linus Torvalds wrote:
> On Sat, May 14, 2011 at 3:28 AM, Tejun Heo<[email protected]> wrote:
>>
>> As we're already in -rc7, I'm sending the revert patch to both Jeff
>> and Linus.
>
> Applied,

ACK, thanks

2011-05-15 09:24:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Sunday, May 15, 2011, Jian Peng wrote:
> Hi, Michael/Rafael/Gu,
>
> Could you help me understand the testing environme? I like to reproduce it
> if it is possible and debug it. The host controller this patch dealing with
> is not used on PC so my testing environment is quite different.

Well, the system I can reproduce it on is a production one, so I can't
really test too much on it. What exactly would you like to do?

Rafael

2011-05-15 12:22:56

by Michael Leun

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Sun, 15 May 2011 11:25:19 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> On Sunday, May 15, 2011, Jian Peng wrote:
> > Hi, Michael/Rafael/Gu,
> >
> > Could you help me understand the testing environme? I like to
> > reproduce it if it is possible and debug it. The host controller
> > this patch dealing with is not used on PC so my testing environment
> > is quite different.

We have seen it on at least 3 different controllers from at least 2
different vendors (I do not know, what hardware Gu has).

So, for now, it looks like to happen on more different controllers than
not.

Testing environment - uhm, what shall I say? On each of the two
notebooks affected runs 64bit openSuSE 11.4 in more or less default
configuration (apart from kernel, of course).

Just do a "ls -lR /" and watch kernel log while plugging/unplugging
power.

Maybe running a 64bit system/kernel is part of picture? As far as I can
see all affected machines run 64bit.

> Well, the system I can reproduce it on is a production one, so I can't
> really test too much on it. What exactly would you like to do?

I've at least one machine available where I can test. So, if there is
anything (preferably non destructive ;-) ) you want me to test this
should be possible.

Maybe I'll have a look if my netbook (running 32bit) is also affected.

--
MfG,

Michael Leun

2011-05-16 17:03:57

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Sat, 14 May 2011 12:28:04 +0200, Tejun Heo said:
> This reverts commit 270dac35c26433d06a89150c51e75ca0181ca7e4.
>
> The commits causes command timeouts on AC plug/unplug. It isn't yet
> clear why. As the commit was for a single rather obscure controller,
> revert the change for now.
>
> The problem was reported and bisected by Gu Rui in bug#34692.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=34692
>
> Also, reported by Rafael and Michael in the following thread.
>
> http://thread.gmane.org/gmane.linux.kernel/1138771

This also fixes the issue I had with a 10-second pause on my Dell Latitude E6500
laptop at boot while detecting the CD/DVD drive.


Attachments:
(No filename) (227.00 B)

2011-05-18 19:44:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Wednesday, May 18, 2011, Jian Peng wrote:
> Hi, Valdis/Rafael/Michael,
>
> Could you help me test the following change?
>
> After reverting 81ca7e4, add 5ms delay as follow since that seems also
> fixing the issue on my SATA host controller that requires 81ca7e4.
>
> In drivers/ata/libahci.c, inside ahci_hardreset() function,
>
>
> 1333 <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1333>
> ahci_start_engine
> <http://lxr.linux.no/linux+*/+code=ahci_start_engine>(ap
> <http://lxr.linux.no/linux+*/+code=ap>);
>
> msleep(5);1334
> <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1334>1335
> <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1335> if
> (online <http://lxr.linux.no/linux+*/+code=online>)1336
> <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1336>
> *class <http://lxr.linux.no/linux+*/+code=class> = ahci_dev_classify
> <http://lxr.linux.no/linux+*/+code=ahci_dev_classify>(ap
> <http://lxr.linux.no/linux+*/+code=ap>);
>
> Since my host controller requires time to switch internal state to be ready.
> Please let me know your testing result.

Could you simply post a patch?

Rafael

2011-05-19 10:19:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,

Hi,

> I am using Dell E6410 with same AHCI host controller as your system, but
> could not reproduce the bug you found out (using Ubuntu 11.04 and 2.6.39-rc6
> kernel).

OK, do I think correctly you tested without the patch you sent previously?

> Do you know which config file I need change to perform this testing?

I'm not sure what you mean. The kernel .config?

Rafael

2011-05-19 21:31:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,
>
> Yes. I tested without reverting the patch that caused failure on E6410 (with
> same AHCI controller), The system is Ubuntu 11.04 with default 2.6.39-rc6
> kernel (guess this should be same kernel as yours before reverting the
> patch).
>
> My system worked well with AC power cable plugging in and out at boot time,
> at runtime, and during "dd if=/dev/sda5 of=/dev/null ..".

What kind of disk did you use? Mine is an Intel SSD.

> The config file I mentioned is those under /etc/ or other directories that
> is used to control the power management.

They aren't relevant to this issue.

> Did you test my patch? What it the result?

No, I didn't. Unfortunately, I won't be able to do any tests on the affected
machine in the near future (most probably for the next two weeks or so).

Thanks,
Rafael

2011-05-19 22:02:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,
>
> I am using internal WDC laptop HDD. I will give SDD a try even though I only
> have non-Intel SDD now.
>
> Is your SDD the boot disk (by swapping the built-in HDD out)

Yes, it is.

> or the extra one installed using device bay on Dell laptop?

The machine is not a Dell. It's an Acer.

Thanks,
Rafael

2011-05-20 15:41:22

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:

> > @@ -1353,6 +1332,8 @@
> >
> >
> > ahci_start_engine(ap);
> >
> > + msleep(5);
> > +
> > if (online)
> >
> > *class = ahci_dev_classify(ap);
> >

It may very well be that adding a magic msleep(5) here just Makes It Work, but
I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
during the msleep() unless somebody *else* sets it - in which case the locking
is screwed up as we're not forcing a re-read of the value). The msleep() probably
needs to be before something else further down in the code (but I have no idea
exactly what).


Attachments:
(No filename) (227.00 B)

2011-05-20 15:43:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Fri, May 20, 2011 at 11:40:20AM -0400, [email protected] wrote:
> On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
>
> > > @@ -1353,6 +1332,8 @@
> > >
> > >
> > > ahci_start_engine(ap);
> > >
> > > + msleep(5);
> > > +
> > > if (online)
> > >
> > > *class = ahci_dev_classify(ap);
> > >
>
> It may very well be that adding a magic msleep(5) here just Makes It Work, but
> I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> during the msleep() unless somebody *else* sets it - in which case the locking
> is screwed up as we're not forcing a re-read of the value). The msleep() probably
> needs to be before something else further down in the code (but I have no idea
> exactly what).

At this point, I think it would be better to simply add a flag and
enable the check for the affected controller.

Thanks.

--
tejun

2011-05-20 18:21:35

by Jian Peng

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

Sorry that I need?fix my gmail setting to make it show up in LKML.

Hi, Tejun/Valdis,

Since this is an interoperability issue of SATA host controller, the
first step I want to try it to make sure the tweak that MAKE my
controller WORK does not break other controllers.
You are both right that adding this majic 5ms delay at this place
should not be the final solution.

If this magic 5ms delay works on other affected systems, I plan to
post a new patch that inside ahci_start_engine(), still perform same
check, and show warning message if failed, but will set a flag, then
still set START bit, and if there is such failure flag, add 5ms delay.

Valdis, could you apply the following patch and retest it?

Tejun, please review it.

--- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700

+++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
@@ -540,6 +540,7 @@

? void __iomem *port_mmio = ahci_port_base(ap);
? u32 tmp;
? u8 status;

+ int err = 0;


? status = readl(port_mmio + PORT_TFDATA) & 0xFF;


@@ -553,12 +554,12 @@

?? * specific controller will fail under this condition
?? */
? if (status & (ATA_BUSY | ATA_DRQ))
-? return;

+? err = 1;

? else {
?? ahci_scr_read(&ap->link, SCR_STATUS, &tmp);

?? if ((tmp & 0xf) != 0x3)
-?? return;

+?? err = 1;
? }

? /* start DMA */
@@ -566,6 +567,13 @@
? tmp |= PORT_CMD_START;
? writel(tmp, port_mmio + PORT_CMD);
? readl(port_mmio + PORT_CMD); /* flush */
+
+ /* Some controllers need longer time to be ready */
+ if(err) {
+? printk(KERN_WARNING
+?? "Controller in wrong state when setting START bit\n");
+? msleep(5);
+ }
?}
?EXPORT_SYMBOL_GPL(ahci_start_engine);

On Fri, May 20, 2011 at 8:43 AM, Tejun Heo <[email protected]> wrote:
>
> On Fri, May 20, 2011 at 11:40:20AM -0400, [email protected] wrote:
> > On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
> >
> > > > @@ -1353,6 +1332,8 @@
> > > >
> > > >
> > > > ? ahci_start_engine(ap);
> > > >
> > > > + msleep(5);
> > > > +
> > > > ? if (online)
> > > >
> > > > ? ?*class = ahci_dev_classify(ap);
> > > >
> >
> > It may very well be that adding a magic msleep(5) here just Makes It Work, but
> > I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> > during the msleep() unless somebody *else* sets it - in which case the locking
> > is screwed up as we're not forcing a re-read of the value). ?The msleep() probably
> > needs to be before something else further down in the code (but I have no idea
> > exactly what).
>
> At this point, I think it would be better to simply add a flag and
> enable the check for the affected controller.
>
> Thanks.
>
> --
> tejun

2011-05-20 18:26:12

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

On Fri, 20 May 2011 10:02:56 PDT, Jian Peng said:
> --20cf307f38f6d842a904a3b81730

> You are both right that adding this majic 5ms delay at this place should not
> be the final solution.
>
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.
>
> Valdis, could you apply the following patch and retest it?

I should be able to do that this weekend. To clarify - should this be with the
problem commit 270dac35c26433d06a89150c51e75ca0181ca7e4 applied, or reverted?


Attachments:
(No filename) (227.00 B)

2011-05-22 02:00:30

by Jian Peng

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

HI, Valdis,

This patch is on top of reverted patch 81ca7e4. So you should not
revert 81ca7e4 before applying this new one.

Best regards,
Jian

On Fri, May 20, 2011 at 11:25 AM, <[email protected]> wrote:
>
> On Fri, 20 May 2011 10:02:56 PDT, Jian Peng said:
> > --20cf307f38f6d842a904a3b81730
>
> > You are both right that adding this majic 5ms delay at this place should not
> > be the final solution.
> >
> > If this magic 5ms delay works on other affected systems, I plan to post a
> > new patch that inside ahci_start_engine(), still perform same check, and
> > show warning message if failed, but will set a flag, then still set START
> > bit, and if there is such failure flag, add 5ms delay.
> >
> > Valdis, could you apply the following patch and retest it?
>
> I should be able to do that this weekend. ?To clarify - should this be with the
> problem commit 270dac35c26433d06a89150c51e75ca0181ca7e4 applied, or reverted?
>

2011-05-23 12:13:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

Hello,

On Fri, May 20, 2011 at 10:02:56AM -0700, Jian Peng wrote:
> Hi, Tejun/Valdis,
>
> Since this is an interoperability issue of SATA host controller, the first
> step I want to try it to make sure the tweak that MAKE my controller WORK
> does not break other controllers.
> You are both right that adding this majic 5ms delay at this place should not
> be the final solution.
>
> If this magic 5ms delay works on other affected systems, I plan to post a
> new patch that inside ahci_start_engine(), still perform same check, and
> show warning message if failed, but will set a flag, then still set START
> bit, and if there is such failure flag, add 5ms delay.

Yeah, sounds like a plan but please add ample comment explaining
what's going on for which controller including link to the mailing
list threads. As we're basically adding a black magic, I would still
like to enable it only for the affected controllers so that we don't
have to worry whether there are controllers which are affected by this
problem but we don't know about.

> --- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
> +++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
> @@ -540,6 +540,7 @@
> void __iomem *port_mmio = ahci_port_base(ap);
> u32 tmp;
> u8 status;
> + int err = 0;

I think

bool stat_failed = false;

would be more in line with recent coding style.

> status = readl(port_mmio + PORT_TFDATA) & 0xFF;
>
> @@ -553,12 +554,12 @@
> * specific controller will fail under this condition
> */
> if (status & (ATA_BUSY | ATA_DRQ))
> - return;
> + err = 1;
> else {
> ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
>
> if ((tmp & 0xf) != 0x3)
> - return;
> + err = 1;
> }
>
> /* start DMA */
> @@ -566,6 +567,13 @@
> tmp |= PORT_CMD_START;
> writel(tmp, port_mmio + PORT_CMD);
> readl(port_mmio + PORT_CMD); /* flush */
> +
> + /* Some controllers need longer time to be ready */
> + if(err) {
> + printk(KERN_WARNING
> + "Controller in wrong state when setting START bit\n");
> + msleep(5);

ata_port_printk()?

Thanks.

--
tejun

2011-05-24 01:04:27

by Jian Peng

[permalink] [raw]
Subject: Re: [PATCH v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

Hi, Tejun,

Defintely. I will clarify it as much as possible and meanwhile, wait
for it to be tested by reporters.

Thanks,
Jian

On Mon, May 23, 2011 at 5:13 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Fri, May 20, 2011 at 10:02:56AM -0700, Jian Peng wrote:
>> Hi, Tejun/Valdis,
>>
>> Since this is an interoperability issue of SATA host controller, the first
>> step I want to try it to make sure the tweak that MAKE my controller WORK
>> does not break other controllers.
>> You are both right that adding this majic 5ms delay at this place should not
>> be the final solution.
>>
>> If this magic 5ms delay works on other affected systems, I plan to post a
>> new patch that inside ahci_start_engine(), still perform same check, and
>> show warning message if failed, but will set a flag, then still set START
>> bit, and if there is such failure flag, add 5ms delay.
>
> Yeah, sounds like a plan but please add ample comment explaining
> what's going on for which controller including link to the mailing
> list threads. ?As we're basically adding a black magic, I would still
> like to enable it only for the affected controllers so that we don't
> have to worry whether there are controllers which are affected by this
> problem but we don't know about.
>
>> --- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
>> +++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
>> @@ -540,6 +540,7 @@
>> ? void __iomem *port_mmio = ahci_port_base(ap);
>> ? u32 tmp;
>> ? u8 status;
>> + int err = 0;
>
> I think
>
> ?bool stat_failed = false;
>
> would be more in line with recent coding style.
>
>> ? status = readl(port_mmio + PORT_TFDATA) & 0xFF;
>>
>> @@ -553,12 +554,12 @@
>> ? ?* specific controller will fail under this condition
>> ? ?*/
>> ? if (status & (ATA_BUSY | ATA_DRQ))
>> - ?return;
>> + ?err = 1;
>> ? else {
>> ? ?ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
>>
>> ? ?if ((tmp & 0xf) != 0x3)
>> - ? return;
>> + ? err = 1;
>> ? }
>>
>> ? /* start DMA */
>> @@ -566,6 +567,13 @@
>> ? tmp |= PORT_CMD_START;
>> ? writel(tmp, port_mmio + PORT_CMD);
>> ? readl(port_mmio + PORT_CMD); /* flush */
>> +
>> + /* Some controllers need longer time to be ready */
>> + if(err) {
>> + ?printk(KERN_WARNING
>> + ? "Controller in wrong state when setting START bit\n");
>> + ?msleep(5);
>
> ?ata_port_printk()?
>
> Thanks.
>
> --
> tejun
>