LinuxLists.cc - questions regarding possible violation of AHCI spec in AHCI driver

2010-12-07 07:43:50

Subject: questions regarding possible violation of AHCI spec in AHCI driver

Recently, while bringing up a new AHCI host controller, I found out that current AHCI driver (in 2.6.37-rc3) may violate AHCI spec in function libahci.c: ahci_start_engine().

>From end of section 10.1.2 in AHCI 1.3 spec, it claims

Software shall not set PxCMD.ST to '1' until it is determined that a functional device is present on the port
as determined by PxTFD.STS.BSY = '0', PxTFD.STS.DRQ = '0', and PxSSTS.DET = 3h.

It seems working well on most controller without this extra checking, but does cause problem in our new core. Since toggling PxCMD.SUD already initiated reset process at early time, and by the time of ahci_start_engine() got called, BSY bit may not be cleared yet, and forcing PxCMD.ST bit to 1 will cause problem for HW in this case.

Since I do not know all history about AHCI driver, I really hope that AHCI driver and libata maintainer can comment on this.

Thanks,
Jian

2010-12-08 01:54:57

by Robert Hancock

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

CCing linux-ide and Tejun.

On 12/07/2010 01:43 AM, Jian Peng wrote:
> Recently, while bringing up a new AHCI host controller, I found out that current AHCI driver (in 2.6.37-rc3) may violate AHCI spec in function libahci.c: ahci_start_engine().
>
> From end of section 10.1.2 in AHCI 1.3 spec, it claims
>
> Software shall not set PxCMD.ST to '1' until it is determined that a functional device is present on the port
> as determined by PxTFD.STS.BSY = '0', PxTFD.STS.DRQ = '0', and PxSSTS.DET = 3h.
>
> It seems working well on most controller without this extra checking, but does cause problem in our new core. Since toggling PxCMD.SUD already initiated reset process at early time, and by the time of ahci_start_engine() got called, BSY bit may not be cleared yet, and forcing PxCMD.ST bit to 1 will cause problem for HW in this case.
>
> Since I do not know all history about AHCI driver, I really hope that AHCI driver and libata maintainer can comment on this.

Seems like a valid point to me, according to the spec. I'm guessing it
just hasn't come up previously with other controllers?

2010-12-08 10:06:53

by Tejun Heo

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 02:54 AM, Robert Hancock wrote:
> On 12/07/2010 01:43 AM, Jian Peng wrote:
>> Recently, while bringing up a new AHCI host controller, I found out
>> that current AHCI driver (in 2.6.37-rc3) may violate AHCI spec in
>> function libahci.c: ahci_start_engine().
>>
>> From end of section 10.1.2 in AHCI 1.3 spec, it claims
>>
>> Software shall not set PxCMD.ST to '1' until it is determined that
>> a functional device is present on the port as determined by
>> PxTFD.STS.BSY = '0', PxTFD.STS.DRQ = '0', and PxSSTS.DET = 3h.
>>
>> It seems working well on most controller without this extra
>> checking, but does cause problem in our new core. Since toggling
>> PxCMD.SUD already initiated reset process at early time, and by the
>> time of ahci_start_engine() got called, BSY bit may not be cleared
>> yet, and forcing PxCMD.ST bit to 1 will cause problem for HW in
>> this case.

Hmmm... interesting. Yeah, we have never had any problem in that area
and would like to avoid changing unless necessary but then again if
it's broken, well, we should. What kind of problem is the controller
showing?

Thanks.

--
tejun

2010-12-08 18:14:45

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

Hi, Tejun,

The problem happened as follow:

After power up, inside ahci_init_one(), it will call ahci_power_up() to toggle PxCMD.SUD bit first, then HBA will send COMRESET to device, and device will send first D2H FIS back. Here it will call ahci_start_engine() to turn on PxCMD.ST to process command. In this case, it may run into race condition that transaction triggered by toggling PxCMD.SUD is not completed yet, and that is the reason why extra check is required by spec to guarantee that HBA already received FIS and in sane state.

In most HBA, either staggered spin-up feature was not supported, or time required for transaction is less than that between two function calls, it may work. IMHO, this is a clear violation of spec, and not robust against all HBA design.

The major concern is that ahci_start_engine() is used widely in EH and it does not return result to reflect whether ST bit was set or not, this may cause trouble in some cases. I am working on verifying those cases with different HBAs now.

Thanks,
Jian

-----Original Message-----
From: Tejun Heo [mailto:[email protected]]
Sent: Wednesday, December 08, 2010 2:07 AM
To: Robert Hancock
Cc: Jian Peng; [email protected]; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 02:54 AM, Robert Hancock wrote:
> On 12/07/2010 01:43 AM, Jian Peng wrote:
>> Recently, while bringing up a new AHCI host controller, I found out
>> that current AHCI driver (in 2.6.37-rc3) may violate AHCI spec in
>> function libahci.c: ahci_start_engine().
>>
>> From end of section 10.1.2 in AHCI 1.3 spec, it claims
>>
>> Software shall not set PxCMD.ST to '1' until it is determined that
>> a functional device is present on the port as determined by
>> PxTFD.STS.BSY = '0', PxTFD.STS.DRQ = '0', and PxSSTS.DET = 3h.
>>
>> It seems working well on most controller without this extra
>> checking, but does cause problem in our new core. Since toggling
>> PxCMD.SUD already initiated reset process at early time, and by the
>> time of ahci_start_engine() got called, BSY bit may not be cleared
>> yet, and forcing PxCMD.ST bit to 1 will cause problem for HW in
>> this case.

Hmmm... interesting. Yeah, we have never had any problem in that area
and would like to avoid changing unless necessary but then again if
it's broken, well, we should. What kind of problem is the controller
showing?

Thanks.

--
tejun

2010-12-08 18:24:54

by Tejun Heo

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 07:14 PM, Jian Peng wrote:
> The problem happened as follow:
>
> After power up, inside ahci_init_one(), it will call ahci_power_up()
> to toggle PxCMD.SUD bit first, then HBA will send COMRESET to
> device, and device will send first D2H FIS back. Here it will call
> ahci_start_engine() to turn on PxCMD.ST to process command. In this
> case, it may run into race condition that transaction triggered by
> toggling PxCMD.SUD is not completed yet, and that is the reason why
> extra check is required by spec to guarantee that HBA already
> received FIS and in sane state.
>
> In most HBA, either staggered spin-up feature was not supported, or
> time required for transaction is less than that between two function
> calls, it may work. IMHO, this is a clear violation of spec, and not
> robust against all HBA design.
>
> The major concern is that ahci_start_engine() is used widely in EH
> and it does not return result to reflect whether ST bit was set or
> not, this may cause trouble in some cases. I am working on verifying
> those cases with different HBAs now.

I see, so it's not that the controller actually failed but you
observed the race condition. During EH, ahci_start_engine() is used
rather liberally when the driver wants the controller in working
condition. The assumption is that even if the driver tries to set ST
with the incorrect condition, the controller wouldn't go crazy where
it can't be restarted later, which _must_ be true as there's no atomic
way to check the condition and set ST.

The driver at the same time guarantees that if the whole
initialization protocol succeeds, the last ahci_start_engine() is
called after hardreset is sucessfully completed and thus all the
prerequisites are met.

So, yeap, the driver might set ST when the conditions are not met but
that can't be avoided completely anyway so the controller shouldn't go
completely bonkers for that (it's okay to fail in recoverable way) and
the driver will do the last ST setting after all the conditions are
met on the success path.

Thanks.

--
tejun

2010-12-08 18:49:45

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

So it is reasonable to add extra check in ahci_start_engine() without returning status of ST bit. If so, here is my patch

--- libahci.c.orig 2010-12-08 10:42:48.383976763 -0800
+++ libahci.c 2010-12-08 10:45:17.495156944 -0800
@@ -542,6 +542,13 @@
{
void __iomem *port_mmio = ahci_port_base(ap);
u32 tmp;
+ u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
+
+ /* avoid race condition per spec (end of section 10.1.2) */
+ if (status & (ATA_BUSY | ATA_DRQ) ||
+ ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
+ (tmp & 0x0f) != 0x03)
+ return;

/* start DMA */
tmp = readl(port_mmio + PORT_CMD);

Thanks,
Jian
________________________________________
From: Tejun Heo [[email protected]]
Sent: Wednesday, December 08, 2010 10:24 AM
To: Jian Peng
Cc: Robert Hancock; [email protected]; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 07:14 PM, Jian Peng wrote:
> The problem happened as follow:
>
> After power up, inside ahci_init_one(), it will call ahci_power_up()
> to toggle PxCMD.SUD bit first, then HBA will send COMRESET to
> device, and device will send first D2H FIS back. Here it will call
> ahci_start_engine() to turn on PxCMD.ST to process command. In this
> case, it may run into race condition that transaction triggered by
> toggling PxCMD.SUD is not completed yet, and that is the reason why
> extra check is required by spec to guarantee that HBA already
> received FIS and in sane state.
>
> In most HBA, either staggered spin-up feature was not supported, or
> time required for transaction is less than that between two function
> calls, it may work. IMHO, this is a clear violation of spec, and not
> robust against all HBA design.
>
> The major concern is that ahci_start_engine() is used widely in EH
> and it does not return result to reflect whether ST bit was set or
> not, this may cause trouble in some cases. I am working on verifying
> those cases with different HBAs now.

I see, so it's not that the controller actually failed but you
observed the race condition. During EH, ahci_start_engine() is used
rather liberally when the driver wants the controller in working
condition. The assumption is that even if the driver tries to set ST
with the incorrect condition, the controller wouldn't go crazy where
it can't be restarted later, which _must_ be true as there's no atomic
way to check the condition and set ST.

The driver at the same time guarantees that if the whole
initialization protocol succeeds, the last ahci_start_engine() is
called after hardreset is sucessfully completed and thus all the
prerequisites are met.

So, yeap, the driver might set ST when the conditions are not met but
that can't be avoided completely anyway so the controller shouldn't go
completely bonkers for that (it's okay to fail in recoverable way) and
the driver will do the last ST setting after all the conditions are
met on the success path.

Thanks.

--
tejun

2010-12-08 18:53:00

by Tejun Heo

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 07:48 PM, Jian Peng wrote:
> So it is reasonable to add extra check in ahci_start_engine() without returning status of ST bit. If so, here is my patch
>
> --- libahci.c.orig 2010-12-08 10:42:48.383976763 -0800
> +++ libahci.c 2010-12-08 10:45:17.495156944 -0800
> @@ -542,6 +542,13 @@
> {
> void __iomem *port_mmio = ahci_port_base(ap);
> u32 tmp;
> + u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> +
> + /* avoid race condition per spec (end of section 10.1.2) */
> + if (status & (ATA_BUSY | ATA_DRQ) ||
> + ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
> + (tmp & 0x0f) != 0x03)
> + return;
>
> /* start DMA */
> tmp = readl(port_mmio + PORT_CMD);

Yes, it is reasonable but I want to see that it actually fixes
something. There are just too many controllers which use this path to
blindly apply the above change and given my previous explanation even
without the above change any ahci controller _should_ work fine.

Thanks.

--
tejun

2010-12-08 19:50:08

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

I agree. I have AHCI based PCI card using HBA from Marvell, Via and Silicon Image, and am going to test my patch.
Before this patch can be applied universally, I like to use it for specific PCI_VENDOR_ID first. Here is my new patch to limit it to Broadcom's AHCI core

--- libahci.c.orig 2010-12-08 10:42:48.383976763 -0800
+++ libahci.c 2010-12-08 11:44:24.394023128 -0800
@@ -44,6 +44,7 @@
#include <scsi/scsi_host.h>
#include <scsi/scsi_cmnd.h>
#include <linux/libata.h>
+#include <linux/pci.h>
#include "ahci.h"

static int ahci_skip_host_reset;
@@ -541,8 +542,20 @@
void ahci_start_engine(struct ata_port *ap)
{
void __iomem *port_mmio = ahci_port_base(ap);
+ struct ata_host *host = ap->host;
+ struct pci_dev *pdev = to_pci_dev(host->dev);
u32 tmp;

+ /* avoid race condition per spec (end of section 10.1.2) */
+ if (pdev->vendor == PCI_VENDOR_ID_BROADCOM) {
+ u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
+
+ if (status & (ATA_BUSY | ATA_DRQ) ||
+ ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
+ (tmp & 0x0f) != 0x03)
+ return;
+ }
+
/* start DMA */
tmp = readl(port_mmio + PORT_CMD);
tmp |= PORT_CMD_START;

Thanks,
Jian

________________________________________
From: Tejun Heo [[email protected]]
Sent: Wednesday, December 08, 2010 10:52 AM
To: Jian Peng
Cc: Robert Hancock; [email protected]; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 07:48 PM, Jian Peng wrote:
> So it is reasonable to add extra check in ahci_start_engine() without returning status of ST bit. If so, here is my patch
>
> --- libahci.c.orig 2010-12-08 10:42:48.383976763 -0800
> +++ libahci.c 2010-12-08 10:45:17.495156944 -0800
> @@ -542,6 +542,13 @@
> {
> void __iomem *port_mmio = ahci_port_base(ap);
> u32 tmp;
> + u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> +
> + /* avoid race condition per spec (end of section 10.1.2) */
> + if (status & (ATA_BUSY | ATA_DRQ) ||
> + ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
> + (tmp & 0x0f) != 0x03)
> + return;
>
> /* start DMA */
> tmp = readl(port_mmio + PORT_CMD);

Yes, it is reasonable but I want to see that it actually fixes
something. There are just too many controllers which use this path to
blindly apply the above change and given my previous explanation even
without the above change any ahci controller _should_ work fine.

Thanks.

--
tejun

2010-12-08 19:55:47

by Tejun Heo

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 08:49 PM, Jian Peng wrote:
> I agree. I have AHCI based PCI card using HBA from Marvell, Via and
> Silicon Image, and am going to test my patch. Before this patch can
> be applied universally, I like to use it for specific PCI_VENDOR_ID
> first. Here is my new patch to limit it to Broadcom's AHCI core

Hmmm... is the change actually necessary for broadcom controllers? As
I wrote before, any ahci controller should just work without the above
checks because,

> + /* avoid race condition per spec (end of section 10.1.2) */
> + if (pdev->vendor == PCI_VENDOR_ID_BROADCOM) {
> + u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> +
> + if (status & (ATA_BUSY | ATA_DRQ) ||
> + ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
> + (tmp & 0x0f) != 0x03)
> + return;

PHY event can occur here which causes the device to send D2H Reg FIS
w/ BSY set.

1. So, the controller _MUST NOT_ fail in irrecoverable way even if the
driver sets ST while BSY is set.

2. The driver guarantees the final ST setting before entering normal
operation is done when the prerequisites are met.

If you combine 1 and 2, the current behavior is perfectly fine. Do
broadcom controllers actually fail without the above change?

Thanks.

--
tejun

2010-12-08 20:09:47

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

The controller may take much longer time to recover in this case, and leads to wrong HW state after stop_engine() inside ahci_hardreset() and cause device type checking failure due to unfinished HW state change and missing D2H FIS after start_engine() again inside ahci_hardreset(). I guess this is the reason why AHCI spec try to emphasize.

Yes, without this change, Broadcom controller will fail due to above reason.

Thanks,
Jian

-----Original Message-----
From: Tejun Heo [mailto:[email protected]]
Sent: Wednesday, December 08, 2010 11:55 AM
To: Jian Peng
Cc: Robert Hancock; [email protected]; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello,

On 12/08/2010 08:49 PM, Jian Peng wrote:
> I agree. I have AHCI based PCI card using HBA from Marvell, Via and
> Silicon Image, and am going to test my patch. Before this patch can
> be applied universally, I like to use it for specific PCI_VENDOR_ID
> first. Here is my new patch to limit it to Broadcom's AHCI core

Hmmm... is the change actually necessary for broadcom controllers? As
I wrote before, any ahci controller should just work without the above
checks because,

> + /* avoid race condition per spec (end of section 10.1.2) */
> + if (pdev->vendor == PCI_VENDOR_ID_BROADCOM) {
> + u8 status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> +
> + if (status & (ATA_BUSY | ATA_DRQ) ||
> + ahci_scr_read(&ap->link, SCR_STATUS, &tmp) ||
> + (tmp & 0x0f) != 0x03)
> + return;

PHY event can occur here which causes the device to send D2H Reg FIS
w/ BSY set.

1. So, the controller _MUST NOT_ fail in irrecoverable way even if the
driver sets ST while BSY is set.

2. The driver guarantees the final ST setting before entering normal
operation is done when the prerequisites are met.

If you combine 1 and 2, the current behavior is perfectly fine. Do
broadcom controllers actually fail without the above change?

Thanks.

--
tejun

2010-12-08 22:54:55

by Tejun Heo

[permalink] [raw]

Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello, Jian.

On 12/08/2010 09:09 PM, Jian Peng wrote:
> The controller may take much longer time to recover in this case,
> and leads to wrong HW state after stop_engine() inside
> ahci_hardreset() and cause device type checking failure due to
> unfinished HW state change and missing D2H FIS after start_engine()
> again inside ahci_hardreset(). I guess this is the reason why AHCI
> spec try to emphasize.

I don't necessarily agree there. The requirement is impossible to
reliably satisfy to begin with (it's inherently racy) and most specs
are filled with "the outcome is undefined" when they don't _need_ to
be well defined. The hardware can do "eh.. well, whatever, I don't
know" but shouldn't get lost and fail to react to further
state-resetting actions.

> Yes, without this change, Broadcom controller will fail due to above
> reason.

Okay, so, the controller goes bonkers if ST is set when prerequisites
are not met. You know that the problem can still happen with the
proposed change, right? It's much less likely but definitely can and
actually is likely to happen once in a blue moon. It isn't too
uncommon for link to take some time to stabilize after a PHY event
(including hardreset) and some devices will end up sending multiple
D2H Reg FISes in the process with conflicting status. Also, note that
the delay between the check and ST setting could be substantial
especially with parallel probing / booting.

I'm not objecting to the change but you guys probably want to fix the
controller in following revisions. If we're gonna make the change,
I'd like to go with the previous version without the vendor check.
What is the timeframe for the controller release? Would it be enough
to merge the change during 2.6.38-rc1? After baking it for some time
in 2.6.38, we can propagate the change back through -stable.

Thanks.

--
tejun

2010-12-09 01:30:30

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

Hi, Tejun,

I will go over with chip designer on all detail of this race condition again. AFAIK, our controller reacted to ST bit change but lack of full handshaking between SW and HW leads to failure finally.

I can definitely help checking all available controllers I can get. Schedule wise, it is not too bad since this AHCI core is part of SOC instead of standalone controller so we have manageable kernel and patches release for our customers. To help AHCI driver to be more compliant with spec, and also fix specific problem in our controller, it requires some actions.

I will post my findings on other controllers after testing it.

Thanks,
Jian

-----Original Message-----
From: Tejun Heo [mailto:[email protected]]
Sent: Wednesday, December 08, 2010 2:54 PM
To: Jian Peng
Cc: Robert Hancock; [email protected]; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

Hello, Jian.

On 12/08/2010 09:09 PM, Jian Peng wrote:
> The controller may take much longer time to recover in this case,
> and leads to wrong HW state after stop_engine() inside
> ahci_hardreset() and cause device type checking failure due to
> unfinished HW state change and missing D2H FIS after start_engine()
> again inside ahci_hardreset(). I guess this is the reason why AHCI
> spec try to emphasize.

I don't necessarily agree there. The requirement is impossible to
reliably satisfy to begin with (it's inherently racy) and most specs
are filled with "the outcome is undefined" when they don't _need_ to
be well defined. The hardware can do "eh.. well, whatever, I don't
know" but shouldn't get lost and fail to react to further
state-resetting actions.

> Yes, without this change, Broadcom controller will fail due to above
> reason.

Okay, so, the controller goes bonkers if ST is set when prerequisites
are not met. You know that the problem can still happen with the
proposed change, right? It's much less likely but definitely can and
actually is likely to happen once in a blue moon. It isn't too
uncommon for link to take some time to stabilize after a PHY event
(including hardreset) and some devices will end up sending multiple
D2H Reg FISes in the process with conflicting status. Also, note that
the delay between the check and ST setting could be substantial
especially with parallel probing / booting.

I'm not objecting to the change but you guys probably want to fix the
controller in following revisions. If we're gonna make the change,
I'd like to go with the previous version without the vendor check.
What is the timeframe for the controller release? Would it be enough
to merge the change during 2.6.38-rc1? After baking it for some time
in 2.6.38, we can propagate the change back through -stable.

Thanks.

--
tejun

2011-04-19 21:48:13

by Jian Peng

[permalink] [raw]

Subject: RE: questions regarding possible violation of AHCI spec in AHCI driver

Hi, Jeff/Tejun,

I retested my patch using Marvell AHCI controller card and Intel built-in controller with 2.6.38 kernel, it worked well. I resubmitted the patch against Linus git tree in separate email through git send-email already.

Please review it.

Thanks,
Jian

-----Original Message-----
From: Jeff Garzik [mailto:[email protected]] On Behalf Of Jeff Garzik
Sent: Tuesday, January 18, 2011 4:51 PM
To: Jian Peng
Cc: Tejun Heo; Robert Hancock; [email protected]; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver

On 01/11/2011 01:55 PM, Jian Peng wrote:
> Here is new patch based on your suggestion (against 2.6.37 release)
[...]
> I will test host controller made by other vendors and report issues soon.

Any updates?

Jeff