2013-03-28 04:29:09

by Yinghai Lu

Subject: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

Found a problem on a system whose firmware can handle PCI AER.
Firmware receives error reports when a PCI error is injected before the OS boots.
But after the OS boots, firmware no longer receives the reports, even when
pci=noaer is passed.

Root cause: the BIOS _OSC method checks the query bit incorrectly.
It turns out that the BIOS vendor copied the example code from the ACPI spec.
In ACPI Spec 5.0, page 290:

    If (Not(And(CDW1,1))) // Query flag clear?
    { // Disable GPEs for features granted native control.
        If (And(CTRL,0x01)) // Hot plug control granted?
        {
            Store(0,HPCE) // clear the hot plug SCI enable bit
            Store(1,HPCS) // clear the hot plug SCI status bit
        }
        ...
    }

When the Query flag is set, And(CDW1,1) is 1 and Not(1) returns 0xfffffffe,
which is nonzero. So the BIOS enters the code path that is meant for control
set only.
The BIOS ACPI code should be changed to "If (LEqual(And(CDW1,1), 0))".

The current kernel code uses an _OSC query to notify firmware about what the
OS supports, and then uses _OSC again to set the control bits.
During the support query, the current code passes all possible controls.
On affected BIOSes that executes code that should run only in the control set
stage.

That causes a problem when pci=noaer or AER firmware_first is used:
firmware has already handed AER control to the OS during the support query
stage, but the OS later does not do AER handling.

We should avoid passing all possible controls and just use osc_control_set
instead.
That should work around the BIOS bug on affected systems in the field,
as more BIOS vendors are copying the sample code from the ACPI spec.

Signed-off-by: Yinghai Lu <[email protected]>
Cc: [email protected]

---
drivers/acpi/pci_root.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -201,8 +201,8 @@ static acpi_status acpi_pci_query_osc(st
 		*control &= OSC_PCI_CONTROL_MASKS;
 		capbuf[OSC_CONTROL_TYPE] = *control | root->osc_control_set;
 	} else {
-		/* Run _OSC query for all possible controls. */
-		capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
+		/* Run _OSC query only with existing controls. */
+		capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
 	}
 
 	status = acpi_pci_run_osc(root->device->handle, capbuf, &result);
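
To illustrate the Not() hazard described in the changelog, here is a minimal
C sketch. It is not firmware or kernel code. It assumes ASL integers are
32 bits wide, and the helper names asl_not(), buggy_runs_set_path() and
fixed_runs_set_path() are made up for this example.

```c
#include <stdint.h>

#define OSC_QUERY_ENABLE 0x1u	/* bit 0 of CDW1, the query flag */

/* Models ASL Not(): a bitwise complement (32-bit integers assumed). */
static uint32_t asl_not(uint32_t v)
{
	return ~v;
}

/* Spec sample: If (Not(And(CDW1,1))). Any nonzero result counts as true. */
static int buggy_runs_set_path(uint32_t cdw1)
{
	return asl_not(cdw1 & OSC_QUERY_ENABLE) != 0;
}

/* Suggested check: If (LEqual(And(CDW1,1), 0)). */
static int fixed_runs_set_path(uint32_t cdw1)
{
	return (cdw1 & OSC_QUERY_ENABLE) == 0;
}
```

With the query flag set (cdw1 = 1), buggy_runs_set_path() still returns 1
because ~1 is 0xfffffffe, while fixed_runs_set_path() returns 0. With the
flag clear, both return 1.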


2013-03-30 00:28:35

by Rafael J. Wysocki

Subject: Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

On Wednesday, March 27, 2013 09:28:58 PM Yinghai Lu wrote:
> Found problem on system that firmware that could handle pci aer.
> Firmware get error reporting after pci injecting error, before os boots.
> But after os boots, firmware can not get report anymore, even pci=noaer
> is passed.
>
> Root cause: BIOS _OSC has problem with query bit checking.
> It turns out that BIOS vendor is copying example code from ACPI Spec.
> In ACPI Spec 5.0, page 290:
>
> If (Not(And(CDW1,1))) // Query flag clear?
> { // Disable GPEs for features granted native control.
> If (And(CTRL,0x01)) // Hot plug control granted?
> {
> Store(0,HPCE) // clear the hot plug SCI enable bit
> Store(1,HPCS) // clear the hot plug SCI status bit
> }
> ...
> }
>
> When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
> So it will get into code path that should be for control set only.
> BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
>
> Current kernel code is using _OSC query to notify firmware about support
> from OS and then use _OSC to set control bits.
> During query support, current code is using all possible controls.
> So will execute code that should be only for control set stage.
>
> That will have problem when pci=noaer or aer firmware_first is used.
> As firmware have that control set for os aer already in query support stage,
> but later will not os aer handling.
>
> We should avoid passing all possible controls, just use osc_control_set
> instead.
> That should workaround BIOS bugs with affected systems on the field
> as more bios vendors are copying sample code from ACPI spec.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: [email protected]
>
> ---
> drivers/acpi/pci_root.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -201,8 +201,8 @@ static acpi_status acpi_pci_query_osc(st
> *control &= OSC_PCI_CONTROL_MASKS;
> capbuf[OSC_CONTROL_TYPE] = *control | root->osc_control_set;
> } else {
> - /* Run _OSC query for all possible controls. */
> - capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
> + /* Run _OSC query only with existing controls. */
> + capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;

I suppose we can do that, but then why should this be root->osc_control_set and
not just 0?

> }
>
> status = acpi_pci_run_osc(root->device->handle, capbuf, &result);
> --

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-03-30 01:02:48

by Yinghai Lu

Subject: Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

On Fri, Mar 29, 2013 at 5:36 PM, Rafael J. Wysocki <[email protected]> wrote:
>> - /* Run _OSC query for all possible controls. */
>> - capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
>> + /* Run _OSC query only with existing controls. */
>> + capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
>
> I suppose we can do that, but then why this should be root->osc_control_set and
> not just 0?

In case query support and set control are called in mixed sequence.

Also, the ACPI spec says that once control is set, it cannot be revoked.

And when a control is passed, it is always ORed with root->osc_control_set:
capbuf[OSC_CONTROL_TYPE] = *control | root->osc_control_set;


Yinghai
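
The choice between root->osc_control_set and 0 discussed above can be
sketched in C as follows. This is an illustrative model, not the kernel
function: struct root_sketch and query_control_word() are hypothetical names,
the mask value is assumed for the example, and unlike the real code the
sketch does not modify *control.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask value assumed for illustration only. */
#define OSC_PCI_CONTROL_MASKS 0x1fu

/* Hypothetical stand-in for the relevant field of the root bridge struct. */
struct root_sketch {
	uint32_t osc_control_set;	/* controls already granted to the OS */
};

/*
 * Returns the control word to pass in an _OSC query. Already granted
 * controls are always included, so a query issued after a control set
 * never appears to give them back.
 */
static uint32_t query_control_word(const struct root_sketch *root,
				   const uint32_t *control)
{
	if (control)
		return (*control & OSC_PCI_CONTROL_MASKS) |
		       root->osc_control_set;

	/* Support query: pass only what is already held, not every bit. */
	return root->osc_control_set;
}
```

Before any control has been set, osc_control_set is 0 and the support query
passes 0, so in that case the two alternatives behave the same. They differ
only when a query follows an earlier control set.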

2013-03-30 01:05:00

by Rafael J. Wysocki

Subject: Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

On Friday, March 29, 2013 06:02:45 PM Yinghai Lu wrote:
> On Fri, Mar 29, 2013 at 5:36 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> - /* Run _OSC query for all possible controls. */
> >> - capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
> >> + /* Run _OSC query only with existing controls. */
> >> + capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
> >
> > I suppose we can do that, but then why this should be root->osc_control_set and
> > not just 0?
>
> in case query support and set control are called in mixed sequence.

OK, that's a good enough reason I think.

I'm kind of afraid of regressions that may result from this, though, so I'm
going to queue it up for 3.10.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-03-30 01:08:38

by Yinghai Lu

Subject: Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

On Fri, Mar 29, 2013 at 6:12 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, March 29, 2013 06:02:45 PM Yinghai Lu wrote:
>> On Fri, Mar 29, 2013 at 5:36 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> - /* Run _OSC query for all possible controls. */
>> >> - capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
>> >> + /* Run _OSC query only with existing controls. */
>> >> + capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
>> >
>> > I suppose we can do that, but then why this should be root->osc_control_set and
>> > not just 0?
>>
>> in case query support and set control are called in mixed sequence.
>
> OK, that's a good enough reason I think.
>
> I'm kind of afraid of regressions that may result from this, though, so I'm
> going to queue it up for 3.10.

Ok,

Thanks

2013-04-03 23:01:23

by Bjorn Helgaas

Subject: Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

[+cc Bob for spec typo question]

On Wed, Mar 27, 2013 at 10:28 PM, Yinghai Lu <[email protected]> wrote:
> Found problem on system that firmware that could handle pci aer.
> Firmware get error reporting after pci injecting error, before os boots.
> But after os boots, firmware can not get report anymore, even pci=noaer
> is passed.
>
> Root cause: BIOS _OSC has problem with query bit checking.
> It turns out that BIOS vendor is copying example code from ACPI Spec.
> In ACPI Spec 5.0, page 290:
>
> If (Not(And(CDW1,1))) // Query flag clear?
> { // Disable GPEs for features granted native control.
> If (And(CTRL,0x01)) // Hot plug control granted?
> {
> Store(0,HPCE) // clear the hot plug SCI enable bit
> Store(1,HPCS) // clear the hot plug SCI status bit
> }
> ...
> }
>
> When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
> So it will get into code path that should be for control set only.
> BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"

Isn't this just a typo in the spec? Shouldn't it be using "LNot"
instead of "Not"?

If (LNot(And(CDW1,1))) // Query flag clear?

Of course, that doesn't change the need for your Linux change, though
a comment about the hazard might be nice for future readers.

> Current kernel code is using _OSC query to notify firmware about support
> from OS and then use _OSC to set control bits.
> During query support, current code is using all possible controls.
> So will execute code that should be only for control set stage.
>
> That will have problem when pci=noaer or aer firmware_first is used.
> As firmware have that control set for os aer already in query support stage,
> but later will not os aer handling.
>
> We should avoid passing all possible controls, just use osc_control_set
> instead.
> That should workaround BIOS bugs with affected systems on the field
> as more bios vendors are copying sample code from ACPI spec.
>
> Signed-off-by: Yinghai Lu <[email protected]>
> Cc: [email protected]
>
> ---
> drivers/acpi/pci_root.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -201,8 +201,8 @@ static acpi_status acpi_pci_query_osc(st
> *control &= OSC_PCI_CONTROL_MASKS;
> capbuf[OSC_CONTROL_TYPE] = *control | root->osc_control_set;
> } else {
> - /* Run _OSC query for all possible controls. */
> - capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
> + /* Run _OSC query only with existing controls. */
> + capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
> }
>
> status = acpi_pci_run_osc(root->device->handle, capbuf, &result);