2013-05-10 22:53:06

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: is L1 really disabled in iwlwifi

[+cc Rafael, other pci_disable_link_state() users]

On Wed, May 01, 2013 at 11:13:15AM -0600, Bjorn Helgaas wrote:
> On Wed, May 1, 2013 at 2:31 AM, Emmanuel Grumbach <[email protected]> wrote:
> > [from Bjorn's mail]
> >> In Emmanuel's case, we don't get _OSC control, so
> >> pci_disable_link_state() does nothing.
> >
> > Right, but this is true with the specific log I sent to you. Is it
> > possible that another platform / BIOS, we *will* get _OSC control and
> > that pci_disable_link_state() will actually do something? In that case
> > I would prefer not to remove the call to pcie_disable_link_state().
>
> Yes, absolutely, on many platforms we will get _OSC control, and
> pci_disable_link_state() will work as expected. The problem is that
> the driver doesn't have a good way to know whether pci_disable_link()
> did anything or not.
>
> Today I think we have:
>
> 1) If the BIOS grants the OS permission to control PCIe services via
> _OSC, pci_disable_link_state() works and L1 will be disabled.
>
> 2) If the BIOS does not grant permission, pci_disable_link_state()
> does nothing and L1 may be enabled or not depending on what
> configuration the BIOS did.
>
> If the device really doesn't work reliably when L1 is enabled, we're
> currently at the mercy of the BIOS -- if the BIOS enables L1 but
> doesn't grant us permission via _OSC, L1 will remain enabled (as it is
> on your system).

I propose the following patch. Any comments?


commit cd11e3f87c4d2777cf8921c0454500c9baa54b46
Author: Bjorn Helgaas <[email protected]>
Date: Fri May 10 15:54:35 2013 -0600

PCI/ASPM: Allow drivers to disable ASPM unconditionally

Some devices have hardware problems related to using ASPM. Drivers for
these devices use pci_disable_link_state() to prevent their device from
entering L0s or L1. But on platforms where the OS doesn't have permission
to manage ASPM, pci_disable_link_state() does nothing, and the driver has
no way to know this.

Therefore, if the BIOS enables ASPM but declines (either via the FADT
ACPI_FADT_NO_ASPM bit or the _OSC method) to allow the OS to manage it,
the device can still use ASPM and trip over the hardware issue.

This patch makes pci_disable_link_state() disable ASPM unconditionally,
regardless of whether the OS has permission to manage ASPM in general.

Reported-by: Emmanuel Grumbach <[email protected]>
Reference: https://lkml.kernel.org/r/CANUX_P3F5YhbZX3WGU-j1AGpbXb_T9Bis2ErhvKkFMtDvzatVQ@mail.gmail.com
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=57331
Signed-off-by: Bjorn Helgaas <[email protected]>

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index d320df6..9ef4ab8 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -718,15 +718,11 @@ void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
* pci_disable_link_state - disable pci device's link state, so the link will
* never enter specific states
*/
-static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
- bool force)
+static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
{
struct pci_dev *parent = pdev->bus->self;
struct pcie_link_state *link;

- if (aspm_disabled && !force)
- return;
-
if (!pci_is_pcie(pdev))
return;

@@ -757,13 +753,13 @@ static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,

void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
{
- __pci_disable_link_state(pdev, state, false, false);
+ __pci_disable_link_state(pdev, state, false);
}
EXPORT_SYMBOL(pci_disable_link_state_locked);

void pci_disable_link_state(struct pci_dev *pdev, int state)
{
- __pci_disable_link_state(pdev, state, true, false);
+ __pci_disable_link_state(pdev, state, true);
}
EXPORT_SYMBOL(pci_disable_link_state);

@@ -781,7 +777,7 @@ void pcie_clear_aspm(struct pci_bus *bus)
__pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
PCIE_LINK_STATE_L1 |
PCIE_LINK_STATE_CLKPM,
- false, true);
+ false);
}
}


2013-05-11 20:18:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: is L1 really disabled in iwlwifi

On Friday, May 10, 2013 04:52:57 PM Bjorn Helgaas wrote:
> [+cc Rafael, other pci_disable_link_state() users]
>
> On Wed, May 01, 2013 at 11:13:15AM -0600, Bjorn Helgaas wrote:
> > On Wed, May 1, 2013 at 2:31 AM, Emmanuel Grumbach <[email protected]> wrote:
> > > [from Bjorn's mail]
> > >> In Emmanuel's case, we don't get _OSC control, so
> > >> pci_disable_link_state() does nothing.
> > >
> > > Right, but this is true with the specific log I sent to you. Is it
> > > possible that another platform / BIOS, we *will* get _OSC control and
> > > that pci_disable_link_state() will actually do something? In that case
> > > I would prefer not to remove the call to pcie_disable_link_state().
> >
> > Yes, absolutely, on many platforms we will get _OSC control, and
> > pci_disable_link_state() will work as expected. The problem is that
> > the driver doesn't have a good way to know whether pci_disable_link()
> > did anything or not.
> >
> > Today I think we have:
> >
> > 1) If the BIOS grants the OS permission to control PCIe services via
> > _OSC, pci_disable_link_state() works and L1 will be disabled.
> >
> > 2) If the BIOS does not grant permission, pci_disable_link_state()
> > does nothing and L1 may be enabled or not depending on what
> > configuration the BIOS did.
> >
> > If the device really doesn't work reliably when L1 is enabled, we're
> > currently at the mercy of the BIOS -- if the BIOS enables L1 but
> > doesn't grant us permission via _OSC, L1 will remain enabled (as it is
> > on your system).
>
> I propose the following patch. Any comments?

In my opinion this is dangerous, because it opens us to bugs that right now
are prevented from happening due to the way the code works.

Thanks,
Rafael


> commit cd11e3f87c4d2777cf8921c0454500c9baa54b46
> Author: Bjorn Helgaas <[email protected]>
> Date: Fri May 10 15:54:35 2013 -0600
>
> PCI/ASPM: Allow drivers to disable ASPM unconditionally
>
> Some devices have hardware problems related to using ASPM. Drivers for
> these devices use pci_disable_link_state() to prevent their device from
> entering L0s or L1. But on platforms where the OS doesn't have permission
> to manage ASPM, pci_disable_link_state() does nothing, and the driver has
> no way to know this.
>
> Therefore, if the BIOS enables ASPM but declines (either via the FADT
> ACPI_FADT_NO_ASPM bit or the _OSC method) to allow the OS to manage it,
> the device can still use ASPM and trip over the hardware issue.
>
> This patch makes pci_disable_link_state() disable ASPM unconditionally,
> regardless of whether the OS has permission to manage ASPM in general.
>
> Reported-by: Emmanuel Grumbach <[email protected]>
> Reference: https://lkml.kernel.org/r/CANUX_P3F5YhbZX3WGU-j1AGpbXb_T9Bis2ErhvKkFMtDvzatVQ@mail.gmail.com
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=57331
> Signed-off-by: Bjorn Helgaas <[email protected]>
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index d320df6..9ef4ab8 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -718,15 +718,11 @@ void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
> * pci_disable_link_state - disable pci device's link state, so the link will
> * never enter specific states
> */
> -static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
> - bool force)
> +static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
> {
> struct pci_dev *parent = pdev->bus->self;
> struct pcie_link_state *link;
>
> - if (aspm_disabled && !force)
> - return;
> -
> if (!pci_is_pcie(pdev))
> return;
>
> @@ -757,13 +753,13 @@ static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
>
> void pci_disable_link_state_locked(struct pci_dev *pdev, int state)
> {
> - __pci_disable_link_state(pdev, state, false, false);
> + __pci_disable_link_state(pdev, state, false);
> }
> EXPORT_SYMBOL(pci_disable_link_state_locked);
>
> void pci_disable_link_state(struct pci_dev *pdev, int state)
> {
> - __pci_disable_link_state(pdev, state, true, false);
> + __pci_disable_link_state(pdev, state, true);
> }
> EXPORT_SYMBOL(pci_disable_link_state);
>
> @@ -781,7 +777,7 @@ void pcie_clear_aspm(struct pci_bus *bus)
> __pci_disable_link_state(child, PCIE_LINK_STATE_L0S |
> PCIE_LINK_STATE_L1 |
> PCIE_LINK_STATE_CLKPM,
> - false, true);
> + false);
> }
> }
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-05-11 20:22:22

by Matthew Garrett

[permalink] [raw]
Subject: Re: is L1 really disabled in iwlwifi

On Sat, 2013-05-11 at 22:26 +0200, Rafael J. Wysocki wrote:
> On Friday, May 10, 2013 04:52:57 PM Bjorn Helgaas wrote:
> > I propose the following patch. Any comments?
>
> In my opinion this is dangerous, because it opens us to bugs that right now
> are prevented from happening due to the way the code works.

Right, I'm also not entirely comfortable with this. The current
behaviour may be confusing, but we could reduce that by renaming the
functions. I'm still not clear on whether anyone's actually seeing
problems caused by the existing behaviour.

--
Matthew Garrett | [email protected]
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2013-05-16 22:55:42

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: is L1 really disabled in iwlwifi

On Sat, May 11, 2013 at 08:22:11PM +0000, Matthew Garrett wrote:
> On Sat, 2013-05-11 at 22:26 +0200, Rafael J. Wysocki wrote:
> > On Friday, May 10, 2013 04:52:57 PM Bjorn Helgaas wrote:
> > > I propose the following patch. Any comments?
> >
> > In my opinion this is dangerous, because it opens us to bugs that right now
> > are prevented from happening due to the way the code works.
>
> Right, I'm also not entirely comfortable with this. The current
> behaviour may be confusing, but we could reduce that by renaming the
> functions. I'm still not clear on whether anyone's actually seeing
> problems caused by the existing behaviour.

I couldn't imagine that silently ignoring the request to disable ASPM
would be the right thing, but I spent a long time experimenting with
Windows on qemu, and I think you're right. Windows 7 also seems to
ignore the "PciASPMOptOut" directive when we don't have permission
to manage ASPM. All the gory details are at
https://bugzilla.kernel.org/show_bug.cgi?id=57331

The current behavior is definitely confusing. I hate to rename or change
pci_disable_link_state() because it's exported and we'd have to maintain
the old interface for a while anyway. And I don't really want to return
failure to drivers, because I think that would encourage people to fiddle
with the Link Control register directly in the driver, which doesn't seem
like a good idea.

And you're also right that (as far as I know) there's not an actual
problem with the current behavior other than the confusion it causes.

So, how about something like the following patch, which just prints a
warning when we can't do what the driver requested? I suppose this may
also be a nuisance, because users will be worried, but they can't actually
*do* anything about it. Maybe it should be dev_info() instead.

commit f1956960fa0759c53b28e3a2810bd7e1b6e8925f
Author: Bjorn Helgaas <[email protected]>
Date: Wed May 15 17:02:37 2013 -0600

PCI/ASPM: Warn when driver asks to disable ASPM, but we can't do it

Some devices have hardware problems related to using ASPM. Drivers for
these devices use pci_disable_link_state() to prevent their device from
entering L0s or L1. But on platforms where the OS doesn't have permission
to manage ASPM, pci_disable_link_state() doesn't actually disable ASPM.

Windows has a similar mechanism ("PciASPMOptOut"), and when the OS doesn't
have control of ASPM, it doesn't actually disable ASPM either.

This patch just adds a warning in dmesg about the fact that
pci_disable_link_state() is doing nothing.

Reported-by: Emmanuel Grumbach <[email protected]>
Reference: https://lkml.kernel.org/r/CANUX_P3F5YhbZX3WGU-j1AGpbXb_T9Bis2ErhvKkFMtDvzatVQ@mail.gmail.com
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=57331
Signed-off-by: Bjorn Helgaas <[email protected]>

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index d320df6..faa83b6 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -724,9 +724,6 @@ static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
struct pci_dev *parent = pdev->bus->self;
struct pcie_link_state *link;

- if (aspm_disabled && !force)
- return;
-
if (!pci_is_pcie(pdev))
return;

@@ -736,6 +733,19 @@ static void __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem,
if (!parent || !parent->link_state)
return;

+ /*
+ * A driver requested that ASPM be disabled on this device, but
+ * if we don't have permission to manage ASPM (e.g., on ACPI
+ * systems we have to observe the FADT ACPI_FADT_NO_ASPM bit and
+ * the _OSC method), we can't honor that request. Windows has
+ * a similar mechanism using "PciASPMOptOut", which is also
+ * ignored in this situation.
+ */
+ if (aspm_disabled && !force) {
+ dev_warn(&pdev->dev, "can't disable ASPM; OS doesn't have ASPM control\n");
+ return;
+ }
+
if (sem)
down_read(&pci_bus_sem);
mutex_lock(&aspm_lock);

2013-05-17 05:49:07

by Emmanuel Grumbach

[permalink] [raw]
Subject: Re: is L1 really disabled in iwlwifi

>
> I couldn't imagine that silently ignoring the request to disable ASPM
> would be the right thing, but I spent a long time experimenting with
> Windows on qemu, and I think you're right. Windows 7 also seems to
> ignore the "PciASPMOptOut" directive when we don't have permission
> to manage ASPM. All the gory details are at
> https://bugzilla.kernel.org/show_bug.cgi?id=57331
>
> The current behavior is definitely confusing. I hate to rename or change
> pci_disable_link_state() because it's exported and we'd have to maintain
> the old interface for a while anyway. And I don't really want to return
> failure to drivers, because I think that would encourage people to fiddle
> with the Link Control register directly in the driver, which doesn't seem
> like a good idea.
>
> And you're also right that (as far as I know) there's not an actual
> problem with the current behavior other than the confusion it causes.
>
> So, how about something like the following patch, which just prints a
> warning when we can't do what the driver requested? I suppose this may
> also be a nuisance, because users will be worried, but they can't actually
> *do* anything about it. Maybe it should be dev_info() instead.
>

Good for me - now I would be notified that something wrong happened.