2021-10-11 16:37:36

by Jonas Dreßler

[permalink] [raw]
Subject: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
reports a hardcoded LTR value to the system during initialization,
probably as an (unsuccessful) attempt of the developers to fix firmware
crashes. This LTR value prevents most of the Microsoft Surface devices
from entering deep powersaving states (either platform C-State 10 or
S0ix state), because the exit latency of that state would be higher than
what the card can tolerate.

Turns out the card works just the same (including the firmware crashes)
no matter if that hardcoded LTR value is reported or not, so it's kind
of useless and only prevents us from saving power.

To get rid of those hardcoded LTR requirements, it's possible to reset
the PCI bridge device after initializing the cards firmware. I'm not
exactly sure why that works, maybe the power management subsystem of the
PCH resets its stored LTR values when doing a function level reset of
the bridge device. Doing the reset once after starting the wifi firmware
works very well, probably because the firmware only reports that LTR
value a single time during firmware startup.

Signed-off-by: Jonas Dreßler <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++
.../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------
.../wireless/marvell/mwifiex/pcie_quirks.h | 1 +
3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index c6ccce426b49..2506e7e49f0c 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
{
struct pcie_service_card *card = adapter->card;
+ struct pci_dev *pdev = card->dev;
+ struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;

+ /* Trigger a function level reset of the PCI bridge device, this makes
+ * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
+ * card stop reporting a fixed LTR value that prevents the system from
+ * entering package C10 and S0ix powersaving states.
+ * We need to do it here because it must happen after firmware
+ * initialization and this function is called right after that is done.
+ */
+ if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
+ pci_reset_function(parent_pdev);
+
/* Write the RX ring read pointer in to reg->rx_rdptr */
if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
tx_wrap)) {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
index 0234cf3c2974..cbf0565353ae 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
@@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Pro 5",
@@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Pro 5 (LTE)",
@@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Pro 6",
@@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Book 1",
@@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Book 2",
@@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Laptop 1",
@@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{
.ident = "Surface Laptop 2",
@@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
},
- .driver_data = (void *)QUIRK_FW_RST_D3COLD,
+ .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
+ QUIRK_DO_FLR_ON_BRIDGE),
},
{}
};
@@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
dev_info(&pdev->dev, "no quirks enabled\n");
if (card->quirks & QUIRK_FW_RST_D3COLD)
dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
+ if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
+ dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
}

static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
index 8ec4176d698f..f8d463f4269a 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
+++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
@@ -18,6 +18,7 @@
#include "pcie.h"

#define QUIRK_FW_RST_D3COLD BIT(0)
+#define QUIRK_DO_FLR_ON_BRIDGE BIT(1)

void mwifiex_initialize_quirks(struct pcie_service_card *card);
int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
--
2.31.1


2021-10-11 16:54:16

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

[+cc Alex]

On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dre?ler wrote:
> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
> reports a hardcoded LTR value to the system during initialization,
> probably as an (unsuccessful) attempt of the developers to fix firmware
> crashes. This LTR value prevents most of the Microsoft Surface devices
> from entering deep powersaving states (either platform C-State 10 or
> S0ix state), because the exit latency of that state would be higher than
> what the card can tolerate.

S0ix and C-State 10 are ACPI concepts that don't mean anything in a
PCIe context.

I think LTR is only involved in deciding whether to enter the ASPM
L1.2 substate. Maybe the system will only enter C-State 10 or S0ix
when the link is in L1.2?

> Turns out the card works just the same (including the firmware crashes)
> no matter if that hardcoded LTR value is reported or not, so it's kind
> of useless and only prevents us from saving power.
>
> To get rid of those hardcoded LTR requirements, it's possible to reset
> the PCI bridge device after initializing the cards firmware. I'm not
> exactly sure why that works, maybe the power management subsystem of the
> PCH resets its stored LTR values when doing a function level reset of
> the bridge device. Doing the reset once after starting the wifi firmware
> works very well, probably because the firmware only reports that LTR
> value a single time during firmware startup.
>
> Signed-off-by: Jonas Dre?ler <[email protected]>
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++
> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------
> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 +
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index c6ccce426b49..2506e7e49f0c 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
> {
> struct pcie_service_card *card = adapter->card;
> + struct pci_dev *pdev = card->dev;
> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;
>
> + /* Trigger a function level reset of the PCI bridge device, this makes
> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
> + * card stop reporting a fixed LTR value that prevents the system from
> + * entering package C10 and S0ix powersaving states.

I don't believe this. Why would resetting the root port change what
the downstream device reports via LTR messages?

From PCIe r5.0, sec 5.5.1:

The following rules define how the L1.1 and L1.2 substates are entered:
...
* When in ASPM L1.0 and the ASPM L1.2 Enable bit is Set, the L1.2
substate must be entered when CLKREQ# is deasserted and all of
the following conditions are true:

- The reported snooped LTR value last sent or received by this
Port is greater than or equal to the value set by the
LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
snoop service latency requirement.

- The reported non-snooped LTR last sent or received by this
Port value is greater than or equal to the value set by the
LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
non-snoop service latency requirement.

From the LTR Message format in sec 6.18:

No-Snoop Latency and Snoop Latency: As shown in Figure 6-15, these
fields include a Requirement bit that indicates if the device has a
latency requirement for the given type of Request. If the
Requirement bit is Set, the LatencyValue and LatencyScale fields
describe the latency requirement. If the Requirement bit is Clear,
there is no latency requirement and the LatencyValue and
LatencyScale fields are ignored.

Resetting the root port might make it forget the LTR value it last
received. If that's equivalent to having no service latency
requirement, it *might* enable L1.2 entry, although that doesn't seem
equivalent to the downstream device having sent an LTR message with
the Requirement bit cleared.

I think the endpoint is required to send a new LTR message before it
goes to a non-D0 state (sec 6.18), so the bridge will capture the
latency again, and we'll probably be back in the same state.

This all seems fragile to me. If we force the link to L1.2 without
knowing accurate exit latencies and latency tolerance, the device is
liable to drop packets.

> + * We need to do it here because it must happen after firmware
> + * initialization and this function is called right after that is done.
> + */
> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
> + pci_reset_function(parent_pdev);

PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
supported by endpoints, so I guess this will actually do some other
kind of reset.

> /* Write the RX ring read pointer in to reg->rx_rdptr */
> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
> tx_wrap)) {
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> index 0234cf3c2974..cbf0565353ae 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 5",
> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 5 (LTE)",
> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 6",
> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Book 1",
> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Book 2",
> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Laptop 1",
> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Laptop 2",
> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {}
> };
> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
> dev_info(&pdev->dev, "no quirks enabled\n");
> if (card->quirks & QUIRK_FW_RST_D3COLD)
> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
> }
>
> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> index 8ec4176d698f..f8d463f4269a 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> @@ -18,6 +18,7 @@
> #include "pcie.h"
>
> #define QUIRK_FW_RST_D3COLD BIT(0)
> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1)
>
> void mwifiex_initialize_quirks(struct pcie_service_card *card);
> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
> --
> 2.31.1
>

2021-10-11 17:04:02

by Pali Rohár

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote:
> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
> reports a hardcoded LTR value to the system during initialization,
> probably as an (unsuccessful) attempt of the developers to fix firmware
> crashes. This LTR value prevents most of the Microsoft Surface devices
> from entering deep powersaving states (either platform C-State 10 or
> S0ix state), because the exit latency of that state would be higher than
> what the card can tolerate.

This description looks like a generic issue in 88W8897 chip or its
firmware and not something to Surface PCIe controller or Surface HW. But
please correct me if I'm wrong here.

Has somebody 88W8897-based PCIe card in non-Surface device and can check
or verify if this issue happens also outside of the Surface device?

It would be really nice to know if this is issue in Surface or in 8897.

> Turns out the card works just the same (including the firmware crashes)
> no matter if that hardcoded LTR value is reported or not, so it's kind
> of useless and only prevents us from saving power.
>
> To get rid of those hardcoded LTR requirements, it's possible to reset
> the PCI bridge device after initializing the cards firmware. I'm not
> exactly sure why that works, maybe the power management subsystem of the
> PCH resets its stored LTR values when doing a function level reset of
> the bridge device. Doing the reset once after starting the wifi firmware
> works very well, probably because the firmware only reports that LTR
> value a single time during firmware startup.
>
> Signed-off-by: Jonas Dreßler <[email protected]>
> ---
> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++
> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------
> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 +
> 3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index c6ccce426b49..2506e7e49f0c 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
> {
> struct pcie_service_card *card = adapter->card;
> + struct pci_dev *pdev = card->dev;
> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;
>
> + /* Trigger a function level reset of the PCI bridge device, this makes
> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
> + * card stop reporting a fixed LTR value that prevents the system from
> + * entering package C10 and S0ix powersaving states.
> + * We need to do it here because it must happen after firmware
> + * initialization and this function is called right after that is done.
> + */
> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
> + pci_reset_function(parent_pdev);
> +
> /* Write the RX ring read pointer in to reg->rx_rdptr */
> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
> tx_wrap)) {
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> index 0234cf3c2974..cbf0565353ae 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 5",
> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 5 (LTE)",
> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Pro 6",
> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Book 1",
> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Book 2",
> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Laptop 1",
> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {
> .ident = "Surface Laptop 2",
> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
> },
> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
> + QUIRK_DO_FLR_ON_BRIDGE),
> },
> {}
> };
> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
> dev_info(&pdev->dev, "no quirks enabled\n");
> if (card->quirks & QUIRK_FW_RST_D3COLD)
> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
> }
>
> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> index 8ec4176d698f..f8d463f4269a 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
> @@ -18,6 +18,7 @@
> #include "pcie.h"
>
> #define QUIRK_FW_RST_D3COLD BIT(0)
> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1)
>
> void mwifiex_initialize_quirks(struct pcie_service_card *card);
> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
> --
> 2.31.1
>

2021-10-12 08:49:35

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/11/21 18:53, Bjorn Helgaas wrote:
> [+cc Alex]
>
> On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote:
>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>> reports a hardcoded LTR value to the system during initialization,
>> probably as an (unsuccessful) attempt of the developers to fix firmware
>> crashes. This LTR value prevents most of the Microsoft Surface devices
>> from entering deep powersaving states (either platform C-State 10 or
>> S0ix state), because the exit latency of that state would be higher than
>> what the card can tolerate.
>
> S0ix and C-State 10 are ACPI concepts that don't mean anything in a
> PCIe context.
>
> I think LTR is only involved in deciding whether to enter the ASPM
> L1.2 substate. Maybe the system will only enter C-State 10 or S0ix
> when the link is in L1.2?

Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
(ctrl+f "IP LINK PM STATE").

>
>> Turns out the card works just the same (including the firmware crashes)
>> no matter if that hardcoded LTR value is reported or not, so it's kind
>> of useless and only prevents us from saving power.
>>
>> To get rid of those hardcoded LTR requirements, it's possible to reset
>> the PCI bridge device after initializing the cards firmware. I'm not
>> exactly sure why that works, maybe the power management subsystem of the
>> PCH resets its stored LTR values when doing a function level reset of
>> the bridge device. Doing the reset once after starting the wifi firmware
>> works very well, probably because the firmware only reports that LTR
>> value a single time during firmware startup.
>>
>> Signed-off-by: Jonas Dreßler <[email protected]>
>> ---
>> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++
>> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------
>> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 +
>> 3 files changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> index c6ccce426b49..2506e7e49f0c 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
>> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
>> {
>> struct pcie_service_card *card = adapter->card;
>> + struct pci_dev *pdev = card->dev;
>> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
>> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
>> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;
>>
>> + /* Trigger a function level reset of the PCI bridge device, this makes
>> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
>> + * card stop reporting a fixed LTR value that prevents the system from
>> + * entering package C10 and S0ix powersaving states.
>
> I don't believe this. Why would resetting the root port change what
> the downstream device reports via LTR messages?
>
> From PCIe r5.0, sec 5.5.1:
>
> The following rules define how the L1.1 and L1.2 substates are entered:
> ...
> * When in ASPM L1.0 and the ASPM L1.2 Enable bit is Set, the L1.2
> substate must be entered when CLKREQ# is deasserted and all of
> the following conditions are true:
>
> - The reported snooped LTR value last sent or received by this
> Port is greater than or equal to the value set by the
> LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
> snoop service latency requirement.
>
> - The reported non-snooped LTR last sent or received by this
> Port value is greater than or equal to the value set by the
> LTR_L1.2_THRESHOLD Value and Scale fields, or there is no
> non-snoop service latency requirement.
>
> From the LTR Message format in sec 6.18:
>
> No-Snoop Latency and Snoop Latency: As shown in Figure 6-15, these
> fields include a Requirement bit that indicates if the device has a
> latency requirement for the given type of Request. If the
> Requirement bit is Set, the LatencyValue and LatencyScale fields
> describe the latency requirement. If the Requirement bit is Clear,
> there is no latency requirement and the LatencyValue and
> LatencyScale fields are ignored.
>
> Resetting the root port might make it forget the LTR value it last
> received. If that's equivalent to having no service latency
> requirement, it *might* enable L1.2 entry, although that doesn't seem
> equivalent to the downstream device having sent an LTR message with
> the Requirement bit cleared.
>
> I think the endpoint is required to send a new LTR message before it
> goes to a non-D0 state (sec 6.18), so the bridge will capture the
> latency again, and we'll probably be back in the same state.

Indeed that happens when suspending the device, after resuming the LTR
value is back to the initial value. mwifiex_pcie_init_fw_port() is
executed on resume, too though (I should probably have mentioned this
in the commit message, will do in v2), so this is taken care of.

While suspended, the device goes into D3 anyway and S0ix is achieved
regardless of the LTR value.

>
> This all seems fragile to me. If we force the link to L1.2 without
> knowing accurate exit latencies and latency tolerance, the device is
> liable to drop packets.

Yeah, I'm not saying this patch isn't an ugly hack...

What I can say though is that this patch has been running in the
linux-surface (https://github.com/linux-surface/kernel/pull/72) kernel
for a few months now, and so far we've only received positive feedback.

There's two alternatives I can think of to deal with this issue:

1) Revert the cards firmware in linux-firmware back to the second-latest
version. That firmware didn't report a fixed LTR value and also doesn't
have any other obvious issues I know of compared to the latest one.

2) Somehow interact with the PMC Core driver to make it ignore the LTR
values reported by the card (I doubt that's possible from mwifiex).
It can be done manually via debugfs by writing to
/sys/kernel/debug/pmc_core/ltr_ignore.

>
>> + * We need to do it here because it must happen after firmware
>> + * initialization and this function is called right after that is done.
>> + */
>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> + pci_reset_function(parent_pdev);
>
> PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
> supported by endpoints, so I guess this will actually do some other
> kind of reset.

Interesting, I briefly searched and it doesn't seem like think
there's public documentation available by Intel that goes into
the specifics here, maybe someone working at Intel knows more?

>
>> /* Write the RX ring read pointer in to reg->rx_rdptr */
>> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
>> tx_wrap)) {
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> index 0234cf3c2974..cbf0565353ae 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 5",
>> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 5 (LTE)",
>> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 6",
>> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Book 1",
>> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Book 2",
>> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Laptop 1",
>> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Laptop 2",
>> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {}
>> };
>> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
>> dev_info(&pdev->dev, "no quirks enabled\n");
>> if (card->quirks & QUIRK_FW_RST_D3COLD)
>> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
>> }
>>
>> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> index 8ec4176d698f..f8d463f4269a 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> @@ -18,6 +18,7 @@
>> #include "pcie.h"
>>
>> #define QUIRK_FW_RST_D3COLD BIT(0)
>> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1)
>>
>> void mwifiex_initialize_quirks(struct pcie_service_card *card);
>> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
>> --
>> 2.31.1
>>

2021-10-12 08:55:26

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/11/21 19:02, Pali Rohár wrote:
> On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote:
>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>> reports a hardcoded LTR value to the system during initialization,
>> probably as an (unsuccessful) attempt of the developers to fix firmware
>> crashes. This LTR value prevents most of the Microsoft Surface devices
>> from entering deep powersaving states (either platform C-State 10 or
>> S0ix state), because the exit latency of that state would be higher than
>> what the card can tolerate.
>
> This description looks like a generic issue in 88W8897 chip or its
> firmware and not something to Surface PCIe controller or Surface HW. But
> please correct me if I'm wrong here.
>
> Has somebody 88W8897-based PCIe card in non-Surface device and can check
> or verify if this issue happens also outside of the Surface device?
>
> It would be really nice to know if this is issue in Surface or in 8897.
>

Fairly sure the LTR value is something that's reported by the firmware
and will be the same on all 8897 devices (as mentioned in my reply to Bjorn
the second-latest firmware doesn't report that fixed LTR value).

The thing is I'm not sure if this hack works fine on non-Surface devices
or maybe breaks things there (I guess the change had some effect on Marvells
test platform at least), so this is simply the minimum risk approach.

>> Turns out the card works just the same (including the firmware crashes)
>> no matter if that hardcoded LTR value is reported or not, so it's kind
>> of useless and only prevents us from saving power.
>>
>> To get rid of those hardcoded LTR requirements, it's possible to reset
>> the PCI bridge device after initializing the cards firmware. I'm not
>> exactly sure why that works, maybe the power management subsystem of the
>> PCH resets its stored LTR values when doing a function level reset of
>> the bridge device. Doing the reset once after starting the wifi firmware
>> works very well, probably because the firmware only reports that LTR
>> value a single time during firmware startup.
>>
>> Signed-off-by: Jonas Dreßler <[email protected]>
>> ---
>> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++
>> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------
>> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 +
>> 3 files changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> index c6ccce426b49..2506e7e49f0c 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
>> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb)
>> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter)
>> {
>> struct pcie_service_card *card = adapter->card;
>> + struct pci_dev *pdev = card->dev;
>> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
>> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
>> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask;
>>
>> + /* Trigger a function level reset of the PCI bridge device, this makes
>> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB
>> + * card stop reporting a fixed LTR value that prevents the system from
>> + * entering package C10 and S0ix powersaving states.
>> + * We need to do it here because it must happen after firmware
>> + * initialization and this function is called right after that is done.
>> + */
>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> + pci_reset_function(parent_pdev);
>> +
>> /* Write the RX ring read pointer in to reg->rx_rdptr */
>> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr |
>> tx_wrap)) {
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> index 0234cf3c2974..cbf0565353ae 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c
>> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 5",
>> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 5 (LTE)",
>> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Pro 6",
>> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Book 1",
>> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Book 2",
>> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Laptop 1",
>> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {
>> .ident = "Surface Laptop 2",
>> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = {
>> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
>> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"),
>> },
>> - .driver_data = (void *)QUIRK_FW_RST_D3COLD,
>> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD |
>> + QUIRK_DO_FLR_ON_BRIDGE),
>> },
>> {}
>> };
>> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card)
>> dev_info(&pdev->dev, "no quirks enabled\n");
>> if (card->quirks & QUIRK_FW_RST_D3COLD)
>> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n");
>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n");
>> }
>>
>> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev)
>> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> index 8ec4176d698f..f8d463f4269a 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h
>> @@ -18,6 +18,7 @@
>> #include "pcie.h"
>>
>> #define QUIRK_FW_RST_D3COLD BIT(0)
>> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1)
>>
>> void mwifiex_initialize_quirks(struct pcie_service_card *card);
>> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
>> --
>> 2.31.1
>>

2021-10-12 09:01:05

by Pali Rohár

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On Tuesday 12 October 2021 10:48:49 Jonas Dreßler wrote:
> 1) Revert the cards firmware in linux-firmware back to the second-latest
> version. That firmware didn't report a fixed LTR value and also doesn't
> have any other obvious issues I know of compared to the latest one.

FYI, there are new bugs with new firmware versions for 8997 sent by NXP
to linux-firmware repository... and questions what to do with it. Seems
that NXP again do not respond to any questions after new version was
merged into linux-firmware repo.

https://lore.kernel.org/linux-firmware/[email protected]/

So firmware revert also for other ex-Marvell / NXP chips is not
something which could not happen.

2021-10-12 11:21:24

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/12/21 11:00, Pali Rohár wrote:
> On Tuesday 12 October 2021 10:48:49 Jonas Dreßler wrote:
>> 1) Revert the cards firmware in linux-firmware back to the second-latest
>> version. That firmware didn't report a fixed LTR value and also doesn't
>> have any other obvious issues I know of compared to the latest one.
>
> FYI, there are new bugs with new firmware versions for 8997 sent by NXP
> to linux-firmware repository... and questions what to do with it. Seems
> that NXP again do not respond to any questions after new version was
> merged into linux-firmware repo.
>
> https://lore.kernel.org/linux-firmware/[email protected]/
>
> So firmware revert also for other ex-Marvell / NXP chips is not
> something which could not happen.
>

Argh, nevermind, it seems like my memory is fooling me once again, sorry.
I just tried the older firmware and I was completely wrong, there's no
difference at all between the versions when it comes to LTR messages. So
there goes alternative 1).

Something interesting and reassuring I noticed though: After resuming from
suspend the firmware actually doesn't send a new LTR message, which means
now the LTR is 0 and we enter PC10/S0ix just fine. So basically the change
this patch does is already in effect, just after one suspend/resume cycle.
That gives me more confidence that we should maybe apply this patch for
all 8897 devices, not only Surface devices?

2021-10-12 11:29:52

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On Tue, Oct 12, 2021 at 10:48:49AM +0200, Jonas Dre?ler wrote:
> On 10/11/21 18:53, Bjorn Helgaas wrote:
> > On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dre?ler wrote:
> > > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
> > > reports a hardcoded LTR value to the system during initialization,
> > > probably as an (unsuccessful) attempt of the developers to fix firmware
> > > crashes. This LTR value prevents most of the Microsoft Surface devices
> > > from entering deep powersaving states (either platform C-State 10 or
> > > S0ix state), because the exit latency of that state would be higher than
> > > what the card can tolerate.
> >
> > S0ix and C-State 10 are ACPI concepts that don't mean anything in a
> > PCIe context.
> >
> > I think LTR is only involved in deciding whether to enter the ASPM
> > L1.2 substate. Maybe the system will only enter C-State 10 or S0ix
> > when the link is in L1.2?
>
> Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
> (ctrl+f "IP LINK PM STATE").

I think it would be helpful if the commit log included this missing
link, e.g., the LTR value prevents the link from going to L1.2, which
in turn prevents use of C-State 10/S0ix.

> There's two alternatives I can think of to deal with this issue:
>
> 1) Revert the cards firmware in linux-firmware back to the second-latest
> version. That firmware didn't report a fixed LTR value and also doesn't
> have any other obvious issues I know of compared to the latest one.

You've mentioned "fixed LTR value" more than once. My weak
understanding of LTR and L1.2 is that the latencies a device reports
via LTR messages are essentially a function of buffering in the device
and electrical characteristics of the link. I expect them to be set
once and not changed.

But did the previous firmware report different latencies at different
times? Or did it just not advertise L1.2 support at all? Or do you
mean the new firmware reports a "corrected" LTR value that doesn't
work as well?

> 2) Somehow interact with the PMC Core driver to make it ignore the LTR
> values reported by the card (I doubt that's possible from mwifiex).
> It can be done manually via debugfs by writing to
> /sys/kernel/debug/pmc_core/ltr_ignore.

Interesting; I wasn't aware of that, thanks. This still feels like a
configuration issue. If we ignore the reported LTR values, I guess
you mean the root port assumes it's *always* safe to enter L1.2, i.e.,
the device has enough buffering to deal with the exit latency?

I would think there would be a way to program the LTR capability to
have the device itself report that, so we wouldn't have to fiddle with
the upstream end.

> > > + * We need to do it here because it must happen after firmware
> > > + * initialization and this function is called right after that is done.
> > > + */

> > > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
> > > + pci_reset_function(parent_pdev);
> >
> > PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
> > supported by endpoints, so I guess this will actually do some other
> > kind of reset.
>
> Interesting, I briefly searched and it doesn't seem like think
> there's public documentation available by Intel that goes into
> the specifics here, maybe someone working at Intel knows more?

"lspci -vv" will tell you whether the root port advertises FLR
support. The spec says it shouldn't, but I think pci_reset_function()
relies on what DevCap says. You could instrument pci_reset_function()
to see exactly what kind of reset we do.

Bjorn

2021-10-13 21:36:00

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/12/21 13:29, Bjorn Helgaas wrote:
> On Tue, Oct 12, 2021 at 10:48:49AM +0200, Jonas Dreßler wrote:
>> On 10/11/21 18:53, Bjorn Helgaas wrote:
>>> On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote:
>>>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>>>> reports a hardcoded LTR value to the system during initialization,
>>>> probably as an (unsuccessful) attempt of the developers to fix firmware
>>>> crashes. This LTR value prevents most of the Microsoft Surface devices
>>>> from entering deep powersaving states (either platform C-State 10 or
>>>> S0ix state), because the exit latency of that state would be higher than
>>>> what the card can tolerate.
>>>
>>> S0ix and C-State 10 are ACPI concepts that don't mean anything in a
>>> PCIe context.
>>>
>>> I think LTR is only involved in deciding whether to enter the ASPM
>>> L1.2 substate. Maybe the system will only enter C-State 10 or S0ix
>>> when the link is in L1.2?
>>
>> Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
>> (ctrl+f "IP LINK PM STATE").
>
> I think it would be helpful if the commit log included this missing
> link, e.g., the LTR value prevents the link from going to L1.2, which
> in turn prevents use of C-State 10/S0ix.
>
>> There's two alternatives I can think of to deal with this issue:
>>
>> 1) Revert the cards firmware in linux-firmware back to the second-latest
>> version. That firmware didn't report a fixed LTR value and also doesn't
>> have any other obvious issues I know of compared to the latest one.
>
> You've mentioned "fixed LTR value" more than once. My weak
> understanding of LTR and L1.2 is that the latencies a device reports
> via LTR messages are essentially a function of buffering in the device
> and electrical characteristics of the link. I expect them to be set
> once and not changed.

I'm not an expert on PCI at all, but from my understanding the idea behind
the LTR mechanism is to be able to dynamically communicate latency
requirements depending on the current situation/workload. So for example
in case the wifi card is receiving a lot of data, it would report a lower
latency tolerance because its rx buffers are filling up fast.

Looking at ltr_show in the pmc_core debug driver, that also seems to be how
other devices handle it: For example moving the USB mouse makes the XHCI
controller report a non-null LTR for a few seconds.

So with "fixed LTR value" I meant to say that in my understanding this is
exactly the opposite of how LTR is supposed to be used: Instead of
dynamically reporting a new latency tolerance when the card is being used,
it reports a "fixed" tolerance once during firmware startup, maybe with
the intention of papering over bugs in the firmware...

>
> But did the previous firmware report different latencies at different
> times? Or did it just not advertise L1.2 support at all? Or do you
> mean the new firmware reports a "corrected" LTR value that doesn't
> work as well?

Sorry, as mentioned in my other reply the previous firmware is actually
behaving in the exact same way, no clue why I remembered this wrong.

>
>> 2) Somehow interact with the PMC Core driver to make it ignore the LTR
>> values reported by the card (I doubt that's possible from mwifiex).
>> It can be done manually via debugfs by writing to
>> /sys/kernel/debug/pmc_core/ltr_ignore.
>
> Interesting; I wasn't aware of that, thanks. This still feels like a
> configuration issue. If we ignore the reported LTR values, I guess
> you mean the root port assumes it's *always* safe to enter L1.2, i.e.,
> the device has enough buffering to deal with the exit latency?

Not sure about that, in theory there's also the whole negotiation via the
CLKREQ# pin when entering ASPM L1.2. The card could use that to reject L1.2
entry I guess.

>
> I would think there would be a way to program the LTR capability to
> have the device itself report that, so we wouldn't have to fiddle with
> the upstream end.

Well I mean the device does report LTR capabilities and a maximum snoop and
non-snoop latency via extended capabilities, so I guess that means it is
supported.

>
>>>> + * We need to do it here because it must happen after firmware
>>>> + * initialization and this function is called right after that is done.
>>>> + */
>
>>>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>>>> + pci_reset_function(parent_pdev);
>>>
>>> PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
>>> supported by endpoints, so I guess this will actually do some other
>>> kind of reset.
>>
>> Interesting, I briefly searched and it doesn't seem like think
>> there's public documentation available by Intel that goes into
>> the specifics here, maybe someone working at Intel knows more?
>
> "lspci -vv" will tell you whether the root port advertises FLR
> support. The spec says it shouldn't, but I think pci_reset_function()
> relies on what DevCap says. You could instrument pci_reset_function()
> to see exactly what kind of reset we do.

Ahh indeed, I wasn't aware there's multiple kinds of resets. Looks like
what it uses is pci_pm_reset().

>
> Bjorn
>

2021-10-13 22:09:56

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/12/21 17:39, Bjorn Helgaas wrote:
> [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
> https://lore.kernel.org/all/[email protected]/]
>
> On Tue, Oct 12, 2021 at 10:55:03AM +0200, Jonas Dreßler wrote:
>> On 10/11/21 19:02, Pali Rohár wrote:
>>> On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote:
>>>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>>>> reports a hardcoded LTR value to the system during initialization,
>>>> probably as an (unsuccessful) attempt of the developers to fix firmware
>>>> crashes. This LTR value prevents most of the Microsoft Surface devices
>>>> from entering deep powersaving states (either platform C-State 10 or
>>>> S0ix state), because the exit latency of that state would be higher than
>>>> what the card can tolerate.
>>>
>>> This description looks like a generic issue in 88W8897 chip or its
>>> firmware and not something to Surface PCIe controller or Surface HW. But
>>> please correct me if I'm wrong here.
>>>
>>> Has somebody 88W8897-based PCIe card in non-Surface device and can check
>>> or verify if this issue happens also outside of the Surface device?
>>>
>>> It would be really nice to know if this is issue in Surface or in 8897.
>>
>> Fairly sure the LTR value is something that's reported by the firmware
>> and will be the same on all 8897 devices (as mentioned in my reply to Bjorn
>> the second-latest firmware doesn't report that fixed LTR value).
>
> I suggested earlier that the LTR values reported by the device might
> depend on the electrical characteristics of the link and hence be
> platform-dependent, but I think that might be wrong.
>
> The spec (PCIe r5.0, sec 5.5.4) does say that some of the *other*
> parameters related to L1.2 entry are platform-dependent:
>
> Prior to setting either or both of the enable bits for L1.2, the
> values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM
> L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value
> and Scale fields) must be programmed. The TPOWER_ON and
> Common_Mode_Restore_Time fields must be programmed to the
> appropriate values based on the components and AC coupling
> capacitors used in the connection linking the two components. The
> determination of these values is design implementation specific.
>
> These T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD
> values are in the L1 PM Substates Control registers.
>
> I don't know of a way for the kernel or the device firmware to learn
> these circuit characteristics or the appropriate values, so I think
> only system firmware can program the L1 PM Substates Control registers
> (a corollary of this is that I don't see a way for hot-plugged devices
> to *ever* use L1.2).
>
> I wonder if this reset quirk works because pci_reset_function() saves
> and restores much of config space, but it currently does *not* restore
> the L1 PM Substates capability, so those T_POWER_ON,
> Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
> cleared out by the reset. We did briefly save/restore it [1], but we
> had to revert that because of a regression that AFAIK was never
> resolved [2]. I expect we will eventually save/restore this, so if
> the quirk depends on it *not* being restored, that would be a problem.
>
> You should be able to test whether this is the critical thing by
> clearing those registers with setpci instead of doing the reset. Per
> spec, they can only be modified when L1.2 is disabled, so you would
> have to disable it via sysfs (for the endpoint, I think)
> /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
> port, then re-enable L1.2.
>
> [1] https://git.kernel.org/linus/4257f7e008ea
> [2] https://lore.kernel.org/all/[email protected]/
>

Hmm, interesting, thanks for those links.

Are you sure the config values will get lost on the reset? If we only reset
the port by going into D3hot and back into D0, the device will remain powered
and won't lose the config space, will it?

Because when I reset the bridge using pci_reset_function() (ie. pci_pm_reset())
or when I suspend and resume the laptop, all the L1 PM Substates registers are
still the same as before, nothing is lost.

That said, our new mwifiex_pcie_reset_d3cold_quirk() puts *both the card and
the bridge* into D3cold, so I gave that a try, and indeed the cards L1 Substate
Ctl registers are cleared out (so T_CommonMode, LTR1.2_Threshold and T_PwrOn),
but the bridge still has its values, no clue why that's the case.

2021-10-18 12:54:39

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

Jonas Dreßler <[email protected]> wrote:

> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
> reports a hardcoded LTR value to the system during initialization,
> probably as an (unsuccessful) attempt of the developers to fix firmware
> crashes. This LTR value prevents most of the Microsoft Surface devices
> from entering deep powersaving states (either platform C-State 10 or
> S0ix state), because the exit latency of that state would be higher than
> what the card can tolerate.
>
> Turns out the card works just the same (including the firmware crashes)
> no matter if that hardcoded LTR value is reported or not, so it's kind
> of useless and only prevents us from saving power.
>
> To get rid of those hardcoded LTR requirements, it's possible to reset
> the PCI bridge device after initializing the cards firmware. I'm not
> exactly sure why that works, maybe the power management subsystem of the
> PCH resets its stored LTR values when doing a function level reset of
> the bridge device. Doing the reset once after starting the wifi firmware
> works very well, probably because the firmware only reports that LTR
> value a single time during firmware startup.
>
> Signed-off-by: Jonas Dreßler <[email protected]>

I'm not sure what was the conclusion from the discussion, so dropping
the patch. Please resend once it's ready to be applied.

Patch set to Changes Requested.

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2021-10-18 15:35:56

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dre?ler wrote:
> On 10/12/21 17:39, Bjorn Helgaas wrote:
> > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
> > https://lore.kernel.org/all/[email protected]/]

> > I wonder if this reset quirk works because pci_reset_function() saves
> > and restores much of config space, but it currently does *not* restore
> > the L1 PM Substates capability, so those T_POWER_ON,
> > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
> > cleared out by the reset. We did briefly save/restore it [1], but we
> > had to revert that because of a regression that AFAIK was never
> > resolved [2]. I expect we will eventually save/restore this, so if
> > the quirk depends on it *not* being restored, that would be a problem.
> >
> > You should be able to test whether this is the critical thing by
> > clearing those registers with setpci instead of doing the reset. Per
> > spec, they can only be modified when L1.2 is disabled, so you would
> > have to disable it via sysfs (for the endpoint, I think)
> > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
> > port, then re-enable L1.2.
> >
> > [1] https://git.kernel.org/linus/4257f7e008ea
> > [2] https://lore.kernel.org/all/[email protected]/
>
> Hmm, interesting, thanks for those links.
>
> Are you sure the config values will get lost on the reset? If we only reset
> the port by going into D3hot and back into D0, the device will remain powered
> and won't lose the config space, will it?

I think you're doing a PM reset (transition to D3hot and back to D0).
Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. The spec
doesn't actually *require* the device to be reset; it only says the
internal state of the device is undefined after these transitions.

2021-10-25 16:47:30

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/18/21 17:35, Bjorn Helgaas wrote:
> On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote:
>> On 10/12/21 17:39, Bjorn Helgaas wrote:
>>> [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
>>> https://lore.kernel.org/all/[email protected]/]
>
>>> I wonder if this reset quirk works because pci_reset_function() saves
>>> and restores much of config space, but it currently does *not* restore
>>> the L1 PM Substates capability, so those T_POWER_ON,
>>> Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
>>> cleared out by the reset. We did briefly save/restore it [1], but we
>>> had to revert that because of a regression that AFAIK was never
>>> resolved [2]. I expect we will eventually save/restore this, so if
>>> the quirk depends on it *not* being restored, that would be a problem.
>>>
>>> You should be able to test whether this is the critical thing by
>>> clearing those registers with setpci instead of doing the reset. Per
>>> spec, they can only be modified when L1.2 is disabled, so you would
>>> have to disable it via sysfs (for the endpoint, I think)
>>> /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
>>> port, then re-enable L1.2.
>>>
>>> [1] https://git.kernel.org/linus/4257f7e008ea
>>> [2] https://lore.kernel.org/all/[email protected]/
>>
>> Hmm, interesting, thanks for those links.
>>
>> Are you sure the config values will get lost on the reset? If we only reset
>> the port by going into D3hot and back into D0, the device will remain powered
>> and won't lose the config space, will it?
>
> I think you're doing a PM reset (transition to D3hot and back to D0).
> Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. The spec
> doesn't actually *require* the device to be reset; it only says the
> internal state of the device is undefined after these transitions.
>

Not requiring the device to be reset sounds sensible to me given that
D3hot is what devices are transitioned into during suspend.

But anyway, that doesn't really get us any further except it somewhat
gives an explanation why the LTR is suddenly 0 after the reset. Or are
you making the point that we shouldn't rely on "undefined state" for this
hack because not all PCI bridges/ports will necessarily behave the same?

2021-10-26 03:19:06

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On Mon, Oct 25, 2021 at 06:45:29PM +0200, Jonas Dre?ler wrote:
> On 10/18/21 17:35, Bjorn Helgaas wrote:
> > On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dre?ler wrote:
> > > On 10/12/21 17:39, Bjorn Helgaas wrote:
> > > > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
> > > > https://lore.kernel.org/all/[email protected]/]
> >
> > > > I wonder if this reset quirk works because pci_reset_function() saves
> > > > and restores much of config space, but it currently does *not* restore
> > > > the L1 PM Substates capability, so those T_POWER_ON,
> > > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
> > > > cleared out by the reset. We did briefly save/restore it [1], but we
> > > > had to revert that because of a regression that AFAIK was never
> > > > resolved [2]. I expect we will eventually save/restore this, so if
> > > > the quirk depends on it *not* being restored, that would be a problem.
> > > >
> > > > You should be able to test whether this is the critical thing by
> > > > clearing those registers with setpci instead of doing the reset. Per
> > > > spec, they can only be modified when L1.2 is disabled, so you would
> > > > have to disable it via sysfs (for the endpoint, I think)
> > > > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
> > > > port, then re-enable L1.2.
> > > >
> > > > [1] https://git.kernel.org/linus/4257f7e008ea
> > > > [2] https://lore.kernel.org/all/[email protected]/
> > >
> > > Hmm, interesting, thanks for those links.
> > >
> > > Are you sure the config values will get lost on the reset? If we
> > > only reset the port by going into D3hot and back into D0, the
> > > device will remain powered and won't lose the config space, will
> > > it?
> >
> > I think you're doing a PM reset (transition to D3hot and back to
> > D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0.
> > The spec doesn't actually *require* the device to be reset; it
> > only says the internal state of the device is undefined after
> > these transitions.
>
> Not requiring the device to be reset sounds sensible to me given
> that D3hot is what devices are transitioned into during suspend.
>
> But anyway, that doesn't really get us any further except it
> somewhat gives an explanation why the LTR is suddenly 0 after the
> reset. Or are you making the point that we shouldn't rely on
> "undefined state" for this hack because not all PCI bridges/ports
> will necessarily behave the same?

I guess I'm just making the point that I don't understand why the
bridge reset fixes something, and I'm not confident that the fix will
work on every system and continue working even if/when the PCI core
starts saving and restoring the L1 PM Substates capability.

2021-11-04 13:59:16

by Jonas Dreßler

[permalink] [raw]
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface devices

On 10/26/21 01:56, Bjorn Helgaas wrote:
> On Mon, Oct 25, 2021 at 06:45:29PM +0200, Jonas Dreßler wrote:
>> On 10/18/21 17:35, Bjorn Helgaas wrote:
>>> On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote:
>>>> On 10/12/21 17:39, Bjorn Helgaas wrote:
>>>>> [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
>>>>> https://lore.kernel.org/all/[email protected]/]
>>>
>>>>> I wonder if this reset quirk works because pci_reset_function() saves
>>>>> and restores much of config space, but it currently does *not* restore
>>>>> the L1 PM Substates capability, so those T_POWER_ON,
>>>>> Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
>>>>> cleared out by the reset. We did briefly save/restore it [1], but we
>>>>> had to revert that because of a regression that AFAIK was never
>>>>> resolved [2]. I expect we will eventually save/restore this, so if
>>>>> the quirk depends on it *not* being restored, that would be a problem.
>>>>>
>>>>> You should be able to test whether this is the critical thing by
>>>>> clearing those registers with setpci instead of doing the reset. Per
>>>>> spec, they can only be modified when L1.2 is disabled, so you would
>>>>> have to disable it via sysfs (for the endpoint, I think)
>>>>> /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root
>>>>> port, then re-enable L1.2.
>>>>>
>>>>> [1] https://git.kernel.org/linus/4257f7e008ea
>>>>> [2] https://lore.kernel.org/all/[email protected]/
>>>>
>>>> Hmm, interesting, thanks for those links.
>>>>
>>>> Are you sure the config values will get lost on the reset? If we
>>>> only reset the port by going into D3hot and back into D0, the
>>>> device will remain powered and won't lose the config space, will
>>>> it?
>>>
>>> I think you're doing a PM reset (transition to D3hot and back to
>>> D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0.
>>> The spec doesn't actually *require* the device to be reset; it
>>> only says the internal state of the device is undefined after
>>> these transitions.
>>
>> Not requiring the device to be reset sounds sensible to me given
>> that D3hot is what devices are transitioned into during suspend.
>>
>> But anyway, that doesn't really get us any further except it
>> somewhat gives an explanation why the LTR is suddenly 0 after the
>> reset. Or are you making the point that we shouldn't rely on
>> "undefined state" for this hack because not all PCI bridges/ports
>> will necessarily behave the same?
>
> I guess I'm just making the point that I don't understand why the
> bridge reset fixes something, and I'm not confident that the fix will
> work on every system and continue working even if/when the PCI core
> starts saving and restoring the L1 PM Substates capability.
>

FWIW, I've tested it with the restoring of L1 PM Substates enabled now
and the bridge reset worked just as before.

But yeah I, too, have no clue why exactly the bridge reset does what it
does...

Anyway, I've also confirmed that it actually impacts the power usage by
measuring consumed energy during idle over a few minutes: Applying either
the bridge reset quirk or ignoring the LTR via pmc_core results in about
7% less energy usage. Given that the overall energy usage was almost
nothing to make the measurement easier, those 7% are not a lot, but
nonetheless it confirms that the quirk works.