2023-09-27 05:32:44

by Siddharth Vadapalli

[permalink] [raw]
Subject: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
function. The PHY in this case is the Serdes. It is possible that the
PCI instance is configured for 2 lane operation across two different
Serdes instances, using 1 lane of each Serdes. In such a configuration,
if the reference clock for one Serdes is provided by the other Serdes,
it results in a race condition. After the Serdes providing the reference
clock is initialized by the PCI driver by invoking its PHY APIs, it is
not guaranteed that this Serdes remains powered on long enough for the
PHY APIs based initialization of the dependent Serdes. In such cases,
the PLL of the dependent Serdes fails to lock due to the absence of the
reference clock from the former Serdes which has been powered off by the
PM Core.

Fix this by obtaining reference to the PHYs before invoking the PHY
initialization APIs and releasing reference after the initialization is
complete.

Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling")
Signed-off-by: Siddharth Vadapalli <[email protected]>
---

NOTE: This patch is based on linux-next tagged next-20230927.

v2:
https://lore.kernel.org/r/[email protected]/

Changes since v2:
- Implement suggestion by Ilpo Järvinen <[email protected]>
moving the phy_pm_runtime_put_sync() For-Loop section before the
return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby
preventing duplication of the For-Loop.
- Rebase patch on next-20230927.

v1:
https://lore.kernel.org/r/[email protected]/

Changes since v1:
- Add code to release reference(s) to the phy(s) when
ks_pcie_enable_phy(ks_pcie) fails.

Regards,
Siddharth.

drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
index 49aea6ce3e87..0ec6720cc2df 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
goto err_link;
}

+ /* Obtain reference(s) to the phy(s) */
+ for (i = 0; i < num_lanes; i++)
+ phy_pm_runtime_get_sync(ks_pcie->phy[i]);
+
ret = ks_pcie_enable_phy(ks_pcie);
+
+ /* Release reference(s) to the phy(s) */
+ for (i = 0; i < num_lanes; i++)
+ phy_pm_runtime_put_sync(ks_pcie->phy[i]);
+
if (ret) {
dev_err(dev, "failed to enable phy\n");
goto err_link;
--
2.34.1


2023-09-28 09:17:51

by Ravi Gunasekaran

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs



On 9/27/23 9:48 AM, Siddharth Vadapalli wrote:
> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
> function. The PHY in this case is the Serdes. It is possible that the
> PCI instance is configured for 2 lane operation across two different
> Serdes instances, using 1 lane of each Serdes. In such a configuration,
> if the reference clock for one Serdes is provided by the other Serdes,
> it results in a race condition. After the Serdes providing the reference
> clock is initialized by the PCI driver by invoking its PHY APIs, it is
> not guaranteed that this Serdes remains powered on long enough for the
> PHY APIs based initialization of the dependent Serdes. In such cases,
> the PLL of the dependent Serdes fails to lock due to the absence of the
> reference clock from the former Serdes which has been powered off by the
> PM Core.
>
> Fix this by obtaining reference to the PHYs before invoking the PHY
> initialization APIs and releasing reference after the initialization is
> complete.

Sounds reasonable.

Acked-by: Ravi Gunasekaran <[email protected]>

Ravi
>
> Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling")
> Signed-off-by: Siddharth Vadapalli <[email protected]>
> ---
>
> NOTE: This patch is based on linux-next tagged next-20230927.
>
> v2:
> https://lore.kernel.org/r/[email protected]/
>
> Changes since v2:
> - Implement suggestion by Ilpo Järvinen <[email protected]>
> moving the phy_pm_runtime_put_sync() For-Loop section before the
> return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby
> preventing duplication of the For-Loop.
> - Rebase patch on next-20230927.
>
> v1:
> https://lore.kernel.org/r/[email protected]/
>
> Changes since v1:
> - Add code to release reference(s) to the phy(s) when
> ks_pcie_enable_phy(ks_pcie) fails.
>
> Regards,
> Siddharth.
>
> drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
> index 49aea6ce3e87..0ec6720cc2df 100644
> --- a/drivers/pci/controller/dwc/pci-keystone.c
> +++ b/drivers/pci/controller/dwc/pci-keystone.c
> @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
> goto err_link;
> }
>
> + /* Obtain reference(s) to the phy(s) */
> + for (i = 0; i < num_lanes; i++)
> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
> +
> ret = ks_pcie_enable_phy(ks_pcie);
> +
> + /* Release reference(s) to the phy(s) */
> + for (i = 0; i < num_lanes; i++)
> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
> +
> if (ret) {
> dev_err(dev, "failed to enable phy\n");
> goto err_link;

--
Regards,
Ravi

2023-11-15 08:39:03

by Siddharth Vadapalli

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

Hello Bjorn,

Could you please merge this patch?

On 28/09/23 13:21, Ravi Gunasekaran wrote:
>
>
> On 9/27/23 9:48 AM, Siddharth Vadapalli wrote:
>> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
>> function. The PHY in this case is the Serdes. It is possible that the
>> PCI instance is configured for 2 lane operation across two different
>> Serdes instances, using 1 lane of each Serdes. In such a configuration,
>> if the reference clock for one Serdes is provided by the other Serdes,
>> it results in a race condition. After the Serdes providing the reference
>> clock is initialized by the PCI driver by invoking its PHY APIs, it is
>> not guaranteed that this Serdes remains powered on long enough for the
>> PHY APIs based initialization of the dependent Serdes. In such cases,
>> the PLL of the dependent Serdes fails to lock due to the absence of the
>> reference clock from the former Serdes which has been powered off by the
>> PM Core.
>>
>> Fix this by obtaining reference to the PHYs before invoking the PHY
>> initialization APIs and releasing reference after the initialization is
>> complete.
>
> Sounds reasonable.
>
> Acked-by: Ravi Gunasekaran <[email protected]>
>
> Ravi
>>
>> Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling")
>> Signed-off-by: Siddharth Vadapalli <[email protected]>
>> ---
>>
>> NOTE: This patch is based on linux-next tagged next-20230927.
>>
>> v2:
>> https://lore.kernel.org/r/[email protected]/
>>
>> Changes since v2:
>> - Implement suggestion by Ilpo Järvinen <[email protected]>
>> moving the phy_pm_runtime_put_sync() For-Loop section before the
>> return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby
>> preventing duplication of the For-Loop.
>> - Rebase patch on next-20230927.
>>
>> v1:
>> https://lore.kernel.org/r/[email protected]/
>>
>> Changes since v1:
>> - Add code to release reference(s) to the phy(s) when
>> ks_pcie_enable_phy(ks_pcie) fails.
>>
>> Regards,
>> Siddharth.
>>
>> drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
>> index 49aea6ce3e87..0ec6720cc2df 100644
>> --- a/drivers/pci/controller/dwc/pci-keystone.c
>> +++ b/drivers/pci/controller/dwc/pci-keystone.c
>> @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
>> goto err_link;
>> }
>>
>> + /* Obtain reference(s) to the phy(s) */
>> + for (i = 0; i < num_lanes; i++)
>> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
>> +
>> ret = ks_pcie_enable_phy(ks_pcie);
>> +
>> + /* Release reference(s) to the phy(s) */
>> + for (i = 0; i < num_lanes; i++)
>> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
>> +
>> if (ret) {
>> dev_err(dev, "failed to enable phy\n");
>> goto err_link;
>

--
Regards,
Siddharth.

2024-01-09 03:42:16

by Krzysztof Wilczyński

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

Hello,

> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
> function. The PHY in this case is the Serdes. It is possible that the
> PCI instance is configured for 2 lane operation across two different
> Serdes instances, using 1 lane of each Serdes. In such a configuration,
> if the reference clock for one Serdes is provided by the other Serdes,
> it results in a race condition. After the Serdes providing the reference
> clock is initialized by the PCI driver by invoking its PHY APIs, it is
> not guaranteed that this Serdes remains powered on long enough for the
> PHY APIs based initialization of the dependent Serdes. In such cases,
> the PLL of the dependent Serdes fails to lock due to the absence of the
> reference clock from the former Serdes which has been powered off by the
> PM Core.
>
> Fix this by obtaining reference to the PHYs before invoking the PHY
> initialization APIs and releasing reference after the initialization is
> complete.

Applied to controller/keystone, thank you!

[1/1] PCI: keystone: Fix race condition when initializing PHYs
https://git.kernel.org/pci/pci/c/c12ca110c613

Krzysztof

2024-01-09 21:23:36

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote:
> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
> function. The PHY in this case is the Serdes. It is possible that the
> PCI instance is configured for 2 lane operation across two different
> Serdes instances, using 1 lane of each Serdes. In such a configuration,
> if the reference clock for one Serdes is provided by the other Serdes,
> it results in a race condition. After the Serdes providing the reference
> clock is initialized by the PCI driver by invoking its PHY APIs, it is
> not guaranteed that this Serdes remains powered on long enough for the
> PHY APIs based initialization of the dependent Serdes. In such cases,
> the PLL of the dependent Serdes fails to lock due to the absence of the
> reference clock from the former Serdes which has been powered off by the
> PM Core.
>
> Fix this by obtaining reference to the PHYs before invoking the PHY
> initialization APIs and releasing reference after the initialization is
> complete.
>
> Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling")
> Signed-off-by: Siddharth Vadapalli <[email protected]>
> ---
>
> NOTE: This patch is based on linux-next tagged next-20230927.
>
> v2:
> https://lore.kernel.org/r/[email protected]/
>
> Changes since v2:
> - Implement suggestion by Ilpo Järvinen <[email protected]>
> moving the phy_pm_runtime_put_sync() For-Loop section before the
> return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby
> preventing duplication of the For-Loop.
> - Rebase patch on next-20230927.
>
> v1:
> https://lore.kernel.org/r/[email protected]/
>
> Changes since v1:
> - Add code to release reference(s) to the phy(s) when
> ks_pcie_enable_phy(ks_pcie) fails.
>
> Regards,
> Siddharth.
>
> drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
> index 49aea6ce3e87..0ec6720cc2df 100644
> --- a/drivers/pci/controller/dwc/pci-keystone.c
> +++ b/drivers/pci/controller/dwc/pci-keystone.c
> @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
> goto err_link;
> }
>
> + /* Obtain reference(s) to the phy(s) */
> + for (i = 0; i < num_lanes; i++)
> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
> +
> ret = ks_pcie_enable_phy(ks_pcie);
> +
> + /* Release reference(s) to the phy(s) */
> + for (i = 0; i < num_lanes; i++)
> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);

This looks good and has already been applied, so no immediate action
required.

This is the only call to ks_pcie_enable_phy(), and these loops get and
put the PM references for the same PHYs initialized in
ks_pcie_enable_phy(), so it seems like maybe these loops could be
moved *into* ks_pcie_enable_phy().

Is there any similar issue in ks_pcie_disable_phy()? What if we
power-off a PHY that provides a reference clock to other PHYs that are
still powered-up? Will the dependent PHYs still power-off cleanly?

> if (ret) {
> dev_err(dev, "failed to enable phy\n");
> goto err_link;
> --
> 2.34.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2024-01-10 06:06:20

by Siddharth Vadapalli

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

Hello Bjorn,

On 10/01/24 02:53, Bjorn Helgaas wrote:
> On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote:
>> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
>> function. The PHY in this case is the Serdes. It is possible that the
>> PCI instance is configured for 2 lane operation across two different

..

>>
>> + /* Obtain reference(s) to the phy(s) */
>> + for (i = 0; i < num_lanes; i++)
>> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
>> +
>> ret = ks_pcie_enable_phy(ks_pcie);
>> +
>> + /* Release reference(s) to the phy(s) */
>> + for (i = 0; i < num_lanes; i++)
>> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
>
> This looks good and has already been applied, so no immediate action
> required.
>
> This is the only call to ks_pcie_enable_phy(), and these loops get and
> put the PM references for the same PHYs initialized in
> ks_pcie_enable_phy(), so it seems like maybe these loops could be
> moved *into* ks_pcie_enable_phy().

Does the following look fine?
===============================================================================
diff --git a/drivers/pci/controller/dwc/pci-keystone.c
b/drivers/pci/controller/dwc/pci-keystone.c
index e02236003b46..6e9f9589d26c 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
int num_lanes = ks_pcie->num_lanes;

for (i = 0; i < num_lanes; i++) {
+ /* Obtain reference to the phy */
+ phy_pm_runtime_get_sync(ks_pcie->phy[i]);
+
ret = phy_reset(ks_pcie->phy[i]);
if (ret < 0)
goto err_phy;
@@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
}
}

+ /* Release reference(s) to the phy(s) */
+ for (i = 0; i < num_lanes; i++)
+ phy_pm_runtime_put_sync(ks_pcie->phy[i]);
+
return 0;

err_phy:
while (--i >= 0) {
phy_power_off(ks_pcie->phy[i]);
phy_exit(ks_pcie->phy[i]);
+ /* Release reference to the phy */
+ phy_pm_runtime_put_sync(ks_pcie->phy[i]);
}

return ret;
===============================================================================

>
> Is there any similar issue in ks_pcie_disable_phy()? What if we
> power-off a PHY that provides a reference clock to other PHYs that are
> still powered-up? Will the dependent PHYs still power-off cleanly?

While debugging the issue fixed by this patch, I had bisected and identified
that prior to the following commit:
https://github.com/torvalds/linux/commit/e611f8cd8717c8fe7d4229997e6cd029a1465253
despite the race condition being present, there was no issue. While I am not
fully certain, I believe that the above observation indicates that prior to the
aforementioned commit, the race condition did exist, but there was a slightly
longer delay between the PHY providing the reference clock being powered off
within "ks_pcie_enable_phy()". That delay was sufficient for the dependent PHY
to lock its PLL based on the reference clock provided, following which, despite
the PHY providing the reference clock being powered off and the dependent PHY
staying powered on, there was no issue observed. Therefore, it appears to me
that holding reference to the PHY providing the reference clock isn't necessary
once the dependent PHY's PLL is locked.

..

--
Regards,
Siddharth.

2024-01-19 23:20:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs

On Wed, Jan 10, 2024 at 11:35:24AM +0530, Siddharth Vadapalli wrote:
> Hello Bjorn,
>
> On 10/01/24 02:53, Bjorn Helgaas wrote:
> > On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote:
> >> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy()
> >> function. The PHY in this case is the Serdes. It is possible that the
> >> PCI instance is configured for 2 lane operation across two different
>
> ...
>
> >>
> >> + /* Obtain reference(s) to the phy(s) */
> >> + for (i = 0; i < num_lanes; i++)
> >> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
> >> +
> >> ret = ks_pcie_enable_phy(ks_pcie);
> >> +
> >> + /* Release reference(s) to the phy(s) */
> >> + for (i = 0; i < num_lanes; i++)
> >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
> >
> > This looks good and has already been applied, so no immediate action
> > required.
> >
> > This is the only call to ks_pcie_enable_phy(), and these loops get and
> > put the PM references for the same PHYs initialized in
> > ks_pcie_enable_phy(), so it seems like maybe these loops could be
> > moved *into* ks_pcie_enable_phy().
>
> Does the following look fine?
> ===============================================================================
> diff --git a/drivers/pci/controller/dwc/pci-keystone.c
> b/drivers/pci/controller/dwc/pci-keystone.c
> index e02236003b46..6e9f9589d26c 100644
> --- a/drivers/pci/controller/dwc/pci-keystone.c
> +++ b/drivers/pci/controller/dwc/pci-keystone.c
> @@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
> int num_lanes = ks_pcie->num_lanes;
>
> for (i = 0; i < num_lanes; i++) {
> + /* Obtain reference to the phy */
> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);

I thought the point was that you needed to guarantee that all PHYs are
powered on and stay that way before initializing any of them. If so,
you would need a separate loop, e.g.,

for (i = 0; i < num_lanes; i++)
phy_pm_runtime_get_sync(ks_pcie->phy[i]);

for (i = 0; i < num_lanes; i++) {
ret = phy_reset(ks_pcie->phy[i]);
...

> ret = phy_reset(ks_pcie->phy[i]);
> if (ret < 0)
> goto err_phy;
> @@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
> }
> }
>
> + /* Release reference(s) to the phy(s) */
> + for (i = 0; i < num_lanes; i++)
> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
> +
> return 0;
>
> err_phy:
> while (--i >= 0) {
> phy_power_off(ks_pcie->phy[i]);
> phy_exit(ks_pcie->phy[i]);
> + /* Release reference to the phy */
> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
> }
>
> return ret;

2024-01-22 04:39:14

by Siddharth Vadapalli

[permalink] [raw]
Subject: Re: [PATCH v3] PCI: keystone: Fix race condition when initializing PHYs



On 20/01/24 04:50, Bjorn Helgaas wrote:
> On Wed, Jan 10, 2024 at 11:35:24AM +0530, Siddharth Vadapalli wrote:
>> Hello Bjorn,
>>
>> On 10/01/24 02:53, Bjorn Helgaas wrote:

..

>>
>> Does the following look fine?
>> ===============================================================================
>> diff --git a/drivers/pci/controller/dwc/pci-keystone.c
>> b/drivers/pci/controller/dwc/pci-keystone.c
>> index e02236003b46..6e9f9589d26c 100644
>> --- a/drivers/pci/controller/dwc/pci-keystone.c
>> +++ b/drivers/pci/controller/dwc/pci-keystone.c
>> @@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
>> int num_lanes = ks_pcie->num_lanes;
>>
>> for (i = 0; i < num_lanes; i++) {
>> + /* Obtain reference to the phy */
>> + phy_pm_runtime_get_sync(ks_pcie->phy[i]);
>
> I thought the point was that you needed to guarantee that all PHYs are
> powered on and stay that way before initializing any of them. If so,
> you would need a separate loop, e.g.,
>
> for (i = 0; i < num_lanes; i++)
> phy_pm_runtime_get_sync(ks_pcie->phy[i]);
>
> for (i = 0; i < num_lanes; i++) {
> ret = phy_reset(ks_pcie->phy[i]);
> ...
>

Yes, the above implementation seems better. The strict requirement will be that
post initialization of the first PHY (Serdes), it remains powered ON so that it
can provide its reference clock to the second PHY (Serdes). Therefore, getting
the reference to the PHYs within the loop will work too since the reference is
being release only outside the loop. Nevertheless I shall go ahead with the
implementation suggested by you since that looks much better and cleaner.

>> ret = phy_reset(ks_pcie->phy[i]);
>> if (ret < 0)
>> goto err_phy;
>> @@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie)
>> }
>> }
>>
>> + /* Release reference(s) to the phy(s) */
>> + for (i = 0; i < num_lanes; i++)
>> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
>> +
>> return 0;
>>
>> err_phy:
>> while (--i >= 0) {
>> phy_power_off(ks_pcie->phy[i]);
>> phy_exit(ks_pcie->phy[i]);
>> + /* Release reference to the phy */
>> + phy_pm_runtime_put_sync(ks_pcie->phy[i]);
>> }
>>
>> return ret;

--
Regards,
Siddharth.