2019-06-13 15:29:14

by Marc Gonzalez

Subject: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

readl_poll_timeout() calls usleep_range() to sleep between reads.
usleep_range() doesn't work efficiently for tiny values.

Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
with the delay in qcom_qmp_phy_com_init().

Signed-off-by: Marc Gonzalez <[email protected]>
---
Vivek, do you remember why you didn't use the same delay value in
qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?
---
drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
index bb522b915fa9..34ff6434da8f 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
@@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
 	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
 	mask = cfg->mask_pcs_ready;

-	ret = readl_poll_timeout(status, val, val & mask, 1,
+	ret = readl_poll_timeout(status, val, val & mask, 10,
 				 PHY_INIT_COMPLETE_TIMEOUT);
 	if (ret) {
 		dev_err(qmp->dev, "phy initialization timed-out\n");
--
2.17.1
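
For context, here is a sketch of why the fourth argument matters. It assumes that readl_poll_timeout() in kernels of this era sleeps between reads with usleep_range((sleep_us >> 2) + 1, sleep_us); that formula is an assumption of this note and should be checked against include/linux/iopoll.h in the tree at hand. The user-space helper below only reproduces that arithmetic; poll_sleep_range() and struct sleep_range are names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type: the [min, max] window, in microseconds, that a
 * readl_poll_timeout()-style loop would hand to usleep_range(). */
struct sleep_range {
	unsigned long min_us;
	unsigned long max_us;
};

/* Assumed formula (not taken from this thread):
 *   min = (sleep_us >> 2) + 1, max = sleep_us.
 * sleep_us == 0 is excluded because it is assumed to mean "do not
 * sleep between reads". */
static struct sleep_range poll_sleep_range(unsigned long sleep_us)
{
	struct sleep_range r;

	assert(sleep_us > 0);
	r.min_us = (sleep_us >> 2) + 1;
	r.max_us = sleep_us;
	return r;
}

/* Print the window for one polling delay value. */
static void print_sleep_range(unsigned long sleep_us)
{
	struct sleep_range r = poll_sleep_range(sleep_us);

	printf("sleep_us=%lu -> usleep_range(%lu, %lu)\n",
	       sleep_us, r.min_us, r.max_us);
}
```

Under that assumption, a delay of 1 asks for usleep_range(1, 1), while a delay of 10 asks for usleep_range(3, 10), which gives the timer code some slack to coalesce wakeups.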


2019-06-14 09:51:53

by Vivek Gautam

Subject: Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

Hi Marc,

On 6/13/2019 5:02 PM, Marc Gonzalez wrote:
> readl_poll_timeout() calls usleep_range() to sleep between reads.
> usleep_range() doesn't work efficiently for tiny values.
>
> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
> with the delay in qcom_qmp_phy_com_init().
>
> Signed-off-by: Marc Gonzalez <[email protected]>
> ---
> Vivek, do you remember why you didn't use the same delay value in
> qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?

The phy_qcom_init() logic came from the PCIe PHY driver in downstream
msm-3.18. PCIe did something like this:

-----
do {
        if (pcie_phy_is_ready(dev))
                break;
        retries++;
        usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN,
                                 REFCLK_STABILIZATION_DELAY_US_MAX);
} while (retries < PHY_READY_TIMEOUT_COUNT);

REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005
PHY_READY_TIMEOUT_COUNT ==> 10
-----


phy_enable() came from the downstream USB PHY driver:

-----
 /* Wait for PHY initialization to be done */
 do {
         if (readl_relaxed(phy->base +
                 phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS)
                 usleep_range(1, 2);
         else
                 break;
 } while (--init_timeout_usec);

init_timeout_usec ==> 1000
-----
USB never had a COM_PHY status bit.

So clearly the resolutions were different.
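
To put numbers on the two downstream loops quoted above, a small sketch that multiplies the iteration limit by the longest requested sleep. It ignores the per-call overhead of usleep_range(), so it is a nominal budget rather than a measured one; poll_budget_us() is a name made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Nominal worst-case time spent sleeping in a bounded polling loop:
 * iterations * longest requested sleep, in microseconds.
 * Real elapsed time is higher, since every sleep call has overhead. */
static unsigned long poll_budget_us(unsigned long iterations,
				    unsigned long max_sleep_us)
{
	assert(iterations > 0);
	return iterations * max_sleep_us;
}

/* Print the budgets for the two loops, using the constants quoted above. */
static void print_budgets(void)
{
	/* PCIe: PHY_READY_TIMEOUT_COUNT = 10, max sleep 1005 us */
	printf("PCIe: %lu us\n", poll_budget_us(10, 1005));
	/* USB: init_timeout_usec = 1000 iterations, max sleep 2 us */
	printf("USB:  %lu us\n", poll_budget_us(1000, 2));
}
```

With the quoted constants this gives roughly 10 ms for the PCIe loop, polled about once per millisecond, against 2 ms nominal for the USB loop, polled every 1 to 2 us.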

Does this change solve an issue at hand?

> ---
> drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index bb522b915fa9..34ff6434da8f 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
> status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
> mask = cfg->mask_pcs_ready;
>
> - ret = readl_poll_timeout(status, val, val & mask, 1,
> + ret = readl_poll_timeout(status, val, val & mask, 10,
> PHY_INIT_COMPLETE_TIMEOUT);
> if (ret) {
> dev_err(qmp->dev, "phy initialization timed-out\n");

2019-06-14 12:38:45

by Marc Gonzalez

Subject: Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

+ Doug (who is familiar with usleep_range quirks)

On 14/06/2019 11:50, Vivek Gautam wrote:

> On 6/13/2019 5:02 PM, Marc Gonzalez wrote:
>
>> readl_poll_timeout() calls usleep_range() to sleep between reads.
>> usleep_range() doesn't work efficiently for tiny values.
>>
>> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
>> with the delay in qcom_qmp_phy_com_init().
>>
>> Signed-off-by: Marc Gonzalez <[email protected]>
>> ---
>> Vivek, do you remember why you didn't use the same delay value in
>> qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?
>
> phy_qcom_init() thingy came from the PCIE phy driver from downstream
> msm-3.18 PCIE did something as below:

FWIW and IMO, drivers/pci/host/pci-msm.c is a good example of how not to write
a device driver. It's huge (7000+ lines) because it handles multiple platforms
via ifdefs, and lumps everything together (phy, core IP, SoC specific glue)
in a single file.

> -----
> do {
>         if (pcie_phy_is_ready(dev))
>                 break;
>         retries++;
>         usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN,
>                                  REFCLK_STABILIZATION_DELAY_US_MAX);
> } while (retries < PHY_READY_TIMEOUT_COUNT);
>
> REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005
> PHY_READY_TIMEOUT_COUNT ==> 10
> -----

https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n4624

https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n1721

readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) & BIT(6)
is equivalent to:
the check in qcom_qmp_phy_enable()

readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1
is equivalent to:
the check in qcom_qmp_phy_com_init()

I'll take a closer look, using some printks, to narrow down the run-time
execution path.

> phy_enable() from the usb phy driver from downstream.
>  /* Wait for PHY initialization to be done */
>  do {
>          if (readl_relaxed(phy->base +
>                  phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS)
>                  usleep_range(1, 2);
>          else
>                  break;
>  } while (--init_timeout_usec);
>
> init_timeout_usec ==> 1000
> -----
> USB never had a COM_PHY status bit.
>
> So clearly the resolutions were different.
>
> Does this change solve an issue at hand?

The issue is usleep_range() being misused ^_^

Although usleep_range() takes unsigned longs as parameters, it is
not appropriate over the entire 0-2^64 range.

a) It should not be used with tiny values, because the cost of programming
the timer interrupt, and processing the resulting IRQ would dominate.

b) It should not be used with large values (above 2000000/HZ) because
msleep() is more efficient, and is acceptable for these ranges.
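
To make the 2000000/HZ figure in b) concrete, a short sketch that evaluates it for a few common tick rates. The helper name usleep_upper_bound_us() is made up for this example, and the HZ values are typical configuration choices, not values taken from this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the 2000000/HZ bound from point b), in
 * microseconds. This equals two timer ticks at the given tick rate. */
static unsigned long usleep_upper_bound_us(unsigned long hz)
{
	assert(hz > 0);
	return 2000000UL / hz;
}

/* Print the bound for a few commonly configured tick rates. */
static void print_bounds(void)
{
	static const unsigned long rates[] = { 100, 250, 1000 };
	size_t i;

	for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
		printf("HZ=%lu -> %lu us\n", rates[i],
		       usleep_upper_bound_us(rates[i]));
}
```

So the bound is 20 ms at HZ=100, 8 ms at HZ=250 and 2 ms at HZ=1000.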

Regards.

2019-06-20 06:28:16

by Kishon Vijay Abraham I

Subject: Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

Hi,

On 14/06/19 6:08 PM, Marc Gonzalez wrote:
> + Doug (who is familiar with usleep_range quirks)
>
> On 14/06/2019 11:50, Vivek Gautam wrote:
>
>> On 6/13/2019 5:02 PM, Marc Gonzalez wrote:
>>
>>> readl_poll_timeout() calls usleep_range() to sleep between reads.
>>> usleep_range() doesn't work efficiently for tiny values.
>>>
>>> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
>>> with the delay in qcom_qmp_phy_com_init().
>>>
>>> Signed-off-by: Marc Gonzalez <[email protected]>
>>> ---
>>> Vivek, do you remember why you didn't use the same delay value in
>>> qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?
>>
>> phy_qcom_init() thingy came from the PCIE phy driver from downstream
>> msm-3.18 PCIE did something as below:
>
> FWIW and IMO, drivers/pci/host/pci-msm.c is a good example of how not to write
> a device driver. It's huge (7000+ lines) because it handles multiple platforms
> via ifdefs, and lumps everything together (phy, core IP, SoC specific glue)
> in a single file.
>
>> -----
>> do {
>>         if (pcie_phy_is_ready(dev))
>>                 break;
>>         retries++;
>>         usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN,
>>                                  REFCLK_STABILIZATION_DELAY_US_MAX);
>> } while (retries < PHY_READY_TIMEOUT_COUNT);
>>
>> REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005
>> PHY_READY_TIMEOUT_COUNT ==> 10
>> -----
>
> https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n4624
>
> https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n1721
>
> readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) & BIT(6)
> is equivalent to:
> the check in qcom_qmp_phy_enable()
>
> readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1
> is equivalent to:
> the check in qcom_qmp_phy_com_init()
>
> I'll take a closer look, using some printks, to narrow down the run-time
> execution path.
>
>> phy_enable() from the usb phy driver from downstream.
>>  /* Wait for PHY initialization to be done */
>>  do {
>>          if (readl_relaxed(phy->base +
>>                  phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS)
>>                  usleep_range(1, 2);
>>          else
>>                  break;
>>  } while (--init_timeout_usec);
>>
>> init_timeout_usec ==> 1000
>> -----
>> USB never had a COM_PHY status bit.
>>
>> So clearly the resolutions were different.
>>
>> Does this change solve an issue at hand?
>
> The issue is usleep_range() being misused ^_^
>
> Although usleep_range() takes unsigned longs as parameters, it is
> not appropriate over the entire 0-2^64 range.
>
> a) It should not be used with tiny values, because the cost of programming
> the timer interrupt, and processing the resulting IRQ would dominate.
>
> b) It should not be used with large values (above 2000000/HZ) because
> msleep() is more efficient, and is acceptable for these ranges.

Documentation/timers/timers-howto.txt describes the various kernel
delay/sleep mechanisms. For delays below ~10us, it recommends udelay()
(i.e. readx_poll_timeout_atomic). The delay mechanism should be chosen
according to timers-howto.txt, based on the actual timeout needed.

Thanks
Kishon
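
As a rough illustration of the guidance in timers-howto.txt, the sketch below maps a delay to a family of kernel primitives. The cut-offs (below about 10 us, up to about 20 ms, longer) follow that document's headings as best recalled here and are approximate; the enum and pick_delay_api() are hypothetical names, and this is user-space C that only models the decision.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical labels for the three families of kernel delay calls. */
enum delay_api {
	USE_BUSY_WAIT,     /* ndelay()/udelay()/mdelay() */
	USE_USLEEP_RANGE,  /* usleep_range(min, max) */
	USE_MSLEEP         /* msleep() */
};

/* Pick a family for a delay of delay_us microseconds.
 * Thresholds are approximate and taken as assumptions. */
static enum delay_api pick_delay_api(unsigned long delay_us,
				     int atomic_context)
{
	assert(delay_us > 0);

	/* Sleeping is not allowed in atomic context, and for very short
	 * delays the cost of setting up a timer outweighs the wait. */
	if (atomic_context || delay_us < 10)
		return USE_BUSY_WAIT;
	if (delay_us <= 20000)
		return USE_USLEEP_RANGE;
	return USE_MSLEEP;
}

static const char *delay_api_name(enum delay_api api)
{
	switch (api) {
	case USE_BUSY_WAIT:    return "busy-wait (udelay family)";
	case USE_USLEEP_RANGE: return "usleep_range()";
	default:               return "msleep()";
	}
}
```

For example, delay_api_name(pick_delay_api(10, 0)) yields "usleep_range()", whereas a 1 us delay maps to the busy-wait family.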

2019-06-24 14:00:45

by Marc Gonzalez

Subject: Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

On 20/06/2019 08:25, Kishon Vijay Abraham I wrote:

> On 14/06/19 6:08 PM, Marc Gonzalez wrote:
>
>> The issue is usleep_range() being misused ^_^
>>
>> Although usleep_range() takes unsigned longs as parameters, it is
>> not appropriate over the entire 0-2^64 range.
>>
>> a) It should not be used with tiny values, because the cost of programming
>> the timer interrupt, and processing the resulting IRQ would dominate.
>>
>> b) It should not be used with large values (above 2000000/HZ) because
>> msleep() is more efficient, and is acceptable for these ranges.
>
> Documentation/timers/timers-howto.txt has all the information on the various
> kernel delay/sleep mechanisms. For < ~10us, it recommends to use udelay
> (readx_poll_timeout_atomic). Depending on the actual timeout to be used, the
> delay mechanism in timers-howto.txt should be used.

Hello Kishon,

I believe the proposed patch does the right thing:

a) polling for the ready bit is not done in atomic context,
therefore we don't need to busy-loop

b) since we're ultimately calling usleep_range(), we should
pass an appropriate parameter, such as max_us = 10
(instead of max_us = 1, which is outside usleep_range spec)

Maybe it would help if someone reviewed this patch.

Regards.

2019-06-24 19:49:55

by Doug Anderson

Subject: Re: [PATCH v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay

Hi,

On Thu, Jun 13, 2019 at 8:28 AM Marc Gonzalez <[email protected]> wrote:
>
> readl_poll_timeout() calls usleep_range() to sleep between reads.
> usleep_range() doesn't work efficiently for tiny values.
>
> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line
> with the delay in qcom_qmp_phy_com_init().
>
> Signed-off-by: Marc Gonzalez <[email protected]>
> ---
> Vivek, do you remember why you didn't use the same delay value in
> qcom_qmp_phy_enable() and qcom_qmp_phy_com_init()?
> ---
> drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
> index bb522b915fa9..34ff6434da8f 100644
> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
> @@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
> status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
> mask = cfg->mask_pcs_ready;
>
> - ret = readl_poll_timeout(status, val, val & mask, 1,
> + ret = readl_poll_timeout(status, val, val & mask, 10,
> PHY_INIT_COMPLETE_TIMEOUT);

I would agree that the existing code is almost certainly wrong, since,
as you said, trying to sleep for 1 us is likely pointless. I quickly
coded up a test and ran it on sdm845-cheza. It looked like this:

--

ktime_t a, b, c;

a = ktime_get();
b = ktime_get();
usleep_range(1, 1);
c = ktime_get();

pr_info("DOUG: %d ns, %d ns\n", (int)ktime_to_ns(ktime_sub(b, a)),
(int)ktime_to_ns(ktime_sub(c, b)));

--

At bootup I got:

[ 4.121247] DOUG: 52 ns, 9479 ns
[ 4.144990] DOUG: 52 ns, 9636 ns
[ 4.328168] DOUG: 0 ns, 11667 ns
[ 4.332659] DOUG: 52 ns, 7136 ns
[ 4.358833] DOUG: 0 ns, 6666 ns
[ 4.362095] DOUG: 52 ns, 8229 ns

So basically the existing code is already waiting 5-10 us between
polls but it's spending all of that time context switching. Changing
the above to:

usleep_range(5, 10);

gives me instead:

[ 4.120781] DOUG: 52 ns, 16927 ns
[ 4.144626] DOUG: 53 ns, 17447 ns
[ 4.327932] DOUG: 52 ns, 11302 ns
[ 4.332501] DOUG: 0 ns, 7395 ns
[ 4.357912] DOUG: 0 ns, 6823 ns
[ 4.361175] DOUG: 52 ns, 9063 ns

...and that seems fine to me.
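
For readers without a kernel tree handy, here is a rough user-space analog of the test above. It swaps usleep_range() for POSIX nanosleep() and ktime_get() for clock_gettime(CLOCK_MONOTONIC), so it measures the cost of a tiny sleep request from user space, not the kernel path discussed in this thread. now_ns() and measure_sleep_ns() are names made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Request a sleep of request_ns nanoseconds and return how long the
 * call actually took. An early return caused by a signal is not
 * handled here. */
static long long measure_sleep_ns(long request_ns)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = request_ns };
	long long start, end;

	assert(request_ns >= 0 && request_ns < 1000000000L);
	start = now_ns();
	nanosleep(&req, NULL);
	end = now_ns();
	return end - start;
}

/* Print requested versus measured duration for a few runs. */
static void report(long request_ns, int runs)
{
	int i;

	for (i = 0; i < runs; i++)
		printf("requested %ld ns, took %lld ns\n",
		       request_ns, measure_sleep_ns(request_ns));
}
```

Calling report(1000, 6) prints six samples for a 1 us request. The values depend on the machine and its timer configuration, so no expected numbers are given.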

--

Thus:

Reviewed-by: Douglas Anderson <[email protected]>