Hi all,
When implementing a polling loop, a common review comment is to use one
of the read*_poll_timeout*() helpers. Unfortunately this is not always
possible, or might introduce subtle bugs. This patch series aims to
improve these helpers, so they can gain wider use.
1. The first patch improves busy-looping behavior of both the atomic
and non-atomic read*_poll_timeout*() helpers.
The issue addressed by this patch was discussed before[1-2], but I
am not aware of any patches moving forward.
2. The second patch fixes timeout handling of the atomic variants.
Some of the issues addressed by this patch were mitigated in
various places[3-5], and some of these findings may be of interest
to the authors of [6-8].
The first two patches were sent before, and already received some acks
and reviews. I plan to queue these in an immutable and tagged branch
after the weekend, for consumption by myself, and by other interested
parties.
The last five patches are new, and convert several Renesas-specific
drivers from open-coded loops to the fixed read*_poll_timeout_atomic()
helpers:
- I plan to queue the clk and soc patches in renesas-devel resp.
renesas-clk for v6.5,
- The IOMMU patch can wait for v6.6, unless the IOMMU maintainers
already want to merge the immutable branch, too.
Thanks for your comments!
Changes compared to v2[9]:
- Add Acked-by, Reviewed-by,
- Add comment about not using timekeeping, and its impact,
- Add conversion patches 3-7.
Changes compared to v1[10]:
- Add Acked-by,
- Add compiler barrier and inlining explanation (thanks, Peter!),
- Add patch [2/2].
Thanks for your comments!
[1] "Re: [PATCH 6/7] clk: renesas: rcar-gen3: Add custom clock for PLLs"
https://lore.kernel.org/all/CAMuHMdWUEhs=nwP+a0vO2jOzkq-7FEOqcJ+SsxAGNXX1PQ2KMA@mail.gmail.com
[2] "Re: [PATCH v2] clk: samsung: Prevent potential endless loop in the
PLL set_rate ops"
https://lore.kernel.org/all/20200811164628.GA7958@kozik-lap
[3] b3e9431854e8f305 ("bus: ti-sysc: Fix timekeeping_suspended warning
on resume")
[4] 44a9e78f9242872c ("clk: samsung: Prevent potential endless loop in
the PLL ops")
[5] 3d8598fb9c5a7783 ("clk: ti: clkctrl: use fallback udelay approach if
timekeeping is suspended")
[6] bd40cbb0e3b37a4d ("PM: domains: Move genpd's time-accounting to
ktime_get_mono_fast_ns()")
[7] c155f6499f9797f2 ("PM-runtime: Switch accounting over to
ktime_get_mono_fast_ns()")
[8] 15efb47dc560849d ("PM-runtime: Fix deadlock with ktime_get()")
[9] https://lore.kernel.org/all/[email protected]
[10] https://lore.kernel.org/r/8d492ee4a391bd089a01c218b0b4e05cf8ea593c.1674729407.git.geert+renesas@glider.be/
Geert Uytterhoeven (7):
iopoll: Call cpu_relax() in busy loops
iopoll: Do not use timekeeping in read_poll_timeout_atomic()
clk: renesas: cpg-mssr: Convert to readl_poll_timeout_atomic()
clk: renesas: mstp: Convert to readl_poll_timeout_atomic()
clk: renesas: rzg2l: Convert to readl_poll_timeout_atomic()
soc: renesas: rmobile-sysc: Convert to readl_poll_timeout_atomic()
iommu/ipmmu-vmsa: Convert to read_poll_timeout_atomic()
drivers/clk/renesas/clk-mstp.c | 18 ++++++---------
drivers/clk/renesas/renesas-cpg-mssr.c | 31 +++++++++-----------------
drivers/clk/renesas/rzg2l-cpg.c | 16 +++++--------
drivers/iommu/ipmmu-vmsa.c | 15 +++++--------
drivers/soc/renesas/rmobile-sysc.c | 31 +++++++++-----------------
include/linux/iopoll.h | 24 +++++++++++++++-----
6 files changed, 58 insertions(+), 77 deletions(-)
--
2.34.1
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Use readl_poll_timeout_atomic() instead of open-coding the same
operation.
As typically no retries are needed, 10 µs is a suitable timeout value.
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Polling measurements done on RZ/Five.
v3:
- New.
---
drivers/clk/renesas/rzg2l-cpg.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/clk/renesas/rzg2l-cpg.c b/drivers/clk/renesas/rzg2l-cpg.c
index ca8b921c77625317..bc623515ad843cf5 100644
--- a/drivers/clk/renesas/rzg2l-cpg.c
+++ b/drivers/clk/renesas/rzg2l-cpg.c
@@ -903,9 +903,9 @@ static int rzg2l_mod_clock_endisable(struct clk_hw *hw, bool enable)
unsigned int reg = clock->off;
struct device *dev = priv->dev;
unsigned long flags;
- unsigned int i;
u32 bitmask = BIT(clock->bit);
u32 value;
+ int error;
if (!clock->off) {
dev_dbg(dev, "%pC does not support ON/OFF\n", hw->clk);
@@ -930,19 +930,13 @@ static int rzg2l_mod_clock_endisable(struct clk_hw *hw, bool enable)
if (!priv->info->has_clk_mon_regs)
return 0;
- for (i = 1000; i > 0; --i) {
- if (((readl(priv->base + CLK_MON_R(reg))) & bitmask))
- break;
- cpu_relax();
- }
-
- if (!i) {
+ error = readl_poll_timeout_atomic(priv->base + CLK_MON_R(reg), value,
+ value & bitmask, 0, 10);
+ if (error)
dev_err(dev, "Failed to enable CLK_ON %p\n",
priv->base + CLK_ON_R(reg));
- return -ETIMEDOUT;
- }
- return 0;
+ return error;
}
static int rzg2l_mod_clock_enable(struct clk_hw *hw)
--
2.34.1
It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can not
only lower CPU power consumption or yield to a hyperthreaded twin
processor, but also allows an architecture to mitigate hardware issues
(e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
architecture-specific cpu_relax() implementation.
In addition, cpu_relax() is also a compiler barrier. It is not
immediately obvious that the @op argument "function" will result in an
actual function call (e.g. in case of inlining).
Where a function call is a C sequence point, this is lost on inlining.
Therefore, with agressive enough optimization it might be possible for
the compiler to hoist the:
(val) = op(args);
"load" out of the loop because it doesn't see the value changing. The
addition of cpu_relax() would inhibit this.
As the iopoll helpers lack calls to cpu_relax(), people are sometimes
reluctant to use them, and may fall back to open-coded polling loops
(including cpu_relax() calls) instead.
Fix this by adding calls to cpu_relax() to the iopoll helpers:
- For the non-atomic case, it is sufficient to call cpu_relax() in
case of a zero sleep-between-reads value, as a call to
usleep_range() is a safe barrier otherwise. However, it doesn't
hurt to add the call regardless, for simplicity, and for similarity
with the atomic case below.
- For the atomic case, cpu_relax() must be called regardless of the
sleep-between-reads value, as there is no guarantee all
architecture-specific implementations of udelay() handle this.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Reviewed-by: Tony Lindgren <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
---
v3:
- Add Reviewed-by,
v2:
- Add Acked-by,
- Add compiler barrier and inlining explanation (thanks, Peter!).
---
include/linux/iopoll.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
index 2c8860e406bd8cae..0417360a6db9b0d6 100644
--- a/include/linux/iopoll.h
+++ b/include/linux/iopoll.h
@@ -53,6 +53,7 @@
} \
if (__sleep_us) \
usleep_range((__sleep_us >> 2) + 1, __sleep_us); \
+ cpu_relax(); \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
@@ -95,6 +96,7 @@
} \
if (__delay_us) \
udelay(__delay_us); \
+ cpu_relax(); \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
--
2.34.1
Use readl_poll_timeout_atomic() instead of open-coding the same
operation.
As typically no retries are needed, 10 µs is a suitable timeout value.
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Polling measurements done on R-Car M2-W, H3 ES2.0, M3-W, M3-N, E3, and
V4H.
v3:
- New.
---
drivers/clk/renesas/renesas-cpg-mssr.c | 31 +++++++++-----------------
1 file changed, 11 insertions(+), 20 deletions(-)
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index e9c0e341380e8f55..2772499d20165127 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -17,6 +17,7 @@
#include <linux/device.h>
#include <linux/init.h>
#include <linux/io.h>
+#include <linux/iopoll.h>
#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/of_address.h>
@@ -196,8 +197,8 @@ static int cpg_mstp_clock_endisable(struct clk_hw *hw, bool enable)
struct device *dev = priv->dev;
u32 bitmask = BIT(bit);
unsigned long flags;
- unsigned int i;
u32 value;
+ int error;
dev_dbg(dev, "MSTP %u%02u/%pC %s\n", reg, bit, hw->clk,
enable ? "ON" : "OFF");
@@ -228,19 +229,13 @@ static int cpg_mstp_clock_endisable(struct clk_hw *hw, bool enable)
if (!enable || priv->reg_layout == CLK_REG_LAYOUT_RZ_A)
return 0;
- for (i = 1000; i > 0; --i) {
- if (!(readl(priv->base + priv->status_regs[reg]) & bitmask))
- break;
- cpu_relax();
- }
-
- if (!i) {
+ error = readl_poll_timeout_atomic(priv->base + priv->status_regs[reg],
+ value, !(value & bitmask), 0, 10);
+ if (error)
dev_err(dev, "Failed to enable SMSTP %p[%d]\n",
priv->base + priv->control_regs[reg], bit);
- return -ETIMEDOUT;
- }
- return 0;
+ return error;
}
static int cpg_mstp_clock_enable(struct clk_hw *hw)
@@ -896,8 +891,9 @@ static int cpg_mssr_suspend_noirq(struct device *dev)
static int cpg_mssr_resume_noirq(struct device *dev)
{
struct cpg_mssr_priv *priv = dev_get_drvdata(dev);
- unsigned int reg, i;
+ unsigned int reg;
u32 mask, oldval, newval;
+ int error;
/* This is the best we can do to check for the presence of PSCI */
if (!psci_ops.cpu_suspend)
@@ -935,14 +931,9 @@ static int cpg_mssr_resume_noirq(struct device *dev)
if (!mask)
continue;
- for (i = 1000; i > 0; --i) {
- oldval = readl(priv->base + priv->status_regs[reg]);
- if (!(oldval & mask))
- break;
- cpu_relax();
- }
-
- if (!i)
+ error = readl_poll_timeout_atomic(priv->base + priv->status_regs[reg],
+ oldval, !(oldval & mask), 0, 10);
+ if (error)
dev_warn(dev, "Failed to enable SMSTP%u[0x%x]\n", reg,
oldval & mask);
}
--
2.34.1
Use readl_poll_timeout_atomic() instead of open-coding the same
operation.
As typically no retries are needed, 10 µs is a suitable timeout value.
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Polling measurements done on R-Mobile APE6 and A1, R-Car H1, and
SH-Mobile AG5.
v3:
- New.
---
drivers/clk/renesas/clk-mstp.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/clk/renesas/clk-mstp.c b/drivers/clk/renesas/clk-mstp.c
index 6e3c4a9c16b07ae9..e96457371b4cce88 100644
--- a/drivers/clk/renesas/clk-mstp.c
+++ b/drivers/clk/renesas/clk-mstp.c
@@ -14,6 +14,7 @@
#include <linux/clk/renesas.h>
#include <linux/device.h>
#include <linux/io.h>
+#include <linux/iopoll.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/pm_clock.h>
@@ -78,8 +79,8 @@ static int cpg_mstp_clock_endisable(struct clk_hw *hw, bool enable)
struct mstp_clock_group *group = clock->group;
u32 bitmask = BIT(clock->bit_index);
unsigned long flags;
- unsigned int i;
u32 value;
+ int ret;
spin_lock_irqsave(&group->lock, flags);
@@ -102,19 +103,14 @@ static int cpg_mstp_clock_endisable(struct clk_hw *hw, bool enable)
if (!enable || !group->mstpsr)
return 0;
- for (i = 1000; i > 0; --i) {
- if (!(cpg_mstp_read(group, group->mstpsr) & bitmask))
- break;
- cpu_relax();
- }
-
- if (!i) {
+ /* group->width_8bit is always false if group->mstpsr is present */
+ ret = readl_poll_timeout_atomic(group->mstpsr, value,
+ !(value & bitmask), 0, 10);
+ if (ret)
pr_err("%s: failed to enable %p[%d]\n", __func__,
group->smstpcr, clock->bit_index);
- return -ETIMEDOUT;
- }
- return 0;
+ return ret;
}
static int cpg_mstp_clock_enable(struct clk_hw *hw)
--
2.34.1
Use read_poll_timeout_atomic() instead of open-coding the same
operation.
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
v3:
- New.
---
drivers/iommu/ipmmu-vmsa.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9f64c5c9f5b90ace..3b58a8ea3bdef190 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -14,6 +14,7 @@
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/io.h>
+#include <linux/iopoll.h>
#include <linux/io-pgtable.h>
#include <linux/iommu.h>
#include <linux/of.h>
@@ -253,17 +254,13 @@ static void ipmmu_imuctr_write(struct ipmmu_vmsa_device *mmu,
/* Wait for any pending TLB invalidations to complete */
static void ipmmu_tlb_sync(struct ipmmu_vmsa_domain *domain)
{
- unsigned int count = 0;
+ u32 val;
- while (ipmmu_ctx_read_root(domain, IMCTR) & IMCTR_FLUSH) {
- cpu_relax();
- if (++count == TLB_LOOP_TIMEOUT) {
- dev_err_ratelimited(domain->mmu->dev,
+ if (read_poll_timeout_atomic(ipmmu_ctx_read_root, val,
+ !(val & IMCTR_FLUSH), 1, TLB_LOOP_TIMEOUT,
+ false, domain, IMCTR))
+ dev_err_ratelimited(domain->mmu->dev,
"TLB sync timed out -- MMU may be deadlocked\n");
- return;
- }
- udelay(1);
- }
}
static void ipmmu_tlb_invalidate(struct ipmmu_vmsa_domain *domain)
--
2.34.1
On Fri, Jun 2, 2023 at 10:51 AM Geert Uytterhoeven
<[email protected]> wrote:
> When implementing a polling loop, a common review comment is to use one
> of the read*_poll_timeout*() helpers. Unfortunately this is not always
> possible, or might introduce subtle bugs. This patch series aims to
> improve these helpers, so they can gain wider use.
>
> 1. The first patch improves busy-looping behavior of both the atomic
> and non-atomic read*_poll_timeout*() helpers.
> The issue addressed by this patch was discussed before[1-2], but I
> am not aware of any patches moving forward.
>
> 2. The second patch fixes timeout handling of the atomic variants.
> Some of the issues addressed by this patch were mitigated in
> various places[3-5], and some of these findings may be of interest
> to the authors of [6-8].
>
> The first two patches were sent before, and already received some acks
> and reviews. I plan to queue these in an immutable and tagged branch
> after the weekend, for consumption by myself, and by other interested
> parties.
FTR...
The following changes since commit ac9a78681b921877518763ba0e89202254349d1b:
Linux 6.4-rc1 (2023-05-07 13:34:35 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git
tags/iopoll-busy-loop-timeout-tag
for you to fetch changes up to 7349a69cf3125e92d48e442d9f400ba446fa314f:
iopoll: Do not use timekeeping in read_poll_timeout_atomic()
(2023-06-05 15:35:27 +0200)
----------------------------------------------------------------
iopoll: Busy loop and timeout improvements
----------------------------------------------------------------
Geert Uytterhoeven (2):
iopoll: Call cpu_relax() in busy loops
iopoll: Do not use timekeeping in read_poll_timeout_atomic()
include/linux/iopoll.h | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds