2021-09-30 08:06:18

by Tony Lindgren

Subject: [PATCH] soc: ti: omap-prm: Fix external abort for am335x pruss

Starting with v5.15-rc1, some am335x BeagleBone Black devices may produce
the following error on pruss probe:

Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0326000

This has started with the enabling of pruss for am335x in the dts files.

Turns out this is caused by the PRM reset handling not waiting for the
reset bit to clear. To fix the issue, let's always wait for the reset
bit to clear, even if there is a separate reset status register.

We attempted to fix a similar issue for dra7 iva with a udelay() in
commit effe89e40037 ("soc: ti: omap-prm: Fix occasional abort on reset
deassert for dra7 iva"). There is no longer a need for the udelay()
for dra7 iva reset either with the check added for reset bit clearing.

Cc: Drew Fustini <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: "H. Nikolaus Schaller" <[email protected]>
Cc: Robert Nelson <[email protected]>
Cc: Yongqin Liu <[email protected]>
Reported-by: Matti Vaittinen <[email protected]>
Fixes: effe89e40037 ("soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva")
Signed-off-by: Tony Lindgren <[email protected]>
---
drivers/soc/ti/omap_prm.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
--- a/drivers/soc/ti/omap_prm.c
+++ b/drivers/soc/ti/omap_prm.c
@@ -825,25 +825,28 @@ static int omap_reset_deassert(struct reset_controller_dev *rcdev,
 	writel_relaxed(v, reset->prm->base + reset->prm->data->rstctrl);
 	spin_unlock_irqrestore(&reset->lock, flags);

-	if (!has_rstst)
-		goto exit;
+	/* wait for the reset bit to clear */
+	ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
+						reset->prm->data->rstctrl,
+						v, !(v & BIT(id)), 1,
+						OMAP_RESET_MAX_WAIT);
+	if (ret)
+		pr_err("%s: timedout waiting for %s:%lu\n", __func__,
+		       reset->prm->data->name, id);

 	/* wait for the status to be set */
-	ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
+	if (has_rstst) {
+		ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
 						 reset->prm->data->rstst,
 						 v, v & BIT(st_bit), 1,
 						 OMAP_RESET_MAX_WAIT);
-	if (ret)
-		pr_err("%s: timedout waiting for %s:%lu\n", __func__,
-		       reset->prm->data->name, id);
+		if (ret)
+			pr_err("%s: timedout waiting for %s:%lu\n", __func__,
+			       reset->prm->data->name, id);
+	}

-exit:
-	if (reset->clkdm) {
-		/* At least dra7 iva needs a delay before clkdm idle */
-		if (has_rstst)
-			udelay(1);
+	if (reset->clkdm)
 		pdata->clkdm_allow_idle(reset->clkdm);
-	}

 	return ret;
 }
--
2.33.0
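
The core idea of the patch, polling the reset control register until the
reset bit reads back as cleared or a bounded wait expires, can be sketched
in plain user-space C. This is only an illustration and not the driver
code: struct fake_reg, fake_readl() and poll_bit_clear() are hypothetical
names, the "register" is a simulated value that clears after a set number
of reads, and the timeout is counted in polls rather than microseconds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's BIT() macro, restricted to 32-bit values. */
#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical stand-in for a memory-mapped reset control register. */
struct fake_reg {
	uint32_t value;                 /* current register contents */
	unsigned int reads_until_clear; /* simulated hardware latency, in reads */
	uint32_t clear_mask;            /* bits the "hardware" eventually clears */
};

/* Simulated register read: the masked bits clear once the latency runs out. */
static uint32_t fake_readl(struct fake_reg *reg)
{
	if (reg->reads_until_clear > 0)
		reg->reads_until_clear--;
	else
		reg->value &= ~reg->clear_mask;

	return reg->value;
}

/*
 * Poll until every bit in mask reads as zero, giving up after max_polls
 * reads. Returns 0 on success and -ETIMEDOUT if the bits never cleared.
 */
static int poll_bit_clear(struct fake_reg *reg, uint32_t mask,
			  unsigned int max_polls)
{
	unsigned int i;

	assert(reg != NULL);

	for (i = 0; i < max_polls; i++) {
		if (!(fake_readl(reg) & mask))
			return 0;
	}

	return -ETIMEDOUT;
}
```

A caller would only touch the block behind the reset after
poll_bit_clear() has returned 0; a non-zero return corresponds to the
"timedout waiting" message in the patch.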


2021-09-30 11:21:54

by Matti Vaittinen

Subject: Re: [PATCH] soc: ti: omap-prm: Fix external abort for am335x pruss

Thanks Tony!

This was _much_ appreciated :)

On 9/30/21 11:01, Tony Lindgren wrote:
> Starting with v5.15-rc1, some am335x BeagleBone Black devices may produce
> the following error on pruss probe:
>
> Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0326000
>
> This has started with the enabling of pruss for am335x in the dts files.
>
> Turns out this is caused by the PRM reset handling not waiting for the
> reset bit to clear. To fix the issue, let's always wait for the reset
> bit to clear, even if there is a separate reset status register.
>
> We attempted to fix a similar issue for dra7 iva with a udelay() in
> commit effe89e40037 ("soc: ti: omap-prm: Fix occasional abort on reset
> deassert for dra7 iva"). There is no longer a need for the udelay()
> for dra7 iva reset either with the check added for reset bit clearing.
>
> Cc: Drew Fustini <[email protected]>
> Cc: Grygorii Strashko <[email protected]>
> Cc: "H. Nikolaus Schaller" <[email protected]>
> Cc: Robert Nelson <[email protected]>
> Cc: Yongqin Liu <[email protected]>
> Reported-by: Matti Vaittinen <[email protected]>
> Fixes: effe89e40037 ("soc: ti: omap-prm: Fix occasional abort on reset deassert for dra7 iva")

Tested-by: Matti Vaittinen <[email protected]>

> Signed-off-by: Tony Lindgren <[email protected]>
> ---
> drivers/soc/ti/omap_prm.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
> --- a/drivers/soc/ti/omap_prm.c
> +++ b/drivers/soc/ti/omap_prm.c
> @@ -825,25 +825,28 @@ static int omap_reset_deassert(struct reset_controller_dev *rcdev,
> writel_relaxed(v, reset->prm->base + reset->prm->data->rstctrl);
> spin_unlock_irqrestore(&reset->lock, flags);
>
> - if (!has_rstst)
> - goto exit;
> + /* wait for the reset bit to clear */
> + ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
> + reset->prm->data->rstctrl,
> + v, !(v & BIT(id)), 1,
> + OMAP_RESET_MAX_WAIT);
> + if (ret)
> + pr_err("%s: timedout waiting for %s:%lu\n", __func__,
> + reset->prm->data->name, id);

If I were writing this I might drop the __func__. AFAIR dyndbg allows
function names to be printed by enabling the +f flag. This is just a 'nit'
though - I am happy if this fix gets in no matter what this print
eventually looks like. I just thought I'd mention this as the __func__
caught my eye.

>
> /* wait for the status to be set */
> - ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
> + if (has_rstst) {
> + ret = readl_relaxed_poll_timeout_atomic(reset->prm->base +
> reset->prm->data->rstst,
> v, v & BIT(st_bit), 1,
> OMAP_RESET_MAX_WAIT);
> - if (ret)
> - pr_err("%s: timedout waiting for %s:%lu\n", __func__,
> - reset->prm->data->name, id);
> + if (ret)
> + pr_err("%s: timedout waiting for %s:%lu\n", __func__,
> + reset->prm->data->name, id);

Same here (although that would be an unrelated change, as the print existed
prior to this patch).

I tested this patch on v5.15-rc3 using my BBB Rev C - it seems to fix
the boot issue on my board! Thanks a bunch!


Best Regards
--Matti Vaittinen


2021-10-06 05:11:58

by Tony Lindgren

Subject: Re: [PATCH] soc: ti: omap-prm: Fix external abort for am335x pruss

* Matti Vaittinen <[email protected]> [210930 11:20]:
> Thanks Tony!
>
> This was _much_ appreciated :)

Thanks for testing. Applying this into fixes.

Regards,

Tony