2022-11-25 16:45:11

by Ricardo Ribalda

Subject: [PATCH] iommu/mediatek: Fix crash on isr after kexec()

If the system is rebooted via kexec(), the IRQ handler might be triggered
before the domain is initialized, resulting in an invalid memory access
error.

Fix:
[ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
[ 0.501166] Call trace:
[ 0.501174] report_iommu_fault+0x28/0xfc
[ 0.501180] mtk_iommu_isr+0x10c/0x1c0

Signed-off-by: Ricardo Ribalda <[email protected]>
---
To: Yong Wu <[email protected]>
To: Joerg Roedel <[email protected]>
To: Will Deacon <[email protected]>
To: Robin Murphy <[email protected]>
To: Matthias Brugger <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/iommu/mtk_iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2ab2ecfe01f8..17f6be5a5097 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 			fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 	}
 
-	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
+	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
 			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
 		dev_err_ratelimited(
 			bank->parent_dev,

---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221125-mtk-iommu-13023f971298

Best regards,
--
Ricardo Ribalda <[email protected]>


2022-11-25 17:46:59

by Robin Murphy

Subject: Re: [PATCH] iommu/mediatek: Fix crash on isr after kexec()

On 2022-11-25 16:28, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be triggered
> before the domain is initialized, resulting in an invalid memory access
> error.
>
> Fix:
> [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> [ 0.501166] Call trace:
> [ 0.501174] report_iommu_fault+0x28/0xfc
> [ 0.501180] mtk_iommu_isr+0x10c/0x1c0

Hmm, shouldn't we clear any pending faults at probe in
mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might
still want to be robust against a spurious interrupt, but then it can
simply return without doing anything at all if the domain is NULL, since
we'll know that's the case.

Thanks,
Robin.

(It might be nice if request_irq() had a flag to say "if this IRQ looks
pending already just clear it" for drivers that know it could only be
spurious at that point; kexec seems to lead to this problem quite a lot...)
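The early-return idea for the handler can be sketched in plain C. This is only a user-space model under stated assumptions: `mock_bank`, `mock_domain`, `mock_clear_fault()` and `mock_isr()` are hypothetical stand-ins, not the real mtk_iommu structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real kernel structs. */
struct mock_domain {
	int faults_reported;
};

struct mock_bank {
	struct mock_domain *dom;	/* NULL until a domain is attached */
	unsigned int pending;		/* nonzero while the mock hardware has a fault latched */
};

enum mock_irqreturn { MOCK_IRQ_NONE, MOCK_IRQ_HANDLED };

/* Acknowledge whatever fault the mock hardware has latched. */
void mock_clear_fault(struct mock_bank *bank)
{
	bank->pending = 0;
}

enum mock_irqreturn mock_isr(struct mock_bank *bank)
{
	struct mock_domain *dom = bank->dom;

	if (!bank->pending)
		return MOCK_IRQ_NONE;

	if (!dom) {
		/* No domain yet: treat as spurious, acknowledge, and do nothing else. */
		mock_clear_fault(bank);
		return MOCK_IRQ_HANDLED;
	}

	/* Normal path: dom is known to be non-NULL here. */
	dom->faults_reported++;
	mock_clear_fault(bank);
	return MOCK_IRQ_HANDLED;
}
```

Calling `mock_isr()` on a bank whose `dom` is still NULL only acknowledges the fault; the domain is never dereferenced.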

> Signed-off-by: Ricardo Ribalda <[email protected]>
> ---
> To: Yong Wu <[email protected]>
> To: Joerg Roedel <[email protected]>
> To: Will Deacon <[email protected]>
> To: Robin Murphy <[email protected]>
> To: Matthias Brugger <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/iommu/mtk_iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> }
>
> - if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> + if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> dev_err_ratelimited(
> bank->parent_dev,
>
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
>
> Best regards,

2022-11-25 18:17:57

by Ricardo Ribalda

Subject: Re: [PATCH] iommu/mediatek: Fix crash on isr after kexec()

Hi Robin


Thanks for your review!

On Fri, 25 Nov 2022 at 18:02, Robin Murphy <[email protected]> wrote:
>
> On 2022-11-25 16:28, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be triggered
> > before the domain is initialized, resulting in an invalid memory access
> > error.
> >
> > Fix:
> > [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> > [ 0.501166] Call trace:
> > [ 0.501174] report_iommu_fault+0x28/0xfc
> > [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
>
> Hmm, shouldn't we clear any pending faults at probe in
> mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might
> still want to be robust against a spurious interrupt, but then it can
> simply return without doing anything at all if the domain is NULL, since
> we'll know that's the case.
>
> Thanks,
> Robin.
>
> (It might be nice if request_irq() had a flag to say "if this IRQ looks
> pending already just clear it" for drivers that know it could only be
> spurious at that point; kexec seems to lead to this problem quite a lot...)

It is not only about the "last" IRQ before kexec. The peripherals
behind the IOMMU might still be active and producing faults, and
therefore IRQs.

I tried this:

@@ -886,6 +886,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban
 		upper_32_bits(data->protect_base);
 	writel_relaxed(regval, bankx->base + REG_MMU_IVRP_PADDR);
 
+	/* Clear previous IRQs */
+	regval = readl_relaxed(bankx->base + REG_MMU_INT_CONTROL0);
+	regval |= F_INT_CLR_BIT;
+	writel_relaxed(regval, bankx->base + REG_MMU_INT_CONTROL0);
+
 	if (devm_request_irq(bankx->pdev, bankx->irq, mtk_iommu_isr, 0,
 			     dev_name(bankx->pdev), (void *)bankx)) {
 		writel_relaxed(0, bankx->base + REG_MMU_PT_BASE_ADDR);

And I still get the same crash.
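A toy model of why a one-time clear at init may not be enough, assuming the explanation above (devices left running by the previous kernel keep faulting). Everything here (`mock_iommu`, `mock_hw_init()`, `mock_device_dma()`, `mock_isr()`) is hypothetical and does not reflect real register behavior.

```c
#include <stddef.h>

/* Hypothetical model of one IOMMU bank; not the real hardware or driver state. */
struct mock_iommu {
	unsigned int pending;		/* latched fault interrupts */
	void *domain;			/* NULL until a domain is attached */
	unsigned int null_domain_irqs;	/* ISR runs that saw no domain */
};

/* Model of the attempted change: drop anything latched before the IRQ is requested. */
void mock_hw_init(struct mock_iommu *m)
{
	m->pending = 0;
}

/* A device that was never stopped issues a bad access and latches a new fault. */
void mock_device_dma(struct mock_iommu *m)
{
	m->pending++;
}

/* The handler runs for any latched fault, whether or not a domain exists yet. */
void mock_isr(struct mock_iommu *m)
{
	if (!m->pending)
		return;
	if (!m->domain)
		m->null_domain_irqs++;	/* the unguarded driver would dereference NULL here */
	m->pending = 0;
}
```

In this model, a `mock_device_dma()` call between `mock_hw_init()` and domain attach still leads `mock_isr()` to observe a NULL domain, so the handler needs its own guard regardless of the clear.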


>
> > Signed-off-by: Ricardo Ribalda <[email protected]>
> > ---
> > To: Yong Wu <[email protected]>
> > To: Joerg Roedel <[email protected]>
> > To: Will Deacon <[email protected]>
> > To: Robin Murphy <[email protected]>
> > To: Matthias Brugger <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > drivers/iommu/mtk_iommu.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> > fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> > }
> >
> > - if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > + if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> > dev_err_ratelimited(
> > bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,



--
Ricardo Ribalda

2022-11-28 07:31:10

by Yong Wu (吴勇)

Subject: Re: [PATCH] iommu/mediatek: Fix crash on isr after kexec()

On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be triggered
> before the domain is initialized, resulting in an invalid memory access
> error.
>
> Fix:
> [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> [ 0.501166] Call trace:
> [ 0.501174] report_iommu_fault+0x28/0xfc
> [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
>
> Signed-off-by: Ricardo Ribalda <[email protected]>
> ---
> To: Yong Wu <[email protected]>
> To: Joerg Roedel <[email protected]>
> To: Will Deacon <[email protected]>
> To: Robin Murphy <[email protected]>
> To: Matthias Brugger <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/iommu/mtk_iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> }
>
> - if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> + if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,


On which SoC does this issue happen? Does it happen with the upstream
kernel or with a downstream kernel?

Normally each port enables the IOMMU by default. Let's print the error
log even when "dom" is NULL, to check which port fails here, and then
analyse that port's behavior.

if (!dom || report_iommu_fault(xx))
	dev_err_ratelimited(xx)
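The difference between the two conditions comes down to short-circuit evaluation, which can be sketched in isolation. The names below (`mock_domain`, `mock_report_fault()`, `logs_fault_v1()`, `logs_fault_v2()`) are hypothetical; the mock only assumes that a nonzero return means the fault was not handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain; not the real struct iommu_domain. */
struct mock_domain {
	bool handler_fails;
};

/* Assumed convention for this sketch: nonzero means the fault was not handled. */
int mock_report_fault(const struct mock_domain *dom)
{
	return dom->handler_fails ? -1 : 0;
}

/* "dom && ...": a NULL domain skips both the report and the error log. */
bool logs_fault_v1(const struct mock_domain *dom)
{
	return dom && mock_report_fault(dom);
}

/* "!dom || ...": a NULL domain skips the report but still reaches the error log. */
bool logs_fault_v2(const struct mock_domain *dom)
{
	return !dom || mock_report_fault(dom);
}
```

Both forms avoid dereferencing a NULL `dom`; they differ only in whether the NULL case is logged, which is what makes the faulting port visible.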

> write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> dev_err_ratelimited(
> bank->parent_dev,
>
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
>
> Best regards,

2022-11-28 22:40:05

by Ricardo Ribalda

Subject: Re: [PATCH] iommu/mediatek: Fix crash on isr after kexec()

Hi Yong


On Mon, 28 Nov 2022 at 07:44, Yong Wu (吴勇) <[email protected]> wrote:
>
> On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be triggered
> > before the domain is initialized, resulting in an invalid memory access
> > error.
> >
> > Fix:
> > [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> > [ 0.501166] Call trace:
> > [ 0.501174] report_iommu_fault+0x28/0xfc
> > [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
> >
> > Signed-off-by: Ricardo Ribalda <[email protected]>
> > ---
> > To: Yong Wu <[email protected]>
> > To: Joerg Roedel <[email protected]>
> > To: Will Deacon <[email protected]>
> > To: Robin Murphy <[email protected]>
> > To: Matthias Brugger <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > drivers/iommu/mtk_iommu.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> > fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> > }
> >
> > - if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > + if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
>
>
> On which SoC does this issue happen? Does it happen with the upstream
> kernel or with a downstream kernel?

I am using chromeos-5.10 and chromeos-5.15 (which are pretty much upstream).

I have seen this issue on at least MT8195 and MT8183.


>
> Normally each port enables the IOMMU by default. Let's print the error
> log even when "dom" is NULL, to check which port fails here, and then
> analyse that port's behavior.
>
> if (!dom || report_iommu_fault(xx))
> dev_err_ratelimited(xx)

Sending a v2 with the change.

Thanks!


>
> > write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> > dev_err_ratelimited(
> > bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,



--
Ricardo Ribalda