2023-10-06 22:17:16

by Doug Anderson

Subject: [PATCH 1/3] arm64: Disable GiC priorities on Mediatek devices w/ firmware issues

In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on
Mediatek devices w/ firmware issues") we added a method for detecting
Mediatek devices with broken firmware and disabled pseudo-NMI. While
that worked, it didn't address the problem at a deep enough level.

The fundamental issue with this broken firmware is that it's not
saving and restoring several important GICR registers. The current
list is believed to be:
* GICR_NUM_IPRIORITYR
* GICR_CTLR
* GICR_ISPENDR0
* GICR_ISACTIVER0
* GICR_NSACR

Pseudo-NMI didn't work because it was the only thing (currently) in
the kernel that relied on the broken registers, so forcing pseudo-NMI
off was an effective fix. However, it could be observed that calling
system_uses_irq_prio_masking() on these systems still returned
"true". That caused confusion and led to the need for
commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek
FW"). The concern is that the incorrect value returned by
system_uses_irq_prio_masking() on these systems will continue to
confuse future developers.

Let's fix the issue a little more completely by disabling IRQ
priorities at a deeper level in the kernel. Once we do this we can
revert some of the other bits of code dealing with this quirk.
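
As an illustration only (not part of the patch), the difference between
the old and new behavior can be modeled in plain user-space C. Every name
below is hypothetical, and the model ignores the other conditions that
the real can_use_gic_priorities() checks; it only shows why the
capability and the driver disagreed before and agree afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the platform state; not a kernel type. */
struct platform_model {
	bool pseudo_nmi_requested;	/* pseudo-NMI asked for at boot */
	bool fw_broken;			/* "mediatek,broken-save-restore-fw" present */
};

/* Old behavior: the capability check knows nothing about the quirk. */
static bool cap_prio_masking_old(const struct platform_model *p)
{
	return p->pseudo_nmi_requested;
}

/* Old behavior: only the GIC driver skips NMI setup on broken firmware. */
static bool driver_enables_nmi_old(const struct platform_model *p)
{
	return cap_prio_masking_old(p) && !p->fw_broken;
}

/* New behavior: the quirk is applied when the capability is detected. */
static bool cap_prio_masking_new(const struct platform_model *p)
{
	if (p->fw_broken)
		return false;
	return p->pseudo_nmi_requested;
}

/* New behavior: the driver can simply follow the capability. */
static bool driver_enables_nmi_new(const struct platform_model *p)
{
	return cap_prio_masking_new(p);
}
```

In this model, a system with broken firmware and pseudo-NMI requested
reports the capability as present under the old scheme while the driver
refuses to use it; under the new scheme both report "false".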

Signed-off-by: Douglas Anderson <[email protected]>
---

arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2806a2850e78..e35efab8efa9 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p)
 }
 early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi);

+static bool are_gic_priorities_broken(void)
+{
+	bool is_broken = false;
+	struct device_node *np;
+
+	/*
+	 * Detect broken Mediatek firmware that doesn't properly save and
+	 * restore GIC priorities.
+	 */
+	np = of_find_compatible_node(NULL, NULL, "arm,gic-v3");
+	if (np) {
+		is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw");
+		of_node_put(np);
+	}
+
+	return is_broken;
+}
+
 static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 				   int scope)
 {
+	if (are_gic_priorities_broken())
+		return false;
+
 	/*
 	 * ARM64_HAS_GIC_CPUIF_SYSREGS has a lower index, and is a boot CPU
 	 * feature, so will be detected earlier.
--
2.42.0.609.gbb76f46606-goog


2023-10-06 22:17:17

by Doug Anderson

Subject: [PATCH 3/3] irqchip/gic-v3: Remove Mediatek pseudo-NMI firmware quirk handling

This is a partial revert of commit 44bd78dd2b88 ("irqchip/gic-v3:
Disable pseudo NMIs on Mediatek devices w/ firmware issues"). In the
patch ("arm64: Disable GiC priorities on Mediatek devices w/ firmware
issues") we've moved the quirk handling to another place and so it's
not needed in the GiC driver.

NOTE: this isn't a full revert because it leaves some of the changes
to the "quirks" structure around in case future code needs it.

Signed-off-by: Douglas Anderson <[email protected]>
---

drivers/irqchip/irq-gic-v3.c | 22 +---------------------
1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 787ccc880b22..9ff776709ae6 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -39,8 +39,7 @@

 #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
 #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539 (1ULL << 1)
-#define FLAGS_WORKAROUND_MTK_GICR_SAVE (1ULL << 2)
-#define FLAGS_WORKAROUND_ASR_ERRATUM_8601001 (1ULL << 3)
+#define FLAGS_WORKAROUND_ASR_ERRATUM_8601001 (1ULL << 2)

 #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1)

@@ -1790,15 +1789,6 @@ static bool gic_enable_quirk_msm8996(void *data)
 	return true;
 }

-static bool gic_enable_quirk_mtk_gicr(void *data)
-{
-	struct gic_chip_data *d = data;
-
-	d->flags |= FLAGS_WORKAROUND_MTK_GICR_SAVE;
-
-	return true;
-}
-
 static bool gic_enable_quirk_cavium_38539(void *data)
 {
 	struct gic_chip_data *d = data;
@@ -1891,11 +1881,6 @@ static const struct gic_quirk gic_quirks[] = {
 		.compatible = "asr,asr8601-gic-v3",
 		.init = gic_enable_quirk_asr8601,
 	},
-	{
-		.desc = "GICv3: Mediatek Chromebook GICR save problem",
-		.property = "mediatek,broken-save-restore-fw",
-		.init = gic_enable_quirk_mtk_gicr,
-	},
 	{
 		.desc = "GICv3: HIP06 erratum 161010803",
 		.iidr = 0x0204043b,
@@ -1957,11 +1942,6 @@ static void gic_enable_nmi_support(void)
 	if (!gic_prio_masking_enabled())
 		return;

-	if (gic_data.flags & FLAGS_WORKAROUND_MTK_GICR_SAVE) {
-		pr_warn("Skipping NMI enable due to firmware issues\n");
-		return;
-	}
-
 	rdist_nmi_refs = kcalloc(gic_data.ppi_nr + SGI_NR,
 				 sizeof(*rdist_nmi_refs), GFP_KERNEL);
 	if (!rdist_nmi_refs)
--
2.42.0.609.gbb76f46606-goog

2023-10-18 11:01:54

by Mark Rutland

Subject: Re: [PATCH 1/3] arm64: Disable GiC priorities on Mediatek devices w/ firmware issues

On Fri, Oct 06, 2023 at 03:15:51PM -0700, Douglas Anderson wrote:
> In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on
> Mediatek devices w/ firmware issues") we added a method for detecting
> Mediatek devices with broken firmware and disabled pseudo-NMI. While
> that worked, it didn't address the problem at a deep enough level.
>
> The fundamental issue with this broken firmware is that it's not
> saving and restoring several important GICR registers. The current
> list is believed to be:
> * GICR_NUM_IPRIORITYR
> * GICR_CTLR
> * GICR_ISPENDR0
> * GICR_ISACTIVER0
> * GICR_NSACR
>
> Pseudo-NMI didn't work because it was the only thing (currently) in
> the kernel that relied on the broken registers, so forcing pseudo-NMI
> off was an effective fix. However, it could be observed that calling
> system_uses_irq_prio_masking() on these systems still returned
> "true". That caused confusion and led to the need for
> commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek
> FW"). The concern is that the incorrect value returned by
> system_uses_irq_prio_masking() on these systems will continue to
> confuse future developers.
>
> Let's fix the issue a little more completely by disabling IRQ
> priorities at a deeper level in the kernel. Once we do this we can
> revert some of the other bits of code dealing with this quirk.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 2806a2850e78..e35efab8efa9 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p)
>  }
>  early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi);
>
> +static bool are_gic_priorities_broken(void)
> +{
> +	bool is_broken = false;
> +	struct device_node *np;
> +
> +	/*
> +	 * Detect broken Mediatek firmware that doesn't properly save and
> +	 * restore GIC priorities.
> +	 */
> +	np = of_find_compatible_node(NULL, NULL, "arm,gic-v3");
> +	if (np) {
> +		is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw");
> +		of_node_put(np);
> +	}
> +
> +	return is_broken;
> +}

I'm definitely in favour of detecting this in the cpucap, but I think it'd be
better to parse the DT once on the boot CPU rather than on each CPU every time
it's brought up.

I think if we add something like:

#ifdef CONFIG_ARM64_PSEUDO_NMI
static void detect_system_supports_pseudo_nmi(void)
{
	struct device_node *np;

	if (!enable_pseudo_nmi)
		return;

	/*
	 * Detect broken Mediatek firmware that doesn't properly save and
	 * restore GIC priorities.
	 */
	np = of_find_compatible_node(NULL, NULL, "arm,gic-v3");
	if (np && of_property_read_bool(np, "mediatek,broken-save-restore-fw")) {
		pr_info("Pseudo-NMI disabled due to Mediatek Chromebook GICR save problem");
		enable_pseudo_nmi = false;
	}
	of_node_put(np);
}
#else /* CONFIG_ARM64_PSEUDO_NMI */
static inline void detect_system_supports_pseudo_nmi(void) { }
#endif

... then we can call that from init_cpu_features() before we call
setup_boot_cpu_capabilities(), and then the existing logic in
can_use_gic_priorities() should just work as that returns the value of
enable_pseudo_nmi.
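
To make the "parse once, check many times" point concrete, here is a
small user-space model. It is only a sketch: all names are hypothetical,
the device-tree walk is replaced by a counter, and nothing here is real
kernel API.

```c
#include <stdbool.h>

static bool enable_pseudo_nmi_model;	/* models enable_pseudo_nmi */
static int dt_lookups;			/* counts simulated device-tree walks */

/* Hypothetical stand-in for the DT node lookup plus property read. */
static bool model_dt_has_broken_fw(bool fw_broken)
{
	dt_lookups++;
	return fw_broken;
}

/* Runs once on the boot CPU, before capabilities are set up. */
static void model_detect_once(bool requested, bool fw_broken)
{
	enable_pseudo_nmi_model = requested;
	if (!enable_pseudo_nmi_model)
		return;		/* nothing to veto, so skip the DT walk */

	if (model_dt_has_broken_fw(fw_broken))
		enable_pseudo_nmi_model = false;
}

/* Runs for every CPU that comes up; it only reads the cached flag. */
static bool model_can_use_gic_priorities(void)
{
	return enable_pseudo_nmi_model;
}

/* Bring up ncpus CPUs and return how many pass the check. */
static int model_bring_up_cpus(int ncpus)
{
	int ok = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		if (model_can_use_gic_priorities())
			ok++;

	return ok;
}
```

However many CPUs are brought up in this model, the simulated DT walk
happens at most once per boot.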

Note: of_node_put(NULL) does nothing, like kfree(NULL), so it's fine for that
to be called in the !np case.
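
The same NULL-tolerant release pattern, as a stand-alone sketch with a
hypothetical refcounted node type (model_node and its helpers are made
up for illustration and are not the OF API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted node, standing in for struct device_node. */
struct model_node {
	int refcount;
	bool broken_fw;
};

/* Like of_node_put() as described above: NULL is accepted and ignored. */
static void model_node_put(struct model_node *np)
{
	if (np)
		np->refcount--;
}

/* Returns the node with an extra reference held, or NULL if not found. */
static struct model_node *model_find_node(struct model_node *candidate)
{
	if (candidate)
		candidate->refcount++;
	return candidate;
}

static bool model_fw_is_broken(struct model_node *candidate)
{
	struct model_node *np = model_find_node(candidate);
	bool broken = np && np->broken_fw;

	model_node_put(np);	/* one unconditional put covers both cases */
	return broken;
}
```

Because the put helper tolerates NULL, the caller needs no separate
branch for the "node not found" case and the reference count stays
balanced either way.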

Would you be happy to fold that in? I'm happy with a Suggested-by tag if so. :)

Mark

2023-10-18 11:08:28

by Mark Rutland

Subject: Re: [PATCH 3/3] irqchip/gic-v3: Remove Mediatek pseudo-NMI firmware quirk handling

On Fri, Oct 06, 2023 at 03:15:53PM -0700, Douglas Anderson wrote:
> This is a partial revert of commit 44bd78dd2b88 ("irqchip/gic-v3:
> Disable pseudo NMIs on Mediatek devices w/ firmware issues"). In the
> patch ("arm64: Disable GiC priorities on Mediatek devices w/ firmware
> issues") we've moved the quirk handling to another place and so it's
> not needed in the GiC driver.
>
> NOTE: this isn't a full revert because it leaves some of the changes
> to the "quirks" structure around in case future code needs it.
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

I think it might make sense to fold this into the patch adding the cpucap
detection. Otherwise, if you apply my suggestions to the first patch, there's a
2-commit window where we'll have two places that log that NMI is being disabled
due to the FW issue. That's not a functional issue, so doesn't matter that
much.

Either way:

Acked-by: Mark Rutland <[email protected]>

Mark.

>
> drivers/irqchip/irq-gic-v3.c | 22 +---------------------
> 1 file changed, 1 insertion(+), 21 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 787ccc880b22..9ff776709ae6 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -39,8 +39,7 @@
>
>  #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
>  #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539 (1ULL << 1)
> -#define FLAGS_WORKAROUND_MTK_GICR_SAVE (1ULL << 2)
> -#define FLAGS_WORKAROUND_ASR_ERRATUM_8601001 (1ULL << 3)
> +#define FLAGS_WORKAROUND_ASR_ERRATUM_8601001 (1ULL << 2)
>
>  #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1)
>
> @@ -1790,15 +1789,6 @@ static bool gic_enable_quirk_msm8996(void *data)
>  	return true;
>  }
>
> -static bool gic_enable_quirk_mtk_gicr(void *data)
> -{
> -	struct gic_chip_data *d = data;
> -
> -	d->flags |= FLAGS_WORKAROUND_MTK_GICR_SAVE;
> -
> -	return true;
> -}
> -
>  static bool gic_enable_quirk_cavium_38539(void *data)
>  {
>  	struct gic_chip_data *d = data;
> @@ -1891,11 +1881,6 @@ static const struct gic_quirk gic_quirks[] = {
>  		.compatible = "asr,asr8601-gic-v3",
>  		.init = gic_enable_quirk_asr8601,
>  	},
> -	{
> -		.desc = "GICv3: Mediatek Chromebook GICR save problem",
> -		.property = "mediatek,broken-save-restore-fw",
> -		.init = gic_enable_quirk_mtk_gicr,
> -	},
>  	{
>  		.desc = "GICv3: HIP06 erratum 161010803",
>  		.iidr = 0x0204043b,
> @@ -1957,11 +1942,6 @@ static void gic_enable_nmi_support(void)
>  	if (!gic_prio_masking_enabled())
>  		return;
>
> -	if (gic_data.flags & FLAGS_WORKAROUND_MTK_GICR_SAVE) {
> -		pr_warn("Skipping NMI enable due to firmware issues\n");
> -		return;
> -	}
> -
>  	rdist_nmi_refs = kcalloc(gic_data.ppi_nr + SGI_NR,
>  				 sizeof(*rdist_nmi_refs), GFP_KERNEL);
>  	if (!rdist_nmi_refs)
> --
> 2.42.0.609.gbb76f46606-goog
>

2023-10-30 23:02:38

by Doug Anderson

Subject: Re: [PATCH 3/3] irqchip/gic-v3: Remove Mediatek pseudo-NMI firmware quirk handling

Hi,

On Wed, Oct 18, 2023 at 4:08 AM Mark Rutland <[email protected]> wrote:
>
> On Fri, Oct 06, 2023 at 03:15:53PM -0700, Douglas Anderson wrote:
> > This is a partial revert of commit 44bd78dd2b88 ("irqchip/gic-v3:
> > Disable pseudo NMIs on Mediatek devices w/ firmware issues"). In the
> > patch ("arm64: Disable GiC priorities on Mediatek devices w/ firmware
> > issues") we've moved the quirk handling to another place and so it's
> > not needed in the GiC driver.
> >
> > NOTE: this isn't a full revert because it leaves some of the changes
> > to the "quirks" structure around in case future code needs it.
> >
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
>
> I think it might make sense to fold this into the patch adding the cpucap
> detection. Otherwise, if you apply my suggestions to the first patch, there's a
> 2-commit window where we'll have two places that log that NMI is being disabled
> due to the FW issue. That's not a functional issue, so doesn't matter that
> much.
>
> Either way:
>
> Acked-by: Mark Rutland <[email protected]>

I'm happy to go either way so I'd love some advice from maintainers
(Marc Zyngier, Catalin Marinas, Will Deacon) about what you'd prefer.

-Doug

2023-10-30 23:20:34

by Doug Anderson

Subject: Re: [PATCH 1/3] arm64: Disable GiC priorities on Mediatek devices w/ firmware issues

Hi,

On Wed, Oct 18, 2023 at 4:01 AM Mark Rutland <[email protected]> wrote:
>
> On Fri, Oct 06, 2023 at 03:15:51PM -0700, Douglas Anderson wrote:
> > In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on
> > Mediatek devices w/ firmware issues") we added a method for detecting
> > Mediatek devices with broken firmware and disabled pseudo-NMI. While
> > that worked, it didn't address the problem at a deep enough level.
> >
> > The fundamental issue with this broken firmware is that it's not
> > saving and restoring several important GICR registers. The current
> > list is believed to be:
> > * GICR_NUM_IPRIORITYR
> > * GICR_CTLR
> > * GICR_ISPENDR0
> > * GICR_ISACTIVER0
> > * GICR_NSACR
> >
> > Pseudo-NMI didn't work because it was the only thing (currently) in
> > the kernel that relied on the broken registers, so forcing pseudo-NMI
> > off was an effective fix. However, it could be observed that calling
> > system_uses_irq_prio_masking() on these systems still returned
> > "true". That caused confusion and led to the need for
> > commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek
> > FW"). The concern is that the incorrect value returned by
> > system_uses_irq_prio_masking() on these systems will continue to
> > confuse future developers.
> >
> > Let's fix the issue a little more completely by disabling IRQ
> > priorities at a deeper level in the kernel. Once we do this we can
> > revert some of the other bits of code dealing with this quirk.
> >
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> >
> > arch/arm64/kernel/cpufeature.c | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 2806a2850e78..e35efab8efa9 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2094,9 +2094,30 @@ static int __init early_enable_pseudo_nmi(char *p)
> >  }
> >  early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi);
> >
> > +static bool are_gic_priorities_broken(void)
> > +{
> > +	bool is_broken = false;
> > +	struct device_node *np;
> > +
> > +	/*
> > +	 * Detect broken Mediatek firmware that doesn't properly save and
> > +	 * restore GIC priorities.
> > +	 */
> > +	np = of_find_compatible_node(NULL, NULL, "arm,gic-v3");
> > +	if (np) {
> > +		is_broken = of_property_read_bool(np, "mediatek,broken-save-restore-fw");
> > +		of_node_put(np);
> > +	}
> > +
> > +	return is_broken;
> > +}
>
> I'm definitely in favour of detecting this in the cpucap, but I think it'd be
> better to parse the DT once on the boot CPU rather than on each CPU every time
> it's brought up.
>
> I think if we add something like:
>
> #ifdef CONFIG_ARM64_PSEUDO_NMI
> static void detect_system_supports_pseudo_nmi(void)
> {
> 	struct device_node *np;
>
> 	if (!enable_pseudo_nmi)
> 		return;
>
> 	/*
> 	 * Detect broken Mediatek firmware that doesn't properly save and
> 	 * restore GIC priorities.
> 	 */
> 	np = of_find_compatible_node(NULL, NULL, "arm,gic-v3");
> 	if (np && of_property_read_bool(np, "mediatek,broken-save-restore-fw")) {
> 		pr_info("Pseudo-NMI disabled due to Mediatek Chromebook GICR save problem");
> 		enable_pseudo_nmi = false;
> 	}
> 	of_node_put(np);
> }
> #else /* CONFIG_ARM64_PSEUDO_NMI */
> static inline void detect_system_supports_pseudo_nmi(void) { }
> #endif
>
> ... then we can call that from init_cpu_features() before we call
> setup_boot_cpu_capabilities(), and then the existing logic in
> can_use_gic_priorities() should just work as that returns the value of
> enable_pseudo_nmi.
>
> Note: of_node_put(NULL) does nothing, like kfree(NULL), so it's fine for that
> to be called in the !np case.
>
> Would you be happy to fold that in? I'm happy with a Suggested-by tag if so. :)

Yup, that looks good to me and I can fold it in (fixing a few nits
like the missing "\n" and adding __init to the function). I'll wait to
get the maintainers' opinions on whether to fold patch #3 in here and
then send a v2.

-Doug