2022-03-07 17:48:15

by Linu Cherian

Subject: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

When an IAR register read races with a GIC interrupt RELEASE event, the
GIC CPU interface could wrongly return a valid INTID to the CPU for an
interrupt that is already released (not activated) instead of 0x3ff.

As a side effect, an interrupt handler could run twice, once with
interrupt priority and then with idle priority.

As a workaround, gic_read_iar is updated so that, on all affected silicon,
it returns a valid interrupt ID only if the active priority list has
changed after the IAR read.

Since there are silicon variants where both 23154 and 38545 are applicable,
the workaround for erratum 23154 has been extended to address both of them.

Signed-off-by: Linu Cherian <[email protected]>
---
Changes since V2:
- Changed masked part number to individual part numbers
- Added additional comment to clarify on priority groups


Changes since V1:
- IIDR based quirk management done for 23154 has been reverted
- Extended existing 23154 errata to address 38545 as well,
so that existing static keys are reused.
- Added MIDR based support macros to cover all the affected parts
- Changed the unlikely construct to likely construct in the workaround
function.




Documentation/arm64/silicon-errata.rst | 2 +-
arch/arm64/Kconfig | 8 ++++++--
arch/arm64/include/asm/arch_gicv3.h | 23 +++++++++++++++++++++--
arch/arm64/include/asm/cputype.h | 13 +++++++++++++
arch/arm64/kernel/cpu_errata.c | 20 +++++++++++++++++---
5 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ea281dd75517..466cb9e89047 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -136,7 +136,7 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 |
+----------------+-----------------+-----------------+-----------------------------+
-| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 |
+| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 |
+----------------+-----------------+-----------------+-----------------------------+
| Cavium | ThunderX GICv3 | #38539 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 09b885cc4db5..778cc2e22c21 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -891,13 +891,17 @@ config CAVIUM_ERRATUM_23144
If unsure, say Y.

config CAVIUM_ERRATUM_23154
- bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
+ bool "Cavium errata 23154 and 38545: GICv3 lacks HW synchronisation"
default y
help
- The gicv3 of ThunderX requires a modified version for
+ The ThunderX GICv3 implementation requires a modified version for
reading the IAR status to ensure data synchronization
(access to icc_iar1_el1 is not sync'ed before and after).

+ It also suffers from erratum 38545 (also present on Marvell's
+ OcteonTX and OcteonTX2), resulting in deactivated interrupts being
+ spuriously presented to the CPU interface.
+
If unsure, say Y.

config CAVIUM_ERRATUM_27456
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 4ad22c3135db..8bd5afc7b692 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -53,17 +53,36 @@ static inline u64 gic_read_iar_common(void)
* The gicv3 of ThunderX requires a modified version for reading the
* IAR status to ensure data synchronization (access to icc_iar1_el1
* is not sync'ed before and after).
+ *
+ * Erratum 38545
+ *
+ * When a IAR register read races with a GIC interrupt RELEASE event,
+ * GIC-CPU interface could wrongly return a valid INTID to the CPU
+ * for an interrupt that is already released(non activated) instead of 0x3ff.
+ *
+ * To workaround this, return a valid interrupt ID only if there is a change
+ * in the active priority list after the IAR read.
+ *
+ * Common function used for both the workarounds since,
+ * 1. On Thunderx 88xx 1.x both erratas are applicable.
+ * 2. Having extra nops doesn't add any side effects for Silicons where
+ * erratum 23154 is not applicable.
*/
static inline u64 gic_read_iar_cavium_thunderx(void)
{
- u64 irqstat;
+ u64 irqstat, apr;

+ apr = read_sysreg_s(SYS_ICC_AP1R0_EL1);
nops(8);
irqstat = read_sysreg_s(SYS_ICC_IAR1_EL1);
nops(4);
mb();

- return irqstat;
+ /* Max priority groups implemented is only 32 */
+ if (likely(apr != read_sysreg_s(SYS_ICC_AP1R0_EL1)))
+ return irqstat;
+
+ return 0x3ff;
}

static inline void gic_write_ctlr(u32 val)
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 999b9149f856..4596e7ca29a3 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -84,6 +84,13 @@
#define CAVIUM_CPU_PART_THUNDERX_81XX 0x0A2
#define CAVIUM_CPU_PART_THUNDERX_83XX 0x0A3
#define CAVIUM_CPU_PART_THUNDERX2 0x0AF
+/* OcteonTx2 series */
+#define CAVIUM_CPU_PART_OCTX2_98XX 0x0B1
+#define CAVIUM_CPU_PART_OCTX2_96XX 0x0B2
+#define CAVIUM_CPU_PART_OCTX2_95XX 0x0B3
+#define CAVIUM_CPU_PART_OCTX2_95XXN 0x0B4
+#define CAVIUM_CPU_PART_OCTX2_95XXMM 0x0B5
+#define CAVIUM_CPU_PART_OCTX2_95XXO 0x0B6

#define BRCM_CPU_PART_BRAHMA_B53 0x100
#define BRCM_CPU_PART_VULCAN 0x516
@@ -124,6 +131,12 @@
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
+#define MIDR_OCTX2_98XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_98XX)
+#define MIDR_OCTX2_96XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_96XX)
+#define MIDR_OCTX2_95XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XX)
+#define MIDR_OCTX2_95XXN MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXN)
+#define MIDR_OCTX2_95XXMM MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXMM)
+#define MIDR_OCTX2_95XXO MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_OCTX2_95XXO)
#define MIDR_CAVIUM_THUNDERX2 MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX2)
#define MIDR_BRAHMA_B53 MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_BRAHMA_B53)
#define MIDR_BRCM_VULCAN MIDR_CPU_MODEL(ARM_CPU_IMP_BRCM, BRCM_CPU_PART_VULCAN)
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index b217941713a8..510f47055b91 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -214,6 +214,20 @@ static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = {
};
#endif

+#ifdef CONFIG_CAVIUM_ERRATUM_23154
+const struct midr_range cavium_erratum_23154_cpus[] = {
+ MIDR_ALL_VERSIONS(MIDR_THUNDERX),
+ MIDR_ALL_VERSIONS(MIDR_THUNDERX_81XX),
+ MIDR_ALL_VERSIONS(MIDR_THUNDERX_83XX),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_98XX),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_96XX),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_95XX),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXN),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXMM),
+ MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXO),
+};
+#endif
+
#ifdef CONFIG_CAVIUM_ERRATUM_27456
const struct midr_range cavium_erratum_27456_cpus[] = {
/* Cavium ThunderX, T88 pass 1.x - 2.1 */
@@ -425,10 +439,10 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_23154
{
- /* Cavium ThunderX, pass 1.x */
- .desc = "Cavium erratum 23154",
+ .desc = "Cavium errata 23154 and 38545",
.capability = ARM64_WORKAROUND_CAVIUM_23154,
- ERRATA_MIDR_REV_RANGE(MIDR_THUNDERX, 0, 0, 1),
+ .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+ ERRATA_MIDR_RANGE_LIST(cavium_erratum_23154_cpus),
},
#endif
#ifdef CONFIG_CAVIUM_ERRATUM_27456
--
2.31.1
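
The check added to gic_read_iar_cavium_thunderx() can be modelled outside
the kernel. The sketch below is a userspace illustration only: struct
fake_gic, fake_read_iar() and read_iar_with_workaround() are hypothetical
names, and the behaviour assumed here (a genuine acknowledge sets a bit in
the active priority register, a raced read does not) is taken from the
commit message rather than from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Value the patch returns when no interrupt was actually activated. */
#define SPURIOUS_INTID 0x3ffu

/* Hypothetical software model of a GIC CPU interface (not kernel code). */
struct fake_gic {
	uint32_t ap1r0;         /* stand-in for ICC_AP1R0_EL1, one bit per priority group */
	uint32_t pending_intid; /* INTID the next IAR read hands back */
	unsigned int prio_group;/* priority group of that interrupt, 0..31 */
	int released;           /* non-zero: interrupt was released before the read */
};

/*
 * Model of an IAR read. A genuine acknowledge marks the priority group
 * active; the erratum case returns the INTID without activating anything.
 */
static uint32_t fake_read_iar(struct fake_gic *gic)
{
	/* A single 32-bit word suffices because at most 32 groups are modelled. */
	assert(gic->prio_group < 32);

	if (!gic->released)
		gic->ap1r0 |= 1u << gic->prio_group;

	return gic->pending_intid;
}

/* Same shape as the patched function: snapshot, read IAR, compare. */
static uint32_t read_iar_with_workaround(struct fake_gic *gic)
{
	uint32_t apr = gic->ap1r0;
	uint32_t irqstat = fake_read_iar(gic);

	/* Only trust the INTID if the read changed the active priorities. */
	if (apr != gic->ap1r0)
		return irqstat;

	return SPURIOUS_INTID;
}
```

In this model, a struct with released set to 0 yields the INTID it was
given, while the same struct with released set to 1 leaves the active
priority word untouched and the caller sees 0x3ff.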


2022-03-08 06:10:38

by Marc Zyngier

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

On Mon, 07 Mar 2022 14:30:14 +0000,
Linu Cherian <[email protected]> wrote:
>
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> As a workaround, gic_read_iar is updated so that it will return a
> valid interrupt ID only if there is a change in the active priority list
> after the IAR read on all the affected Silicons.
>
> Since there are silicon variants where both 23154 and 38545 are applicable,
> workaround for erratum 23154 has been extended to address both of them.
>
> Signed-off-by: Linu Cherian <[email protected]>
> ---
> Changes since V2:
> - Changed masked part number to individual part numbers
> - Added additional comment to clarify on priority groups
>
>
> Changes since V1:
> - IIDR based quirk management done for 23154 has been reverted
> - Extended existing 23154 errata to address 38545 as well,
> so that existing static keys are reused.
> - Added MIDR based support macros to cover all the affected parts
> - Changed the unlikely construct to likely construct in the workaround
> function.
>
>
>
>
> Documentation/arm64/silicon-errata.rst | 2 +-
> arch/arm64/Kconfig | 8 ++++++--
> arch/arm64/include/asm/arch_gicv3.h | 23 +++++++++++++++++++++--
> arch/arm64/include/asm/cputype.h | 13 +++++++++++++
> arch/arm64/kernel/cpu_errata.c | 20 +++++++++++++++++---
> 5 files changed, 58 insertions(+), 8 deletions(-)

Looks good to me this time.

Catalin, Will: happy to take this into the irqchip tree for 5.18 with
your Ack, or you can take it into the arm64 tree with my

Reviewed-by: Marc Zyngier <[email protected]>

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-03-08 07:06:34

by Will Deacon

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> [...]

Applied to arm64 (for-next/errata), thanks!

[1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
https://git.kernel.org/arm64/c/24a147bcef8c

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

2022-03-09 22:03:15

by Qian Cai

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

On Mon, Mar 07, 2022 at 08:00:14PM +0530, Linu Cherian wrote:
> When a IAR register read races with a GIC interrupt RELEASE event,
> GIC-CPU interface could wrongly return a valid INTID to the CPU
> for an interrupt that is already released(non activated) instead of 0x3ff.
>
> As a side effect, an interrupt handler could run twice, once with
> interrupt priority and then with idle priority.
>
> As a workaround, gic_read_iar is updated so that it will return a
> valid interrupt ID only if there is a change in the active priority list
> after the IAR read on all the affected Silicons.
>
> Since there are silicon variants where both 23154 and 38545 are applicable,
> workaround for erratum 23154 has been extended to address both of them.
>
> Signed-off-by: Linu Cherian <[email protected]>

Reverting this commit from today's linux-next fixed global-out-of-bounds
accesses running CPU hotplug workloads on a non-ThunderX server.

psci: CPU88 killed (polled 0 ms)
==================================================================
BUG: KASAN: global-out-of-bounds in is_affected_midr_range_list
Read of size 4 at addr ffffa0ec80ddcc6c by task swapper/88/0

CPU: 88 PID: 0 Comm: swapper/88 Not tainted 5.17.0-rc7-next-20220309-dirty #25
Call trace:
dump_backtrace
show_stack
dump_stack_lvl
print_address_description.constprop.0
print_report
kasan_report
__asan_report_load4_noabort
is_affected_midr_range_list
is_midr_in_range_list at ./arch/arm64/include/asm/cputype.h:221
(inlined by) is_affected_midr_range_list at arch/arm64/kernel/cpu_errata.c:41
verify_local_cpu_caps
verify_local_cpu_caps at arch/arm64/kernel/cpufeature.c:2787
check_local_cpu_capabilities
verify_local_elf_hwcaps at arch/arm64/kernel/cpufeature.c:2852
(inlined by) verify_local_cpu_capabilities at arch/arm64/kernel/cpufeature.c:2922
(inlined by) check_local_cpu_capabilities at arch/arm64/kernel/cpufeature.c:2948
secondary_start_kernel
__secondary_switched

The buggy address belongs to the variable:
cavium_erratum_23154_cpus

The buggy address belongs to the virtual mapping at
[ffffa0ec80dd0000, ffffa0ec82140000) created by:
map_kernel


2022-03-10 00:34:46

by Marc Zyngier

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

On 2022-03-09 17:40, Qian Cai wrote:
> On Mon, Mar 07, 2022 at 08:00:14PM +0530, Linu Cherian wrote:
>> When a IAR register read races with a GIC interrupt RELEASE event,
>> GIC-CPU interface could wrongly return a valid INTID to the CPU
>> for an interrupt that is already released(non activated) instead of
>> 0x3ff.
>>
>> As a side effect, an interrupt handler could run twice, once with
>> interrupt priority and then with idle priority.
>>
>> As a workaround, gic_read_iar is updated so that it will return a
>> valid interrupt ID only if there is a change in the active priority
>> list
>> after the IAR read on all the affected Silicons.
>>
>> Since there are silicon variants where both 23154 and 38545 are
>> applicable,
>> workaround for erratum 23154 has been extended to address both of
>> them.
>>
>> Signed-off-by: Linu Cherian <[email protected]>
>
> Reverting this commit from today's linux-next fixed
> global-out-of-bounds
> accesses running CPU hotplug workloads on a non-ThunderX server.
>
> psci: CPU88 killed (polled 0 ms)
> ==================================================================
> BUG: KASAN: global-out-of-bounds in is_affected_midr_range_list
> Read of size 4 at addr ffffa0ec80ddcc6c by task swapper/88/0
>
> CPU: 88 PID: 0 Comm: swapper/88 Not tainted
> 5.17.0-rc7-next-20220309-dirty #25
> Call trace:
> dump_backtrace
> show_stack
> dump_stack_lvl
> print_address_description.constprop.0
> print_report
> kasan_report
> __asan_report_load4_noabort
> is_affected_midr_range_list
> is_midr_in_range_list at ./arch/arm64/include/asm/cputype.h:221
> (inlined by) is_affected_midr_range_list at
> arch/arm64/kernel/cpu_errata.c:41
> verify_local_cpu_caps
> verify_local_cpu_caps at arch/arm64/kernel/cpufeature.c:2787
> check_local_cpu_capabilities
> verify_local_elf_hwcaps at arch/arm64/kernel/cpufeature.c:2852
> (inlined by) verify_local_cpu_capabilities at
> arch/arm64/kernel/cpufeature.c:2922
> (inlined by) check_local_cpu_capabilities at
> arch/arm64/kernel/cpufeature.c:2948
> secondary_start_kernel
> __secondary_switched
>
> The buggy address belongs to the variable:
> cavium_erratum_23154_cpus
>
> The buggy address belongs to the virtual mapping at
> [ffffa0ec80dd0000, ffffa0ec82140000) created by:
> map_kernel

Urgh... Thanks for reporting this.

Will, can you either drop this patch, or squash the following
diff in?

Thanks,

M.

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1d9d4f910de7..400a1c9cac90 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -225,6 +225,7 @@ const struct midr_range cavium_erratum_23154_cpus[] = {
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXN),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXMM),
MIDR_ALL_VERSIONS(MIDR_OCTX2_95XXO),
+ {},
};
#endif
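
For context on why the one-line fix matters: the KASAN trace points at
is_midr_in_range_list(), which appears to walk the array until it reaches
an entry whose model field is zero rather than taking a length. Without a
terminating empty entry, a CPU that matches none of the listed parts keeps
reading past the end of the array. The sketch below is a simplified
userspace illustration; struct range, in_range_list() and demo_cpus are
hypothetical stand-ins, and the part numbers are used bare rather than as
full MIDR values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct midr_range (illustrative only). */
struct range {
	uint32_t model;  /* 0 marks the end of the list */
	uint32_t rv_min; /* lowest affected revision */
	uint32_t rv_max; /* highest affected revision */
};

/*
 * Walk the list until the zero-model sentinel. *visited reports how many
 * entries were examined, counting the sentinel when no match is found.
 */
static bool in_range_list(uint32_t model, uint32_t rev,
			  const struct range *r, size_t *visited)
{
	size_t n = 0;

	assert(r != NULL);

	for (; r->model; r++, n++) {
		if (r->model == model && rev >= r->rv_min && rev <= r->rv_max) {
			*visited = n + 1;
			return true;
		}
	}

	*visited = n + 1;
	return false;
}

/* Two part numbers from the patch, followed by the terminating entry. */
static const struct range demo_cpus[] = {
	{ 0x0A2, 0, 0xff }, /* ThunderX 81XX part number */
	{ 0x0B1, 0, 0xff }, /* OcteonTX2 98XX part number */
	{ 0 },              /* sentinel: stops the walk for non-matching CPUs */
};
```

A lookup for a part that is not in demo_cpus stops at the third entry. If
the sentinel were removed, the loop condition would be evaluated on
whatever memory follows the array, which matches the global-out-of-bounds
read reported on a non-ThunderX machine.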


--
Jazz is not dead. It just smells funny...

2023-05-30 08:25:09

by Geert Uytterhoeven

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

On Tue, May 30, 2023 at 10:13 AM Geert Uytterhoeven
<[email protected]> wrote:
> On Mon, Mar 7, 2022 at 11:15 PM Will Deacon <[email protected]> wrote:
> > On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> > > When a IAR register read races with a GIC interrupt RELEASE event,
> > > GIC-CPU interface could wrongly return a valid INTID to the CPU
> > > for an interrupt that is already released(non activated) instead of 0x3ff.
> > >
> > > As a side effect, an interrupt handler could run twice, once with
> > > interrupt priority and then with idle priority.
> > >
> > > [...]
> >
> > Applied to arm64 (for-next/errata), thanks!
> >
> > [1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
> > https://git.kernel.org/arm64/c/24a147bcef8c
>
> This workaround is now enabled on R-Car V4H:
>
> GIC: enabling workaround for GICv3: Cavium erratum 38539
>
> which is not a Cavium SoC. Is this expected?
> Thanks!

Please ignore, wrong thread. Sorry for the fuss.
(note to myself: do not trust Gmail search to match on all search parameters)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-05-30 08:41:49

by Geert Uytterhoeven

Subject: Re: [PATCH V3] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR

Hi all,

On Mon, Mar 7, 2022 at 11:15 PM Will Deacon <[email protected]> wrote:
> On Mon, 7 Mar 2022 20:00:14 +0530, Linu Cherian wrote:
> > When a IAR register read races with a GIC interrupt RELEASE event,
> > GIC-CPU interface could wrongly return a valid INTID to the CPU
> > for an interrupt that is already released(non activated) instead of 0x3ff.
> >
> > As a side effect, an interrupt handler could run twice, once with
> > interrupt priority and then with idle priority.
> >
> > [...]
>
> Applied to arm64 (for-next/errata), thanks!
>
> [1/1] irqchip/gic-v3: Workaround Marvell erratum 38545 when reading IAR
> https://git.kernel.org/arm64/c/24a147bcef8c

This workaround is now enabled on R-Car V4H:

GIC: enabling workaround for GICv3: Cavium erratum 38539

which is not a Cavium SoC. Is this expected?
Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds