Hi,
This series is a continuation of the work started by Daniel [1]. The goal
is to use GICv3 interrupt priorities to simulate an NMI.
To achieve this, we set two priorities: one for standard interrupts and
another, higher priority, for NMIs. Whenever we want to disable interrupts,
we mask the standard priority instead, so NMIs can still be raised. Some
corner cases, though, still require us to actually mask all interrupts,
effectively disabling the NMI.
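To make this concrete, the principle looks like this (sketch only; the
priority values are the ones used later in the series, and on GICv3 an
interrupt is only signalled to the CPU when its priority is numerically
lower than ICC_PMR_EL1):

    /* Values from patches 3 and 6; illustration, not new code. */
    #define GICD_INT_DEF_PRI     0xc0  /* normal interrupts */
    #define GICD_INT_NMI_PRI     0xa0  /* pseudo-NMIs (more urgent) */

    #define ICC_PMR_EL1_UNMASKED 0xf0  /* 0xc0 < 0xf0: everything passes */
    #define ICC_PMR_EL1_MASKED   0xb0  /* 0xc0 >= 0xb0: IRQs blocked,
                                        * 0xa0 <  0xb0: NMIs still pass */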
Of course, using priority masking instead of PSR.I comes at some cost. On
hackbench, the performance drop seems to be >1% on average for this
version. I can only attribute that to recent changes in the kernel, as
hackbench seems slightly slower compared to my other benchmarks while the
runs using GICv3 priorities have stayed in the same time frames.
KVM guests do not seem to be affected performance-wise by whether the host
uses PMR to mask interrupts or not.
Currently, only PPIs and SPIs can be set as NMIs. Since IPIs are currently
hardcoded IRQ numbers, there isn't a generic interface to set SGIs as NMIs
for now. I don't think there is any reason LPIs should be allowed to be set
as NMIs, as they do not have an active state.
When an NMI is active on a CPU, no other NMI can be triggered on that CPU.
Requirements to use this:
- Have GICv3
- SCR_EL3.FIQ is set to 1 when Linux runs
- Select Kernel Feature -> Use ICC system registers for IRQ masking
* Patches 1 and 2 allow detecting and enabling the use of GICv3 system
registers during boot time.
* Patch 3 introduces the masking of IRQs using priorities, replacing IRQ
disabling.
* Patch 4 adds some utility functions.
* Patch 5 adds detection of the view Linux has of GICv3 priorities; without
this we cannot easily mask specific priorities in an accurate manner.
* Patch 6 adds the support for NMIs.
Changes since V1[2]:
* Series rebased to v4.15-rc8.
* Check for arm64_early_features in this_cpu_has_cap (spotted by Suzuki).
* Fix issue where debug exceptions were not masked when enabling debug in
mdscr_el1.
Changes since RFC[3]:
* The series was rebased to v4.15-rc2 which implied some changes mainly
related to the work on exception entries and daif flags by James Morse.
- The first patch in the previous series was dropped because no longer
applicable.
- With the semantics James introduced of "inheriting" daif flags,
handling of PMR on exception entry is simplified, as PMR is not altered
by taking an exception and is already inherited from the previous state.
- James pointed out that taking a PseudoNMI before reading the FAR_EL1
register should not be allowed as per the TRM (D10.2.29):
"FAR_EL1 is made UNKNOWN on an exception return from EL1."
So in this submission, the PSR.I bit is cleared only after FAR_EL1 is read.
* For KVM, only deal with PMR unmasking/restoring in common code, and VHE
specific code makes sure PSR.I bit is set when necessary.
* When detecting the GIC priority view (patch 5), wait for an actual
interrupt instead of trying only once.
[1] http://www.spinics.net/lists/arm-kernel/msg525077.html
[2] https://www.spinics.net/lists/arm-kernel/msg620763.html
[3] https://www.spinics.net/lists/arm-kernel/msg610736.html
Cheers,
Julien
Daniel Thompson (3):
arm64: cpufeature: Allow early detect of specific features
arm64: alternative: Apply alternatives early in boot process
arm64: irqflags: Use ICC sysregs to implement IRQ masking
Julien Thierry (3):
irqchip/gic: Add functions to access irq priorities
arm64: Detect current view of GIC priorities
arm64: Add support for pseudo-NMIs
Documentation/arm64/booting.txt | 5 +
arch/arm64/Kconfig | 15 ++
arch/arm64/include/asm/alternative.h | 1 +
arch/arm64/include/asm/arch_gicv3.h | 42 +++++
arch/arm64/include/asm/assembler.h | 23 ++-
arch/arm64/include/asm/daifflags.h | 36 ++--
arch/arm64/include/asm/efi.h | 5 +
arch/arm64/include/asm/irqflags.h | 131 ++++++++++++++
arch/arm64/include/asm/processor.h | 4 +
arch/arm64/include/asm/ptrace.h | 14 +-
arch/arm64/include/asm/sysreg.h | 1 +
arch/arm64/kernel/alternative.c | 39 ++++-
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/cpufeature.c | 69 +++++---
arch/arm64/kernel/entry.S | 84 ++++++++-
arch/arm64/kernel/head.S | 38 ++++
arch/arm64/kernel/process.c | 6 +
arch/arm64/kernel/smp.c | 14 ++
arch/arm64/kvm/hyp/hyp-entry.S | 20 +++
arch/arm64/kvm/hyp/switch.c | 21 +++
arch/arm64/mm/proc.S | 23 +++
drivers/irqchip/irq-gic-common.c | 10 ++
drivers/irqchip/irq-gic-common.h | 2 +
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/irqchip/irq-gic-v3.c | 307 +++++++++++++++++++++++++++++----
include/linux/interrupt.h | 1 +
include/linux/irqchip/arm-gic-common.h | 6 +
include/linux/irqchip/arm-gic.h | 5 -
28 files changed, 841 insertions(+), 84 deletions(-)
--
1.9.1
From: Daniel Thompson <[email protected]>
Currently it is not possible to detect features of the boot CPU
until the other CPUs have been brought up.
This prevents us from reacting to features of the boot CPU until
fairly late in the boot process. To solve this we allow a subset
of features (that are likely to be common to all clusters) to be
detected based on the boot CPU alone.
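The resulting boot-CPU flow is sketched below (function names as in this
patch):

    /*
     * init_cpu_features(boot_cpu_data)
     *   -> setup_early_feature_capabilities()
     *        -> update_cpu_capabilities(arm64_early_features, ...)
     *        -> enable_cpu_capabilities(arm64_early_features)
     *
     * Secondary CPUs later re-verify these capabilities in
     * check_early_cpu_features() and are parked on a mismatch.
     */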
Signed-off-by: Daniel Thompson <[email protected]>
[[email protected]: check non-boot cpu missing early features, avoid
duplicates between early features and normal
features]
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 69 ++++++++++++++++++++++++++++--------------
1 file changed, 47 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a73a592..6698404 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -52,6 +52,8 @@
DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
EXPORT_SYMBOL(cpu_hwcaps);
+static void __init setup_early_feature_capabilities(void);
+
/*
* Flag to indicate if we have computed the system wide
* capabilities based on the boot time active CPUs. This
@@ -542,6 +544,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
sve_init_vq_map();
}
+
+ setup_early_feature_capabilities();
}
static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
@@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
ID_AA64PFR0_FP_SHIFT) < 0;
}
-static const struct arm64_cpu_capabilities arm64_features[] = {
+static const struct arm64_cpu_capabilities arm64_early_features[] = {
{
.desc = "GIC system register CPU interface",
.capability = ARM64_HAS_SYSREG_GIC_CPUIF,
@@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.sign = FTR_UNSIGNED,
.min_field_value = 1,
},
+ {}
+};
+
+static const struct arm64_cpu_capabilities arm64_features[] = {
#ifdef CONFIG_ARM64_PAN
{
.desc = "Privileged Access Never",
@@ -1111,6 +1119,29 @@ void __init enable_cpu_capabilities(const struct arm64_cpu_capabilities *caps)
}
}
+/* Returns false on a capability mismatch */
+static bool
+verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
+{
+ for (; caps->matches; caps++) {
+ if (!cpus_have_cap(caps->capability))
+ continue;
+ /*
+ * If the new CPU misses an advertised feature, we cannot
+ * proceed further, park the cpu.
+ */
+ if (!caps->matches(caps, SCOPE_LOCAL_CPU)) {
+ pr_crit("CPU%d: missing feature: %s\n",
+ smp_processor_id(), caps->desc);
+ return false;
+ }
+ if (caps->enable)
+ caps->enable(NULL);
+ }
+
+ return true;
+}
+
/*
* Check for CPU features that are used in early boot
* based on the Boot CPU value.
@@ -1119,6 +1150,9 @@ static void check_early_cpu_features(void)
{
verify_cpu_run_el();
verify_cpu_asid_bits();
+
+ if (!verify_local_cpu_features(arm64_early_features))
+ cpu_panic_kernel();
}
static void
@@ -1133,26 +1167,6 @@ static void check_early_cpu_features(void)
}
}
-static void
-verify_local_cpu_features(const struct arm64_cpu_capabilities *caps)
-{
- for (; caps->matches; caps++) {
- if (!cpus_have_cap(caps->capability))
- continue;
- /*
- * If the new CPU misses an advertised feature, we cannot proceed
- * further, park the cpu.
- */
- if (!caps->matches(caps, SCOPE_LOCAL_CPU)) {
- pr_crit("CPU%d: missing feature: %s\n",
- smp_processor_id(), caps->desc);
- cpu_die_early();
- }
- if (caps->enable)
- caps->enable(NULL);
- }
-}
-
static void verify_sve_features(void)
{
u64 safe_zcr = read_sanitised_ftr_reg(SYS_ZCR_EL1);
@@ -1181,7 +1195,10 @@ static void verify_sve_features(void)
static void verify_local_cpu_capabilities(void)
{
verify_local_cpu_errata_workarounds();
- verify_local_cpu_features(arm64_features);
+
+ if (!verify_local_cpu_features(arm64_features))
+ cpu_die_early();
+
verify_local_elf_hwcaps(arm64_elf_hwcaps);
if (system_supports_32bit_el0())
@@ -1211,6 +1228,13 @@ void check_local_cpu_capabilities(void)
verify_local_cpu_capabilities();
}
+static void __init setup_early_feature_capabilities(void)
+{
+ update_cpu_capabilities(arm64_early_features,
+ "early detected feature:");
+ enable_cpu_capabilities(arm64_early_features);
+}
+
static void __init setup_feature_capabilities(void)
{
update_cpu_capabilities(arm64_features, "detected feature:");
@@ -1249,6 +1273,7 @@ static bool __this_cpu_has_cap(const struct arm64_cpu_capabilities *cap_array,
bool this_cpu_has_cap(unsigned int cap)
{
return (__this_cpu_has_cap(arm64_features, cap) ||
+ __this_cpu_has_cap(arm64_early_features, cap) ||
__this_cpu_has_cap(arm64_errata, cap));
}
--
1.9.1
From: Daniel Thompson <[email protected]>
Currently irqflags is implemented using the PSR's I bit. It is possible
to implement irqflags by using the co-processor interface to the GIC.
Using the co-processor interface makes it feasible to simulate NMIs
using GIC interrupt prioritization.
This patch changes the irqflags macros to modify, save and restore
ICC_PMR_EL1. This has a substantial knock on effect for the rest of
the kernel. There are four reasons for this:
1. The state of the PMR becomes part of the interrupt context and must be
saved and restored during exceptions. It is saved on the stack as part
of the saved context when an interrupt/exception is taken.
2. The hardware automatically sets the I bit (at boot, during traps, etc.).
The I bit status is inherited across the different kernel entry types,
and the PMR value is unaffected by taking an exception. So once the I bit
is inherited, IRQ flags are back to the same state as before the exception.
In the interrupt entry, however, daif flags are not inherited.
Switching from I bit masking to PMR masking is done after acknowledging
the interrupt (otherwise PMR would prevent the IRQ ack).
3. Some instructions, such as wfi, require that the PMR not be used
for interrupt masking. Before calling these instructions we must
switch from PMR masking to I bit masking.
This is also the case when KVM runs a guest: if the CPU receives
an interrupt from the host, interrupts must not be masked in PMR,
otherwise the GIC will not signal it to the CPU.
4. We use the alternatives system to allow a single kernel to boot and
be switched to the alternative masking approach at runtime.
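As a worked example of the resulting irqflags encoding (constants taken
from the asm/ptrace.h and asm/irqflags.h hunks below), the PMR enable bit
(bit 6 of the PMR value) is folded into bit 0 of the flags value:

    /* MAKE_ARCH_FLAGS(daif, pmr) = daif | ((pmr >> 6) & 0x1)
     *   daif = 0, pmr = 0xf0 (unmasked) -> flags = 0x1
     *   daif = 0, pmr = 0xb0 (masked)   -> flags = 0x0
     *
     * ARCH_FLAGS_GET_PMR(flags) = ((flags & 0x1) << 6) | 0xb0
     *   flags = 0x1 -> 0xf0 (unmasked)
     *   flags = 0x0 -> 0xb0 (masked)
     */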
Signed-off-by: Daniel Thompson <[email protected]>
[[email protected]: changes reflected in commit message,
fixes, renaming]
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: James Morse <[email protected]>
---
arch/arm64/Kconfig | 15 ++++
arch/arm64/include/asm/arch_gicv3.h | 37 ++++++++++
arch/arm64/include/asm/assembler.h | 23 +++++-
arch/arm64/include/asm/daifflags.h | 36 +++++++---
arch/arm64/include/asm/efi.h | 5 ++
arch/arm64/include/asm/irqflags.h | 125 +++++++++++++++++++++++++++++++++
arch/arm64/include/asm/processor.h | 4 ++
arch/arm64/include/asm/ptrace.h | 14 +++-
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry.S | 28 +++++++-
arch/arm64/kernel/head.S | 38 ++++++++++
arch/arm64/kernel/process.c | 6 ++
arch/arm64/kernel/smp.c | 8 +++
arch/arm64/kvm/hyp/hyp-entry.S | 20 ++++++
arch/arm64/kvm/hyp/switch.c | 21 ++++++
arch/arm64/mm/proc.S | 23 ++++++
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/irqchip/irq-gic-v3.c | 82 +++++++++++----------
include/linux/irqchip/arm-gic-common.h | 6 ++
include/linux/irqchip/arm-gic.h | 5 --
20 files changed, 439 insertions(+), 60 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e..9834ff4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -850,6 +850,21 @@ config FORCE_MAX_ZONEORDER
However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
4M allocations matching the default size used by generic code.
+config USE_ICC_SYSREGS_FOR_IRQFLAGS
+ bool "Use ICC system registers for IRQ masking"
+ select ARM_GIC_V3
+ help
+ Using the ICC system registers for IRQ masking makes it possible
+ to simulate NMI on ARM64 systems. This allows several interesting
+ features (especially debug features) to be used on these systems.
+
+ Say Y here to implement IRQ masking using ICC system
+ registers when the GIC System Registers are available. The changes
+ are applied dynamically using the alternatives system so it is safe
+ to enable this option on systems with older interrupt controllers.
+
+ If unsure, say N
+
menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
depends on COMPAT
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 9becba9..490bb3a 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -76,6 +76,11 @@ static inline u64 gic_read_iar_cavium_thunderx(void)
return irqstat;
}
+static inline u32 gic_read_pmr(void)
+{
+ return read_sysreg_s(SYS_ICC_PMR_EL1);
+}
+
static inline void gic_write_pmr(u32 val)
{
write_sysreg_s(val, SYS_ICC_PMR_EL1);
@@ -145,5 +150,37 @@ static inline void gic_write_bpr1(u32 val)
#define gits_write_vpendbaser(v, c) writeq_relaxed(v, c)
#define gits_read_vpendbaser(c) readq_relaxed(c)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+static inline void gic_start_pmr_masking(void)
+{
+ if (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF)) {
+ gic_write_pmr(ICC_PMR_EL1_MASKED);
+ asm volatile ("msr daifclr, #2" : : : "memory");
+ }
+}
+
+static inline u32 gic_pmr_save_and_unmask(void)
+{
+ if (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF)) {
+ u32 old;
+
+ old = gic_read_pmr();
+ gic_write_pmr(ICC_PMR_EL1_UNMASKED);
+ dsb(sy);
+
+ return old;
+ } else {
+ /* Idle priority, no masking */
+ return ICC_PMR_EL1_UNMASKED;
+ }
+}
+
+static inline void gic_pmr_restore(u32 pmr)
+{
+ if (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF))
+ gic_write_pmr(pmr);
+}
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* __ASM_ARCH_GICV3_H */
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 8b16828..d320bd6 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -23,6 +23,7 @@
#ifndef __ASM_ASSEMBLER_H
#define __ASM_ASSEMBLER_H
+#include <asm/alternative.h>
#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
#include <asm/debug-monitors.h>
@@ -63,12 +64,30 @@
/*
* Enable and disable interrupts.
*/
- .macro disable_irq
+ .macro disable_irq, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mov \tmp, #ICC_PMR_EL1_MASKED
+alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF
msr daifset, #2
+alternative_else
+ msr_s SYS_ICC_PMR_EL1, \tmp
+alternative_endif
+#else
+ msr daifset, #2
+#endif
.endm
- .macro enable_irq
+ .macro enable_irq, tmp
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mov \tmp, #ICC_PMR_EL1_UNMASKED
+alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF
msr daifclr, #2
+alternative_else
+ msr_s SYS_ICC_PMR_EL1, \tmp
+alternative_endif
+#else
+ msr daifclr, #2
+#endif
.endm
.macro save_and_disable_irq, flags
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 22e4c83..ba85822 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,9 +18,24 @@
#include <linux/irqflags.h>
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+
#define DAIF_PROCCTX 0
#define DAIF_PROCCTX_NOIRQ PSR_I_BIT
+#else
+
+#define DAIF_PROCCTX \
+ (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF) ? \
+ MAKE_ARCH_FLAGS(0, ICC_PMR_EL1_UNMASKED) : \
+ 0)
+
+#define DAIF_PROCCTX_NOIRQ \
+ (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF) ? \
+ MAKE_ARCH_FLAGS(0, ICC_PMR_EL1_MASKED) : \
+ PSR_I_BIT)
+#endif
+
/* mask/save/unmask/restore all exceptions, including interrupts. */
static inline void local_daif_mask(void)
{
@@ -36,11 +51,8 @@ static inline unsigned long local_daif_save(void)
{
unsigned long flags;
- asm volatile(
- "mrs %0, daif // local_daif_save\n"
- : "=r" (flags)
- :
- : "memory");
+ flags = arch_local_save_flags();
+
local_daif_mask();
return flags;
@@ -54,17 +66,21 @@ static inline void local_daif_unmask(void)
:
:
: "memory");
+
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* Unmask IRQs in PMR if needed */
+ if (cpus_have_const_cap(ARM64_HAS_SYSREG_GIC_CPUIF))
+ arch_local_irq_enable();
+#endif
}
static inline void local_daif_restore(unsigned long flags)
{
if (!arch_irqs_disabled_flags(flags))
trace_hardirqs_on();
- asm volatile(
- "msr daif, %0 // local_daif_restore"
- :
- : "r" (flags)
- : "memory");
+
+ arch_local_irq_restore(flags);
+
if (arch_irqs_disabled_flags(flags))
trace_hardirqs_off();
}
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index c4cd508..421525f 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -40,7 +40,12 @@
efi_virtmap_unload(); \
})
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+#define ARCH_EFI_IRQ_FLAGS_MASK \
+ (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | ARCH_FLAG_PMR_EN)
+#else
#define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
+#endif
/* arch specific definitions used by the stub code */
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 24692ed..3d5d443 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -18,7 +18,10 @@
#ifdef __KERNEL__
+#include <asm/alternative.h>
+#include <asm/cpufeature.h>
#include <asm/ptrace.h>
+#include <asm/sysreg.h>
/*
* Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
@@ -33,6 +36,7 @@
* unmask it at all other times.
*/
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
/*
* CPU interrupt mask handling.
*/
@@ -96,5 +100,126 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
{
return flags & PSR_I_BIT;
}
+
+static inline void maybe_switch_to_sysreg_gic_cpuif(void) {}
+
+#else /* CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS */
+
+#define ARCH_FLAG_PMR_EN 0x1
+
+#define MAKE_ARCH_FLAGS(daif, pmr) \
+ ((daif) | (((pmr) >> ICC_PMR_EL1_EN_SHIFT) & ARCH_FLAG_PMR_EN))
+
+#define ARCH_FLAGS_GET_PMR(flags) \
+ ((((flags) & ARCH_FLAG_PMR_EN) << ICC_PMR_EL1_EN_SHIFT) \
+ | ICC_PMR_EL1_MASKED)
+
+#define ARCH_FLAGS_GET_DAIF(flags) ((flags) & ~ARCH_FLAG_PMR_EN)
+
+/*
+ * CPU interrupt mask handling.
+ */
+static inline unsigned long arch_local_irq_save(void)
+{
+ unsigned long flags, masked = ICC_PMR_EL1_MASKED;
+ unsigned long pmr = 0;
+
+ asm volatile(ALTERNATIVE(
+ "mrs %0, daif // arch_local_irq_save\n"
+ "msr daifset, #2\n"
+ "mov %1, #" __stringify(ICC_PMR_EL1_UNMASKED),
+ /* --- */
+ "mrs %0, daif\n"
+ "mrs_s %1, " __stringify(SYS_ICC_PMR_EL1) "\n"
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %2",
+ ARM64_HAS_SYSREG_GIC_CPUIF)
+ : "=&r" (flags), "=&r" (pmr)
+ : "r" (masked)
+ : "memory");
+
+ return MAKE_ARCH_FLAGS(flags, pmr);
+}
+
+static inline void arch_local_irq_enable(void)
+{
+ unsigned long unmasked = ICC_PMR_EL1_UNMASKED;
+
+ asm volatile(ALTERNATIVE(
+ "msr daifclr, #2 // arch_local_irq_enable\n"
+ "nop",
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
+ "dsb sy",
+ ARM64_HAS_SYSREG_GIC_CPUIF)
+ :
+ : "r" (unmasked)
+ : "memory");
+}
+
+static inline void arch_local_irq_disable(void)
+{
+ unsigned long masked = ICC_PMR_EL1_MASKED;
+
+ asm volatile(ALTERNATIVE(
+ "msr daifset, #2 // arch_local_irq_disable",
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0",
+ ARM64_HAS_SYSREG_GIC_CPUIF)
+ :
+ : "r" (masked)
+ : "memory");
+}
+
+/*
+ * Save the current interrupt enable state.
+ */
+static inline unsigned long arch_local_save_flags(void)
+{
+ unsigned long flags;
+ unsigned long pmr = 0;
+
+ asm volatile(ALTERNATIVE(
+ "mrs %0, daif // arch_local_save_flags\n"
+ "mov %1, #" __stringify(ICC_PMR_EL1_UNMASKED),
+ "mrs %0, daif\n"
+ "mrs_s %1, " __stringify(SYS_ICC_PMR_EL1),
+ ARM64_HAS_SYSREG_GIC_CPUIF)
+ : "=r" (flags), "=r" (pmr)
+ :
+ : "memory");
+
+ return MAKE_ARCH_FLAGS(flags, pmr);
+}
+
+/*
+ * restore saved IRQ state
+ */
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+ unsigned long pmr = ARCH_FLAGS_GET_PMR(flags);
+
+ flags = ARCH_FLAGS_GET_DAIF(flags);
+
+ asm volatile(ALTERNATIVE(
+ "msr daif, %0 // arch_local_irq_restore\n"
+ "nop\n"
+ "nop",
+ "msr daif, %0\n"
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%1\n"
+ "dsb sy",
+ ARM64_HAS_SYSREG_GIC_CPUIF)
+ :
+ : "r" (flags), "r" (pmr)
+ : "memory");
+}
+
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ return (ARCH_FLAGS_GET_DAIF(flags) & (PSR_I_BIT)) |
+ !(ARCH_FLAGS_GET_PMR(flags) & ICC_PMR_EL1_EN_BIT);
+}
+
+void maybe_switch_to_sysreg_gic_cpuif(void);
+
+#endif /* CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS */
+
#endif
#endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 023cacb..d569dee 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -137,6 +137,10 @@ static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
memset(regs, 0, sizeof(*regs));
forget_syscall(regs);
regs->pc = pc;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* Have IRQs enabled by default */
+ regs->pmr_save = ICC_PMR_EL1_UNMASKED;
+#endif
}
static inline void start_thread(struct pt_regs *regs, unsigned long pc,
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 6069d66..aa1e948 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -25,6 +25,12 @@
#define CurrentEL_EL1 (1 << 2)
#define CurrentEL_EL2 (2 << 2)
+/* PMR values used to mask/unmask interrupts */
+#define ICC_PMR_EL1_EN_SHIFT 6
+#define ICC_PMR_EL1_EN_BIT (1 << ICC_PMR_EL1_EN_SHIFT) // PMR IRQ enable
+#define ICC_PMR_EL1_UNMASKED 0xf0
+#define ICC_PMR_EL1_MASKED (ICC_PMR_EL1_UNMASKED ^ ICC_PMR_EL1_EN_BIT)
+
/* AArch32-specific ptrace requests */
#define COMPAT_PTRACE_GETREGS 12
#define COMPAT_PTRACE_SETREGS 13
@@ -136,7 +142,7 @@ struct pt_regs {
#endif
u64 orig_addr_limit;
- u64 unused; // maintain 16 byte alignment
+ u64 pmr_save;
u64 stackframe[2];
};
@@ -171,8 +177,14 @@ static inline void forget_syscall(struct pt_regs *regs)
#define processor_mode(regs) \
((regs)->pstate & PSR_MODE_MASK)
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
#define interrupts_enabled(regs) \
(!((regs)->pstate & PSR_I_BIT))
+#else
+#define interrupts_enabled(regs) \
+ ((!((regs)->pstate & PSR_I_BIT)) && \
+ ((regs)->pmr_save & ICC_PMR_EL1_EN_BIT))
+#endif
#define fast_interrupts_enabled(regs) \
(!((regs)->pstate & PSR_F_BIT))
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 71bf088..6b00b0d 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -75,6 +75,7 @@ int main(void)
DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
DEFINE(S_ORIG_ADDR_LIMIT, offsetof(struct pt_regs, orig_addr_limit));
+ DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save));
DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
BLANK();
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d14b8f..8209b45 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>
#include <asm/alternative.h>
#include <asm/assembler.h>
@@ -210,6 +211,16 @@ alternative_else_nop_endif
msr sp_el0, tsk
.endif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* Save pmr */
+alternative_if ARM64_HAS_SYSREG_GIC_CPUIF
+ mrs_s x20, SYS_ICC_PMR_EL1
+alternative_else
+ mov x20, #ICC_PMR_EL1_UNMASKED
+alternative_endif
+ str x20, [sp, #S_PMR_SAVE]
+#endif
+
/*
* Registers that may be useful after this macro is invoked:
*
@@ -230,6 +241,15 @@ alternative_else_nop_endif
/* No need to restore UAO, it will be restored from SPSR_EL1 */
.endif
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* Restore pmr, ensuring IRQs are off before restoring context. */
+alternative_if ARM64_HAS_SYSREG_GIC_CPUIF
+ ldr x20, [sp, #S_PMR_SAVE]
+ msr_s SYS_ICC_PMR_EL1, x20
+ dsb sy
+alternative_else_nop_endif
+#endif
+
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
.if \el == 0
ct_user_enter
@@ -820,17 +840,18 @@ ENDPROC(el0_error)
* and this includes saving x0 back into the kernel stack.
*/
ret_fast_syscall:
- disable_daif
+ disable_irq x21 // disable interrupts
str x0, [sp, #S_X0] // returned x0
ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for syscall tracing
and x2, x1, #_TIF_SYSCALL_WORK
cbnz x2, ret_fast_syscall_trace
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
+ disable_daif
enable_step_tsk x1, x2
kernel_exit 0
ret_fast_syscall_trace:
- enable_daif
+ enable_daif // enable interrupts
b __sys_trace_return_skipped // we already saved x0
/*
@@ -848,11 +869,12 @@ work_pending:
* "slow" syscall return path.
*/
ret_to_user:
- disable_daif
+ disable_irq x21 // disable interrupts
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
finish_ret_to_user:
+ disable_daif
enable_step_tsk x1, x2
kernel_exit 0
ENDPROC(ret_to_user)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index e3cb9fb..ec2eb4a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -563,6 +563,44 @@ set_cpu_boot_mode_flag:
ret
ENDPROC(set_cpu_boot_mode_flag)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+/*
+ * void maybe_switch_to_sysreg_gic_cpuif(void)
+ *
+ * Enable interrupt controller system register access if this feature
+ * has been detected by the alternatives system.
+ *
+ * Before we jump into generic code we must enable interrupt controller system
+ * register access because this is required by the irqflags macros. We must
+ * also mask interrupts at the PMR and unmask them within the PSR. That leaves
+ * us set up and ready for the kernel to make its first call to
+ * arch_local_irq_enable().
+ */
+ENTRY(maybe_switch_to_sysreg_gic_cpuif)
+alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF
+ b 1f
+alternative_else
+ mrs_s x0, SYS_ICC_SRE_EL1
+alternative_endif
+ orr x0, x0, #1
+ msr_s SYS_ICC_SRE_EL1, x0 // Set ICC_SRE_EL1.SRE==1
+ isb // Make sure SRE is now set
+ mrs x0, daif
+ tbz x0, #7, no_mask_pmr // Are interrupts on?
+ mov x0, ICC_PMR_EL1_MASKED
+ msr_s SYS_ICC_PMR_EL1, x0 // Prepare for unmask of I bit
+ msr daifclr, #2 // Clear the I bit
+ b 1f
+no_mask_pmr:
+ mov x0, ICC_PMR_EL1_UNMASKED
+ msr_s SYS_ICC_PMR_EL1, x0
+1:
+ ret
+ENDPROC(maybe_switch_to_sysreg_gic_cpuif)
+#endif
+
/*
* These values are written with the MMU off, but read with the MMU on.
* Writers will invalidate the corresponding address, discarding up to a
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6b7dcf4..56871f2 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -66,6 +66,8 @@
EXPORT_SYMBOL(__stack_chk_guard);
#endif
+#include <asm/arch_gicv3.h>
+
/*
* Function pointers to optional machine specific functions
*/
@@ -224,6 +226,7 @@ void __show_regs(struct pt_regs *regs)
print_symbol("pc : %s\n", regs->pc);
print_symbol("lr : %s\n", lr);
printk("sp : %016llx\n", sp);
+ printk("pmr_save: %08llx\n", regs->pmr_save);
i = top_reg;
@@ -349,6 +352,9 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
} else {
memset(childregs, 0, sizeof(struct pt_regs));
childregs->pstate = PSR_MODE_EL1h;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ childregs->pmr_save = ICC_PMR_EL1_UNMASKED;
+#endif
if (IS_ENABLED(CONFIG_ARM64_UAO) &&
cpus_have_const_cap(ARM64_HAS_UAO))
childregs->pstate |= PSR_UAO_BIT;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 37361b5..ec56ee1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -221,6 +221,8 @@ asmlinkage void secondary_start_kernel(void)
struct mm_struct *mm = &init_mm;
unsigned int cpu;
+ maybe_switch_to_sysreg_gic_cpuif();
+
cpu = task_cpu(current);
set_my_cpu_offset(per_cpu_offset(cpu));
@@ -459,6 +461,12 @@ void __init smp_prepare_boot_cpu(void)
* and/or scheduling is enabled.
*/
apply_alternatives_early();
+
+ /*
+ * Conditionally switch to GIC PMR for interrupt masking (this
+ * will be a nop if we are using normal interrupt masking)
+ */
+ maybe_switch_to_sysreg_gic_cpuif();
}
static u64 __init of_get_cpu_mpidr(struct device_node *dn)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5170ce1..e5e97e8 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -42,7 +42,27 @@
.endm
ENTRY(__vhe_hyp_call)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+alternative_if ARM64_HAS_SYSREG_GIC_CPUIF
+ /*
+ * In non-VHE, trapping to EL2 will set the PSR.I bit.
+ * Force it here whenever we are playing with PMR.
+ */
+ str x19, [sp, #-16]!
+ mrs x19, daif
+ msr daifset, #2
+alternative_else_nop_endif
+#endif
+
do_el2_call
+
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+alternative_if ARM64_HAS_SYSREG_GIC_CPUIF
+ msr daif, x19
+ ldr x19, [sp], #16
+alternative_else_nop_endif
+#endif
+
/*
* We used to rely on having an exception return to get
* an implicit isb. In the E2H case, we don't have it anymore.
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index f7c651f..4fac70d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -18,6 +18,9 @@
#include <linux/types.h>
#include <linux/jump_label.h>
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+#include <asm/arch_gicv3.h>
+#endif
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
@@ -303,6 +306,19 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt;
bool fp_enabled;
u64 exit_code;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ u32 pmr_save;
+#endif
+
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /*
+ * Having IRQs masked via PMR when entering the guest means the GIC
+ * will not signal lower-priority interrupts to the CPU, and the
+ * only way to get out will be via guest exceptions.
+ * Naturally, we want to avoid this.
+ */
+ pmr_save = gic_pmr_save_and_unmask();
+#endif
vcpu = kern_hyp_va(vcpu);
write_sysreg(vcpu, tpidr_el2);
@@ -417,6 +433,11 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
*/
__debug_cond_restore_host_state(vcpu);
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* PMR was unmasked, no need for dsb */
+ gic_pmr_restore(pmr_save);
+#endif
+
return exit_code;
}
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 95233df..8b91661 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/linkage.h>
+#include <linux/irqchip/arm-gic-v3.h>
#include <asm/assembler.h>
#include <asm/asm-offsets.h>
#include <asm/hwcap.h>
@@ -47,11 +48,33 @@
* cpu_do_idle()
*
* Idle the processor (wait for interrupt).
+ *
+ * If CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS is set we must do additional
+ * work to ensure that interrupts are not masked at the PMR (because the
+ * core will not wake up if we block the wake up signal in the interrupt
+ * controller).
*/
ENTRY(cpu_do_idle)
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+alternative_if_not ARM64_HAS_SYSREG_GIC_CPUIF
+#endif
+ dsb sy // WFI may enter a low-power mode
+ wfi
+ ret
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+alternative_else
+ mrs x0, daif // save I bit
+ msr daifset, #2 // set I bit
+ mrs_s x1, SYS_ICC_PMR_EL1 // save PMR
+alternative_endif
+ mov x2, #ICC_PMR_EL1_UNMASKED
+ msr_s SYS_ICC_PMR_EL1, x2 // unmask at PMR
dsb sy // WFI may enter a low-power mode
wfi
+ msr_s SYS_ICC_PMR_EL1, x1 // restore PMR
+ msr daif, x0 // restore I bit
ret
+#endif
ENDPROC(cpu_do_idle)
#ifdef CONFIG_CPU_PM
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 06f025f..35e5f45 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -60,7 +60,7 @@
#define LPI_PROPBASE_SZ ALIGN(BIT(LPI_NRBITS), SZ_64K)
#define LPI_PENDBASE_SZ ALIGN(BIT(LPI_NRBITS) / 8, SZ_64K)
-#define LPI_PROP_DEFAULT_PRIO 0xa0
+#define LPI_PROP_DEFAULT_PRIO GICD_INT_DEF_PRI
/*
* Collection structure - just an ID, and a redistributor address to
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index b56c3e2..df51d96 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -71,9 +71,6 @@ struct gic_chip_data {
#define gic_data_rdist_rd_base() (gic_data_rdist()->rd_base)
#define gic_data_rdist_sgi_base() (gic_data_rdist_rd_base() + SZ_64K)
-/* Our default, arbitrary priority value. Linux only uses one anyway. */
-#define DEFAULT_PMR_VALUE 0xf0
-
static inline unsigned int gic_irq(struct irq_data *d)
{
return d->hwirq;
@@ -348,48 +345,55 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
{
u32 irqnr;
- do {
- irqnr = gic_read_iar();
+ irqnr = gic_read_iar();
- if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
- int err;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ isb();
+ /* Masking IRQs earlier would prevent acknowledging the current interrupt */
+ gic_start_pmr_masking();
+#endif
- if (static_key_true(&supports_deactivate))
+ if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
+ int err;
+
+ if (static_key_true(&supports_deactivate))
+ gic_write_eoir(irqnr);
+ else {
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ isb();
+#endif
+ }
+
+ err = handle_domain_irq(gic_data.domain, irqnr, regs);
+ if (err) {
+ WARN_ONCE(true, "Unexpected interrupt received!\n");
+ if (static_key_true(&supports_deactivate)) {
+ if (irqnr < 8192)
+ gic_write_dir(irqnr);
+ } else {
gic_write_eoir(irqnr);
- else
- isb();
-
- err = handle_domain_irq(gic_data.domain, irqnr, regs);
- if (err) {
- WARN_ONCE(true, "Unexpected interrupt received!\n");
- if (static_key_true(&supports_deactivate)) {
- if (irqnr < 8192)
- gic_write_dir(irqnr);
- } else {
- gic_write_eoir(irqnr);
- }
}
- continue;
}
- if (irqnr < 16) {
- gic_write_eoir(irqnr);
- if (static_key_true(&supports_deactivate))
- gic_write_dir(irqnr);
+ return;
+ }
+ if (irqnr < 16) {
+ gic_write_eoir(irqnr);
+ if (static_key_true(&supports_deactivate))
+ gic_write_dir(irqnr);
#ifdef CONFIG_SMP
- /*
- * Unlike GICv2, we don't need an smp_rmb() here.
- * The control dependency from gic_read_iar to
- * the ISB in gic_write_eoir is enough to ensure
- * that any shared data read by handle_IPI will
- * be read after the ACK.
- */
- handle_IPI(irqnr, regs);
+ /*
+ * Unlike GICv2, we don't need an smp_rmb() here.
+ * The control dependency from gic_read_iar to
+ * the ISB in gic_write_eoir is enough to ensure
+ * that any shared data read by handle_IPI will
+ * be read after the ACK.
+ */
+ handle_IPI(irqnr, regs);
#else
- WARN_ONCE(true, "Unexpected SGI received!\n");
+ WARN_ONCE(true, "Unexpected SGI received!\n");
#endif
- continue;
- }
- } while (irqnr != ICC_IAR1_EL1_SPURIOUS);
+ return;
+ }
}
static void __init gic_dist_init(void)
@@ -543,8 +547,10 @@ static void gic_cpu_sys_reg_init(void)
if (!gic_enable_sre())
pr_err("GIC: unable to set SRE (disabled at EL2), panic ahead\n");
+#ifndef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
/* Set priority mask register */
- gic_write_pmr(DEFAULT_PMR_VALUE);
+ gic_write_pmr(ICC_PMR_EL1_UNMASKED);
+#endif
/*
* Some firmwares hand over to the kernel with the BPR changed from
diff --git a/include/linux/irqchip/arm-gic-common.h b/include/linux/irqchip/arm-gic-common.h
index 0a83b43..2c9a4b3 100644
--- a/include/linux/irqchip/arm-gic-common.h
+++ b/include/linux/irqchip/arm-gic-common.h
@@ -13,6 +13,12 @@
#include <linux/types.h>
#include <linux/ioport.h>
+#define GICD_INT_DEF_PRI 0xc0
+#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
+ (GICD_INT_DEF_PRI << 16) |\
+ (GICD_INT_DEF_PRI << 8) |\
+ GICD_INT_DEF_PRI)
+
enum gic_type {
GIC_V2,
GIC_V3,
diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
index d3453ee..47f5a8c 100644
--- a/include/linux/irqchip/arm-gic.h
+++ b/include/linux/irqchip/arm-gic.h
@@ -65,11 +65,6 @@
#define GICD_INT_EN_CLR_X32 0xffffffff
#define GICD_INT_EN_SET_SGI 0x0000ffff
#define GICD_INT_EN_CLR_PPI 0xffff0000
-#define GICD_INT_DEF_PRI 0xa0
-#define GICD_INT_DEF_PRI_X4 ((GICD_INT_DEF_PRI << 24) |\
- (GICD_INT_DEF_PRI << 16) |\
- (GICD_INT_DEF_PRI << 8) |\
- GICD_INT_DEF_PRI)
#define GICH_HCR 0x0
#define GICH_VTR 0x4
--
1.9.1
Add functions to read/write priorities to the GIC [re]distributor.
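Priorities are one byte per interrupt, starting at GIC_DIST_PRI, hence the
byte-wide accessors. A caller would use them along these lines (sketch;
arguments chosen for illustration only):

    /* Set SGI/PPI 7 on this CPU's redistributor to the default priority */
    gic_set_irq_prio(7, gic_data_rdist_sgi_base(), GICD_INT_DEF_PRI);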
Signed-off-by: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-common.c | 10 ++++++++++
drivers/irqchip/irq-gic-common.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
index 30017df..1dfa60b 100644
--- a/drivers/irqchip/irq-gic-common.c
+++ b/drivers/irqchip/irq-gic-common.c
@@ -91,6 +91,16 @@ int gic_configure_irq(unsigned int irq, unsigned int type,
return ret;
}
+void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio)
+{
+ writeb_relaxed(prio, base + GIC_DIST_PRI + irq);
+}
+
+u8 gic_get_irq_prio(unsigned int irq, void __iomem *base)
+{
+ return readb_relaxed(base + GIC_DIST_PRI + irq);
+}
+
void gic_dist_config(void __iomem *base, int gic_irqs,
void (*sync_access)(void))
{
diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
index 3919cd7..1586dbd 100644
--- a/drivers/irqchip/irq-gic-common.h
+++ b/drivers/irqchip/irq-gic-common.h
@@ -35,6 +35,8 @@ void gic_dist_config(void __iomem *base, int gic_irqs,
void gic_cpu_config(void __iomem *base, void (*sync_access)(void));
void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
void *data);
+void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio);
+u8 gic_get_irq_prio(unsigned int irq, void __iomem *base);
void gic_set_kvm_info(const struct gic_kvm_info *info);
--
1.9.1
The values non-secure EL1 needs to use for priority registers depend on
the value of SCR_EL3.FIQ.
Since we don't have access to SCR_EL3, we fake an interrupt and compare the
priority reported by the GIC CPU interface with the one programmed in the
[re]distributor.
Also, add firmware requirements related to SCR_EL3.
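As a concrete example of the two views (using the transformation described
in the comment added to irq-gic-v3.c below):

    /* Priority programmed in the distributor: GICD_INT_DEF_PRI = 0xc0
     *
     * SCR_EL3.FIQ == 1: ICC_RPR_EL1 reads back 0xc0 (same value)
     * SCR_EL3.FIQ == 0: ICC_RPR_EL1 reads back (0xc0 >> 1) | 0x80 = 0xe0
     *
     * Comparing RPR with the [re]distributor value on a faked SGI is
     * enough to tell the two cases apart.
     */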
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
Documentation/arm64/booting.txt | 5 +++
arch/arm64/include/asm/arch_gicv3.h | 5 +++
arch/arm64/include/asm/irqflags.h | 6 +++
arch/arm64/include/asm/sysreg.h | 1 +
drivers/irqchip/irq-gic-v3.c | 86 +++++++++++++++++++++++++++++++++++++
5 files changed, 103 insertions(+)
diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 8d0df62..e387938 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -188,6 +188,11 @@ Before jumping into the kernel, the following conditions must be met:
the kernel image will be entered must be initialised by software at a
higher exception level to prevent execution in an UNKNOWN state.
+ - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
+ executing on.
+ - The value of SCR_EL3.FIQ must be the same as the one present at boot
+ time whenever the kernel is executing.
+
For systems with a GICv3 interrupt controller to be used in v3 mode:
- If EL3 is present:
ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1.
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 490bb3a..ac7b7f6 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -124,6 +124,11 @@ static inline void gic_write_bpr1(u32 val)
write_sysreg_s(val, SYS_ICC_BPR1_EL1);
}
+static inline u32 gic_read_rpr(void)
+{
+ return read_sysreg_s(SYS_ICC_RPR_EL1);
+}
+
#define gic_read_typer(c) readq_relaxed(c)
#define gic_write_irouter(v, c) writeq_relaxed(v, c)
#define gic_read_lpir(c) readq_relaxed(c)
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 3d5d443..d25e7ee 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -217,6 +217,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
!(ARCH_FLAGS_GET_PMR(flags) & ICC_PMR_EL1_EN_BIT);
}
+/* Mask IRQs at CPU level instead of GIC level */
+static inline void arch_irqs_daif_disable(void)
+{
+ asm volatile ("msr daifset, #2" : : : "memory");
+}
+
void maybe_switch_to_sysreg_gic_cpuif(void);
#endif /* CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 08cc885..46fa869 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -304,6 +304,7 @@
#define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5)
#define SYS_ICC_IGRPEN0_EL1 sys_reg(3, 0, 12, 12, 6)
#define SYS_ICC_IGRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
+#define SYS_ICC_RPR_EL1 sys_reg(3, 0, 12, 11, 3)
#define SYS_CONTEXTIDR_EL1 sys_reg(3, 0, 13, 0, 1)
#define SYS_TPIDR_EL1 sys_reg(3, 0, 13, 0, 4)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index df51d96..58b5e89 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -63,6 +63,10 @@ struct gic_chip_data {
static struct gic_chip_data gic_data __read_mostly;
static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+DEFINE_STATIC_KEY_FALSE(have_non_secure_prio_view);
+#endif
+
static struct gic_kvm_info gic_v3_kvm_info;
static DEFINE_PER_CPU(bool, has_rss);
@@ -997,6 +1001,84 @@ static int partition_domain_translate(struct irq_domain *d,
.select = gic_irq_domain_select,
};
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+/*
+ * The behaviours of RPR and PMR registers differ depending on the value of
+ * SCR_EL3.FIQ, while the behaviour of priority registers of the distributor
+ * and redistributors is always the same.
+ *
+ * If SCR_EL3.FIQ == 1, the values used for RPR and PMR are the same as the ones
+ * programmed in the distributor and redistributors registers.
+ *
+ * Otherwise, the value presented by RPR as well as the value which will be
+ * compared against PMR is: (GIC_(R)DIST_PRI[irq] >> 1) | 0x80;
+ *
+ * see GICv3/GICv4 Architecture Specification (IHI0069D):
+ * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt
+ * priorities.
+ * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
+ * interrupt.
+ */
+static void __init gic_detect_prio_view(void)
+{
+ /*
+ * Randomly picked SGI, must be <= 8 as other SGIs might be
+ * used by the firmware.
+ */
+ const u32 fake_irqnr = 7;
+ const u32 fake_irqmask = BIT(fake_irqnr);
+ void __iomem * const rdist_base = gic_data_rdist_sgi_base();
+ unsigned long irq_flags;
+ u32 acked_irqnr;
+ bool was_enabled;
+
+ irq_flags = arch_local_save_flags();
+
+ arch_irqs_daif_disable();
+
+ was_enabled = (readl_relaxed(rdist_base + GICD_ISENABLER) &
+ fake_irqmask);
+
+ if (!was_enabled)
+ writel_relaxed(fake_irqmask, rdist_base + GICD_ISENABLER);
+
+ /* Need to unmask to acknowledge the IRQ */
+ gic_write_pmr(ICC_PMR_EL1_UNMASKED);
+ dsb(sy);
+
+ /* Fake a pending SGI */
+ writel_relaxed(fake_irqmask, rdist_base + GICD_ISPENDR);
+ dsb(sy);
+
+ do {
+ acked_irqnr = gic_read_iar();
+
+ if (acked_irqnr == fake_irqnr) {
+ if (gic_read_rpr() == gic_get_irq_prio(acked_irqnr,
+ rdist_base))
+ static_branch_enable(&have_non_secure_prio_view);
+ } else {
+ pr_warn("Unexpected IRQ for priority detection: %u\n",
+ acked_irqnr);
+ }
+
+ if (acked_irqnr < 1020) {
+ gic_write_eoir(acked_irqnr);
+ if (static_key_true(&supports_deactivate))
+ gic_write_dir(acked_irqnr);
+ }
+ } while (acked_irqnr == ICC_IAR1_EL1_SPURIOUS);
+
+ /* Restore enabled state */
+ if (!was_enabled) {
+ writel_relaxed(fake_irqmask, rdist_base + GICD_ICENABLER);
+ gic_redist_wait_for_rwp();
+ }
+
+ arch_local_irq_restore(irq_flags);
+}
+#endif
+
static int __init gic_init_bases(void __iomem *dist_base,
struct redist_region *rdist_regs,
u32 nr_redist_regions,
@@ -1057,6 +1139,10 @@ static int __init gic_init_bases(void __iomem *dist_base,
gic_cpu_init();
gic_cpu_pm_init();
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ gic_detect_prio_view();
+#endif
+
return 0;
out_free:
--
1.9.1
arm64 does not provide native NMIs. Emulate the NMI behaviour using GIC
priorities.
Add the ability to set an IRQ as an NMI, and the handling of that NMI.
If the view of GIC priorities is the secure one (i.e. SCR_EL3.FIQ == 0), do
not allow the use of NMIs. Emit a warning when attempting to set an IRQ as
NMI under this scenario.
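With this in place, a user would flag a still-disabled interrupt as an NMI
through the generic irqchip state interface, along these lines (sketch; the
function name and error handling are made up for illustration):

    #include <linux/interrupt.h>

    static int setup_my_nmi(unsigned int irq)
    {
            int ret;

            /* Must be done while the IRQ is disabled, see
             * gic_irq_set_irqchip_prio(). */
            ret = irq_set_irqchip_state(irq, IRQCHIP_STATE_NMI, true);
            if (ret)
                    pr_warn("IRQ %u cannot be set as NMI: %d\n", irq, ret);

            return ret;
    }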
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
---
arch/arm64/kernel/entry.S | 56 +++++++++++++++++
drivers/irqchip/irq-gic-v3.c | 141 +++++++++++++++++++++++++++++++++++++++++++
include/linux/interrupt.h | 1 +
3 files changed, 198 insertions(+)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 8209b45..170215c 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -355,6 +355,18 @@ alternative_else_nop_endif
mov sp, x19
.endm
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /* Should be checked on return from irq handlers */
+ .macro branch_if_was_nmi, tmp, target
+ alternative_if ARM64_HAS_SYSREG_GIC_CPUIF
+ mrs \tmp, daif
+ alternative_else
+ mov \tmp, #0
+ alternative_endif
+ tbnz \tmp, #7, \target // Exiting an NMI
+ .endm
+#endif
+
/*
* These are the registers used in the syscall handler, and allow us to
* have in theory up to 7 arguments to a function - x0 to x6.
@@ -574,12 +586,30 @@ ENDPROC(el1_sync)
el1_irq:
kernel_entry 1
enable_da_f
+
#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ ldr x20, [sp, #S_PMR_SAVE]
+ /* Irqs were disabled, don't trace */
+ tbz x20, ICC_PMR_EL1_EN_SHIFT, 1f
+#endif
bl trace_hardirqs_off
+1:
#endif
irq_handler
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ /*
+ * IRQs were disabled, so this was an NMI.
+ * We might have interrupted a context with interrupts disabled that
+ * had set the NEED_RESCHED flag.
+ * Skip preemption and irq tracing if needed.
+ */
+ tbz x20, ICC_PMR_EL1_EN_SHIFT, untraced_irq_exit
+ branch_if_was_nmi x0, skip_preempt
+#endif
+
#ifdef CONFIG_PREEMPT
ldr w24, [tsk, #TSK_TI_PREEMPT] // get preempt count
cbnz w24, 1f // preempt count != 0
@@ -588,9 +618,13 @@ el1_irq:
bl el1_preempt
1:
#endif
+
+skip_preempt:
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on
#endif
+
+untraced_irq_exit:
kernel_exit 1
ENDPROC(el1_irq)
@@ -810,6 +844,11 @@ el0_irq_naked:
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on
#endif
+
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ branch_if_was_nmi x2, nmi_ret_to_user
+#endif
+
b ret_to_user
ENDPROC(el0_irq)
@@ -1000,8 +1039,15 @@ ENTRY(cpu_switch_to)
ldp x27, x28, [x8], #16
ldp x29, x9, [x8], #16
ldr lr, [x8]
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ mrs x10, daif
+ msr daifset, #2
+#endif
mov sp, x9
msr sp_el0, x1
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ msr daif, x10
+#endif
ret
ENDPROC(cpu_switch_to)
NOKPROBE(cpu_switch_to)
@@ -1018,3 +1064,13 @@ ENTRY(ret_from_fork)
b ret_to_user
ENDPROC(ret_from_fork)
NOKPROBE(ret_from_fork)
+
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+/*
+ * NMI return path to EL0
+ */
+nmi_ret_to_user:
+ ldr x1, [tsk, #TSK_TI_FLAGS]
+ b finish_ret_to_user
+ENDPROC(nmi_ret_to_user)
+#endif
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 58b5e89..8d348f9 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -34,6 +34,8 @@
#include <linux/irqchip/arm-gic-v3.h>
#include <linux/irqchip/irq-partition-percpu.h>
+#include <trace/events/irq.h>
+
#include <asm/cputype.h>
#include <asm/exception.h>
#include <asm/smp_plat.h>
@@ -41,6 +43,8 @@
#include "irq-gic-common.h"
+#define GICD_INT_NMI_PRI 0xa0
+
struct redist_region {
void __iomem *redist_base;
phys_addr_t phys_base;
@@ -227,6 +231,87 @@ static void gic_unmask_irq(struct irq_data *d)
gic_poke_irq(d, GICD_ISENABLER);
}
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+/*
+ * Chip flow handler for SPIs set as NMI
+ */
+static void handle_fasteoi_nmi(struct irq_desc *desc)
+{
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+ struct irqaction *action = desc->action;
+ unsigned int irq = irq_desc_get_irq(desc);
+ irqreturn_t res;
+
+ if (chip->irq_ack)
+ chip->irq_ack(&desc->irq_data);
+
+ trace_irq_handler_entry(irq, action);
+ res = action->handler(irq, action->dev_id);
+ trace_irq_handler_exit(irq, action, res);
+
+ if (chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+}
+
+/*
+ * Chip flow handler for PPIs set as NMI
+ */
+static void handle_percpu_devid_nmi(struct irq_desc *desc)
+{
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+ struct irqaction *action = desc->action;
+ unsigned int irq = irq_desc_get_irq(desc);
+ irqreturn_t res;
+
+ if (chip->irq_ack)
+ chip->irq_ack(&desc->irq_data);
+
+ trace_irq_handler_entry(irq, action);
+ res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
+ trace_irq_handler_exit(irq, action, res);
+
+ if (chip->irq_eoi)
+ chip->irq_eoi(&desc->irq_data);
+}
+
+static int gic_irq_set_irqchip_prio(struct irq_data *d, bool val)
+{
+ u8 prio;
+ irq_flow_handler_t handler;
+
+ if (gic_peek_irq(d, GICD_ISENABLER)) {
+ pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
+ return -EPERM;
+ }
+
+ if (val) {
+ prio = GICD_INT_NMI_PRI;
+
+ if (gic_irq(d) < 32)
+ handler = handle_percpu_devid_nmi;
+ else
+ handler = handle_fasteoi_nmi;
+ } else {
+ prio = GICD_INT_DEF_PRI;
+
+ if (gic_irq(d) < 32)
+ handler = handle_percpu_devid_irq;
+ else
+ handler = handle_fasteoi_irq;
+ }
+
+ /*
+ * Already in a locked context for the desc from calling
+ * irq_set_irqchip_state.
+ * It should be safe to simply modify the handler.
+ */
+ irq_to_desc(d->irq)->handle_irq = handler;
+ gic_set_irq_prio(gic_irq(d), gic_dist_base(d), prio);
+
+ return 0;
+}
+#endif
+
static int gic_irq_set_irqchip_state(struct irq_data *d,
enum irqchip_irq_state which, bool val)
{
@@ -248,6 +333,18 @@ static int gic_irq_set_irqchip_state(struct irq_data *d,
reg = val ? GICD_ICENABLER : GICD_ISENABLER;
break;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ case IRQCHIP_STATE_NMI:
+ if (static_branch_likely(&have_non_secure_prio_view)) {
+ return gic_irq_set_irqchip_prio(d, val);
+ } else if (val) {
+ pr_warn("Failed to set IRQ %u as NMI, NMIs are unsupported\n",
+ gic_irq(d));
+ return -EINVAL;
+ }
+ return 0;
+#endif
+
default:
return -EINVAL;
}
@@ -275,6 +372,13 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
*val = !gic_peek_irq(d, GICD_ISENABLER);
break;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ case IRQCHIP_STATE_NMI:
+ *val = (gic_get_irq_prio(gic_irq(d), gic_dist_base(d)) ==
+ GICD_INT_NMI_PRI);
+ break;
+#endif
+
default:
return -EINVAL;
}
@@ -345,6 +449,22 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
return aff;
}
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+static void do_handle_nmi(unsigned int hwirq, struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ unsigned int irq;
+
+ nmi_enter();
+
+ irq = irq_find_mapping(gic_data.domain, hwirq);
+ generic_handle_irq(irq);
+
+ nmi_exit();
+ set_irq_regs(old_regs);
+}
+#endif
+
static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqnr;
@@ -360,6 +480,25 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
int err;
+#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
+ if (static_branch_likely(&have_non_secure_prio_view)
+ && unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
+ /*
+ * We need to prevent other NMIs from occurring even after a
+ * priority drop.
+ * We keep the I flag set until pstate is restored by
+ * kernel_exit.
+ */
+ arch_irqs_daif_disable();
+
+ if (static_key_true(&supports_deactivate))
+ gic_write_eoir(irqnr);
+
+ do_handle_nmi(irqnr, regs);
+ return;
+ }
+#endif
+
if (static_key_true(&supports_deactivate))
gic_write_eoir(irqnr);
else {
@@ -1057,6 +1196,8 @@ static void __init gic_detect_prio_view(void)
if (gic_read_rpr() == gic_get_irq_prio(acked_irqnr,
rdist_base))
static_branch_enable(&have_non_secure_prio_view);
+ else
+ pr_warn("Cannot enable use of pseudo-NMIs\n");
} else {
pr_warn("Unexpected IRQ for priority detection: %u\n",
acked_irqnr);
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 69c2382..cdcefe4 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -421,6 +421,7 @@ enum irqchip_irq_state {
IRQCHIP_STATE_ACTIVE, /* Is interrupt in progress? */
IRQCHIP_STATE_MASKED, /* Is interrupt masked? */
IRQCHIP_STATE_LINE_LEVEL, /* Is IRQ line high? */
+ IRQCHIP_STATE_NMI, /* Is IRQ an NMI? */
};
extern int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
--
1.9.1
From: Daniel Thompson <[email protected]>
Currently alternatives are applied very late in the boot process (and
a long time after we enable scheduling). Some alternative sequences,
such as those that alter the way CPU context is stored, must be applied
much earlier in the boot sequence.
Introduce apply_alternatives_early() to allow some alternatives to be
applied immediately after we detect the CPU features of the boot CPU.
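For context, a use site of an early-applied alternative looks like any
other alternative; e.g. (sketch mirroring arch_local_irq_disable() from
patch 3 of this series):

    asm volatile(ALTERNATIVE(
            "msr daifset, #2",                              /* default */
            "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",   /* patched */
            ARM64_HAS_SYSREG_GIC_CPUIF)
            :
            : "r" ((unsigned long)ICC_PMR_EL1_MASKED)
            : "memory");

Only the patching time differs: sequences whose cpufeature bit is in
EARLY_APPLY_FEATURE_MASK are rewritten from apply_alternatives_early(),
right after boot-CPU feature detection, instead of from the
stop_machine() pass.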
Signed-off-by: Daniel Thompson <[email protected]>
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/alternative.h | 1 +
arch/arm64/kernel/alternative.c | 39 +++++++++++++++++++++++++++++++++---
arch/arm64/kernel/smp.c | 6 ++++++
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 4a85c69..1fc1cdb 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -20,6 +20,7 @@ struct alt_instr {
u8 alt_len; /* size of new instruction(s), <= orig_len */
};
+void __init apply_alternatives_early(void);
void __init apply_alternatives_all(void);
void apply_alternatives(void *start, size_t length);
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 6dd0a3a3..78051d4 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -28,6 +28,18 @@
#include <asm/sections.h>
#include <linux/stop_machine.h>
+/*
+ * Early-apply features are detected using only the boot CPU and are
+ * checked again when secondary CPUs start up.
+ * These early-apply features should only include features where we must
+ * patch the kernel very early in the boot process.
+ *
+ * Note that the cpufeature logic *must* be made aware of early-apply
+ * features to ensure they are reported as enabled without waiting
+ * for other CPUs to boot.
+ */
+#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
+
#define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f)
#define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
#define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
@@ -105,7 +117,8 @@ static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnp
return insn;
}
-static void __apply_alternatives(void *alt_region, bool use_linear_alias)
+static void __apply_alternatives(void *alt_region, bool use_linear_alias,
+ unsigned long feature_mask)
{
struct alt_instr *alt;
struct alt_region *region = alt_region;
@@ -115,6 +128,9 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
u32 insn;
int i, nr_inst;
+ if ((BIT(alt->cpufeature) & feature_mask) == 0)
+ continue;
+
if (!cpus_have_cap(alt->cpufeature))
continue;
@@ -138,6 +154,21 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
}
/*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void apply_alternatives_early(void)
+{
+ struct alt_region region = {
+ .begin = (struct alt_instr *)__alt_instructions,
+ .end = (struct alt_instr *)__alt_instructions_end,
+ };
+
+ __apply_alternatives(&region, true, EARLY_APPLY_FEATURE_MASK);
+}
+
+/*
* We might be patching the stop_machine state machine, so implement a
* really simple polling protocol here.
*/
@@ -156,7 +187,9 @@ static int __apply_alternatives_multi_stop(void *unused)
isb();
} else {
BUG_ON(patched);
- __apply_alternatives(&region, true);
+
+ __apply_alternatives(&region, true, ~EARLY_APPLY_FEATURE_MASK);
+
/* Barriers provided by the cache flushing */
WRITE_ONCE(patched, 1);
}
@@ -177,5 +210,5 @@ void apply_alternatives(void *start, size_t length)
.end = start + length,
};
- __apply_alternatives(&region, false);
+ __apply_alternatives(&region, false, -1);
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 551eb07..37361b5 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -453,6 +453,12 @@ void __init smp_prepare_boot_cpu(void)
* cpuinfo_store_boot_cpu() above.
*/
update_cpu_errata_workarounds();
+ /*
+ * We now know enough about the boot CPU to apply the
+ * alternatives that cannot wait until interrupt handling
+ * and/or scheduling is enabled.
+ */
+ apply_alternatives_early();
}
static u64 __init of_get_cpu_mpidr(struct device_node *dn)
--
1.9.1
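For context, the kind of alternative that motivates this early patching looks
roughly like the sketch below, modelled on the irqflags patch of this series
(ICC_PMR_EL1_UNMASKED and the exact instruction sequences are taken from
later patches in the thread and may differ):

	/*
	 * Sketch: IRQ unmasking is patched between PSR.I and ICC_PMR_EL1,
	 * keyed on ARM64_HAS_SYSREG_GIC_CPUIF. Since this sits behind every
	 * local_irq_enable(), it must be patched before interrupts are
	 * enabled, hence apply_alternatives_early().
	 */
	static inline void arch_local_irq_enable(void)
	{
		unsigned long unmasked = ICC_PMR_EL1_UNMASKED;

		asm volatile(ALTERNATIVE(
			"msr	daifclr, #2",				/* clear PSR.I */
			"msr_s	" __stringify(SYS_ICC_PMR_EL1) ", %0",	/* raise PMR */
			ARM64_HAS_SYSREG_GIC_CPUIF)
			:
			: "r" (unmasked)
			: "memory");
	}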
Hi,
On 17/01/18 11:54, Julien Thierry wrote:
> This series is a continuation of the work started by Daniel [1]. The goal
> is to use GICv3 interrupt priorities to simulate an NMI.
>
I have submitted a separate series making use of this feature for the
ARM PMUv3 interrupt [1].
[1] https://www.spinics.net/lists/arm-kernel/msg629402.html
Cheers,
--
Julien Thierry
On 17/01/18 11:54, Julien Thierry wrote:
> From: Daniel Thompson <[email protected]>
>
> Currently it is not possible to detect features of the boot CPU
> until the other CPUs have been brought up.
>
> This prevents us from reacting to features of the boot CPU until
> fairly late in the boot process. To solve this we allow a subset
> of features (that are likely to be common to all clusters) to be
> detected based on the boot CPU alone.
>
> Signed-off-by: Daniel Thompson <[email protected]>
> [[email protected]: check non-boot cpu missing early features, avoid
> duplicates between early features and normal
> features]
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> ---
> arch/arm64/kernel/cpufeature.c | 69 ++++++++++++++++++++++++++++--------------
> 1 file changed, 47 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index a73a592..6698404 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -52,6 +52,8 @@
> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
> EXPORT_SYMBOL(cpu_hwcaps);
>
> +static void __init setup_early_feature_capabilities(void);
> +
> /*
> * Flag to indicate if we have computed the system wide
> * capabilities based on the boot time active CPUs. This
> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
> sve_init_vq_map();
> }
> +
> + setup_early_feature_capabilities();
> }
>
> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
> ID_AA64PFR0_FP_SHIFT) < 0;
> }
>
> -static const struct arm64_cpu_capabilities arm64_features[] = {
> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
> {
> .desc = "GIC system register CPU interface",
> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
> .sign = FTR_UNSIGNED,
> .min_field_value = 1,
> },
> + {}
> +};
> +
Julien,
One potential problem with this is that we don't have a way
to make this work on a "theoretical" system with and without a
GIC system reg interface. I.e., if we don't have the CONFIG
enabled for using ICC system regs for IRQ flags, the kernel
could still panic. I understand this is not a "normal" configuration,
but maybe we could make the panic option based on whether
we actually use the system regs early enough?
Btw, I am rewriting the capabilities infrastructure to allow per-cap
control on how it should be treated. I might add an EARLY scope for
caps which could cover this and maybe VHE.
Suzuki
On 22/01/18 12:05, Suzuki K Poulose wrote:
> On 17/01/18 11:54, Julien Thierry wrote:
>> From: Daniel Thompson <[email protected]>
>>
>> Currently it is not possible to detect features of the boot CPU
>> until the other CPUs have been brought up.
>>
>> This prevents us from reacting to features of the boot CPU until
>> fairly late in the boot process. To solve this we allow a subset
>> of features (that are likely to be common to all clusters) to be
>> detected based on the boot CPU alone.
>>
>> Signed-off-by: Daniel Thompson <[email protected]>
>> [[email protected]: check non-boot cpu missing early features, avoid
>> duplicates between early features and normal
>> features]
>> Signed-off-by: Julien Thierry <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Suzuki K Poulose <[email protected]>
>> ---
>> arch/arm64/kernel/cpufeature.c | 69
>> ++++++++++++++++++++++++++++--------------
>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c
>> b/arch/arm64/kernel/cpufeature.c
>> index a73a592..6698404 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -52,6 +52,8 @@
>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>> EXPORT_SYMBOL(cpu_hwcaps);
>>
>> +static void __init setup_early_feature_capabilities(void);
>> +
>> /*
>> * Flag to indicate if we have computed the system wide
>> * capabilities based on the boot time active CPUs. This
>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct cpuinfo_arm64
>> *info)
>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>> sve_init_vq_map();
>> }
>> +
>> + setup_early_feature_capabilities();
>> }
>>
>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>> arm64_cpu_capabilities *entry, int __unus
>> ID_AA64PFR0_FP_SHIFT) < 0;
>> }
>>
>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>> {
>> .desc = "GIC system register CPU interface",
>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>> arm64_cpu_capabilities *entry, int __unus
>> .sign = FTR_UNSIGNED,
>> .min_field_value = 1,
>> },
>> + {}
>> +};
>> +
>
>
> Julien,
>
> One potential problem with this is that we don't have a way
> to make this work on a "theoretical" system with and without
> GIC system reg interface. i.e, if we don't have the CONFIG
> enabled for using ICC system regs for IRQ flags, the kernel
> could still panic. I understand this is not a "normal" configuration
> but, may be we could make the panic option based on whether
> we actually use the system regs early enough ?
>
I see; however, I'm not sure what happens in the GIC drivers if we have a
CPU running with a GICv3 and other CPUs with something else... But of
course this is not technically limited by the arm64 capabilities handling.
What behaviour would you be looking for? A way to prevent the CPU from
being brought up instead of panicking?
> Btw, I am rewriting the capabilities infrastructure to allow per-cap
> control on how it should be treated. I might add an EARLY scope for
> caps which could cover this and may be VHE.
Thanks,
--
Julien Thierry
On Mon, Jan 22, 2018 at 12:21:55PM +0000, Julien Thierry wrote:
> On 22/01/18 12:05, Suzuki K Poulose wrote:
> > On 17/01/18 11:54, Julien Thierry wrote:
> > > From: Daniel Thompson <[email protected]>
> > >
> > > Currently it is not possible to detect features of the boot CPU
> > > until the other CPUs have been brought up.
> > >
> > > This prevents us from reacting to features of the boot CPU until
> > > fairly late in the boot process. To solve this we allow a subset
> > > of features (that are likely to be common to all clusters) to be
> > > detected based on the boot CPU alone.
> > >
> > > Signed-off-by: Daniel Thompson <[email protected]>
> > > [[email protected]: check non-boot cpu missing early features, avoid
> > > duplicates between early features and normal
> > > features]
> > > Signed-off-by: Julien Thierry <[email protected]>
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Suzuki K Poulose <[email protected]>
> > > ---
> > > arch/arm64/kernel/cpufeature.c | 69
> > > ++++++++++++++++++++++++++++--------------
> > > 1 file changed, 47 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/cpufeature.c
> > > b/arch/arm64/kernel/cpufeature.c
> > > index a73a592..6698404 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -52,6 +52,8 @@
> > > DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
> > > EXPORT_SYMBOL(cpu_hwcaps);
> > >
> > > +static void __init setup_early_feature_capabilities(void);
> > > +
> > > /*
> > > * Flag to indicate if we have computed the system wide
> > > * capabilities based on the boot time active CPUs. This
> > > @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
> > > cpuinfo_arm64 *info)
> > > init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
> > > sve_init_vq_map();
> > > }
> > > +
> > > + setup_early_feature_capabilities();
> > > }
> > >
> > > static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
> > > @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
> > > arm64_cpu_capabilities *entry, int __unus
> > > ID_AA64PFR0_FP_SHIFT) < 0;
> > > }
> > >
> > > -static const struct arm64_cpu_capabilities arm64_features[] = {
> > > +static const struct arm64_cpu_capabilities arm64_early_features[] = {
> > > {
> > > .desc = "GIC system register CPU interface",
> > > .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
> > > @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
> > > arm64_cpu_capabilities *entry, int __unus
> > > .sign = FTR_UNSIGNED,
> > > .min_field_value = 1,
> > > },
> > > + {}
> > > +};
> > > +
> >
> >
> > Julien,
> >
> > One potential problem with this is that we don't have a way
> > to make this work on a "theoretical" system with and without
> > GIC system reg interface. i.e, if we don't have the CONFIG
> > enabled for using ICC system regs for IRQ flags, the kernel
> > could still panic. I understand this is not a "normal" configuration
> > but, may be we could make the panic option based on whether
> > we actually use the system regs early enough ?
> >
>
> I see, however I'm not sure what happens in the GIC drivers if we have a CPU
> running with a GICv3 and other CPUs with something else... But of course
> this is not technically limited by the arm64 capabilities handling.
Shouldn't each CPU be sharing the same GIC anyway? If so, it's not that
some have GICv3+ and some have GICv2. The theoretical system described
above *has* a GICv3+ but some participants in the cluster are not able to
talk to it like a co-processor.
The ARM ARM is a little vague about whether, if a GIC implements a
system register interface, a core must provide access to it. Even so,
the first question is whether such a system is architecture compliant?
Daniel.
> What behaviour would you be looking for? A way to prevent the CPU to be
> brought up instead of panicking?
>
> > Btw, I am rewriting the capabilities infrastructure to allow per-cap
> > control on how it should be treated. I might add an EARLY scope for
> > caps which could cover this and may be VHE.
>
> Thanks,
>
> --
> Julien Thierry
On 22/01/18 13:38, Daniel Thompson wrote:
> On Mon, Jan 22, 2018 at 12:21:55PM +0000, Julien Thierry wrote:
>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>> From: Daniel Thompson <[email protected]>
>>>>
>>>> Currently it is not possible to detect features of the boot CPU
>>>> until the other CPUs have been brought up.
>>>>
>>>> This prevents us from reacting to features of the boot CPU until
>>>> fairly late in the boot process. To solve this we allow a subset
>>>> of features (that are likely to be common to all clusters) to be
>>>> detected based on the boot CPU alone.
>>>>
>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>> [[email protected]: check non-boot cpu missing early features, avoid
>>>> duplicates between early features and normal
>>>> features]
>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>> Cc: Catalin Marinas <[email protected]>
>>>> Cc: Will Deacon <[email protected]>
>>>> Cc: Suzuki K Poulose <[email protected]>
>>>> ---
>>>> arch/arm64/kernel/cpufeature.c | 69
>>>> ++++++++++++++++++++++++++++--------------
>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/cpufeature.c
>>>> b/arch/arm64/kernel/cpufeature.c
>>>> index a73a592..6698404 100644
>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>> @@ -52,6 +52,8 @@
>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>
>>>> +static void __init setup_early_feature_capabilities(void);
>>>> +
>>>> /*
>>>> * Flag to indicate if we have computed the system wide
>>>> * capabilities based on the boot time active CPUs. This
>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
>>>> cpuinfo_arm64 *info)
>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>> sve_init_vq_map();
>>>> }
>>>> +
>>>> + setup_early_feature_capabilities();
>>>> }
>>>>
>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>>>> arm64_cpu_capabilities *entry, int __unus
>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>> }
>>>>
>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>>> {
>>>> .desc = "GIC system register CPU interface",
>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>>>> arm64_cpu_capabilities *entry, int __unus
>>>> .sign = FTR_UNSIGNED,
>>>> .min_field_value = 1,
>>>> },
>>>> + {}
>>>> +};
>>>> +
>>>
>>>
>>> Julien,
>>>
>>> One potential problem with this is that we don't have a way
>>> to make this work on a "theoretical" system with and without
>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>> enabled for using ICC system regs for IRQ flags, the kernel
>>> could still panic. I understand this is not a "normal" configuration
>>> but, may be we could make the panic option based on whether
>>> we actually use the system regs early enough ?
>>>
>>
>> I see, however I'm not sure what happens in the GIC drivers if we have a CPU
>> running with a GICv3 and other CPUs with something else... But of course
>> this is not technically limited by the arm64 capabilities handling.
>
> Shouldn't each CPU be sharing the same GIC anyway? It so its not some
> have GICv3+ and some have GICv2. The theoretical system described above
> *has* a GICv3+ but some participants in the cluster are not able to
> talk to it as like a co-processor.
There is some level of confusion between the GIC CPU interface (which is
really in the CPU) and the GIC itself. You can easily end up in a
situation where you do have the HW, but it is configured in a way that
prevents you from using it. Case in point: GICv3 with GICv2
compatibility used in virtualization.
> The ARM ARM is a little vague about whether, if a GIC implements a
> system register interface, then a core must provide access to it. Even
> so, first question is whether such a system is architecture compliant?
Again, it is not the GIC that implements the system registers. And no,
these system registers are not required to be accessible (see
ICC_SRE_EL2.Enable == 0 for example).
So I believe there is value in checking those as early as possible, and
set the expectations accordingly (such as in [1] and [2]).
Thanks,
M.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic-v3.c#n536
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c#n798
--
Jazz is not dead. It just smells funny...
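For readers following the links: the early check in [1]/[2] boils down to a
read-back probe of ICC_SRE_EL1, essentially this (lightly simplified from
gic_enable_sre() in arch/arm64/include/asm/arch_gicv3.h):

	static inline bool gic_enable_sre(void)
	{
		u32 val;

		val = gic_read_sre();
		if (val & ICC_SRE_EL1_SRE)
			return true;		/* already enabled */

		val |= ICC_SRE_EL1_SRE;
		gic_write_sre(val);
		val = gic_read_sre();		/* stays clear if EL2/EL3 deny access */

		return !!(val & ICC_SRE_EL1_SRE);
	}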
On 22/01/18 13:57, Marc Zyngier wrote:
> On 22/01/18 13:38, Daniel Thompson wrote:
>> On Mon, Jan 22, 2018 at 12:21:55PM +0000, Julien Thierry wrote:
>>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>>> From: Daniel Thompson <[email protected]>
>>>>>
>>>>> Currently it is not possible to detect features of the boot CPU
>>>>> until the other CPUs have been brought up.
>>>>>
>>>>> This prevents us from reacting to features of the boot CPU until
>>>>> fairly late in the boot process. To solve this we allow a subset
>>>>> of features (that are likely to be common to all clusters) to be
>>>>> detected based on the boot CPU alone.
>>>>>
>>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>>> [[email protected]: check non-boot cpu missing early features, avoid
>>>>> duplicates between early features and normal
>>>>> features]
>>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>>> Cc: Catalin Marinas <[email protected]>
>>>>> Cc: Will Deacon <[email protected]>
>>>>> Cc: Suzuki K Poulose <[email protected]>
>>>>> ---
>>>>> arch/arm64/kernel/cpufeature.c | 69
>>>>> ++++++++++++++++++++++++++++--------------
>>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/cpufeature.c
>>>>> b/arch/arm64/kernel/cpufeature.c
>>>>> index a73a592..6698404 100644
>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>> @@ -52,6 +52,8 @@
>>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>>
>>>>> +static void __init setup_early_feature_capabilities(void);
>>>>> +
>>>>> /*
>>>>> * Flag to indicate if we have computed the system wide
>>>>> * capabilities based on the boot time active CPUs. This
>>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
>>>>> cpuinfo_arm64 *info)
>>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>>> sve_init_vq_map();
>>>>> }
>>>>> +
>>>>> + setup_early_feature_capabilities();
>>>>> }
>>>>>
>>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>>> }
>>>>>
>>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>>>> {
>>>>> .desc = "GIC system register CPU interface",
>>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>> .sign = FTR_UNSIGNED,
>>>>> .min_field_value = 1,
>>>>> },
>>>>> + {}
>>>>> +};
>>>>> +
>>>>
>>>>
>>>> Julien,
>>>>
>>>> One potential problem with this is that we don't have a way
>>>> to make this work on a "theoretical" system with and without
>>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>>> enabled for using ICC system regs for IRQ flags, the kernel
>>>> could still panic. I understand this is not a "normal" configuration
>>>> but, may be we could make the panic option based on whether
>>>> we actually use the system regs early enough ?
>>>>
>>>
>>> I see, however I'm not sure what happens in the GIC drivers if we have a CPU
>>> running with a GICv3 and other CPUs with something else... But of course
>>> this is not technically limited by the arm64 capabilities handling.
>>
>> Shouldn't each CPU be sharing the same GIC anyway? It so its not some
>> have GICv3+ and some have GICv2. The theoretical system described above
>> *has* a GICv3+ but some participants in the cluster are not able to
>> talk to it as like a co-processor.
>
> There is some level of confusion between the GIC CPU interface (which is
> really in the CPU) and the GIC itself. You can easily end-up in a
> situation where you do have the HW, but it is configured in a way that
> prevents you from using it. Case in point: GICv3 with GICv2
> compatibility used in virtualization.
>
>> The ARM ARM is a little vague about whether, if a GIC implements a
>> system register interface, then a core must provide access to it. Even
>> so, first question is whether such a system is architecture compliant?
>
> Again, it is not the GIC that implements the system registers. And no,
> these system registers are not required to be accessible (see
> ICC_SRE_EL2.Enable == 0 for example).
>
> So I believe there is value in checking those as early as possible, and
> set the expectations accordingly (such as in [1] and [2]).
>
So in the end, if we boot on a CPU that can access ICC_CPUIF, it looks
like we'll prevent bringing up the CPUs that cannot access the
ICC_CPUIF, and if we boot on a CPU that cannot access ICC_CPUIF,
everything that gets brought up afterwards will run in GICv2
compatibility mode?
We never run different GIC drivers on different CPUs, right?
In the patch, check_early_cpu_features panics when features don't match,
but nothing really prevents us from using cpu_die_early instead.
Would that solve the issue Suzuki?
Cheers,
--
Julien Thierry
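A rough sketch of that alternative, assuming the shape of the series'
check_early_cpu_features() (the helper name, loop and message below are
illustrative, not taken from the patch; cpus_have_cap() and cpu_die_early()
are existing arm64 functions):

	static void verify_early_cpu_features(void)
	{
		const struct arm64_cpu_capabilities *caps = arm64_early_features;

		for (; caps->matches; caps++) {
			if (cpus_have_cap(caps->capability) &&
			    !caps->matches(caps, SCOPE_LOCAL_CPU)) {
				pr_crit("CPU%d: missing early feature: %s\n",
					smp_processor_id(), caps->desc);
				cpu_die_early();	/* park the CPU instead of panic() */
			}
		}
	}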
On 22/01/18 14:14, Julien Thierry wrote:
>
>
> On 22/01/18 13:57, Marc Zyngier wrote:
>> On 22/01/18 13:38, Daniel Thompson wrote:
>>> On Mon, Jan 22, 2018 at 12:21:55PM +0000, Julien Thierry wrote:
>>>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>>>> From: Daniel Thompson <[email protected]>
>>>>>>
>>>>>> Currently it is not possible to detect features of the boot CPU
>>>>>> until the other CPUs have been brought up.
>>>>>>
>>>>>> This prevents us from reacting to features of the boot CPU until
>>>>>> fairly late in the boot process. To solve this we allow a subset
>>>>>> of features (that are likely to be common to all clusters) to be
>>>>>> detected based on the boot CPU alone.
>>>>>>
>>>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>>>> [[email protected]: check non-boot cpu missing early features, avoid
>>>>>> duplicates between early features and normal
>>>>>> features]
>>>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>>>> Cc: Catalin Marinas <[email protected]>
>>>>>> Cc: Will Deacon <[email protected]>
>>>>>> Cc: Suzuki K Poulose <[email protected]>
>>>>>> ---
>>>>>> arch/arm64/kernel/cpufeature.c | 69
>>>>>> ++++++++++++++++++++++++++++--------------
>>>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/kernel/cpufeature.c
>>>>>> b/arch/arm64/kernel/cpufeature.c
>>>>>> index a73a592..6698404 100644
>>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>>> @@ -52,6 +52,8 @@
>>>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>>>
>>>>>> +static void __init setup_early_feature_capabilities(void);
>>>>>> +
>>>>>> /*
>>>>>> * Flag to indicate if we have computed the system wide
>>>>>> * capabilities based on the boot time active CPUs. This
>>>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
>>>>>> cpuinfo_arm64 *info)
>>>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>>>> sve_init_vq_map();
>>>>>> }
>>>>>> +
>>>>>> + setup_early_feature_capabilities();
>>>>>> }
>>>>>>
>>>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>>>> }
>>>>>>
>>>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>>>>> {
>>>>>> .desc = "GIC system register CPU interface",
>>>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>>> .sign = FTR_UNSIGNED,
>>>>>> .min_field_value = 1,
>>>>>> },
>>>>>> + {}
>>>>>> +};
>>>>>> +
>>>>>
>>>>>
>>>>> Julien,
>>>>>
>>>>> One potential problem with this is that we don't have a way
>>>>> to make this work on a "theoretical" system with and without
>>>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>>>> enabled for using ICC system regs for IRQ flags, the kernel
>>>>> could still panic. I understand this is not a "normal" configuration
>>>>> but, may be we could make the panic option based on whether
>>>>> we actually use the system regs early enough ?
>>>>>
>>>>
>>>> I see, however I'm not sure what happens in the GIC drivers if we have a CPU
>>>> running with a GICv3 and other CPUs with something else... But of course
>>>> this is not technically limited by the arm64 capabilities handling.
>>>
>>> Shouldn't each CPU be sharing the same GIC anyway? It so its not some
>>> have GICv3+ and some have GICv2. The theoretical system described above
>>> *has* a GICv3+ but some participants in the cluster are not able to
>>> talk to it as like a co-processor.
>>
>> There is some level of confusion between the GIC CPU interface (which is
>> really in the CPU) and the GIC itself. You can easily end-up in a
>> situation where you do have the HW, but it is configured in a way that
>> prevents you from using it. Case in point: GICv3 with GICv2
>> compatibility used in virtualization.
>>
>>> The ARM ARM is a little vague about whether, if a GIC implements a
>>> system register interface, then a core must provide access to it. Even
>>> so, first question is whether such a system is architecture compliant?
>>
>> Again, it is not the GIC that implements the system registers. And no,
>> these system registers are not required to be accessible (see
>> ICC_SRE_EL2.Enable == 0 for example).
>>
>> So I believe there is value in checking those as early as possible, and
>> set the expectations accordingly (such as in [1] and [2]).
>>
>
> So in the end, if we boot on a CPU that can access ICC_CPUIF, it looks
> like we'll prevent bringing up the CPUs that cannot access the
> ICC_CPUIF,
Correct.
> and if we boot on a CPU that cannot access ICC_CPUIF,
> everything that gets brought up afterwards will be run on GICv2
> compatibility mode?
Probably not, as I assume the firmware still gives you the description
of a GICv3, so things will grind to a halt at that point.
> We never run different GIC driver on different CPUs, right?
We don't. And please stop giving people horrible ideas! ;-)
Thanks,
M.
> In the patch, check_early_cpu_features panics when features don't match,
> but nothing really prevents us to use cpu_die_early instead.
>
> Would that solve the issue Suzuki?
>
> Cheers,
>
--
Jazz is not dead. It just smells funny...
On 22/01/18 12:21, Julien Thierry wrote:
>
>
> On 22/01/18 12:05, Suzuki K Poulose wrote:
>> On 17/01/18 11:54, Julien Thierry wrote:
>>> From: Daniel Thompson <[email protected]>
>>>
>>> Currently it is not possible to detect features of the boot CPU
>>> until the other CPUs have been brought up.
>>>
>>> This prevents us from reacting to features of the boot CPU until
>>> fairly late in the boot process. To solve this we allow a subset
>>> of features (that are likely to be common to all clusters) to be
>>> detected based on the boot CPU alone.
>>>
>>> Signed-off-by: Daniel Thompson <[email protected]>
>>> [[email protected]: check non-boot cpu missing early features, avoid
>>> duplicates between early features and normal
>>> features]
>>> Signed-off-by: Julien Thierry <[email protected]>
>>> Cc: Catalin Marinas <[email protected]>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Suzuki K Poulose <[email protected]>
>>> ---
>>> arch/arm64/kernel/cpufeature.c | 69 ++++++++++++++++++++++++++++--------------
>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>> index a73a592..6698404 100644
>>> --- a/arch/arm64/kernel/cpufeature.c
>>> +++ b/arch/arm64/kernel/cpufeature.c
>>> @@ -52,6 +52,8 @@
>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>
>>> +static void __init setup_early_feature_capabilities(void);
>>> +
>>> /*
>>> * Flag to indicate if we have computed the system wide
>>> * capabilities based on the boot time active CPUs. This
>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>> sve_init_vq_map();
>>> }
>>> +
>>> + setup_early_feature_capabilities();
>>> }
>>>
>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>> }
>>>
>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>> {
>>> .desc = "GIC system register CPU interface",
>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>>> .sign = FTR_UNSIGNED,
>>> .min_field_value = 1,
>>> },
>>> + {}
>>> +};
>>> +
>>
>>
>> Julien,
>>
>> One potential problem with this is that we don't have a way
>> to make this work on a "theoretical" system with and without
>> GIC system reg interface. i.e, if we don't have the CONFIG
>> enabled for using ICC system regs for IRQ flags, the kernel
>> could still panic. I understand this is not a "normal" configuration
>> but, may be we could make the panic option based on whether
>> we actually use the system regs early enough ?
>>
>
> I see, however I'm not sure what happens in the GIC drivers if we have a CPU running with a GICv3 and other CPUs with something else... But of course this is not technically limited by the arm64 capabilities handling.
>
> What behaviour would you be looking for? A way to prevent the CPU to be brought up instead of panicking?
>
If we have the CONFIG enabled for using system regs, we can continue
to panic the system. Otherwise, we should ignore the mismatch early,
as we don't use the system register access unless all boot time active
CPUs have it.
In a nutshell, this is an early feature only if the CONFIG is enabled;
otherwise it should fall back to the normal behavior.
Cheers
Suzuki
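One way to encode that suggestion, sketched against the hunk quoted above
(the .def_scope and .matches fields are copied from the mainline
arm64_features entry for this capability and may differ slightly in the
series):

	static const struct arm64_cpu_capabilities arm64_early_features[] = {
	#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
		/* strict early feature only when the kernel will use the CPUIF */
		{
			.desc = "GIC system register CPU interface",
			.capability = ARM64_HAS_SYSREG_GIC_CPUIF,
			.def_scope = SCOPE_SYSTEM,
			.matches = has_useable_gicv3_cpuif,
			.sys_reg = SYS_ID_AA64PFR0_EL1,
			.field_pos = ID_AA64PFR0_GIC_SHIFT,
			.sign = FTR_UNSIGNED,
			.min_field_value = 1,
		},
	#endif
		{}
	};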
On 22/01/18 14:45, Suzuki K Poulose wrote:
> On 22/01/18 12:21, Julien Thierry wrote:
>>
>>
>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>> From: Daniel Thompson <[email protected]>
>>>>
>>>> Currently it is not possible to detect features of the boot CPU
>>>> until the other CPUs have been brought up.
>>>>
>>>> This prevents us from reacting to features of the boot CPU until
>>>> fairly late in the boot process. To solve this we allow a subset
>>>> of features (that are likely to be common to all clusters) to be
>>>> detected based on the boot CPU alone.
>>>>
>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>> [[email protected]: check non-boot cpu missing early features,
>>>> avoid
>>>> duplicates between early features and normal
>>>> features]
>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>> Cc: Catalin Marinas <[email protected]>
>>>> Cc: Will Deacon <[email protected]>
>>>> Cc: Suzuki K Poulose <[email protected]>
>>>> ---
>>>> arch/arm64/kernel/cpufeature.c | 69
>>>> ++++++++++++++++++++++++++++--------------
>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/cpufeature.c
>>>> b/arch/arm64/kernel/cpufeature.c
>>>> index a73a592..6698404 100644
>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>> @@ -52,6 +52,8 @@
>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>
>>>> +static void __init setup_early_feature_capabilities(void);
>>>> +
>>>> /*
>>>> * Flag to indicate if we have computed the system wide
>>>> * capabilities based on the boot time active CPUs. This
>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
>>>> cpuinfo_arm64 *info)
>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>> sve_init_vq_map();
>>>> }
>>>> +
>>>> + setup_early_feature_capabilities();
>>>> }
>>>>
>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>>>> arm64_cpu_capabilities *entry, int __unus
>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>> }
>>>>
>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>>> {
>>>> .desc = "GIC system register CPU interface",
>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>>>> arm64_cpu_capabilities *entry, int __unus
>>>> .sign = FTR_UNSIGNED,
>>>> .min_field_value = 1,
>>>> },
>>>> + {}
>>>> +};
>>>> +
>>>
>>>
>>> Julien,
>>>
>>> One potential problem with this is that we don't have a way
>>> to make this work on a "theoretical" system with and without
>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>> enabled for using ICC system regs for IRQ flags, the kernel
>>> could still panic. I understand this is not a "normal" configuration
>>> but, may be we could make the panic option based on whether
>>> we actually use the system regs early enough ?
>>>
>>
>> I see, however I'm not sure what happens in the GIC drivers if we have
>> a CPU running with a GICv3 and other CPUs with something else... But
>> of course this is not technically limited by the arm64 capabilities
>> handling.
>>
>> What behaviour would you be looking for? A way to prevent the CPU to
>> be brought up instead of panicking?
>>
>
> If we have the CONFIG enabled for using system regs, we can continue
> to panic the system. Otherwise, we should ignore the mismatch early,
> as we don't use the system register access unless all boot time active
> CPUs have it.
>
Hmmm, we use the CPUIF (if available) in the first CPU pretty much as
soon as we re-enable interrupts in the GICv3 driver, which is way before
the other CPUs are brought up.
Other CPUs get to die_early().
> In a nutshell, this is an early feature only if the CONFIG is enabled,
> otherwise should fall back to the normal behavior.
>
Maybe we should just not panic and let the mismatching CPUs die.
It's a system-wide feature and Linux will try to make the other CPUs
match the boot CPU's config anyway.
--
Julien Thierry
On 22/01/18 15:01, Julien Thierry wrote:
>
>
> On 22/01/18 14:45, Suzuki K Poulose wrote:
>> On 22/01/18 12:21, Julien Thierry wrote:
>>>
>>>
>>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>>> From: Daniel Thompson <[email protected]>
>>>>>
>>>>> Currently it is not possible to detect features of the boot CPU
>>>>> until the other CPUs have been brought up.
>>>>>
>>>>> This prevents us from reacting to features of the boot CPU until
>>>>> fairly late in the boot process. To solve this we allow a subset
>>>>> of features (that are likely to be common to all clusters) to be
>>>>> detected based on the boot CPU alone.
>>>>>
>>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>>> [[email protected]: check non-boot cpu missing early features, avoid
>>>>> duplicates between early features and normal
>>>>> features]
>>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>>> Cc: Catalin Marinas <[email protected]>
>>>>> Cc: Will Deacon <[email protected]>
>>>>> Cc: Suzuki K Poulose <[email protected]>
>>>>> ---
>>>>> arch/arm64/kernel/cpufeature.c | 69 ++++++++++++++++++++++++++++--------------
>>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>>>> index a73a592..6698404 100644
>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>> @@ -52,6 +52,8 @@
>>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>>
>>>>> +static void __init setup_early_feature_capabilities(void);
>>>>> +
>>>>> /*
>>>>> * Flag to indicate if we have computed the system wide
>>>>> * capabilities based on the boot time active CPUs. This
>>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
>>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>>> sve_init_vq_map();
>>>>> }
>>>>> +
>>>>> + setup_early_feature_capabilities();
>>>>> }
>>>>>
>>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>>> }
>>>>>
>>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>>> +static const struct arm64_cpu_capabilities arm64_early_features[] = {
>>>>> {
>>>>> .desc = "GIC system register CPU interface",
>>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>>>>> .sign = FTR_UNSIGNED,
>>>>> .min_field_value = 1,
>>>>> },
>>>>> + {}
>>>>> +};
>>>>> +
>>>>
>>>>
>>>> Julien,
>>>>
>>>> One potential problem with this is that we don't have a way
>>>> to make this work on a "theoretical" system with and without
>>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>>> enabled for using ICC system regs for IRQ flags, the kernel
>>>> could still panic. I understand this is not a "normal" configuration
>>>> but, may be we could make the panic option based on whether
>>>> we actually use the system regs early enough ?
>>>>
>>>
>>> I see, however I'm not sure what happens in the GIC drivers if we have a CPU running with a GICv3 and other CPUs with something else... But of course this is not technically limited by the arm64 capabilities handling.
>>>
>>> What behaviour would you be looking for? A way to prevent the CPU to be brought up instead of panicking?
>>>
>>
>> If we have the CONFIG enabled for using system regs, we can continue
>> to panic the system. Otherwise, we should ignore the mismatch early,
>> as we don't use the system register access unless all boot time active
>> CPUs have it.
>>
>
> Hmmm, we use the CPUIF (if available) in the first CPU pretty much as soon as we re-enable interrupts in the GICv3 driver, which is way before the other CPUs are brought up.
Isn't this CPUIF access an alternative, patched only when CPUIF feature
enabled ? (which is done only after all the allowed SMP CPUs are brought up )
>
> other CPUs get to die_early().
Really? I thought only late CPUs are sent to die_early().
>
>> In a nutshell, this is an early feature only if the CONFIG is enabled,
>> otherwise should fall back to the normal behavior.
>>
>
> Maybe we should just not panic and let the mismatching CPUs die.
> It's a system wide feature and linux will try to make the other CPUs match the boot CPU's config anyway.
>
Suzuki
On 22/01/18 15:13, Suzuki K Poulose wrote:
> On 22/01/18 15:01, Julien Thierry wrote:
>>
>>
>> On 22/01/18 14:45, Suzuki K Poulose wrote:
>>> On 22/01/18 12:21, Julien Thierry wrote:
>>>>
>>>>
>>>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>>>> From: Daniel Thompson <[email protected]>
>>>>>>
>>>>>> Currently it is not possible to detect features of the boot CPU
>>>>>> until the other CPUs have been brought up.
>>>>>>
>>>>>> This prevents us from reacting to features of the boot CPU until
>>>>>> fairly late in the boot process. To solve this we allow a subset
>>>>>> of features (that are likely to be common to all clusters) to be
>>>>>> detected based on the boot CPU alone.
>>>>>>
>>>>>> Signed-off-by: Daniel Thompson <[email protected]>
>>>>>> [[email protected]: check non-boot cpu missing early
>>>>>> features, avoid
>>>>>> duplicates between early features and normal
>>>>>> features]
>>>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>>>> Cc: Catalin Marinas <[email protected]>
>>>>>> Cc: Will Deacon <[email protected]>
>>>>>> Cc: Suzuki K Poulose <[email protected]>
>>>>>> ---
>>>>>> arch/arm64/kernel/cpufeature.c | 69
>>>>>> ++++++++++++++++++++++++++++--------------
>>>>>> 1 file changed, 47 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/kernel/cpufeature.c
>>>>>> b/arch/arm64/kernel/cpufeature.c
>>>>>> index a73a592..6698404 100644
>>>>>> --- a/arch/arm64/kernel/cpufeature.c
>>>>>> +++ b/arch/arm64/kernel/cpufeature.c
>>>>>> @@ -52,6 +52,8 @@
>>>>>> DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
>>>>>> EXPORT_SYMBOL(cpu_hwcaps);
>>>>>>
>>>>>> +static void __init setup_early_feature_capabilities(void);
>>>>>> +
>>>>>> /*
>>>>>> * Flag to indicate if we have computed the system wide
>>>>>> * capabilities based on the boot time active CPUs. This
>>>>>> @@ -542,6 +544,8 @@ void __init init_cpu_features(struct
>>>>>> cpuinfo_arm64 *info)
>>>>>> init_cpu_ftr_reg(SYS_ZCR_EL1, info->reg_zcr);
>>>>>> sve_init_vq_map();
>>>>>> }
>>>>>> +
>>>>>> + setup_early_feature_capabilities();
>>>>>> }
>>>>>>
>>>>>> static void update_cpu_ftr_reg(struct arm64_ftr_reg *reg, u64 new)
>>>>>> @@ -846,7 +850,7 @@ static bool has_no_fpsimd(const struct
>>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>>> ID_AA64PFR0_FP_SHIFT) < 0;
>>>>>> }
>>>>>>
>>>>>> -static const struct arm64_cpu_capabilities arm64_features[] = {
>>>>>> +static const struct arm64_cpu_capabilities arm64_early_features[]
>>>>>> = {
>>>>>> {
>>>>>> .desc = "GIC system register CPU interface",
>>>>>> .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
>>>>>> @@ -857,6 +861,10 @@ static bool has_no_fpsimd(const struct
>>>>>> arm64_cpu_capabilities *entry, int __unus
>>>>>> .sign = FTR_UNSIGNED,
>>>>>> .min_field_value = 1,
>>>>>> },
>>>>>> + {}
>>>>>> +};
>>>>>> +
>>>>>
>>>>>
>>>>> Julien,
>>>>>
>>>>> One potential problem with this is that we don't have a way
>>>>> to make this work on a "theoretical" system with and without
>>>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>>>> enabled for using ICC system regs for IRQ flags, the kernel
>>>>> could still panic. I understand this is not a "normal" configuration
>>>>> but, may be we could make the panic option based on whether
>>>>> we actually use the system regs early enough ?
>>>>>
>>>>
>>>> I see, however I'm not sure what happens in the GIC drivers if we
>>>> have a CPU running with a GICv3 and other CPUs with something
>>>> else... But of course this is not technically limited by the arm64
>>>> capabilities handling.
>>>>
>>>> What behaviour would you be looking for? A way to prevent the CPU to
>>>> be brought up instead of panicking?
>>>>
>>>
>>> If we have the CONFIG enabled for using system regs, we can continue
>>> to panic the system. Otherwise, we should ignore the mismatch early,
>>> as we don't use the system register access unless all boot time active
>>> CPUs have it.
>>>
>>
>> Hmmm, we use the CPUIF (if available) in the first CPU pretty much as
>> soon as we re-enable interrupts in the GICv3 driver, which is way
>> before the other CPUs are brought up.
>
> Isn't this CPUIF access an alternative, patched only when CPUIF feature
> enabled ? (which is done only after all the allowed SMP CPUs are brought
> up )
The GICv3 driver doesn't rely on the alternatives; most of the operations
are done via the CPUIF (ack IRQ, EOI, send SGI, etc.).
So once the GICv3 has been successfully probed and interrupts enabled, the
CPUIF might get used by the GICv3 driver.
>>
>> other CPUs get to die_early().
>
> Really ? I thought only late CPUs are sent to die_early().
Hmmm, I might be wrong here but that was my understanding of the call to
verify_local_cpu_features in verify_local_cpu_capabilities.
>>
>>> In a nutshell, this is an early feature only if the CONFIG is enabled,
>>> otherwise should fall back to the normal behavior.
>>>
>>
>> Maybe we should just not panic and let the mismatching CPUs die.
>> It's a system wide feature and linux will try to make the other CPUs
>> match the boot CPU's config anyway.
>>
>
> Suzuki
--
Julien Thierry
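To illustrate, the acknowledge/EOI path in question, heavily simplified from
gic_handle_irq() in drivers/irqchip/irq-gic-v3.c (LPI, special-INTID and
deactivation handling omitted):

	static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
	{
		u32 irqnr = gic_read_iar();		/* ICC_IAR1_EL1: acknowledge */

		if (likely(irqnr > 15 && irqnr < 1020)) {
			gic_write_eoir(irqnr);		/* ICC_EOIR1_EL1: end of interrupt */
			handle_domain_irq(gic_data.domain, irqnr, regs);
		}
	}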
On 22/01/18 15:23, Julien Thierry wrote:
>
>
> On 22/01/18 15:13, Suzuki K Poulose wrote:
>> On 22/01/18 15:01, Julien Thierry wrote:
>>>
>>>
>>> On 22/01/18 14:45, Suzuki K Poulose wrote:
>>>> On 22/01/18 12:21, Julien Thierry wrote:
>>>>>
>>>>>
>>>>> On 22/01/18 12:05, Suzuki K Poulose wrote:
>>>>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>>>>> From: Daniel Thompson <[email protected]>
>>>>>> Julien,
>>>>>>
>>>>>> One potential problem with this is that we don't have a way
>>>>>> to make this work on a "theoretical" system with and without
>>>>>> GIC system reg interface. i.e, if we don't have the CONFIG
>>>>>> enabled for using ICC system regs for IRQ flags, the kernel
>>>>>> could still panic. I understand this is not a "normal" configuration
>>>>>> but, may be we could make the panic option based on whether
>>>>>> we actually use the system regs early enough ?
>>>>>>
>>>>>
>>>>> I see, however I'm not sure what happens in the GIC drivers if we have a CPU running with a GICv3 and other CPUs with something else... But of course this is not technically limited by the arm64 capabilities handling.
>>>>>
>>>>> What behaviour would you be looking for? A way to prevent the CPU to be brought up instead of panicking?
>>>>>
>>>>
>>>> If we have the CONFIG enabled for using system regs, we can continue
>>>> to panic the system. Otherwise, we should ignore the mismatch early,
>>>> as we don't use the system register access unless all boot time active
>>>> CPUs have it.
>>>>
>>>
>>> Hmmm, we use the CPUIF (if available) in the first CPU pretty much as soon as we re-enable interrupts in the GICv3 driver, which is way before the other CPUs are brought up.
>>
>> Isn't this CPUIF access an alternative, patched only when CPUIF feature
>> enabled ? (which is done only after all the allowed SMP CPUs are brought up )
>
> The GICv3 doesn't rely on the alternatives, most of the operations are done via the CPUIF (ack IRQ, eoi, send sgi, etc ...).
>
> So once GICv3 has been successfully probed and interrupts enabled, CPUIF might get used by the GICv3 driver.
>
Aha, OK. I am sorry. I was thinking that the ARM64_HAS_SYSREG_GIC_CPUIF was used just for that.
In that case, I think you are not breaking any current behavior, so that's fine.
>>>
>>> other CPUs get to die_early().
>>
>> Really ? I thought only late CPUs are sent to die_early().
>
> Hmmm, I might be wrong here but that was my understanding of the call to verify_local_cpu_features in verify_local_cpu_capabilities.
>
verify_local_cpu_features() is invoked only if the CPU is brought up late
from userspace, after we have finalised the system-wide capabilities.
Sorry for the noise.
Suzuki
Hi Julien,
On 2018/1/17 19:54, Julien Thierry wrote:
> The values non secure EL1 needs to use for priority registers depends on
> the value of SCR_EL3.FIQ.
>
> Since we don't have access to SCR_EL3, we fake an interrupt and compare the
> GIC priority with the one present in the [re]distributor.
>
> Also, add firmware requirements related to SCR_EL3.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> Documentation/arm64/booting.txt | 5 +++
> arch/arm64/include/asm/arch_gicv3.h | 5 +++
> arch/arm64/include/asm/irqflags.h | 6 +++
> arch/arm64/include/asm/sysreg.h | 1 +
> drivers/irqchip/irq-gic-v3.c | 86 +++++++++++++++++++++++++++++++++++++
> 5 files changed, 103 insertions(+)
>
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> index 8d0df62..e387938 100644
> --- a/Documentation/arm64/booting.txt
> +++ b/Documentation/arm64/booting.txt
> @@ -188,6 +188,11 @@ Before jumping into the kernel, the following conditions must be met:
> the kernel image will be entered must be initialised by software at a
> higher exception level to prevent execution in an UNKNOWN state.
>
> + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
> + executing on.
> + - The value of SCR_EL3.FIQ must be the same as the one present at boot
> + time whenever the kernel is executing.
> +
> For systems with a GICv3 interrupt controller to be used in v3 mode:
> - If EL3 is present:
> ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1.
> diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
> index 490bb3a..ac7b7f6 100644
> --- a/arch/arm64/include/asm/arch_gicv3.h
> +++ b/arch/arm64/include/asm/arch_gicv3.h
> @@ -124,6 +124,11 @@ static inline void gic_write_bpr1(u32 val)
> write_sysreg_s(val, SYS_ICC_BPR1_EL1);
> }
>
> +static inline u32 gic_read_rpr(void)
> +{
> + return read_sysreg_s(SYS_ICC_RPR_EL1);
> +}
> +
> #define gic_read_typer(c) readq_relaxed(c)
> #define gic_write_irouter(v, c) writeq_relaxed(v, c)
> #define gic_read_lpir(c) readq_relaxed(c)
> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> index 3d5d443..d25e7ee 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -217,6 +217,12 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)
> !(ARCH_FLAGS_GET_PMR(flags) & ICC_PMR_EL1_EN_BIT);
> }
>
> +/* Mask IRQs at CPU level instead of GIC level */
> +static inline void arch_irqs_daif_disable(void)
> +{
> + asm volatile ("msr daifset, #2" : : : "memory");
> +}
> +
> void maybe_switch_to_sysreg_gic_cpuif(void);
>
> #endif /* CONFIG_IRQFLAGS_GIC_MASKING */
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 08cc885..46fa869 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -304,6 +304,7 @@
> #define SYS_ICC_SRE_EL1 sys_reg(3, 0, 12, 12, 5)
> #define SYS_ICC_IGRPEN0_EL1 sys_reg(3, 0, 12, 12, 6)
> #define SYS_ICC_IGRPEN1_EL1 sys_reg(3, 0, 12, 12, 7)
> +#define SYS_ICC_RPR_EL1 sys_reg(3, 0, 12, 11, 3)
>
> #define SYS_CONTEXTIDR_EL1 sys_reg(3, 0, 13, 0, 1)
> #define SYS_TPIDR_EL1 sys_reg(3, 0, 13, 0, 4)
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index df51d96..58b5e89 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -63,6 +63,10 @@ struct gic_chip_data {
> static struct gic_chip_data gic_data __read_mostly;
> static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
>
> +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
> +DEFINE_STATIC_KEY_FALSE(have_non_secure_prio_view);
> +#endif
> +
> static struct gic_kvm_info gic_v3_kvm_info;
> static DEFINE_PER_CPU(bool, has_rss);
>
> @@ -997,6 +1001,84 @@ static int partition_domain_translate(struct irq_domain *d,
> .select = gic_irq_domain_select,
> };
>
> +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
> +/*
> + * The behaviours of RPR and PMR registers differ depending on the value of
> + * SCR_EL3.FIQ, while the behaviour of priority registers of the distributor
> + * and redistributors is always the same.
> + *
> + * If SCR_EL3.FIQ == 1, the values used for RPR and PMR are the same as the ones
> + * programmed in the distributor and redistributors registers.
> + *
> + * Otherwise, the value presented by RPR as well as the value which will be
> + * compared against PMR is: (GIC_(R)DIST_PRI[irq] >> 1) | 0x80;
> + *
> + * see GICv3/GICv4 Architecture Specification (IHI0069D):
> + * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt
> + * priorities.
> + * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
> + * interrupt.
> + */
I think we can use a write/read of PMR to check if SCR_EL3.FIQ == 1.
Like this:

	gic_write_pmr(0xf0);
	/* if SCR_EL3.FIQ == 0, the value read back here is 0xf8 */
	if (gic_read_pmr() == 0xf0)
		static_branch_enable(&have_non_secure_prio_view);
Thanks,
Yang
> +static void __init gic_detect_prio_view(void)
> +{
> + /*
> + * Randomly picked SGI, must be <= 8 as other SGIs might be
> + * used by the firmware.
> + */
> + const u32 fake_irqnr = 7;
> + const u32 fake_irqmask = BIT(fake_irqnr);
> + void __iomem * const rdist_base = gic_data_rdist_sgi_base();
> + unsigned long irq_flags;
> + u32 acked_irqnr;
> + bool was_enabled;
> +
> + irq_flags = arch_local_save_flags();
> +
> + arch_irqs_daif_disable();
> +
> + was_enabled = (readl_relaxed(rdist_base + GICD_ISENABLER) &
> + fake_irqmask);
> +
> + if (!was_enabled)
> + writel_relaxed(fake_irqmask, rdist_base + GICD_ISENABLER);
> +
> + /* Need to unmask to acknowledge the IRQ */
> + gic_write_pmr(ICC_PMR_EL1_UNMASKED);
> + dsb(sy);
> +
> + /* Fake a pending SGI */
> + writel_relaxed(fake_irqmask, rdist_base + GICD_ISPENDR);
> + dsb(sy);
> +
> + do {
> + acked_irqnr = gic_read_iar();
> +
> + if (acked_irqnr == fake_irqnr) {
> + if (gic_read_rpr() == gic_get_irq_prio(acked_irqnr,
> + rdist_base))
> + static_branch_enable(&have_non_secure_prio_view);
> + } else {
> + pr_warn("Unexpected IRQ for priority detection: %u\n",
> + acked_irqnr);
> + }
> +
> + if (acked_irqnr < 1020) {
> + gic_write_eoir(acked_irqnr);
> + if (static_key_true(&supports_deactivate))
> + gic_write_dir(acked_irqnr);
> + }
> + } while (acked_irqnr == ICC_IAR1_EL1_SPURIOUS);
> +
> + /* Restore enabled state */
> + if (!was_enabled) {
> + writel_relaxed(fake_irqmask, rdist_base + GICD_ICENABLER);
> + gic_redist_wait_for_rwp();
> + }
> +
> + arch_local_irq_restore(irq_flags);
> +}
> +#endif
> +
> static int __init gic_init_bases(void __iomem *dist_base,
> struct redist_region *rdist_regs,
> u32 nr_redist_regions,
> @@ -1057,6 +1139,10 @@ static int __init gic_init_bases(void __iomem *dist_base,
> gic_cpu_init();
> gic_cpu_pm_init();
>
> +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
> + gic_detect_prio_view();
> +#endif
> +
> return 0;
>
> out_free:
> --
> 1.9.1
>
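To make the transform in the quoted comment concrete: with SCR_EL3.FIQ == 0,
a priority of 0xA0 programmed in the [re]distributor is presented to
non-secure software as (0xA0 >> 1) | 0x80 = 0xD0, so RPR will not match the
value read back from the redistributor; with SCR_EL3.FIQ == 1 both reads
return 0xA0, which is exactly the case in which gic_detect_prio_view() above
enables have_non_secure_prio_view.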
Hi Julien,
I am interested in evaluating whether using this is feasible for our
Android devices. There is quite a use case for lockup detection, so it
seems worthwhile if it works well. At least I feel this can be used as a
debug option, considering the performance cost.
Do you have more details on whether any GICv3-based system will work, or is
there a way an SoC can be misconfigured so that this series will not
work? I think Marc told me that's possible, but I wasn't sure. I will
be quite happy if it works on SoCs as long as they have the requisite
GIC version.
Some more questions below:
On Wed, Jan 17, 2018 at 3:54 AM, Julien Thierry <[email protected]> wrote:
> Hi,
>
> This series is a continuation of the work started by Daniel [1]. The goal
> is to use GICv3 interrupt priorities to simulate an NMI.
>
> To achieve this, set two priorities, one for standard interrupts and
> another, higher priority, for NMIs. Whenever we want to disable interrupts,
> we mask the standard priority instead so NMIs can still be raised. Some
> corner cases though still require to actually mask all interrupts
> effectively disabling the NMI.
>
> Of course, using priority masking instead of PSR.I comes at some cost. On
> hackbench, the drop of performance seems to be >1% on average for this
> version. I can only attribute that to recent changes in the kernel as
Do you have more specific performance data on the performance overhead
with this series?
> hackbench seems slightly slower compared to my other benchmarks while the
> runs with the use of GICv3 priorities have stayed in the same time frames.
> KVM Guests do not seem to be affected preformance-wise by the host using
> PMR to mask interrupts or not.
>
> Currently, only PPIs and SPIs can be set as NMIs. IPIs being currently
> hardcoded IRQ numbers, there isn't a generic interface to set SGIs as NMI
> for now. I don't think there is any reason LPIs should be allowed to be set
> as NMI as they do not have an active state.
> When an NMI is active on a CPU, no other NMI can be triggered on the CPU.
>
>
> Requirements to use this:
> - Have GICv3
> - SCR_EL3.FIQ is set to 1 when linux runs
Ah I see it mentioned here. Again, can you clarify if this is
something that can be misconfigured? Is it something that the
bootloader sets?
Sorry if these questions sound premature, I haven't yet taken a closer
look at the series.
thanks,
- Joel
On Wed, Jan 17, 2018 at 4:10 AM, Julien Thierry <[email protected]> wrote:
> Hi,
>
> On 17/01/18 11:54, Julien Thierry wrote:
>>
>> This series is a continuation of the work started by Daniel [1]. The goal
>> is to use GICv3 interrupt priorities to simulate an NMI.
>>
>
>
> I have submitted a separate series making use of this feature for the ARM
> PMUv3 interrupt [1].
I guess the hard lockup detector using NMI could be a nice next step,
to see how well this works for lockup detection. That's the main use
case for my interest. However, perf profiling is also a strong one.
thanks,
- Joel
Hi Joel,
Thanks for the interest.
On 29/04/18 07:35, Joel Fernandes wrote:
> Hi Julien,
>
> I am interested in evaluating if using this is feasible for our
> Android devices. There is quite a usecase for lockup detection that it
> seems worthwhile if it works well. Atleast I feel this can be used a
> debug option considering the performance downgrade.
>
> Do you have more details of if any GICv3 based system will work, or is
> there a way an SoC can be misconfigured so that this series will not
> work? I think Marc told me that's possible, but I wasn't sure. I will
> be quite happy if it works on SoC as long as they have the requisite
> GIC version.
>
> Some more questions below:
>
> On Wed, Jan 17, 2018 at 3:54 AM, Julien Thierry <[email protected]> wrote:
>> Hi,
>>
>> This series is a continuation of the work started by Daniel [1]. The goal
>> is to use GICv3 interrupt priorities to simulate an NMI.
>>
>> To achieve this, set two priorities, one for standard interrupts and
>> another, higher priority, for NMIs. Whenever we want to disable interrupts,
>> we mask the standard priority instead so NMIs can still be raised. Some
>> corner cases though still require to actually mask all interrupts
>> effectively disabling the NMI.
>>
>> Of course, using priority masking instead of PSR.I comes at some cost. On
>> hackbench, the drop of performance seems to be >1% on average for this
>> version. I can only attribute that to recent changes in the kernel as
>
> Do you have more specific performance data on the performance overhead
> with this series?
>
Not at the moment. I was planning on doing a v3 anyway, considering this
series is getting a bit old and the GICv3 driver has had some modifications.
Once I get to it, I can try to gather more detailed performance data on a
recent kernel. So far I have really only measured performance on hackbench
and on a kernel build from defconfig (and for the kernel build, the
performance difference was completely hidden by the noise).
>> hackbench seems slightly slower compared to my other benchmarks while the
>> runs with the use of GICv3 priorities have stayed in the same time frames.
>> KVM Guests do not seem to be affected preformance-wise by the host using
>> PMR to mask interrupts or not.
>>
>> Currently, only PPIs and SPIs can be set as NMIs. IPIs being currently
>> hardcoded IRQ numbers, there isn't a generic interface to set SGIs as NMI
>> for now. I don't think there is any reason LPIs should be allowed to be set
>> as NMI as they do not have an active state.
>> When an NMI is active on a CPU, no other NMI can be triggered on the CPU.
>>
>>
>> Requirements to use this:
>> - Have GICv3
>> - SCR_EL3.FIQ is set to 1 when linux runs
>
> Ah I see it mentioned here. Again, can you clarify if this is
> something that can be misconfigured? Is it something that the
> bootloader sets?
>
Yes, this is something that the bootloader sets, and we have seen a few
cases where it is set to 0, so it can be "misconfigured".
Handling that case is not impossible, but this bit affects the view the
GICv3 CPU interface has of interrupt priority values, and supporting
both views would require adding conditions to both the interrupt
handling and the masking/unmasking code, which are paths we would
rather keep lean.
The idea is that Linux only deals with group 1 interrupts, and group 1
interrupts are only signalled as FIQs when execution is in Secure state
or at EL3, which should never be the case for Linux. So ideally we'd
like firmware to set this bit up properly rather than have to deal with
both cases when only one of them makes sense for Linux.
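To illustrate the difference between the two views, here is a sketch
based on my reading of the GICv3 architecture spec (not code from this
series; the macro name is made up): with GICD_CTLR.DS == 0 and
SCR_EL3.FIQ == 1, a non-secure priority write is transformed before it
lands, so the value Linux must program for a given priority depends on
how firmware configured the bit:

    /*
     * Sketch only: with SCR_EL3.FIQ == 1, a non-secure priority
     * write is right-shifted by one with bit 7 set, squeezing the
     * non-secure view into the 0x80-0xff half of the range.
     */
    #define GIC_NS_PRIO_VIEW(prio)    ((((prio) >> 1) & 0x7f) | 0x80)

Supporting both views would turn every PMR write on the mask/unmask
fast path into a conditional (or yet another alternative), which is
exactly what we would like to avoid.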
> Sorry if these questions sound premature, I haven't yet taken a closer
> look at the series.
>
Cheers,
--
Julien Thierry
On 29/04/18 07:37, Joel Fernandes wrote:
> On Wed, Jan 17, 2018 at 4:10 AM, Julien Thierry <[email protected]> wrote:
>> Hi,
>>
>> On 17/01/18 11:54, Julien Thierry wrote:
>>>
>>> This series is a continuation of the work started by Daniel [1]. The goal
>>> is to use GICv3 interrupt priorities to simulate an NMI.
>>>
>>
>>
>> I have submitted a separate series making use of this feature for the ARM
>> PMUv3 interrupt [1].
>
> I guess the hard lockup detector using NMI could be a nice next step
> to see how well it works with lock up detection. That's the main
> usecase for my interest. However, perf profiling is also a strong one.
>
From my understanding, Linux's hardlockup detector already uses the ARM
PMU interrupt to check whether some task is stuck. I haven't looked at
the details of the implementation yet, but in theory having the PMU
interrupt as NMI should make the hard lockup detector use the NMI.
When I do the v3, I'll have a look at this to check whether the
hardlockup detector works fine when using NMI.
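For reference, the rough shape of the generic detector, simplified from
my reading of kernel/watchdog_hld.c (so details may differ): a per-CPU
cycle-counter perf event fires periodically, and its overflow handler
checks whether the timer interrupt has made progress. Once the PMU
interrupt is an NMI, that check keeps running even while a stuck CPU
has normal IRQs masked:

    /* Simplified sketch: the overflow handler runs from the PMU
     * interrupt, so with the PMU interrupt as NMI it fires even
     * when a CPU spins with interrupts "disabled" (PMR-masked). */
    static void watchdog_overflow_callback(struct perf_event *event,
                                           struct perf_sample_data *data,
                                           struct pt_regs *regs)
    {
            if (is_hardlockup())    /* no timer progress seen? */
                    panic("Watchdog detected hard LOCKUP on cpu %d",
                          smp_processor_id());
    }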
Cheers,
--
Julien Thierry
On Mon, Apr 30, 2018 at 10:53:17AM +0100, Julien Thierry wrote:
>
>
> On 29/04/18 07:37, Joel Fernandes wrote:
> > On Wed, Jan 17, 2018 at 4:10 AM, Julien Thierry <[email protected]> wrote:
> > > Hi,
> > >
> > > On 17/01/18 11:54, Julien Thierry wrote:
> > > >
> > > > This series is a continuation of the work started by Daniel [1]. The goal
> > > > is to use GICv3 interrupt priorities to simulate an NMI.
> > > >
> > >
> > >
> > > I have submitted a separate series making use of this feature for the ARM
> > > PMUv3 interrupt [1].
> >
> > I guess the hard lockup detector using NMI could be a nice next step
> > to see how well it works with lock up detection. That's the main
> > usecase for my interest. However, perf profiling is also a strong one.
> >
>
> From my understanding, Linux's hardlockup detector already uses the ARM PMU
> interrupt to check whether some task is stuck. I haven't looked at the
> details of the implementation yet, but in theory having the PMU interrupt as
> NMI should make the hard lockup detector use the NMI.
>
> When I do the v3, I'll have a look at this to check whether the hardlockup
> detector works fine when using NMI.
That's what I saw on arch/arm (with some of the much older FIQ work).
Once you have a PMU and the appropriate config to *admit* to supporting
hard lockup detection, it will "just work" and be set up automatically
during kernel boot.
The problem then becomes that if you want to use the PMU for anything
else, you may end up having to disable the hard lockup detector.
Daniel.
> > On 29/04/18 07:37, Joel Fernandes wrote:
> > > On Wed, Jan 17, 2018 at 4:10 AM, Julien Thierry <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > On 17/01/18 11:54, Julien Thierry wrote:
> > > > >
> > > > > This series is a continuation of the work started by Daniel [1]. The goal
> > > > > is to use GICv3 interrupt priorities to simulate an NMI.
> > > > >
> > > >
> > > >
> > > > I have submitted a separate series making use of this feature for the ARM
> > > > PMUv3 interrupt [1].
> > >
> > > I guess the hard lockup detector using NMI could be a nice next step
> > > to see how well it works with lock up detection. That's the main
> > > usecase for my interest. However, perf profiling is also a strong one.
> > >
> >
> > From my understanding, Linux's hardlockup detector already uses the ARM PMU
> > interrupt to check whether some task is stuck. I haven't looked at the
> > details of the implementation yet, but in theory having the PMU interrupt as
> > NMI should make the hard lockup detector use the NMI.
> >
> > When I do the v3, I'll have a look at this to check whether the hardlockup
> > detector works fine when using NMI.
> That's what I saw on arch/arm (with some of the much older FIQ work).
> Once you have PMU and the appropriate config to *admit* to supporting
> hard lockup then it will "just work" and be setup automatically during
> kernel boot.
> Actually the problem then becomes that if you want to use the PMU
> for anything else then you may end up having to disable the hard
> lockup detector.
This problem isn't pseudo-NMI specific though, right? Contention for,
or constraints on, PMU resources should be a problem even on platforms
with a real NMI.
thanks,
- Joel
On Mon, Apr 30, 2018 at 2:46 AM Julien Thierry <[email protected]> wrote:
[...]
> > On Wed, Jan 17, 2018 at 3:54 AM, Julien Thierry <[email protected]> wrote:
> >> Hi,
> >>
> >> This series is a continuation of the work started by Daniel [1]. The goal
> >> is to use GICv3 interrupt priorities to simulate an NMI.
> >>
> >> To achieve this, set two priorities, one for standard interrupts and
> >> another, higher priority, for NMIs. Whenever we want to disable interrupts,
> >> we mask the standard priority instead so NMIs can still be raised. Some
> >> corner cases though still require to actually mask all interrupts
> >> effectively disabling the NMI.
> >>
> >> Of course, using priority masking instead of PSR.I comes at some cost. On
> >> hackbench, the drop of performance seems to be >1% on average for this
> >> version. I can only attribute that to recent changes in the kernel as
> >
> > Do you have more specific performance data on the performance overhead
> > with this series?
> >
> Not at the moment. I was planning on doing a v3 anyway considering this
> series is getting a bit old and the GICv3 driver has had some modifications.
Great! Looking forward to it, will try to find some time to review this set
as well.
> Once I get to it I can try to have more detailed performance data on a
> recent kernel. I've really only measured the performance on hackbench
> and on kernel build from defconfig (and for the kernel build the
> performance difference was completely hidden by the noise).
> >> hackbench seems slightly slower compared to my other benchmarks while the
> >> runs with the use of GICv3 priorities have stayed in the same time frames.
> >> KVM Guests do not seem to be affected preformance-wise by the host using
> >> PMR to mask interrupts or not.
> >>
> >> Currently, only PPIs and SPIs can be set as NMIs. IPIs being currently
> >> hardcoded IRQ numbers, there isn't a generic interface to set SGIs as NMI
> >> for now. I don't think there is any reason LPIs should be allowed to be set
> >> as NMI as they do not have an active state.
> >> When an NMI is active on a CPU, no other NMI can be triggered on the CPU.
> >>
> >>
> >> Requirements to use this:
> >> - Have GICv3
> >> - SCR_EL3.FIQ is set to 1 when linux runs
> >
> > Ah I see it mentioned here. Again, can you clarify if this is
> > something that can be misconfigured? Is it something that the
> > bootloader sets?
> >
> Yes, this is something that the bootloader sets and we have seen a few
> cases where it is set to 0, so it can be "misconfigured".
> It is not impossible to handle this case, but this bit affects the view
> the GICv3 CPU interface has on interrupt priority values. However it
> requires to add some conditions in both the interrupt handling and
> masking/unmasking code, so ideally we would avoid adding things to this.
> But the idea is that Linux only deals with group 1 interrupts, and group
> 1 interrupts are only signaled as FIQs when the execution state is
> secure or at EL3, which should never happen in Linux's case. So ideally
> we'd like firmwares to set up this bit properly rather than to have to
> deal with both cases when only one of them makes sense for Linux.
From what I see, on all our platforms, FIQs are delivered to the secure
monitor only, which is the reason for this patchset in the first place.
I can't imagine a use case that is not designed like this (and have not
come across one), so it's probably OK to just assume SCR_EL3.FIQ is set
to 1.
In the future, if SCR_EL3.FIQ is set to 0, then the NMI should use the
FIQ mechanism delivered to the non-secure OS.
Does what I say make sense or was I just shooting arrows in the dark? :-P
thanks,
- Joel
On Tue, May 01, 2018 at 06:18:44PM +0000, Joel Fernandes wrote:
> > > From my understanding, Linux's hardlockup detector already uses the ARM PMU
> > > interrupt to check whether some task is stuck. I haven't looked at the
> > > details of the implementation yet, but in theory having the PMU interrupt as
> > > NMI should make the hard lockup detector use the NMI.
> > >
> > > When I do the v3, I'll have a look at this to check whether the hardlockup
> > > detector works fine when using NMI.
>
> > That's what I saw on arch/arm (with some of the much older FIQ work).
>
> > Once you have PMU and the appropriate config to *admit* to supporting
> > hard lockup then it will "just work" and be setup automatically during
> > kernel boot.
>
> > Actually the problem then becomes that if you want to use the PMU
> > for anything else then you may end up having to disable the hard
> > lockup detector.
>
> This problem is not anything pseudo-NMI specific though right?
> Contention/constraints on PMU resources should be a problem even on
> platforms with real NMI.
Quite so. Nothing specific to pseudo-NMI; merely part of life's rich
tapestry. Moreover, it is a potential surprise for anyone coming from
x86, since I think the performance monitors there make it easier to run
the hard lockup detector alongside some simple perf activity.
Either way, if you are impacted, it's easy to disable the hard lockup
detector via procfs (writing 0 to /proc/sys/kernel/nmi_watchdog).
Daniel.
On 01/05/18 21:51, Joel Fernandes wrote:
> On Mon, Apr 30, 2018 at 2:46 AM Julien Thierry <[email protected]> wrote:
> [...]
>>> On Wed, Jan 17, 2018 at 3:54 AM, Julien Thierry <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> This series is a continuation of the work started by Daniel [1]. The goal
>>>> is to use GICv3 interrupt priorities to simulate an NMI.
>>>>
>>>> To achieve this, set two priorities, one for standard interrupts and
>>>> another, higher priority, for NMIs. Whenever we want to disable interrupts,
>>>> we mask the standard priority instead so NMIs can still be raised. Some
>>>> corner cases though still require to actually mask all interrupts
>>>> effectively disabling the NMI.
>>>>
>>>> Of course, using priority masking instead of PSR.I comes at some cost. On
>>>> hackbench, the drop of performance seems to be >1% on average for this
>>>> version. I can only attribute that to recent changes in the kernel as
>>>
>>> Do you have more specific performance data on the performance overhead
>>> with this series?
>>>
>
>> Not at the moment. I was planning on doing a v3 anyway considering this
>> series is getting a bit old and the GICv3 driver has had some modifications.
>
> Great! Looking forward to it, will try to find some time to review this set
> as well.
>
>> Once I get to it I can try to have more detailed performance data on a
>> recent kernel. I've really only measured the performance on hackbench
>> and on kernel build from defconfig (and for the kernel build the
>> performance difference was completely hidden by the noise).
>
>>>> hackbench seems slightly slower compared to my other benchmarks while the
>>>> runs with the use of GICv3 priorities have stayed in the same time frames.
>>>> KVM Guests do not seem to be affected preformance-wise by the host using
>>>> PMR to mask interrupts or not.
>>>>
>>>> Currently, only PPIs and SPIs can be set as NMIs. IPIs being currently
>>>> hardcoded IRQ numbers, there isn't a generic interface to set SGIs as NMI
>>>> for now. I don't think there is any reason LPIs should be allowed to be set
>>>> as NMI as they do not have an active state.
>>>> When an NMI is active on a CPU, no other NMI can be triggered on the CPU.
>>>>
>>>>
>>>> Requirements to use this:
>>>> - Have GICv3
>>>> - SCR_EL3.FIQ is set to 1 when linux runs
>>>
>>> Ah I see it mentioned here. Again, can you clarify if this is
>>> something that can be misconfigured? Is it something that the
>>> bootloader sets?
>>>
>
>> Yes, this is something that the bootloader sets and we have seen a few
>> cases where it is set to 0, so it can be "misconfigured".
>
>> It is not impossible to handle this case, but this bit affects the view
>> the GICv3 CPU interface has on interrupt priority values. However it
>> requires to add some conditions in both the interrupt handling and
>> masking/unmasking code, so ideally we would avoid adding things to this.
>
>> But the idea is that Linux only deals with group 1 interrupts, and group
>> 1 interrupts are only signaled as FIQs when the execution state is
>> secure or at EL3, which should never happen in Linux's case. So ideally
>> we'd like firmwares to set up this bit properly rather than to have to
>> deal with both cases when only one of them makes sense for Linux.
>
> From what I see, on all our platforms, FIQs are delivered to the secure
> monitor only. Which is the reason for this patchset in the first place. I
> can't imagine a usecase that is not designed like this (and have not come
> across this), so its probably Ok to just assume SCR_EL3.FIQ is to 1.
>
> In the future, if SCR_EL3.FIQ is set 0, then the NMI should use the FIQ
> mechanism delivered to the non-secure OS.
>
> Does what I say make sense or was I just shooting arrows in the dark? :-P
It would mean teaching Group-0 interrupts to the arm64 kernel. Not an
impossible task, but that'd be catering for a minority of broken
systems. In my book, that's at the absolute bottom of the priority range
(pun intended...).
M.
--
Jazz is not dead. It just smells funny...
Hi,
In order to prepare the v3 of this patchset, I'd like people's opinion
on what this patch does. More below.
On 17/01/18 11:54, Julien Thierry wrote:
> From: Daniel Thompson <[email protected]>
>
> Currently alternatives are applied very late in the boot process (and
> a long time after we enable scheduling). Some alternative sequences,
> such as those that alter the way CPU context is stored, must be applied
> much earlier in the boot sequence.
>
> Introduce apply_alternatives_early() to allow some alternatives to be
> applied immediately after we detect the CPU features of the boot CPU.
>
> Signed-off-by: Daniel Thompson <[email protected]>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/alternative.h | 1 +
> arch/arm64/kernel/alternative.c | 39 +++++++++++++++++++++++++++++++++---
> arch/arm64/kernel/smp.c | 6 ++++++
> 3 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> index 4a85c69..1fc1cdb 100644
> --- a/arch/arm64/include/asm/alternative.h
> +++ b/arch/arm64/include/asm/alternative.h
> @@ -20,6 +20,7 @@ struct alt_instr {
> u8 alt_len; /* size of new instruction(s), <= orig_len */
> };
>
> +void __init apply_alternatives_early(void);
> void __init apply_alternatives_all(void);
> void apply_alternatives(void *start, size_t length);
>
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index 6dd0a3a3..78051d4 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -28,6 +28,18 @@
> #include <asm/sections.h>
> #include <linux/stop_machine.h>
>
> +/*
> + * early-apply features are detected using only the boot CPU and checked on
> + * secondary CPUs startup, even then,
> + * These early-apply features should only include features where we must
> + * patch the kernel very early in the boot process.
> + *
> + * Note that the cpufeature logic *must* be made aware of early-apply
> + * features to ensure they are reported as enabled without waiting
> + * for other CPUs to boot.
> + */
> +#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
> +
Following the change in the cpufeature infrastructure,
ARM64_HAS_SYSREG_GIC_CPUIF will have the scope
ARM64_CPUCAP_SCOPE_BOOT_CPU in order to be checked early in the boot
process.
Now, regarding the early application of alternatives, I am wondering
whether we can apply all the alternatives associated with SCOPE_BOOT
features that *do not* have a cpu_enable callback.
Otherwise, we can keep a macro listing individually each feature that
is patchable at boot time, as the current patch does (or put this info
in a flag within the arm64_cpu_capabilities structure).
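To make the first option concrete, something like this (just a sketch;
the helper name is invented and assumes the scope ends up encoded in
the capability's ->type field):

    /* Sketch: only patch early the boot-scope features that need
     * no per-CPU setup before their patched code is executed. */
    static bool __init early_patchable(const struct arm64_cpu_capabilities *cap)
    {
            return (cap->type & ARM64_CPUCAP_SCOPE_BOOT_CPU) &&
                   !cap->cpu_enable;
    }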
Any thoughts or preferences on this?
Thanks,
> #define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f)
> #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
> #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
> @@ -105,7 +117,8 @@ static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnp
> return insn;
> }
>
> -static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> +static void __apply_alternatives(void *alt_region, bool use_linear_alias,
> + unsigned long feature_mask)
> {
> struct alt_instr *alt;
> struct alt_region *region = alt_region;
> @@ -115,6 +128,9 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> u32 insn;
> int i, nr_inst;
>
> + if ((BIT(alt->cpufeature) & feature_mask) == 0)
> + continue;
> +
> if (!cpus_have_cap(alt->cpufeature))
> continue;
>
> @@ -138,6 +154,21 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> }
>
> /*
> + * This is called very early in the boot process (directly after we run
> + * a feature detect on the boot CPU). No need to worry about other CPUs
> + * here.
> + */
> +void apply_alternatives_early(void)
> +{
> + struct alt_region region = {
> + .begin = (struct alt_instr *)__alt_instructions,
> + .end = (struct alt_instr *)__alt_instructions_end,
> + };
> +
> + __apply_alternatives(&region, true, EARLY_APPLY_FEATURE_MASK);
> +}
> +
> +/*
> * We might be patching the stop_machine state machine, so implement a
> * really simple polling protocol here.
> */
> @@ -156,7 +187,9 @@ static int __apply_alternatives_multi_stop(void *unused)
> isb();
> } else {
> BUG_ON(patched);
> - __apply_alternatives(&region, true);
> +
> + __apply_alternatives(&region, true, ~EARLY_APPLY_FEATURE_MASK);
> +
> /* Barriers provided by the cache flushing */
> WRITE_ONCE(patched, 1);
> }
> @@ -177,5 +210,5 @@ void apply_alternatives(void *start, size_t length)
> .end = start + length,
> };
>
> - __apply_alternatives(&region, false);
> + __apply_alternatives(&region, false, -1);
> }
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 551eb07..37361b5 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -453,6 +453,12 @@ void __init smp_prepare_boot_cpu(void)
> * cpuinfo_store_boot_cpu() above.
> */
> update_cpu_errata_workarounds();
> + /*
> + * We now know enough about the boot CPU to apply the
> + * alternatives that cannot wait until interrupt handling
> + * and/or scheduling is enabled.
> + */
> + apply_alternatives_early();
> }
>
> static u64 __init of_get_cpu_mpidr(struct device_node *dn)
> --
> 1.9.1
>
--
Julien Thierry
On Fri, May 04, 2018 at 11:06:56AM +0100, Julien Thierry wrote:
> Hi,
>
> In order to prepare the v3 of this patchset, I'd like people's opinion on
> what this patch does. More below.
>
> On 17/01/18 11:54, Julien Thierry wrote:
> > From: Daniel Thompson <[email protected]>
> >
> > Currently alternatives are applied very late in the boot process (and
> > a long time after we enable scheduling). Some alternative sequences,
> > such as those that alter the way CPU context is stored, must be applied
> > much earlier in the boot sequence.
> >
> > Introduce apply_alternatives_early() to allow some alternatives to be
> > applied immediately after we detect the CPU features of the boot CPU.
> >
> > Signed-off-by: Daniel Thompson <[email protected]>
> > Signed-off-by: Julien Thierry <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > ---
> > arch/arm64/include/asm/alternative.h | 1 +
> > arch/arm64/kernel/alternative.c | 39 +++++++++++++++++++++++++++++++++---
> > arch/arm64/kernel/smp.c | 6 ++++++
> > 3 files changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> > index 4a85c69..1fc1cdb 100644
> > --- a/arch/arm64/include/asm/alternative.h
> > +++ b/arch/arm64/include/asm/alternative.h
> > @@ -20,6 +20,7 @@ struct alt_instr {
> > u8 alt_len; /* size of new instruction(s), <= orig_len */
> > };
> >
> > +void __init apply_alternatives_early(void);
> > void __init apply_alternatives_all(void);
> > void apply_alternatives(void *start, size_t length);
> >
> > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > index 6dd0a3a3..78051d4 100644
> > --- a/arch/arm64/kernel/alternative.c
> > +++ b/arch/arm64/kernel/alternative.c
> > @@ -28,6 +28,18 @@
> > #include <asm/sections.h>
> > #include <linux/stop_machine.h>
> >
> > +/*
> > + * early-apply features are detected using only the boot CPU and checked on
> > + * secondary CPUs startup, even then,
> > + * These early-apply features should only include features where we must
> > + * patch the kernel very early in the boot process.
> > + *
> > + * Note that the cpufeature logic *must* be made aware of early-apply
> > + * features to ensure they are reported as enabled without waiting
> > + * for other CPUs to boot.
> > + */
> > +#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
> > +
>
> Following the change in the cpufeature infrastructure,
> ARM64_HAS_SYSREG_GIC_CPUIF will have the scope ARM64_CPUCAP_SCOPE_BOOT_CPU
> in order to be checked early in the boot process.
>
> Now, regarding the early application of alternative, I am wondering whether
> we can apply all the alternatives associated with SCOPE_BOOT features that
> *do not* have a cpu_enable callback.
>
> Otherwise we can keep the macro to list individually each feature that is
> patchable at boot time as the current patch does (or put this info in a flag
> within the arm64_cpu_capabilities structure).
>
> Any thoughts or preferences on this?
If I understand ARM64_CPUCAP_SCOPE_BOOT_CPU correctly, it certainly seems
safe to apply the alternatives early (it means that a CPU that
contradicts a SCOPE_BOOT_CPU capability won't be allowed to join the
system, right?).
It also makes the machinery for applying errata fixes more powerful:
maybe a future erratum must be applied before we commence threading.
Thus I have a preference for stripping this out and relying on
SCOPE_BOOT_CPU instead. It's a weak preference though, since I haven't
studied exactly which errata fixes this will bring into the scope of
early boot.
I don't think you'll regret changing it. This patch has always been a
*total* PITA to rebase, so aligning it better with upstream will make it
easier to nurse the patch set until the if-and-when point it hits
upstream.
Daniel.
> Thanks,
>
> > #define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f)
> > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
> > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
> > @@ -105,7 +117,8 @@ static u32 get_alt_insn(struct alt_instr *alt, __le32 *insnptr, __le32 *altinsnp
> > return insn;
> > }
> >
> > -static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> > +static void __apply_alternatives(void *alt_region, bool use_linear_alias,
> > + unsigned long feature_mask)
> > {
> > struct alt_instr *alt;
> > struct alt_region *region = alt_region;
> > @@ -115,6 +128,9 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> > u32 insn;
> > int i, nr_inst;
> >
> > + if ((BIT(alt->cpufeature) & feature_mask) == 0)
> > + continue;
> > +
> > if (!cpus_have_cap(alt->cpufeature))
> > continue;
> >
> > @@ -138,6 +154,21 @@ static void __apply_alternatives(void *alt_region, bool use_linear_alias)
> > }
> >
> > /*
> > + * This is called very early in the boot process (directly after we run
> > + * a feature detect on the boot CPU). No need to worry about other CPUs
> > + * here.
> > + */
> > +void apply_alternatives_early(void)
> > +{
> > + struct alt_region region = {
> > + .begin = (struct alt_instr *)__alt_instructions,
> > + .end = (struct alt_instr *)__alt_instructions_end,
> > + };
> > +
> > + __apply_alternatives(&region, true, EARLY_APPLY_FEATURE_MASK);
> > +}
> > +
> > +/*
> > * We might be patching the stop_machine state machine, so implement a
> > * really simple polling protocol here.
> > */
> > @@ -156,7 +187,9 @@ static int __apply_alternatives_multi_stop(void *unused)
> > isb();
> > } else {
> > BUG_ON(patched);
> > - __apply_alternatives(&region, true);
> > +
> > + __apply_alternatives(&region, true, ~EARLY_APPLY_FEATURE_MASK);
> > +
> > /* Barriers provided by the cache flushing */
> > WRITE_ONCE(patched, 1);
> > }
> > @@ -177,5 +210,5 @@ void apply_alternatives(void *start, size_t length)
> > .end = start + length,
> > };
> >
> > - __apply_alternatives(&region, false);
> > + __apply_alternatives(&region, false, -1);
> > }
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 551eb07..37361b5 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -453,6 +453,12 @@ void __init smp_prepare_boot_cpu(void)
> > * cpuinfo_store_boot_cpu() above.
> > */
> > update_cpu_errata_workarounds();
> > + /*
> > + * We now know enough about the boot CPU to apply the
> > + * alternatives that cannot wait until interrupt handling
> > + * and/or scheduling is enabled.
> > + */
> > + apply_alternatives_early();
> > }
> >
> > static u64 __init of_get_cpu_mpidr(struct device_node *dn)
> > --
> > 1.9.1
> >
>
> --
> Julien Thierry
On 05/04/2018 11:06 AM, Julien Thierry wrote:
> Hi,
>
> In order to prepare the v3 of this patchset, I'd like people's opinion
> on what this patch does. More below.
>
> On 17/01/18 11:54, Julien Thierry wrote:
>> From: Daniel Thompson <[email protected]>
>>
>> Currently alternatives are applied very late in the boot process (and
>> a long time after we enable scheduling). Some alternative sequences,
>> such as those that alter the way CPU context is stored, must be applied
>> much earlier in the boot sequence.
>> +/*
>> + * early-apply features are detected using only the boot CPU and
>> checked on
>> + * secondary CPUs startup, even then,
>> + * These early-apply features should only include features where we must
>> + * patch the kernel very early in the boot process.
>> + *
>> + * Note that the cpufeature logic *must* be made aware of early-apply
>> + * features to ensure they are reported as enabled without waiting
>> + * for other CPUs to boot.
>> + */
>> +#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
>> +
>
> Following the change in the cpufeature infrastructure,
> ARM64_HAS_SYSREG_GIC_CPUIF will have the scope
> ARM64_CPUCAP_SCOPE_BOOT_CPU in order to be checked early in the boot
> process.
That's correct.
>
> Now, regarding the early application of alternative, I am wondering
> whether we can apply all the alternatives associated with SCOPE_BOOT
> features that *do not* have a cpu_enable callback.
>
I don't understand why you would skip the ones that have a "cpu_enable"
callback. Could you explain this a bit? Ideally you should be able to
apply the alternatives for features with SCOPE_BOOT, provided the
cpu_enable() callback is written properly.
> Otherwise we can keep the macro to list individually each feature that
> is patchable at boot time as the current patch does (or put this info in
> a flag within the arm64_cpu_capabilities structure)
You may be able to build up the mask of *available* capabilities with
SCOPE_BOOT at boot time by playing some trick in
setup_boot_cpu_capabilities(), rather than embedding it in the
capabilities (and then parsing the entire table(s)) or manually keeping
track of the capabilities with a separate mask.
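Something along these lines, perhaps (names invented, purely to sketch
the idea):

    /* Sketch: record the boot-scope capabilities found on the boot
     * CPU so early patching needs no hard-coded feature mask. */
    static unsigned long boot_scope_feature_mask;

    static void __init record_boot_scope_caps(const struct arm64_cpu_capabilities *caps)
    {
            for (; caps->matches; caps++)
                    if ((caps->type & ARM64_CPUCAP_SCOPE_BOOT_CPU) &&
                        cpus_have_cap(caps->capability))
                            boot_scope_feature_mask |= BIT(caps->capability);
    }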
Suzuki
>
> Any thoughts or preferences on this?
>
> Thanks,
>
>> #define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f)
>> #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
>> #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
>> @@ -105,7 +117,8 @@ static u32 get_alt_insn(struct alt_instr *alt,
>> __le32 *insnptr, __le32 *altinsnp
>> return insn;
>> }
>>
>> -static void __apply_alternatives(void *alt_region, bool
>> use_linear_alias)
>> +static void __apply_alternatives(void *alt_region, bool
>> use_linear_alias,
>> + unsigned long feature_mask)
>> {
>> struct alt_instr *alt;
>> struct alt_region *region = alt_region;
>> @@ -115,6 +128,9 @@ static void __apply_alternatives(void *alt_region,
>> bool use_linear_alias)
>> u32 insn;
>> int i, nr_inst;
>>
>> + if ((BIT(alt->cpufeature) & feature_mask) == 0)
>> + continue;
>> +
>> if (!cpus_have_cap(alt->cpufeature))
>> continue;
>>
>> @@ -138,6 +154,21 @@ static void __apply_alternatives(void
>> *alt_region, bool use_linear_alias)
>> }
>>
>> /*
>> + * This is called very early in the boot process (directly after we run
>> + * a feature detect on the boot CPU). No need to worry about other CPUs
>> + * here.
>> + */
>> +void apply_alternatives_early(void)
>> +{
>> + struct alt_region region = {
>> + .begin = (struct alt_instr *)__alt_instructions,
>> + .end = (struct alt_instr *)__alt_instructions_end,
>> + };
>> +
>> + __apply_alternatives(®ion, true, EARLY_APPLY_FEATURE_MASK);
>> +}
>> +
>> +/*
>> * We might be patching the stop_machine state machine, so implement a
>> * really simple polling protocol here.
>> */
>> @@ -156,7 +187,9 @@ static int __apply_alternatives_multi_stop(void
>> *unused)
>> isb();
>> } else {
>> BUG_ON(patched);
>> - __apply_alternatives(®ion, true);
>> +
>> + __apply_alternatives(®ion, true, ~EARLY_APPLY_FEATURE_MASK);
>> +
>> /* Barriers provided by the cache flushing */
>> WRITE_ONCE(patched, 1);
>> }
>> @@ -177,5 +210,5 @@ void apply_alternatives(void *start, size_t length)
>> .end = start + length,
>> };
>>
>> - __apply_alternatives(®ion, false);
>> + __apply_alternatives(®ion, false, -1);
>> }
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 551eb07..37361b5 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -453,6 +453,12 @@ void __init smp_prepare_boot_cpu(void)
>> * cpuinfo_store_boot_cpu() above.
>> */
>> update_cpu_errata_workarounds();
>> + /*
>> + * We now know enough about the boot CPU to apply the
>> + * alternatives that cannot wait until interrupt handling
>> + * and/or scheduling is enabled.
>> + */
>> + apply_alternatives_early();
>> }
>>
>> static u64 __init of_get_cpu_mpidr(struct device_node *dn)
>> --
>> 1.9.1
>>
>
On 09/05/18 22:52, Suzuki K Poulose wrote:
> On 05/04/2018 11:06 AM, Julien Thierry wrote:
>> Hi,
>>
>> In order to prepare the v3 of this patchset, I'd like people's opinion
>> on what this patch does. More below.
>>
>> On 17/01/18 11:54, Julien Thierry wrote:
>>> From: Daniel Thompson <[email protected]>
>>>
>>> Currently alternatives are applied very late in the boot process (and
>>> a long time after we enable scheduling). Some alternative sequences,
>>> such as those that alter the way CPU context is stored, must be applied
>>> much earlier in the boot sequence.
>
>>> +/*
>>> + * early-apply features are detected using only the boot CPU and
>>> checked on
>>> + * secondary CPUs startup, even then,
>>> + * These early-apply features should only include features where we
>>> must
>>> + * patch the kernel very early in the boot process.
>>> + *
>>> + * Note that the cpufeature logic *must* be made aware of early-apply
>>> + * features to ensure they are reported as enabled without waiting
>>> + * for other CPUs to boot.
>>> + */
>>> +#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
>>> +
>>
>> Following the change in the cpufeature infrastructure,
>> ARM64_HAS_SYSREG_GIC_CPUIF will have the scope
>> ARM64_CPUCAP_SCOPE_BOOT_CPU in order to be checked early in the boot
>> process.
>
> Thats correct.
>
>>
>> Now, regarding the early application of alternative, I am wondering
>> whether we can apply all the alternatives associated with SCOPE_BOOT
>> features that *do not* have a cpu_enable callback.
>>
>
> I don't understand why would you skip the ones that have a "cpu_enable"
> callback. Could you explain this a bit ? Ideally you should be able to
> apply the alternatives for features with the SCOPE_BOOT, provided the
> cpu_enable() callback is written properly.
>
In my mind, the "cpu_enable" callback is the setup a CPU should perform
before using the feature (i.e. the code getting patched in by the
alternative). So I was worried about the code getting patched by the
boot CPU, and the secondary CPUs then ending up executing patched code
before cpu_enable() for the corresponding feature gets called.
Or is there a requirement for secondary CPU startup code to be free of
alternative code?
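Schematically, the ordering I am worried about (for a hypothetical
boot-scope feature that has a cpu_enable() callback):

    /*
     * boot CPU:      detects feature -> apply_alternatives_early()
     * secondary CPU: secondary_start_kernel()
     *                  -> executes patched code          <-- here
     *                  -> capability verification
     *                       -> cpu_enable() is called    <-- too late
     */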
>
>> Otherwise we can keep the macro to list individually each feature that
>> is patchable at boot time as the current patch does (or put this info
>> in a flag within the arm64_cpu_capabilities structure)
>
> You may be able to build up the mask of *available* capabilities with
> SCOPE_BOOT at boot time by playing some trick in the
> setup_boot_cpu_capabilities(), rather than embedding it in the
> capabilities (and then parsing the entire table(s)) or manually keeping
> track of the capabilities by having a separate mask.
>
Yes, I like that idea.
Thanks,
> Suzuki
>
>>
>> Any thoughts or preferences on this?
>>
>> Thanks,
>>
>>> #define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f)
>>> #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
>>> #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
>>> @@ -105,7 +117,8 @@ static u32 get_alt_insn(struct alt_instr *alt,
>>> __le32 *insnptr, __le32 *altinsnp
>>> return insn;
>>> }
>>>
>>> -static void __apply_alternatives(void *alt_region, bool
>>> use_linear_alias)
>>> +static void __apply_alternatives(void *alt_region, bool
>>> use_linear_alias,
>>> + unsigned long feature_mask)
>>> {
>>> struct alt_instr *alt;
>>> struct alt_region *region = alt_region;
>>> @@ -115,6 +128,9 @@ static void __apply_alternatives(void
>>> *alt_region, bool use_linear_alias)
>>> u32 insn;
>>> int i, nr_inst;
>>>
>>> + if ((BIT(alt->cpufeature) & feature_mask) == 0)
>>> + continue;
>>> +
>>> if (!cpus_have_cap(alt->cpufeature))
>>> continue;
>>>
>>> @@ -138,6 +154,21 @@ static void __apply_alternatives(void
>>> *alt_region, bool use_linear_alias)
>>> }
>>>
>>> /*
>>> + * This is called very early in the boot process (directly after we run
>>> + * a feature detect on the boot CPU). No need to worry about other CPUs
>>> + * here.
>>> + */
>>> +void apply_alternatives_early(void)
>>> +{
>>> + struct alt_region region = {
>>> + .begin = (struct alt_instr *)__alt_instructions,
>>> + .end = (struct alt_instr *)__alt_instructions_end,
>>> + };
>>> +
>>> + __apply_alternatives(&region, true, EARLY_APPLY_FEATURE_MASK);
>>> +}
>>> +
>>> +/*
>>> * We might be patching the stop_machine state machine, so implement a
>>> * really simple polling protocol here.
>>> */
>>> @@ -156,7 +187,9 @@ static int __apply_alternatives_multi_stop(void
>>> *unused)
>>> isb();
>>> } else {
>>> BUG_ON(patched);
>>> - __apply_alternatives(&region, true);
>>> +
>>> + __apply_alternatives(&region, true, ~EARLY_APPLY_FEATURE_MASK);
>>> +
>>> /* Barriers provided by the cache flushing */
>>> WRITE_ONCE(patched, 1);
>>> }
>>> @@ -177,5 +210,5 @@ void apply_alternatives(void *start, size_t length)
>>> .end = start + length,
>>> };
>>>
>>> - __apply_alternatives(&region, false);
>>> + __apply_alternatives(&region, false, -1);
>>> }
>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>> index 551eb07..37361b5 100644
>>> --- a/arch/arm64/kernel/smp.c
>>> +++ b/arch/arm64/kernel/smp.c
>>> @@ -453,6 +453,12 @@ void __init smp_prepare_boot_cpu(void)
>>> * cpuinfo_store_boot_cpu() above.
>>> */
>>> update_cpu_errata_workarounds();
>>> + /*
>>> + * We now know enough about the boot CPU to apply the
>>> + * alternatives that cannot wait until interrupt handling
>>> + * and/or scheduling is enabled.
>>> + */
>>> + apply_alternatives_early();
>>> }
>>>
>>> static u64 __init of_get_cpu_mpidr(struct device_node *dn)
>>> --
>>> 1.9.1
>>>
>>
>
--
Julien Thierry
On 11/05/18 09:12, Julien Thierry wrote:
>
>
> On 09/05/18 22:52, Suzuki K Poulose wrote:
>> On 05/04/2018 11:06 AM, Julien Thierry wrote:
>>> Hi,
>>>
>>> In order to prepare the v3 of this patchset, I'd like people's opinion on what this patch does. More below.
>>>
>>> On 17/01/18 11:54, Julien Thierry wrote:
>>>> From: Daniel Thompson <[email protected]>
>>>>
>>>> Currently alternatives are applied very late in the boot process (and
>>>> a long time after we enable scheduling). Some alternative sequences,
>>>> such as those that alter the way CPU context is stored, must be applied
>>>> much earlier in the boot sequence.
>>
>>>> +/*
>>>> + * early-apply features are detected using only the boot CPU and checked on
>>>> + * secondary CPUs startup, even then,
>>>> + * These early-apply features should only include features where we must
>>>> + * patch the kernel very early in the boot process.
>>>> + *
>>>> + * Note that the cpufeature logic *must* be made aware of early-apply
>>>> + * features to ensure they are reported as enabled without waiting
>>>> + * for other CPUs to boot.
>>>> + */
>>>> +#define EARLY_APPLY_FEATURE_MASK BIT(ARM64_HAS_SYSREG_GIC_CPUIF)
>>>> +
>>>
>>> Following the change in the cpufeature infrastructure, ARM64_HAS_SYSREG_GIC_CPUIF will have the scope ARM64_CPUCAP_SCOPE_BOOT_CPU in order to be checked early in the boot process.
>>
>> Thats correct.
>>
>>>
>>> Now, regarding the early application of alternative, I am wondering whether we can apply all the alternatives associated with SCOPE_BOOT features that *do not* have a cpu_enable callback.
>>>
>>
>> I don't understand why would you skip the ones that have a "cpu_enable" callback. Could you explain this a bit ? Ideally you should be able to
>> apply the alternatives for features with the SCOPE_BOOT, provided the
>> cpu_enable() callback is written properly.
>>
>
> In my mind the "cpu_enable" callback is the setup a cpu should perform before using the feature (i.e. the code getting patched in by the alternative). So I was worried about the code getting patched by the boot cpu and then have the secondary cpus ending up executing patched code before the cpu_enable for the corresponding feature gets called.
> Or is there a requirement for secondary cpu startup code to be free of alternative code?
There are no imposed restrictions. It is up to the capability to decide
what can be done in cpu_enable() and what can be patched. So, if you
make sure the patched code can be safely executed by secondaries, it is
fine. Maybe you could even patch in some code early in boot to do what
you do in cpu_enable(), so that secondaries can safely execute the
patched code.
Anyway, if the secondary CPUs don't have the feature, you are going to
panic the system. So I don't think there is a big difference in the
outcome if there is a mismatch, except for a clean message about the
conflict.
Cheers
Suzuki