Hi,
This series is a continuation of the work started by Daniel [1]. The goal
is to use GICv3 interrupt priorities to simulate an NMI.
The patches depend on the core NMI API patches [2]. Both series can
be found on this branch:
git clone http://linux-arm.org/linux-jt.git -b v5.0-pseudo-nmi
To achieve this, we set two priorities, one for standard interrupts and
another, higher priority, for NMIs. Whenever we want to disable interrupts,
we mask the standard priority instead, so NMIs can still be raised. Some
corner cases, however, still require actually masking all interrupts,
effectively disabling the NMI.
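As a rough sketch of the resulting masking primitives (simplified, using
helpers introduced by later patches in this series; the real primitives
are patched in via alternatives):

    /*
     * Sketch only: normal IRQs are masked via PMR, leaving NMIs live.
     */
    static inline void sketch_irq_disable(void)
    {
            if (system_uses_irq_prio_masking())
                    gic_write_pmr(GIC_PRIO_IRQOFF); /* NMIs still signaled */
            else
                    asm volatile("msr daifset, #2" : : : "memory");
    }

    static inline void sketch_irq_enable(void)
    {
            if (system_uses_irq_prio_masking())
                    gic_write_pmr(GIC_PRIO_IRQON);  /* all priorities allowed */
            else
                    asm volatile("msr daifclr, #2" : : : "memory");
    }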
Daniel Thompson ran some benchmarks [3] on a previous version showing a
small (<1%) performance drop when using interrupt priorities on Cortex-A53
and GIC-500.
Currently, only PPIs and SPIs can be set as NMIs. Since IPIs are currently
hardcoded IRQ numbers, there is no generic interface to set SGIs as NMIs
for now. LPIs, being controlled by the ITS, cannot be delivered as NMIs.
When an NMI is active on a CPU, no other NMI can be triggered on the CPU.
Requirements to use this:
- Have GICv3
- SCR_EL3.FIQ is set to 1 when Linux runs, or the GIC has a single
  security state
- Select Kernel Features -> Support for NMI-like interrupts
- Provide "enable_pseudo_nmi" on the kernel command line
* Patch 1 fixes an existing issue with current NMI contexts in arm64
* Patches 2 and 3 are just a bit of cleanup
* Patch 4 introduces a CPU feature to check if priority masking should be
used
* Patches 5 and 6 add support for priority masking in the GICv3 driver
* Patches 7 to 13 add support for priority masking in the arch/arm64
  code
* Patches 14 and 15 allow us to apply alternatives earlier in the boot
  process
* Patches 16 to 18 start the PMR masking on CPU startup and provide
  primitives for the arm64 GICv3 driver to perform priority masking
* Patches 19 to 22 add support for pseudo-NMIs in the GICv3 driver
* Patches 23 to 25 add support for receiving NMIs in arch/arm64
* Patch 26 adds the build config and command line option to enable
pseudo-NMIs
Changes since v8[4]:
* Rebase on v5.0-rc3
* Add Acked-by and Reviewed-by tags
* Simplify arch_local_save_flags code
* Fix issue in cpufeature when enabling early alternatives
[1] http://www.spinics.net/lists/arm-kernel/msg525077.html
[2] https://lkml.org/lkml/2018/11/12/2113
[3] https://lkml.org/lkml/2018/7/20/803
[4] https://lkml.org/lkml/2019/1/8/456
Cheers,
Julien
Daniel Thompson (1):
arm64: alternative: Apply alternatives early in boot process
Julien Thierry (25):
arm64: Fix HCR.TGE status for NMI contexts
arm64: Remove unused daif related functions/macros
arm64: cpufeature: Set SYSREG_GIC_CPUIF as a boot system feature
arm64: cpufeature: Add cpufeature for IRQ priority masking
arm/arm64: gic-v3: Add PMR and RPR accessors
irqchip/gic-v3: Switch to PMR masking before calling IRQ handler
arm64: ptrace: Provide definitions for PMR values
arm64: Make PMR part of task context
arm64: Unmask PMR before going idle
arm64: kvm: Unmask PMR before entering guest
efi: Let architectures decide the flags that should be saved/restored
arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking
arm64: daifflags: Include PMR in daifflags restore operations
arm64: alternative: Allow alternative status checking per cpufeature
irqchip/gic-v3: Factor group0 detection into functions
arm64: Switch to PMR masking when starting CPUs
arm64: gic-v3: Implement arch support for priority masking
irqchip/gic-v3: Detect if GIC can support pseudo-NMIs
irqchip/gic-v3: Handle pseudo-NMIs
irqchip/gic: Add functions to access irq priorities
irqchip/gic-v3: Allow interrupts to be set as pseudo-NMI
arm64: Handle serror in NMI context
arm64: Skip preemption when exiting an NMI
arm64: Skip irqflags tracing for NMI in IRQs disabled context
arm64: Enable the support of pseudo-NMIs
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arm64/booting.txt | 5 +
arch/arm/include/asm/arch_gicv3.h | 33 ++++
arch/arm64/Kconfig | 14 ++
arch/arm64/include/asm/alternative.h | 4 +-
arch/arm64/include/asm/arch_gicv3.h | 32 +++
arch/arm64/include/asm/assembler.h | 10 +-
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/cpufeature.h | 10 +
arch/arm64/include/asm/daifflags.h | 41 ++--
arch/arm64/include/asm/efi.h | 6 +
arch/arm64/include/asm/hardirq.h | 28 +++
arch/arm64/include/asm/irqflags.h | 104 +++++++---
arch/arm64/include/asm/kvm_host.h | 12 ++
arch/arm64/include/asm/processor.h | 3 +
arch/arm64/include/asm/ptrace.h | 26 ++-
arch/arm64/kernel/alternative.c | 60 +++++-
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/cpufeature.c | 42 +++-
arch/arm64/kernel/entry.S | 43 ++++
arch/arm64/kernel/irq.c | 3 +
arch/arm64/kernel/process.c | 51 +++++
arch/arm64/kernel/smp.c | 33 ++++
arch/arm64/kernel/traps.c | 8 +-
arch/arm64/kvm/hyp/switch.c | 16 ++
arch/arm64/mm/proc.S | 11 --
drivers/firmware/efi/runtime-wrappers.c | 17 +-
drivers/irqchip/irq-gic-common.c | 10 +
drivers/irqchip/irq-gic-common.h | 2 +
drivers/irqchip/irq-gic-v3.c | 252 +++++++++++++++++++++---
include/linux/efi.h | 5 +-
include/linux/hardirq.h | 7 +
32 files changed, 783 insertions(+), 115 deletions(-)
--
1.9.1
It is not supported to have some CPUs using the GICv3 sysreg CPU
interface while others do not.
Once ICC_SRE_EL1.SRE is set on a CPU, the bit cannot be cleared. Since
matching this feature requires setting ICC_SRE_EL1.SRE, it cannot be
turned off if found on a CPU.
Set the feature as STRICT_BOOT: if the boot CPU has it, all other CPUs
are required to have it.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/cpufeature.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index f6d84e2..b9c0adf 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1207,7 +1207,7 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
{
.desc = "GIC system register CPU interface",
.capability = ARM64_HAS_SYSREG_GIC_CPUIF,
- .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
.matches = has_useable_gicv3_cpuif,
.sys_reg = SYS_ID_AA64PFR0_EL1,
.field_pos = ID_AA64PFR0_GIC_SHIFT,
--
1.9.1
Add helper functions to access system registers related to interrupt
priorities: PMR and RPR.
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm/include/asm/arch_gicv3.h | 16 ++++++++++++++++
arch/arm64/include/asm/arch_gicv3.h | 15 +++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index 0bd5307..bef0b5d 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -34,6 +34,7 @@
#define ICC_SRE __ACCESS_CP15(c12, 0, c12, 5)
#define ICC_IGRPEN1 __ACCESS_CP15(c12, 0, c12, 7)
#define ICC_BPR1 __ACCESS_CP15(c12, 0, c12, 3)
+#define ICC_RPR __ACCESS_CP15(c12, 0, c11, 3)
#define __ICC_AP0Rx(x) __ACCESS_CP15(c12, 0, c8, 4 | x)
#define ICC_AP0R0 __ICC_AP0Rx(0)
@@ -245,6 +246,21 @@ static inline void gic_write_bpr1(u32 val)
write_sysreg(val, ICC_BPR1);
}
+static inline u32 gic_read_pmr(void)
+{
+ return read_sysreg(ICC_PMR);
+}
+
+static inline void gic_write_pmr(u32 val)
+{
+ write_sysreg(val, ICC_PMR);
+}
+
+static inline u32 gic_read_rpr(void)
+{
+ return read_sysreg(ICC_RPR);
+}
+
/*
* Even in 32bit systems that use LPAE, there is no guarantee that the I/O
* interface provides true 64bit atomic accesses, so using strd/ldrd doesn't
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index e278f94..37193e2 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -114,6 +114,21 @@ static inline void gic_write_bpr1(u32 val)
write_sysreg_s(val, SYS_ICC_BPR1_EL1);
}
+static inline u32 gic_read_pmr(void)
+{
+ return read_sysreg_s(SYS_ICC_PMR_EL1);
+}
+
+static inline void gic_write_pmr(u32 val)
+{
+ write_sysreg_s(val, SYS_ICC_PMR_EL1);
+}
+
+static inline u32 gic_read_rpr(void)
+{
+ return read_sysreg_s(SYS_ICC_RPR_EL1);
+}
+
#define gic_read_typer(c) readq_relaxed(c)
#define gic_write_irouter(v, c) writeq_relaxed(v, c)
#define gic_read_lpir(c) readq_relaxed(c)
--
1.9.1
Add a cpufeature indicating whether a CPU supports masking interrupts
by priority.
The feature will be properly enabled in a later patch.
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/include/asm/cpufeature.h | 6 ++++++
arch/arm64/kernel/cpufeature.c | 23 +++++++++++++++++++++++
3 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 82e9099..f6a76e4 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -60,7 +60,8 @@
#define ARM64_HAS_ADDRESS_AUTH_IMP_DEF 39
#define ARM64_HAS_GENERIC_AUTH_ARCH 40
#define ARM64_HAS_GENERIC_AUTH_IMP_DEF 41
+#define ARM64_HAS_IRQ_PRIO_MASKING 42
-#define ARM64_NCAPS 42
+#define ARM64_NCAPS 43
#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index dfcfba7..89c3f31 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -612,6 +612,12 @@ static inline bool system_supports_generic_auth(void)
cpus_have_const_cap(ARM64_HAS_GENERIC_AUTH_IMP_DEF));
}
+static inline bool system_uses_irq_prio_masking(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
+ cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING);
+}
+
#define ARM64_SSBD_UNKNOWN -1
#define ARM64_SSBD_FORCE_DISABLE 0
#define ARM64_SSBD_KERNEL 1
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b9c0adf..6f56e0a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1203,6 +1203,14 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
}
#endif /* CONFIG_ARM64_PTR_AUTH */
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
+ int scope)
+{
+ return false;
+}
+#endif
+
static const struct arm64_cpu_capabilities arm64_features[] = {
{
.desc = "GIC system register CPU interface",
@@ -1480,6 +1488,21 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
.matches = has_cpuid_feature,
},
#endif /* CONFIG_ARM64_PTR_AUTH */
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ {
+ /*
+ * Depends on having GICv3
+ */
+ .desc = "IRQ priority masking",
+ .capability = ARM64_HAS_IRQ_PRIO_MASKING,
+ .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
+ .matches = can_use_gic_priorities,
+ .sys_reg = SYS_ID_AA64PFR0_EL1,
+ .field_pos = ID_AA64PFR0_GIC_SHIFT,
+ .sign = FTR_UNSIGNED,
+ .min_field_value = 1,
+ },
+#endif
{},
};
--
1.9.1
Introduce fixed values for PMR that are going to be used to mask and
unmask interrupts by priority.
The current priority given to GIC interrupts is 0xa0, so clearing PMR's
most significant bit is enough to mask interrupts.
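To make the arithmetic concrete: GIC_PRIO_IRQON is 0xf0 and
GIC_PRIO_IRQOFF is 0xf0 & ~0x80 = 0x70. The GIC only signals an
interrupt whose priority value is strictly lower than PMR, so an
interrupt at the default priority (0xa0) is delivered when PMR is 0xf0
but masked when PMR is 0x70.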
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/ptrace.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index fce22c4..05cf913 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -25,6 +25,18 @@
#define CurrentEL_EL1 (1 << 2)
#define CurrentEL_EL2 (2 << 2)
+/*
+ * PMR values used to mask/unmask interrupts.
+ *
+ * GIC priority masking works as follows: if an IRQ's priority is a higher value
+ * than the value held in PMR, that interrupt is masked. A lower value of PMR
+ * means more IRQ priorities are masked.
+ *
+ * To mask priorities, we clear the most significant bit of PMR.
+ */
+#define GIC_PRIO_IRQON 0xf0
+#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+
/* Additional SPSR bits not exposed in the UABI */
#define PSR_IL_BIT (1 << 20)
--
1.9.1
The CPU does not receive signals for interrupts whose priority is masked
by ICC_PMR_EL1. This means the CPU might not come back from a WFI
instruction.
Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
Since the logic of cpu_do_idle is becoming a bit more complex than just
two instructions, let's convert it from assembly to C.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/kernel/process.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/proc.S | 11 -----------
2 files changed, 45 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6d410fc..f05b63f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -51,6 +51,7 @@
#include <linux/thread_info.h>
#include <asm/alternative.h>
+#include <asm/arch_gicv3.h>
#include <asm/compat.h>
#include <asm/cacheflush.h>
#include <asm/exec.h>
@@ -74,6 +75,50 @@
void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
+static inline void __cpu_do_idle(void)
+{
+ dsb(sy);
+ wfi();
+}
+
+static inline void __cpu_do_idle_irqprio(void)
+{
+ unsigned long pmr;
+ unsigned long daif_bits;
+
+ daif_bits = read_sysreg(daif);
+ write_sysreg(daif_bits | PSR_I_BIT, daif);
+
+ /*
+ * Unmask PMR before going idle to make sure interrupts can
+ * be raised.
+ */
+ pmr = gic_read_pmr();
+ gic_write_pmr(GIC_PRIO_IRQON);
+
+ __cpu_do_idle();
+
+ gic_write_pmr(pmr);
+ write_sysreg(daif_bits, daif);
+}
+
+/*
+ * cpu_do_idle()
+ *
+ * Idle the processor (wait for interrupt).
+ *
+ * If the CPU supports priority masking we must do additional work to
+ * ensure that interrupts are not masked at the PMR (because the core will
+ * not wake up if we block the wake up signal in the interrupt controller).
+ */
+void cpu_do_idle(void)
+{
+ if (system_uses_irq_prio_masking())
+ __cpu_do_idle_irqprio();
+ else
+ __cpu_do_idle();
+}
+
/*
* This is our default idle handler.
*/
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 73886a5..3ea4f3b 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -55,17 +55,6 @@
#define MAIR(attr, mt) ((attr) << ((mt) * 8))
-/*
- * cpu_do_idle()
- *
- * Idle the processor (wait for interrupt).
- */
-ENTRY(cpu_do_idle)
- dsb sy // WFI may enter a low-power mode
- wfi
- ret
-ENDPROC(cpu_do_idle)
-
#ifdef CONFIG_CPU_PM
/**
* cpu_do_suspend - save CPU registers context
--
1.9.1
Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
means that the hypervisor will not receive masked interrupts while
running a guest.
Avoid this by making sure ICC_PMR_EL1 is unmasked when we enter a guest.
Signed-off-by: Julien Thierry <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/kvm/hyp/switch.c | 16 ++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7732d0b..a1f9f55 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
#include <linux/types.h>
#include <linux/kvm_types.h>
+#include <asm/arch_gicv3.h>
#include <asm/cpufeature.h>
#include <asm/daifflags.h>
#include <asm/fpsimd.h>
@@ -474,6 +475,17 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
static inline void kvm_arm_vhe_guest_enter(void)
{
local_daif_mask();
+
+ /*
+ * Having IRQs masked via PMR when entering the guest means the GIC
+ * will not signal the CPU of interrupts of lower priority, and the
+ * only way to get out will be via guest exceptions.
+ * Naturally, we want to avoid this.
+ */
+ if (system_uses_irq_prio_masking()) {
+ gic_write_pmr(GIC_PRIO_IRQON);
+ dsb(sy);
+ }
}
static inline void kvm_arm_vhe_guest_exit(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index b0b1478..6a4c2d6 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -22,6 +22,7 @@
#include <kvm/arm_psci.h>
+#include <asm/arch_gicv3.h>
#include <asm/cpufeature.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
@@ -521,6 +522,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *guest_ctxt;
u64 exit_code;
+ /*
+ * Having IRQs masked via PMR when entering the guest means the GIC
+ * will not signal the CPU of interrupts of lower priority, and the
+ * only way to get out will be via guest exceptions.
+ * Naturally, we want to avoid this.
+ */
+ if (system_uses_irq_prio_masking()) {
+ gic_write_pmr(GIC_PRIO_IRQON);
+ dsb(sy);
+ }
+
vcpu = kern_hyp_va(vcpu);
host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
@@ -573,6 +585,10 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
*/
__debug_switch_to_host(vcpu);
+ /* Returning to host will clear PSR.I, remask PMR if needed */
+ if (system_uses_irq_prio_masking())
+ gic_write_pmr(GIC_PRIO_IRQOFF);
+
return exit_code;
}
--
1.9.1
Currently, irqflags are saved before calling runtime services and
checked for mismatch on return.
Provide a pair of overridable macros to save and restore (if needed) the
state that needs to be preserved on return from a runtime service.
This allows checking for flags that are not necessarily related to
irqflags.
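As an illustration, an architecture whose interrupt state is no longer
fully described by the generic irqflags could override the pair roughly
as follows (a sketch of the intent, not necessarily the exact arm64
definition from this series):

    /* Sketch: preserve the raw DAIF bits across the runtime service */
    #define arch_efi_save_flags(state_flags)    ((state_flags) = read_sysreg(daif))
    #define arch_efi_restore_flags(state_flags) write_sysreg(state_flags, daif)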
Signed-off-by: Julien Thierry <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: [email protected]
---
drivers/firmware/efi/runtime-wrappers.c | 17 +++++++++++++++--
include/linux/efi.h | 5 +++--
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 8903b9c..2f4b68b 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -89,11 +89,24 @@
efi_rts_work.status; \
})
+#ifndef arch_efi_save_flags
+#define arch_efi_save_flags(state_flags) local_save_flags(state_flags)
+#define arch_efi_restore_flags(state_flags) local_irq_restore(state_flags)
+#endif
+
+inline unsigned long efi_call_virt_save_flags(void)
+{
+ unsigned long flags;
+
+ arch_efi_save_flags(flags);
+ return flags;
+}
+
void efi_call_virt_check_flags(unsigned long flags, const char *call)
{
unsigned long cur_flags, mismatch;
- local_save_flags(cur_flags);
+ cur_flags = efi_call_virt_save_flags();
mismatch = flags ^ cur_flags;
if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK))
@@ -102,7 +115,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE);
pr_err_ratelimited(FW_BUG "IRQ flags corrupted (0x%08lx=>0x%08lx) by EFI %s\n",
flags, cur_flags, call);
- local_irq_restore(flags);
+ arch_efi_restore_flags(flags);
}
/*
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 45ff763..bd80b7e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1607,6 +1607,7 @@ efi_status_t efi_setup_gop(efi_system_table_t *sys_table_arg,
bool efi_runtime_disabled(void);
extern void efi_call_virt_check_flags(unsigned long flags, const char *call);
+extern unsigned long efi_call_virt_save_flags(void);
enum efi_secureboot_mode {
efi_secureboot_mode_unset,
@@ -1652,7 +1653,7 @@ enum efi_secureboot_mode {
\
arch_efi_call_virt_setup(); \
\
- local_save_flags(__flags); \
+ __flags = efi_call_virt_save_flags(); \
__s = arch_efi_call_virt(p, f, args); \
efi_call_virt_check_flags(__flags, __stringify(f)); \
\
@@ -1667,7 +1668,7 @@ enum efi_secureboot_mode {
\
arch_efi_call_virt_setup(); \
\
- local_save_flags(__flags); \
+ __flags = efi_call_virt_save_flags(); \
arch_efi_call_virt(p, f, args); \
efi_call_virt_check_flags(__flags, __stringify(f)); \
\
--
1.9.1
Implement the architecture-specific primitives allowing the GICv3 driver
to use priorities to mask interrupts.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/arch_gicv3.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index b5f8142..14b41dd 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -22,6 +22,7 @@
#ifndef __ASSEMBLY__
+#include <linux/irqchip/arm-gic-common.h>
#include <linux/stringify.h>
#include <asm/barrier.h>
#include <asm/cacheflush.h>
@@ -162,14 +163,13 @@ static inline bool gic_prio_masking_enabled(void)
static inline void gic_pmr_mask_irqs(void)
{
- /* Should not get called yet. */
- WARN_ON_ONCE(true);
+ BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
+ gic_write_pmr(GIC_PRIO_IRQOFF);
}
static inline void gic_arch_enable_irqs(void)
{
- /* Should not get called yet. */
- WARN_ON_ONCE(true);
+ asm volatile ("msr daifclr, #2" : : : "memory");
}
#endif /* __ASSEMBLY__ */
--
1.9.1
Handling of an NMI should not set any TIF flags. For NMIs received from
EL0 the current exit path is safe to use.
However, an NMI received at EL1 could have interrupted some task context
that has set the TIF_NEED_RESCHED flag. Preempting a task should not
happen as a result of an NMI.
Skip preemption after handling an NMI from EL1.
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/entry.S | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 35a47f6..a0b0a22 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -624,6 +624,14 @@ el1_irq:
#ifdef CONFIG_PREEMPT
ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count
+alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ /*
+ * DA_F were cleared at start of handling. If anything is set in DAIF,
+ * we are returning from an NMI, so skip preemption
+ */
+ mrs x0, daif
+ orr x24, x24, x0
+alternative_else_nop_endif
cbnz x24, 1f // preempt count != 0
bl el1_preempt
1:
--
1.9.1
When an NMI is raised while interrupts were disabled, the IRQ tracing
is already in the correct state (i.e. hardirqs_off) and should be left
as such when returning to the interrupted context.
Check whether PMR was masking interrupts when the NMI was raised and
skip IRQ tracing if necessary.
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/entry.S | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a0b0a22..bd0b078 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -617,7 +617,18 @@ el1_irq:
kernel_entry 1
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ ldr x20, [sp, #S_PMR_SAVE]
+alternative_else
+ mov x20, #GIC_PRIO_IRQON
+alternative_endif
+ cmp x20, #GIC_PRIO_IRQOFF
+ /* Irqs were disabled, don't trace */
+ b.ls 1f
+#endif
bl trace_hardirqs_off
+1:
#endif
irq_handler
@@ -637,8 +648,18 @@ alternative_else_nop_endif
1:
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ /*
+ * If IRQs were disabled when we received the interrupt, we have an NMI
+ * and we are not re-enabling interrupts upon eret. Skip tracing.
+ */
+ cmp x20, #GIC_PRIO_IRQOFF
+ b.ls 1f
+#endif
bl trace_hardirqs_on
+1:
#endif
+
kernel_exit 1
ENDPROC(el1_irq)
--
1.9.1
The addition of PMR should not bypass the semantics of daifflags.
When DA_F are set, the I bit is also set, as no interrupts (even of
higher priority) are allowed.
When DA_F are cleared, the I bit is cleared and interrupt
enabling/disabling goes through ICC_PMR_EL1.
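Summarising the restore paths added below (a reading of the hunk, using
the PMR values from the earlier ptrace patch):

    saved flags          DAIF.I    PMR
    I clear              0         GIC_PRIO_IRQON   (IRQs and NMIs on)
    I set, A clear       0         GIC_PRIO_IRQOFF  (NMIs still possible)
    I set, A set         1         untouched        (NMIs masked too)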
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Morse <[email protected]>
---
arch/arm64/include/asm/daifflags.h | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 546bc39..1fd390e 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,6 +18,8 @@
#include <linux/irqflags.h>
+#include <asm/cpufeature.h>
+
#define DAIF_PROCCTX 0
#define DAIF_PROCCTX_NOIRQ PSR_I_BIT
@@ -36,7 +38,13 @@ static inline unsigned long local_daif_save(void)
{
unsigned long flags;
- flags = arch_local_save_flags();
+ flags = read_sysreg(daif);
+
+ if (system_uses_irq_prio_masking()) {
+ /* If IRQs are masked with PMR, reflect it in the flags */
+ if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
+ flags |= PSR_I_BIT;
+ }
local_daif_mask();
@@ -45,12 +53,27 @@ static inline unsigned long local_daif_save(void)
static inline void local_daif_restore(unsigned long flags)
{
- if (!arch_irqs_disabled_flags(flags))
+ bool irq_disabled = flags & PSR_I_BIT;
+
+ if (!irq_disabled) {
trace_hardirqs_on();
- arch_local_irq_restore(flags);
+ if (system_uses_irq_prio_masking())
+ arch_local_irq_enable();
+ } else if (!(flags & PSR_A_BIT)) {
+ /*
+ * If interrupts are disabled but we can take
+ * asynchronous errors, we can take NMIs
+ */
+ if (system_uses_irq_prio_masking()) {
+ flags &= ~PSR_I_BIT;
+ arch_local_irq_disable();
+ }
+ }
+
+ write_sysreg(flags, daif);
- if (arch_irqs_disabled_flags(flags))
+ if (irq_disabled)
trace_hardirqs_off();
}
--
1.9.1
Per the definition of the daifflags, SErrors can occur during any
interrupt context, which includes NMI contexts. Trying to nmi_enter in
an NMI context will crash.
Skip nmi_enter/nmi_exit when the SError occurred during an NMI.
Suggested-by: James Morse <[email protected]>
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Dave Martin <[email protected]>
Cc: James Morse <[email protected]>
---
arch/arm64/kernel/traps.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 4e2fb87..8ad119c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -898,13 +898,17 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
{
- nmi_enter();
+ const bool was_in_nmi = in_nmi();
+
+ if (!was_in_nmi)
+ nmi_enter();
/* non-RAS errors are not containable */
if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);
- nmi_exit();
+ if (!was_in_nmi)
+ nmi_exit();
}
void __pte_error(const char *file, int line, unsigned long val)
--
1.9.1
The values non-secure EL1 needs to use for the PMR and RPR registers
depend on the value of SCR_EL3.FIQ.
The values non-secure EL1 sees from the distributor and redistributor
depend on whether security is enabled for the GIC or not.
To avoid having to deal with two sets of values for PMR
masking/unmasking, only enable pseudo-NMIs when the GIC has a non-secure
view of priorities.
Also, add the firmware requirements related to SCR_EL3.
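As a worked example of the mismatch being avoided: with security
enabled, a non-secure distributor priority of 0xa0 is presented to the
CPU interface as (0xa0 >> 1) | 0x80 = 0xd0. If SCR_EL3.FIQ == 1, the
same transformation applies to PMR accesses from non-secure EL1, so the
two views line up; with SCR_EL3.FIQ == 0, the PMR value is taken as-is
and a second set of masking values would be needed.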
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
Documentation/arm64/booting.txt | 5 ++++
drivers/irqchip/irq-gic-v3.c | 58 ++++++++++++++++++++++++++++++++++++-----
2 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 8df9f46..fbab7e2 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -188,6 +188,11 @@ Before jumping into the kernel, the following conditions must be met:
the kernel image will be entered must be initialised by software at a
higher exception level to prevent execution in an UNKNOWN state.
+ - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
+ executing on.
+ - The value of SCR_EL3.FIQ must be the same as the one present at boot
+ time whenever the kernel is executing.
+
For systems with a GICv3 interrupt controller to be used in v3 mode:
- If EL3 is present:
ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1.
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5a703ae..5374b43 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -66,6 +66,31 @@ struct gic_chip_data {
static struct gic_chip_data gic_data __read_mostly;
static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
+/*
+ * The behaviours of RPR and PMR registers differ depending on the value of
+ * SCR_EL3.FIQ, and the behaviour of non-secure priority registers of the
+ * distributor and redistributors depends on whether security is enabled in the
+ * GIC.
+ *
+ * When security is enabled, non-secure priority values from the (re)distributor
+ * are presented to the GIC CPUIF as follows:
+ * (GIC_(R)DIST_PRI[irq] >> 1) | 0x80;
+ *
+ * If SCR_EL3.FIQ == 1, the values written to/read from PMR and RPR at non-secure
+ * EL1 are subject to a similar operation thus matching the priorities presented
+ * from the (re)distributor when security is enabled.
+ *
+ * see GICv3/GICv4 Architecture Specification (IHI0069D):
+ * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt
+ * priorities.
+ * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
+ * interrupt.
+ *
+ * For now, we only support pseudo-NMIs if we have non-secure view of
+ * priorities.
+ */
+static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
+
static struct gic_kvm_info gic_v3_kvm_info;
static DEFINE_PER_CPU(bool, has_rss);
@@ -232,6 +257,12 @@ static void gic_unmask_irq(struct irq_data *d)
gic_poke_irq(d, GICD_ISENABLER);
}
+static inline bool gic_supports_nmi(void)
+{
+ return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
+ static_branch_likely(&supports_pseudo_nmis);
+}
+
static int gic_irq_set_irqchip_state(struct irq_data *d,
enum irqchip_irq_state which, bool val)
{
@@ -573,6 +604,12 @@ static void gic_update_vlpi_properties(void)
!gic_data.rdists.has_direct_lpi ? "no " : "");
}
+/* Check whether it's single security state view */
+static inline bool gic_dist_security_disabled(void)
+{
+ return readl_relaxed(gic_data.dist_base + GICD_CTLR) & GICD_CTLR_DS;
+}
+
static void gic_cpu_sys_reg_init(void)
{
int i, cpu = smp_processor_id();
@@ -598,6 +635,9 @@ static void gic_cpu_sys_reg_init(void)
/* Set priority mask register */
if (!gic_prio_masking_enabled())
write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
+ else if (gic_supports_nmi() && group0)
+ /* Mismatch configuration with boot CPU */
+ WARN_ON(!gic_dist_security_disabled());
/*
* Some firmwares hand over to the kernel with the BPR changed from
@@ -852,12 +892,6 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
#endif
#ifdef CONFIG_CPU_PM
-/* Check whether it's single security state view */
-static bool gic_dist_security_disabled(void)
-{
- return readl_relaxed(gic_data.dist_base + GICD_CTLR) & GICD_CTLR_DS;
-}
-
static int gic_cpu_pm_notifier(struct notifier_block *self,
unsigned long cmd, void *v)
{
@@ -1110,6 +1144,11 @@ static bool gic_enable_quirk_msm8996(void *data)
return true;
}
+static void gic_enable_nmi_support(void)
+{
+ static_branch_enable(&supports_pseudo_nmis);
+}
+
static int __init gic_init_bases(void __iomem *dist_base,
struct redist_region *rdist_regs,
u32 nr_redist_regions,
@@ -1179,6 +1218,13 @@ static int __init gic_init_bases(void __iomem *dist_base,
its_cpu_init();
}
+ if (gic_prio_masking_enabled()) {
+ if (!gic_has_group0() || gic_dist_security_disabled())
+ gic_enable_nmi_support();
+ else
+ pr_warn("SCR_EL3.FIQ is cleared, cannot enable use of pseudo-NMIs\n");
+ }
+
return 0;
out_free:
--
1.9.1
Add accessors to the GIC distributor/redistributors priority registers.
Signed-off-by: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-common.c | 10 ++++++++++
drivers/irqchip/irq-gic-common.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
index 3c93c6f..04eadbc 100644
--- a/drivers/irqchip/irq-gic-common.c
+++ b/drivers/irqchip/irq-gic-common.c
@@ -110,6 +110,16 @@ int gic_configure_irq(unsigned int irq, unsigned int type,
return ret;
}
+void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio)
+{
+ writeb_relaxed(prio, base + GIC_DIST_PRI + irq);
+}
+
+u8 gic_get_irq_prio(unsigned int irq, void __iomem *base)
+{
+ return readb_relaxed(base + GIC_DIST_PRI + irq);
+}
+
void gic_dist_config(void __iomem *base, int gic_irqs,
void (*sync_access)(void))
{
diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
index 97e58fb..f1c6f5a 100644
--- a/drivers/irqchip/irq-gic-common.h
+++ b/drivers/irqchip/irq-gic-common.h
@@ -38,6 +38,8 @@ void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
void *data);
void gic_enable_of_quirks(const struct device_node *np,
const struct gic_quirk *quirks, void *data);
+void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio);
+u8 gic_get_irq_prio(unsigned int irq, void __iomem *base);
void gic_set_kvm_info(const struct gic_kvm_info *info);
--
1.9.1
Implement NMI callbacks for the GICv3 irqchip. Install NMI-safe handlers
when setting up an interrupt line as NMI.
Only SPIs and PPIs are allowed to be set up as NMIs.
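For reference, a driver could then hand an interrupt line over to the
NMI path roughly as follows (a sketch built on the core NMI API from
[2]; the handler and device names are hypothetical):

    static irqreturn_t my_nmi_handler(int irq, void *dev_id)
    {
            /* NMI context: no locks, no sleeping, keep it short */
            return IRQ_HANDLED;
    }

    /* The line must be disabled while the NMI property is changed */
    err = request_nmi(irq, my_nmi_handler, 0, "my-nmi", my_dev);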
Signed-off-by: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 84 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 4df1e94..447d8ab 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -27,6 +27,7 @@
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/percpu.h>
+#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/irqchip.h>
@@ -93,6 +94,9 @@ struct gic_chip_data {
*/
static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
+/* ppi_nmi_refs[n] == number of cpus having ppi[n + 16] set as NMI */
+static refcount_t ppi_nmi_refs[16];
+
static struct gic_kvm_info gic_v3_kvm_info;
static DEFINE_PER_CPU(bool, has_rss);
@@ -320,6 +324,72 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
return 0;
}
+static int gic_irq_nmi_setup(struct irq_data *d)
+{
+ struct irq_desc *desc = irq_to_desc(d->irq);
+
+ if (!gic_supports_nmi())
+ return -EINVAL;
+
+ if (gic_peek_irq(d, GICD_ISENABLER)) {
+ pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
+ return -EINVAL;
+ }
+
+ /*
+ * A secondary irq_chip should be in charge of LPI requests;
+ * it should not be possible to get here
+ */
+ if (WARN_ON(gic_irq(d) >= 8192))
+ return -EINVAL;
+
+ /* desc lock should already be held */
+ if (gic_irq(d) < 32) {
+ /* Setting up PPI as NMI, only switch handler for first NMI */
+ if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
+ refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
+ desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
+ }
+ } else {
+ desc->handle_irq = handle_fasteoi_nmi;
+ }
+
+ gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
+
+ return 0;
+}
+
+static void gic_irq_nmi_teardown(struct irq_data *d)
+{
+ struct irq_desc *desc = irq_to_desc(d->irq);
+
+ if (WARN_ON(!gic_supports_nmi()))
+ return;
+
+ if (gic_peek_irq(d, GICD_ISENABLER)) {
+ pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
+ return;
+ }
+
+ /*
+ * A secondary irq_chip should be in charge of LPI requests;
+ * it should not be possible to get here
+ */
+ if (WARN_ON(gic_irq(d) >= 8192))
+ return;
+
+ /* desc lock should already be held */
+ if (gic_irq(d) < 32) {
+ /* Tearing down NMI, only switch handler for last NMI */
+ if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
+ desc->handle_irq = handle_percpu_devid_irq;
+ } else {
+ desc->handle_irq = handle_fasteoi_irq;
+ }
+
+ gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
+}
+
static void gic_eoi_irq(struct irq_data *d)
{
gic_write_eoir(gic_irq(d));
@@ -958,6 +1028,8 @@ static inline void gic_cpu_pm_init(void) { }
.irq_set_affinity = gic_set_affinity,
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
+ .irq_nmi_setup = gic_irq_nmi_setup,
+ .irq_nmi_teardown = gic_irq_nmi_teardown,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_MASK_ON_SUSPEND,
@@ -973,6 +1045,8 @@ static inline void gic_cpu_pm_init(void) { }
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
.irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity,
+ .irq_nmi_setup = gic_irq_nmi_setup,
+ .irq_nmi_teardown = gic_irq_nmi_teardown,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_MASK_ON_SUSPEND,
@@ -1176,7 +1250,17 @@ static bool gic_enable_quirk_msm8996(void *data)
static void gic_enable_nmi_support(void)
{
+ int i;
+
+ for (i = 0; i < 16; i++)
+ refcount_set(&ppi_nmi_refs[i], 0);
+
static_branch_enable(&supports_pseudo_nmis);
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_eoimode1_chip.flags |= IRQCHIP_SUPPORTS_NMI;
+ else
+ gic_chip.flags |= IRQCHIP_SUPPORTS_NMI;
}
static int __init gic_init_bases(void __iomem *dist_base,
--
1.9.1
Provide a higher priority to be used for pseudo-NMIs. When such an
interrupt is received, keep interrupts fully disabled at CPU level to
prevent receiving other pseudo-NMIs while handling the current one.
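For scale: GICD_INT_DEF_PRI is 0xa0, so the GICD_INT_NMI_PRI introduced
here evaluates to 0xa0 & ~0x80 = 0x20. That is below GIC_PRIO_IRQOFF
(0x70), so a pseudo-NMI is still signaled to the CPU while PMR masks
normal interrupts.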
Signed-off-by: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5374b43..4df1e94 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -41,6 +41,8 @@
#include "irq-gic-common.h"
+#define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80)
+
#define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
struct redist_region {
@@ -381,12 +383,45 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
return aff;
}
+static inline void gic_deactivate_unexpected_irq(u32 irqnr)
+{
+ if (static_branch_likely(&supports_deactivate_key)) {
+ if (irqnr < 8192)
+ gic_write_dir(irqnr);
+ } else {
+ gic_write_eoir(irqnr);
+ }
+}
+
+static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
+{
+ int err;
+
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_write_eoir(irqnr);
+ /*
+ * Leave the PSR.I bit set to prevent other NMIs from being
+ * received while handling this one.
+ * PSR.I will be restored when we ERET to the
+ * interrupted context.
+ */
+ err = handle_domain_nmi(gic_data.domain, irqnr, regs);
+ if (err)
+ gic_deactivate_unexpected_irq(irqnr);
+}
+
static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqnr;
irqnr = gic_read_iar();
+ if (gic_supports_nmi() &&
+ unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
+ gic_handle_nmi(irqnr, regs);
+ return;
+ }
+
if (gic_prio_masking_enabled()) {
gic_pmr_mask_irqs();
gic_arch_enable_irqs();
@@ -403,12 +438,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
err = handle_domain_irq(gic_data.domain, irqnr, regs);
if (err) {
WARN_ONCE(true, "Unexpected interrupt received!\n");
- if (static_branch_likely(&supports_deactivate_key)) {
- if (irqnr < 8192)
- gic_write_dir(irqnr);
- } else {
- gic_write_eoir(irqnr);
- }
+ gic_deactivate_unexpected_irq(irqnr);
}
return;
}
--
1.9.1
Once the boot CPU has been prepared or a new secondary CPU has been
brought up, use ICC_PMR_EL1 to mask interrupts on that CPU and clear
the PSR.I bit.
Since ICC_PMR_EL1 is initialized at CPU bringup, avoid overwriting
it in the GICv3 driver.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Morse <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm64/kernel/smp.c | 26 ++++++++++++++++++++++++++
drivers/irqchip/irq-gic-v3.c | 8 +++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a944edd..824de70 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -35,6 +35,7 @@
#include <linux/smp.h>
#include <linux/seq_file.h>
#include <linux/irq.h>
+#include <linux/irqchip/arm-gic-v3.h>
#include <linux/percpu.h>
#include <linux/clockchips.h>
#include <linux/completion.h>
@@ -180,6 +181,24 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
return ret;
}
+static void init_gic_priority_masking(void)
+{
+ u32 cpuflags;
+
+ if (WARN_ON(!gic_enable_sre()))
+ return;
+
+ cpuflags = read_sysreg(daif);
+
+ WARN_ON(!(cpuflags & PSR_I_BIT));
+
+ gic_write_pmr(GIC_PRIO_IRQOFF);
+
+ /* We can only unmask PSR.I if we can take aborts */
+ if (!(cpuflags & PSR_A_BIT))
+ write_sysreg(cpuflags & ~PSR_I_BIT, daif);
+}
+
/*
* This is the secondary CPU boot entry. We're using this CPUs
* idle thread stack, but a set of temporary page tables.
@@ -206,6 +225,9 @@ asmlinkage notrace void secondary_start_kernel(void)
*/
cpu_uninstall_idmap();
+ if (system_uses_irq_prio_masking())
+ init_gic_priority_masking();
+
preempt_disable();
trace_hardirqs_off();
@@ -426,6 +448,10 @@ void __init smp_prepare_boot_cpu(void)
* and/or scheduling is enabled.
*/
apply_boot_alternatives();
+
+ /* Conditionally switch to GIC PMR for interrupt masking */
+ if (system_uses_irq_prio_masking())
+ init_gic_priority_masking();
}
static u64 __init of_get_cpu_mpidr(struct device_node *dn)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index da547e0..5a703ae 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -415,6 +415,9 @@ static u32 gic_get_pribits(void)
static bool gic_has_group0(void)
{
u32 val;
+ u32 old_pmr;
+
+ old_pmr = gic_read_pmr();
/*
* Let's find out if Group0 is under control of EL3 or not by
@@ -430,6 +433,8 @@ static bool gic_has_group0(void)
gic_write_pmr(BIT(8 - gic_get_pribits()));
val = gic_read_pmr();
+ gic_write_pmr(old_pmr);
+
return val != 0;
}
@@ -591,7 +596,8 @@ static void gic_cpu_sys_reg_init(void)
group0 = gic_has_group0();
/* Set priority mask register */
- write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
+ if (!gic_prio_masking_enabled())
+ write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
/*
* Some firmwares hand over to the kernel with the BPR changed from
--
1.9.1
Add a build option and a command line parameter to build and enable the
support of pseudo-NMIs.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
arch/arm64/Kconfig | 14 ++++++++++++++
arch/arm64/kernel/cpufeature.c | 11 ++++++++++-
3 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b799bcf..173e2cc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1197,6 +1197,12 @@
to discrete, to make X server driver able to add WB
entry later. This parameter enables that.
+ enable_pseudo_nmi [ARM64]
+ Enables support for pseudo-NMIs in the kernel. This
+ requires the kernel to be built with
+ CONFIG_ARM64_PSEUDO_NMI and to be running on a
+ platform with GICv3.
+
enable_timer_pin_1 [X86]
Enable PIN 1 of APIC timer
Can be useful to work around chipset bugs
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d3..8d84bfd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1328,6 +1328,20 @@ config ARM64_MODULE_PLTS
bool
select HAVE_MOD_ARCH_SPECIFIC
+config ARM64_PSEUDO_NMI
+ bool "Support for NMI-like interrupts"
+ select ARM_GIC_V3
+ help
+ Adds support for mimicking Non-Maskable Interrupts through the use of
+ GIC interrupt priorities. This support requires version 3 or later of
+ the Arm GIC.
+
+ This high priority configuration for interrupts needs to be
+ explicitly enabled through the new kernel parameter
+ "enable_pseudo_nmi".
+
+ If unsure, say N.
+
config RELOCATABLE
bool
help
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b530fb24..e66d778 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1207,10 +1207,19 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
#endif /* CONFIG_ARM64_PTR_AUTH */
#ifdef CONFIG_ARM64_PSEUDO_NMI
+static bool enable_pseudo_nmi;
+
+static int __init early_enable_pseudo_nmi(char *p)
+{
+ enable_pseudo_nmi = true;
+ return 0;
+}
+early_param("enable_pseudo_nmi", early_enable_pseudo_nmi);
+
static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
int scope)
{
- return false;
+ return enable_pseudo_nmi && has_useable_gicv3_cpuif(entry, scope);
}
#endif
--
1.9.1
The code to detect whether Linux has access to group0 interrupts can
prove useful in other parts of the driver.
Provide a separate function to do this.
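As a worked example of the detection: with 5 implemented priority bits,
gic_get_pribits() returns 5 and gic_has_group0() writes BIT(8 - 5) =
0x08 to PMR. If EL3 owns Group0, that value is shifted into the secure
range, the single set bit drops below the implemented bits, and the
non-secure read-back is 0; otherwise the value survives and reads back
non-zero.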
Signed-off-by: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 55 +++++++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 19 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 8148a92..da547e0 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -400,6 +400,39 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
}
}
+static u32 gic_get_pribits(void)
+{
+ u32 pribits;
+
+ pribits = gic_read_ctlr();
+ pribits &= ICC_CTLR_EL1_PRI_BITS_MASK;
+ pribits >>= ICC_CTLR_EL1_PRI_BITS_SHIFT;
+ pribits++;
+
+ return pribits;
+}
+
+static bool gic_has_group0(void)
+{
+ u32 val;
+
+ /*
+ * Let's find out if Group0 is under control of EL3 or not by
+ * setting the highest possible, non-zero priority in PMR.
+ *
+ * If SCR_EL3.FIQ is set, the priority gets shifted down in
+ * order for the CPU interface to set bit 7, and keep the
+ * actual priority in the non-secure range. In the process, it
+ * loses the least significant bit and the actual priority
+ * becomes 0x80. Reading it back returns 0, indicating that
+ * we don't have access to Group0.
+ */
+ gic_write_pmr(BIT(8 - gic_get_pribits()));
+ val = gic_read_pmr();
+
+ return val != 0;
+}
+
static void __init gic_dist_init(void)
{
unsigned int i;
@@ -541,7 +574,7 @@ static void gic_cpu_sys_reg_init(void)
u64 mpidr = cpu_logical_map(cpu);
u64 need_rss = MPIDR_RS(mpidr);
bool group0;
- u32 val, pribits;
+ u32 pribits;
/*
* Need to check that the SRE bit has actually been set. If
@@ -553,25 +586,9 @@ static void gic_cpu_sys_reg_init(void)
if (!gic_enable_sre())
pr_err("GIC: unable to set SRE (disabled at EL2), panic ahead\n");
- pribits = gic_read_ctlr();
- pribits &= ICC_CTLR_EL1_PRI_BITS_MASK;
- pribits >>= ICC_CTLR_EL1_PRI_BITS_SHIFT;
- pribits++;
+ pribits = gic_get_pribits();
- /*
- * Let's find out if Group0 is under control of EL3 or not by
- * setting the highest possible, non-zero priority in PMR.
- *
- * If SCR_EL3.FIQ is set, the priority gets shifted down in
- * order for the CPU interface to set bit 7, and keep the
- * actual priority in the non-secure range. In the process, it
- * looses the least significant bit and the actual priority
- * becomes 0x80. Reading it back returns 0, indicating that
- * we're don't have access to Group0.
- */
- write_gicreg(BIT(8 - pribits), ICC_PMR_EL1);
- val = read_gicreg(ICC_PMR_EL1);
- group0 = val != 0;
+ group0 = gic_has_group0();
/* Set priority mask register */
write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
--
1.9.1
From: Daniel Thompson <[email protected]>
Currently alternatives are applied very late in the boot process (and
a long time after we enable scheduling). Some alternative sequences,
such as those that alter the way CPU context is stored, must be applied
much earlier in the boot sequence.
Introduce apply_boot_alternatives() to allow some alternatives to be
applied immediately after we detect the CPU features of the boot CPU.
Signed-off-by: Daniel Thompson <[email protected]>
[[email protected]: rename to fit new cpufeature framework better,
apply BOOT_SCOPE feature early in boot]
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Christoffer Dall <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
---
arch/arm64/include/asm/alternative.h | 1 +
arch/arm64/include/asm/cpufeature.h | 4 ++++
arch/arm64/kernel/alternative.c | 43 +++++++++++++++++++++++++++++++-----
arch/arm64/kernel/cpufeature.c | 6 +++++
arch/arm64/kernel/smp.c | 7 ++++++
5 files changed, 56 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 9806a23..b9f8d78 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -25,6 +25,7 @@ struct alt_instr {
typedef void (*alternative_cb_t)(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
+void __init apply_boot_alternatives(void);
void __init apply_alternatives_all(void);
bool alternative_is_applied(u16 cpufeature);
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 89c3f31..e505e1f 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -391,6 +391,10 @@ static inline int cpucap_default_scope(const struct arm64_cpu_capabilities *cap)
extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
extern struct static_key_false arm64_const_caps_ready;
+/* ARM64 CAPS + alternative_cb */
+#define ARM64_NPATCHABLE (ARM64_NCAPS + 1)
+extern DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
+
#define for_each_available_cap(cap) \
for_each_set_bit(cap, cpu_hwcaps, ARM64_NCAPS)
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index c947d22..a9b4677 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -155,7 +155,8 @@ static void clean_dcache_range_nopatch(u64 start, u64 end)
} while (cur += d_size, cur < end);
}
-static void __apply_alternatives(void *alt_region, bool is_module)
+static void __apply_alternatives(void *alt_region, bool is_module,
+ unsigned long *feature_mask)
{
struct alt_instr *alt;
struct alt_region *region = alt_region;
@@ -165,6 +166,9 @@ static void __apply_alternatives(void *alt_region, bool is_module)
for (alt = region->begin; alt < region->end; alt++) {
int nr_inst;
+ if (!test_bit(alt->cpufeature, feature_mask))
+ continue;
+
/* Use ARM64_CB_PATCH as an unconditional patch */
if (alt->cpufeature < ARM64_CB_PATCH &&
!cpus_have_cap(alt->cpufeature))
@@ -203,8 +207,11 @@ static void __apply_alternatives(void *alt_region, bool is_module)
__flush_icache_all();
isb();
- /* We applied all that was available */
- bitmap_copy(applied_alternatives, cpu_hwcaps, ARM64_NCAPS);
+ /* Ignore ARM64_CB bit from feature mask */
+ bitmap_or(applied_alternatives, applied_alternatives,
+ feature_mask, ARM64_NCAPS);
+ bitmap_and(applied_alternatives, applied_alternatives,
+ cpu_hwcaps, ARM64_NCAPS);
}
}
@@ -225,8 +232,13 @@ static int __apply_alternatives_multi_stop(void *unused)
cpu_relax();
isb();
} else {
+ DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
+
+ bitmap_complement(remaining_capabilities, boot_capabilities,
+ ARM64_NPATCHABLE);
+
BUG_ON(all_alternatives_applied);
- __apply_alternatives(&region, false);
+ __apply_alternatives(&region, false, remaining_capabilities);
/* Barriers provided by the cache flushing */
WRITE_ONCE(all_alternatives_applied, 1);
}
@@ -240,6 +252,24 @@ void __init apply_alternatives_all(void)
stop_machine(__apply_alternatives_multi_stop, NULL, cpu_online_mask);
}
+/*
+ * This is called very early in the boot process (directly after we run
+ * a feature detect on the boot CPU). No need to worry about other CPUs
+ * here.
+ */
+void __init apply_boot_alternatives(void)
+{
+ struct alt_region region = {
+ .begin = (struct alt_instr *)__alt_instructions,
+ .end = (struct alt_instr *)__alt_instructions_end,
+ };
+
+ /* If called on non-boot cpu things could go wrong */
+ WARN_ON(smp_processor_id() != 0);
+
+ __apply_alternatives(&region, false, &boot_capabilities[0]);
+}
+
#ifdef CONFIG_MODULES
void apply_alternatives_module(void *start, size_t length)
{
@@ -247,7 +277,10 @@ void apply_alternatives_module(void *start, size_t length)
.begin = start,
.end = start + length,
};
+ DECLARE_BITMAP(all_capabilities, ARM64_NPATCHABLE);
+
+ bitmap_fill(all_capabilities, ARM64_NPATCHABLE);
- __apply_alternatives(&region, true);
+ __apply_alternatives(&region, true, &all_capabilities[0]);
}
#endif
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d607ea3..b530fb24 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -54,6 +54,9 @@
EXPORT_SYMBOL(cpu_hwcaps);
static struct arm64_cpu_capabilities const __ro_after_init *cpu_hwcaps_ptrs[ARM64_NCAPS];
+/* Need also bit for ARM64_CB_PATCH */
+DECLARE_BITMAP(boot_capabilities, ARM64_NPATCHABLE);
+
/*
* Flag to indicate if we have computed the system wide
* capabilities based on the boot time active CPUs. This
@@ -1677,6 +1680,9 @@ static void update_cpu_capabilities(u16 scope_mask)
if (caps->desc)
pr_info("detected: %s\n", caps->desc);
cpus_set_cap(caps->capability);
+
+ if ((scope_mask & SCOPE_BOOT_CPU) && (caps->type & SCOPE_BOOT_CPU))
+ set_bit(caps->capability, boot_capabilities);
}
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 1598d6f..a944edd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -419,6 +419,13 @@ void __init smp_prepare_boot_cpu(void)
*/
jump_label_init();
cpuinfo_store_boot_cpu();
+
+ /*
+ * We now know enough about the boot CPU to apply the
+ * alternatives that cannot wait until interrupt handling
+ * and/or scheduling is enabled.
+ */
+ apply_boot_alternatives();
}
static u64 __init of_get_cpu_mpidr(struct device_node *dn)
--
1.9.1
In preparation for the application of alternatives at different points
during the boot process, provide the possibility to check whether the
alternatives for a feature of interest have already been applied,
instead of having a global boolean for all alternatives.
Make the VHE enablement code check for the VHE feature instead of
considering all alternatives.
Signed-off-by: Julien Thierry <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Christoffer Dall <[email protected]>
---
arch/arm64/include/asm/alternative.h | 3 +--
arch/arm64/kernel/alternative.c | 21 +++++++++++++++++----
arch/arm64/kernel/cpufeature.c | 2 +-
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 4b650ec..9806a23 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -14,8 +14,6 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
-extern int alternatives_applied;
-
struct alt_instr {
s32 orig_offset; /* offset to original instruction */
s32 alt_offset; /* offset to replacement instruction */
@@ -28,6 +26,7 @@ typedef void (*alternative_cb_t)(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void __init apply_alternatives_all(void);
+bool alternative_is_applied(u16 cpufeature);
#ifdef CONFIG_MODULES
void apply_alternatives_module(void *start, size_t length);
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index b5d6039..c947d22 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -32,13 +32,23 @@
#define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset)
#define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset)
-int alternatives_applied;
+static int all_alternatives_applied;
+
+static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
struct alt_region {
struct alt_instr *begin;
struct alt_instr *end;
};
+bool alternative_is_applied(u16 cpufeature)
+{
+ if (WARN_ON(cpufeature >= ARM64_NCAPS))
+ return false;
+
+ return test_bit(cpufeature, applied_alternatives);
+}
+
/*
* Check if the target PC is within an alternative block.
*/
@@ -192,6 +202,9 @@ static void __apply_alternatives(void *alt_region, bool is_module)
dsb(ish);
__flush_icache_all();
isb();
+
+ /* We applied all that was available */
+ bitmap_copy(applied_alternatives, cpu_hwcaps, ARM64_NCAPS);
}
}
@@ -208,14 +221,14 @@ static int __apply_alternatives_multi_stop(void *unused)
/* We always have a CPU 0 at this point (__init) */
if (smp_processor_id()) {
- while (!READ_ONCE(alternatives_applied))
+ while (!READ_ONCE(all_alternatives_applied))
cpu_relax();
isb();
} else {
- BUG_ON(alternatives_applied);
+ BUG_ON(all_alternatives_applied);
__apply_alternatives(&region, false);
/* Barriers provided by the cache flushing */
- WRITE_ONCE(alternatives_applied, 1);
+ WRITE_ONCE(all_alternatives_applied, 1);
}
return 0;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6f56e0a..d607ea3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1118,7 +1118,7 @@ static void cpu_copy_el2regs(const struct arm64_cpu_capabilities *__unused)
* that, freshly-onlined CPUs will set tpidr_el2, so we don't need to
* do anything here.
*/
- if (!alternatives_applied)
+ if (!alternative_is_applied(ARM64_HAS_VIRT_HOST_EXTN))
write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
}
#endif
--
1.9.1
Instead of disabling interrupts by setting the PSR.I bit, use a priority
higher than the one used for interrupts to mask them via PMR.
When using PMR to disable interrupts, the value of PMR will be used
instead of PSR.[DAIF] for the irqflags.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Daniel Thompson <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---
arch/arm64/include/asm/efi.h | 6 +++
arch/arm64/include/asm/irqflags.h | 104 ++++++++++++++++++++++++++++----------
2 files changed, 82 insertions(+), 28 deletions(-)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 7ed3208..3db6772 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -44,6 +44,12 @@
#define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
+#define arch_efi_save_flags(state_flags) \
+ ((void)((state_flags) = read_sysreg(daif)))
+
+#define arch_efi_restore_flags(state_flags) write_sysreg(state_flags, daif)
+
+
/* arch specific definitions used by the stub code */
/*
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 24692ed..7e82a92 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -18,7 +18,9 @@
#ifdef __KERNEL__
+#include <asm/alternative.h>
#include <asm/ptrace.h>
+#include <asm/sysreg.h>
/*
* Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
@@ -36,33 +38,31 @@
/*
* CPU interrupt mask handling.
*/
-static inline unsigned long arch_local_irq_save(void)
-{
- unsigned long flags;
- asm volatile(
- "mrs %0, daif // arch_local_irq_save\n"
- "msr daifset, #2"
- : "=r" (flags)
- :
- : "memory");
- return flags;
-}
-
static inline void arch_local_irq_enable(void)
{
- asm volatile(
- "msr daifclr, #2 // arch_local_irq_enable"
- :
+ unsigned long unmasked = GIC_PRIO_IRQON;
+
+ asm volatile(ALTERNATIVE(
+ "msr daifclr, #2 // arch_local_irq_enable\n"
+ "nop",
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
+ "dsb sy",
+ ARM64_HAS_IRQ_PRIO_MASKING)
:
+ : "r" (unmasked)
: "memory");
}
static inline void arch_local_irq_disable(void)
{
- asm volatile(
- "msr daifset, #2 // arch_local_irq_disable"
- :
+ unsigned long masked = GIC_PRIO_IRQOFF;
+
+ asm volatile(ALTERNATIVE(
+ "msr daifset, #2 // arch_local_irq_disable",
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
+ ARM64_HAS_IRQ_PRIO_MASKING)
:
+ : "r" (masked)
: "memory");
}
@@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
*/
static inline unsigned long arch_local_save_flags(void)
{
+ unsigned long daif_bits;
unsigned long flags;
- asm volatile(
- "mrs %0, daif // arch_local_save_flags"
- : "=r" (flags)
- :
+
+ daif_bits = read_sysreg(daif);
+
+ /*
+ * The asm is logically equivalent to:
+ *
+ * if (system_uses_irq_prio_masking())
+ * flags = (daif_bits & PSR_I_BIT) ?
+ * GIC_PRIO_IRQOFF :
+ * read_sysreg_s(SYS_ICC_PMR_EL1);
+ * else
+ * flags = daif_bits;
+ */
+ asm volatile(ALTERNATIVE(
+ "mov %0, %1\n"
+ "nop\n"
+ "nop",
+ "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
+ "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
+ "csel %0, %0, %2, eq",
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (flags), "+r" (daif_bits)
+ : "r" (GIC_PRIO_IRQOFF)
: "memory");
+
+ return flags;
+}
+
+static inline unsigned long arch_local_irq_save(void)
+{
+ unsigned long flags;
+
+ flags = arch_local_save_flags();
+
+ arch_local_irq_disable();
+
return flags;
}
@@ -85,16 +117,32 @@ static inline unsigned long arch_local_save_flags(void)
*/
static inline void arch_local_irq_restore(unsigned long flags)
{
- asm volatile(
- "msr daif, %0 // arch_local_irq_restore"
- :
- : "r" (flags)
- : "memory");
+ asm volatile(ALTERNATIVE(
+ "msr daif, %0\n"
+ "nop",
+ "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0\n"
+ "dsb sy",
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "+r" (flags)
+ :
+ : "memory");
}
static inline int arch_irqs_disabled_flags(unsigned long flags)
{
- return flags & PSR_I_BIT;
+ int res;
+
+ asm volatile(ALTERNATIVE(
+ "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
+ "nop",
+ "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
+ "cset %w0, ls",
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (res)
+ : "r" ((int) flags)
+ : "memory");
+
+ return res;
}
#endif
#endif
--
1.9.1
There are some helpers to modify PSR.[DAIF] bits that are not referenced
anywhere. The less these bits are available outside of local_irq_*
functions the better.
Get rid of those unused helpers.
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Morse <[email protected]>
---
arch/arm64/include/asm/assembler.h | 10 +---------
arch/arm64/include/asm/daifflags.h | 10 ----------
2 files changed, 1 insertion(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 4feb611..7acf243 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -62,16 +62,8 @@
.endm
/*
- * Enable and disable interrupts.
+ * Save/restore interrupts.
*/
- .macro disable_irq
- msr daifset, #2
- .endm
-
- .macro enable_irq
- msr daifclr, #2
- .endm
-
.macro save_and_disable_irq, flags
mrs \flags, daif
msr daifset, #2
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index 8d91f22..546bc39 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -43,16 +43,6 @@ static inline unsigned long local_daif_save(void)
return flags;
}
-static inline void local_daif_unmask(void)
-{
- trace_hardirqs_on();
- asm volatile(
- "msr daifclr, #0xf // local_daif_unmask"
- :
- :
- : "memory");
-}
-
static inline void local_daif_restore(unsigned long flags)
{
if (!arch_irqs_disabled_flags(flags))
--
1.9.1
In order to replace PSR.I interrupt disabling/enabling with ICC_PMR_EL1
interrupt masking, ICC_PMR_EL1 needs to be saved/restored when
taking/returning from an exception. This mimics the way hardware saves
and restores PSR.I bit in spsr_el1 for exceptions and ERET.
Add PMR to the registers to save in the pt_regs struct upon kernel entry,
and restore it before ERET. Also, initialize it to a sane value when
creating new tasks.
Signed-off-by: Julien Thierry <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dave Martin <[email protected]>
---
arch/arm64/include/asm/processor.h | 3 +++
arch/arm64/include/asm/ptrace.h | 14 +++++++++++---
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry.S | 14 ++++++++++++++
arch/arm64/kernel/process.c | 6 ++++++
5 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f1a7ab1..5d9ce62 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -191,6 +191,9 @@ static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
memset(regs, 0, sizeof(*regs));
forget_syscall(regs);
regs->pc = pc;
+
+ if (system_uses_irq_prio_masking())
+ regs->pmr_save = GIC_PRIO_IRQON;
}
static inline void start_thread(struct pt_regs *regs, unsigned long pc,
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 05cf913..43e5df5 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -19,6 +19,8 @@
#ifndef __ASM_PTRACE_H
#define __ASM_PTRACE_H
+#include <asm/cpufeature.h>
+
#include <uapi/asm/ptrace.h>
/* Current Exception Level values, as contained in CurrentEL */
@@ -179,7 +181,8 @@ struct pt_regs {
#endif
u64 orig_addr_limit;
- u64 unused; // maintain 16 byte alignment
+ /* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */
+ u64 pmr_save;
u64 stackframe[2];
};
@@ -214,8 +217,13 @@ static inline void forget_syscall(struct pt_regs *regs)
#define processor_mode(regs) \
((regs)->pstate & PSR_MODE_MASK)
-#define interrupts_enabled(regs) \
- (!((regs)->pstate & PSR_I_BIT))
+#define irqs_priority_unmasked(regs) \
+ (system_uses_irq_prio_masking() ? \
+ (regs)->pmr_save == GIC_PRIO_IRQON : \
+ true)
+
+#define interrupts_enabled(regs) \
+ (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
#define fast_interrupts_enabled(regs) \
(!((regs)->pstate & PSR_F_BIT))
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 65b8afc..90ab2cf 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -81,6 +81,7 @@ int main(void)
DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
DEFINE(S_ORIG_ADDR_LIMIT, offsetof(struct pt_regs, orig_addr_limit));
+ DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save));
DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe));
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
BLANK();
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 0ec0c46..35a47f6 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -249,6 +249,12 @@ alternative_else_nop_endif
msr sp_el0, tsk
.endif
+ /* Save pmr */
+alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ mrs_s x20, SYS_ICC_PMR_EL1
+ str x20, [sp, #S_PMR_SAVE]
+alternative_else_nop_endif
+
/*
* Registers that may be useful after this macro is invoked:
*
@@ -269,6 +275,14 @@ alternative_else_nop_endif
/* No need to restore UAO, it will be restored from SPSR_EL1 */
.endif
+ /* Restore pmr */
+alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ ldr x20, [sp, #S_PMR_SAVE]
+ msr_s SYS_ICC_PMR_EL1, x20
+ /* Ensure priority change is seen by redistributor */
+ dsb sy
+alternative_else_nop_endif
+
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
.if \el == 0
ct_user_enter
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index a0f985a..6d410fc 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -232,6 +232,9 @@ void __show_regs(struct pt_regs *regs)
printk("sp : %016llx\n", sp);
+ if (system_uses_irq_prio_masking())
+ printk("pmr_save: %08llx\n", regs->pmr_save);
+
i = top_reg;
while (i >= 0) {
@@ -363,6 +366,9 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
childregs->pstate |= PSR_SSBS_BIT;
+ if (system_uses_irq_prio_masking())
+ childregs->pmr_save = GIC_PRIO_IRQON;
+
p->thread.cpu_context.x19 = stack_start;
p->thread.cpu_context.x20 = stk_sz;
}
--
1.9.1
Mask the IRQ priority through PMR and re-enable IRQs at CPU level,
allowing only higher priority interrupts to be received during interrupt
handling.
Signed-off-by: Julien Thierry <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Marc Zyngier <[email protected]>
---
arch/arm/include/asm/arch_gicv3.h | 17 +++++++++++++++++
arch/arm64/include/asm/arch_gicv3.h | 17 +++++++++++++++++
drivers/irqchip/irq-gic-v3.c | 5 +++++
3 files changed, 39 insertions(+)
diff --git a/arch/arm/include/asm/arch_gicv3.h b/arch/arm/include/asm/arch_gicv3.h
index bef0b5d..f6f485f 100644
--- a/arch/arm/include/asm/arch_gicv3.h
+++ b/arch/arm/include/asm/arch_gicv3.h
@@ -363,5 +363,22 @@ static inline void gits_write_vpendbaser(u64 val, void * __iomem addr)
#define gits_read_vpendbaser(c) __gic_readq_nonatomic(c)
+static inline bool gic_prio_masking_enabled(void)
+{
+ return false;
+}
+
+static inline void gic_pmr_mask_irqs(void)
+{
+ /* Should not get called. */
+ WARN_ON_ONCE(true);
+}
+
+static inline void gic_arch_enable_irqs(void)
+{
+ /* Should not get called. */
+ WARN_ON_ONCE(true);
+}
+
#endif /* !__ASSEMBLY__ */
#endif /* !__ASM_ARCH_GICV3_H */
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 37193e2..b5f8142 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -155,5 +155,22 @@ static inline u32 gic_read_rpr(void)
#define gits_write_vpendbaser(v, c) writeq_relaxed(v, c)
#define gits_read_vpendbaser(c) readq_relaxed(c)
+static inline bool gic_prio_masking_enabled(void)
+{
+ return system_uses_irq_prio_masking();
+}
+
+static inline void gic_pmr_mask_irqs(void)
+{
+ /* Should not get called yet. */
+ WARN_ON_ONCE(true);
+}
+
+static inline void gic_arch_enable_irqs(void)
+{
+ /* Should not get called yet. */
+ WARN_ON_ONCE(true);
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __ASM_ARCH_GICV3_H */
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 0868a9d..8148a92 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -356,6 +356,11 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
irqnr = gic_read_iar();
+ if (gic_prio_masking_enabled()) {
+ gic_pmr_mask_irqs();
+ gic_arch_enable_irqs();
+ }
+
if (likely(irqnr > 15 && irqnr < 1020) || irqnr >= 8192) {
int err;
--
1.9.1
When using VHE, the host needs to clear the HCR_EL2.TGE bit in order
to interact with guest TLBs, switching from the EL2&0 translation regime
to EL1&0.
However, a non-maskable asynchronous event, such as SDEI, could happen
while TGE is cleared. Because of this, address translation operations
relying on the EL2&0 translation regime could fail (TLB invalidation,
userspace access, ...).
Fix this by properly setting HCR_EL2.TGE when entering NMI context and
clearing it, if necessary, when returning to the interrupted context.
Signed-off-by: Julien Thierry <[email protected]>
Suggested-by: Marc Zyngier <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: James Morse <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/arm64/include/asm/hardirq.h | 28 ++++++++++++++++++++++++++++
arch/arm64/kernel/irq.c | 3 +++
include/linux/hardirq.h | 7 +++++++
3 files changed, 38 insertions(+)
diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index 1473fc2..94b7481 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -19,6 +19,7 @@
#include <linux/cache.h>
#include <linux/threads.h>
#include <asm/irq.h>
+#include <asm/kvm_arm.h>
#define NR_IPI 7
@@ -37,6 +38,33 @@
#define __ARCH_IRQ_EXIT_IRQS_DISABLED 1
+struct nmi_ctx {
+ u64 hcr;
+};
+
+DECLARE_PER_CPU(struct nmi_ctx, nmi_contexts);
+
+#define arch_nmi_enter() \
+ do { \
+ if (is_kernel_in_hyp_mode()) { \
+ struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
+ nmi_ctx->hcr = read_sysreg(hcr_el2); \
+ if (!(nmi_ctx->hcr & HCR_TGE)) { \
+ write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
+ isb(); \
+ } \
+ } \
+ } while (0)
+
+#define arch_nmi_exit() \
+ do { \
+ if (is_kernel_in_hyp_mode()) { \
+ struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
+ if (!(nmi_ctx->hcr & HCR_TGE)) \
+ write_sysreg(nmi_ctx->hcr, hcr_el2); \
+ } \
+ } while (0)
+
static inline void ack_bad_irq(unsigned int irq)
{
extern unsigned long irq_err_count;
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 780a12f..92fa817 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -33,6 +33,9 @@
unsigned long irq_err_count;
+/* Only access this in an NMI enter/exit */
+DEFINE_PER_CPU(struct nmi_ctx, nmi_contexts);
+
DEFINE_PER_CPU(unsigned long *, irq_stack_ptr);
int arch_show_interrupts(struct seq_file *p, int prec)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 0fbbcdf..da0af63 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -60,8 +60,14 @@ static inline void rcu_nmi_exit(void)
*/
extern void irq_exit(void);
+#ifndef arch_nmi_enter
+#define arch_nmi_enter() do { } while (0)
+#define arch_nmi_exit() do { } while (0)
+#endif
+
#define nmi_enter() \
do { \
+ arch_nmi_enter(); \
printk_nmi_enter(); \
lockdep_off(); \
ftrace_nmi_enter(); \
@@ -80,6 +86,7 @@ static inline void rcu_nmi_exit(void)
ftrace_nmi_exit(); \
lockdep_on(); \
printk_nmi_exit(); \
+ arch_nmi_exit(); \
} while (0)
#endif /* LINUX_HARDIRQ_H */
--
1.9.1
On Mon, 21 Jan 2019 at 16:34, Julien Thierry <[email protected]> wrote:
>
> Currently, irqflags are saved before calling runtime services and
> checked for mismatch on return.
>
> Provide a pair of overridable macros to save and restore (if needed) the
> state that needs to be preserved on return from a runtime service.
> This allows checking for flags that are not necessarily related to
> irqflags.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: [email protected]
> ---
> drivers/firmware/efi/runtime-wrappers.c | 17 +++++++++++++++--
> include/linux/efi.h | 5 +++--
> 2 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
> index 8903b9c..2f4b68b 100644
> --- a/drivers/firmware/efi/runtime-wrappers.c
> +++ b/drivers/firmware/efi/runtime-wrappers.c
> @@ -89,11 +89,24 @@
> efi_rts_work.status; \
> })
>
> +#ifndef arch_efi_save_flags
> +#define arch_efi_save_flags(state_flags) local_save_flags(state_flags)
> +#define arch_efi_restore_flags(state_flags) local_irq_restore(state_flags)
> +#endif
> +
> +inline unsigned long efi_call_virt_save_flags(void)
If you drop the 'inline' here,
Acked-by: Ard Biesheuvel <[email protected]>
> +{
> + unsigned long flags;
> +
> + arch_efi_save_flags(flags);
> + return flags;
> +}
> +
> void efi_call_virt_check_flags(unsigned long flags, const char *call)
> {
> unsigned long cur_flags, mismatch;
>
> - local_save_flags(cur_flags);
> + cur_flags = efi_call_virt_save_flags();
>
> mismatch = flags ^ cur_flags;
> if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK))
> @@ -102,7 +115,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
> add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE);
> pr_err_ratelimited(FW_BUG "IRQ flags corrupted (0x%08lx=>0x%08lx) by EFI %s\n",
> flags, cur_flags, call);
> - local_irq_restore(flags);
> + arch_efi_restore_flags(flags);
> }
>
> /*
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 45ff763..bd80b7e 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1607,6 +1607,7 @@ efi_status_t efi_setup_gop(efi_system_table_t *sys_table_arg,
>
> bool efi_runtime_disabled(void);
> extern void efi_call_virt_check_flags(unsigned long flags, const char *call);
> +extern unsigned long efi_call_virt_save_flags(void);
>
> enum efi_secureboot_mode {
> efi_secureboot_mode_unset,
> @@ -1652,7 +1653,7 @@ enum efi_secureboot_mode {
> \
> arch_efi_call_virt_setup(); \
> \
> - local_save_flags(__flags); \
> + __flags = efi_call_virt_save_flags(); \
> __s = arch_efi_call_virt(p, f, args); \
> efi_call_virt_check_flags(__flags, __stringify(f)); \
> \
> @@ -1667,7 +1668,7 @@ enum efi_secureboot_mode {
> \
> arch_efi_call_virt_setup(); \
> \
> - local_save_flags(__flags); \
> + __flags = efi_call_virt_save_flags(); \
> arch_efi_call_virt(p, f, args); \
> efi_call_virt_check_flags(__flags, __stringify(f)); \
> \
> --
> 1.9.1
>
On Mon, 21 Jan 2019 at 16:34, Julien Thierry <[email protected]> wrote:
>
> Instead of disabling interrupts by setting the PSR.I bit, use a priority
> higher than the one used for interrupts to mask them via PMR.
>
> When using PMR to disable interrupts, the value of PMR will be used
> instead of PSR.[DAIF] for the irqflags.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> ---
> arch/arm64/include/asm/efi.h | 6 +++
> arch/arm64/include/asm/irqflags.h | 104 ++++++++++++++++++++++++++++----------
> 2 files changed, 82 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index 7ed3208..3db6772 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -44,6 +44,12 @@
>
> #define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
>
> +#define arch_efi_save_flags(state_flags) \
> + ((void)((state_flags) = read_sysreg(daif)))
> +
> +#define arch_efi_restore_flags(state_flags) write_sysreg(state_flags, daif)
> +
> +
> /* arch specific definitions used by the stub code */
>
> /*
For the EFI changes
Acked-by: Ard Biesheuvel <[email protected]>
although it deserves a comment as to why it is unaffected by
ARM64_HAS_IRQ_PRIO_MASKING
> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> index 24692ed..7e82a92 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -18,7 +18,9 @@
>
> #ifdef __KERNEL__
>
> +#include <asm/alternative.h>
> #include <asm/ptrace.h>
> +#include <asm/sysreg.h>
>
> /*
> * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
> @@ -36,33 +38,31 @@
> /*
> * CPU interrupt mask handling.
> */
> -static inline unsigned long arch_local_irq_save(void)
> -{
> - unsigned long flags;
> - asm volatile(
> - "mrs %0, daif // arch_local_irq_save\n"
> - "msr daifset, #2"
> - : "=r" (flags)
> - :
> - : "memory");
> - return flags;
> -}
> -
> static inline void arch_local_irq_enable(void)
> {
> - asm volatile(
> - "msr daifclr, #2 // arch_local_irq_enable"
> - :
> + unsigned long unmasked = GIC_PRIO_IRQON;
> +
> + asm volatile(ALTERNATIVE(
> + "msr daifclr, #2 // arch_local_irq_enable\n"
> + "nop",
> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + "dsb sy",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> :
> + : "r" (unmasked)
> : "memory");
> }
>
> static inline void arch_local_irq_disable(void)
> {
> - asm volatile(
> - "msr daifset, #2 // arch_local_irq_disable"
> - :
> + unsigned long masked = GIC_PRIO_IRQOFF;
> +
> + asm volatile(ALTERNATIVE(
> + "msr daifset, #2 // arch_local_irq_disable",
> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> :
> + : "r" (masked)
> : "memory");
> }
>
> @@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
> */
> static inline unsigned long arch_local_save_flags(void)
> {
> + unsigned long daif_bits;
> unsigned long flags;
> - asm volatile(
> - "mrs %0, daif // arch_local_save_flags"
> - : "=r" (flags)
> - :
> +
> + daif_bits = read_sysreg(daif);
> +
> + /*
> + * The asm is logically equivalent to:
> + *
> + * if (system_uses_irq_prio_masking())
> + * flags = (daif_bits & PSR_I_BIT) ?
> + * GIC_PRIO_IRQOFF :
> + * read_sysreg_s(SYS_ICC_PMR_EL1);
> + * else
> + * flags = daif_bits;
> + */
> + asm volatile(ALTERNATIVE(
> + "mov %0, %1\n"
> + "nop\n"
> + "nop",
> + "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
> + "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
> + "csel %0, %0, %2, eq",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> + : "=&r" (flags), "+r" (daif_bits)
> + : "r" (GIC_PRIO_IRQOFF)
> : "memory");
> +
> + return flags;
> +}
> +
> +static inline unsigned long arch_local_irq_save(void)
> +{
> + unsigned long flags;
> +
> + flags = arch_local_save_flags();
> +
> + arch_local_irq_disable();
> +
> return flags;
> }
>
> @@ -85,16 +117,32 @@ static inline unsigned long arch_local_save_flags(void)
> */
> static inline void arch_local_irq_restore(unsigned long flags)
> {
> - asm volatile(
> - "msr daif, %0 // arch_local_irq_restore"
> - :
> - : "r" (flags)
> - : "memory");
> + asm volatile(ALTERNATIVE(
> + "msr daif, %0\n"
> + "nop",
> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0\n"
> + "dsb sy",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> + : "+r" (flags)
> + :
> + : "memory");
> }
>
> static inline int arch_irqs_disabled_flags(unsigned long flags)
> {
> - return flags & PSR_I_BIT;
> + int res;
> +
> + asm volatile(ALTERNATIVE(
> + "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
> + "nop",
> + "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
> + "cset %w0, ls",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> + : "=&r" (res)
> + : "r" ((int) flags)
> + : "memory");
> +
> + return res;
> }
> #endif
> #endif
> --
> 1.9.1
>
On 21/01/2019 15:45, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 16:34, Julien Thierry <[email protected]> wrote:
>>
>> Instead of disabling interrupts by setting the PSR.I bit, use a priority
>> higher than the one used for interrupts to mask them via PMR.
>>
>> When using PMR to disable interrupts, the value of PMR will be used
>> instead of PSR.[DAIF] for the irqflags.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Suggested-by: Daniel Thompson <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Oleg Nesterov <[email protected]>
>> ---
>> arch/arm64/include/asm/efi.h | 6 +++
>> arch/arm64/include/asm/irqflags.h | 104 ++++++++++++++++++++++++++++----------
>> 2 files changed, 82 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
>> index 7ed3208..3db6772 100644
>> --- a/arch/arm64/include/asm/efi.h
>> +++ b/arch/arm64/include/asm/efi.h
>> @@ -44,6 +44,12 @@
>>
>> #define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
>>
>> +#define arch_efi_save_flags(state_flags) \
>> + ((void)((state_flags) = read_sysreg(daif)))
>> +
>> +#define arch_efi_restore_flags(state_flags) write_sysreg(state_flags, daif)
>> +
>> +
>> /* arch specific definitions used by the stub code */
>>
>> /*
>
> For the EFI changes
>
> Acked-by: Ard Biesheuvel <[email protected]>
Thanks!
>
> although it deserves a comment as to why it is unaffected by
> ARM64_HAS_IRQ_PRIO_MASKING
So far we've relied on EFI not modifying PMR, otherwise we could've
ended up without any interrupts in Linux after coming back from an EFI
service.
Now, ARM64_HAS_IRQ_PRIO_MASKING only modifies the behaviour of
Linux, but we should have the same expectation of EFI. If we want to
be paranoid, we could save/restore PMR along with the DAIF flags.
But you're right, it is fair to add a comment clearly stating that we
expect EFI not to modify PMR. I'll include that in the next posting.
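For the record, a paranoid variant could look something like this (purely
a sketch, not part of this series; it packs DAIF into the upper bits of
the saved flags and assumes the read_sysreg_s/write_sysreg_s helpers):

#define arch_efi_save_flags(state_flags)				\
	((void)((state_flags) = (read_sysreg(daif) << 32) |		\
				read_sysreg_s(SYS_ICC_PMR_EL1)))

#define arch_efi_restore_flags(state_flags)				\
	do {								\
		write_sysreg((state_flags) >> 32, daif);		\
		write_sysreg_s((state_flags) & 0xff, SYS_ICC_PMR_EL1);	\
	} while (0)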
Thanks,
Julien
>
>
>> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
>> index 24692ed..7e82a92 100644
>> --- a/arch/arm64/include/asm/irqflags.h
>> +++ b/arch/arm64/include/asm/irqflags.h
>> @@ -18,7 +18,9 @@
>>
>> #ifdef __KERNEL__
>>
>> +#include <asm/alternative.h>
>> #include <asm/ptrace.h>
>> +#include <asm/sysreg.h>
>>
>> /*
>> * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
>> @@ -36,33 +38,31 @@
>> /*
>> * CPU interrupt mask handling.
>> */
>> -static inline unsigned long arch_local_irq_save(void)
>> -{
>> - unsigned long flags;
>> - asm volatile(
>> - "mrs %0, daif // arch_local_irq_save\n"
>> - "msr daifset, #2"
>> - : "=r" (flags)
>> - :
>> - : "memory");
>> - return flags;
>> -}
>> -
>> static inline void arch_local_irq_enable(void)
>> {
>> - asm volatile(
>> - "msr daifclr, #2 // arch_local_irq_enable"
>> - :
>> + unsigned long unmasked = GIC_PRIO_IRQON;
>> +
>> + asm volatile(ALTERNATIVE(
>> + "msr daifclr, #2 // arch_local_irq_enable\n"
>> + "nop",
>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
>> + "dsb sy",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> :
>> + : "r" (unmasked)
>> : "memory");
>> }
>>
>> static inline void arch_local_irq_disable(void)
>> {
>> - asm volatile(
>> - "msr daifset, #2 // arch_local_irq_disable"
>> - :
>> + unsigned long masked = GIC_PRIO_IRQOFF;
>> +
>> + asm volatile(ALTERNATIVE(
>> + "msr daifset, #2 // arch_local_irq_disable",
>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> :
>> + : "r" (masked)
>> : "memory");
>> }
>>
>> @@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
>> */
>> static inline unsigned long arch_local_save_flags(void)
>> {
>> + unsigned long daif_bits;
>> unsigned long flags;
>> - asm volatile(
>> - "mrs %0, daif // arch_local_save_flags"
>> - : "=r" (flags)
>> - :
>> +
>> + daif_bits = read_sysreg(daif);
>> +
>> + /*
>> + * The asm is logically equivalent to:
>> + *
>> + * if (system_uses_irq_prio_masking())
>> + * flags = (daif_bits & PSR_I_BIT) ?
>> + * GIC_PRIO_IRQOFF :
>> + * read_sysreg_s(SYS_ICC_PMR_EL1);
>> + * else
>> + * flags = daif_bits;
>> + */
>> + asm volatile(ALTERNATIVE(
>> + "mov %0, %1\n"
>> + "nop\n"
>> + "nop",
>> + "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
>> + "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
>> + "csel %0, %0, %2, eq",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> + : "=&r" (flags), "+r" (daif_bits)
>> + : "r" (GIC_PRIO_IRQOFF)
>> : "memory");
>> +
>> + return flags;
>> +}
>> +
>> +static inline unsigned long arch_local_irq_save(void)
>> +{
>> + unsigned long flags;
>> +
>> + flags = arch_local_save_flags();
>> +
>> + arch_local_irq_disable();
>> +
>> return flags;
>> }
>>
>> @@ -85,16 +117,32 @@ static inline unsigned long arch_local_save_flags(void)
>> */
>> static inline void arch_local_irq_restore(unsigned long flags)
>> {
>> - asm volatile(
>> - "msr daif, %0 // arch_local_irq_restore"
>> - :
>> - : "r" (flags)
>> - : "memory");
>> + asm volatile(ALTERNATIVE(
>> + "msr daif, %0\n"
>> + "nop",
>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0\n"
>> + "dsb sy",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> + : "+r" (flags)
>> + :
>> + : "memory");
>> }
>>
>> static inline int arch_irqs_disabled_flags(unsigned long flags)
>> {
>> - return flags & PSR_I_BIT;
>> + int res;
>> +
>> + asm volatile(ALTERNATIVE(
>> + "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
>> + "nop",
>> + "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
>> + "cset %w0, ls",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> + : "=&r" (res)
>> + : "r" ((int) flags)
>> + : "memory");
>> +
>> + return res;
>> }
>> #endif
>> #endif
>> --
>> 1.9.1
>>
--
Julien Thierry
On Mon, Jan 21, 2019 at 03:33:31PM +0000, Julien Thierry wrote:
> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
> index 24692ed..7e82a92 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -18,7 +18,9 @@
>
> #ifdef __KERNEL__
>
> +#include <asm/alternative.h>
> #include <asm/ptrace.h>
> +#include <asm/sysreg.h>
>
> /*
> * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
> @@ -36,33 +38,31 @@
> /*
> * CPU interrupt mask handling.
> */
> -static inline unsigned long arch_local_irq_save(void)
> -{
> - unsigned long flags;
> - asm volatile(
> - "mrs %0, daif // arch_local_irq_save\n"
> - "msr daifset, #2"
> - : "=r" (flags)
> - :
> - : "memory");
> - return flags;
> -}
> -
> static inline void arch_local_irq_enable(void)
> {
> - asm volatile(
> - "msr daifclr, #2 // arch_local_irq_enable"
> - :
> + unsigned long unmasked = GIC_PRIO_IRQON;
> +
> + asm volatile(ALTERNATIVE(
> + "msr daifclr, #2 // arch_local_irq_enable\n"
> + "nop",
> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + "dsb sy",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> :
> + : "r" (unmasked)
> : "memory");
> }
>
> static inline void arch_local_irq_disable(void)
> {
> - asm volatile(
> - "msr daifset, #2 // arch_local_irq_disable"
> - :
> + unsigned long masked = GIC_PRIO_IRQOFF;
> +
> + asm volatile(ALTERNATIVE(
> + "msr daifset, #2 // arch_local_irq_disable",
> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> :
> + : "r" (masked)
> : "memory");
> }
Nitpick: you could drop the masked/unmasked variables here (it's up to
you, it wouldn't make any difference to the generated asm).
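Something like this (just a sketch of what I mean; the constant is still
put in a register by the "r" constraint):

static inline void arch_local_irq_disable(void)
{
	asm volatile(ALTERNATIVE(
		"msr	daifset, #2	// arch_local_irq_disable",
		"msr_s	" __stringify(SYS_ICC_PMR_EL1) ", %0",
		ARM64_HAS_IRQ_PRIO_MASKING)
		:
		: "r" ((unsigned long) GIC_PRIO_IRQOFF)
		: "memory");
}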
> @@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
> */
> static inline unsigned long arch_local_save_flags(void)
> {
> + unsigned long daif_bits;
> unsigned long flags;
> - asm volatile(
> - "mrs %0, daif // arch_local_save_flags"
> - : "=r" (flags)
> - :
> +
> + daif_bits = read_sysreg(daif);
> +
> + /*
> + * The asm is logically equivalent to:
> + *
> + * if (system_uses_irq_prio_masking())
> + * flags = (daif_bits & PSR_I_BIT) ?
> + * GIC_PRIO_IRQOFF :
> + * read_sysreg_s(SYS_ICC_PMR_EL1);
> + * else
> + * flags = daif_bits;
> + */
> + asm volatile(ALTERNATIVE(
> + "mov %0, %1\n"
> + "nop\n"
> + "nop",
> + "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
> + "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
> + "csel %0, %0, %2, eq",
> + ARM64_HAS_IRQ_PRIO_MASKING)
> + : "=&r" (flags), "+r" (daif_bits)
> + : "r" (GIC_PRIO_IRQOFF)
> : "memory");
> +
> + return flags;
> +}
BTW, how's the code generated from the C version? It will have a branch
but may not be too bad. Either way is fine by me.
Reviewed-by: Catalin Marinas <[email protected]>
On Mon, Jan 21, 2019 at 03:33:28PM +0000, Julien Thierry wrote:
> The CPU does not receive signals for interrupts with a priority masked by
> ICC_PMR_EL1. This means the CPU might not come back from a WFI
> instruction.
>
> Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
>
> Since the logic of cpu_do_idle is becoming a bit more complex than just
> two instructions, let's turn it from ASM to C.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
On Mon, 21 Jan 2019 at 16:36, Julien Thierry <[email protected]> wrote:
>
> The CPU does not receive signals for interrupts with a priority masked by
> ICC_PMR_EL1. This means the CPU might not come back from a WFI
> instruction.
>
> Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
>
> Since the logic of cpu_do_idle is becoming a bit more complex than just
> two instructions, let's turn it from ASM to C.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/kernel/process.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> arch/arm64/mm/proc.S | 11 -----------
> 2 files changed, 45 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 6d410fc..f05b63f 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -51,6 +51,7 @@
> #include <linux/thread_info.h>
>
> #include <asm/alternative.h>
> +#include <asm/arch_gicv3.h>
> #include <asm/compat.h>
> #include <asm/cacheflush.h>
> #include <asm/exec.h>
> @@ -74,6 +75,50 @@
>
> void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>
> +static inline void __cpu_do_idle(void)
> +{
> + dsb(sy);
> + wfi();
> +}
> +
> +static inline void __cpu_do_idle_irqprio(void)
Can we drop the 'inline's from all the function definitions in .c
files? (not just in this patch)
We tend to leave that to the compiler.
> +{
> + unsigned long pmr;
> + unsigned long daif_bits;
> +
> + daif_bits = read_sysreg(daif);
> + write_sysreg(daif_bits | PSR_I_BIT, daif);
> +
> + /*
> + * Unmask PMR before going idle to make sure interrupts can
> + * be raised.
> + */
> + pmr = gic_read_pmr();
> + gic_write_pmr(GIC_PRIO_IRQON);
> +
> + __cpu_do_idle();
> +
> + gic_write_pmr(pmr);
> + write_sysreg(daif_bits, daif);
> +}
> +
> +/*
> + * cpu_do_idle()
> + *
> + * Idle the processor (wait for interrupt).
> + *
> + * If the CPU supports priority masking we must do additional work to
> + * ensure that interrupts are not masked at the PMR (because the core will
> + * not wake up if we block the wake up signal in the interrupt controller).
> + */
> +void cpu_do_idle(void)
> +{
> + if (system_uses_irq_prio_masking())
> + __cpu_do_idle_irqprio();
> + else
> + __cpu_do_idle();
> +}
> +
> /*
> * This is our default idle handler.
> */
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 73886a5..3ea4f3b 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -55,17 +55,6 @@
>
> #define MAIR(attr, mt) ((attr) << ((mt) * 8))
>
> -/*
> - * cpu_do_idle()
> - *
> - * Idle the processor (wait for interrupt).
> - */
> -ENTRY(cpu_do_idle)
> - dsb sy // WFI may enter a low-power mode
> - wfi
> - ret
> -ENDPROC(cpu_do_idle)
> -
> #ifdef CONFIG_CPU_PM
> /**
> * cpu_do_suspend - save CPU registers context
> --
> 1.9.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 22/01/2019 20:18, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 16:36, Julien Thierry <[email protected]> wrote:
>>
>> The CPU does not receive signals for interrupts with a priority masked by
>> ICC_PMR_EL1. This means the CPU might not come back from a WFI
>> instruction.
>>
>> Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
>>
>> Since the logic of cpu_do_idle is becoming a bit more complex than just
>> two instructions, let's turn it from ASM to C.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Suggested-by: Daniel Thompson <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> ---
>> arch/arm64/kernel/process.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> arch/arm64/mm/proc.S | 11 -----------
>> 2 files changed, 45 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 6d410fc..f05b63f 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -51,6 +51,7 @@
>> #include <linux/thread_info.h>
>>
>> #include <asm/alternative.h>
>> +#include <asm/arch_gicv3.h>
>> #include <asm/compat.h>
>> #include <asm/cacheflush.h>
>> #include <asm/exec.h>
>> @@ -74,6 +75,50 @@
>>
>> void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>>
>> +static inline void __cpu_do_idle(void)
>> +{
>> + dsb(sy);
>> + wfi();
>> +}
>> +
>> +static inline void __cpu_do_idle_irqprio(void)
>
> Can we drop the 'inline's from all the function definitions in .c
> files? (not just in this patch)
> We tend to leave that to the compiler.
>
Sure, I guess cpu_do_idle() isn't a performance-critical path anyway.
Unless anyone argues against it, I'll drop these inlines for the next
version.
Thanks,
Julien
>> +{
>> + unsigned long pmr;
>> + unsigned long daif_bits;
>> +
>> + daif_bits = read_sysreg(daif);
>> + write_sysreg(daif_bits | PSR_I_BIT, daif);
>> +
>> + /*
>> + * Unmask PMR before going idle to make sure interrupts can
>> + * be raised.
>> + */
>> + pmr = gic_read_pmr();
>> + gic_write_pmr(GIC_PRIO_IRQON);
>> +
>> + __cpu_do_idle();
>> +
>> + gic_write_pmr(pmr);
>> + write_sysreg(daif_bits, daif);
>> +}
>> +
>> +/*
>> + * cpu_do_idle()
>> + *
>> + * Idle the processor (wait for interrupt).
>> + *
>> + * If the CPU supports priority masking we must do additional work to
>> + * ensure that interrupts are not masked at the PMR (because the core will
>> + * not wake up if we block the wake up signal in the interrupt controller).
>> + */
>> +void cpu_do_idle(void)
>> +{
>> + if (system_uses_irq_prio_masking())
>> + __cpu_do_idle_irqprio();
>> + else
>> + __cpu_do_idle();
>> +}
>> +
>> /*
>> * This is our default idle handler.
>> */
>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>> index 73886a5..3ea4f3b 100644
>> --- a/arch/arm64/mm/proc.S
>> +++ b/arch/arm64/mm/proc.S
>> @@ -55,17 +55,6 @@
>>
>> #define MAIR(attr, mt) ((attr) << ((mt) * 8))
>>
>> -/*
>> - * cpu_do_idle()
>> - *
>> - * Idle the processor (wait for interrupt).
>> - */
>> -ENTRY(cpu_do_idle)
>> - dsb sy // WFI may enter a low-power mode
>> - wfi
>> - ret
>> -ENDPROC(cpu_do_idle)
>> -
>> #ifdef CONFIG_CPU_PM
>> /**
>> * cpu_do_suspend - save CPU registers context
>> --
>> 1.9.1
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
Julien Thierry
On 21/01/2019 15:42, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 16:34, Julien Thierry <[email protected]> wrote:
>>
>> Currently, irqflags are saved before calling runtime services and
>> checked for mismatch on return.
>>
>> Provide a pair of overridable macros to save and restore (if needed) the
>> state that needs to be preserved on return from a runtime service.
>> This allows checking for flags that are not necessarily related to
>> irqflags.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Acked-by: Catalin Marinas <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: [email protected]
>> ---
>> drivers/firmware/efi/runtime-wrappers.c | 17 +++++++++++++++--
>> include/linux/efi.h | 5 +++--
>> 2 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
>> index 8903b9c..2f4b68b 100644
>> --- a/drivers/firmware/efi/runtime-wrappers.c
>> +++ b/drivers/firmware/efi/runtime-wrappers.c
>> @@ -89,11 +89,24 @@
>> efi_rts_work.status; \
>> })
>>
>> +#ifndef arch_efi_save_flags
>> +#define arch_efi_save_flags(state_flags) local_save_flags(state_flags)
>> +#define arch_efi_restore_flags(state_flags) local_irq_restore(state_flags)
>> +#endif
>> +
>> +inline unsigned long efi_call_virt_save_flags(void)
>
> If you drop the 'inline' here,
>
Sure, I'll drop it in the next version.
> Acked-by: Ard Biesheuvel <[email protected]>
>
Thanks,
Julien
>> +{
>> + unsigned long flags;
>> +
>> + arch_efi_save_flags(flags);
>> + return flags;
>> +}
>> +
>> void efi_call_virt_check_flags(unsigned long flags, const char *call)
>> {
>> unsigned long cur_flags, mismatch;
>>
>> - local_save_flags(cur_flags);
>> + cur_flags = efi_call_virt_save_flags();
>>
>> mismatch = flags ^ cur_flags;
>> if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK))
>> @@ -102,7 +115,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
>> add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE);
>> pr_err_ratelimited(FW_BUG "IRQ flags corrupted (0x%08lx=>0x%08lx) by EFI %s\n",
>> flags, cur_flags, call);
>> - local_irq_restore(flags);
>> + arch_efi_restore_flags(flags);
>> }
>>
>> /*
>> diff --git a/include/linux/efi.h b/include/linux/efi.h
>> index 45ff763..bd80b7e 100644
>> --- a/include/linux/efi.h
>> +++ b/include/linux/efi.h
>> @@ -1607,6 +1607,7 @@ efi_status_t efi_setup_gop(efi_system_table_t *sys_table_arg,
>>
>> bool efi_runtime_disabled(void);
>> extern void efi_call_virt_check_flags(unsigned long flags, const char *call);
>> +extern unsigned long efi_call_virt_save_flags(void);
>>
>> enum efi_secureboot_mode {
>> efi_secureboot_mode_unset,
>> @@ -1652,7 +1653,7 @@ enum efi_secureboot_mode {
>> \
>> arch_efi_call_virt_setup(); \
>> \
>> - local_save_flags(__flags); \
>> + __flags = efi_call_virt_save_flags(); \
>> __s = arch_efi_call_virt(p, f, args); \
>> efi_call_virt_check_flags(__flags, __stringify(f)); \
>> \
>> @@ -1667,7 +1668,7 @@ enum efi_secureboot_mode {
>> \
>> arch_efi_call_virt_setup(); \
>> \
>> - local_save_flags(__flags); \
>> + __flags = efi_call_virt_save_flags(); \
>> arch_efi_call_virt(p, f, args); \
>> efi_call_virt_check_flags(__flags, __stringify(f)); \
>> \
>> --
>> 1.9.1
>>
--
Julien Thierry
On Wed, 23 Jan 2019 at 09:56, Julien Thierry <[email protected]> wrote:
>
>
>
> On 22/01/2019 20:18, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 16:36, Julien Thierry <[email protected]> wrote:
> >>
> >> The CPU does not receive signals for interrupts with a priority masked by
> >> ICC_PMR_EL1. This means the CPU might not come back from a WFI
> >> instruction.
> >>
> >> Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
> >>
> >> Since the logic of cpu_do_idle is becoming a bit more complex than just
> >> two instructions, let's turn it from ASM to C.
> >>
> >> Signed-off-by: Julien Thierry <[email protected]>
> >> Suggested-by: Daniel Thompson <[email protected]>
> >> Cc: Catalin Marinas <[email protected]>
> >> Cc: Will Deacon <[email protected]>
> >> ---
> >> arch/arm64/kernel/process.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >> arch/arm64/mm/proc.S | 11 -----------
> >> 2 files changed, 45 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> >> index 6d410fc..f05b63f 100644
> >> --- a/arch/arm64/kernel/process.c
> >> +++ b/arch/arm64/kernel/process.c
> >> @@ -51,6 +51,7 @@
> >> #include <linux/thread_info.h>
> >>
> >> #include <asm/alternative.h>
> >> +#include <asm/arch_gicv3.h>
> >> #include <asm/compat.h>
> >> #include <asm/cacheflush.h>
> >> #include <asm/exec.h>
> >> @@ -74,6 +75,50 @@
> >>
> >> void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
> >>
> >> +static inline void __cpu_do_idle(void)
> >> +{
> >> + dsb(sy);
> >> + wfi();
> >> +}
> >> +
> >> +static inline void __cpu_do_idle_irqprio(void)
> >
> > Can we drop the 'inline's from all the function definitions in .c
> > files? (not just in this patch)
> > We tend to leave that to the compiler.
> >
>
> Sure, I guess cpu_do_idle() isn't a performance critical path anyway.
>
And even if they were, it would not make a difference, since the
compiler will inline this anyway.
For arm64, inline expands to
#define inline inline __attribute__((__always_inline__)) __gnu_inline \
__maybe_unused notrace
so please just avoid it where you can.
> Unless anyone argues against it, I'll drop these inlines for the next
> version.
>
> Thanks,
>
> Julien
>
> >> +{
> >> + unsigned long pmr;
> >> + unsigned long daif_bits;
> >> +
> >> + daif_bits = read_sysreg(daif);
> >> + write_sysreg(daif_bits | PSR_I_BIT, daif);
> >> +
> >> + /*
> >> + * Unmask PMR before going idle to make sure interrupts can
> >> + * be raised.
> >> + */
> >> + pmr = gic_read_pmr();
> >> + gic_write_pmr(GIC_PRIO_IRQON);
> >> +
> >> + __cpu_do_idle();
> >> +
> >> + gic_write_pmr(pmr);
> >> + write_sysreg(daif_bits, daif);
> >> +}
> >> +
> >> +/*
> >> + * cpu_do_idle()
> >> + *
> >> + * Idle the processor (wait for interrupt).
> >> + *
> >> + * If the CPU supports priority masking we must do additional work to
> >> + * ensure that interrupts are not masked at the PMR (because the core will
> >> + * not wake up if we block the wake up signal in the interrupt controller).
> >> + */
> >> +void cpu_do_idle(void)
> >> +{
> >> + if (system_uses_irq_prio_masking())
> >> + __cpu_do_idle_irqprio();
> >> + else
> >> + __cpu_do_idle();
> >> +}
> >> +
> >> /*
> >> * This is our default idle handler.
> >> */
> >> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> >> index 73886a5..3ea4f3b 100644
> >> --- a/arch/arm64/mm/proc.S
> >> +++ b/arch/arm64/mm/proc.S
> >> @@ -55,17 +55,6 @@
> >>
> >> #define MAIR(attr, mt) ((attr) << ((mt) * 8))
> >>
> >> -/*
> >> - * cpu_do_idle()
> >> - *
> >> - * Idle the processor (wait for interrupt).
> >> - */
> >> -ENTRY(cpu_do_idle)
> >> - dsb sy // WFI may enter a low-power mode
> >> - wfi
> >> - ret
> >> -ENDPROC(cpu_do_idle)
> >> -
> >> #ifdef CONFIG_CPU_PM
> >> /**
> >> * cpu_do_suspend - save CPU registers context
> >> --
> >> 1.9.1
> >>
> >>
> >> _______________________________________________
> >> linux-arm-kernel mailing list
> >> [email protected]
> >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> --
> Julien Thierry
On 22/01/2019 15:21, Catalin Marinas wrote:
> On Mon, Jan 21, 2019 at 03:33:31PM +0000, Julien Thierry wrote:
>> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
>> index 24692ed..7e82a92 100644
>> --- a/arch/arm64/include/asm/irqflags.h
>> +++ b/arch/arm64/include/asm/irqflags.h
>> @@ -18,7 +18,9 @@
>>
>> #ifdef __KERNEL__
>>
>> +#include <asm/alternative.h>
>> #include <asm/ptrace.h>
>> +#include <asm/sysreg.h>
>>
>> /*
>> * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
>> @@ -36,33 +38,31 @@
>> /*
>> * CPU interrupt mask handling.
>> */
>> -static inline unsigned long arch_local_irq_save(void)
>> -{
>> - unsigned long flags;
>> - asm volatile(
>> - "mrs %0, daif // arch_local_irq_save\n"
>> - "msr daifset, #2"
>> - : "=r" (flags)
>> - :
>> - : "memory");
>> - return flags;
>> -}
>> -
>> static inline void arch_local_irq_enable(void)
>> {
>> - asm volatile(
>> - "msr daifclr, #2 // arch_local_irq_enable"
>> - :
>> + unsigned long unmasked = GIC_PRIO_IRQON;
>> +
>> + asm volatile(ALTERNATIVE(
>> + "msr daifclr, #2 // arch_local_irq_enable\n"
>> + "nop",
>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
>> + "dsb sy",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> :
>> + : "r" (unmasked)
>> : "memory");
>> }
>>
>> static inline void arch_local_irq_disable(void)
>> {
>> - asm volatile(
>> - "msr daifset, #2 // arch_local_irq_disable"
>> - :
>> + unsigned long masked = GIC_PRIO_IRQOFF;
>> +
>> + asm volatile(ALTERNATIVE(
>> + "msr daifset, #2 // arch_local_irq_disable",
>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> :
>> + : "r" (masked)
>> : "memory");
>> }
>
> Nitpicks: you could drop masked/unmasked variables here (it's up to you,
> it wouldn't make any difference on the generated asm).
>
Good point, I'll do that.
>> @@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
>> */
>> static inline unsigned long arch_local_save_flags(void)
>> {
>> + unsigned long daif_bits;
>> unsigned long flags;
>> - asm volatile(
>> - "mrs %0, daif // arch_local_save_flags"
>> - : "=r" (flags)
>> - :
>> +
>> + daif_bits = read_sysreg(daif);
>> +
>> + /*
>> + * The asm is logically equivalent to:
>> + *
>> + * if (system_uses_irq_prio_masking())
>> + * flags = (daif_bits & PSR_I_BIT) ?
>> + * GIC_PRIO_IRQOFF :
>> + * read_sysreg_s(SYS_ICC_PMR_EL1);
>> + * else
>> + * flags = daif_bits;
>> + */
>> + asm volatile(ALTERNATIVE(
>> + "mov %0, %1\n"
>> + "nop\n"
>> + "nop",
>> + "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
>> + "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
>> + "csel %0, %0, %2, eq",
>> + ARM64_HAS_IRQ_PRIO_MASKING)
>> + : "=&r" (flags), "+r" (daif_bits)
>> + : "r" (GIC_PRIO_IRQOFF)
>> : "memory");
>> +
>> + return flags;
>> +}
>
> BTW, how's the code generated from the C version? It will have a branch
> but may not be too bad. Either way is fine by me.
>
It's a bit hard to reason about the code generated from the C version, as
it can end up within several layers of inlining, so the instructions for
that section are more scattered.
However, the compiler seems to be more clever (maybe the asm volatile
prevents some optimizations regarding register allocation or instruction
ordering), and the C version appears to perform slightly better (although
it could be within the noise) despite the branch.
So, I'll just switch to the C version.
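For reference, the C version looks roughly like this (it simply mirrors
the logic already described in the comment above the asm; names as used
elsewhere in this series):

static inline unsigned long arch_local_save_flags(void)
{
	unsigned long daif_bits = read_sysreg(daif);

	if (system_uses_irq_prio_masking())
		return (daif_bits & PSR_I_BIT) ?
			(unsigned long) GIC_PRIO_IRQOFF :
			read_sysreg_s(SYS_ICC_PMR_EL1);

	return daif_bits;
}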
> Reviewed-by: Catalin Marinas <[email protected]>
>
--
Julien Thierry
On 2019/1/21 23:33, Julien Thierry wrote:
> Implement NMI callbacks for the GICv3 irqchip. Install NMI-safe handlers
> when setting up an interrupt line as an NMI.
>
> Only SPIs and PPIs are allowed to be set up as NMI.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 84 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 4df1e94..447d8ab 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
(snip)
>
> +static int gic_irq_nmi_setup(struct irq_data *d)
> +{
> + struct irq_desc *desc = irq_to_desc(d->irq);
> +
> + if (!gic_supports_nmi())
> + return -EINVAL;
> +
> + if (gic_peek_irq(d, GICD_ISENABLER)) {
> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
> + return -EINVAL;
> + }
> +
> + /*
> + * A secondary irq_chip should be in charge of LPI request,
> + * it should not be possible to get there
> + */
> + if (WARN_ON(gic_irq(d) >= 8192))
> + return -EINVAL;
> +
> + /* desc lock should already be held */
> + if (gic_irq(d) < 32) {
> + /* Setting up PPI as NMI, only switch handler for first NMI */
> + if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
> + refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
> + desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> + }
> + } else {
> + desc->handle_irq = handle_fasteoi_nmi;
> + }
> +
> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
> +
> + return 0;
> +}
> +
> +static void gic_irq_nmi_teardown(struct irq_data *d)
> +{
> + struct irq_desc *desc = irq_to_desc(d->irq);
> +
> + if (WARN_ON(!gic_supports_nmi()))
> + return;
> +
> + if (gic_peek_irq(d, GICD_ISENABLER)) {
> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
> + return;
> + }
> +
> + /*
> + * A secondary irq_chip should be in charge of LPI request,
> + * it should not be possible to get there
> + */
> + if (WARN_ON(gic_irq(d) >= 8192))
> + return;
> +
> + /* desc lock should already be held */
> + if (gic_irq(d) < 32) {
> + /* Tearing down NMI, only switch handler for last NMI */
> + if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
> + desc->handle_irq = handle_percpu_devid_irq;
> + } else {
> + desc->handle_irq = handle_fasteoi_irq;
> + }
> +
> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
> +}
> +
Hello Julien,
I am afraid the priority setting is not correct here. If the irq is in the
redistributor (gic_irq(d) < 32), we should set the priority on each CPU,
but here we only set it on the current CPU.
static inline void __iomem *gic_dist_base(struct irq_data *d)
{
if (gic_irq_in_rdist(d)) /* SGI+PPI -> SGI_base for this CPU */
return gic_data_rdist_sgi_base();
if (d->hwirq <= 1023) /* SPI -> dist_base */
return gic_data.dist_base;
return NULL;
}
I tried to add an smp_call_function() here, but the kernel reported a
warning, since interrupts are already disabled by raw_spin_lock_irqsave()
in request_nmi() or ready_percpu_nmi().
[ 2.137262] Call trace:
[ 2.137265] smp_call_function_many+0xf8/0x3a0
[ 2.137267] smp_call_function+0x40/0x58
[ 2.137271] gic_irq_nmi_setup+0xe8/0x118
[ 2.137275] ready_percpu_nmi+0x6c/0xf0
[ 2.137279] armpmu_request_irq+0x228/0x250
[ 2.137281] arm_pmu_acpi_init+0x150/0x2f0
[ 2.137284] do_one_initcall+0x54/0x218
[ 2.137289] kernel_init_freeable+0x230/0x354
[ 2.137293] kernel_init+0x18/0x118
[ 2.137295] ret_from_fork+0x10/0x18
I am exploring a better way to solve this issue.
Thanks,
Wei Li
On Sat, 26 Jan 2019 10:19:52 +0000,
"liwei (GF)" <[email protected]> wrote:
>
>
>
> On 2019/1/21 23:33, Julien Thierry wrote:
> > Implement NMI callbacks for GICv3 irqchip. Install NMI safe handlers
> > when setting up interrupt line as NMI.
> >
> > Only SPIs and PPIs are allowed to be set up as NMI.
> >
> > Signed-off-by: Julien Thierry <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Jason Cooper <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > ---
> > drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 84 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 4df1e94..447d8ab 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> (snip)
> >
> > +static int gic_irq_nmi_setup(struct irq_data *d)
> > +{
> > + struct irq_desc *desc = irq_to_desc(d->irq);
> > +
> > + if (!gic_supports_nmi())
> > + return -EINVAL;
> > +
> > + if (gic_peek_irq(d, GICD_ISENABLER)) {
> > + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
> > + return -EINVAL;
> > + }
> > +
> > + /*
> > + * A secondary irq_chip should be in charge of LPI request,
> > + * it should not be possible to get there
> > + */
> > + if (WARN_ON(gic_irq(d) >= 8192))
> > + return -EINVAL;
> > +
> > + /* desc lock should already be held */
> > + if (gic_irq(d) < 32) {
> > + /* Setting up PPI as NMI, only switch handler for first NMI */
> > + if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
> > + refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
> > + desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> > + }
> > + } else {
> > + desc->handle_irq = handle_fasteoi_nmi;
> > + }
> > +
> > + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
> > +
> > + return 0;
> > +}
> > +
> > +static void gic_irq_nmi_teardown(struct irq_data *d)
> > +{
> > + struct irq_desc *desc = irq_to_desc(d->irq);
> > +
> > + if (WARN_ON(!gic_supports_nmi()))
> > + return;
> > +
> > + if (gic_peek_irq(d, GICD_ISENABLER)) {
> > + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
> > + return;
> > + }
> > +
> > + /*
> > + * A secondary irq_chip should be in charge of LPI request,
> > + * it should not be possible to get there
> > + */
> > + if (WARN_ON(gic_irq(d) >= 8192))
> > + return;
> > +
> > + /* desc lock should already be held */
> > + if (gic_irq(d) < 32) {
> > + /* Tearing down NMI, only switch handler for last NMI */
> > + if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
> > + desc->handle_irq = handle_percpu_devid_irq;
> > + } else {
> > + desc->handle_irq = handle_fasteoi_irq;
> > + }
> > +
> > + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
> > +}
> > +
>
> Hello Julien,
> I am afraid the setting of the priority is not correct here. If the irq
> is in the redistributor (gic_irq(d) < 32), we should set the priority on
> each CPU, whereas here we only set it on the current CPU.
I think it really boils down to what semantics we want to present to
the drivers. Given that all operations on PPIs are per-CPU
(enable/disable), I do not see why the NMI setting should be any
different.
I'm certainly not keen on having diverging semantics here.
> static inline void __iomem *gic_dist_base(struct irq_data *d)
> {
> if (gic_irq_in_rdist(d)) /* SGI+PPI -> SGI_base for this CPU */
> return gic_data_rdist_sgi_base();
>
> if (d->hwirq <= 1023) /* SPI -> dist_base */
> return gic_data.dist_base;
>
> return NULL;
> }
>
> I tried to add an smp_call_function here, but the kernel reported a warning, as interrupts are already
> disabled when raw_spin_lock_irqsave is called in request_nmi or ready_percpu_nmi.
> [ 2.137262] Call trace:
> [ 2.137265] smp_call_function_many+0xf8/0x3a0
> [ 2.137267] smp_call_function+0x40/0x58
> [ 2.137271] gic_irq_nmi_setup+0xe8/0x118
> [ 2.137275] ready_percpu_nmi+0x6c/0xf0
> [ 2.137279] armpmu_request_irq+0x228/0x250
> [ 2.137281] arm_pmu_acpi_init+0x150/0x2f0
> [ 2.137284] do_one_initcall+0x54/0x218
> [ 2.137289] kernel_init_freeable+0x230/0x354
> [ 2.137293] kernel_init+0x18/0x118
> [ 2.137295] ret_from_fork+0x10/0x18
>
> I am exploring a better way to solve this issue.
The real question is "why should you care?". As far as the system is
concerned, a PPI only makes sense on a single CPU. So a single PPI on
two CPUs represents two different interrupts. Yes, they have the same
interrupt number, but they really are two independent interrupt lines.
What issue do you see with this?
Thanks,
M.
--
Jazz is not dead, it just smell funny.
Hi,
On 26/01/2019 10:19, liwei (GF) wrote:
>
>
> On 2019/1/21 23:33, Julien Thierry wrote:
>> Implement NMI callbacks for GICv3 irqchip. Install NMI safe handlers
>> when setting up interrupt line as NMI.
>>
>> Only SPIs and PPIs are allowed to be set up as NMI.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Jason Cooper <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 84 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index 4df1e94..447d8ab 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
> (snip)
>>
>> +static int gic_irq_nmi_setup(struct irq_data *d)
>> +{
>> + struct irq_desc *desc = irq_to_desc(d->irq);
>> +
>> + if (!gic_supports_nmi())
>> + return -EINVAL;
>> +
>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>> + return -EINVAL;
>> + }
>> +
>> + /*
>> + * A secondary irq_chip should be in charge of LPI request,
>> + * it should not be possible to get there
>> + */
>> + if (WARN_ON(gic_irq(d) >= 8192))
>> + return -EINVAL;
>> +
>> + /* desc lock should already be held */
>> + if (gic_irq(d) < 32) {
>> + /* Setting up PPI as NMI, only switch handler for first NMI */
>> + if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
>> + refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
>> + desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
>> + }
>> + } else {
>> + desc->handle_irq = handle_fasteoi_nmi;
>> + }
>> +
>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
>> +
>> + return 0;
>> +}
>> +
>> +static void gic_irq_nmi_teardown(struct irq_data *d)
>> +{
>> + struct irq_desc *desc = irq_to_desc(d->irq);
>> +
>> + if (WARN_ON(!gic_supports_nmi()))
>> + return;
>> +
>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>> + return;
>> + }
>> +
>> + /*
>> + * A secondary irq_chip should be in charge of LPI request,
>> + * it should not be possible to get there
>> + */
>> + if (WARN_ON(gic_irq(d) >= 8192))
>> + return;
>> +
>> + /* desc lock should already be held */
>> + if (gic_irq(d) < 32) {
>> + /* Tearing down NMI, only switch handler for last NMI */
>> + if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
>> + desc->handle_irq = handle_percpu_devid_irq;
>> + } else {
>> + desc->handle_irq = handle_fasteoi_irq;
>> + }
>> +
>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
>> +}
>> +
>
> Hello Julien,
> I am afraid the setting of the priority is not correct here. If the irq is in the redistributor (gic_irq(d) < 32),
> we should set the priority on each CPU, whereas here we only set it on the current CPU.
As Marc stated, to work with PPIs, the core IRQ API provides a set of
*_percpu_irq functions (request, enable, disable...).
The current idea is that the driver is in charge of calling
ready_percpu_nmi() before enabling the interrupt on the correct CPU, in
the same way that the driver is in charge of calling enable_percpu_irq()
and disable_percpu_irq() on the correct CPU.
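Concretely, for a PPI the driver side would look something like this (a
rough sketch; the handler and hotplug hook names are illustrative, the
*_percpu_nmi() calls come from the core NMI API series):

	/* At probe time, on any CPU, with the line still disabled: */
	err = request_percpu_nmi(irq, my_nmi_handler, "my-nmi", my_percpu_dev_id);

	/* In the CPU starting/online callback, i.e. on the target CPU: */
	err = ready_percpu_nmi(irq);            /* switch the line to NMI locally */
	enable_percpu_irq(irq, IRQ_TYPE_NONE);  /* then enable it on this CPU */

	/* Symmetrically, in the dying/offline callback: */
	disable_percpu_irq(irq);
	teardown_percpu_nmi(irq);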
> static inline void __iomem *gic_dist_base(struct irq_data *d)
> {
> if (gic_irq_in_rdist(d)) /* SGI+PPI -> SGI_base for this CPU */
> return gic_data_rdist_sgi_base();
>
> if (d->hwirq <= 1023) /* SPI -> dist_base */
> return gic_data.dist_base;
>
> return NULL;
> }
>
> I tried to add an smp_call_function here, but the kernel reported a warning, as interrupts are already
> disabled when raw_spin_lock_irqsave is called in request_nmi or ready_percpu_nmi.
> [ 2.137262] Call trace:
> [ 2.137265] smp_call_function_many+0xf8/0x3a0
> [ 2.137267] smp_call_function+0x40/0x58
> [ 2.137271] gic_irq_nmi_setup+0xe8/0x118
> [ 2.137275] ready_percpu_nmi+0x6c/0xf0
> [ 2.137279] armpmu_request_irq+0x228/0x250
Your issue lies here: if your PMU IRQ is a PPI, you shouldn't be calling
ready_percpu_nmi() at request time, but probably somewhere like
arm_perf_starting_cpu().
And you wouldn't need the smp_call to set the priority.
Hope this helps.
> [ 2.137281] arm_pmu_acpi_init+0x150/0x2f0
> [ 2.137284] do_one_initcall+0x54/0x218
> [ 2.137289] kernel_init_freeable+0x230/0x354
> [ 2.137293] kernel_init+0x18/0x118
> [ 2.137295] ret_from_fork+0x10/0x18
>
Thanks,
--
Julien Thierry
On Mon, 21 Jan 2019 15:33:21 +0000,
Julien Thierry <[email protected]> wrote:
>
> There are some helpers to modify PSR.[DAIF] bits that are not referenced
> anywhere. The less these bits are available outside of local_irq_*
> functions the better.
>
> Get rid of those unused helpers.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Mark Rutland <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: James Morse <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:22 +0000,
Julien Thierry <[email protected]> wrote:
>
> It is not supported to have some CPUs using the GICv3 sysreg CPU
> interface while others do not.
>
> Once ICC_SRE_EL1.SRE is set on a CPU, the bit cannot be cleared. Since
> matching this feature requires setting ICC_SRE_EL1.SRE, it cannot be
> turned off if found on a CPU.
>
> Set the feature as STRICT_BOOT: if the boot CPU has it, all other CPUs
> are required to have it.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Reviewed-by: Mark Rutland <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:23 +0000,
Julien Thierry <[email protected]> wrote:
>
> Add a cpufeature indicating whether a cpu supports masking interrupts
> by priority.
>
> The feature will be properly enabled in a later patch.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Reviewed-by: Mark Rutland <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:24 +0000,
Julien Thierry <[email protected]> wrote:
>
> Add helper functions to access system registers related to interrupt
> priorities: PMR and RPR.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Mark Rutland <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:25 +0000,
Julien Thierry <[email protected]> wrote:
>
> Mask the IRQ priority through PMR and re-enable IRQs at CPU level,
> allowing only higher priority interrupts to be received during interrupt
> handling.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:26 +0000,
Julien Thierry <[email protected]> wrote:
>
> Introduce fixed values for PMR that are going to be used to mask and
> unmask interrupts by priority.
>
> The current priority given to GIC interrupts is 0xa0, so clearing PMR's
> most significant bit is enough to mask interrupts.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/include/asm/ptrace.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index fce22c4..05cf913 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -25,6 +25,18 @@
> #define CurrentEL_EL1 (1 << 2)
> #define CurrentEL_EL2 (2 << 2)
>
> +/*
> + * PMR values used to mask/unmask interrupts.
> + *
> + * GIC priority masking works as follows: if an IRQ's priority is a higher value
> + * than the value held in PMR, that interrupt is masked. A lower value of PMR
> + * means more IRQ priorities are masked.
Nit: It is not the priorities that are masked, but interrupts that
have a priority higher than that of PMR.
> + *
> + * To mask priorities, we clear the most significant bit of PMR.
> + */
> +#define GIC_PRIO_IRQON 0xf0
> +#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
> +
> /* Additional SPSR bits not exposed in the UABI */
> #define PSR_IL_BIT (1 << 20)
>
> --
> 1.9.1
>
Otherwise:
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:27 +0000,
Julien Thierry <[email protected]> wrote:
>
> In order to replace PSR.I interrupt disabling/enabling with ICC_PMR_EL1
> interrupt masking, ICC_PMR_EL1 needs to be saved/restored when
> taking/returning from an exception. This mimics the way hardware saves
> and restores PSR.I bit in spsr_el1 for exceptions and ERET.
>
> Add PMR to the registers to save in the pt_regs struct upon kernel entry,
> and restore it before ERET. Also, initialize it to a sane value when
> creating new tasks.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Dave Martin <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:28 +0000,
Julien Thierry <[email protected]> wrote:
>
> The CPU does not receive signals for interrupts with a priority masked by
> ICC_PMR_EL1. This means the CPU might not come back from a WFI
> instruction.
>
> Make sure ICC_PMR_EL1 does not mask interrupts when doing a WFI.
>
> Since the logic of cpu_do_idle is becoming a bit more complex than just
> two instructions, let's turn it from asm to C.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:29 +0000,
Julien Thierry <[email protected]> wrote:
>
> Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
> means that the hypervisor will not receive masked interrupts while running a
> guest.
>
> Avoid this by making sure ICC_PMR_EL1 is unmasked when we enter a guest.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
> arch/arm64/kvm/hyp/switch.c | 16 ++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7732d0b..a1f9f55 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -24,6 +24,7 @@
>
> #include <linux/types.h>
> #include <linux/kvm_types.h>
> +#include <asm/arch_gicv3.h>
> #include <asm/cpufeature.h>
> #include <asm/daifflags.h>
> #include <asm/fpsimd.h>
> @@ -474,6 +475,17 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> static inline void kvm_arm_vhe_guest_enter(void)
> {
> local_daif_mask();
> +
> + /*
> + * Having IRQs masked via PMR when entering the guest means the GIC
> + * will not signal lower-priority interrupts to the CPU, and the
> + * only way to get out will be via guest exceptions.
> + * Naturally, we want to avoid this.
> + */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
> }
>
> static inline void kvm_arm_vhe_guest_exit(void)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b0b1478..6a4c2d6 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -22,6 +22,7 @@
>
> #include <kvm/arm_psci.h>
>
> +#include <asm/arch_gicv3.h>
> #include <asm/cpufeature.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> @@ -521,6 +522,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> struct kvm_cpu_context *guest_ctxt;
> u64 exit_code;
>
> + /*
> + * Having IRQs masked via PMR when entering the guest means the GIC
> + * will not signal lower-priority interrupts to the CPU, and the
> + * only way to get out will be via guest exceptions.
> + * Naturally, we want to avoid this.
> + */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
> +
> vcpu = kern_hyp_va(vcpu);
>
> host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -573,6 +585,10 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> */
> __debug_switch_to_host(vcpu);
>
> + /* Returning to host will clear PSR.I, remask PMR if needed */
> + if (system_uses_irq_prio_masking())
> + gic_write_pmr(GIC_PRIO_IRQOFF);
> +
> return exit_code;
> }
It'd be good to have helpers for these PMR manipulations, which I
suspect you could use in other parts of the code such as patch #9.
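Something like the below, for instance (names purely illustrative):

	static inline void kvm_pmr_unmask_for_guest(void)
	{
		/* Let the GIC signal lower-priority interrupts again */
		if (system_uses_irq_prio_masking()) {
			gic_write_pmr(GIC_PRIO_IRQON);
			dsb(sy);
		}
	}

	static inline void kvm_pmr_remask_after_guest(void)
	{
		/* Back to the PMR-masked state expected by the host */
		if (system_uses_irq_prio_masking())
			gic_write_pmr(GIC_PRIO_IRQOFF);
	}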
The logic itself seems good to me, so irrespective of the above:
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:30 +0000,
Julien Thierry <[email protected]> wrote:
>
> Currently, irqflags are saved before calling runtime services and
> checked for mismatch on return.
>
> Provide a pair of overridable macros to save and restore (if needed) the
> state that needs to be preserved on return from a runtime service.
> This allows checking for flags that are not necessarily related to
> irqflags.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: [email protected]
With the changes requested by Ard,
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:32 +0000,
Julien Thierry <[email protected]> wrote:
>
> The addition of PMR should not bypass the semantics of daifflags.
>
> When DA_F are set, the I bit is also set, as no interrupts (even of
> higher priority) are allowed.
>
> When DA_F are cleared, the I bit is cleared and interrupt
> enabling/disabling goes through ICC_PMR_EL1.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Catalin Marinas <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: James Morse <[email protected]>
> ---
> arch/arm64/include/asm/daifflags.h | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
> index 546bc39..1fd390e 100644
> --- a/arch/arm64/include/asm/daifflags.h
> +++ b/arch/arm64/include/asm/daifflags.h
> @@ -18,6 +18,8 @@
>
> #include <linux/irqflags.h>
>
> +#include <asm/cpufeature.h>
> +
> #define DAIF_PROCCTX 0
> #define DAIF_PROCCTX_NOIRQ PSR_I_BIT
>
> @@ -36,7 +38,13 @@ static inline unsigned long local_daif_save(void)
> {
> unsigned long flags;
>
> - flags = arch_local_save_flags();
> + flags = read_sysreg(daif);
> +
> + if (system_uses_irq_prio_masking()) {
> + /* If IRQs are masked with PMR, reflect it in the flags */
> + if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
> + flags |= PSR_I_BIT;
> + }
>
> local_daif_mask();
>
> @@ -45,12 +53,27 @@ static inline unsigned long local_daif_save(void)
>
> static inline void local_daif_restore(unsigned long flags)
> {
> - if (!arch_irqs_disabled_flags(flags))
> + bool irq_disabled = flags & PSR_I_BIT;
> +
> + if (!irq_disabled) {
> trace_hardirqs_on();
>
> - arch_local_irq_restore(flags);
> + if (system_uses_irq_prio_masking())
> + arch_local_irq_enable();
> + } else if (!(flags & PSR_A_BIT)) {
> + /*
> + * If interrupts are disabled but we can take
> + * asynchronous errors, we can take NMIs
> + */
> + if (system_uses_irq_prio_masking()) {
> + flags &= ~PSR_I_BIT;
> + arch_local_irq_disable();
> + }
> + }
> +
> + write_sysreg(flags, daif);
After much head scratching, I finally came to the conclusion that the
above is safe, as a write to PSTATE is guaranteed not to be visible to
instructions before it in program order (and the write to PMR is
itself self-synchronising). I guess that a reference to the ARM ARM is
in order here (no pun intended...).
>
> - if (arch_irqs_disabled_flags(flags))
> + if (irq_disabled)
> trace_hardirqs_off();
> }
>
> --
> 1.9.1
>
With the above,
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:33 +0000,
Julien Thierry <[email protected]> wrote:
>
> In preparation for the application of alternatives at different points
> during the boot process, provide a way to check whether the
> alternatives for a feature of interest were already applied, instead of
> having a global boolean for all alternatives.
>
> Make VHE enablement code check for the VHE feature instead of considering
> all alternatives.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Christoffer Dall <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:34 +0000,
Julien Thierry <[email protected]> wrote:
>
> From: Daniel Thompson <[email protected]>
>
> Currently alternatives are applied very late in the boot process (and
> a long time after we enable scheduling). Some alternative sequences,
> such as those that alter the way CPU context is stored, must be applied
> much earlier in the boot sequence.
>
> Introduce apply_boot_alternatives() to allow some alternatives to be
> applied immediately after we detect the CPU features of the boot CPU.
>
> Signed-off-by: Daniel Thompson <[email protected]>
> [[email protected]: rename to fit new cpufeature framework better,
> apply BOOT_SCOPE feature early in boot]
> Signed-off-by: Julien Thierry <[email protected]>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Suzuki K Poulose <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:35 +0000,
Julien Thierry <[email protected]> wrote:
>
> The code to detect whether Linux has access to group0 interrupts can
> prove useful in other parts of the driver.
>
> Provide a separate function to do this.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:36 +0000,
Julien Thierry <[email protected]> wrote:
>
> Once the boot CPU has been prepared or a new secondary CPU has been
> brought up, use ICC_PMR_EL1 to mask interrupts on that CPU and clear
> the PSR.I bit.
>
> Since ICC_PMR_EL1 is initialized at CPU bringup, avoid overwriting
> it in the GICv3 driver.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: James Morse <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:37 +0000,
Julien Thierry <[email protected]> wrote:
>
> Implement the architecture-specific primitive allowing the GICv3 driver to
> use priorities to mask interrupts.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:38 +0000,
Julien Thierry <[email protected]> wrote:
>
> The values non-secure EL1 needs to use for the PMR and RPR registers
> depend on the value of SCR_EL3.FIQ.
>
> The values non-secure EL1 sees from the distributor and redistributor
> depend on whether security is enabled for the GIC or not.
>
> To avoid having to deal with two sets of values for PMR
> masking/unmasking, only enable pseudo-NMIs when GIC has non-secure view
> of priorities.
>
> Also, add firmware requirements related to SCR_EL3.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> Documentation/arm64/booting.txt | 5 ++++
> drivers/irqchip/irq-gic-v3.c | 58 ++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 57 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> index 8df9f46..fbab7e2 100644
> --- a/Documentation/arm64/booting.txt
> +++ b/Documentation/arm64/booting.txt
> @@ -188,6 +188,11 @@ Before jumping into the kernel, the following conditions must be met:
> the kernel image will be entered must be initialised by software at a
> higher exception level to prevent execution in an UNKNOWN state.
>
> + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
> + executing on.
> + - The value of SCR_EL3.FIQ must be the same as the one present at boot
> + time whenever the kernel is executing.
> +
> For systems with a GICv3 interrupt controller to be used in v3 mode:
> - If EL3 is present:
> ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 5a703ae..5374b43 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -66,6 +66,31 @@ struct gic_chip_data {
> static struct gic_chip_data gic_data __read_mostly;
> static DEFINE_STATIC_KEY_TRUE(supports_deactivate_key);
>
> +/*
> + * The behaviours of RPR and PMR registers differ depending on the value of
> + * SCR_EL3.FIQ, and the behaviour of non-secure priority registers of the
> + * distributor and redistributors depends on whether security is enabled in the
> + * GIC.
> + *
> + * When security is enabled, non-secure priority values from the (re)distributor
> + * are presented to the GIC CPUIF as follows:
> + * (GIC_(R)DIST_PRI[irq] >> 1) | 0x80;
> + *
> + * If SCR_EL3.FIQ == 1, the values written to/read from PMR and RPR at non-secure
> + * EL1 are subject to a similar operation thus matching the priorities presented
> + * from the (re)distributor when security is enabled.
> + *
> + * see GICv3/GICv4 Architecture Specification (IHI0069D):
> + * - section 4.8.1 Non-secure accesses to register fields for Secure interrupt
> + * priorities.
> + * - Figure 4-7 Secure read of the priority field for a Non-secure Group 1
> + * interrupt.
> + *
> + * For now, we only support pseudo-NMIs if we have non-secure view of
> + * priorities.
> + */
> +static DEFINE_STATIC_KEY_FALSE(supports_pseudo_nmis);
> +
> static struct gic_kvm_info gic_v3_kvm_info;
> static DEFINE_PER_CPU(bool, has_rss);
>
> @@ -232,6 +257,12 @@ static void gic_unmask_irq(struct irq_data *d)
> gic_poke_irq(d, GICD_ISENABLER);
> }
>
> +static inline bool gic_supports_nmi(void)
> +{
> + return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&
> + static_branch_likely(&supports_pseudo_nmis);
> +}
> +
> static int gic_irq_set_irqchip_state(struct irq_data *d,
> enum irqchip_irq_state which, bool val)
> {
> @@ -573,6 +604,12 @@ static void gic_update_vlpi_properties(void)
> !gic_data.rdists.has_direct_lpi ? "no " : "");
> }
>
> +/* Check whether it's single security state view */
> +static inline bool gic_dist_security_disabled(void)
> +{
> + return readl_relaxed(gic_data.dist_base + GICD_CTLR) & GICD_CTLR_DS;
> +}
> +
> static void gic_cpu_sys_reg_init(void)
> {
> int i, cpu = smp_processor_id();
> @@ -598,6 +635,9 @@ static void gic_cpu_sys_reg_init(void)
> /* Set priority mask register */
> if (!gic_prio_masking_enabled())
> write_gicreg(DEFAULT_PMR_VALUE, ICC_PMR_EL1);
> + else if (gic_supports_nmi() && group0)
> + /* Mismatch configuration with boot CPU */
> + WARN_ON(!gic_dist_security_disabled());
You can probably write this as a single line:
WARN_ON(gic_supports_nmi() && group0 && !gic_dist_security_disabled());
Maybe even add a comment saying that in this case, the system is
likely to be dead, as the masking of interrupts will not work
correctly.
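i.e. something like this (comment wording only a suggestion):

	/*
	 * If the configuration doesn't match the boot CPU's, the system
	 * is likely to be dead here, as interrupt masking will not work
	 * correctly on this CPU.
	 */
	WARN_ON(gic_supports_nmi() && group0 && !gic_dist_security_disabled());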
>
> /*
> * Some firmwares hand over to the kernel with the BPR changed from
> @@ -852,12 +892,6 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> #endif
>
> #ifdef CONFIG_CPU_PM
> -/* Check whether it's single security state view */
> -static bool gic_dist_security_disabled(void)
> -{
> - return readl_relaxed(gic_data.dist_base + GICD_CTLR) & GICD_CTLR_DS;
> -}
> -
> static int gic_cpu_pm_notifier(struct notifier_block *self,
> unsigned long cmd, void *v)
> {
> @@ -1110,6 +1144,11 @@ static bool gic_enable_quirk_msm8996(void *data)
> return true;
> }
>
> +static void gic_enable_nmi_support(void)
> +{
> + static_branch_enable(&supports_pseudo_nmis);
> +}
> +
> static int __init gic_init_bases(void __iomem *dist_base,
> struct redist_region *rdist_regs,
> u32 nr_redist_regions,
> @@ -1179,6 +1218,13 @@ static int __init gic_init_bases(void __iomem *dist_base,
> its_cpu_init();
> }
>
> + if (gic_prio_masking_enabled()) {
> + if (!gic_has_group0() || gic_dist_security_disabled())
> + gic_enable_nmi_support();
> + else
> + pr_warn("SCR_EL3.FIQ is cleared, cannot enable use of pseudo-NMIs\n");
> + }
> +
> return 0;
>
> out_free:
> --
> 1.9.1
>
Otherwise:
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
Hi Julien,
On 21/01/2019 15:33, Julien Thierry wrote:
> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
> to interract with guest TLBs, switching from EL2&0 translation regime
(interact)
> to EL1&0.
>
> However, some non-maskable asynchronous event, like SDEI, could happen
> while TGE is cleared. Because of this, address translation operations
> relying on the EL2&0 translation regime could fail (TLB invalidation,
> userspace access, ...).
>
> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
> clearing it if necessary when returning to the interrupted context.
Yes please. This would not have been fun to debug!
Reviewed-by: James Morse <[email protected]>
I was looking for why we need core code to do this, instead of updating the
arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
itself.
Thanks,
James
> diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
> index 1473fc2..94b7481 100644
> --- a/arch/arm64/include/asm/hardirq.h
> +++ b/arch/arm64/include/asm/hardirq.h
> @@ -19,6 +19,7 @@
> #include <linux/cache.h>
> #include <linux/threads.h>
> #include <asm/irq.h>
> +#include <asm/kvm_arm.h>
percpu.h?
sysreg.h?
barrier.h?
> @@ -37,6 +38,33 @@
>
> #define __ARCH_IRQ_EXIT_IRQS_DISABLED 1
>
> +struct nmi_ctx {
> + u64 hcr;
> +};
> +
> +DECLARE_PER_CPU(struct nmi_ctx, nmi_contexts);
> +
> +#define arch_nmi_enter() \
> + do { \
> + if (is_kernel_in_hyp_mode()) { \
> + struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
> + nmi_ctx->hcr = read_sysreg(hcr_el2); \
> + if (!(nmi_ctx->hcr & HCR_TGE)) { \
> + write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
> + isb(); \
> + } \
> + } \
> + } while (0)
> +
> +#define arch_nmi_exit() \
> + do { \
> + if (is_kernel_in_hyp_mode()) { \
> + struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
> + if (!(nmi_ctx->hcr & HCR_TGE)) \
> + write_sysreg(nmi_ctx->hcr, hcr_el2); \
> + } \
> + } while (0)
> +
> static inline void ack_bad_irq(unsigned int irq)
> {
> extern unsigned long irq_err_count;
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index 0fbbcdf..da0af63 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -60,8 +60,14 @@ static inline void rcu_nmi_exit(void)
> */
> extern void irq_exit(void);
>
> +#ifndef arch_nmi_enter
> +#define arch_nmi_enter() do { } while (0)
> +#define arch_nmi_exit() do { } while (0)
> +#endif
> +
> #define nmi_enter() \
> do { \
> + arch_nmi_enter(); \
> printk_nmi_enter(); \
> lockdep_off(); \
> ftrace_nmi_enter(); \
> @@ -80,6 +86,7 @@ static inline void rcu_nmi_exit(void)
> ftrace_nmi_exit(); \
> lockdep_on(); \
> printk_nmi_exit(); \
> + arch_nmi_exit(); \
> } while (0)
>
> #endif /* LINUX_HARDIRQ_H */
>
On Mon, 21 Jan 2019 15:33:39 +0000,
Julien Thierry <[email protected]> wrote:
>
> Provide a higher priority to be used for pseudo-NMIs. When such an
> interrupt is received, keep interrupts fully disabled at CPU level to
> prevent receiving other pseudo-NMIs while handling the current one.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3.c | 42 ++++++++++++++++++++++++++++++++++++------
> 1 file changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 5374b43..4df1e94 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -41,6 +41,8 @@
>
> #include "irq-gic-common.h"
>
> +#define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80)
> +
> #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
>
> struct redist_region {
> @@ -381,12 +383,45 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
> return aff;
> }
>
> +static inline void gic_deactivate_unexpected_irq(u32 irqnr)
Same remark as on some other patches: you should be able to drop the
inline attribute without any ill effect. I'd also like this to be
renamed "gic_deactivate_spurious", or something similar.
> +{
> + if (static_branch_likely(&supports_deactivate_key)) {
> + if (irqnr < 8192)
> + gic_write_dir(irqnr);
> + } else {
> + gic_write_eoir(irqnr);
> + }
> +}
> +
> +static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
> +{
> + int err;
> +
> + if (static_branch_likely(&supports_deactivate_key))
> + gic_write_eoir(irqnr);
> + /*
> + * Leave the PSR.I bit set to prevent other NMIs from being
> + * received while handling this one.
> + * PSR.I will be restored when we ERET to the
> + * interrupted context.
> + */
> + err = handle_domain_nmi(gic_data.domain, irqnr, regs);
So this is the point where we start having a dependency on the NMI
series. We definitely need to start talking merge strategy.
> + if (err)
> + gic_deactivate_unexpected_irq(irqnr);
> +}
> +
> static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
> {
> u32 irqnr;
>
> irqnr = gic_read_iar();
>
> + if (gic_supports_nmi() &&
> + unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
> + gic_handle_nmi(irqnr, regs);
> + return;
> + }
> +
> if (gic_prio_masking_enabled()) {
> gic_pmr_mask_irqs();
> gic_arch_enable_irqs();
> @@ -403,12 +438,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
> err = handle_domain_irq(gic_data.domain, irqnr, regs);
> if (err) {
> WARN_ONCE(true, "Unexpected interrupt received!\n");
> - if (static_branch_likely(&supports_deactivate_key)) {
> - if (irqnr < 8192)
> - gic_write_dir(irqnr);
> - } else {
> - gic_write_eoir(irqnr);
> - }
> + gic_deactivate_unexpected_irq(irqnr);
> }
> return;
> }
> --
> 1.9.1
>
With the above addressed,
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:40 +0000,
Julien Thierry <[email protected]> wrote:
>
> Add accessors to the GIC distributor/redistributors priority registers.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> drivers/irqchip/irq-gic-common.c | 10 ++++++++++
> drivers/irqchip/irq-gic-common.h | 2 ++
> 2 files changed, 12 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
> index 3c93c6f..04eadbc 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -110,6 +110,16 @@ int gic_configure_irq(unsigned int irq, unsigned int type,
> return ret;
> }
>
> +void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio)
> +{
> + writeb_relaxed(prio, base + GIC_DIST_PRI + irq);
> +}
> +
> +u8 gic_get_irq_prio(unsigned int irq, void __iomem *base)
> +{
> + return readb_relaxed(base + GIC_DIST_PRI + irq);
> +}
> +
Is there a reason why this is in irq-gic-common.c? If we can avoid it,
I'd rather this is made private to GICv3, as I do not intend to
support this for earlier revisions on the architecture.
Thanks,
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:41 +0000,
Julien Thierry <[email protected]> wrote:
>
> Implement NMI callbacks for GICv3 irqchip. Install NMI safe handlers
> when setting up interrupt line as NMI.
>
> Only SPIs and PPIs are allowed to be set up as NMI.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:42 +0000,
Julien Thierry <[email protected]> wrote:
>
> Per the definition of the daifflags, SErrors can occur during any interrupt
> context, including NMI contexts. Trying to nmi_enter in an NMI context
> will crash.
>
> Skip nmi_enter/nmi_exit when the SError occurred during an NMI.
>
> Suggested-by: James Morse <[email protected]>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Dave Martin <[email protected]>
> Cc: James Morse <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:43 +0000,
Julien Thierry <[email protected]> wrote:
>
> Handling of an NMI should not set any TIF flags. For NMIs received from
> EL0 the current exit path is safe to use.
>
> However, an NMI received at EL1 could have interrupted some task context
> that has set the TIF_NEED_RESCHED flag. Preempting a task should not
> happen as a result of an NMI.
>
> Skip preemption after handling an NMI from EL1.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kernel/entry.S | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 35a47f6..a0b0a22 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -624,6 +624,14 @@ el1_irq:
>
> #ifdef CONFIG_PREEMPT
> ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count
> +alternative_if ARM64_HAS_IRQ_PRIO_MASKING
> + /*
> + * DA_F were cleared at start of handling. If anything is set in DAIF,
> + * we come back from an NMI, so skip preemption
> + */
> + mrs x0, daif
> + orr x24, x24, x0
> +alternative_else_nop_endif
> cbnz x24, 1f // preempt count != 0
> bl el1_preempt
> 1:
I find this a bit ugly, as what we have in x24 is not the preempt
count anymore. Maybe amend the comment above?
The code being nonetheless correct:
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:44 +0000,
Julien Thierry <[email protected]> wrote:
>
> When an NMI is raised while interrupts were disabled, the IRQ tracing
> is already in the correct state (i.e. hardirqs_off) and should be left
> as such when returning to the interrupted context.
>
> Check whether PMR was masking interrupts when the NMI was raised and
> skip IRQ tracing if necessary.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Marc Zyngier <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
M.
--
Jazz is not dead, it just smell funny.
On Mon, 21 Jan 2019 15:33:45 +0000,
Julien Thierry <[email protected]> wrote:
>
> Add a build option and a command line parameter to build and enable the
> support of pseudo-NMIs.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Suggested-by: Daniel Thompson <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
> arch/arm64/Kconfig | 14 ++++++++++++++
> arch/arm64/kernel/cpufeature.c | 11 ++++++++++-
> 3 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index b799bcf..173e2cc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1197,6 +1197,12 @@
> to discrete, to make X server driver able to add WB
> entry later. This parameter enables that.
>
> + enable_pseudo_nmi [ARM64]
> + Enables support for pseudo-NMIs in the kernel. This
> + requires the kernel to be built with
> + CONFIG_ARM64_PSEUDO_NMI and to be running on a
> + platform with GICv3.
> +
> enable_timer_pin_1 [X86]
> Enable PIN 1 of APIC timer
> Can be useful to work around chipset bugs
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d3..8d84bfd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1328,6 +1328,20 @@ config ARM64_MODULE_PLTS
> bool
> select HAVE_MOD_ARCH_SPECIFIC
>
> +config ARM64_PSEUDO_NMI
> + bool "Support for NMI-like interrupts"
> + select CONFIG_ARM_GIC_V3
> + help
> + Adds support for mimicking Non-Maskable Interrupts through the use of
> + GIC interrupt priority. This support requires version 3 or later of
> + the Arm GIC.
> +
> + This high priority configuration for interrupts need to be
s/need/needs/
> + explicitly enabled through the new kernel parameter
It won't be new forever... ;-)
> + "enable_pseudo_nmi".
I'm not overly keen on this name. We already have "irqchip.gicv3_nolpi",
so why not adopt something similar. "irqchip.gicv3_pseudo_nmi", taking a
boolean value?
> +
> + If unsure, say N
> +
> config RELOCATABLE
> bool
> help
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index b530fb24..e66d778 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1207,10 +1207,19 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
> #endif /* CONFIG_ARM64_PTR_AUTH */
>
> #ifdef CONFIG_ARM64_PSEUDO_NMI
> +static bool enable_pseudo_nmi;
> +
> +static int __init early_enable_pseudo_nmi(char *p)
> +{
> + enable_pseudo_nmi = true;
And if you're happy with the above, this becomes:
return strtobool(p, &enable_pseudo_nmi);
> + return 0;
> +}
> +early_param("enable_pseudo_nmi", early_enable_pseudo_nmi);
> +
> static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
> int scope)
> {
> - return false;
> + return enable_pseudo_nmi && has_useable_gicv3_cpuif(entry, scope);
> }
> #endif
>
> --
> 1.9.1
>
Thanks,
M.
--
Jazz is not dead, it just smell funny.
Hello Julien & Marc,
Thanks for your reply; I had indeed misunderstood the usage of ready_percpu_nmi() and
teardown_percpu_nmi().
I think that adding a note about this in the comment of request_percpu_nmi() would be nice.
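Something along these lines in the function's kerneldoc, perhaps (wording
is only a suggestion):

	/*
	 * For PPIs this does not make the interrupt deliverable on its
	 * own: each CPU that should receive the NMI must also call
	 * ready_percpu_nmi() and enable_percpu_irq() on itself, typically
	 * from its CPU hotplug starting callback.
	 */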
Regards,
Wei Li
On 2019/1/28 16:57, Julien Thierry wrote:
> Hi,
>
> On 26/01/2019 10:19, liwei (GF) wrote:
>>
>>
>> On 2019/1/21 23:33, Julien Thierry wrote:
>>> Implement NMI callbacks for GICv3 irqchip. Install NMI safe handlers
>>> when setting up interrupt line as NMI.
>>>
>>> Only SPIs and PPIs are allowed to be set up as NMI.
>>>
>>> Signed-off-by: Julien Thierry <[email protected]>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: Jason Cooper <[email protected]>
>>> Cc: Marc Zyngier <[email protected]>
>>> ---
>>> drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 84 insertions(+)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>>> index 4df1e94..447d8ab 100644
>>> --- a/drivers/irqchip/irq-gic-v3.c
>>> +++ b/drivers/irqchip/irq-gic-v3.c
>> (snip)
>>>
>>> +static int gic_irq_nmi_setup(struct irq_data *d)
>>> +{
>>> + struct irq_desc *desc = irq_to_desc(d->irq);
>>> +
>>> + if (!gic_supports_nmi())
>>> + return -EINVAL;
>>> +
>>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /*
>>> + * A secondary irq_chip should be in charge of LPI request,
>>> + * it should not be possible to get there
>>> + */
>>> + if (WARN_ON(gic_irq(d) >= 8192))
>>> + return -EINVAL;
>>> +
>>> + /* desc lock should already be held */
>>> + if (gic_irq(d) < 32) {
>>> + /* Setting up PPI as NMI, only switch handler for first NMI */
>>> + if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
>>> + refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
>>> + desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
>>> + }
>>> + } else {
>>> + desc->handle_irq = handle_fasteoi_nmi;
>>> + }
>>> +
>>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void gic_irq_nmi_teardown(struct irq_data *d)
>>> +{
>>> + struct irq_desc *desc = irq_to_desc(d->irq);
>>> +
>>> + if (WARN_ON(!gic_supports_nmi()))
>>> + return;
>>> +
>>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>>> + return;
>>> + }
>>> +
>>> + /*
>>> + * A secondary irq_chip should be in charge of LPI request,
>>> + * it should not be possible to get there
>>> + */
>>> + if (WARN_ON(gic_irq(d) >= 8192))
>>> + return;
>>> +
>>> + /* desc lock should already be held */
>>> + if (gic_irq(d) < 32) {
>>> + /* Tearing down NMI, only switch handler for last NMI */
>>> + if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
>>> + desc->handle_irq = handle_percpu_devid_irq;
>>> + } else {
>>> + desc->handle_irq = handle_fasteoi_irq;
>>> + }
>>> +
>>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
>>> +}
>>> +
>>
>> Hello Julien,
>> I am afraid the setting of the priority is not correct here. If the irq is in the redistributor (gic_irq(d) < 32),
>> we should set the priority on each CPU, whereas here we only set it on the current CPU.
>
> As Marc stated, to work with PPIs, the core IRQ API provides a set of
> *_percpu_irq functions (request, enable, disable...).
>
> The current idea is that the driver is in charge of calling
> ready_percpu_nmi() before enabling the interrupt on the correct CPU, in
> the same way that the driver is in charge of calling enable_percpu_irq()
> and disable_percpu_irq() on the correct CPU.
>
>
>> static inline void __iomem *gic_dist_base(struct irq_data *d)
>> {
>> if (gic_irq_in_rdist(d)) /* SGI+PPI -> SGI_base for this CPU */
>> return gic_data_rdist_sgi_base();
>>
>> if (d->hwirq <= 1023) /* SPI -> dist_base */
>> return gic_data.dist_base;
>>
>> return NULL;
>> }
>>
>> I tried to add an smp_call_function here, but the kernel reported a warning, as interrupts are already
>> disabled when raw_spin_lock_irqsave is called in request_nmi or ready_percpu_nmi.
>> [ 2.137262] Call trace:
>> [ 2.137265] smp_call_function_many+0xf8/0x3a0
>> [ 2.137267] smp_call_function+0x40/0x58
>> [ 2.137271] gic_irq_nmi_setup+0xe8/0x118
>> [ 2.137275] ready_percpu_nmi+0x6c/0xf0
>> [ 2.137279] armpmu_request_irq+0x228/0x250
>
> Your issue lies here: if your PMU IRQ is a PPI, you shouldn't be calling
> ready_percpu_nmi() at request time, but probably somewhere like
> arm_perf_starting_cpu().
>
> And you wouldn't need the smp_call to set the priority.
>
> Hope this helps.
>
>> [ 2.137281] arm_pmu_acpi_init+0x150/0x2f0
>> [ 2.137284] do_one_initcall+0x54/0x218
>> [ 2.137289] kernel_init_freeable+0x230/0x354
>> [ 2.137293] kernel_init+0x18/0x118
>> [ 2.137295] ret_from_fork+0x10/0x18
>>
>
> Thanks,
>
On 28/01/2019 13:59, liwei (GF) wrote:
> Hello Julien & Marc,
> Thanks for your reply; I had indeed misunderstood the usage of ready_percpu_nmi() and
> teardown_percpu_nmi().
> I think that adding a note about this in the comment of request_percpu_nmi() would be nice.
Yes, that's a good suggestion, I'll add that.
Also, if this helps, I have been working on some patches (not yet ready
for submission) that turn the Arm PMU interrupt into an NMI (I was
holding onto them until this series gets merged).
I pushed them on this branch:
http://linux-arm.org/linux-jt.git -b nmi_for_pmu
Thanks,
Julien
> On 2019/1/28 16:57, Julien Thierry wrote:
>> Hi,
>>
>> On 26/01/2019 10:19, liwei (GF) wrote:
>>>
>>>
>>> On 2019/1/21 23:33, Julien Thierry wrote:
>>>> Implement NMI callbacks for GICv3 irqchip. Install NMI safe handlers
>>>> when setting up interrupt line as NMI.
>>>>
>>>> Only SPIs and PPIs are allowed to be set up as NMI.
>>>>
>>>> Signed-off-by: Julien Thierry <[email protected]>
>>>> Cc: Thomas Gleixner <[email protected]>
>>>> Cc: Jason Cooper <[email protected]>
>>>> Cc: Marc Zyngier <[email protected]>
>>>> ---
>>>> drivers/irqchip/irq-gic-v3.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 84 insertions(+)
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>>>> index 4df1e94..447d8ab 100644
>>>> --- a/drivers/irqchip/irq-gic-v3.c
>>>> +++ b/drivers/irqchip/irq-gic-v3.c
>>> (snip)
>>>>
>>>> +static int gic_irq_nmi_setup(struct irq_data *d)
>>>> +{
>>>> + struct irq_desc *desc = irq_to_desc(d->irq);
>>>> +
>>>> + if (!gic_supports_nmi())
>>>> + return -EINVAL;
>>>> +
>>>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>>>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + /*
>>>> + * A secondary irq_chip should be in charge of LPI request,
>>>> + * it should not be possible to get there
>>>> + */
>>>> + if (WARN_ON(gic_irq(d) >= 8192))
>>>> + return -EINVAL;
>>>> +
>>>> + /* desc lock should already be held */
>>>> + if (gic_irq(d) < 32) {
>>>> + /* Setting up PPI as NMI, only switch handler for first NMI */
>>>> + if (!refcount_inc_not_zero(&ppi_nmi_refs[gic_irq(d) - 16])) {
>>>> + refcount_set(&ppi_nmi_refs[gic_irq(d) - 16], 1);
>>>> + desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
>>>> + }
>>>> + } else {
>>>> + desc->handle_irq = handle_fasteoi_nmi;
>>>> + }
>>>> +
>>>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_NMI_PRI);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void gic_irq_nmi_teardown(struct irq_data *d)
>>>> +{
>>>> + struct irq_desc *desc = irq_to_desc(d->irq);
>>>> +
>>>> + if (WARN_ON(!gic_supports_nmi()))
>>>> + return;
>>>> +
>>>> + if (gic_peek_irq(d, GICD_ISENABLER)) {
>>>> + pr_err("Cannot set NMI property of enabled IRQ %u\n", d->irq);
>>>> + return;
>>>> + }
>>>> +
>>>> + /*
>>>> + * A secondary irq_chip should be in charge of LPI request,
>>>> + * it should not be possible to get there
>>>> + */
>>>> + if (WARN_ON(gic_irq(d) >= 8192))
>>>> + return;
>>>> +
>>>> + /* desc lock should already be held */
>>>> + if (gic_irq(d) < 32) {
>>>> + /* Tearing down NMI, only switch handler for last NMI */
>>>> + if (refcount_dec_and_test(&ppi_nmi_refs[gic_irq(d) - 16]))
>>>> + desc->handle_irq = handle_percpu_devid_irq;
>>>> + } else {
>>>> + desc->handle_irq = handle_fasteoi_irq;
>>>> + }
>>>> +
>>>> + gic_set_irq_prio(gic_irq(d), gic_dist_base(d), GICD_INT_DEF_PRI);
>>>> +}
>>>> +
>>>
>>> Hello Julien,
>>> I am afraid the setting of the priority is not correct here. If the irq is in the redistributor (gic_irq(d) < 32),
>>> we should set the priority on each CPU, whereas here we only set it on the current CPU.
>>
>> As Marc stated, to work with PPIs, the core IRQ API provides a set of
>> *_percpu_irq functions (request, enable, disable...).
>>
>> The current idea is that the driver is in charge of calling
>> ready_percpu_nmi() before enabling the interrupt on the correct CPU, in
>> the same way that the driver is in charge of calling enable_percpu_irq()
>> and disable_percpu_irq() on the correct CPU.
>>
>>
>>> static inline void __iomem *gic_dist_base(struct irq_data *d)
>>> {
>>> if (gic_irq_in_rdist(d)) /* SGI+PPI -> SGI_base for this CPU */
>>> return gic_data_rdist_sgi_base();
>>>
>>> if (d->hwirq <= 1023) /* SPI -> dist_base */
>>> return gic_data.dist_base;
>>>
>>> return NULL;
>>> }
>>>
>>> I tried to add an smp_call_function here, but the kernel reported a warning, as interrupts are already
>>> disabled when raw_spin_lock_irqsave is called in request_nmi or ready_percpu_nmi.
>>> [ 2.137262] Call trace:
>>> [ 2.137265] smp_call_function_many+0xf8/0x3a0
>>> [ 2.137267] smp_call_function+0x40/0x58
>>> [ 2.137271] gic_irq_nmi_setup+0xe8/0x118
>>> [ 2.137275] ready_percpu_nmi+0x6c/0xf0
>>> [ 2.137279] armpmu_request_irq+0x228/0x250
>>
>> Your issue lies here: if your PMU IRQ is a PPI, you shouldn't be calling
>> ready_percpu_nmi() at request time, but probably somewhere like
>> arm_perf_starting_cpu().
>>
>> And you wouldn't need the smp_call to set the priority.
>>
>> Hope this helps.
>>
>>> [ 2.137281] arm_pmu_acpi_init+0x150/0x2f0
>>> [ 2.137284] do_one_initcall+0x54/0x218
>>> [ 2.137289] kernel_init_freeable+0x230/0x354
>>> [ 2.137293] kernel_init+0x18/0x118
>>> [ 2.137295] ret_from_fork+0x10/0x18
>>>
>>
>> Thanks,
>>
>
--
Julien Thierry
Hi James,
On 28/01/2019 11:48, James Morse wrote:
> Hi Julien,
>
> On 21/01/2019 15:33, Julien Thierry wrote:
>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
>> to interract with guest TLBs, switching from EL2&0 translation regime
>
> (interact)
>
>
>> to EL1&0.
>>
>> However, some non-maskable asynchronous event, like SDEI, could happen
>> while TGE is cleared. Because of this, address translation operations
>> relying on the EL2&0 translation regime could fail (TLB invalidation,
>> userspace access, ...).
>>
>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
>> clearing it if necessary when returning to the interrupted context.
>
> Yes please. This would not have been fun to debug!
>
> Reviewed-by: James Morse <[email protected]>
>
>
Thanks.
>
> I was looking for why we need core code to do this, instead of updating the
> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
> itself.
>
Yes, that's the main reason.
>
>> diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
>> index 1473fc2..94b7481 100644
>> --- a/arch/arm64/include/asm/hardirq.h
>> +++ b/arch/arm64/include/asm/hardirq.h
>> @@ -19,6 +19,7 @@
>> #include <linux/cache.h>
>> #include <linux/threads.h>
>> #include <asm/irq.h>
>> +#include <asm/kvm_arm.h>
>
> percpu.h?
> sysreg.h?
> barrier.h?
>
Good point, I'll add those.
Thanks,
--
Julien Thierry
On 28/01/2019 11:59, Marc Zyngier wrote:
> On Mon, 21 Jan 2019 15:33:39 +0000,
> Julien Thierry <[email protected]> wrote:
>>
>> Provide a higher priority to be used for pseudo-NMIs. When such an
>> interrupt is received, keep interrupts fully disabled at CPU level to
>> prevent receiving other pseudo-NMIs while handling the current one.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Jason Cooper <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> drivers/irqchip/irq-gic-v3.c | 42 ++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 36 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
>> index 5374b43..4df1e94 100644
>> --- a/drivers/irqchip/irq-gic-v3.c
>> +++ b/drivers/irqchip/irq-gic-v3.c
>> @@ -41,6 +41,8 @@
>>
>> #include "irq-gic-common.h"
>>
>> +#define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80)
>> +
>> #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
>>
>> struct redist_region {
>> @@ -381,12 +383,45 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
>> return aff;
>> }
>>
>> +static inline void gic_deactivate_unexpected_irq(u32 irqnr)
>
> Same remark as on some other patches: you should be able to drop the
> inline attribute without any ill effect. I'd also like this to be
> renamed "gic_deactivate_spurious", or something similar.
>
I'm a bit concerned about using "spurious" since it is not related to
GICC_INT_SPURIOUS: we actually read a valid IRQ number, we just don't
know how we should handle it.
Would "gic_deactivate_unhandled" work? Or "gic_deactivate_bad"?
Thanks,
--
Julien Thierry
On 28/01/2019 12:04, Marc Zyngier wrote:
> On Mon, 21 Jan 2019 15:33:40 +0000,
> Julien Thierry <[email protected]> wrote:
>>
>> Add accessors to the GIC distributor/redistributors priority registers.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Jason Cooper <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> ---
>> drivers/irqchip/irq-gic-common.c | 10 ++++++++++
>> drivers/irqchip/irq-gic-common.h | 2 ++
>> 2 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>> index 3c93c6f..04eadbc 100644
>> --- a/drivers/irqchip/irq-gic-common.c
>> +++ b/drivers/irqchip/irq-gic-common.c
>> @@ -110,6 +110,16 @@ int gic_configure_irq(unsigned int irq, unsigned int type,
>> return ret;
>> }
>>
>> +void gic_set_irq_prio(unsigned int irq, void __iomem *base, u8 prio)
>> +{
>> + writeb_relaxed(prio, base + GIC_DIST_PRI + irq);
>> +}
>> +
>> +u8 gic_get_irq_prio(unsigned int irq, void __iomem *base)
>> +{
>> + return readb_relaxed(base + GIC_DIST_PRI + irq);
>> +}
>> +
>
> Is there a reason why this is in irq-gic-common.c? If we can avoid it,
> I'd rather this is made private to GICv3, as I do not intend to
> support this for earlier revisions on the architecture.
>
The only reason I put it in common code is that the same code works for
both the gicv3 and gic drivers.
But if you prefer the gic driver not to see it, I'll move it to the gicv3
driver.
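For context, all the v3 driver does with these accessors is flip an
interrupt between the default and NMI priorities. A minimal, hypothetical
helper to illustrate:

static void gic_irq_set_prio_nmi(struct irq_data *d, bool nmi)
{
	/* GICD_INT_NMI_PRI comes from this series; the helper name is made up */
	u8 prio = nmi ? GICD_INT_NMI_PRI : GICD_INT_DEF_PRI;

	gic_set_irq_prio(gic_irq(d), gic_dist_base(d), prio);
}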
Thanks,
--
Julien Thierry
On Tue, 29 Jan 2019 11:33:35 +0000,
Julien Thierry <[email protected]> wrote:
>
>
>
> On 28/01/2019 11:59, Marc Zyngier wrote:
> > On Mon, 21 Jan 2019 15:33:39 +0000,
> > Julien Thierry <[email protected]> wrote:
> >>
> >> Provide a higher priority to be used for pseudo-NMIs. When such an
> >> interrupt is received, keep interrupts fully disabled at CPU level to
> >> prevent receiving other pseudo-NMIs while handling the current one.
> >>
> >> Signed-off-by: Julien Thierry <[email protected]>
> >> Cc: Thomas Gleixner <[email protected]>
> >> Cc: Jason Cooper <[email protected]>
> >> Cc: Marc Zyngier <[email protected]>
> >> ---
> >> drivers/irqchip/irq-gic-v3.c | 42 ++++++++++++++++++++++++++++++++++++------
> >> 1 file changed, 36 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> >> index 5374b43..4df1e94 100644
> >> --- a/drivers/irqchip/irq-gic-v3.c
> >> +++ b/drivers/irqchip/irq-gic-v3.c
> >> @@ -41,6 +41,8 @@
> >>
> >> #include "irq-gic-common.h"
> >>
> >> +#define GICD_INT_NMI_PRI (GICD_INT_DEF_PRI & ~0x80)
> >> +
> >> #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996 (1ULL << 0)
> >>
> >> struct redist_region {
> >> @@ -381,12 +383,45 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
> >> return aff;
> >> }
> >>
> >> +static inline void gic_deactivate_unexpected_irq(u32 irqnr)
> >
> > Same remark as on some other patches: you should be able to drop the
> > inline attribute without any ill effect. I'd also like this to be
> > renamed "gic_deactivate_spurious", or something similar.
> >
>
> I'm a bit concerned about using "spurious" since it is not related to
> GICC_INT_SPURIOUS: we actually read a valid IRQ number, we just don't
> know how we should handle it.
Well, I'd say that in the case of GICC_INT_SPURIOUS, there is nothing
to deactivate, but hey... ;-)
> Would "gic_deactivate_unhandled" work? Or "gic_deactivate_bad"?
Sure, any will do.
M.
--
Jazz is not dead, it just smells funny.
On 23/01/2019 10:44, Julien Thierry wrote:
>
>
> On 22/01/2019 15:21, Catalin Marinas wrote:
>> On Mon, Jan 21, 2019 at 03:33:31PM +0000, Julien Thierry wrote:
>>> diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
>>> index 24692ed..7e82a92 100644
>>> --- a/arch/arm64/include/asm/irqflags.h
>>> +++ b/arch/arm64/include/asm/irqflags.h
>>> @@ -18,7 +18,9 @@
>>>
>>> #ifdef __KERNEL__
>>>
>>> +#include <asm/alternative.h>
>>> #include <asm/ptrace.h>
>>> +#include <asm/sysreg.h>
>>>
>>> /*
>>> * Aarch64 has flags for masking: Debug, Asynchronous (serror), Interrupts and
>>> @@ -36,33 +38,31 @@
>>> /*
>>> * CPU interrupt mask handling.
>>> */
>>> -static inline unsigned long arch_local_irq_save(void)
>>> -{
>>> - unsigned long flags;
>>> - asm volatile(
>>> - "mrs %0, daif // arch_local_irq_save\n"
>>> - "msr daifset, #2"
>>> - : "=r" (flags)
>>> - :
>>> - : "memory");
>>> - return flags;
>>> -}
>>> -
>>> static inline void arch_local_irq_enable(void)
>>> {
>>> - asm volatile(
>>> - "msr daifclr, #2 // arch_local_irq_enable"
>>> - :
>>> + unsigned long unmasked = GIC_PRIO_IRQON;
>>> +
>>> + asm volatile(ALTERNATIVE(
>>> + "msr daifclr, #2 // arch_local_irq_enable\n"
>>> + "nop",
>>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
>>> + "dsb sy",
>>> + ARM64_HAS_IRQ_PRIO_MASKING)
>>> :
>>> + : "r" (unmasked)
>>> : "memory");
>>> }
>>>
>>> static inline void arch_local_irq_disable(void)
>>> {
>>> - asm volatile(
>>> - "msr daifset, #2 // arch_local_irq_disable"
>>> - :
>>> + unsigned long masked = GIC_PRIO_IRQOFF;
>>> +
>>> + asm volatile(ALTERNATIVE(
>>> + "msr daifset, #2 // arch_local_irq_disable",
>>> + "msr_s " __stringify(SYS_ICC_PMR_EL1) ", %0",
>>> + ARM64_HAS_IRQ_PRIO_MASKING)
>>> :
>>> + : "r" (masked)
>>> : "memory");
>>> }
>>
>> Nitpicks: you could drop masked/unmasked variables here (it's up to you,
>> it wouldn't make any difference on the generated asm).
>>
>
> Good point, I'll do that.
>
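I.e. passing the value straight in, something like:

static inline void arch_local_irq_disable(void)
{
	asm volatile(ALTERNATIVE(
		"msr	daifset, #2	// arch_local_irq_disable",
		"msr_s	" __stringify(SYS_ICC_PMR_EL1) ", %0",
		ARM64_HAS_IRQ_PRIO_MASKING)
		:
		: "r" ((unsigned long) GIC_PRIO_IRQOFF)
		: "memory");
}

and likewise for arch_local_irq_enable().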
>>> @@ -71,12 +71,44 @@ static inline void arch_local_irq_disable(void)
>>> */
>>> static inline unsigned long arch_local_save_flags(void)
>>> {
>>> + unsigned long daif_bits;
>>> unsigned long flags;
>>> - asm volatile(
>>> - "mrs %0, daif // arch_local_save_flags"
>>> - : "=r" (flags)
>>> - :
>>> +
>>> + daif_bits = read_sysreg(daif);
>>> +
>>> + /*
>>> + * The asm is logically equivalent to:
>>> + *
>>> + * if (system_uses_irq_prio_masking())
>>> + * flags = (daif_bits & PSR_I_BIT) ?
>>> + * GIC_PRIO_IRQOFF :
>>> + * read_sysreg_s(SYS_ICC_PMR_EL1);
>>> + * else
>>> + * flags = daif_bits;
>>> + */
>>> + asm volatile(ALTERNATIVE(
>>> + "mov %0, %1\n"
>>> + "nop\n"
>>> + "nop",
>>> + "mrs_s %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
>>> + "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
>>> + "csel %0, %0, %2, eq",
>>> + ARM64_HAS_IRQ_PRIO_MASKING)
>>> + : "=&r" (flags), "+r" (daif_bits)
>>> + : "r" (GIC_PRIO_IRQOFF)
>>> : "memory");
>>> +
>>> + return flags;
>>> +}
>>
>> BTW, how's the code generated from the C version? It will have a branch
>> but may not be too bad. Either way is fine by me.
>>
>
> It's a bit hard to talk about the code generated from the C version as
> it can lie within several layers of inlining, so the instructions for
> that section are a bit more scattered.
>
> However, it seems like the compiler is more clever (maybe the asm
> volatile prevents some optimizations regarding register allocation or
> instruction ordering) and the C version seems to perform slightly better
> (although it could be within the noise) despite the branch.
>
> So, I'll just switch up to the C version.
>
I remember a reason to stick to the asm version. There is a time
interval between the (early) CPU feature being enabled and the (early)
alternatives being applied.
For the early ones, interrupts remain disabled between CPU feature
enabling and alternative application. However, if you enable lockdep, it
will do some interrupts_disabled() checks which become
local_irqs_disabled_flags(local_save_flags()). If one is using the
alternative and the other the static branch, lockdep will emit a warning
about it.
So, I tried moving all irqflags functions to the C version, but using the
static branch for arch_local_irq_restore() slows things down a bit, and
I'm not sure having arch_local_irq_save() use the static branch while
arch_local_irq_restore() uses the alternative makes much sense.
In the end, I'll stick to the asm + alternative version to avoid the
lockdep issue and have consistent manipulation of the irqflags.
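For reference, the C version discussed above is essentially the logic
spelled out in the asm comment, i.e. roughly:

static inline unsigned long arch_local_save_flags(void)
{
	unsigned long daif_bits = read_sysreg(daif);

	if (system_uses_irq_prio_masking())
		return (daif_bits & PSR_I_BIT) ?
			(unsigned long) GIC_PRIO_IRQOFF :
			read_sysreg_s(SYS_ICC_PMR_EL1);

	return daif_bits;
}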
Thanks,
--
Julien Thierry
On Mon, Jan 21, 2019 at 03:33:29PM +0000, Julien Thierry wrote:
> Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
> means that hypervisor will not receive masked interrupts while running a
> guest.
>
You could add to the commit description how this works overall,
something along the lines of:
We need to make sure that all maskable interrupts are masked from the
time we call local_irq_disable() in the main run loop, and remain so
until we call local_irq_enable() after returning from the guest, and we
need to ensure that we see no interrupts at all (including pseudo-NMIs)
in the middle of the VM world-switch, while at the same time we need to
ensure we exit the guest when there are interrupts for the host.
We can accomplish this with pseudo-NMIs enabled by:
(1) local_irq_disable: set the priority mask
(2) enter guest: set PSTATE.I
(3) clear the priority mask
(4) eret to guest
(5) exit guest: set the priority mask
clear PSTATE.I (and restore other host PSTATE bits)
(6) local_irq_enable: clear the priority mask.
Also, took me a while to realize that when we come back from the guest,
we call local_daif_restore with DAIF_PROCCTX_NOIRQ, which actually does
both of the things in (5).
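For reference, that exit path in this series boils down to (a sketch):

static inline void kvm_arm_vhe_guest_exit(void)
{
	/*
	 * DAIF_PROCCTX_NOIRQ masks IRQs via PMR and clears PSTATE.I,
	 * i.e. it performs both halves of step (5) above.
	 */
	local_daif_restore(DAIF_PROCCTX_NOIRQ);

	/*
	 * Make sure the sysreg changes done during the world-switch
	 * take effect before running the host.
	 */
	isb();
}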
> Avoid this by making sure ICC_PMR_EL1 is unmasked when we enter a guest.
>
> Signed-off-by: Julien Thierry <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Christoffer Dall <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: [email protected]
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
> arch/arm64/kvm/hyp/switch.c | 16 ++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7732d0b..a1f9f55 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -24,6 +24,7 @@
>
> #include <linux/types.h>
> #include <linux/kvm_types.h>
> +#include <asm/arch_gicv3.h>
> #include <asm/cpufeature.h>
> #include <asm/daifflags.h>
> #include <asm/fpsimd.h>
> @@ -474,6 +475,17 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> static inline void kvm_arm_vhe_guest_enter(void)
> {
> local_daif_mask();
> +
> + /*
> + * Having IRQs masked via PMR when entering the guest means the GIC
> + * will not signal the CPU of interrupts of lower priority, and the
> + * only way to get out will be via guest exceptions.
> + * Naturally, we want to avoid this.
> + */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
> }
>
> static inline void kvm_arm_vhe_guest_exit(void)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index b0b1478..6a4c2d6 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -22,6 +22,7 @@
>
> #include <kvm/arm_psci.h>
>
> +#include <asm/arch_gicv3.h>
> #include <asm/cpufeature.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> @@ -521,6 +522,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> struct kvm_cpu_context *guest_ctxt;
> u64 exit_code;
>
> + /*
> + * Having IRQs masked via PMR when entering the guest means the GIC
> + * will not signal the CPU of interrupts of lower priority, and the
> + * only way to get out will be via guest exceptions.
> + * Naturally, we want to avoid this.
> + */
> + if (system_uses_irq_prio_masking()) {
> + gic_write_pmr(GIC_PRIO_IRQON);
> + dsb(sy);
> + }
> +
> vcpu = kern_hyp_va(vcpu);
>
> host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -573,6 +585,10 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> */
> __debug_switch_to_host(vcpu);
>
> + /* Returning to host will clear PSR.I, remask PMR if needed */
> + if (system_uses_irq_prio_masking())
> + gic_write_pmr(GIC_PRIO_IRQOFF);
> +
> return exit_code;
> }
>
nit: you could consider moving the non-vhe part into a new
kvm_arm_nvhe_guest_enter, for symmetry with the vhe part.
Otherwise looks good to me:
Reviewed-by: Christoffer Dall <[email protected]>
On 28/01/2019 12:47, Marc Zyngier wrote:
> On Mon, 21 Jan 2019 15:33:45 +0000,
> Julien Thierry <[email protected]> wrote:
>>
>> Add a build option and a command line parameter to build and enable the
>> support of pseudo-NMIs.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Suggested-by: Daniel Thompson <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>> arch/arm64/Kconfig | 14 ++++++++++++++
>> arch/arm64/kernel/cpufeature.c | 11 ++++++++++-
>> 3 files changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index b799bcf..173e2cc 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1197,6 +1197,12 @@
>> to discrete, to make X server driver able to add WB
>> entry later. This parameter enables that.
>>
>> + enable_pseudo_nmi [ARM64]
>> + Enables support for pseudo-NMIs in the kernel. This
>> + requires both the kernel to be built with
>> + CONFIG_ARM64_PSEUDO_NMI and to be running on a
>> + platform with GICv3.
>> +
>> enable_timer_pin_1 [X86]
>> Enable PIN 1 of APIC timer
>> Can be useful to work around chipset bugs
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index a4168d3..8d84bfd 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1328,6 +1328,20 @@ config ARM64_MODULE_PLTS
>> bool
>> select HAVE_MOD_ARCH_SPECIFIC
>>
>> +config ARM64_PSEUDO_NMI
>> + bool "Support for NMI-like interrupts"
>> + select CONFIG_ARM_GIC_V3
>> + help
>> + Adds support for mimicking Non-Maskable Interrupts through the use of
>> + GIC interrupt priority. This support requires version 3 or later of
>> + Arm GIC.
>> +
>> + This high priority configuration for interrupts need to be
>
> s/need/needs/
>
>> + explicitly enabled through the new kernel parameter
>
> It won't be new forever... ;-)
>
Good point!
>> + "enable_pseudo_nmi".
>
> I'm not overly keen on this name. We already have "irqchip.gicv3_nolpi",
> so why not adopt something similar. "irqchip.gicv3_pseudo_nmi", taking a
> boolean value?
>
Sure, I'm fine with that.
>> +
>> + If unsure, say N
>> +
>> config RELOCATABLE
>> bool
>> help
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index b530fb24..e66d778 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1207,10 +1207,19 @@ static void cpu_enable_address_auth(struct arm64_cpu_capabilities const *cap)
>> #endif /* CONFIG_ARM64_PTR_AUTH */
>>
>> #ifdef CONFIG_ARM64_PSEUDO_NMI
>> +static bool enable_pseudo_nmi;
>> +
>> +static int __init early_enable_pseudo_nmi(char *p)
>> +{
>> + enable_pseudo_nmi = true;
>
> And if you're happy with the above, this becomes:
>
> return strtobool(p, &enable_pseudo_nmi);
>
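Which, with the renamed option, would make the whole thing something
like:

static bool enable_pseudo_nmi;

static int __init early_enable_pseudo_nmi(char *p)
{
	return strtobool(p, &enable_pseudo_nmi);
}
early_param("irqchip.gicv3_pseudo_nmi", early_enable_pseudo_nmi);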
Thanks,
--
Julien Thierry
On 30/01/2019 12:07, Christoffer Dall wrote:
> On Mon, Jan 21, 2019 at 03:33:29PM +0000, Julien Thierry wrote:
>> Interrupts masked by ICC_PMR_EL1 will not be signaled to the CPU. This
>> means that hypervisor will not receive masked interrupts while running a
>> guest.
>>
>
> You could add to the commit description how this works overall,
> something along the lines of:
>
> We need to make sure that all maskable interrupts are masked from the
> time we call local_irq_disable() in the main run loop, and remain so
> until we call local_irq_enable() after returning from the guest, and we
> need to ensure that we see no interrupts at all (including pseudo-NMIs)
> in the middle of the VM world-switch, while at the same time we need to
> ensure we exit the guest when there are interrupts for the host.
>
> We can accomplish this with pseudo-NMIs enabled by:
> (1) local_irq_disable: set the priority mask
> (2) enter guest: set PSTATE.I
> (3) clear the priority mask
> (4) eret to guest
> (5) exit guest: set the priority mask
> clear PSTATE.I (and restore other host PSTATE bits)
> (6) local_irq_enable: clear the priority mask.
>
Thanks for the suggestion. I'll add it to the commit.
> Also, took me a while to realize that when we come back from the guest,
> we call local_daif_restore with DAIF_PROCCTX_NOIRQ, which actually does
> both of the things in (5).
>
Yes, I can imagine this being not that obvious. I'll add a comment above
the call to local_daif_restore() in kvm_arm_vhe_guest_exit().
>> Avoid this by making sure ICC_PMR_EL1 is unmasked when we enter a guest.
>>
>> Signed-off-by: Julien Thierry <[email protected]>
>> Acked-by: Catalin Marinas <[email protected]>
>> Cc: Christoffer Dall <[email protected]>
>> Cc: Marc Zyngier <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: [email protected]
>> ---
>> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>> arch/arm64/kvm/hyp/switch.c | 16 ++++++++++++++++
>> 2 files changed, 28 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 7732d0b..a1f9f55 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -24,6 +24,7 @@
>>
>> #include <linux/types.h>
>> #include <linux/kvm_types.h>
>> +#include <asm/arch_gicv3.h>
>> #include <asm/cpufeature.h>
>> #include <asm/daifflags.h>
>> #include <asm/fpsimd.h>
>> @@ -474,6 +475,17 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>> static inline void kvm_arm_vhe_guest_enter(void)
>> {
>> local_daif_mask();
>> +
>> + /*
>> + * Having IRQs masked via PMR when entering the guest means the GIC
>> + * will not signal the CPU of interrupts of lower priority, and the
>> + * only way to get out will be via guest exceptions.
>> + * Naturally, we want to avoid this.
>> + */
>> + if (system_uses_irq_prio_masking()) {
>> + gic_write_pmr(GIC_PRIO_IRQON);
>> + dsb(sy);
>> + }
>> }
>>
>> static inline void kvm_arm_vhe_guest_exit(void)
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index b0b1478..6a4c2d6 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -22,6 +22,7 @@
>>
>> #include <kvm/arm_psci.h>
>>
>> +#include <asm/arch_gicv3.h>
>> #include <asm/cpufeature.h>
>> #include <asm/kvm_asm.h>
>> #include <asm/kvm_emulate.h>
>> @@ -521,6 +522,17 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>> struct kvm_cpu_context *guest_ctxt;
>> u64 exit_code;
>>
>> + /*
>> + * Having IRQs masked via PMR when entering the guest means the GIC
>> + * will not signal the CPU of interrupts of lower priority, and the
>> + * only way to get out will be via guest exceptions.
>> + * Naturally, we want to avoid this.
>> + */
>> + if (system_uses_irq_prio_masking()) {
>> + gic_write_pmr(GIC_PRIO_IRQON);
>> + dsb(sy);
>> + }
>> +
>> vcpu = kern_hyp_va(vcpu);
>>
>> host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> @@ -573,6 +585,10 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>> */
>> __debug_switch_to_host(vcpu);
>>
>> + /* Returning to host will clear PSR.I, remask PMR if needed */
>> + if (system_uses_irq_prio_masking())
>> + gic_write_pmr(GIC_PRIO_IRQOFF);
>> +
>> return exit_code;
>> }
>>
>
> nit: you could consider moving the non-vhe part into a new
> kvm_arm_nvhe_guest_enter, for symmetry with the vhe part.
>
The idea is tempting; however, for VHE we do the guest enter/exit around
the VHE vcpu run call, whereas for non-VHE the guest enter/exit would be
called from within the vcpu run call, since we rely on the hvc masking
daif before we unmask PMR.
So, I'm not sure we would gain that much symmetry with the VHE side.
Unless we move kvm_arm_vhe_guest_enter()/exit() into the VHE vcpu run
call (they are empty for arch/arm and only called from one site anyway...).
> Otherwise looks good to me:
>
> Reviewed-by: Christoffer Dall <[email protected]>
>
Thanks,
--
Julien Thierry
On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
> Hi James,
>
> On 28/01/2019 11:48, James Morse wrote:
> > Hi Julien,
> >
> > On 21/01/2019 15:33, Julien Thierry wrote:
> >> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
> >> to interract with guest TLBs, switching from EL2&0 translation regime
> >
> > (interact)
> >
> >
> >> to EL1&0.
> >>
> >> However, some non-maskable asynchronous event could happen while TGE is
> >> cleared like SDEI. Because of this address translation operations
> >> relying on EL2&0 translation regime could fail (tlb invalidation,
> >> userspace access, ...).
> >>
> >> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
> >> clear it if necessary when returning to the interrupted context.
> >
> > Yes please. This would not have been fun to debug!
> >
> > Reviewed-by: James Morse <[email protected]>
> >
> >
>
> Thanks.
>
> >
> > I was looking for why we need core code to do this, instead of updating the
> > arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
> > to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
> > itself.
> >
>
> Yes, that's the main reason.
>
I wondered the same thing, but I don't understand the explanation :(
Why can't we do a local_daif_mask() around the (very small) calls that
clear TGE instead?
Thanks,
Christoffer
On 31/01/2019 08:19, Christoffer Dall wrote:
> On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
>> Hi James,
>>
>> On 28/01/2019 11:48, James Morse wrote:
>>> Hi Julien,
>>>
>>> On 21/01/2019 15:33, Julien Thierry wrote:
>>>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
>>>> to interract with guest TLBs, switching from EL2&0 translation regime
>>>
>>> (interact)
>>>
>>>
>>>> to EL1&0.
>>>>
>>>> However, some non-maskable asynchronous event could happen while TGE is
>>>> cleared like SDEI. Because of this address translation operations
>>>> relying on EL2&0 translation regime could fail (tlb invalidation,
>>>> userspace access, ...).
>>>>
>>>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
>>>> clear it if necessary when returning to the interrupted context.
>>>
>>> Yes please. This would not have been fun to debug!
>>>
>>> Reviewed-by: James Morse <[email protected]>
>>>
>>>
>>
>> Thanks.
>>
>>>
>>> I was looking for why we need core code to do this, instead of updating the
>>> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
>>> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
>>> itself.
>>>
>>
>> Yes, that's the main reason.
>>
> I wondered the same thing, but I don't understand the explanation :(
>
> Why can't we do a local_daif_mask() around the (very small) calls that
> clear TGE instead?
>
That would protect against the pseudo-NMIs, but you can still get an
SDEI at that point even with all daif bits set. Or did I misunderstand
how SDEI works?
Thanks,
--
Julien Thierry
On Thu, Jan 31, 2019 at 08:56:04AM +0000, Julien Thierry wrote:
>
>
> On 31/01/2019 08:19, Christoffer Dall wrote:
> > On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
> >> Hi James,
> >>
> >> On 28/01/2019 11:48, James Morse wrote:
> >>> Hi Julien,
> >>>
> >>> On 21/01/2019 15:33, Julien Thierry wrote:
> >>>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
> >>>> to interract with guest TLBs, switching from EL2&0 translation regime
> >>>
> >>> (interact)
> >>>
> >>>
> >>>> to EL1&0.
> >>>>
> >>>> However, some non-maskable asynchronous event could happen while TGE is
> >>>> cleared like SDEI. Because of this address translation operations
> >>>> relying on EL2&0 translation regime could fail (tlb invalidation,
> >>>> userspace access, ...).
> >>>>
> >>>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
> >>>> clear it if necessary when returning to the interrupted context.
> >>>
> >>> Yes please. This would not have been fun to debug!
> >>>
> >>> Reviewed-by: James Morse <[email protected]>
> >>>
> >>>
> >>
> >> Thanks.
> >>
> >>>
> >>> I was looking for why we need core code to do this, instead of updating the
> >>> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
> >>> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
> >>> itself.
> >>>
> >>
> >> Yes, that's the main reason.
> >>
> > I wondered the same thing, but I don't understand the explanation :(
> >
> > Why can't we do a local_daif_mask() around the (very small) calls that
> > clear TGE instead?
> >
>
> That would protect against the pseudo-NMIs, but you can still get an
> SDEI at that point even with all daif bits set. Or did I misunderstand
> how SDEI works?
>
I don't know the details of SDEI. From looking at this patch, the
logical conclusion would be that SDEIs can then only be delivered once
we've called nmi_enter, but since we don't call this directly from the
code that clears TGE for doing guest TLB invalidation (or do we?) then
masking interrupts at the PSTATE level should be sufficient.
Surely I'm missing some part of the bigger picture here.
Thanks,
Christoffer
On 31/01/2019 09:27, Christoffer Dall wrote:
> On Thu, Jan 31, 2019 at 08:56:04AM +0000, Julien Thierry wrote:
>>
>>
>> On 31/01/2019 08:19, Christoffer Dall wrote:
>>> On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
>>>> Hi James,
>>>>
>>>> On 28/01/2019 11:48, James Morse wrote:
>>>>> Hi Julien,
>>>>>
>>>>> On 21/01/2019 15:33, Julien Thierry wrote:
>>>>>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
>>>>>> to interract with guest TLBs, switching from EL2&0 translation regime
>>>>>
>>>>> (interact)
>>>>>
>>>>>
>>>>>> to EL1&0.
>>>>>>
>>>>>> However, some non-maskable asynchronous event could happen while TGE is
>>>>>> cleared like SDEI. Because of this address translation operations
>>>>>> relying on EL2&0 translation regime could fail (tlb invalidation,
>>>>>> userspace access, ...).
>>>>>>
>>>>>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
>>>>>> clear it if necessary when returning to the interrupted context.
>>>>>
>>>>> Yes please. This would not have been fun to debug!
>>>>>
>>>>> Reviewed-by: James Morse <[email protected]>
>>>>>
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>> I was looking for why we need core code to do this, instead of updating the
>>>>> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
>>>>> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
>>>>> itself.
>>>>>
>>>>
>>>> Yes, that's the main reason.
>>>>
>>> I wondered the same thing, but I don't understand the explanation :(
>>>
>>> Why can't we do a local_daif_mask() around the (very small) calls that
>>> clear TGE instead?
>>>
>>
>> That would protect against the pseudo-NMIs, but you can still get an
>> SDEI at that point even with all daif bits set. Or did I misunderstand
>> how SDEI works?
>>
>
> I don't know the details of SDEI. From looking at this patch, the
> logical conclusion would be that SDEIs can then only be delivered once
> we've called nmi_enter, but since we don't call this directly from the
> code that clears TGE for doing guest TLB invalidation (or do we?) then
> masking interrupts at the PSTATE level should be sufficient.
>
> Surely I'm missing some part of the bigger picture here.
>
I'm not sure I understand. SDEI uses the NMI context and AFAIU, it is an
interrupt that the firmware sends to the OS, and it is sent regardless
of the PSTATE at the OS EL.
So, the worrying part is:
- Hyp clears TGE
- Exception/interrupt taken to EL3
- Firmware decides it's a good time to send an SDEI to the OS
- SDEI handler (at EL2 for VHE) does nmi_enter()
- SDEI handler needs to do cache invalidation or something with the
EL2&0 translation regime but TGE is cleared
We don't expect the code that clears TGE to call nmi_enter().
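Hence the fix: have the NMI entry itself restore TGE. Roughly, patch 1
makes arch_nmi_enter() do something like:

#define arch_nmi_enter()						\
	do {								\
		if (is_kernel_in_hyp_mode()) {				\
			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
			nmi_ctx->hcr = read_sysreg(hcr_el2);		\
			if (!(nmi_ctx->hcr & HCR_TGE)) {		\
				write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
				isb();					\
			}						\
		}							\
	} while (0)

with arch_nmi_exit() restoring the saved HCR_EL2 value.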
--
Julien Thierry
On Thu, Jan 31, 2019 at 09:40:02AM +0000, Julien Thierry wrote:
>
>
> On 31/01/2019 09:27, Christoffer Dall wrote:
> > On Thu, Jan 31, 2019 at 08:56:04AM +0000, Julien Thierry wrote:
> >>
> >>
> >> On 31/01/2019 08:19, Christoffer Dall wrote:
> >>> On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
> >>>> Hi James,
> >>>>
> >>>> On 28/01/2019 11:48, James Morse wrote:
> >>>>> Hi Julien,
> >>>>>
> >>>>> On 21/01/2019 15:33, Julien Thierry wrote:
> >>>>>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
> >>>>>> to interract with guest TLBs, switching from EL2&0 translation regime
> >>>>>
> >>>>> (interact)
> >>>>>
> >>>>>
> >>>>>> to EL1&0.
> >>>>>>
> >>>>>> However, some non-maskable asynchronous event could happen while TGE is
> >>>>>> cleared like SDEI. Because of this address translation operations
> >>>>>> relying on EL2&0 translation regime could fail (tlb invalidation,
> >>>>>> userspace access, ...).
> >>>>>>
> >>>>>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
> >>>>>> clear it if necessary when returning to the interrupted context.
> >>>>>
> >>>>> Yes please. This would not have been fun to debug!
> >>>>>
> >>>>> Reviewed-by: James Morse <[email protected]>
> >>>>>
> >>>>>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>>
> >>>>> I was looking for why we need core code to do this, instead of updating the
> >>>>> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
> >>>>> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
> >>>>> itself.
> >>>>>
> >>>>
> >>>> Yes, that's the main reason.
> >>>>
> >>> I wondered the same thing, but I don't understand the explanation :(
> >>>
> >>> Why can't we do a local_daif_mask() around the (very small) calls that
> >>> clear TGE instead?
> >>>
> >>
> >> That would protect against the pseudo-NMIs, but you can still get an
> >> SDEI at that point even with all daif bits set. Or did I misunderstand
> >> how SDEI works?
> >>
> >
> > I don't know the details of SDEI. From looking at this patch, the
> > logical conclusion would be that SDEIs can then only be delivered once
> > we've called nmi_enter, but since we don't call this directly from the
> > code that clears TGE for doing guest TLB invalidation (or do we?) then
> > masking interrupts at the PSTATE level should be sufficient.
> >
> > Surely I'm missing some part of the bigger picture here.
> >
>
> I'm not sure I understand. SDEI uses the NMI context and AFAIU, it is an
> interrupt that the firmware sends to the OS, and it is sent regardless
> of the PSTATE at the OS EL.
>
> So, the worrying part is:
> - Hyp clears TGE
> - Exception/interrupt taken to EL3
> - Firmware decides it's a good time to send an SDEI to the OS
> - SDEI handler (at EL2 for VHE) does nmi_enter()
> - SDEI handler needs to do cache invalidation or something with the
> EL2&0 translation regime but TGE is cleared
>
> We don't expect the code that clears TGE to call nmi_enter().
>
You do understand :)
I didn't understand that the SDEI handler calls nmi_enter() -- and to be
fair the commit message didn't really provide that link -- but it
makes perfect sense now. I naively thought that SDEI had respected the
pstate bits setting before, and that this was becoming a problem with
the introduction of pseudo-NMIs, but I clearly came at this from the
wrong direction.
Thanks for the explanation!
Christoffer
On 31/01/2019 09:40, Julien Thierry wrote:
>
>
> On 31/01/2019 09:27, Christoffer Dall wrote:
>> On Thu, Jan 31, 2019 at 08:56:04AM +0000, Julien Thierry wrote:
>>>
>>>
>>> On 31/01/2019 08:19, Christoffer Dall wrote:
>>>> On Mon, Jan 28, 2019 at 03:42:42PM +0000, Julien Thierry wrote:
>>>>> Hi James,
>>>>>
>>>>> On 28/01/2019 11:48, James Morse wrote:
>>>>>> Hi Julien,
>>>>>>
>>>>>> On 21/01/2019 15:33, Julien Thierry wrote:
>>>>>>> When using VHE, the host needs to clear HCR_EL2.TGE bit in order
>>>>>>> to interract with guest TLBs, switching from EL2&0 translation regime
>>>>>>
>>>>>> (interact)
>>>>>>
>>>>>>
>>>>>>> to EL1&0.
>>>>>>>
>>>>>>> However, some non-maskable asynchronous event could happen while TGE is
>>>>>>> cleared like SDEI. Because of this address translation operations
>>>>>>> relying on EL2&0 translation regime could fail (tlb invalidation,
>>>>>>> userspace access, ...).
>>>>>>>
>>>>>>> Fix this by properly setting HCR_EL2.TGE when entering NMI context and
>>>>>>> clear it if necessary when returning to the interrupted context.
>>>>>>
>>>>>> Yes please. This would not have been fun to debug!
>>>>>>
>>>>>> Reviewed-by: James Morse <[email protected]>
>>>>>>
>>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>>
>>>>>> I was looking for why we need core code to do this, instead of updating the
>>>>>> arch's call sites. Your 'irqdesc: Add domain handlers for NMIs' patch (pointed
>>>>>> to from the cover letter) is the reason: core-code calls nmi_enter()/nmi_exit()
>>>>>> itself.
>>>>>>
>>>>>
>>>>> Yes, that's the main reason.
>>>>>
>>>> I wondered the same thing, but I don't understand the explanation :(
>>>>
>>>> Why can't we do a local_daif_mask() around the (very small) calls that
>>>> clear TGE instead?
>>>>
>>>
>>> That would protect against the pseudo-NMIs, but you can still get an
>>> SDEI at that point even with all daif bits set. Or did I misunderstand
>>> how SDEI works?
>>>
>>
>> I don't know the details of SDEI. From looking at this patch, the
>> logical conclusion would be that SDEIs can then only be delivered once
>> we've called nmi_enter, but since we don't call this directly from the
>> code that clears TGE for doing guest TLB invalidation (or do we?) then
>> masking interrupts at the PSTATE level should be sufficient.
>>
>> Surely I'm missing some part of the bigger picture here.
>>
>
> I'm not sure I understand. SDEI uses the NMI context and AFAIU, it is an
> interrupt that the firmware sends to the OS, and it is sent regardless
> of the PSTATE at the OS EL.
I don't think we can describe SDEI as an interrupt. It is not even an
exception. It is just EL3 ERET-ing to a pre-defined location. And yes,
it will completely ignore any form of mask bit.
>
> So, the worrying part is:
> - Hyp clears TGE
> - Exception/interrupt taken to EL3
> - Firmware decides it's a good time to send an SDEI to the OS
> - SDEI handler (at EL2 for VHE) does nmi_enter()
> - SDEI handler needs to do cache invalidation or something with the
> EL2&0 translation regime but TGE is cleared
>
> We don't expect the code that clears TGE to call nmi_enter().
Indeed. Without this patch, SDEI is already broken. Pseudo-NMIs only
make the bug easier to trigger.
Thanks,
M.
--
Jazz is not dead. It just smells funny...