2021-09-24 13:35:04

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

After introducing arm64/kernel/entry-common.c, which is akin to
kernel/entry/common.c, the rcu/trace housekeeping is done twice, once
in each of the following paths:
enter_from_kernel_mode()->rcu_irq_enter()
and
gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()

Besides being redundant, code analysis shows the duplication can cause
real mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented by 2,
which makes rcu_is_cpu_rrupt_from_idle() return an unexpected result.

NMIs are similarly accounted twice. This series addresses these
duplication issues:
[1-2/5]: address the duplicate NMI accounting
[3-4/5]: address the duplicate RCU housekeeping for irqs
[5/5]: as a natural follow-on of [3-4/5], address a historical issue [1]


History:
v1 -> v2:
changed the subject, as the motivation has changed.
added a fix for the duplicate NMI accounting.

The subject of v1 was "[PATCH 1/3] kernel/irq: __handle_domain_irq()
makes irq_enter/exit arch optional" [2]. It was brought up to fix [1].

There have been some attempts to enable crash-stop-NMI on arm64, one by
me and another by Yuichi [4]. I hope that after this series they can
advance, as Marc said in [3]: "No additional NMI patches will make it
until we have resolved the issues".

[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2] https://lore.kernel.org/linux-arm-kernel/[email protected]
[3] https://lore.kernel.org/linux-arm-kernel/[email protected]
[4] https://lore.kernel.org/linux-arm-kernel/[email protected]

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]


Pingfan Liu (5):
arm64/entry-common: push the judgement of nmi ahead
irqchip/GICv3: expose handle_nmi() directly
kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
optional
irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
irqchip/GICv3: make reschedule-ipi light weight

arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/irq.h | 7 ++++
arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
arch/arm64/kernel/irq.c | 29 ++++++++++++++
drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
kernel/irq/Kconfig | 3 ++
kernel/irq/irqdesc.c | 4 ++
7 files changed, 116 insertions(+), 39 deletions(-)

--
2.31.1


2021-09-24 13:36:13

by Pingfan Liu

Subject: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

The call to rcu_irq_enter() originating from gic_handle_irq() is now
redundant, since arm64 has enter_from_kernel_mode(), akin to
irqentry_enter(), which has already called rcu_irq_enter().

Besides being redundant, code analysis shows the duplication can cause
real mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented by 2,
which makes rcu_is_cpu_rrupt_from_idle() return an unexpected result.

So eliminate the call to irq_enter() in handle_domain_irq() by
selecting HAVE_ARCH_IRQENTRY, and accordingly add irq_enter_rcu()/
irq_exit_rcu() around it in gic_handle_irq().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/arm64/Kconfig | 1 +
drivers/irqchip/irq-gic-v3.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5c7ae4c3954b..d29bae38a951 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -98,6 +98,7 @@ config ARM64
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARM_AMBA
select ARM_ARCH_TIMER
+ select HAVE_ARCH_IRQENTRY
select ARM_GIC
select AUDIT_ARCH_COMPAT_GENERIC
select ARM_GIC_V2M if PCI
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 89dcec902a82..906538fa8771 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -729,10 +729,12 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
else
isb();

+ irq_enter_rcu();
if (handle_domain_irq(gic_data.domain, irqnr, regs)) {
WARN_ONCE(true, "Unexpected interrupt received!\n");
gic_deactivate_unhandled(irqnr);
}
+ irq_exit_rcu();
}

static u32 gic_get_pribits(void)
--
2.31.1

2021-09-24 19:06:03

by Pingfan Liu

Subject: [PATCHv2 5/5] irqchip/GICv3: make reschedule-ipi light weight

To make the reschedule IPI as lightweight as
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi) on x86, it is
better to treat irqnr differently at the front end, and let
IPI_RESCHEDULE call __irq_enter_raw() instead of irq_enter_rcu().

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]
---
drivers/irqchip/irq-gic-v3.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 906538fa8771..593d4539a209 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -709,6 +709,9 @@ static void gic_handle_nmi(struct pt_regs *regs)
gic_deactivate_unhandled(irqnr);
}

+/* RESCHEDULE IPI hwirq nr is 0, and the only raw one */
+static unsigned long raw_interrupt_mask = 1;
+
static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqnr;
@@ -729,12 +732,20 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
else
isb();

- irq_enter_rcu();
+ if (raw_interrupt_mask & 1 << irqnr)
+ __irq_enter_raw();
+ else
+ irq_enter_rcu();
+
if (handle_domain_irq(gic_data.domain, irqnr, regs)) {
WARN_ONCE(true, "Unexpected interrupt received!\n");
gic_deactivate_unhandled(irqnr);
}
- irq_exit_rcu();
+
+ if (raw_interrupt_mask & 1 << irqnr)
+ __irq_exit_raw();
+ else
+ irq_exit_rcu();
}

static u32 gic_get_pribits(void)
--
2.31.1

2021-09-24 20:50:15

by Pingfan Liu

Subject: [PATCHv2 2/5] irqchip/GICv3: expose handle_nmi() directly

With the previous patch, an NMI can be dispatched at the irqentry
level. Adjust GICv3 accordingly to use the new hooks, so that the NMI
handler can be dispatched directly.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/arm64/include/asm/irq.h | 2 ++
drivers/irqchip/irq-gic-v3.c | 53 +++++++++++++++++++-----------------
2 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index a59b1745f458..c39627290a60 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -11,6 +11,8 @@ struct pt_regs;
int set_handle_irq(void (*handle_irq)(struct pt_regs *));
#define set_handle_irq set_handle_irq
int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
+int set_handle_nmi(void (*handle_nmi)(struct pt_regs *));
+int set_nmi_discriminator(bool (*discriminator)(void));

extern void (*handle_arch_irq)(struct pt_regs *regs);
extern void (*handle_arch_fiq)(struct pt_regs *regs);
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index fd4e9a37fea6..89dcec902a82 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -644,28 +644,12 @@ static void gic_deactivate_unhandled(u32 irqnr)
}
}

-static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs)
+static bool gic_is_in_nmi(void)
{
- bool irqs_enabled = interrupts_enabled(regs);
- int err;
-
- if (irqs_enabled)
- nmi_enter();
-
- if (static_branch_likely(&supports_deactivate_key))
- gic_write_eoir(irqnr);
- /*
- * Leave the PSR.I bit set to prevent other NMIs to be
- * received while handling this one.
- * PSR.I will be restored when we ERET to the
- * interrupted context.
- */
- err = handle_domain_nmi(gic_data.domain, irqnr, regs);
- if (err)
- gic_deactivate_unhandled(irqnr);
+ if (gic_supports_nmi() && unlikely(gic_read_rpr() == GICD_INT_NMI_PRI))
+ return true;

- if (irqs_enabled)
- nmi_exit();
+ return false;
}

static u32 do_read_iar(struct pt_regs *regs)
@@ -702,21 +686,38 @@ static u32 do_read_iar(struct pt_regs *regs)
return iar;
}

-static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+static void gic_handle_nmi(struct pt_regs *regs)
{
u32 irqnr;
+ int err;

irqnr = do_read_iar(regs);

/* Check for special IDs first */
if ((irqnr >= 1020 && irqnr <= 1023))
return;
+ if (static_branch_likely(&supports_deactivate_key))
+ gic_write_eoir(irqnr);
+ /*
+ * Leave the PSR.I bit set to prevent other NMIs to be
+ * received while handling this one.
+ * PSR.I will be restored when we ERET to the
+ * interrupted context.
+ */
+ err = handle_domain_nmi(gic_data.domain, irqnr, regs);
+ if (err)
+ gic_deactivate_unhandled(irqnr);
+}

- if (gic_supports_nmi() &&
- unlikely(gic_read_rpr() == GICD_INT_RPR_PRI(GICD_INT_NMI_PRI))) {
- gic_handle_nmi(irqnr, regs);
+static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+{
+ u32 irqnr;
+
+ irqnr = do_read_iar(regs);
+
+ /* Check for special IDs first */
+ if ((irqnr >= 1020 && irqnr <= 1023))
return;
- }

if (gic_prio_masking_enabled()) {
gic_pmr_mask_irqs();
@@ -1791,6 +1792,8 @@ static int __init gic_init_bases(void __iomem *dist_base,
}

set_handle_irq(gic_handle_irq);
+ set_handle_nmi(gic_handle_nmi);
+ set_nmi_discriminator(gic_is_in_nmi);

gic_update_rdist_properties();

--
2.31.1

2021-09-24 20:50:15

by Pingfan Liu

Subject: [PATCHv2 3/5] kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch optional

handle_domain_irq() contains two major parts:
1. irq_enter()/irq_exit(), which serve as hooks for RCU, tracing, etc.
2. irq mapping and dispatching

After the introduction of irqentry_enter()/irqentry_exit() and their
arch-specific counterparts (e.g. on arm64), rcu_irq_enter() has already
been called by the time handle_domain_irq() runs. Hence the
irq_enter()/irq_exit() pair should move out of handle_domain_irq(), and
architectures should take care of irq_enter_rcu()/irq_exit_rcu()
themselves.

Since there are still architectures which do not adopt
irqentry_enter(), make irq_enter()/irq_exit() in handle_domain_irq()
arch optional.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]
---
kernel/irq/Kconfig | 3 +++
kernel/irq/irqdesc.c | 4 ++++
2 files changed, 7 insertions(+)

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index fbc54c2a7f23..defa1db2d664 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -100,6 +100,9 @@ config IRQ_MSI_IOMMU
config HANDLE_DOMAIN_IRQ
bool

+config HAVE_ARCH_IRQENTRY
+ bool
+
config IRQ_TIMINGS
bool

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 4e3c29bb603c..fd5dd9d278b5 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -693,7 +693,9 @@ int handle_domain_irq(struct irq_domain *domain,
struct irq_desc *desc;
int ret = 0;

+#ifndef CONFIG_HAVE_ARCH_IRQENTRY
irq_enter();
+#endif

/* The irqdomain code provides boundary checks */
desc = irq_resolve_mapping(domain, hwirq);
@@ -702,7 +704,9 @@ int handle_domain_irq(struct irq_domain *domain,
else
ret = -EINVAL;

+#ifndef CONFIG_HAVE_ARCH_IRQENTRY
irq_exit();
+#endif
set_irq_regs(old_regs);
return ret;
}
--
2.31.1

2021-09-25 00:06:57

by Pingfan Liu

Subject: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

In enter_el1_irq_or_nmi(), an NMI may interrupt an irq handler, in
which case the condition !interrupts_enabled(regs) fails to detect the
NMI. This causes the NMI to be mistakenly accounted as an irq.

Introduce two interfaces, handle_arch_nmi and interrupt_is_nmi, to
identify an NMI at this stage.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yuichi Ito <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/arm64/include/asm/irq.h | 5 ++++
arch/arm64/kernel/entry-common.c | 45 ++++++++++++++++++++++----------
arch/arm64/kernel/irq.c | 29 ++++++++++++++++++++
3 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index fac08e18bcd5..a59b1745f458 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -12,6 +12,11 @@ int set_handle_irq(void (*handle_irq)(struct pt_regs *));
#define set_handle_irq set_handle_irq
int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));

+extern void (*handle_arch_irq)(struct pt_regs *regs);
+extern void (*handle_arch_fiq)(struct pt_regs *regs);
+extern void (*handle_arch_nmi)(struct pt_regs *regs);
+extern bool (*interrupt_is_nmi)(void);
+
static inline int nr_legacy_irqs(void)
{
return 0;
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe..69a8cc082712 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -24,6 +24,7 @@
#include <asm/stacktrace.h>
#include <asm/sysreg.h>
#include <asm/system_misc.h>
+#include <asm/irq.h>

/*
* Handle IRQ/context state management when entering from kernel mode.
@@ -219,17 +220,28 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
lockdep_hardirqs_on(CALLER_ADDR0);
}

-static void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
+static inline bool arm64_in_nmi(struct pt_regs *regs)
{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+ if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
+ return true;
+ return false;
+}
+
+/* return true if in irq, otherwise in nmi */
+static bool noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
+{
+ if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && arm64_in_nmi(regs)) {
arm64_enter_nmi(regs);
- else
+ return false;
+ } else {
enter_from_kernel_mode(regs);
+ return true;
+ }
}

-static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
+static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs, bool in_irq)
{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+ if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !in_irq)
arm64_exit_nmi(regs);
else
exit_to_kernel_mode(regs);
@@ -269,9 +281,6 @@ static void do_interrupt_handler(struct pt_regs *regs,
handler(regs);
}

-extern void (*handle_arch_irq)(struct pt_regs *);
-extern void (*handle_arch_fiq)(struct pt_regs *);
-
static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
unsigned int esr)
{
@@ -433,12 +442,20 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
}

static void noinstr el1_interrupt(struct pt_regs *regs,
- void (*handler)(struct pt_regs *))
+ void (*handler)(struct pt_regs *),
+ void (*nmi_handler)(struct pt_regs *))
{
+ bool in_irq;
+ void (*h)(struct pt_regs *regs);
+
write_sysreg(DAIF_PROCCTX_NOIRQ, daif);

- enter_el1_irq_or_nmi(regs);
- do_interrupt_handler(regs, handler);
+ in_irq = enter_el1_irq_or_nmi(regs);
+ if (in_irq)
+ h = handler;
+ else
+ h = nmi_handler;
+ do_interrupt_handler(regs, h);

/*
* Note: thread_info::preempt_count includes both thread_info::count
@@ -449,17 +466,17 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
READ_ONCE(current_thread_info()->preempt_count) == 0)
arm64_preempt_schedule_irq();

- exit_el1_irq_or_nmi(regs);
+ exit_el1_irq_or_nmi(regs, in_irq);
}

asmlinkage void noinstr el1h_64_irq_handler(struct pt_regs *regs)
{
- el1_interrupt(regs, handle_arch_irq);
+ el1_interrupt(regs, handle_arch_irq, handle_arch_nmi);
}

asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
{
- el1_interrupt(regs, handle_arch_fiq);
+ el1_interrupt(regs, handle_arch_fiq, handle_arch_nmi);
}

asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index bda49430c9ea..e67435eb4cba 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -81,8 +81,19 @@ static void default_handle_fiq(struct pt_regs *regs)
panic("FIQ taken without a root FIQ handler\n");
}

+static void default_handle_nmi(struct pt_regs *unused)
+{
+}
+
+static bool default_nmi_discriminator(void)
+{
+ return false;
+}
+
void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
+void (*handle_arch_nmi)(struct pt_regs *) __ro_after_init = default_handle_nmi;
+bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;

int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
@@ -104,6 +115,24 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
return 0;
}

+int __init set_handle_nmi(void (*handle_nmi)(struct pt_regs *))
+{
+ if (handle_arch_nmi != default_handle_nmi)
+ return -EBUSY;
+
+ handle_arch_nmi = handle_nmi;
+ return 0;
+}
+
+int __init set_nmi_discriminator(bool (*discriminator)(void))
+{
+ if (interrupt_is_nmi != default_nmi_discriminator)
+ return -EBUSY;
+
+ interrupt_is_nmi = discriminator;
+ return 0;
+}
+
void __init init_IRQ(void)
{
init_irq_stacks();
--
2.31.1

2021-09-25 03:46:11

by Mark Rutland

Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> irq, which makes the condition !interrupts_enabled(regs) fail to detect
> the NMI. This will cause a mistaken account for irq.

Can you please explain this in more detail? It's not clear which
specific case you mean when you say "NMI interrupts an irq", as that
could mean a number of distinct scenarios.

AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
causes a new exception, we'll do the right thing. So either I'm missing
a subtlety or you're describing a different scenario.

Note that the entry code is only trying to distinguish between:

a) This exception is *definitely* an NMI (because regular interrupts
were masked).

b) This exception is *either* an IRQ or an NMI (and this *cannot* be
distinguished until we acknowledge the interrupt), so we treat it as
an IRQ for now.

... and we leave it to the irqchip to handle the gory details. We only
need to distinguish (a) early to avoid nesting IRQ logic within itself
in an unsafe way.

Thanks,
Mark.

> Introducing two interfaces: handle_arch_nmi and interrupt_is_nmi to
> judge NMI at this stage.
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Joey Gouly <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yuichi Ito <[email protected]>
> Cc: [email protected]
> To: [email protected]
> ---
> arch/arm64/include/asm/irq.h | 5 ++++
> arch/arm64/kernel/entry-common.c | 45 ++++++++++++++++++++++----------
> arch/arm64/kernel/irq.c | 29 ++++++++++++++++++++
> 3 files changed, 65 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index fac08e18bcd5..a59b1745f458 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -12,6 +12,11 @@ int set_handle_irq(void (*handle_irq)(struct pt_regs *));
> #define set_handle_irq set_handle_irq
> int set_handle_fiq(void (*handle_fiq)(struct pt_regs *));
>
> +extern void (*handle_arch_irq)(struct pt_regs *regs);
> +extern void (*handle_arch_fiq)(struct pt_regs *regs);
> +extern void (*handle_arch_nmi)(struct pt_regs *regs);
> +extern bool (*interrupt_is_nmi)(void);
> +
> static inline int nr_legacy_irqs(void)
> {
> return 0;
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 32f9796c4ffe..69a8cc082712 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -24,6 +24,7 @@
> #include <asm/stacktrace.h>
> #include <asm/sysreg.h>
> #include <asm/system_misc.h>
> +#include <asm/irq.h>
>
> /*
> * Handle IRQ/context state management when entering from kernel mode.
> @@ -219,17 +220,28 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
> lockdep_hardirqs_on(CALLER_ADDR0);
> }
>
> -static void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
> +static inline bool arm64_in_nmi(struct pt_regs *regs)
> {
> - if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> + if (!interrupts_enabled(regs) || (*interrupt_is_nmi)())
> + return true;
> + return false;
> +}
> +
> +/* return true if in irq, otherwise in nmi */
> +static bool noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
> +{
> + if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && arm64_in_nmi(regs)) {
> arm64_enter_nmi(regs);
> - else
> + return false;
> + } else {
> enter_from_kernel_mode(regs);
> + return true;
> + }
> }
>
> -static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
> +static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs, bool in_irq)
> {
> - if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> + if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !in_irq)
> arm64_exit_nmi(regs);
> else
> exit_to_kernel_mode(regs);
> @@ -269,9 +281,6 @@ static void do_interrupt_handler(struct pt_regs *regs,
> handler(regs);
> }
>
> -extern void (*handle_arch_irq)(struct pt_regs *);
> -extern void (*handle_arch_fiq)(struct pt_regs *);
> -
> static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
> unsigned int esr)
> {
> @@ -433,12 +442,20 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
> }
>
> static void noinstr el1_interrupt(struct pt_regs *regs,
> - void (*handler)(struct pt_regs *))
> + void (*handler)(struct pt_regs *),
> + void (*nmi_handler)(struct pt_regs *))
> {
> + bool in_irq;
> + void (*h)(struct pt_regs *regs);
> +
> write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
>
> - enter_el1_irq_or_nmi(regs);
> - do_interrupt_handler(regs, handler);
> + in_irq = enter_el1_irq_or_nmi(regs);
> + if (in_irq)
> + h = handler;
> + else
> + h = nmi_handler;
> + do_interrupt_handler(regs, h);
>
> /*
> * Note: thread_info::preempt_count includes both thread_info::count
> @@ -449,17 +466,17 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
> READ_ONCE(current_thread_info()->preempt_count) == 0)
> arm64_preempt_schedule_irq();
>
> - exit_el1_irq_or_nmi(regs);
> + exit_el1_irq_or_nmi(regs, in_irq);
> }
>
> asmlinkage void noinstr el1h_64_irq_handler(struct pt_regs *regs)
> {
> - el1_interrupt(regs, handle_arch_irq);
> + el1_interrupt(regs, handle_arch_irq, handle_arch_nmi);
> }
>
> asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
> {
> - el1_interrupt(regs, handle_arch_fiq);
> + el1_interrupt(regs, handle_arch_fiq, handle_arch_nmi);
> }
>
> asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index bda49430c9ea..e67435eb4cba 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -81,8 +81,19 @@ static void default_handle_fiq(struct pt_regs *regs)
> panic("FIQ taken without a root FIQ handler\n");
> }
>
> +static void default_handle_nmi(struct pt_regs *unused)
> +{
> +}
> +
> +static bool default_nmi_discriminator(void)
> +{
> + return false;
> +}
> +
> void (*handle_arch_irq)(struct pt_regs *) __ro_after_init = default_handle_irq;
> void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init = default_handle_fiq;
> +void (*handle_arch_nmi)(struct pt_regs *) __ro_after_init = default_handle_nmi;
> +bool (*interrupt_is_nmi)(void) __ro_after_init = default_nmi_discriminator;
>
> int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
> {
> @@ -104,6 +115,24 @@ int __init set_handle_fiq(void (*handle_fiq)(struct pt_regs *))
> return 0;
> }
>
> +int __init set_handle_nmi(void (*handle_nmi)(struct pt_regs *))
> +{
> + if (handle_arch_nmi != default_handle_nmi)
> + return -EBUSY;
> +
> + handle_arch_nmi = handle_nmi;
> + return 0;
> +}
> +
> +int __init set_nmi_discriminator(bool (*discriminator)(void))
> +{
> + if (interrupt_is_nmi != default_nmi_discriminator)
> + return -EBUSY;
> +
> + interrupt_is_nmi = discriminator;
> + return 0;
> +}
> +
> void __init init_IRQ(void)
> {
> init_irq_stacks();
> --
> 2.31.1
>

2021-09-25 04:15:32

by Mark Rutland

Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

[Adding Paul for RCU, s390 folk for entry code RCU semantics]

On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> After introducing arm64/kernel/entry_common.c which is akin to
> kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
> the following:
> enter_from_kernel_mode()->rcu_irq_enter().
> And
> gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
>
> Besides being redundant, code analysis shows the duplication can cause
> real mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented by 2,
> which makes rcu_is_cpu_rrupt_from_idle() return an unexpected result.

Hmmm...

The fundamental questions are:

1) Who is supposed to be responsible for doing the rcu entry/exit?

2) Is it supposed to matter if this happens multiple times?

For (1), I'd generally expect that this is supposed to happen in the
arch/common entry code, since that itself (or the irqchip driver) could
depend on RCU, and if that's the case then handle_domain_irq()
shouldn't need to call rcu_irq_enter(). That would be consistent with
the way we handle all other exceptions.

For (2) I don't know whether the level of nesting is supposed to
matter. I was under the impression it wasn't meant to matter in general,
so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
specific level of nesting.

From a glance it looks like this would cause rcu_sched_clock_irq() to
skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
doesn't sound right, at least...

Thomas, Paul, thoughts?

AFAICT, s390 will have a similar flow on its IRQ handling path, so if
this is a real issue they'll be affected too.

Thanks,
Mark.

> Nmi also faces duplicate accounts. This series aims to address these
> duplicate issues.
> [1-2/5]: address nmi account duplicate
> [3-4/5]: address rcu housekeeping duplicate in irq
> [5/5]: as a natural result of [3-4/5], address a history issue. [1]
>
>
> History:
> v1 -> v2:
> change the subject as the motivation varies.
> add the fix for nmi account duplicate
>
> The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq()
> makes irq_enter/exit arch optional". [2] It is brought up to fix [1].
>
> There have been some tries to enable crash-stop-NMI on arm64, one by me,
> the other by Yuichi's [4]. I hope after this series, they can advance,
> as Marc said in [3] "No additional NMI patches will make it until we
> have resolved the issues"
>
> [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> [2] https://lore.kernel.org/linux-arm-kernel/[email protected]
> [3] https://lore.kernel.org/linux-arm-kernel/[email protected]
> [4] https://lore.kernel.org/linux-arm-kernel/[email protected]
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Joey Gouly <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yuichi Ito <[email protected]>
> Cc: [email protected]
> To: [email protected]
>
>
> Pingfan Liu (5):
> arm64/entry-common: push the judgement of nmi ahead
> irqchip/GICv3: expose handle_nmi() directly
> kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
> optional
> irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
> irqchip/GICv3: make reschedule-ipi light weight
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/irq.h | 7 ++++
> arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
> arch/arm64/kernel/irq.c | 29 ++++++++++++++
> drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
> kernel/irq/Kconfig | 3 ++
> kernel/irq/irqdesc.c | 4 ++
> 7 files changed, 116 insertions(+), 39 deletions(-)
>
> --
> 2.31.1
>

2021-09-25 06:38:25

by Paul E. McKenney

Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> [Adding Paul for RCU, s390 folk for entry code RCU semantics]
>
> On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > After introducing arm64/kernel/entry_common.c which is akin to
> > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
> > the following:
> > enter_from_kernel_mode()->rcu_irq_enter().
> > And
> > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> >
> > Besides being redundant, code analysis shows the duplication can cause
> > real mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented by 2,
> > which makes rcu_is_cpu_rrupt_from_idle() return an unexpected result.
>
> Hmmm...
>
> The fundamental questions are:
>
> 1) Who is supposed to be responsible for doing the rcu entry/exit?
>
> 2) Is it supposed to matter if this happens multiple times?
>
> For (1), I'd generally expect that this is supposed to happen in the
> arch/common entry code, since that itself (or the irqchip driver) could
> depend on RCU, and if that's the case then handle_domain_irq()
> shouldn't need to call rcu_irq_enter(). That would be consistent with
> the way we handle all other exceptions.
>
> For (2) I don't know whether the level of nesting is supposed to
> matter. I was under the impression it wasn't meant to matter in general,
> so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> specific level of nesting.
>
> From a glance it looks like this would cause rcu_sched_clock_irq() to
> skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> doesn't sound right, at least...
>
> Thomas, Paul, thoughts?

It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
be balanced. Normally, this is taken care of by the fact that irq_enter()
invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().

But if you are doing some special-case exception where the handler needs
to use RCU readers, but where the rest of the work is not needed, then
the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in
the architecture-specific code and must be properly balanced.

So if exception entry invokes rcu_irq_enter() twice, then exception
exit also needs to invoke rcu_irq_exit() twice.

There are some constraints on where calls to these functions are placed.
For example, any exception-entry code prior to the call to rcu_irq_enter()
must consist solely of functions marked noinstr, but Thomas can tell
you more.

Or am I missing the point of your question?

Thanx, Paul

> AFAICT, s390 will have a similar flow on its IRQ handling path, so if
> this is a real issue they'll be affected too.
>
> Thanks,
> Mark.
>
> > Nmi also faces duplicate accounts. This series aims to address these
> > duplicate issues.
> > [1-2/5]: address nmi account duplicate
> > [3-4/5]: address rcu housekeeping duplicate in irq
> > [5/5]: as a natural result of [3-4/5], address a history issue. [1]
> >
> >
> > History:
> > v1 -> v2:
> > change the subject as the motivation varies.
> > add the fix for nmi account duplicate
> >
> > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq()
> > makes irq_enter/exit arch optional". [2] It is brought up to fix [1].
> >
> > There have been some tries to enable crash-stop-NMI on arm64, one by me,
> > the other by Yuichi's [4]. I hope after this series, they can advance,
> > as Marc said in [3] "No additional NMI patches will make it until we
> > have resolved the issues"
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > [2] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > [3] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > [4] https://lore.kernel.org/linux-arm-kernel/[email protected]
> >
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Joey Gouly <[email protected]>
> > Cc: Sami Tolvanen <[email protected]>
> > Cc: Julien Thierry <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Yuichi Ito <[email protected]>
> > Cc: [email protected]
> > To: [email protected]
> >
> >
> > Pingfan Liu (5):
> > arm64/entry-common: push the judgement of nmi ahead
> > irqchip/GICv3: expose handle_nmi() directly
> > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
> > optional
> > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
> > irqchip/GICv3: make reschedule-ipi light weight
> >
> > arch/arm64/Kconfig | 1 +
> > arch/arm64/include/asm/irq.h | 7 ++++
> > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
> > arch/arm64/kernel/irq.c | 29 ++++++++++++++
> > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
> > kernel/irq/Kconfig | 3 ++
> > kernel/irq/irqdesc.c | 4 ++
> > 7 files changed, 116 insertions(+), 39 deletions(-)
> >
> > --
> > 2.31.1
> >

2021-09-25 15:14:20

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> [Adding Paul for RCU, s390 folk for entry code RCU semantics]
>
> On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > After introducing arm64/kernel/entry_common.c which is akin to
> > kernel/entry/common.c, the housekeeping of rcu/trace is done twice, as
> > follows:
> > enter_from_kernel_mode()->rcu_irq_enter().
> > And
> > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> >
> > Besides being redundant, code analysis shows that the duplication also
> > introduces mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented
> > by 2, which makes rcu_is_cpu_rrupt_from_idle() behave unexpectedly.
>
> Hmmm...
>
> The fundamental questions are:
>
> 1) Who is supposed to be responsible for doing the rcu entry/exit?
>
> 2) Is it supposed to matter if this happens multiple times?
>
> For (1), I'd generally expect that this is supposed to happen in the
> arch/common entry code, since that itself (or the irqchip driver) could
> depend on RCU, and if that's the case then handle_domain_irq()
> shouldn't need to call rcu_irq_enter(). That would be consistent with
> the way we handle all other exceptions.
>
In my humble opinion, it had better happen in the arch/common entry code,
as you said. But for the many arches which guarantee that no RCU-protected
data is accessed before handle_domain_irq(), it can be done in
handle_domain_irq(), and that is a cheap way to integrate with the RCU
system (at least for the time being).

As for (2), it goes deep into the RCU core; I hope for guidance from Paul
and Thomas.
But at least for the condition
if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
in rcu_pending(), it makes sense to distinguish a nested interrupt from a
first-level interrupt.

Thanks,

Pingfan
> For (2) I don't know whether the level of nesting is supposed to
> matter. I was under the impression it wasn't meant to matter in general,
> so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> specific level of nesting.
>
> From a glance it looks like this would cause rcu_sched_clock_irq() to
> skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> doesn't sound right, at least...
>
> Thomas, Paul, thoughts?
>
> AFAICT, s390 will have a similar flow on its IRQ handling path, so if
> this is a real issue they'll be affected too.
>
> Thanks,
> Mark.
>
> > Nmi also faces duplicate accounts. This series aims to address these
> > duplicate issues.
> > [1-2/5]: address nmi account duplicate
> > [3-4/5]: address rcu housekeeping duplicate in irq
> > [5/5]: as a natural result of [3-4/5], address a history issue. [1]
> >
> >
> > History:
> > v1 -> v2:
> > change the subject as the motivation varies.
> > add the fix for nmi account duplicate
> >
> > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq()
> > makes irq_enter/exit arch optional". [2] It is brought up to fix [1].
> >
> > There have been some tries to enable crash-stop-NMI on arm64, one by me,
> > the other by Yuichi's [4]. I hope after this series, they can advance,
> > as Marc said in [3] "No additional NMI patches will make it until we
> > have resolved the issues"
> >
> > [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > [2] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > [3] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > [4] https://lore.kernel.org/linux-arm-kernel/[email protected]
> >
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Joey Gouly <[email protected]>
> > Cc: Sami Tolvanen <[email protected]>
> > Cc: Julien Thierry <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Yuichi Ito <[email protected]>
> > Cc: [email protected]
> > To: [email protected]
> >
> >
> > Pingfan Liu (5):
> > arm64/entry-common: push the judgement of nmi ahead
> > irqchip/GICv3: expose handle_nmi() directly
> > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
> > optional
> > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
> > irqchip/GICv3: make reschedule-ipi light weight
> >
> > arch/arm64/Kconfig | 1 +
> > arch/arm64/include/asm/irq.h | 7 ++++
> > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
> > arch/arm64/kernel/irq.c | 29 ++++++++++++++
> > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
> > kernel/irq/Kconfig | 3 ++
> > kernel/irq/irqdesc.c | 4 ++
> > 7 files changed, 116 insertions(+), 39 deletions(-)
> >
> > --
> > 2.31.1
> >
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2021-09-25 15:51:24

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > In enter_el1_irq_or_nmi(), an NMI can interrupt an IRQ, in which case
> > the condition !interrupts_enabled(regs) fails to detect the NMI. This
> > causes mistaken accounting for the IRQ.
>
Sorry about the confusing word "account", it should be "lockdep/rcu/.."

> Can you please explain this in more detail? It's not clear which
> specific case you mean when you say "NMI interrupts an irq", as that
> could mean a number of distinct scenarios.
>
> AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> causes a new exception we'll do the right thing. So either I'm missing a
> subtlety or you're describing a different scenario..
>
> Note that the entry code is only trying to distinguish between:
>
> a) This exception is *definitely* an NMI (because regular interrupts
> were masked).
>
> b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> distinguished until we acknowledge the interrupt), so we treat it as
> an IRQ for now.
>
b) is the aim.

At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode() -> rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
At the irqchip level, however, gic_handle_irq()->gic_handle_nmi()->nmi_enter()
does not call rcu_irq_enter_check_tick(). So it is not proper to
"treat it as an IRQ for now".

> ... and we leave it to the irqchip to handle the gory details. We only

The details of deciding whether an exception is an NMI or an IRQ should
stay hidden in the irqchip. But could the irqchip export that interface to
the entry code? (This patch exports two: handle_arch_nmi() and
interrupt_is_nmi().)

Thanks,

Pingfan

2021-09-27 09:24:41

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> >
> > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > After introducing arm64/kernel/entry_common.c which is akin to
> > > kernel/entry/common.c, the housekeeping of rcu/trace is done twice, as
> > > follows:
> > > enter_from_kernel_mode()->rcu_irq_enter().
> > > And
> > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > >
> > > Besides being redundant, code analysis shows that the duplication also
> > > introduces mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented
> > > by 2, which makes rcu_is_cpu_rrupt_from_idle() behave unexpectedly.
> >
> > Hmmm...
> >
> > The fundamental questions are:
> >
> > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> >
> > 2) Is it supposed to matter if this happens multiple times?
> >
> > For (1), I'd generally expect that this is supposed to happen in the
> > arch/common entry code, since that itself (or the irqchip driver) could
> > depend on RCU, and if that's the case then handle_domain_irq()
> > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > the way we handle all other exceptions.
> >
> > For (2) I don't know whether the level of nesting is supposed to
> > matter. I was under the impression it wasn't meant to matter in general,
> > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > specific level of nesting.
> >
> > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > doesn't sound right, at least...
> >
> > Thomas, Paul, thoughts?
>
> It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> be balanced. Normally, this is taken care of by the fact that irq_enter()
> invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().

Sure; I didn't mean to suggest those weren't balanced! The problem here
is *nesting*. Due to the structure of our entry code and the core IRQ
code, when handling an IRQ we have a sequence:

irq_enter() // arch code
irq_enter() // irq code

< irq handler here >

irq_exit() // irq code
irq_exit() // arch code

... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
expected result because of the additional nesting, since
rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
is only incremented once per exception entry, when it does:

/* Are we at first interrupt nesting level? */
nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
if (nesting > 1)
return false;

What I'm trying to figure out is whether that expectation is legitimate,
and assuming so, where the entry/exit should happen.

Thanks,
Mark.

> But if you are doing some special-case exception where the handler needs
> to use RCU readers, but where the rest of the work is not needed, then
> the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in
> the architecture-specific code and must be properly balanced.
>
> So if exception entry invokes rcu_irq_enter() twice, then exception
> exit also needs to invoke rcu_irq_exit() twice.
>
> There are some constraints on where calls to these functions are placed.
> For example, any exception-entry code prior to the call to rcu_irq_enter()
> must consist solely of functions marked noinstr, but Thomas can tell
> you more.
>
> Or am I missing the point of your question?
>
> Thanx, Paul
>
> > AFAICT, s390 will have a similar flow on its IRQ handling path, so if
> > this is a real issue they'll be affected too.
> >
> > Thanks,
> > Mark.
> >
> > > Nmi also faces duplicate accounts. This series aims to address these
> > > duplicate issues.
> > > [1-2/5]: address nmi account duplicate
> > > [3-4/5]: address rcu housekeeping duplicate in irq
> > > [5/5]: as a natural result of [3-4/5], address a history issue. [1]
> > >
> > >
> > > History:
> > > v1 -> v2:
> > > change the subject as the motivation varies.
> > > add the fix for nmi account duplicate
> > >
> > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq()
> > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1].
> > >
> > > There have been some tries to enable crash-stop-NMI on arm64, one by me,
> > > the other by Yuichi's [4]. I hope after this series, they can advance,
> > > as Marc said in [3] "No additional NMI patches will make it until we
> > > have resolved the issues"
> > >
> > > [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > [2] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > > [3] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > > [4] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > >
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Mark Rutland <[email protected]>
> > > Cc: Marc Zyngier <[email protected]>
> > > Cc: Joey Gouly <[email protected]>
> > > Cc: Sami Tolvanen <[email protected]>
> > > Cc: Julien Thierry <[email protected]>
> > > Cc: Thomas Gleixner <[email protected]>
> > > Cc: Yuichi Ito <[email protected]>
> > > Cc: [email protected]
> > > To: [email protected]
> > >
> > >
> > > Pingfan Liu (5):
> > > arm64/entry-common: push the judgement of nmi ahead
> > > irqchip/GICv3: expose handle_nmi() directly
> > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
> > > optional
> > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
> > > irqchip/GICv3: make reschedule-ipi light weight
> > >
> > > arch/arm64/Kconfig | 1 +
> > > arch/arm64/include/asm/irq.h | 7 ++++
> > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
> > > arch/arm64/kernel/irq.c | 29 ++++++++++++++
> > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
> > > kernel/irq/Kconfig | 3 ++
> > > kernel/irq/irqdesc.c | 4 ++
> > > 7 files changed, 116 insertions(+), 39 deletions(-)
> > >
> > > --
> > > 2.31.1
> > >

2021-09-28 00:10:57

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> > >
> > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > > After introducing arm64/kernel/entry_common.c which is akin to
> > > > kernel/entry/common.c, the housekeeping of rcu/trace is done twice, as
> > > > follows:
> > > > enter_from_kernel_mode()->rcu_irq_enter().
> > > > And
> > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > > >
> > > > Besides being redundant, code analysis shows that the duplication also
> > > > introduces mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented
> > > > by 2, which makes rcu_is_cpu_rrupt_from_idle() behave unexpectedly.
> > >
> > > Hmmm...
> > >
> > > The fundamental questions are:
> > >
> > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> > >
> > > 2) Is it supposed to matter if this happens multiple times?
> > >
> > > For (1), I'd generally expect that this is supposed to happen in the
> > > arch/common entry code, since that itself (or the irqchip driver) could
> > > depend on RCU, and if that's the case then handle_domain_irq()
> > > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > > the way we handle all other exceptions.
> > >
> > > For (2) I don't know whether the level of nesting is supposed to
> > > matter. I was under the impression it wasn't meant to matter in general,
> > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > > specific level of nesting.
> > >
> > > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > > doesn't sound right, at least...
> > >
> > > Thomas, Paul, thoughts?
> >
> > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> > be balanced. Normally, this is taken care of by the fact that irq_enter()
> > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
>
> Sure; I didn't mean to suggest those weren't balanced! The problem here
> is *nesting*. Due to the structure of our entry code and the core IRQ
> code, when handling an IRQ we have a sequence:
>
> irq_enter() // arch code
> irq_enter() // irq code
>
> < irq handler here >
>
> irq_exit() // irq code
> irq_exit() // arch code
>
> ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> expected result because of the additional nesting, since
> rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> is only incremented once per exception entry, when it does:
>
> /* Are we at first interrupt nesting level? */
> nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> if (nesting > 1)
> return false;
>
> What I'm trying to figure out is whether that expectation is legitimate,
> and assuming so, where the entry/exit should happen.

Oooh...

The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
be unable to detect a userspace quiescent state for a non-nohz_full
CPU. That could result in RCU CPU stall warnings if a user task runs
continuously on a given CPU for more than 21 seconds (60 seconds in
some distros). And this can easily happen if the user has a CPU-bound
thread that is the only runnable task on that CPU.

So, yes, this does need some sort of resolution.

The traditional approach is (as you surmise) to have only a single call
to irq_enter() on exception entry and only a single call to irq_exit()
on exception exit. If this is feasible, it is highly recommended.

In theory, we could have that "1" in "nesting > 1" be a constant supplied
by the architecture (you would want "3" if I remember correctly) but
in practice could we please avoid this? For one thing, if there is
some other path into the kernel for your architecture that does only a
single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
a chance. It would need to compare against a different value depending
on what exception showed up. Even if that cannot happen, it would be
better if your architecture could remain in blissful ignorance of the
colorful details of ->dynticks_nmi_nesting manipulations.

Another approach would be for the arch code to supply RCU a function that
it calls. If there is such a function (or perhaps better, if some new
Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
"1" as it does now. But you break it, you buy it! ;-)

Thoughts? Other approaches?

Thanx, Paul

> Thanks,
> Mark.
>
> > But if you are doing some special-case exception where the handler needs
> > to use RCU readers, but where the rest of the work is not needed, then
> > the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in
> > the architecture-specific code and must be properly balanced.
> >
> > So if exception entry invokes rcu_irq_enter() twice, then exception
> > exit also needs to invoke rcu_irq_exit() twice.
> >
> > There are some constraints on where calls to these functions are placed.
> > For example, any exception-entry code prior to the call to rcu_irq_enter()
> > must consist solely of functions marked noinstr, but Thomas can tell
> > you more.
> >
> > Or am I missing the point of your question?
> >
> > Thanx, Paul
> >
> > > AFAICT, s390 will have a similar flow on its IRQ handling path, so if
> > > this is a real issue they'll be affected too.
> > >
> > > Thanks,
> > > Mark.
> > >
> > > > Nmi also faces duplicate accounts. This series aims to address these
> > > > duplicate issues.
> > > > [1-2/5]: address nmi account duplicate
> > > > [3-4/5]: address rcu housekeeping duplicate in irq
> > > > [5/5]: as a natural result of [3-4/5], address a history issue. [1]
> > > >
> > > >
> > > > History:
> > > > v1 -> v2:
> > > > change the subject as the motivation varies.
> > > > add the fix for nmi account duplicate
> > > >
> > > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq()
> > > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1].
> > > >
> > > > There have been some tries to enable crash-stop-NMI on arm64, one by me,
> > > > the other by Yuichi's [4]. I hope after this series, they can advance,
> > > > as Marc said in [3] "No additional NMI patches will make it until we
> > > > have resolved the issues"
> > > >
> > > > [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > > > [2] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > > > [3] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > > > [4] https://lore.kernel.org/linux-arm-kernel/[email protected]
> > > >
> > > > Cc: Catalin Marinas <[email protected]>
> > > > Cc: Will Deacon <[email protected]>
> > > > Cc: Mark Rutland <[email protected]>
> > > > Cc: Marc Zyngier <[email protected]>
> > > > Cc: Joey Gouly <[email protected]>
> > > > Cc: Sami Tolvanen <[email protected]>
> > > > Cc: Julien Thierry <[email protected]>
> > > > Cc: Thomas Gleixner <[email protected]>
> > > > Cc: Yuichi Ito <[email protected]>
> > > > Cc: [email protected]
> > > > To: [email protected]
> > > >
> > > >
> > > > Pingfan Liu (5):
> > > > arm64/entry-common: push the judgement of nmi ahead
> > > > irqchip/GICv3: expose handle_nmi() directly
> > > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch
> > > > optional
> > > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64
> > > > irqchip/GICv3: make reschedule-ipi light weight
> > > >
> > > > arch/arm64/Kconfig | 1 +
> > > > arch/arm64/include/asm/irq.h | 7 ++++
> > > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++-------
> > > > arch/arm64/kernel/irq.c | 29 ++++++++++++++
> > > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------
> > > > kernel/irq/Kconfig | 3 ++
> > > > kernel/irq/irqdesc.c | 4 ++
> > > > 7 files changed, 116 insertions(+), 39 deletions(-)
> > > >
> > > > --
> > > > 2.31.1
> > > >

2021-09-28 08:35:04

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> > > >
> > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > > > After introducing arm64/kernel/entry_common.c which is akin to
> > > > > kernel/entry/common.c, the housekeeping of rcu/trace is done twice, as
> > > > > follows:
> > > > > enter_from_kernel_mode()->rcu_irq_enter().
> > > > > And
> > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > > > >
> > > > > Besides being redundant, code analysis shows that the duplication also
> > > > > introduces mistakes: e.g. rcu_data->dynticks_nmi_nesting is incremented
> > > > > by 2, which makes rcu_is_cpu_rrupt_from_idle() behave unexpectedly.
> > > >
> > > > Hmmm...
> > > >
> > > > The fundamental questions are:
> > > >
> > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> > > >
> > > > 2) Is it supposed to matter if this happens multiple times?
> > > >
> > > > For (1), I'd generally expect that this is supposed to happen in the
> > > > arch/common entry code, since that itself (or the irqchip driver) could
> > > > depend on RCU, and if that's the case then handle_domain_irq()
> > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > > > the way we handle all other exceptions.
> > > >
> > > > For (2) I don't know whether the level of nesting is supposed to
> > > > matter. I was under the impression it wasn't meant to matter in general,
> > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > > > specific level of nesting.
> > > >
> > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > > > doesn't sound right, at least...
> > > >
> > > > Thomas, Paul, thoughts?
> > >
> > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> > > be balanced. Normally, this is taken care of by the fact that irq_enter()
> > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
> >
> > Sure; I didn't mean to suggest those weren't balanced! The problem here
> > is *nesting*. Due to the structure of our entry code and the core IRQ
> > code, when handling an IRQ we have a sequence:
> >
> > irq_enter() // arch code
> > irq_enter() // irq code
> >
> > < irq handler here >
> >
> > irq_exit() // irq code
> > irq_exit() // arch code
> >
> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> > expected result because of the additional nesting, since
> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> > is only incremented once per exception entry, when it does:
> >
> > /* Are we at first interrupt nesting level? */
> > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > if (nesting > 1)
> > return false;
> >
> > What I'm trying to figure out is whether that expectation is legitimate,
> > and assuming so, where the entry/exit should happen.
>
> Oooh...
>
> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
> be unable to detect a userspace quiescent state for a non-nohz_full
> CPU. That could result in RCU CPU stall warnings if a user task runs
> continuously on a given CPU for more than 21 seconds (60 seconds in
> some distros). And this can easily happen if the user has a CPU-bound
> thread that is the only runnable task on that CPU.
>
> So, yes, this does need some sort of resolution.
>
> The traditional approach is (as you surmise) to have only a single call
> to irq_enter() on exception entry and only a single call to irq_exit()
> on exception exit. If this is feasible, it is highly recommended.

Cool; that's roughly what I was expecting / hoping to hear!

> In theory, we could have that "1" in "nesting > 1" be a constant supplied
> by the architecture (you would want "3" if I remember correctly) but
> in practice could we please avoid this? For one thing, if there is
> some other path into the kernel for your architecture that does only a
> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
> a chance. It would need to compare against a different value depending
> on what exception showed up. Even if that cannot happen, it would be
> better if your architecture could remain in blissful ignorance of the
> colorful details of ->dynticks_nmi_nesting manipulations.

I completely agree. I think it's much harder to keep that in check than
to enforce a "once per architectural exception" policy in the arch code.

> Another approach would be for the arch code to supply RCU a function that
> it calls. If there is such a function (or perhaps better, if some new
> Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
> "1" as it does now. But you break it, you buy it! ;-)

I guess we could look at the exception regs and inspect the original
context, but that sounds like overkill...

I think the cleanest thing is to leave this to arch code, and have the
common IRQ code stay well clear. Unfortunately most architectures
(including arch/arm) still need the common IRQ code to handle this, so
we'll have to make that conditional on Kconfig, something like the below
(build+boot tested only).

If there are no objections, I'll go check who else needs the same
treatment (IIUC at least s390 will), and spin that as a real
patch/series.

Thanks,
Mark.

---->8----
diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..c59475e50e4c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD
config GENERIC_IDLE_POLL_SETUP
bool

+config ARCH_ENTERS_IRQ
+ bool
+ help
+ An architecture should select this when it performs irq entry
+ management itself (e.g. calling irq_enter() and irq_exit()).
+
config ARCH_HAS_FORTIFY_SOURCE
bool
help
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5c7ae4c3954b..fa6476bf2b4d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
select ARCH_ENABLE_MEMORY_HOTREMOVE
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
+ select ARCH_ENTERS_IRQ
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 4e3c29bb603c..6affa12222e0 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
EXPORT_SYMBOL_GPL(generic_handle_domain_irq);

#ifdef CONFIG_HANDLE_DOMAIN_IRQ
+
+#ifdef CONFIG_ARCH_ENTERS_IRQ
+#define handle_irq_enter()
+#define handle_irq_exit()
+#else
+#define handle_irq_enter() irq_enter()
+#define handle_irq_exit() irq_exit()
+#endif
+
/**
* handle_domain_irq - Invoke the handler for a HW irq belonging to a domain,
* usually for a root interrupt controller
@@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain,
struct irq_desc *desc;
int ret = 0;

- irq_enter();
+ handle_irq_enter();

/* The irqdomain code provides boundary checks */
desc = irq_resolve_mapping(domain, hwirq);
@@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain,
else
ret = -EINVAL;

- irq_exit();
+ handle_irq_exit();
set_irq_regs(old_regs);
return ret;
}

2021-09-28 08:38:48

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Tue, Sep 28, 2021 at 09:32:22AM +0100, Mark Rutland wrote:
> On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> > > > >
> > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > > > > After introducing arm64/kernel/entry_common.c which is akin to
> > > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
> > > > > > the following:
> > > > > > enter_from_kernel_mode()->rcu_irq_enter().
> > > > > > And
> > > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > > > > >
> > > > > > Besides redundance, based on code analysis, the redundance also raise
> > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > >
> > > > > Hmmm...
> > > > >
> > > > > The fundamental questionss are:
> > > > >
> > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> > > > >
> > > > > 2) Is it supposed to matter if this happens multiple times?
> > > > >
> > > > > For (1), I'd generally expect that this is supposed to happen in the
> > > > > arch/common entry code, since that itself (or the irqchip driver) could
> > > > > depend on RCU, and if that's the case thatn handle_domain_irq()
> > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > > > > the way we handle all other exceptions.
> > > > >
> > > > > For (2) I don't know whether the level of nesting is suppoosed to
> > > > > matter. I was under the impression it wasn't meant to matter in general,
> > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > > > > specific level of nesting.
> > > > >
> > > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > > > > doesn't sound right, at least...
> > > > >
> > > > > Thomas, Paul, thoughts?
> > > >
> > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> > > > be balanced. Normally, this is taken care of by the fact that irq_enter()
> > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
> > >
> > > Sure; I didn't mean to suggest those weren't balanced! The problem here
> > > is *nesting*. Due to the structure of our entry code and the core IRQ
> > > code, when handling an IRQ we have a sequence:
> > >
> > > irq_enter() // arch code
> > > irq_enter() // irq code
> > >
> > > < irq handler here >
> > >
> > > irq_exit() // irq code
> > > irq_exit() // arch code
> > >
> > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> > > expected result because of the additional nesting, since
> > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> > > is only incremented once per exception entry, when it does:
> > >
> > > /* Are we at first interrupt nesting level? */
> > > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > > if (nesting > 1)
> > > return false;
> > >
> > > What I'm trying to figure out is whether that expectation is legitimate,
> > > and assuming so, where the entry/exit should happen.
> >
> > Oooh...
> >
> > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
> > be unable to detect a userspace quiescent state for a non-nohz_full
> > CPU. That could result in RCU CPU stall warnings if a user task runs
> > continuously on a given CPU for more than 21 seconds (60 seconds in
> > some distros). And this can easily happen if the user has a CPU-bound
> > thread that is the only runnable task on that CPU.
> >
> > So, yes, this does need some sort of resolution.
> >
> > The traditional approach is (as you surmise) to have only a single call
> > to irq_enter() on exception entry and only a single call to irq_exit()
> > on exception exit. If this is feasible, it is highly recommended.
>
> Cool; that's roughly what I was expecting / hoping to hear!
>
> > In theory, we could have that "1" in "nesting > 1" be a constant supplied
> > by the architecture (you would want "3" if I remember correctly) but
> > in practice could we please avoid this? For one thing, if there is
> > some other path into the kernel for your architecture that does only a
> > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
> > a chance. It would need to compare against a different value depending
> > on what exception showed up. Even if that cannot happen, it would be
> > better if your architecture could remain in blissful ignorance of the
> > colorful details of ->dynticks_nmi_nesting manipulations.
>
> I completely agree. I think it's much harder to keep that in check than
> to enforce a "once per architectural exception" policy in the arch code.
>
> > Another approach would be for the arch code to supply RCU a function that
> > it calls. If there is such a function (or perhaps better, if some new
> > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
> > "1" as it does now. But you break it, you buy it! ;-)
>
> I guess we could look at the exception regs and inspect the original
> context, but it sounds overkill...
>
> I think the cleanest thing is to leave this to arch code, and have the
> common IRQ code stay well clear. Unfortunately most architectures
> (including arch/arm) still need the common IRQ code to handle this, so
> we'll have to make that conditional on Kconfig, something like the below
> (build+boot tested only).
>
> If there are no objections, I'll go check who else needs the same
> treatment (IIUC at least s390 will), and spin that as a real
> patch/series.

Ah, looking again this is basically Pingfan's patch 2, so ignore the
below, and I'll review Pingfan's patch instead.

Thanks,
Mark.

2021-09-28 08:56:05

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 3/5] kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch optional

On Fri, Sep 24, 2021 at 09:28:35PM +0800, Pingfan Liu wrote:
> handle_domain_irq() contains two major parts:
> -1. irq_enter()/irq_exit(), which serves as hooks for rcu and trace etc.
> -2. irq mapping and dispatching
>
> After the introduction of irqentry_enter()/irqentry_exit() and arch
> specific counterpart (e.g. arm64), roughly speaking, rcu_irq_enter() has
> already been called. Hence here comes requirement to move
> irq_enter/irq_exit out of handle_domain_irq(). And arches should handle
> about irq_enter_rcu()/irq_exit_rcu() by themself.
>
> Since there is still arches, which does not adopt irqentry_enter(), let
> irq_enter()/irq_exit() arch optional in handle_domain_irq().

The patch below looks good to me, but the commit message is a little
hard to follow. How about:

When an IRQ is taken, some accounting needs to be performed to enter
and exit IRQ context around the IRQ handler. Historically arch code
would leave this to the irqchip or core IRQ code, but these days we
want this to happen in exception entry code, and architectures such as
arm64 do this.

Currently handle_domain_irq() performs this entry/exit accounting, and
if used on an architecture where the entry code also does this, the
entry/exit accounting will be performed twice per IRQ. This is
problematic as core RCU code such as rcu_is_cpu_rrupt_from_idle()
depends on this happening once per IRQ, and will not detect quiescent
periods correctly, leading to stall warnings.

As irqchip drivers which use handle_domain_irq() need to work on
architectures with or without their own entry/exit accounting, this
patch makes handle_domain_irq() conditionally perform the entry
accounting depending on a new HAVE_ARCH_IRQENTRY Kconfig symbol that
architectures can select if they perform this entry accounting
themselves.

For architectures which do not select the symbol, there should be no
functional change as a result of this patch.

With that commit message:

Reviewed-by: Mark Rutland <[email protected]>

Mark.

> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Joey Gouly <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Yuichi Ito <[email protected]>
> Cc: [email protected]
> To: [email protected]
> ---
> kernel/irq/Kconfig | 3 +++
> kernel/irq/irqdesc.c | 4 ++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index fbc54c2a7f23..defa1db2d664 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -100,6 +100,9 @@ config IRQ_MSI_IOMMU
> config HANDLE_DOMAIN_IRQ
> bool
>
> +config HAVE_ARCH_IRQENTRY
> + bool
> +
> config IRQ_TIMINGS
> bool
>
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 4e3c29bb603c..fd5dd9d278b5 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -693,7 +693,9 @@ int handle_domain_irq(struct irq_domain *domain,
> struct irq_desc *desc;
> int ret = 0;
>
> +#ifndef CONFIG_HAVE_ARCH_IRQENTRY
> irq_enter();
> +#endif
>
> /* The irqdomain code provides boundary checks */
> desc = irq_resolve_mapping(domain, hwirq);
> @@ -702,7 +704,9 @@ int handle_domain_irq(struct irq_domain *domain,
> else
> ret = -EINVAL;
>
> +#ifndef CONFIG_HAVE_ARCH_IRQENTRY
> irq_exit();
> +#endif
> set_irq_regs(old_regs);
> return ret;
> }
> --
> 2.31.1
>

2021-09-28 09:14:51

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> The call to rcu_irq_enter() originated from gic_handle_irq() is
> redundant now, since arm64 has enter_from_kernel_mode() akin to
> irqenter_entry(), which has already called rcu_irq_enter().

Here I think you're referring to the call in handle_domain_irq(), but
that isn't clear from the commit message.

> Based on code analysis, the redundant can raise some mistake, e.g.
> rcu_data->dynticks_nmi_nesting inc 2, which causes
> rcu_is_cpu_rrupt_from_idle() unexpected.
>
> So eliminate the call to irq_enter() in handle_domain_irq(). And
> accordingly supplementing irq_enter_rcu().

We support many more irqchips on arm64, and GICv3 can be used on regular
32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
into the GICv3 driver specifically breaks other drivers on arm64 by
removing the call, and breaks the GICv3 driver on arm by adding a
duplicate call.

It looks like this should live in do_interrupt_handler() in
arch/arm64/kernel/entry-common.c, e.g.

| static void do_interrupt_handler(struct pt_regs *regs,
| void (*handler)(struct pt_regs *))
| {
| irq_enter_rcu();
| if (on_thread_stack())
| call_on_irq_stack(regs, handler);
| else
| handler(regs);
| irq_exit_rcu();
| }

... unless there's some problem with that?

Thanks,
Mark.

> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Joey Gouly <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yuichi Ito <[email protected]>
> Cc: [email protected]
> To: [email protected]
> ---
> arch/arm64/Kconfig | 1 +
> drivers/irqchip/irq-gic-v3.c | 2 ++
> 2 files changed, 3 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 5c7ae4c3954b..d29bae38a951 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -98,6 +98,7 @@ config ARM64
> select ARCH_HAS_UBSAN_SANITIZE_ALL
> select ARM_AMBA
> select ARM_ARCH_TIMER
> + select HAVE_ARCH_IRQENTRY
> select ARM_GIC
> select AUDIT_ARCH_COMPAT_GENERIC
> select ARM_GIC_V2M if PCI
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 89dcec902a82..906538fa8771 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -729,10 +729,12 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
> else
> isb();
>
> + irq_enter_rcu();
> if (handle_domain_irq(gic_data.domain, irqnr, regs)) {
> WARN_ONCE(true, "Unexpected interrupt received!\n");
> gic_deactivate_unhandled(irqnr);
> }
> + irq_exit_rcu();
> }
>
> static u32 gic_get_pribits(void)
> --
> 2.31.1
>

2021-09-28 09:56:00

by Sven Schnelle

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

Mark Rutland <[email protected]> writes:

> On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
>> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
>> > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
>> > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
>> > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
>> > > >
>> > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
>> > > > > After introducing arm64/kernel/entry_common.c which is akin to
>> > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
>> > > > > the following:
>> > > > > enter_from_kernel_mode()->rcu_irq_enter().
>> > > > > And
>> > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
>> > > > >
>> > > > > Besides redundance, based on code analysis, the redundance also raise
>> > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes
>> > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
>> > > >
>> > > > Hmmm...
>> > > >
>> > > > The fundamental questionss are:
>> > > >
>> > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
>> > > >
>> > > > 2) Is it supposed to matter if this happens multiple times?
>> > > >
>> > > > For (1), I'd generally expect that this is supposed to happen in the
>> > > > arch/common entry code, since that itself (or the irqchip driver) could
>> > > > depend on RCU, and if that's the case thatn handle_domain_irq()
>> > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
>> > > > the way we handle all other exceptions.
>> > > >
>> > > > For (2) I don't know whether the level of nesting is suppoosed to
>> > > > matter. I was under the impression it wasn't meant to matter in general,
>> > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
>> > > > specific level of nesting.
>> > > >
>> > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
>> > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
>> > > > doesn't sound right, at least...
>> > > >
>> > > > Thomas, Paul, thoughts?
>> > >
>> > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
>> > > be balanced. Normally, this is taken care of by the fact that irq_enter()
>> > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
>> > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
>> >
>> > Sure; I didn't mean to suggest those weren't balanced! The problem here
>> > is *nesting*. Due to the structure of our entry code and the core IRQ
>> > code, when handling an IRQ we have a sequence:
>> >
>> > irq_enter() // arch code
>> > irq_enter() // irq code
>> >
>> > < irq handler here >
>> >
>> > irq_exit() // irq code
>> > irq_exit() // arch code
>> >
>> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
>> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
>> > expected result because of the additional nesting, since
>> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
>> > is only incremented once per exception entry, when it does:
>> >
>> > /* Are we at first interrupt nesting level? */
>> > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
>> > if (nesting > 1)
>> > return false;
>> >
>> > What I'm trying to figure out is whether that expectation is legitimate,
>> > and assuming so, where the entry/exit should happen.
>>
>> Oooh...
>>
>> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
>> be unable to detect a userspace quiescent state for a non-nohz_full
>> CPU. That could result in RCU CPU stall warnings if a user task runs
>> continuously on a given CPU for more than 21 seconds (60 seconds in
>> some distros). And this can easily happen if the user has a CPU-bound
>> thread that is the only runnable task on that CPU.
>>
>> So, yes, this does need some sort of resolution.
>>
>> The traditional approach is (as you surmise) to have only a single call
>> to irq_enter() on exception entry and only a single call to irq_exit()
>> on exception exit. If this is feasible, it is highly recommended.
>
> Cool; that's roughly what I was expecting / hoping to hear!
>
>> In theory, we could have that "1" in "nesting > 1" be a constant supplied
>> by the architecture (you would want "3" if I remember correctly) but
>> in practice could we please avoid this? For one thing, if there is
>> some other path into the kernel for your architecture that does only a
>> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
>> a chance. It would need to compare against a different value depending
>> on what exception showed up. Even if that cannot happen, it would be
>> better if your architecture could remain in blissful ignorance of the
>> colorful details of ->dynticks_nmi_nesting manipulations.
>
> I completely agree. I think it's much harder to keep that in check than
> to enforce a "once per architectural exception" policy in the arch code.
>
>> Another approach would be for the arch code to supply RCU a function that
>> it calls. If there is such a function (or perhaps better, if some new
>> Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
>> "1" as it does now. But you break it, you buy it! ;-)
>
> I guess we could look at the exception regs and inspect the original
> context, but it sounds overkill...
>
> I think the cleanest thing is to leave this to arch code, and have the
> common IRQ code stay well clear. Unfortunately most architectures
> (including arch/arm) still need the common IRQ code to handle this, so
> we'll have to make that conditional on Kconfig, something like the below
> (build+boot tested only).
>
> If there are no objections, I'll go check who else needs the same
> treatment (IIUC at least s390 will), and spin that as a real
> patch/series.

Hmm, s390 doesn't use handle_domain_irq() and doesn't have
HANDLE_DOMAIN_IRQ set, so I don't think the patch below applies to s390.
However, I'll follow the code to make sure we're not calling
irq_enter()/irq_exit() twice.

> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8df1c7102643..c59475e50e4c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD
> config GENERIC_IDLE_POLL_SETUP
> bool
>
> +config ARCH_ENTERS_IRQ
> + bool
> + help
> + An architecture should select this when it performs irq entry
> + management itself (e.g. calling irq_enter() and irq_exit()).
> +
> config ARCH_HAS_FORTIFY_SOURCE
> bool
> help
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 5c7ae4c3954b..fa6476bf2b4d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -16,6 +16,7 @@ config ARM64
> select ARCH_ENABLE_MEMORY_HOTREMOVE
> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> + select ARCH_ENTERS_IRQ
> select ARCH_HAS_CACHE_LINE_SIZE
> select ARCH_HAS_DEBUG_VIRTUAL
> select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 4e3c29bb603c..6affa12222e0 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
> EXPORT_SYMBOL_GPL(generic_handle_domain_irq);
>
> #ifdef CONFIG_HANDLE_DOMAIN_IRQ
> +
> +#ifdef CONFIG_ARCH_ENTERS_IRQ
> +#define handle_irq_enter()
> +#define handle_irq_exit()
> +#else
> +#define handle_irq_enter() irq_enter()
> +#define handle_irq_exit() irq_exit()
> +#endif
> +
> /**
> * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain,
> * usually for a root interrupt controller
> @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain,
> struct irq_desc *desc;
> int ret = 0;
>
> - irq_enter();
> + handle_irq_enter();
>
> /* The irqdomain code provides boundary checks */
> desc = irq_resolve_mapping(domain, hwirq);
> @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain,
> else
> ret = -EINVAL;
>
> - irq_exit();
> + handle_irq_exit();
> set_irq_regs(old_regs);
> return ret;
> }

2021-09-28 10:29:01

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Tue, Sep 28, 2021 at 11:52:51AM +0200, Sven Schnelle wrote:
> Mark Rutland <[email protected]> writes:
>
> > On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
> >> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> >> > Sure; I didn't mean to suggest those weren't balanced! The problem here
> >> > is *nesting*. Due to the structure of our entry code and the core IRQ
> >> > code, when handling an IRQ we have a sequence:
> >> >
> >> > irq_enter() // arch code
> >> > irq_enter() // irq code
> >> >
> >> > < irq handler here >
> >> >
> >> > irq_exit() // irq code
> >> > irq_exit() // arch code
> >> >
> >> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> >> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> >> > expected result because of the additional nesting, since
> >> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> >> > is only incremented once per exception entry, when it does:
> >> >
> >> > /* Are we at first interrupt nesting level? */
> >> > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> >> > if (nesting > 1)
> >> > return false;
> >> >
> >> > What I'm trying to figure out is whether that expectation is legitimate,
> >> > and assuming so, where the entry/exit should happen.
> >>
> >> Oooh...
> >>
> >> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
> >> be unable to detect a userspace quiescent state for a non-nohz_full
> >> CPU. That could result in RCU CPU stall warnings if a user task runs
> >> continuously on a given CPU for more than 21 seconds (60 seconds in
> >> some distros). And this can easily happen if the user has a CPU-bound
> >> thread that is the only runnable task on that CPU.
> >>
> >> So, yes, this does need some sort of resolution.
> >>
> >> The traditional approach is (as you surmise) to have only a single call
> >> to irq_enter() on exception entry and only a single call to irq_exit()
> >> on exception exit. If this is feasible, it is highly recommended.
> >
> > Cool; that's roughly what I was expecting / hoping to hear!
> >
> >> In theory, we could have that "1" in "nesting > 1" be a constant supplied
> >> by the architecture (you would want "3" if I remember correctly) but
> >> in practice could we please avoid this? For one thing, if there is
> >> some other path into the kernel for your architecture that does only a
> >> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
> >> a chance. It would need to compare against a different value depending
> >> on what exception showed up. Even if that cannot happen, it would be
> >> better if your architecture could remain in blissful ignorance of the
> >> colorful details of ->dynticks_nmi_nesting manipulations.
> >
> > I completely agree. I think it's much harder to keep that in check than
> > to enforce a "once per architectural exception" policy in the arch code.
> >
> >> Another approach would be for the arch code to supply RCU a function that
> >> it calls. If there is such a function (or perhaps better, if some new
> >> Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
> >> "1" as it does now. But you break it, you buy it! ;-)
> >
> > I guess we could look at the exception regs and inspect the original
> > context, but it sounds overkill...
> >
> > I think the cleanest thing is to leave this to arch code, and have the
> > common IRQ code stay well clear. Unfortunately most architectures
> > (including arch/arm) still need the common IRQ code to handle this, so
> > we'll have to make that conditional on Kconfig, something like the below
> > (build+boot tested only).
> >
> > If there are no objections, I'll go check who else needs the same
> > treatment (IIUC at least s390 will), and spin that as a real
> > patch/series.
>
> Hmm, s390 doesn't use handle_domain_irq() and doesn't have
> HANDLE_DOMAIN_IRQ set. So i don't think the patch below applies to s390.
> However, i'll follow the code to make sure we're not calling
> irq_enter/irq_exit twice.

I wasn't clear, but for s390, my concern was that in do_io_irq() and
do_ext_irq() you have the sequence:

irqentry_enter() // calls rcu_irq_enter()
irq_enter(); // calls rcu_irq_enter() then irq_enter_rcu();

< handler>

irq_exit(); // calls __irq_exit_rcu then rcu_irq_exit();
irqentry_exit(); // calls rcu_irq_exit()

... and so IIUC you call rcu_irq_enter() and rcu_irq_exit() twice,
getting the same double-increment of `dynticks_nmi_nesting` per
interrupt, and the same potential problem with
rcu_is_cpu_rrupt_from_idle().

Thanks,
Mark.

2021-09-28 13:58:26

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of

On Tue, Sep 28, 2021 at 09:32:22AM +0100, Mark Rutland wrote:
> On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
> > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
> > > > >
> > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
> > > > > > After introducing arm64/kernel/entry_common.c which is akin to
> > > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
> > > > > > the following:
> > > > > > enter_from_kernel_mode()->rcu_irq_enter().
> > > > > > And
> > > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
> > > > > >
> > > > > > Besides redundance, based on code analysis, the redundance also raise
> > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > >
> > > > > Hmmm...
> > > > >
> > > > > The fundamental questionss are:
> > > > >
> > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
> > > > >
> > > > > 2) Is it supposed to matter if this happens multiple times?
> > > > >
> > > > > For (1), I'd generally expect that this is supposed to happen in the
> > > > > arch/common entry code, since that itself (or the irqchip driver) could
> > > > > depend on RCU, and if that's the case thatn handle_domain_irq()
> > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
> > > > > the way we handle all other exceptions.
> > > > >
> > > > > For (2) I don't know whether the level of nesting is suppoosed to
> > > > > matter. I was under the impression it wasn't meant to matter in general,
> > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
> > > > > specific level of nesting.
> > > > >
> > > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
> > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
> > > > > doesn't sound right, at least...
> > > > >
> > > > > Thomas, Paul, thoughts?
> > > >
> > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
> > > > be balanced. Normally, this is taken care of by the fact that irq_enter()
> > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly,
> > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
> > >
> > > Sure; I didn't mean to suggest those weren't balanced! The problem here
> > > is *nesting*. Due to the structure of our entry code and the core IRQ
> > > code, when handling an IRQ we have a sequence:
> > >
> > > irq_enter() // arch code
> > > irq_enter() // irq code
> > >
> > > < irq handler here >
> > >
> > > irq_exit() // irq code
> > > irq_exit() // arch code
> > >
> > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
> > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
> > > expected result because of the additional nesting, since
> > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
> > > is only incremented once per exception entry, when it does:
> > >
> > > /* Are we at first interrupt nesting level? */
> > > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > > if (nesting > 1)
> > > return false;
> > >
> > > What I'm trying to figure out is whether that expectation is legitimate,
> > > and assuming so, where the entry/exit should happen.
> >
> > Oooh...
> >
> > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
> > be unable to detect a userspace quiescent state for a non-nohz_full
> > CPU. That could result in RCU CPU stall warnings if a user task runs
> > continuously on a given CPU for more than 21 seconds (60 seconds in
> > some distros). And this can easily happen if the user has a CPU-bound
> > thread that is the only runnable task on that CPU.
> >
> > So, yes, this does need some sort of resolution.
> >
> > The traditional approach is (as you surmise) to have only a single call
> > to irq_enter() on exception entry and only a single call to irq_exit()
> > on exception exit. If this is feasible, it is highly recommended.
>
> Cool; that's roughly what I was expecting / hoping to hear!
>
> > In theory, we could have that "1" in "nesting > 1" be a constant supplied
> > by the architecture (you would want "3" if I remember correctly) but
> > in practice could we please avoid this? For one thing, if there is
> > some other path into the kernel for your architecture that does only a
> > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
> > a chance. It would need to compare against a different value depending
> > on what exception showed up. Even if that cannot happen, it would be
> > better if your architecture could remain in blissful ignorance of the
> > colorful details of ->dynticks_nmi_nesting manipulations.
>
> I completely agree. I think it's much harder to keep that in check than
> to enforce a "once per architectural exception" policy in the arch code.
>
> > Another approach would be for the arch code to supply RCU a function that
> > it calls. If there is such a function (or perhaps better, if some new
> > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to
> > "1" as it does now. But you break it, you buy it! ;-)
>
> I guess we could look at the exception regs and inspect the original
> context, but it sounds overkill...
>
> I think the cleanest thing is to leave this to arch code, and have the
> common IRQ code stay well clear. Unfortunately most architectures
> (including arch/arm) still need the common IRQ code to handle this, so
> we'll have to make that conditional on Kconfig, something like the below
> (build+boot tested only).
>
> If there are no objections, I'll go check who else needs the same
> treatment (IIUC at least s390 will), and spin that as a real
> patch/series.

This approach (whether from you or Pingfan) looks good to me!

Thanx, Paul

> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8df1c7102643..c59475e50e4c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD
> config GENERIC_IDLE_POLL_SETUP
> bool
>
> +config ARCH_ENTERS_IRQ
> + bool
> + help
> + An architecture should select this when it performs irq entry
> + management itself (e.g. calling irq_enter() and irq_exit()).
> +
> config ARCH_HAS_FORTIFY_SOURCE
> bool
> help
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 5c7ae4c3954b..fa6476bf2b4d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -16,6 +16,7 @@ config ARM64
> select ARCH_ENABLE_MEMORY_HOTREMOVE
> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> + select ARCH_ENTERS_IRQ
> select ARCH_HAS_CACHE_LINE_SIZE
> select ARCH_HAS_DEBUG_VIRTUAL
> select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 4e3c29bb603c..6affa12222e0 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
> EXPORT_SYMBOL_GPL(generic_handle_domain_irq);
>
> #ifdef CONFIG_HANDLE_DOMAIN_IRQ
> +
> +#ifdef CONFIG_ARCH_ENTERS_IRQ
> +#define handle_irq_enter()
> +#define handle_irq_exit()
> +#else
> +#define handle_irq_enter() irq_enter()
> +#define handle_irq_exit() irq_exit()
> +#endif
> +
> /**
> * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain,
> * usually for a root interrupt controller
> @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain,
> struct irq_desc *desc;
> int ret = 0;
>
> - irq_enter();
> + handle_irq_enter();
>
> /* The irqdomain code provides boundary checks */
> desc = irq_resolve_mapping(domain, hwirq);
> @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain,
> else
> ret = -EINVAL;
>
> - irq_exit();
> + handle_irq_exit();
> set_irq_regs(old_regs);
> return ret;
> }
>

2021-09-29 03:18:54

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 3/5] kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch optional

On Tue, Sep 28, 2021 at 09:55:00AM +0100, Mark Rutland wrote:
> On Fri, Sep 24, 2021 at 09:28:35PM +0800, Pingfan Liu wrote:
> > handle_domain_irq() contains two major parts:
> > -1. irq_enter()/irq_exit(), which serves as hooks for rcu and trace etc.
> > -2. irq mapping and dispatching
> >
> > After the introduction of irqentry_enter()/irqentry_exit() and arch
> > specific counterpart (e.g. arm64), roughly speaking, rcu_irq_enter() has
> > already been called. Hence here comes requirement to move
> > irq_enter/irq_exit out of handle_domain_irq(). And arches should handle
> > about irq_enter_rcu()/irq_exit_rcu() by themself.
> >
> > Since there is still arches, which does not adopt irqentry_enter(), let
> > irq_enter()/irq_exit() arch optional in handle_domain_irq().
>
> The patch below looks good to me, but the commit message is a little
> hard to follow. How about:
>
> When an IRQ is taken, some accounting needs to be performed to enter
> and exit IRQ context around the IRQ handler. Historically arch code
> would leave this to the irqchip or core IRQ code, but these days we
> want this to happen in exception entry code, and architectures such as
> arm64 do this.
>
> Currently handle_domain_irq() performs this entry/exit accounting, and
> if used on an architecture where the entry code also does this, the
> entry/exit accounting will be performed twice per IRQ. This is
> problematic as core RCU code such as rcu_is_cpu_rrupt_from_idle()
> depends on this happening once per IRQ, and will not detect quiescent
> periods correctly, leading to stall warnings.
>
> As irqchip drivers which use handle_domain_irq() need to work on
> architectures with or without their own entry/exit accounting, this
> patch makes handle_domain_irq() conditionally perform the entry
> accounting depending on a new HAVE_ARCH_IRQENTRY Kconfig symbol that
> architectures can select if they perform this entry accounting
> themselves.
>
> For architectures which do not select the symbol, there should be no
> functional change as a result of this patch.
>
Thank you for taking the time to improve the commit message. It looks great, and
I will use it in the next version. I will keep working on writing better commit
messages.

> With that commit message:
>
> Reviewed-by: Mark Rutland <[email protected]>
>
Thanks,

Pingfan

2021-09-29 03:56:02

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > irqenter_entry(), which has already called rcu_irq_enter().
>
> Here I think you're referring to the call in handle_domain_irq(), but
> that isn't clear from the commit message.
>
Yes, and I will make it clear in V2.

> > Based on code analysis, the redundant can raise some mistake, e.g.
> > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > rcu_is_cpu_rrupt_from_idle() unexpected.
> >
> > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > accordingly supplementing irq_enter_rcu().
>
> We support many more irqchips on arm64, and GICv3 can be used on regular
> 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> into the GICv3 driver specifically breaks other drivers on arm64 by
> removing the call, and breaks the GICv3 driver on arm by adding a
> duplicate call.
>
Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY

> It looks like this should live in do_interrupt_handler() in
> arch/arm64/kernel/entry-common.c, e.g.
>
> | static void do_interrupt_handler(struct pt_regs *regs,
> | void (*handler)(struct pt_regs *))
> | {
> | irq_enter_rcu();
> | if (on_thread_stack())
> | call_on_irq_stack(regs, handler);
> | else
> | handler(regs);
> | irq_exit_rcu();
> | }
>
> ... unless there's some problem with that?
>
Yeah, do_interrupt_handler() is a more suitable place. But to resolve
the performance regression of the rescheduling IPI [1], we badly need to
distinguish the irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
for the context). So hosting the code in GICv3 is a compromise.

Any better ideas?


[1]: https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2]: https://lore.kernel.org/linux-arm-kernel/[email protected]/



Thanks,

Pingfan

2021-09-29 07:27:17

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCHv2 5/5] irqchip/GICv3: make reschedule-ipi light weight

On Fri, 24 Sep 2021 14:28:37 +0100,
Pingfan Liu <[email protected]> wrote:
>
> To achieve the light weight as
> DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi) on x86, it had
> better treat irqnr differently at the frontend. And let IPI_RESCHEDULE
> call __irq_enter_raw() instead of irq_enter_rcu().
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Joey Gouly <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Yuichi Ito <[email protected]>
> Cc: [email protected]
> To: [email protected]
> ---
> drivers/irqchip/irq-gic-v3.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 906538fa8771..593d4539a209 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -709,6 +709,9 @@ static void gic_handle_nmi(struct pt_regs *regs)
> gic_deactivate_unhandled(irqnr);
> }
>
> +/* RESCHEDULE IPI hwirq nr is 0, and the only raw one */
> +static unsigned long raw_interrupt_mask = 1;

I'm afraid you have the wrong end of the stick. This isn't a GIC
property. This is an architecture decision, and only the architecture
can expose what they want.

M.

--
Without deviation from the norm, progress is not possible.

2021-09-29 07:54:22

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, 29 Sep 2021 04:10:11 +0100,
Pingfan Liu <[email protected]> wrote:
>
> On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > irqenter_entry(), which has already called rcu_irq_enter().
> >
> > Here I think you're referring to the call in handle_domain_irq(), but
> > that isn't clear from the commit message.
> >
> Yes, and I will make it clear in V2.
>
> > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > >
> > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > accordingly supplementing irq_enter_rcu().
> >
> > We support many more irqchips on arm64, and GICv3 can be used on regular
> > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > into the GICv3 driver specifically breaks other drivers on arm64 by
> > removing the call, and breaks the GICv3 driver on arm by adding a
> > duplicate call.
> >
> Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
>
> > It looks like this should live in do_interrupt_handler() in
> > arch/arm64/kernel/entry-common.c, e.g.
> >
> > | static void do_interrupt_handler(struct pt_regs *regs,
> > | void (*handler)(struct pt_regs *))
> > | {
> > | irq_enter_rcu();
> > | if (on_thread_stack())
> > | call_on_irq_stack(regs, handler);
> > | else
> > | handler(regs);
> > | irq_exit_rcu();
> > | }
> >
> > ... unless there's some problem with that?
> >
> Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> the performance regression of rescheduling IPI [1], it is badly demanded to
> distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> for the context). So it is a compromise to host the code in GICv3.
>
> Any good idea?

There is no way we are going to single out a particular interrupt
controller. As for the "regression", we'll have to look at the numbers
once we have fixed the whole infrastructure.

M.

--
Without deviation from the norm, progress is not possible.

2021-09-29 08:34:41

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, Sep 29, 2021 at 08:20:35AM +0100, Marc Zyngier wrote:
> On Wed, 29 Sep 2021 04:10:11 +0100,
> Pingfan Liu <[email protected]> wrote:
> >
> > On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > > irqenter_entry(), which has already called rcu_irq_enter().
> > >
> > > Here I think you're referring to the call in handle_domain_irq(), but
> > > that isn't clear from the commit message.
> > >
> > Yes, and I will make it clear in V2.
> >
> > > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > >
> > > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > > accordingly supplementing irq_enter_rcu().
> > >
> > > We support many more irqchips on arm64, and GICv3 can be used on regular
> > > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > > into the GICv3 driver specifically breaks other drivers on arm64 by
> > > removing the call, and breaks the GICv3 driver on arm by adding a
> > > duplicate call.
> > >
> > Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
> >
> > > It looks like this should live in do_interrupt_handler() in
> > > arch/arm64/kernel/entry-common.c, e.g.
> > >
> > > | static void do_interrupt_handler(struct pt_regs *regs,
> > > | void (*handler)(struct pt_regs *))
> > > | {
> > > | irq_enter_rcu();
> > > | if (on_thread_stack())
> > > | call_on_irq_stack(regs, handler);
> > > | else
> > > | handler(regs);
> > > | irq_exit_rcu();
> > > | }
> > >
> > > ... unless there's some problem with that?
> > >
> > Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> > the performance regression of rescheduling IPI [1], it is badly demanded to
> > distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> > for the context). So it is a compromise to host the code in GICv3.
> >
> > Any good idea?
>
> There is no way we are going to single out a particular interrupt
> controller. As for the "regression", we'll have to look at the numbers
> once we have fixed the whole infrastructure.
>
But I just realized that at present, gic_handle_nmi() sits behind
gic_handle_irq(). So calling irq_enter_rcu() in do_interrupt_handler()
would mis-account a pseudo-NMI as a normal interrupt.

And going through drivers/irqchip/irq-gic*, I think only two files need
to be handled: irq-gic.c and irq-gic-v3.c.

Thanks,

Pingfan

2021-09-29 08:36:24

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 5/5] irqchip/GICv3: make reschedule-ipi light weight

On Wed, Sep 29, 2021 at 08:24:34AM +0100, Marc Zyngier wrote:
> On Fri, 24 Sep 2021 14:28:37 +0100,
> Pingfan Liu <[email protected]> wrote:
> >
> > To achieve the light weight as
> > DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi) on x86, it had
> > better treat irqnr differently at the frontend. And let IPI_RESCHEDULE
> > call __irq_enter_raw() instead of irq_enter_rcu().
> >
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Will Deacon <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: Marc Zyngier <[email protected]>
> > Cc: Joey Gouly <[email protected]>
> > Cc: Sami Tolvanen <[email protected]>
> > Cc: Julien Thierry <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Yuichi Ito <[email protected]>
> > Cc: [email protected]
> > To: [email protected]
> > ---
> > drivers/irqchip/irq-gic-v3.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 906538fa8771..593d4539a209 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -709,6 +709,9 @@ static void gic_handle_nmi(struct pt_regs *regs)
> > gic_deactivate_unhandled(irqnr);
> > }
> >
> > +/* RESCHEDULE IPI hwirq nr is 0, and the only raw one */
> > +static unsigned long raw_interrupt_mask = 1;
>
> I'm afraid you have the wrong end of the stick. This isn't a GIC
> property. This is an architecture decision, and only the architecture
> can expose what they want.
>
Could it be done by exporting an interface, int (*get_irq_nr)(struct pt_regs *),
to do_interrupt_handler()? Then it could be seen as an architecture
decision.


Thanks,

Pingfan
>
> --
> Without deviation from the norm, progress is not possible.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2021-09-29 10:44:18

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, Sep 29, 2021 at 04:27:11PM +0800, Pingfan Liu wrote:
> On Wed, Sep 29, 2021 at 08:20:35AM +0100, Marc Zyngier wrote:
> > On Wed, 29 Sep 2021 04:10:11 +0100,
> > Pingfan Liu <[email protected]> wrote:
> > >
> > > On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > > > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > > > irqenter_entry(), which has already called rcu_irq_enter().
> > > >
> > > > Here I think you're referring to the call in handle_domain_irq(), but
> > > > that isn't clear from the commit message.
> > > >
> > > Yes, and I will make it clear in V2.
> > >
> > > > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > >
> > > > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > > > accordingly supplementing irq_enter_rcu().
> > > >
> > > > We support many more irqchips on arm64, and GICv3 can be used on regular
> > > > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > > > into the GICv3 driver specifically breaks other drivers on arm64 by
> > > > removing the call, and breaks the GICv3 driver on arm by adding a
> > > > duplicate call.
> > > >
> > > Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
> > >
> > > > It looks like this should live in do_interrupt_handler() in
> > > > arch/arm64/kernel/entry-common.c, e.g.
> > > >
> > > > | static void do_interrupt_handler(struct pt_regs *regs,
> > > > | void (*handler)(struct pt_regs *))
> > > > | {
> > > > | irq_enter_rcu();
> > > > | if (on_thread_stack())
> > > > | call_on_irq_stack(regs, handler);
> > > > | else
> > > > | handler(regs);
> > > > | irq_exit_rcu();
> > > > | }
> > > >
> > > > ... unless there's some problem with that?
> > > >
> > > Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> > > the performance regression of rescheduling IPI [1], it is badly demanded to
> > > distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> > > for the context). So it is a compromise to host the code in GICv3.
> > >
> > > Any good idea?
> >
> > There is no way we are going to single out a particular interrupt
> > controller. As for the "regression", we'll have to look at the numbers
> > once we have fixed the whole infrastructure.
> >
> But I just realize that at present, gic_handle_nmi() sits behind
> gic_handle_irq(). So it will make an mistaken for accounting of normal
> interrupt if calling irq_enter_rcu() in do_interrupt_handler().

We can restructure entry-common.c to avoid that if necessary.

TBH, the more I see problems in this area the more I want to rip out the
pNMI bits...

> And going through drivers/irqchip/irq-chip-gic*, I think there are only
> two files should be handled: irq-gic.c and irq-gic-v3.c.

There are irqchips other than GICv2 and GICv3 that are used as the root
irqchip on arm64. For example, Raspberry Pi 3 uses
drivers/irqchip/irq-bcm2836.c as its root irqchip.

Thanks,
Mark.

2021-09-29 11:55:20

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, Sep 29, 2021 at 10:23:58AM +0100, Mark Rutland wrote:
> On Wed, Sep 29, 2021 at 04:27:11PM +0800, Pingfan Liu wrote:
> > On Wed, Sep 29, 2021 at 08:20:35AM +0100, Marc Zyngier wrote:
> > > On Wed, 29 Sep 2021 04:10:11 +0100,
> > > Pingfan Liu <[email protected]> wrote:
> > > >
> > > > On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > > > > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > > > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > > > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > > > > irqenter_entry(), which has already called rcu_irq_enter().
> > > > >
> > > > > Here I think you're referring to the call in handle_domain_irq(), but
> > > > > that isn't clear from the commit message.
> > > > >
> > > > Yes, and I will make it clear in V2.
> > > >
> > > > > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > > > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > > >
> > > > > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > > > > accordingly supplementing irq_enter_rcu().
> > > > >
> > > > > We support many more irqchips on arm64, and GICv3 can be used on regular
> > > > > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > > > > into the GICv3 driver specifically breaks other drivers on arm64 by
> > > > > removing the call, and breaks the GICv3 driver on arm by adding a
> > > > > duplicate call.
> > > > >
> > > > Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
> > > >
> > > > > It looks like this should live in do_interrupt_handler() in
> > > > > arch/arm64/kernel/entry-common.c, e.g.
> > > > >
> > > > > | static void do_interrupt_handler(struct pt_regs *regs,
> > > > > | void (*handler)(struct pt_regs *))
> > > > > | {
> > > > > | irq_enter_rcu();
> > > > > | if (on_thread_stack())
> > > > > | call_on_irq_stack(regs, handler);
> > > > > | else
> > > > > | handler(regs);
> > > > > | irq_exit_rcu();
> > > > > | }
> > > > >
> > > > > ... unless there's some problem with that?
> > > > >
> > > > Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> > > > the performance regression of rescheduling IPI [1], it is badly demanded to
> > > > distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> > > > for the context). So it is a compromise to host the code in GICv3.
> > > >
> > > > Any good idea?
> > >
> > > There is no way we are going to single out a particular interrupt
> > > controller. As for the "regression", we'll have to look at the numbers
> > > once we have fixed the whole infrastructure.
> > >
> > But I just realize that at present, gic_handle_nmi() sits behind
> > gic_handle_irq(). So it will make an mistaken for accounting of normal
> > interrupt if calling irq_enter_rcu() in do_interrupt_handler().
>
> We can restructure entry-common.c to avoid that if necessary.
>
> TBH, the more I see problems in this area the more I want to rip out the
> pNMI bits...
>
Could you give further comments and some guidance on my reply to [1/5],
which can help decide pNMI handling at this earlier stage? If things can
go that way, then everything can be fixed more easily.

I think the irqchip's capability can be abstracted by exporting two
interfaces:
void (*handle_arch_nmi)(struct pt_regs *regs);
bool (*interrupt_is_nmi)(void);
And each irqchip can select whether to implement or not.

> > And going through drivers/irqchip/irq-chip-gic*, I think there are only
> > two files should be handled: irq-gic.c and irq-gic-v3.c.
>
> That are irqchips other than GICv2 and GICv3 that are used as the root
> irqchip on arm64. For example, Raspberry Pi 3 uses
> drivers/irqchip/irq-bcm2836.c as its root irqchip.
>

Thanks for the explanation. The situation is worse than I had thought,
and there is no way out in that direction.


Thanks,

Pingfan

2021-09-29 15:58:32

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, Sep 29, 2021 at 10:23:58AM +0100, Mark Rutland wrote:
> On Wed, Sep 29, 2021 at 04:27:11PM +0800, Pingfan Liu wrote:
> > On Wed, Sep 29, 2021 at 08:20:35AM +0100, Marc Zyngier wrote:
> > > On Wed, 29 Sep 2021 04:10:11 +0100,
> > > Pingfan Liu <[email protected]> wrote:
> > > >
> > > > On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > > > > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > > > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > > > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > > > > irqenter_entry(), which has already called rcu_irq_enter().
> > > > >
> > > > > Here I think you're referring to the call in handle_domain_irq(), but
> > > > > that isn't clear from the commit message.
> > > > >
> > > > Yes, and I will make it clear in V2.
> > > >
> > > > > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > > > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > > >
> > > > > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > > > > accordingly supplementing irq_enter_rcu().
> > > > >
> > > > > We support many more irqchips on arm64, and GICv3 can be used on regular
> > > > > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > > > > into the GICv3 driver specifically breaks other drivers on arm64 by
> > > > > removing the call, and breaks the GICv3 driver on arm by adding a
> > > > > duplicate call.
> > > > >
> > > > Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
> > > >
> > > > > It looks like this should live in do_interrupt_handler() in
> > > > > arch/arm64/kernel/entry-common.c, e.g.
> > > > >
> > > > > | static void do_interrupt_handler(struct pt_regs *regs,
> > > > > | void (*handler)(struct pt_regs *))
> > > > > | {
> > > > > | irq_enter_rcu();
> > > > > | if (on_thread_stack())
> > > > > | call_on_irq_stack(regs, handler);
> > > > > | else
> > > > > | handler(regs);
> > > > > | irq_exit_rcu();
> > > > > | }
> > > > >
> > > > > ... unless there's some problem with that?
> > > > >
> > > > Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> > > > the performance regression of rescheduling IPI [1], it is badly demanded to
> > > > distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> > > > for the context). So it is a compromise to host the code in GICv3.
> > > >
> > > > Any good idea?
> > >
> > > There is no way we are going to single out a particular interrupt
> > > controller. As for the "regression", we'll have to look at the numbers
> > > once we have fixed the whole infrastructure.
> > >
> > But I just realize that at present, gic_handle_nmi() sits behind
> > gic_handle_irq(). So it will make an mistaken for accounting of normal
> > interrupt if calling irq_enter_rcu() in do_interrupt_handler().
>
> We can restructure entry-common.c to avoid that if necessary.
>
> TBH, the more I see problems in this area the more I want to rip out the
> pNMI bits...
>

Setting aside the undetermined pNMI question, what about a partial patch like the following:

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe..3c46f8fd0e2e 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -219,17 +219,20 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
lockdep_hardirqs_on(CALLER_ADDR0);
}

-static void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
+static bool noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+ if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs)) {
arm64_enter_nmi(regs);
- else
+ return false;
+ } else {
enter_from_kernel_mode(regs);
+ return true;
+ }
}

-static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
+static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs, bool is_irq)
{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+ if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !is_irq)
arm64_exit_nmi(regs);
else
exit_to_kernel_mode(regs);
@@ -261,12 +264,19 @@ static void __sched arm64_preempt_schedule_irq(void)
}

static void do_interrupt_handler(struct pt_regs *regs,
- void (*handler)(struct pt_regs *))
+ void (*handler)(struct pt_regs *),
+ bool is_irq)
{
+ if (likely(is_irq))
+ irq_enter_rcu();
+
if (on_thread_stack())
call_on_irq_stack(regs, handler);
else
handler(regs);
+
+ if (likely(is_irq))
+ irq_exit_rcu();
}

extern void (*handle_arch_irq)(struct pt_regs *);
@@ -435,10 +445,12 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
static void noinstr el1_interrupt(struct pt_regs *regs,
void (*handler)(struct pt_regs *))
{
+ bool is_irq;
+
write_sysreg(DAIF_PROCCTX_NOIRQ, daif);

- enter_el1_irq_or_nmi(regs);
- do_interrupt_handler(regs, handler);
+ is_irq = enter_el1_irq_or_nmi(regs);
+ do_interrupt_handler(regs, handler, is_irq);

/*
* Note: thread_info::preempt_count includes both thread_info::count
@@ -449,7 +461,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
READ_ONCE(current_thread_info()->preempt_count) == 0)
arm64_preempt_schedule_irq();

- exit_el1_irq_or_nmi(regs);
+ exit_el1_irq_or_nmi(regs, is_irq);
}

asmlinkage void noinstr el1h_64_irq_handler(struct pt_regs *regs)
--
2.31.1


And the remaining do_interrupt_handler() call sites in entry-common.c
should be converted accordingly.


Thanks,

Pingfan

2021-09-29 18:05:43

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 4/5] irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64

On Wed, Sep 29, 2021 at 10:29:09PM +0800, Pingfan Liu wrote:
> On Wed, Sep 29, 2021 at 10:23:58AM +0100, Mark Rutland wrote:
> > On Wed, Sep 29, 2021 at 04:27:11PM +0800, Pingfan Liu wrote:
> > > On Wed, Sep 29, 2021 at 08:20:35AM +0100, Marc Zyngier wrote:
> > > > On Wed, 29 Sep 2021 04:10:11 +0100,
> > > > Pingfan Liu <[email protected]> wrote:
> > > > >
> > > > > On Tue, Sep 28, 2021 at 10:10:53AM +0100, Mark Rutland wrote:
> > > > > > On Fri, Sep 24, 2021 at 09:28:36PM +0800, Pingfan Liu wrote:
> > > > > > > The call to rcu_irq_enter() originated from gic_handle_irq() is
> > > > > > > redundant now, since arm64 has enter_from_kernel_mode() akin to
> > > > > > > irqenter_entry(), which has already called rcu_irq_enter().
> > > > > >
> > > > > > Here I think you're referring to the call in handle_domain_irq(), but
> > > > > > that isn't clear from the commit message.
> > > > > >
> > > > > Yes, and I will make it clear in V2.
> > > > >
> > > > > > > Based on code analysis, the redundant can raise some mistake, e.g.
> > > > > > > rcu_data->dynticks_nmi_nesting inc 2, which causes
> > > > > > > rcu_is_cpu_rrupt_from_idle() unexpected.
> > > > > > >
> > > > > > > So eliminate the call to irq_enter() in handle_domain_irq(). And
> > > > > > > accordingly supplementing irq_enter_rcu().
> > > > > >
> > > > > > We support many more irqchips on arm64, and GICv3 can be used on regular
> > > > > > 32-bit arm, so this isn't right. Moving the irq_enter_rcu() call
> > > > > > into the GICv3 driver specifically breaks other drivers on arm64 by
> > > > > > removing the call, and breaks the GICv3 driver on arm by adding a
> > > > > > duplicate call.
> > > > > >
> > > > > Oops. I forgot to protect the code in GICv3 with CONFIG_HAVE_ARCH_IRQENTRY
> > > > >
> > > > > > It looks like this should live in do_interrupt_handler() in
> > > > > > arch/arm64/kernel/entry-common.c, e.g.
> > > > > >
> > > > > > | static void do_interrupt_handler(struct pt_regs *regs,
> > > > > > | void (*handler)(struct pt_regs *))
> > > > > > | {
> > > > > > | irq_enter_rcu();
> > > > > > | if (on_thread_stack())
> > > > > > | call_on_irq_stack(regs, handler);
> > > > > > | else
> > > > > > | handler(regs);
> > > > > > | irq_exit_rcu();
> > > > > > | }
> > > > > >
> > > > > > ... unless there's some problem with that?
> > > > > >
> > > > > Yeah, do_interrupt_handler() is a more suitable place. But to resolve
> > > > > the performance regression of rescheduling IPI [1], it is badly demanded to
> > > > > distinguish irqnr before calling irq_enter_rcu() (please see 5/5 and [2]
> > > > > for the context). So it is a compromise to host the code in GICv3.
> > > > >
> > > > > Any good idea?
> > > >
> > > > There is no way we are going to single out a particular interrupt
> > > > controller. As for the "regression", we'll have to look at the numbers
> > > > once we have fixed the whole infrastructure.
> > > >
> > > But I just realized that, at present, gic_handle_nmi() sits behind
> > > gic_handle_irq(). So calling irq_enter_rcu() in do_interrupt_handler()
> > > would result in mistaken accounting for a normal interrupt.
> >
> > We can restructure entry-common.c to avoid that if necessary.
> >
> > TBH, the more I see problems in this area the more I want to rip out the
> > pNMI bits...
> >
>
> Overlooking the undetermined pNMI for now, what about a partial patch like the following:
>
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 32f9796c4ffe..3c46f8fd0e2e 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -219,17 +219,20 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
> lockdep_hardirqs_on(CALLER_ADDR0);
> }
>
> -static void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
> +static bool noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
> {
> - if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> + if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs)) {
> arm64_enter_nmi(regs);
> - else
> + return false;
> + } else {
> enter_from_kernel_mode(regs);
> + return true;
> + }
> }
>
> -static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
> +static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs, bool is_irq)
> {
> - if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
> + if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !is_irq)
> arm64_exit_nmi(regs);
> else
> exit_to_kernel_mode(regs);
> @@ -261,12 +264,19 @@ static void __sched arm64_preempt_schedule_irq(void)
> }
>
> static void do_interrupt_handler(struct pt_regs *regs,
> - void (*handler)(struct pt_regs *))
> + void (*handler)(struct pt_regs *),
> + bool is_irq)
> {
> + if (likely(is_irq))
> + irq_enter_rcu();
> +
> if (on_thread_stack())
> call_on_irq_stack(regs, handler);
> else
> handler(regs);
> +
> + if (likely(is_irq))
> + irq_exit_rcu();
> }

I'm not keen on having a bunch of conditional calls like this, since
it's easy to get wrong, and static analyzers are liable to complain if
they don't accurately track the state of the condition variable across
multiple blocks. TBH, I wasn't too keen on the *_irq_or_nmi() helpers
after I wrote them.

I reckon structurally the below would be better; we can add the
irq_{enter,exit}_rcu() calls in __el1_interrupt() and el0_interrupt(),
around the calls to do_interrupt_handler(), and it makes it clearer
that we won't preempt if IRQs were masked in the interrupted context
(which the preempt_count manipulation in __nmi_{enter,exit} ensures
today).

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe..7af7ddbea4b6 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -219,22 +219,6 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
lockdep_hardirqs_on(CALLER_ADDR0);
}

-static void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
-{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
- arm64_enter_nmi(regs);
- else
- enter_from_kernel_mode(regs);
-}
-
-static void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
-{
- if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
- arm64_exit_nmi(regs);
- else
- exit_to_kernel_mode(regs);
-}
-
static void __sched arm64_preempt_schedule_irq(void)
{
lockdep_assert_irqs_disabled();
@@ -432,12 +416,18 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
}
}

-static void noinstr el1_interrupt(struct pt_regs *regs,
- void (*handler)(struct pt_regs *))
+static __always_inline void
+__el1_pnmi(struct pt_regs *regs, void (*handler)(struct pt_regs *))
{
- write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
+ arm64_enter_nmi(regs);
+ do_interrupt_handler(regs, handler);
+ arm64_exit_nmi(regs);
+}

- enter_el1_irq_or_nmi(regs);
+static __always_inline void
+__el1_interrupt(struct pt_regs *regs, void (*handler)(struct pt_regs *))
+{
+ enter_from_kernel_mode(regs);
do_interrupt_handler(regs, handler);

/*
@@ -449,7 +439,18 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
READ_ONCE(current_thread_info()->preempt_count) == 0)
arm64_preempt_schedule_irq();

- exit_el1_irq_or_nmi(regs);
+ exit_to_kernel_mode(regs);
+}
+
+static void noinstr el1_interrupt(struct pt_regs *regs,
+ void (*handler)(struct pt_regs *))
+{
+ write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
+
+ if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+ __el1_pnmi(regs, handler);
+ else
+ __el1_interrupt(regs, handler);
}

asmlinkage void noinstr el1h_64_irq_handler(struct pt_regs *regs)

2021-09-30 18:21:23

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > the NMI. This will cause a mistaken account for irq.
> >
> Sorry about the confusing word "account", it should be "lockdep/rcu/.."
>
> > Can you please explain this in more detail? It's not clear which
> > specific case you mean when you say "NMI interrupts an irq", as that
> > could mean a number of distinct scenarios.
> >
> > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > causes a new exception we'll do the right thing. So either I'm missing a
> > subtlety or you're describing a different scenario..
> >
> > Note that the entry code is only trying to distinguish between:
> >
> > a) This exception is *definitely* an NMI (because regular interrupts
> > were masked).
> >
> > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > distinguished until we acknowledge the interrupt), so we treat it as
> > an IRQ for now.
> >
> b) is the aim.
>
> At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> which does not call rcu_irq_enter_check_tick(). So it is not proper to
> "treat it as an IRQ for now"

I'm struggling to understand the problem here. What is "not proper", and
why?

Do you think there's a correctness problem, or that we're doing more
work than necessary?

If you could give a specific example of a problem, it would really help.

I'm aware that we do more work than strictly necessary when we take a
pNMI from a context with IRQs enabled, but that's how we'd intended this
to work, as it's vastly simpler to manage the state that way. Unless
there's a real problem with that approach I'd prefer to leave it as-is.

Thanks,
Mark.

2021-10-08 04:06:14

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

Sorry that I missed this message and I am just back from a long
festival.

Adding Paul for RCU guidance.

On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > the NMI. This will cause a mistaken account for irq.
> > >
> > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> >
> > > Can you please explain this in more detail? It's not clear which
> > > specific case you mean when you say "NMI interrupts an irq", as that
> > > could mean a number of distinct scenarios.
> > >
> > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > causes a new exception we'll do the right thing. So either I'm missing a
> > > subtlety or you're describing a different scenario..
> > >
> > > Note that the entry code is only trying to distinguish between:
> > >
> > > a) This exception is *definitely* an NMI (because regular interrupts
> > > were masked).
> > >
> > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > distinguished until we acknowledge the interrupt), so we treat it as
> > > an IRQ for now.
> > >
> > b) is the aim.
> >
> > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > "treat it as an IRQ for now"
>
> I'm struggling to understand the problem here. What is "not proper", and
> why?
>
> Do you think there's a correctness problem, or that we're doing more
> work than necessary?
>
I had thought it just did redundant accounting. But after revisiting the
RCU code, I think it exposes a real bug.

> If you could give a specific example of a problem, it would really help.
>
Refer to rcu_nmi_enter(), which can be called by
enter_from_kernel_mode():

||noinstr void rcu_nmi_enter(void)
||{
|| ...
|| if (rcu_dynticks_curr_cpu_in_eqs()) {
||
|| if (!in_nmi())
|| rcu_dynticks_task_exit();
||
|| // RCU is not watching here ...
|| rcu_dynticks_eqs_exit();
|| // ... but is watching here.
||
|| if (!in_nmi()) {
|| instrumentation_begin();
|| rcu_cleanup_after_idle();
|| instrumentation_end();
|| }
||
|| instrumentation_begin();
|| // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
|| instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
|| // instrumentation for the noinstr rcu_dynticks_eqs_exit()
|| instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
||
|| incby = 1;
|| } else if (!in_nmi()) {
|| instrumentation_begin();
|| rcu_irq_enter_check_tick();
|| } else {
|| instrumentation_begin();
|| }
|| ...
||}

There are 3 pieces of code put under the protection of if (!in_nmi()).
At least the last one, rcu_irq_enter_check_tick(), can trigger a hard
lockup bug, because it takes a spinlock with IRQs off via
"raw_spin_lock_rcu_node(rdp->mynode)", but a pNMI can breach that
protection. The same scenario applies to
rcu_nmi_exit()->rcu_prepare_for_idle().

As for the first two "if (!in_nmi())" checks, I have no idea why they
are needed, other than to keep an NMI from breaching spin_lock_irq().
Hope Paul can give some guidance.


Thanks,

Pingfan


> I'm aware that we do more work than strictly necessary when we take a
> pNMI from a context with IRQs enabled, but that's how we'd intended this
> to work, as it's vastly simpler to manage the state that way. Unless
> there's a real problem with that approach I'd prefer to leave it as-is.
>
> Thanks,
> Mark.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

2021-10-08 14:59:31

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> Sorry that I missed this message and I am just back from a long
> festival.
>
> Adding Paul for RCU guidance.
>
> On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > the NMI. This will cause a mistaken account for irq.
> > > >
> > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > >
> > > > Can you please explain this in more detail? It's not clear which
> > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > could mean a number of distinct scenarios.
> > > >
> > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > subtlety or you're describing a different scenario..
> > > >
> > > > Note that the entry code is only trying to distinguish between:
> > > >
> > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > were masked).
> > > >
> > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > distinguished until we acknowledge the interrupt), so we treat it as
> > > > an IRQ for now.
> > > >
> > > b) is the aim.
> > >
> > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > "treat it as an IRQ for now"
> >
> > I'm struggling to understand the problem here. What is "not proper", and
> > why?
> >
> > Do you think there's a correctness problem, or that we're doing more
> > work than necessary?
> >
> I had thought it just did redundant accounting. But after revisiting RCU
> code, I think it confronts a real bug.
>
> > If you could give a specific example of a problem, it would really help.
> >
> Refer to rcu_nmi_enter(), which can be called by
> enter_from_kernel_mode():
>
> ||noinstr void rcu_nmi_enter(void)
> ||{
> || ...
> || if (rcu_dynticks_curr_cpu_in_eqs()) {
> ||
> || if (!in_nmi())
> || rcu_dynticks_task_exit();
> ||
> || // RCU is not watching here ...
> || rcu_dynticks_eqs_exit();
> || // ... but is watching here.
> ||
> || if (!in_nmi()) {
> || instrumentation_begin();
> || rcu_cleanup_after_idle();
> || instrumentation_end();
> || }
> ||
> || instrumentation_begin();
> || // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> || instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> || // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> || instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> ||
> || incby = 1;
> || } else if (!in_nmi()) {
> || instrumentation_begin();
> || rcu_irq_enter_check_tick();
> || } else {
> || instrumentation_begin();
> || }
> || ...
> ||}
>

Forgot to supplement the context for understanding the case:
on arm64, at present, a pNMI (akin to an NMI) may call rcu_nmi_enter()
without first calling __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET).
As a result it can be mistaken for a normal interrupt in
rcu_nmi_enter().

And this may cause the following issue:
> There is 3 pieces of code put under the
> protection of if (!in_nmi()). At least the last one
> "rcu_irq_enter_check_tick()" can trigger a hard lock up bug. Because it
> is supposed to hold a spin lock with irqoff by
> "raw_spin_lock_rcu_node(rdp->mynode)", but pNMI can breach it. The same
> scenario in rcu_nmi_exit()->rcu_prepare_for_idle().
>
> As for the first two "if (!in_nmi())", I have no idea of why, except
> breaching spin_lock_irq() by NMI. Hope Paul can give some guide.
>
>
> Thanks,
>
> Pingfan
>
>
> > I'm aware that we do more work than strictly necessary when we take a
> > pNMI from a context with IRQs enabled, but that's how we'd intended this
> > to work, as it's vastly simpler to manage the state that way. Unless
> > there's a real problem with that approach I'd prefer to leave it as-is.
> >
> > Thanks,
> > Mark.
> >

2021-10-08 15:46:28

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> Sorry that I missed this message and I am just back from a long
> festival.
>
> Adding Paul for RCU guidance.

Didn't the recent patch series cover this, or is this a new problem?

Thanx, Paul

> On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > the NMI. This will cause a mistaken account for irq.
> > > >
> > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > >
> > > > Can you please explain this in more detail? It's not clear which
> > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > could mean a number of distinct scenarios.
> > > >
> > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > subtlety or you're describing a different scenario..
> > > >
> > > > Note that the entry code is only trying to distinguish between:
> > > >
> > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > were masked).
> > > >
> > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > distinguished until we acknowledge the interrupt), so we treat it as
> > > > an IRQ for now.
> > > >
> > > b) is the aim.
> > >
> > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > "treat it as an IRQ for now"
> >
> > I'm struggling to understand the problem here. What is "not proper", and
> > why?
> >
> > Do you think there's a correctness problem, or that we're doing more
> > work than necessary?
> >
> I had thought it just did redundant accounting. But after revisiting RCU
> code, I think it confronts a real bug.
>
> > If you could give a specific example of a problem, it would really help.
> >
> Refer to rcu_nmi_enter(), which can be called by
> enter_from_kernel_mode():
>
> ||noinstr void rcu_nmi_enter(void)
> ||{
> || ...
> || if (rcu_dynticks_curr_cpu_in_eqs()) {
> ||
> || if (!in_nmi())
> || rcu_dynticks_task_exit();
> ||
> || // RCU is not watching here ...
> || rcu_dynticks_eqs_exit();
> || // ... but is watching here.
> ||
> || if (!in_nmi()) {
> || instrumentation_begin();
> || rcu_cleanup_after_idle();
> || instrumentation_end();
> || }
> ||
> || instrumentation_begin();
> || // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> || instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> || // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> || instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> ||
> || incby = 1;
> || } else if (!in_nmi()) {
> || instrumentation_begin();
> || rcu_irq_enter_check_tick();
> || } else {
> || instrumentation_begin();
> || }
> || ...
> ||}
>
> There is 3 pieces of code put under the
> protection of if (!in_nmi()). At least the last one
> "rcu_irq_enter_check_tick()" can trigger a hard lock up bug. Because it
> is supposed to hold a spin lock with irqoff by
> "raw_spin_lock_rcu_node(rdp->mynode)", but pNMI can breach it. The same
> scenario in rcu_nmi_exit()->rcu_prepare_for_idle().
>
> As for the first two "if (!in_nmi())", I have no idea of why, except
> breaching spin_lock_irq() by NMI. Hope Paul can give some guide.
>
>
> Thanks,
>
> Pingfan
>
>
> > I'm aware that we do more work than strictly necessary when we take a
> > pNMI from a context with IRQs enabled, but that's how we'd intended this
> > to work, as it's vastly simpler to manage the state that way. Unless
> > there's a real problem with that approach I'd prefer to leave it as-is.
> >
> > Thanks,
> > Mark.
> >

2021-10-08 17:26:46

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 10:55:04PM +0800, Pingfan Liu wrote:
> On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> > Sorry that I missed this message and I am just back from a long
> > festival.
> >
> > Adding Paul for RCU guidance.
> >
> > On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > > the NMI. This will cause a mistaken account for irq.
> > > > >
> > > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > > >
> > > > > Can you please explain this in more detail? It's not clear which
> > > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > > could mean a number of distinct scenarios.
> > > > >
> > > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > > subtlety or you're describing a different scenario..
> > > > >
> > > > > Note that the entry code is only trying to distinguish between:
> > > > >
> > > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > > were masked).
> > > > >
> > > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > > distinguished until we acknowledge the interrupt), so we treat it as
> > > > > an IRQ for now.
> > > > >
> > > > b) is the aim.
> > > >
> > > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > > "treat it as an IRQ for now"
> > >
> > > I'm struggling to understand the problem here. What is "not proper", and
> > > why?
> > >
> > > Do you think there's a correctness problem, or that we're doing more
> > > work than necessary?
> > >
> > I had thought it just did redundant accounting. But after revisiting RCU
> > code, I think it confronts a real bug.
> >
> > > If you could give a specific example of a problem, it would really help.
> > >
> > Refer to rcu_nmi_enter(), which can be called by
> > enter_from_kernel_mode():
> >
> > ||noinstr void rcu_nmi_enter(void)
> > ||{
> > || ...
> > || if (rcu_dynticks_curr_cpu_in_eqs()) {
> > ||
> > || if (!in_nmi())
> > || rcu_dynticks_task_exit();
> > ||
> > || // RCU is not watching here ...
> > || rcu_dynticks_eqs_exit();
> > || // ... but is watching here.
> > ||
> > || if (!in_nmi()) {
> > || instrumentation_begin();
> > || rcu_cleanup_after_idle();
> > || instrumentation_end();
> > || }
> > ||
> > || instrumentation_begin();
> > || // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> > || instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> > || // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> > || instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> > ||
> > || incby = 1;
> > || } else if (!in_nmi()) {
> > || instrumentation_begin();
> > || rcu_irq_enter_check_tick();
> > || } else {
> > || instrumentation_begin();
> > || }
> > || ...
> > ||}
> >
>
> Forget to supplement the context for understanding the case:
> On arm64, at present, a pNMI (akin to NMI) may call rcu_nmi_enter()
> without calling "__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);".
> As a result it can be mistaken as an normal interrupt in
> rcu_nmi_enter().

I appreciate that there's a window where we treat the pNMI like an IRQ,
but that's by design, and we account for this in gic_handle_irq() and
gic_handle_nmi() where we "upgrade" to NMI context with
nmi_enter()..nmi_exit().

The idea is that we have two cases:

1) If we take a pNMI from a context where IRQs were masked, we know it
must be a pNMI, and perform the NMI entry immediately to avoid
reentrancy problems.

I think we're all happy with this case.

2) If we take a pNMI from a context where IRQs were unmasked, we don't know
whether the trigger was a pNMI/IRQ until we read from the GIC, and
since we *could* have taken an IRQ, this is equivalent to taking a
spurious IRQ, and while handling that, taking the NMI, e.g.

< run with IRQs unmasked >
~~~ take IRQ ~~~
< enter IRQ >
~~~ take NMI exception ~~~
< enter NMI >
< handle NMI >
< exit NMI >
~~~ return from NMI exception ~~~
< handle IRQ / spurious / do-nothing >
< exit IRQ >
~~~ return from IRQ exception ~~~
< continue running with IRQs unmasked >

... except that we don't do the HW NMI exception entry/exit, just all
the necessary SW accounting.


Note that case (2) can *never* nest within itself or within case (1).

Do you have a specific example of something that goes wrong with the
above? e.g. something that's inconsistent with that rationale?

> And this may cause the following issue:
> > There is 3 pieces of code put under the
> > protection of if (!in_nmi()). At least the last one
> > "rcu_irq_enter_check_tick()" can trigger a hard lock up bug. Because it
> > is supposed to hold a spin lock with irqoff by
> > "raw_spin_lock_rcu_node(rdp->mynode)", but pNMI can breach it. The same
> > scenario in rcu_nmi_exit()->rcu_prepare_for_idle().
> >
> > As for the first two "if (!in_nmi())", I have no idea of why, except
> > breaching spin_lock_irq() by NMI. Hope Paul can give some guide.

That code (in enter_from_kernel_mode()) only runs in case 2, where it
cannot be nested within a pNMI, so I struggle to see how this can
deadlock. If it can, then I would expect the general case of a pNMI
nesting within an IRQ to be broken?

Can you give a concrete example of a sequence that would lockup?
Currently I can't see how that's possible.

Thanks,
Mark.

2021-10-09 03:51:40

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 06:25:13PM +0100, Mark Rutland wrote:
> On Fri, Oct 08, 2021 at 10:55:04PM +0800, Pingfan Liu wrote:
> > On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> > > Sorry that I missed this message and I am just back from a long
> > > festival.
> > >
> > > Adding Paul for RCU guidance.
> > >
> > > On Thu, Sep 30, 2021 at 02:32:57PM +0100, Mark Rutland wrote:
> > > > On Sat, Sep 25, 2021 at 11:39:55PM +0800, Pingfan Liu wrote:
> > > > > On Fri, Sep 24, 2021 at 06:53:06PM +0100, Mark Rutland wrote:
> > > > > > On Fri, Sep 24, 2021 at 09:28:33PM +0800, Pingfan Liu wrote:
> > > > > > > In enter_el1_irq_or_nmi(), it can be the case which NMI interrupts an
> > > > > > > irq, which makes the condition !interrupts_enabled(regs) fail to detect
> > > > > > > the NMI. This will cause a mistaken account for irq.
> > > > > >
> > > > > Sorry about the confusing word "account", it should be "lockdep/rcu/.."
> > > > >
> > > > > > Can you please explain this in more detail? It's not clear which
> > > > > > specific case you mean when you say "NMI interrupts an irq", as that
> > > > > > could mean a number of distinct scenarios.
> > > > > >
> > > > > > AFAICT, if we're in an IRQ handler (with NMIs unmasked), and an NMI
> > > > > > causes a new exception we'll do the right thing. So either I'm missing a
> > > > > > subtlety or you're describing a different scenario..
> > > > > >
> > > > > > Note that the entry code is only trying to distinguish between:
> > > > > >
> > > > > > a) This exception is *definitely* an NMI (because regular interrupts
> > > > > > were masked).
> > > > > >
> > > > > > b) This exception is *either* an IRQ or an NMI (and this *cannot* be
> > > > > > distinguished until we acknowledge the interrupt), so we treat it as
> > > > > > an IRQ for now.
> > > > > >
> > > > > b) is the aim.
> > > > >
> > > > > At the entry, enter_el1_irq_or_nmi() -> enter_from_kernel_mode()->rcu_irq_enter()/rcu_irq_enter_check_tick() etc.
> > > > > While at irqchip level, gic_handle_irq()->gic_handle_nmi()->nmi_enter(),
> > > > > which does not call rcu_irq_enter_check_tick(). So it is not proper to
> > > > > "treat it as an IRQ for now"
> > > >
> > > > I'm struggling to understand the problem here. What is "not proper", and
> > > > why?
> > > >
> > > > Do you think there's a correctness problem, or that we're doing more
> > > > work than necessary?
> > > >
> > > I had thought it just did redundant accounting. But after revisiting RCU
> > > code, I think it confronts a real bug.
> > >
> > > > If you could give a specific example of a problem, it would really help.
> > > >
> > > Refer to rcu_nmi_enter(), which can be called by
> > > enter_from_kernel_mode():
> > >
> > > ||noinstr void rcu_nmi_enter(void)
> > > ||{
> > > || ...
> > > || if (rcu_dynticks_curr_cpu_in_eqs()) {
> > > ||
> > > || if (!in_nmi())
> > > || rcu_dynticks_task_exit();
> > > ||
> > > || // RCU is not watching here ...
> > > || rcu_dynticks_eqs_exit();
> > > || // ... but is watching here.
> > > ||
> > > || if (!in_nmi()) {
> > > || instrumentation_begin();
> > > || rcu_cleanup_after_idle();
> > > || instrumentation_end();
> > > || }
> > > ||
> > > || instrumentation_begin();
> > > || // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> > > || instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> > > || // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> > > || instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> > > ||
> > > || incby = 1;
> > > || } else if (!in_nmi()) {
> > > || instrumentation_begin();
> > > || rcu_irq_enter_check_tick();
> > > || } else {
> > > || instrumentation_begin();
> > > || }
> > > || ...
> > > ||}
> > >
> >
> > Forget to supplement the context for understanding the case:
> > On arm64, at present, a pNMI (akin to NMI) may call rcu_nmi_enter()
> > without calling "__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);".
> > As a result it can be mistaken as an normal interrupt in
> > rcu_nmi_enter().
>
> I appreciate that there's a window where we treat the pNMI like an IRQ,
> but that's by design, and we account for this in gic_handle_irq() and
> gic_handle_nmi() where we "upgrade" to NMI context with
> nmi_enter()..nmi_exit().
>
> The idea is that we have two cases:
>
> 1) If we take a pNMI from a context where IRQs were masked, we know it
> must be a pNMI, and perform the NMI entry immediately to avoid
> reentrancy problems.
>
> I think we're all happy with this case.
>
Right.

> 2) If we take a pNMI from a context where IRQs were unmasked, we don't know
> whether the trigger was a pNMI/IRQ until we read from the GIC, and
> since we *could* have taken an IRQ, this is equivalent to taking a
> spurious IRQ, and while handling that, taking the NMI, e.g.
>
> < run with IRQs unmasked >
> ~~~ take IRQ ~~~
> < enter IRQ >
> ~~~ take NMI exception ~~~
> < enter NMI >
> < handle NMI >
> < exit NMI >
> ~~~ return from NMI exception ~~~
> < handle IRQ / spurious / do-nothing >
> < exit IRQ >
> ~~~ return from IRQ exception ~~~
> < continue running with IRQs unmasked >
>
Yes, here I am on the same page. (I think I used a wrong example in a
previous email, which caused the confusion.)

> ... except that we don't do the HW NMI exception entry/exit, just all
> the necessary SW accounting.
>
One small but important thing: local_irq_save() etc. cannot mask a pNMI.

>
> Note that case (2) can *never* nest within itself or within case (1).
>
> Do you have a specific example of something that goes wrong with the
> above? e.g. something that's inconsistent with that rationale?
>
Please see the following comment.

> > And this may cause the following issue:
> > > There are 3 pieces of code put under the
> > > protection of "if (!in_nmi())". At least the last one,
> > > "rcu_irq_enter_check_tick()", can trigger a hard-lockup bug, because it
> > > is supposed to hold a spin lock with irqs off via
> > > "raw_spin_lock_rcu_node(rdp->mynode)", but a pNMI can breach it. The same
> > > scenario applies in rcu_nmi_exit()->rcu_prepare_for_idle().

Sorry, that was a wrong example, since it falls under case (1).

Concentrating on the spin lock rcu_node->lock, there are two operators:
raw_spin_lock_rcu_node()
raw_spin_trylock_rcu_node()

Now suppose the following deadlock scenario:
note_gp_changes() in non-irq-context
{
local_irq_save(flags);
...
raw_spin_trylock_rcu_node(rnp) // hold lock
needwake = __note_gp_changes(rnp, rdp); ------\
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); \
} \
\---> pNMI breaks in, since local_irq_save() cannot mask it.
rcu_irq_enter() without __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET)
->rcu_nmi_enter()
{
else if (!in_nmi())
rcu_irq_enter_check_tick()
->__rcu_irq_enter_check_tick()
{
...
raw_spin_lock_rcu_node(rdp->mynode);

//Oops deadlock!
}
}


> > >
> > > As for the first two "if (!in_nmi())" checks, I have no idea why,
> > > except for an NMI breaching spin_lock_irq(). Hope Paul can give some guide.
>
> That code (in enter_from_kernel_mode()) only runs in case 2, where it
> cannot be nested within a pNMI, so I struggle to see how this can
> deadlock. If it can, then I would expect the general case of a pNMI
> nesting within an IRQ to be broken?
>
Sorry again for the previous misleading example. I hope the new
example above helps.

> Can you give a concrete example of a sequence that would lockup?
> Currently I can't see how that's possible.
>

It seems the RCU subsystem has strict semantics distinguishing NMIs from
normal interrupts. Besides the deadlock example, there may be other
surprises to confront (I will track them in another mail with Paul).

Thanks,

Pingfan

2021-10-09 04:16:01

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv2 1/5] arm64/entry-common: push the judgement of nmi ahead

On Fri, Oct 08, 2021 at 08:45:23AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 08, 2021 at 12:01:25PM +0800, Pingfan Liu wrote:
> > Sorry that I missed this message and I am just back from a long
> > festival.
> >
> > Adding Paul for RCU guidance.
>
> Didn't the recent patch series cover this, or is this a new problem?
>
Sorry for not explaining it clearly. This is a new problem.

The recently acked series derives from [3-4/5], which addresses nested
calls within a single normal interrupt handler:
rcu_irq_enter()
rcu_irq_enter()
...
rcu_irq_exit()
rcu_irq_exit()


This new problem [1-2/5] is about pNMI (similar to NMI in this context).
On arm64, the current flow in a pNMI handler looks like:
rcu_irq_enter(){ rcu_nmi_enter()}
^^^ At this point, the handler is temporarily treated as a normal interrupt (there has been no chance to __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);).
So rcu_nmi_enter() cannot recognize the NMI, because "if (!in_nmi())" cannot tell. (see "questionA" below)
nmi_enter()
NMI handler
nmi_exit()
rcu_irq_exit()

[...]
> > Refer to rcu_nmi_enter(), which can be called by
> > enter_from_kernel_mode():
> >
> > ||noinstr void rcu_nmi_enter(void)
> > ||{
> > || ...
> > || if (rcu_dynticks_curr_cpu_in_eqs()) {
> > ||
> > || if (!in_nmi())
> > || rcu_dynticks_task_exit();
> > ||
> > || // RCU is not watching here ...
> > || rcu_dynticks_eqs_exit();
> > || // ... but is watching here.
> > ||
> > || if (!in_nmi()) {
> > || instrumentation_begin();
> > || rcu_cleanup_after_idle();
> > || instrumentation_end();
> > || }
> > ||
> > || instrumentation_begin();
> > || // instrumentation for the noinstr rcu_dynticks_curr_cpu_in_eqs()
> > || instrument_atomic_read(&rdp->dynticks, sizeof(rdp->dynticks));
> > || // instrumentation for the noinstr rcu_dynticks_eqs_exit()
> > || instrument_atomic_write(&rdp->dynticks, sizeof(rdp->dynticks));
> > ||
> > || incby = 1;
> > || } else if (!in_nmi()) {
> > || instrumentation_begin();
> > || rcu_irq_enter_check_tick();
> > || } else {
> > || instrumentation_begin();
> > || }
> > || ...
> > ||}
> >
> > There are 3 pieces of code put under the
> > protection of "if (!in_nmi())". At least the last one,
> > "rcu_irq_enter_check_tick()", can trigger a hard-lockup bug, because it
> > is supposed to hold a spin lock with irqs off via
> > "raw_spin_lock_rcu_node(rdp->mynode)", but a pNMI can breach it. The same
> > scenario applies in rcu_nmi_exit()->rcu_prepare_for_idle().
> >

questionA:
> > As for the first two "if (!in_nmi())" checks, I have no idea why,
> > except for an NMI breaching spin_lock_irq(). Hope Paul can give some guide.
> >

Thanks,

Pingfan