2020-09-15 18:12:29

by Michal Suchánek

[permalink] [raw]
Subject: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"

This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d.

This commit causes the kernel to oops and reboot when injecting a SLB
multihit which causes a MCE.

Before this commit a SLB multihit was corrected by the kernel and the
system continued to operate normally.

cc: [email protected]
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
Signed-off-by: Michal Suchanek <[email protected]>
---
arch/powerpc/kernel/mce.c | 7 -------
arch/powerpc/kernel/traps.c | 18 +++---------------
2 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index ada59f6c4298..2e13528dcc92 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -591,14 +591,10 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
long notrace machine_check_early(struct pt_regs *regs)
{
long handled = 0;
- bool nested = in_nmi();
u8 ftrace_enabled = this_cpu_get_ftrace_enabled();

this_cpu_set_ftrace_enabled(0);

- if (!nested)
- nmi_enter();
-
hv_nmi_check_nonrecoverable(regs);

/*
@@ -607,9 +603,6 @@ long notrace machine_check_early(struct pt_regs *regs)
if (ppc_md.machine_check_early)
handled = ppc_md.machine_check_early(regs);

- if (!nested)
- nmi_exit();
-
this_cpu_set_ftrace_enabled(ftrace_enabled);

return handled;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d1ebe152f210..7853b770918d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -827,19 +827,7 @@ void machine_check_exception(struct pt_regs *regs)
{
int recover = 0;

- /*
- * BOOK3S_64 does not call this handler as a non-maskable interrupt
- * (it uses its own early real-mode handler to handle the MCE proper
- * and then raises irq_work to call this handler when interrupts are
- * enabled).
- *
- * This is silly. The BOOK3S_64 should just call a different function
- * rather than expecting semantics to magically change. Something
- * like 'non_nmi_machine_check_exception()', perhaps?
- */
- const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64);
-
- if (nmi) nmi_enter();
+ nmi_enter();

__this_cpu_inc(irq_stat.mce_exceptions);

@@ -865,7 +853,7 @@ void machine_check_exception(struct pt_regs *regs)
if (check_io_access(regs))
goto bail;

- if (nmi) nmi_exit();
+ nmi_exit();

die("Machine check", regs, SIGBUS);

@@ -876,7 +864,7 @@ void machine_check_exception(struct pt_regs *regs)
return;

bail:
- if (nmi) nmi_exit();
+ nmi_exit();
}

void SMIException(struct pt_regs *regs)
--
2.28.0


2020-09-15 18:33:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"

On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote:
> This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d.
>
> This commit causes the kernel to oops and reboot when injecting a SLB
> multihit which causes a MCE.
>
> Before this commit a SLB multihit was corrected by the kernel and the
> system continued to operate normally.
>
> cc: [email protected]
> Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> Signed-off-by: Michal Suchanek <[email protected]>

Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
nmi_enter() supports nesting natively.

2020-09-16 09:01:39

by Michal Suchánek

[permalink] [raw]
Subject: Re: [PATCH] Revert "powerpc/64s: machine check interrupt update NMI accounting"

On Tue, Sep 15, 2020 at 08:16:42PM +0200, [email protected] wrote:
> On Tue, Sep 15, 2020 at 08:06:59PM +0200, Michal Suchanek wrote:
> > This reverts commit 116ac378bb3ff844df333e7609e7604651a0db9d.
> >
> > This commit causes the kernel to oops and reboot when injecting a SLB
> > multihit which causes a MCE.
> >
> > Before this commit a SLB multihit was corrected by the kernel and the
> > system continued to operate normally.
> >
> > cc: [email protected]
> > Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
> > Signed-off-by: Michal Suchanek <[email protected]>
>
> Ever since 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
> nmi_enter() supports nesting natively.

And this patch was merged in parallel with this native nesting support
and conflicted with it - hence the explicit nesting in the hunk that did
not conflict.

Either way the bug is present on kernels both with and without
69ea03b56ed2. So besides the conflict 69ea03b56ed2 does not affect this
problem.

Thanks

Michal