2019-04-12 12:07:41

by Xiongfeng Wang

[permalink] [raw]
Subject: [RFC PATCH 0/3] Enable kprobe to monitor sdei event handler

When I use kprobe to monitor a sdei event handler, the CPU will hang. It's
because when I probe the event handler, the instruction will be replaced with
brk instruction and brk exception is unmaskable. But 'vbar_el1' contains
'tramp_vectors' in '_sdei_handler' when SDEI events interrupt userspace, so
we will go to the wrong place if brk exception happens.

I notice that 'ghes_sdei_normal_callback' call several funtions that are not
marked as 'nokprobe'. So I was wondering if we can enable kprobe in '_sdei_handler'.


Xiongfeng Wang (3):
Revert "arm64: debug: remove unused local_dbg_{enable, disable}
macros"
sdei: enable dbg in '_sdei_handler'
stop_machine: mask sdei before running the callback

arch/arm64/include/asm/debug-monitors.h | 1 +
arch/arm64/include/asm/irqflags.h | 4 +++
arch/arm64/kernel/debug-monitors.c | 8 ++++++
arch/arm64/kernel/sdei.c | 43 ++++++++++++++++++++++++++-------
kernel/stop_machine.c | 9 +++++++
5 files changed, 56 insertions(+), 9 deletions(-)

--
1.7.12.4


2019-04-12 12:07:34

by Xiongfeng Wang

[permalink] [raw]
Subject: [RFC PATCH 3/3] stop_machine: mask sdei before running the callback

Kprobes use 'stop_machine' to modify code which could be run in the
sdei event handler at the same time. This patch mask sdei before running
the stop_machine callback to avoid this race condition.

Signed-off-by: Xiongfeng Wang <[email protected]>
---
kernel/stop_machine.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 067cb83..7b95632 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -22,6 +22,9 @@
#include <linux/atomic.h>
#include <linux/nmi.h>
#include <linux/sched/wake_q.h>
+#ifdef CONFIG_ARM64
+#include <linux/arm_sdei.h>
+#endif

/*
* Structure to determine completion condition and record errors. May
@@ -208,6 +211,9 @@ static int multi_cpu_stop(void *data)
case MULTI_STOP_DISABLE_IRQ:
local_irq_disable();
hard_irq_disable();
+#ifdef CONFIG_ARM64
+ sdei_mask_local_cpu();
+#endif
break;
case MULTI_STOP_RUN:
if (is_active)
@@ -227,6 +233,9 @@ static int multi_cpu_stop(void *data)
}
} while (curstate != MULTI_STOP_EXIT);

+#ifdef CONFIG_ARM64
+ sdei_unmask_local_cpu();
+#endif
local_irq_restore(flags);
return err;
}
--
1.7.12.4

2019-04-12 12:08:27

by Xiongfeng Wang

[permalink] [raw]
Subject: [RFC PATCH 1/3] Revert "arm64: debug: remove unused local_dbg_{enable, disable} macros"

This reverts commit 2572214170fb95370be21915c0397f4b6a27e7e3.

This is a preparation for the following patch. We need to enable debug
in _sdei_handler, so let's revert this patch.

Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/include/asm/irqflags.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index 43d8366..0a54c45 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -140,5 +140,9 @@ static inline int arch_irqs_disabled_flags(unsigned long flags)

return res;
}
+
+#define local_dbg_enable() asm("msr daifclr, #8" : : : "memory")
+#define local_dbg_disable() asm("msr daifset, #8" : : : "memory")
+
#endif
#endif
--
1.7.12.4

2019-04-12 12:09:09

by Xiongfeng Wang

[permalink] [raw]
Subject: [RFC PATCH 2/3] sdei: enable dbg in '_sdei_handler'

When we monitor a sdei_event callback using 'kprobe', the singlestep
handler can not be called because dbg is masked in sdei_handler. This
patch enable dbg in '_sdei_handler'.

When SDEI events interrupt the userspace, 'vbar_el1' contains
'tramp_vectors' if CONFIG_UNMAP_KERNEL_AT_EL0 is enabled. So we need to
restore 'vbar_el1' with kernel vectors, otherwise we will go to the
wrong place when brk exception or dbg exception happens.

SDEI events may interrupt 'kernel_entry' before we save 'spsr_el1' and
'elr_el1', and dbg exception will corrupts these two registers. So we
also need to save and restore these two registers.

If SDEI events interrupt an instruction being singlestepped, the
instruction in '_sdei_handler' will begin to be singlestepped after we
enable dbg. So we need to disable singlestep in the beginning of
_sdei_handler if we find out we interrupt a singlestep process.

Signed-off-by: Xiongfeng Wang <[email protected]>
---
arch/arm64/include/asm/debug-monitors.h | 1 +
arch/arm64/kernel/debug-monitors.c | 8 ++++++
arch/arm64/kernel/sdei.c | 43 ++++++++++++++++++++++++++-------
3 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index a44cf52..a730266 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -121,6 +121,7 @@ enum dbg_active_el {
void user_fastforward_single_step(struct task_struct *task);

void kernel_enable_single_step(struct pt_regs *regs);
+void kernel_enable_single_step_noregs(void);
void kernel_disable_single_step(void);
int kernel_active_single_step(void);

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index d7bb6ae..d6074f4 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -404,6 +404,14 @@ void kernel_enable_single_step(struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kernel_enable_single_step);

+void kernel_enable_single_step_noregs(void)
+{
+ WARN_ON(!irqs_disabled());
+ mdscr_write(mdscr_read() | DBG_MDSCR_SS);
+ enable_debug_monitors(DBG_ACTIVE_EL1);
+}
+NOKPROBE_SYMBOL(kernel_enable_single_step_noregs);
+
void kernel_disable_single_step(void)
{
WARN_ON(!irqs_disabled());
diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
index ea94cf8..9dd9cf6 100644
--- a/arch/arm64/kernel/sdei.c
+++ b/arch/arm64/kernel/sdei.c
@@ -9,6 +9,7 @@
#include <linux/uaccess.h>

#include <asm/alternative.h>
+#include <asm/debug-monitors.h>
#include <asm/kprobes.h>
#include <asm/mmu.h>
#include <asm/ptrace.h>
@@ -176,6 +177,8 @@ unsigned long sdei_arch_get_entry_point(int conduit)

}

+extern char vectors[]; /* kernel exception vectors */
+
/*
* __sdei_handler() returns one of:
* SDEI_EV_HANDLED - success, return to the interrupted context.
@@ -189,8 +192,10 @@ static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,
int i, err = 0;
int clobbered_registers = 4;
u64 elr = read_sysreg(elr_el1);
+ u64 spsr = read_sysreg(spsr_el1);
u32 kernel_mode = read_sysreg(CurrentEL) | 1; /* +SPSel */
unsigned long vbar = read_sysreg(vbar_el1);
+ int ss_active = 0;

if (arm64_kernel_unmapped_at_el0())
clobbered_registers++;
@@ -207,19 +212,39 @@ static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,
*/
__uaccess_enable_hw_pan();

+ /*
+ * Enable dbg here so that we can kprobe a sdei event handler. Before we
+ * enable dbg, we need to restore vbar_el1 with kernel vectors
+ */
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+ write_sysreg(vectors, vbar_el1);
+ isb();
+#endif
+ ss_active = kernel_active_single_step();
+ if (ss_active)
+ kernel_disable_single_step();
+ local_dbg_enable();
+
err = sdei_event_handler(regs, arg);
+
+ local_dbg_disable();
+ if (ss_active)
+ kernel_enable_single_step_noregs();
+
+ /*
+ * brk exception will corrupt elr_el1 and spsr_el1, and trust firmware
+ * doesn't save it for us. So we need to restore these two registers
+ * after 'sdei_event_handler'.
+ */
+ write_sysreg(elr, elr_el1);
+ write_sysreg(spsr, spsr_el1);
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+ write_sysreg(vbar, vbar_el1);
+#endif
+
if (err)
return SDEI_EV_FAILED;

- if (elr != read_sysreg(elr_el1)) {
- /*
- * We took a synchronous exception from the SDEI handler.
- * This could deadlock, and if you interrupt KVM it will
- * hyp-panic instead.
- */
- pr_warn("unsafe: exception during handler\n");
- }
-
mode = regs->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK);

/*
--
1.7.12.4

2019-04-24 16:24:36

by James Morse

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] sdei: enable dbg in '_sdei_handler'

Hi Xiongfeng Wang,

On 12/04/2019 13:04, Xiongfeng Wang wrote:
> When we monitor a sdei_event callback using 'kprobe', the singlestep
> handler can not be called because dbg is masked in sdei_handler.

For a good reason: the debug hardware may be in use single-stepping the kernel, or worse:
in use by a KVM guest.

The DAIF flags are all set because none of these things are safe for an
SDEI-event/NMI-handler. We haven't done KVM's guest exit work yet, so things like the
debug hardware belong to the guest.

Its not safe to take an exception from NMI context, avoid it at all costs!


> This patch enable dbg in '_sdei_handler'.

This is very bad as the debug hardware may have been in use by something else. A malicious
guest can now cause you to take breakpoint/watchpoint exceptions.


> When SDEI events interrupt the userspace, 'vbar_el1' contains
> 'tramp_vectors' if CONFIG_UNMAP_KERNEL_AT_EL0 is enabled. So we need to
> restore 'vbar_el1' with kernel vectors, otherwise we will go to the
> wrong place when brk exception or dbg exception happens.

This is the tip of the iceberg. You may have been partway through KVM's world-switch,
you'd need to temporarily save/restore the host context. This ends up being a
parody-world-switch, which has to be kept in-sync with KVM. We didn't go this way because
its fragile and unmaintainable.


> SDEI events may interrupt 'kernel_entry' before we save 'spsr_el1' and
> 'elr_el1', and dbg exception will corrupts these two registers. So we
> also need to save and restore these two registers.

(or don't do anything that would cause them to get clobbered)


> If SDEI events interrupt an instruction being singlestepped, the
> instruction in '_sdei_handler' will begin to be singlestepped after we
> enable dbg. So we need to disable singlestep in the beginning of
> _sdei_handler if we find out we interrupt a singlestep process.

And now the arch code's do_debug_exception() is re-rentrant, which it doesn't expect.

The kprobes core code can't be kprobed, but it can be interrupted by NMI. If you can
kprobe the NMI, you've made the kprobes core re-entrant too. A quick look shows
raw_spin_lock() that could deadlock, and it passes values around with a per-cpu variable.

You could interrupt any part of the krpobes machinery with an SDEI event, take a kprobe,
then take a critical SDEI event, and a third kprobe.

I don't see how its possible to make this safe without re-writing kprobes to be NMI safe.


> diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
> index ea94cf8..9dd9cf6 100644
> --- a/arch/arm64/kernel/sdei.c
> +++ b/arch/arm64/kernel/sdei.c

> - if (elr != read_sysreg(elr_el1)) {
> - /*
> - * We took a synchronous exception from the SDEI handler.
> - * This could deadlock, and if you interrupt KVM it will
> - * hyp-panic instead.
> - */
> - pr_warn("unsafe: exception during handler\n");
> - }

This is here to catch a WARN_ON() firing and not killing the system. Its not safe to take
an exception from the NMI handler.


Why do you need to kprobe an NMI handler?


Thanks,

James

2019-04-24 22:35:33

by James Morse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] Enable kprobe to monitor sdei event handler

Hi Xiongfeng Wang,

On 12/04/2019 13:04, Xiongfeng Wang wrote:
> When I use kprobe to monitor a sdei event handler,

Don't do this! SDEI is like an NMI, it isn't safe to kprobe it as it can interrupt the
kprobe code, causing it become re-entrant.


> the CPU will hang. It's
> because when I probe the event handler, the instruction will be replaced with
> brk instruction and brk exception is unmaskable. But 'vbar_el1' contains
> 'tramp_vectors' in '_sdei_handler' when SDEI events interrupt userspace, so
> we will go to the wrong place if brk exception happens.

This was lucky! Its even more fun if the SDEI event interrupted a guest: the kvm vectors
will give you a hyp-panic.

The __kprobes and NOKPROBE_SYMBOL() litter should stop you doing this.


> I notice that 'ghes_sdei_normal_callback' call several funtions that are not
> marked as 'nokprobe'.

Bother. We should probably blacklist those too, its not safe.


> So I was wondering if we can enable kprobe in '_sdei_handler'.

I don't think this can be done safely.


If you need to monitor your SDEI event handler you can just use printk(). Once nmi_enter()
has been called these are safe as they stash data in a per-cpu buffer. The SDEI handler
will exit via the IRQ vector if it can, which will cause this buffer to be flushed to the
console in a timely manner.


Why do you need to kprobe an NMI handler?



Thanks!

James

2019-04-26 08:21:34

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] Enable kprobe to monitor sdei event handler

Hi James,

Thanks for your reply!

On 2019/4/25 0:20, James Morse wrote:
> Hi Xiongfeng Wang,
>
> On 12/04/2019 13:04, Xiongfeng Wang wrote:
>> When I use kprobe to monitor a sdei event handler,
>
> Don't do this! SDEI is like an NMI, it isn't safe to kprobe it as it can interrupt the
> kprobe code, causing it become re-entrant.
>
>
>> the CPU will hang. It's
>> because when I probe the event handler, the instruction will be replaced with
>> brk instruction and brk exception is unmaskable. But 'vbar_el1' contains
>> 'tramp_vectors' in '_sdei_handler' when SDEI events interrupt userspace, so
>> we will go to the wrong place if brk exception happens.
>
> This was lucky! Its even more fun if the SDEI event interrupted a guest: the kvm vectors
> will give you a hyp-panic.
>
> The __kprobes and NOKPROBE_SYMBOL() litter should stop you doing this.
>
>
>> I notice that 'ghes_sdei_normal_callback' call several funtions that are not
>> marked as 'nokprobe'.
>
> Bother. We should probably blacklist those too, its not safe.
>
>
>> So I was wondering if we can enable kprobe in '_sdei_handler'.
>
> I don't think this can be done safely.
>
>
> If you need to monitor your SDEI event handler you can just use printk(). Once nmi_enter()
> has been called these are safe as they stash data in a per-cpu buffer. The SDEI handler
> will exit via the IRQ vector if it can, which will cause this buffer to be flushed to the
> console in a timely manner.
>

Thanks for your advice. I agree it's really not a good idea to take exception in NMI context.

>
> Why do you need to kprobe an NMI handler?
>

Because that 'Pseudo NMI' has a great effect on performance, we are still planning to
use SDEI for hardlockup detection.

When someone kprobe the functions in _sdei_handler, things will go wrong.
It's not that we want to monitor the SDEI event handler. It's just that we want to make sure
the system goes well even some people are monitoring any functions available. Some test engineer
may test 'kprobe' by monitoring all the functions that are allowed to be kprobed.
Anyway, thanks for your advice. I think I will need to mark all the functions called
in __sdei_handler as 'nokprobe'.

Thanks,
Xiongfeng

>
>
> Thanks!
>
> James
>
> .
>