2020-08-05 19:24:38

by Marco Elver

Subject: Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers

On Wed, 5 Aug 2020 at 16:36, Marco Elver <[email protected]> wrote:
>
> On Wed, 5 Aug 2020 at 16:17, <[email protected]> wrote:
> >
> > On Wed, Aug 05, 2020 at 04:12:37PM +0200, [email protected] wrote:
> > > On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:
> > > > On Wed, Aug 05, 2020 at 03:42PM +0200, [email protected] wrote:
> > >
> > > > > Shouldn't we __always_inline those? They're going to be really small.
> > > >
> > > > I can send a v2, and you can choose. For reference, though:
> > > >
> > > > ffffffff86271ee0 <arch_local_save_flags>:
> > > > ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120>
> > > > ffffffff86271eec: 00
> > > > ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c>
> > > > ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630
> > > > ffffffff86271efb: c3 retq
> > > > ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271f01: 0f 0b ud2
> > >
> > > > ffffffff86271a90 <arch_local_irq_restore>:
> > > > ffffffff86271a90: 53 push %rbx
> > > > ffffffff86271a91: 48 89 fb mov %rdi,%rbx
> > > > ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128>
> > > > ffffffff86271aa0: 00
> > > > ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24>
> > > > ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271aa8: 48 89 df mov %rbx,%rdi
> > > > ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638
> > > > ffffffff86271ab2: 5b pop %rbx
> > > > ffffffff86271ab3: c3 retq
> > > > ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > > > ffffffff86271ab9: 0f 0b ud2
> > >
> > >
> > > Blergh, that's abysmal. In part I suspect because you have
> > > CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.
> >
> > Yeah, look here:
> >
> > 0000 0000000000462149 <arch_local_save_flags>:
> > 0000 462149: ff 14 25 00 00 00 00 callq *0x0
> > 0003 46214c: R_X86_64_32S pv_ops+0x120
> > 0007 462150: c3 retq
> >
> >
> > That's exactly what I was expecting.
>
> Ah, for some reason the __always_inline version does *not* work with
> KCSAN -- I'm getting various warnings, including the same lockdep
> warning. I think there is some weirdness when this stuff gets inlined
> into instrumented functions. At least with KCSAN, when any accesses
> here are instrumented, and then KCSAN disables/enables interrupts,
> things break. So, these functions should never be instrumented,
> noinstr or not. Marking them 'inline noinstr' seems like the safest
> option. Without CONFIG_PARAVIRT_DEBUG, any compiler should hopefully
> inline them?

Oh well, it seems that KCSAN on syzbot still crashes even with this
"fix". It's harder to reproduce though, and I don't have a clear
reproducer other than "fuzz the kernel" right now. I think the new IRQ
state tracking code is still not compatible with KCSAN, even though we
thought it would be. Most likely there are still ways to get recursion
lockdep->KCSAN. An alternative would be to deal with the recursion
like we did before, instead of trying to squash all of it. I'll try to
investigate -- Peter, if you have ideas, help is appreciated.

Thanks,
-- Marco


2020-08-06 07:49:36

by Marco Elver

Subject: Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers

On Wed, Aug 05, 2020 at 07:31PM +0200, Marco Elver wrote:
...
> Oh well, it seems that KCSAN on syzbot still crashes even with this
> "fix". It's harder to reproduce though, and I don't have a clear
> reproducer other than "fuzz the kernel" right now. I think the new IRQ
> state tracking code is still not compatible with KCSAN, even though we
> thought it would be. Most likely there are still ways to get recursion
> lockdep->KCSAN. An alternative would be to deal with the recursion
> like we did before, instead of trying to squash all of it. I'll try to
> investigate -- Peter, if you have ideas, help is appreciated.

Testing my hypothesis that raw then nested non-raw
local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
below. This is at least 1 case I can think of that we're bound to hit.

Thanks,
-- Marco

------ >8 ------

diff --git a/init/main.c b/init/main.c
index 15bd0efff3df..0873319dcff4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
sfi_init_late();
kcsan_init();

+ /* DEBUG CODE */
+ lockdep_assert_irqs_enabled(); /* Pass. */
+ {
+ unsigned long flags1;
+ raw_local_irq_save(flags1);
+ {
+ unsigned long flags2;
+ lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */
+ local_irq_save(flags2);
+ lockdep_assert_irqs_disabled(); /* Pass. */
+ local_irq_restore(flags2);
+ }
+ raw_local_irq_restore(flags1);
+ }
+ lockdep_assert_irqs_enabled(); /* FAIL! */
+
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();

2020-08-06 17:05:12

by Peter Zijlstra

Subject: Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers

On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> Testing my hypothesis that raw then nested non-raw
> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> below. This is at least 1 case I can think of that we're bound to hit.

Aaargh!

> diff --git a/init/main.c b/init/main.c
> index 15bd0efff3df..0873319dcff4 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
> sfi_init_late();
> kcsan_init();
>
> + /* DEBUG CODE */
> + lockdep_assert_irqs_enabled(); /* Pass. */
> + {
> + unsigned long flags1;
> + raw_local_irq_save(flags1);

This disables IRQs but doesn't trace..

> + {
> + unsigned long flags2;
> + lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */

Indeed, we didn't trace the above disable, so software state is still
on.

> + local_irq_save(flags2);

So here we save IRQ state, and unconditionally disable IRQs and trace
them disabled.

> + lockdep_assert_irqs_disabled(); /* Pass. */
> + local_irq_restore(flags2);

But here, we restore IRQ state to 'disabled' and explicitly trace it
disabled *again* (which is a bit daft, but whatever).

> + }
> + raw_local_irq_restore(flags1);

This then restores the IRQ state to enabled, but again without tracing.

> + }
> + lockdep_assert_irqs_enabled(); /* FAIL! */

And we're out of sync... :/

/me goes ponder things...

How's something like this then?

---
include/linux/sched.h | 3 ---
kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 06ec60462af0..2f5aef57e687 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1193,9 +1193,6 @@ struct task_struct {

#ifdef CONFIG_KCSAN
struct kcsan_ctx kcsan_ctx;
-#ifdef CONFIG_TRACE_IRQFLAGS
- struct irqtrace_events kcsan_save_irqtrace;
-#endif
#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..9c4436bf0561 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -291,17 +291,50 @@ static inline unsigned int get_delay(void)
0);
}

-void kcsan_save_irqtrace(struct task_struct *task)
+/*
+ * KCSAN hooks are everywhere, which means they're NMI-like for interrupt
+ * tracing. In order to present as 'normal' a context as possible to the
+ * code called by KCSAN when reporting errors, we need to update the
+ * irq-tracing state.
+ *
+ * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
+ * runtime is entered for every memory access, and potentially useful
+ * information is lost if dirtied by KCSAN.
+ */
+
+struct kcsan_irq_state {
+ unsigned long flags;
+#ifdef CONFIG_TRACE_IRQFLAGS
+ int hardirqs;
+ struct irqtrace_events irqtrace;
+#endif
+};
+
+void kcsan_save_irqtrace(struct kcsan_irq_state *irq_state)
{
#ifdef CONFIG_TRACE_IRQFLAGS
- task->kcsan_save_irqtrace = task->irqtrace;
+ irq_state->irqtrace = current->irqtrace;
+ irq_state->hardirqs = lockdep_hardirqs_enabled();
#endif
+ if (!kcsan_interrupt_watcher) {
+ raw_local_irq_save(irq_state->flags);
+ lockdep_hardirqs_off(CALLER_ADDR0);
+ }
}

-void kcsan_restore_irqtrace(struct task_struct *task)
+void kcsan_restore_irqtrace(struct kcsan_irq_state *irq_state)
{
+ if (!kcsan_interrupt_watcher) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ if (irq_state->hardirqs) {
+ lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+ lockdep_hardirqs_on(CALLER_ADDR0);
+ }
+#endif
+ raw_local_irq_restore(irq_state->flags);
+ }
#ifdef CONFIG_TRACE_IRQFLAGS
- task->irqtrace = task->kcsan_save_irqtrace;
+ current->irqtrace = irq_state->irqtrace;
#endif
}

@@ -350,11 +383,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
flags = user_access_save();

if (consumed) {
- kcsan_save_irqtrace(current);
+ struct kcsan_irq_state irqstate;
+
+ kcsan_save_irqtrace(&irqstate);
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
KCSAN_REPORT_CONSUMED_WATCHPOINT,
watchpoint - watchpoints);
- kcsan_restore_irqtrace(current);
+ kcsan_restore_irqtrace(&irqstate);
} else {
/*
* The other thread may not print any diagnostics, as it has
@@ -387,7 +422,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
unsigned long access_mask;
enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
unsigned long ua_flags = user_access_save();
- unsigned long irq_flags = 0;
+ struct kcsan_irq_state irqstate;

/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -412,14 +447,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out;
}

- /*
- * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
- * runtime is entered for every memory access, and potentially useful
- * information is lost if dirtied by KCSAN.
- */
- kcsan_save_irqtrace(current);
- if (!kcsan_interrupt_watcher)
- local_irq_save(irq_flags);
+ kcsan_save_irqtrace(&irqstate);

watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) {
@@ -559,9 +587,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
remove_watchpoint(watchpoint);
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock:
- if (!kcsan_interrupt_watcher)
- local_irq_restore(irq_flags);
- kcsan_restore_irqtrace(current);
+ kcsan_restore_irqtrace(&irqstate);
out:
user_access_restore(ua_flags);
}


2020-08-06 17:26:59

by Marco Elver

Subject: Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers

On Thu, Aug 06, 2020 at 01:32PM +0200, [email protected] wrote:
> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> > Testing my hypothesis that raw then nested non-raw
> > local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> > below. This is at least 1 case I can think of that we're bound to hit.
...
>
> /me goes ponder things...
>
> How's something like this then?
>
> ---
> include/linux/sched.h | 3 ---
> kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 44 insertions(+), 21 deletions(-)

Thank you! That approach seems to pass syzbot (also with
CONFIG_PARAVIRT) and kcsan-test tests.

I had to modify it some, so that report.c's use of the restore logic
works and doesn't mess up the IRQ trace printed on KCSAN reports (with
CONFIG_KCSAN_VERBOSE).

I still need to fully convince myself all is well now and we don't end
up with more fixes. :-) If it passes further testing, I'll send it as a
real patch (I want to add you as Co-developed-by, but would need your
Signed-off-by for the code you pasted, I think.)

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..b1d5dca10aa5 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -4,6 +4,7 @@
#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include <linux/ftrace.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/list.h>
@@ -291,13 +292,28 @@ static inline unsigned int get_delay(void)
0);
}

-void kcsan_save_irqtrace(struct task_struct *task)
-{
+/*
+ * KCSAN instrumentation is everywhere, which means we must treat the hooks
+ * as NMI-like for interrupt tracing. In order to present as 'normal' a
+ * context as possible to the code called by KCSAN when reporting errors,
+ * we need to update the IRQ-tracing state.
+ *
+ * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
+ * runtime is entered for every memory access, and potentially useful
+ * information is lost if dirtied by KCSAN.
+ */
+
+struct kcsan_irq_state {
+ unsigned long flags;
#ifdef CONFIG_TRACE_IRQFLAGS
- task->kcsan_save_irqtrace = task->irqtrace;
+ int hardirqs;
#endif
-}
+};

+/*
+ * This is also called by the reporting task for the other task, to generate the
+ * right report with CONFIG_KCSAN_VERBOSE. No harm in restoring more than once.
+ */
void kcsan_restore_irqtrace(struct task_struct *task)
{
#ifdef CONFIG_TRACE_IRQFLAGS
@@ -305,6 +321,34 @@ void kcsan_restore_irqtrace(struct task_struct *task)
#endif
}

+static void kcsan_irq_save(struct kcsan_irq_state *irq_state) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ current->kcsan_save_irqtrace = current->irqtrace;
+ irq_state->hardirqs = lockdep_hardirqs_enabled();
+#endif
+ if (!kcsan_interrupt_watcher) {
+ raw_local_irq_save(irq_state->flags);
+ kcsan_disable_current(); /* Lockdep might WARN. */
+ lockdep_hardirqs_off(CALLER_ADDR0);
+ kcsan_enable_current();
+ }
+}
+
+static void kcsan_irq_restore(struct kcsan_irq_state *irq_state) {
+ if (!kcsan_interrupt_watcher) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ if (irq_state->hardirqs) {
+ kcsan_disable_current(); /* Lockdep might WARN. */
+ lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+ lockdep_hardirqs_on(CALLER_ADDR0);
+ kcsan_enable_current();
+ }
+#endif
+ raw_local_irq_restore(irq_state->flags);
+ }
+ kcsan_restore_irqtrace(current);
+}
+
/*
* Pull everything together: check_access() below contains the performance
* critical operations; the fast-path (including check_access) functions should
@@ -350,11 +394,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
flags = user_access_save();

if (consumed) {
- kcsan_save_irqtrace(current);
+ struct kcsan_irq_state irqstate;
+
+ kcsan_irq_save(&irqstate);
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
KCSAN_REPORT_CONSUMED_WATCHPOINT,
watchpoint - watchpoints);
- kcsan_restore_irqtrace(current);
+ kcsan_irq_restore(&irqstate);
} else {
/*
* The other thread may not print any diagnostics, as it has
@@ -387,7 +433,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
unsigned long access_mask;
enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
unsigned long ua_flags = user_access_save();
- unsigned long irq_flags = 0;
+ struct kcsan_irq_state irqstate;

/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -412,14 +458,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out;
}

- /*
- * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
- * runtime is entered for every memory access, and potentially useful
- * information is lost if dirtied by KCSAN.
- */
- kcsan_save_irqtrace(current);
- if (!kcsan_interrupt_watcher)
- local_irq_save(irq_flags);
+ kcsan_irq_save(&irqstate);

watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) {
@@ -559,9 +598,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
remove_watchpoint(watchpoint);
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock:
- if (!kcsan_interrupt_watcher)
- local_irq_restore(irq_flags);
- kcsan_restore_irqtrace(current);
+ kcsan_irq_restore(&irqstate);
out:
user_access_restore(ua_flags);
}
diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
index 29480010dc30..6eb35a9514d8 100644
--- a/kernel/kcsan/kcsan.h
+++ b/kernel/kcsan/kcsan.h
@@ -24,9 +24,8 @@ extern unsigned int kcsan_udelay_interrupt;
extern bool kcsan_enabled;

/*
- * Save/restore IRQ flags state trace dirtied by KCSAN.
+ * Restore IRQ flags state trace dirtied by KCSAN.
*/
-void kcsan_save_irqtrace(struct task_struct *task);
void kcsan_restore_irqtrace(struct task_struct *task);

/*