2020-08-20 07:53:21

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 0/9] TRACE_IRQFLAGS wreckage


TRACE_IRQFLAGS

local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.

raw_local_irq_*() does not update the software state, no tracing.

---

Problem 1:

raw_local_irq_save(); // software state on
local_irq_save(); // software state off
...
local_irq_restore(); // software state still off, because we don't enable IRQs
raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

- lock_acquire()
raw_local_irq_save()
__lock_acquire()
arch_spin_lock(&graph_lock)
pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
local_irq_save()

- trace_clock_global()
raw_local_irq_save()
arch_spin_lock()
pv_wait() := kvm_wait()
local_irq_save()

- apic_retrigger_irq()
raw_local_irq_save()
apic->send_IPI() := default_send_IPI_single_phys()
local_irq_save()

Possible solutions:

A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

raw_local_irq_save()
local_irq_save()
lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.

---

Problem 2:

raw_local_irq_save(); // software state on
trace_*()
...
perf_tp_event()
...
perf_callchain()
<#PF>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#PF>
raw_local_irq_restore(); // software state stays off, *whoopsie*

existing instances:

- lock_acquire() / lock_release()
raw_local_irq_save()
trace_lock_acquire() / trace_lock_release()

- function tracing

Possible solutions:

A) fix every architecture's entry code
B) only fix kernel/entry/common.c
C) fix lockdep tracepoints and pray

This series does C, AFAICT this problem has existed forever.

---

Problem 3:

raw_local_irq_save(); // software state on
<#NMI>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#NMI>
raw_local_irq_restore(); // software state stays off, *whoopsie*

Possible solutions:

This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().

In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).

There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.

---

The patches in this series do (in reverse order):

- 2C
- 1B
- fix fallout in idle due to the trace_lock_*() tracepoints suddenly
being visible to rcu-lockdep.

---
arch/arm/mach-omap2/pm34xx.c | 4 --
arch/arm64/kernel/process.c | 2 -
arch/nds32/include/asm/irqflags.h | 5 ++
arch/powerpc/include/asm/hw_irq.h | 11 ++---
arch/s390/kernel/idle.c | 3 -
arch/x86/entry/thunk_32.S | 5 --
arch/x86/include/asm/mmu.h | 1
arch/x86/kernel/process.c | 4 --
arch/x86/mm/tlb.c | 13 +-----
drivers/cpuidle/cpuidle.c | 19 +++++++--
drivers/idle/intel_idle.c | 16 --------
include/linux/cpuidle.h | 13 +++---
include/linux/irqflags.h | 73 ++++++++++++++++++++------------------
include/linux/lockdep.h | 18 ++++++---
include/linux/mmu_context.h | 5 ++
kernel/locking/lockdep.c | 18 +++++----
kernel/sched/idle.c | 21 ++++------
17 files changed, 112 insertions(+), 119 deletions(-)




2020-08-20 14:38:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage


I tested this series on top of tip/master and triggered the below
warning when running the irqsoff tracer boot up test (config attached).

-- Steve

Testing tracer irqsoff:

=============================
WARNING: suspicious RCU usage
5.9.0-rc1-test+ #92 Not tainted
-----------------------------
include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/2/0.

stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.9.0-rc1-test+ #92
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x8d/0xc0
lock_acquire.cold+0x23/0x2c
? default_idle_call+0x4d/0x210
_raw_spin_lock_irqsave+0x50/0xa0
? check_critical_timing+0x69/0x160
check_critical_timing+0x69/0x160
? default_idle_call+0x4d/0x210
stop_critical_timings+0xdd/0xf0
default_idle_call+0x4d/0x210
do_idle+0x1f6/0x260
cpu_startup_entry+0x19/0x20
start_secondary+0x110/0x140
secondary_startup_64+0xb6/0xc0

2020-08-20 14:41:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, 20 Aug 2020 10:36:43 -0400
Steven Rostedt <[email protected]> wrote:

> I tested this series on top of tip/master and triggered the below
> warning when running the irqsoff tracer boot up test (config attached).

Now attaching config!

-- Steve


Attachments:
(No filename) (264.00 B)
config (154.32 kB)
Download all attachments

2020-08-20 14:51:37

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, Aug 20, 2020 at 10:36AM -0400, Steven Rostedt wrote:
>
> I tested this series on top of tip/master and triggered the below
> warning when running the irqsoff tracer boot up test (config attached).
>
> -- Steve
>
> Testing tracer irqsoff:
>
> =============================
> WARNING: suspicious RCU usage
> 5.9.0-rc1-test+ #92 Not tainted
> -----------------------------
> include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> RCU used illegally from extended quiescent state!
> no locks held by swapper/2/0.
>
> stack backtrace:
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.9.0-rc1-test+ #92
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
> dump_stack+0x8d/0xc0
> lock_acquire.cold+0x23/0x2c
> ? default_idle_call+0x4d/0x210
> _raw_spin_lock_irqsave+0x50/0xa0
> ? check_critical_timing+0x69/0x160
> check_critical_timing+0x69/0x160
> ? default_idle_call+0x4d/0x210
> stop_critical_timings+0xdd/0xf0
> default_idle_call+0x4d/0x210
> do_idle+0x1f6/0x260
> cpu_startup_entry+0x19/0x20
> start_secondary+0x110/0x140
> secondary_startup_64+0xb6/0xc0

Ah, I was just trying to figure out if it's due to KCSAN instrumentation
or not. I ran into a similar warning due to tracing of prandom_u32(),
that also triggered lockdep warnings (all warnings attached).

Otherwise, this survived syzkaller testing until I enabled KCSAN. :-/

The first thing KCSAN wants is to just guard against recursion due to
tracing even if KCSAN instrumentation is disabled for the function. This
recursion seems to be new.

I got further with the below patch for KCSAN. Unfortunately there are a
bunch of merge conflicts with latest KCSAN on -rcu tree if you were to
attach the patch to your series, so not sure what the best strategy here
is.

Config is also attached.

Thanks,
-- Marco

------ >8 ------

From 5b12649cbcf0c7d164e6995c40ce88a4e3e86f28 Mon Sep 17 00:00:00 2001
From: Marco Elver <[email protected]>
Date: Thu, 6 Aug 2020 14:19:10 +0200
Subject: [PATCH] kcsan: Avoid recursion due to tracing

We can observe recursion due to calling library functions that are being
traced from kcsan_setup_watchpoint(). This can happen even though
instrumentation is disabled for such functions.

For example in lib/random32.o (where instrumentation is disabled), we
may still get recursion with recent changes to tracing:

prandom_u32+0x11/0xe0
kcsan_setup_watchpoint+0x32/0x580
rcu_is_watching+0x21/0x60
rcu_read_lock_sched_held+0x26/0xa0
trace_prandom_u32+0x7e/0x190
prandom_u32+0xa8/0xe0
kcsan_setup_watchpoint+0x32/0x580
rcu_is_watching+0x21/0x60
rcu_read_lock_sched_held+0x26/0xa0
trace_prandom_u32+0x7e/0x190
prandom_u32+0xa8/0xe0
kcsan_setup_watchpoint+0x32/0x580
...

Avoid the recursion by disabling KCSAN in the following regions in
kcsan_setup_watchpoint():

- after checking if KCSAN is enabled, to before delay initiation;
- after the delay, until the end of the function.

The exception that remains is udelay(), where we want to keep KCSAN
enabled as otherwise we would miss races between nested interrupts.

Signed-off-by: Marco Elver <[email protected]>
---
kernel/kcsan/core.c | 52 +++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..c5a295afdb94 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -261,14 +261,15 @@ should_watch(const volatile void *ptr, size_t size, int type, struct kcsan_ctx *
return false;

/*
- * NOTE: If we get here, kcsan_skip must always be reset in slow path
- * via reset_kcsan_skip() to avoid underflow.
+ * Note: If we get here, kcsan_skip must always be reset in slow path to
+ * avoid underflow.
*/

/* this operation should be watched */
return true;
}

+/* Requires: KCSAN is disabled, to avoid prandom_u32() recursion. */
static inline void reset_kcsan_skip(void)
{
long skip_count = kcsan_skip_watch -
@@ -283,12 +284,23 @@ static __always_inline bool kcsan_is_enabled(void)
return READ_ONCE(kcsan_enabled) && get_ctx()->disable_count == 0;
}

-static inline unsigned int get_delay(void)
+/*
+ * Introduce delay depending on context and configuration.
+ * Requires: KCSAN is disabled, to avoid prandom_u32() recursion.
+ */
+static void delay_access(void)
{
unsigned int delay = in_task() ? kcsan_udelay_task : kcsan_udelay_interrupt;
- return delay - (IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
- prandom_u32_max(delay) :
- 0);
+
+ /*
+ * Call prandom_u32_max() with KCSAN disabled, as tracing may be
+ * enabled, which would cause recursion.
+ */
+ delay -= IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ? prandom_u32_max(delay) : 0;
+
+ kcsan_enable_current();
+ udelay(delay);
+ kcsan_disable_current();
}

void kcsan_save_irqtrace(struct task_struct *task)
@@ -386,17 +398,30 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
} expect_value;
unsigned long access_mask;
enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
- unsigned long ua_flags = user_access_save();
+ unsigned long ua_flags = 0;
unsigned long irq_flags = 0;

/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
- * should_watch().
+ * should_watch(). This write is redundant if KCSAN is enabled, as we
+ * call reset_kcsan_skip() below, yet we cannot call reset_kcsan_skip()
+ * here to avoid prandom_u32() recursion.
*/
- reset_kcsan_skip();
-
+ this_cpu_write(kcsan_skip, kcsan_skip_watch);
if (!kcsan_is_enabled())
- goto out;
+ return;
+
+ ua_flags = user_access_save();
+
+ /*
+ * Disable KCSAN to ensure that library functions we call do not result
+ * in recursion. This can still occur even if instrumentation is
+ * disabled for these functions, but tracing is enabled.
+ */
+ kcsan_disable_current();
+
+ /* Randomized reset, after enabled-check to avoid recursion. */
+ reset_kcsan_skip();

/*
* Special atomic rules: unlikely to be true, so we check them here in
@@ -458,19 +483,17 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
}

if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
- kcsan_disable_current();
pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
is_write ? "write" : "read", size, ptr,
watchpoint_slot((unsigned long)ptr),
encode_watchpoint((unsigned long)ptr, size, is_write));
- kcsan_enable_current();
}

/*
* Delay this thread, to increase probability of observing a racy
* conflicting access.
*/
- udelay(get_delay());
+ delay_access();

/*
* Re-read value, and check if it is as expected; if not, we infer a
@@ -563,6 +586,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
local_irq_restore(irq_flags);
kcsan_restore_irqtrace(current);
out:
+ kcsan_enable_current_nowarn();
user_access_restore(ua_flags);
}

--
2.28.0.297.g1956fa8f8d-goog


Attachments:
(No filename) (7.17 kB)
dmesg.txt (17.85 kB)
.config (213.26 kB)
Download all attachments

2020-08-20 15:03:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, Aug 20, 2020 at 10:36:43AM -0400, Steven Rostedt wrote:
>
> I tested this series on top of tip/master and triggered the below
> warning when running the irqsoff tracer boot up test (config attached).
>
> -- Steve
>
> Testing tracer irqsoff:
>
> =============================
> WARNING: suspicious RCU usage
> 5.9.0-rc1-test+ #92 Not tainted
> -----------------------------
> include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> RCU used illegally from extended quiescent state!
> no locks held by swapper/2/0.
>
> stack backtrace:
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.9.0-rc1-test+ #92
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> Call Trace:
> dump_stack+0x8d/0xc0
> lock_acquire.cold+0x23/0x2c
> ? default_idle_call+0x4d/0x210
> _raw_spin_lock_irqsave+0x50/0xa0
> ? check_critical_timing+0x69/0x160
> check_critical_timing+0x69/0x160
> ? default_idle_call+0x4d/0x210
> stop_critical_timings+0xdd/0xf0
> default_idle_call+0x4d/0x210
> do_idle+0x1f6/0x260
> cpu_startup_entry+0x19/0x20
> start_secondary+0x110/0x140
> secondary_startup_64+0xb6/0xc0

Shiny, I think that wants something like the below, but let me go frob
my config and test it.

---
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -151,8 +151,8 @@ static void enter_s2idle_proper(struct c
* cpuidle mechanism enables interrupts and doing that with timekeeping
* suspended is generally unsafe.
*/
- rcu_idle_enter();
stop_critical_timings();
+ rcu_idle_enter();
drv->states[index].enter_s2idle(dev, drv, index);
if (WARN_ON_ONCE(!irqs_disabled()))
local_irq_disable();
@@ -164,8 +164,8 @@ static void enter_s2idle_proper(struct c
* FIXME, order against rcu_idle_exit ?
*/
RCU_NONIDLE(tick_unfreeze());
- start_critical_timings();
rcu_idle_exit();
+ start_critical_timings();

time_end = ns_to_ktime(local_clock());

@@ -236,11 +236,11 @@ int cpuidle_enter_state(struct cpuidle_d
trace_cpu_idle(index, dev->cpu);
time_start = ns_to_ktime(local_clock());

- rcu_idle_enter();
stop_critical_timings();
+ rcu_idle_enter();
entered_state = target_state->enter(dev, drv, index);
- start_critical_timings();
rcu_idle_exit();
+ start_critical_timings();

sched_clock_idle_wakeup_event();
time_end = ns_to_ktime(local_clock());
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -55,16 +55,16 @@ __setup("hlt", cpu_idle_nopoll_setup);
static noinline int __cpuidle cpu_idle_poll(void)
{
trace_cpu_idle(0, smp_processor_id());
+ stop_critical_timings();
rcu_idle_enter();
local_irq_enable();
- stop_critical_timings();

while (!tif_need_resched() &&
(cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();

- start_critical_timings();
rcu_idle_exit();
+ start_critical_timings();
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());

return 1;
@@ -91,11 +91,11 @@ void __cpuidle default_idle_call(void)
if (current_clr_polling_and_test()) {
local_irq_enable();
} else {
- rcu_idle_enter();
stop_critical_timings();
+ rcu_idle_enter();
arch_cpu_idle();
- start_critical_timings();
rcu_idle_exit();
+ start_critical_timings();
}
}

2020-08-20 16:58:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, 20 Aug 2020 16:58:21 +0200
[email protected] wrote:

> @@ -91,11 +91,11 @@ void __cpuidle default_idle_call(void)
> if (current_clr_polling_and_test()) {
> local_irq_enable();
> } else {
> - rcu_idle_enter();
> stop_critical_timings();
> + rcu_idle_enter();
> arch_cpu_idle();
> - start_critical_timings();
> rcu_idle_exit();
> + start_critical_timings();
> }
> }
>

tip/master didn't have the above function and instead I had this:

@@ -93,11 +93,11 @@ void __cpuidle default_idle_call(void)
} else {

trace_cpu_idle(1, smp_processor_id());
- rcu_idle_enter();
stop_critical_timings();
+ rcu_idle_enter();
arch_cpu_idle();
- start_critical_timings();
rcu_idle_exit();
+ start_critical_timings();
trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
}
}

But it booted to completion without warning with this patch applied.

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

For the entire series, with this update.

-- Steve

2020-08-20 17:22:33

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, Aug 20, 2020 at 04:58PM +0200, [email protected] wrote:
> On Thu, Aug 20, 2020 at 10:36:43AM -0400, Steven Rostedt wrote:
> >
> > I tested this series on top of tip/master and triggered the below
> > warning when running the irqsoff tracer boot up test (config attached).
> >
> > -- Steve
> >
> > Testing tracer irqsoff:
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 5.9.0-rc1-test+ #92 Not tainted
> > -----------------------------
> > include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
...
>
> Shiny, I think that wants something like the below, but let me go frob
> my config and test it.
>
> ---
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
...

With that applied (manually, due to conflicts), I still get warnings for
certain call locations with KCSAN on (that is with my fix from the other
email):

| =============================
| WARNING: suspicious RCU usage
| 5.9.0-rc1+ #23 Tainted: G W
| -----------------------------
| include/trace/events/random.h:310 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
|
| rcu_scheduler_active = 2, debug_locks = 0
| RCU used illegally from extended quiescent state!
| no locks held by swapper/1/0.
|
| stack backtrace:
| CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.9.0-rc1+ #23
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
| Call Trace:
| __dump_stack lib/dump_stack.c:77 [inline]
| dump_stack+0xf1/0x14d lib/dump_stack.c:118
| trace_prandom_u32 include/trace/events/random.h:310 [inline]
| prandom_u32+0x1ee/0x200 lib/random32.c:86
| prandom_u32_max include/linux/prandom.h:46 [inline]
| reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
| kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
| is_idle_task+0xd/0x20 include/linux/sched.h:1671 <==== inline, but not noinstr
| irqentry_enter+0x17/0x50 kernel/entry/common.c:293 <==== noinstr function

We can fix that with the patch below.

That leaves me with this, still due to prandom_u32(). :-/

| ------------[ cut here ]------------
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 4 PID: 1861 at kernel/locking/lockdep.c:4875 check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Modules linked in:
| CPU: 4 PID: 1861 Comm: kworker/u16:4 Not tainted 5.9.0-rc1+ #24
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
| RIP: 0010:check_flags.part.0+0x157/0x160 kernel/locking/lockdep.c:4875
| Code: c0 0f 84 70 5d 00 00 44 8b 0d fd 11 5f 06 45 85 c9 0f 85 60 5d 00 00 48 c7 c6 3e d0 f4 86 48 c7 c7 b2 49 f3 86 e8 8d 49 f6 ff <0f> 0b e9 46 5d 00 00 66 90 41 57 41 56 49 89 fe 41 55 41 89 d5 41
| RSP: 0000:ffffc900034bfcb0 EFLAGS: 00010082
| RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8136161c
| RDX: ffff88881a9dcb00 RSI: ffffffff81363835 RDI: 0000000000000006
| RBP: ffffc900034bfd00 R08: 0000000000000000 R09: 0000ffffffffffff
| R10: 0000000000000104 R11: 0000ffff874efd6b R12: ffffffff874f26c0
| R13: 0000000000000244 R14: 0000000000000000 R15: 0000000000000046
| FS: 0000000000000000(0000) GS:ffff88881fc00000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 0000000000000000 CR3: 0000000007489001 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
| check_flags kernel/locking/lockdep.c:4871 [inline]
| lock_is_held_type+0x42/0x100 kernel/locking/lockdep.c:5042
| lock_is_held include/linux/lockdep.h:267 [inline]
| rcu_read_lock_sched_held+0x41/0x80 kernel/rcu/update.c:136
| trace_prandom_u32 include/trace/events/random.h:310 [inline]
| prandom_u32+0x1bb/0x200 lib/random32.c:86
| prandom_u32_max include/linux/prandom.h:46 [inline]
| reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
| kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
| perf_lock_task_context+0x5e3/0x6e0 kernel/events/core.c:1491
| perf_pin_task_context kernel/events/core.c:1506 [inline]
| perf_event_exit_task_context kernel/events/core.c:12284 [inline]
| perf_event_exit_task+0x1e2/0x910 kernel/events/core.c:12364
| do_exit+0x70e/0x18b0 kernel/exit.c:815
| call_usermodehelper_exec_async+0x2e2/0x2f0 kernel/umh.c:114
| ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
| irq event stamp: 107
| hardirqs last enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| ---[ end trace a3058d9b157af5c4 ]---
| possible reason: unannotated irqs-off.
| irq event stamp: 107
| hardirqs last enabled at (107): [<ffffffff815532ab>] perf_lock_task_context+0x5db/0x6e0 kernel/events/core.c:1491
| hardirqs last disabled at (106): [<ffffffff81552f12>] perf_lock_task_context+0x242/0x6e0 kernel/events/core.c:1459
| softirqs last enabled at (0): [<ffffffff8129b95e>] copy_process+0xe9e/0x3970 kernel/fork.c:2004
| softirqs last disabled at (0): [<0000000000000000>] 0x0

Suggestions?

Thanks,
-- Marco

------ >8 ------

From 4ec9dd472c978e1eba622fb22bc04e4357f10421 Mon Sep 17 00:00:00 2001
From: Marco Elver <[email protected]>
Date: Thu, 20 Aug 2020 19:06:09 +0200
Subject: [PATCH] sched: Turn inline into __always_inline due to noinstr use

is_idle_task() may be used from noinstr functions such as
irqentry_enter(). Since the compiler is free to not inline regular
inline functions, switch to using __always_inline.

Signed-off-by: Marco Elver <[email protected]>
---
include/linux/sched.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..afe01e232935 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1666,7 +1666,7 @@ extern struct task_struct *idle_task(int cpu);
*
* Return: 1 if @p is an idle task. 0 otherwise.
*/
-static inline bool is_idle_task(const struct task_struct *p)
+static __always_inline bool is_idle_task(const struct task_struct *p)
{
return !!(p->flags & PF_IDLE);
}
--
2.28.0.297.g1956fa8f8d-goog

2020-08-20 21:48:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, 20 Aug 2020 19:20:46 +0200
Marco Elver <[email protected]> wrote:

> On Thu, Aug 20, 2020 at 04:58PM +0200, [email protected] wrote:
> > On Thu, Aug 20, 2020 at 10:36:43AM -0400, Steven Rostedt wrote:
> > >
> > > I tested this series on top of tip/master and triggered the below
> > > warning when running the irqsoff tracer boot up test (config attached).
> > >
> > > -- Steve
> > >
> > > Testing tracer irqsoff:
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.9.0-rc1-test+ #92 Not tainted
> > > -----------------------------
> > > include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
> ...
> >
> > Shiny, I think that wants something like the below, but let me go frob
> > my config and test it.
> >
> > ---
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> ...
>
> With that applied (manually, due to conflicts), I still get warnings for
> certain call locations with KCSAN on (that is with my fix from the other
> email):
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.9.0-rc1+ #23 Tainted: G W
> | -----------------------------
> | include/trace/events/random.h:310 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> |
> | rcu_scheduler_active = 2, debug_locks = 0
> | RCU used illegally from extended quiescent state!
> | no locks held by swapper/1/0.
> |
> | stack backtrace:
> | CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.9.0-rc1+ #23
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> | Call Trace:
> | __dump_stack lib/dump_stack.c:77 [inline]
> | dump_stack+0xf1/0x14d lib/dump_stack.c:118
> | trace_prandom_u32 include/trace/events/random.h:310 [inline]
> | prandom_u32+0x1ee/0x200 lib/random32.c:86
> | prandom_u32_max include/linux/prandom.h:46 [inline]
> | reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
> | kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
> | is_idle_task+0xd/0x20 include/linux/sched.h:1671 <==== inline, but not noinstr
> | irqentry_enter+0x17/0x50 kernel/entry/common.c:293 <==== noinstr function
>

What happens if you apply the below patch?

-- Steve

diff --git a/lib/random32.c b/lib/random32.c
index 932345323af0..1c5607a411d4 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -83,7 +83,7 @@ u32 prandom_u32(void)
u32 res;

res = prandom_u32_state(state);
- trace_prandom_u32(res);
+ trace_prandom_u32_rcuidle(res);
put_cpu_var(net_rand_state);

return res;

2020-08-21 06:38:54

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, Aug 20, 2020 at 03:59PM -0400, Steven Rostedt wrote:
> On Thu, 20 Aug 2020 19:20:46 +0200
> Marco Elver <[email protected]> wrote:
>
> > On Thu, Aug 20, 2020 at 04:58PM +0200, [email protected] wrote:
> > > On Thu, Aug 20, 2020 at 10:36:43AM -0400, Steven Rostedt wrote:
> > > >
> > > > I tested this series on top of tip/master and triggered the below
> > > > warning when running the irqsoff tracer boot up test (config attached).
> > > >
> > > > -- Steve
> > > >
> > > > Testing tracer irqsoff:
> > > >
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.9.0-rc1-test+ #92 Not tainted
> > > > -----------------------------
> > > > include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
> > ...
> > >
> > > Shiny, I think that wants something like the below, but let me go frob
> > > my config and test it.
> > >
> > > ---
> > > --- a/drivers/cpuidle/cpuidle.c
> > > +++ b/drivers/cpuidle/cpuidle.c
> > ...
> >
> > With that applied (manually, due to conflicts), I still get warnings for
> > certain call locations with KCSAN on (that is with my fix from the other
> > email):
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.9.0-rc1+ #23 Tainted: G W
> > | -----------------------------
> > | include/trace/events/random.h:310 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > |
> > | rcu_scheduler_active = 2, debug_locks = 0
> > | RCU used illegally from extended quiescent state!
> > | no locks held by swapper/1/0.
> > |
> > | stack backtrace:
> > | CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 5.9.0-rc1+ #23
> > | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > | Call Trace:
> > | __dump_stack lib/dump_stack.c:77 [inline]
> > | dump_stack+0xf1/0x14d lib/dump_stack.c:118
> > | trace_prandom_u32 include/trace/events/random.h:310 [inline]
> > | prandom_u32+0x1ee/0x200 lib/random32.c:86
> > | prandom_u32_max include/linux/prandom.h:46 [inline]
> > | reset_kcsan_skip kernel/kcsan/core.c:277 [inline]
> > | kcsan_setup_watchpoint+0x9b/0x600 kernel/kcsan/core.c:424
> > | is_idle_task+0xd/0x20 include/linux/sched.h:1671 <==== inline, but not noinstr
> > | irqentry_enter+0x17/0x50 kernel/entry/common.c:293 <==== noinstr function
> >
>
> What happens if you apply the below patch?
>
> -- Steve
>
> diff --git a/lib/random32.c b/lib/random32.c
> index 932345323af0..1c5607a411d4 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -83,7 +83,7 @@ u32 prandom_u32(void)
> u32 res;
>
> res = prandom_u32_state(state);
> - trace_prandom_u32(res);
> + trace_prandom_u32_rcuidle(res);
> put_cpu_var(net_rand_state);
>
> return res;

Thank you, this resolves the problem. It also works if I remove my 2
other patches (for now, I think I still need the recursion-guard but
it's not urgent, will send that separately).

And my apologies, it seems 5.9-rc1 is already broken, which I should
have noticed. I sent a separate patch, which should be picked up into
5.9: https://lkml.kernel.org/r/[email protected]

With that fix + start_critical_timings-switcheroo, this series is:

Tested-by: Marco Elver <[email protected]>

Thanks,
-- Marco

2020-08-21 06:56:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Thu, Aug 20, 2020 at 07:20:46PM +0200, Marco Elver wrote:
> From 4ec9dd472c978e1eba622fb22bc04e4357f10421 Mon Sep 17 00:00:00 2001
> From: Marco Elver <[email protected]>
> Date: Thu, 20 Aug 2020 19:06:09 +0200
> Subject: [PATCH] sched: Turn inline into __always_inline due to noinstr use
>
> is_idle_task() may be used from noinstr functions such as
> irqentry_enter(). Since the compiler is free to not inline regular
> inline functions, switch to using __always_inline.
>
> Signed-off-by: Marco Elver <[email protected]>

Fair enough I suppose; didn't that generate an objtool warning on your
build? (of course that requires PARAVIRT=n, I should get back to fixing
that)

> ---
> include/linux/sched.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 93ecd930efd3..afe01e232935 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1666,7 +1666,7 @@ extern struct task_struct *idle_task(int cpu);
> *
> * Return: 1 if @p is an idle task. 0 otherwise.
> */
> -static inline bool is_idle_task(const struct task_struct *p)
> +static __always_inline bool is_idle_task(const struct task_struct *p)
> {
> return !!(p->flags & PF_IDLE);
> }
> --
> 2.28.0.297.g1956fa8f8d-goog

2020-08-21 07:09:59

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 0/9] TRACE_IRQFLAGS wreckage

On Fri, 21 Aug 2020 at 08:54, <[email protected]> wrote:
>
> On Thu, Aug 20, 2020 at 07:20:46PM +0200, Marco Elver wrote:
> > From 4ec9dd472c978e1eba622fb22bc04e4357f10421 Mon Sep 17 00:00:00 2001
> > From: Marco Elver <[email protected]>
> > Date: Thu, 20 Aug 2020 19:06:09 +0200
> > Subject: [PATCH] sched: Turn inline into __always_inline due to noinstr use
> >
> > is_idle_task() may be used from noinstr functions such as
> > irqentry_enter(). Since the compiler is free to not inline regular
> > inline functions, switch to using __always_inline.
> >
> > Signed-off-by: Marco Elver <[email protected]>
>
> Fair enough I suppose; didn't that generate an objtool warning on your
> build? (of course that requires PARAVIRT=n, I should get back to fixing
> that)

It does now after turning on VMLINUX_VALIDATION.

Although note that this patch (and the other, which I'll send
separately) wasn't required after the rcuidle fix. I suppose there is
still a small chance things can go wrong though, but syzkaller wasn't
able to find such a case (yet).

Thanks,
-- Marco

Subject: [tip: sched/urgent] sched: Use __always_inline on is_idle_task()

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: c94a88f341c9b8f05d8639f62bb5d95936f881cd
Gitweb: https://git.kernel.org/tip/c94a88f341c9b8f05d8639f62bb5d95936f881cd
Author: Marco Elver <[email protected]>
AuthorDate: Thu, 20 Aug 2020 19:20:46 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:41:51 +02:00

sched: Use __always_inline on is_idle_task()

is_idle_task() may be used from noinstr functions such as
irqentry_enter(). Since the compiler is free to not inline regular
inline functions, switch to using __always_inline.

Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/sched.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd93..afe01e2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1666,7 +1666,7 @@ extern struct task_struct *idle_task(int cpu);
*
* Return: 1 if @p is an idle task. 0 otherwise.
*/
-static inline bool is_idle_task(const struct task_struct *p)
+static __always_inline bool is_idle_task(const struct task_struct *p)
{
return !!(p->flags & PF_IDLE);
}