2020-10-01 18:26:41

by Naresh Kamboju

[permalink] [raw]
Subject: BUG: kernel NULL pointer dereference, address: 00000024 - ftrace

While running kselftest on i386 on Linux next 20201001 kernel this bug
was noticed.
BUG: kernel NULL pointer dereference, address: 00000024


steps to reproduce:
# cd /opt/kselftests/ftrace
# ./ftracetest

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: d39294091fee6b89d9c4a683bb19441b25098330
git describe: next-20201001ftracetest
make_kernelversion: 5.9.0-rc7
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-core2-32/lkft/linux-next/872/config

# selftests: ftrace: ftracetest
# === Ftrace unit tests ===
# [1] Basic trace file check [PASS]
# [2] Basic test for tracers [PASS]
# [3] Basic trace clock test [PASS]
# [4] Basic event tracing check [PASS]
# [5] Change the ringbuffer size [PASS]
# [6] Snapshot and tracing setting [PASS]
# [7] trace_pipe and trace_marker [PASS]
# [8] Test ftrace direct functions against tracers [UNRESOLVED]
# [9] Test ftrace direct functions against kprobes [UNRESOLVED]
# [10] Generic dynamic event - add/remove kprobe events [PASS]
# [11] Generic dynamic event - add/remove synthetic events [PASS]
# [12] Generic dynamic event - selective clear (compatibility) [PASS]
# [13] Generic dynamic event - generic clear event [PASS]
# [14] event tracing - enable/disable with event level files
[ 207.481763] BUG: kernel NULL pointer dereference, address: 00000024
[ 207.488050] #PF: supervisor read access in kernel mode
[ 207.493205] #PF: error_code(0x0000) - not-present page
[ 207.498355] *pde = 00000000
[ 207.501258] Oops: 0000 [#1] SMP
[ 207.504421] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W
5.9.0-rc7-next-20201001 #1
[ 207.513205] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 207.520680] EIP: trace_event_raw_event_sched_switch+0x10/0x180
[ 207.526510] Code: 08 5b 5e 5f 5d c3 8d 74 26 00 31 c0 eb e4 e8 27
1d de 00 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c6 89 cf 83 ec 28
8b 45 08 <8b> 5e 24 89 55 cc 89 45 d0 65 a1 14 00 00 00 89 45 f0 31 c0
f6 c7
[ 207.545253] EAX: f55aa480 EBX: f55aa480 ECX: f55adb40 EDX: 00000000
[ 207.551510] ESI: 00000000 EDI: f55adb40 EBP: f55bbf20 ESP: f55bbeec
[ 207.557769] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210092
[ 207.564554] CR0: 80050033 CR2: 00000024 CR3: 10830000 CR4: 003506d0
[ 207.570811] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 207.577069] DR6: fffe0ff0 DR7: 00000400
[ 207.580898] Call Trace:
[ 207.583344] ? pick_next_task_fair+0x165/0x3b0
[ 207.587794] __schedule+0x4dc/0xa90
[ 207.591283] schedule_idle+0x1c/0x40
[ 207.594863] do_idle+0x185/0x2a0
[ 207.598097] cpu_startup_entry+0x25/0x30
[ 207.602021] start_secondary+0x106/0x140
[ 207.605946] startup_32_smp+0x164/0x168
[ 207.609778] Modules linked in: sch_fq sch_ingress veth algif_hash
x86_pkg_temp_thermal fuse [last unloaded: test_bpf]
[ 207.620385] CR2: 0000000000000024
[ 207.623697] ---[ end trace 6ad815a56bdaaf71 ]---
[ 207.628342] EIP: trace_event_raw_event_sched_switch+0x10/0x180
[ 207.634173] Code: 08 5b 5e 5f 5d c3 8d 74 26 00 31 c0 eb e4 e8 27
1d de 00 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c6 89 cf 83 ec 28
8b 45 08 <8b> 5e 24 89 55 cc 89 45 d0 65 a1 14 00 00 00 89 45 f0 31 c0
f6 c7
[ 207.652918] EAX: f55aa480 EBX: f55aa480 ECX: f55adb40 EDX: 00000000
[ 207.659175] ESI: 00000000 EDI: f55adb40 EBP: f55bbf20 ESP: f55bbeec
[ 207.665433] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210092
[ 207.672211] CR0: 80050033 CR2: 00000024 CR3: 10830000 CR4: 003506d0
[ 207.678466] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 207.684740] DR6: fffe0ff0 DR7: 00000400
[ 207.688597] Kernel panic - not syncing: Attempted to kill the idle task!
[ 207.695329] Kernel Offset: disabled
[ 207.698828] ---[ end Kernel panic - not syncing: Attempted to kill
the idle task! ]---


full test log,
https://lkft.validation.linaro.org/scheduler/job/1809544#L6012

--
Linaro LKFT
https://lkft.linaro.org


Subject: [tip: core/static_call] tracepoint: Fix out of sync data passing by static caller

The following commit has been merged into the core/static_call branch of tip:

Commit-ID: 547305a64632813286700cb6d768bfe773df7d19
Gitweb: https://git.kernel.org/tip/547305a64632813286700cb6d768bfe773df7d19
Author: Steven Rostedt (VMware) <[email protected]>
AuthorDate: Thu, 01 Oct 2020 21:27:57 -04:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 02 Oct 2020 21:18:25 +02:00

tracepoint: Fix out of sync data passing by static caller

Naresh reported a bug that appears to be a side effect of the static
calls. It happens when going from more than one tracepoint callback to
a single one, and removing the first callback on the list. The list of
tracepoint callbacks holds data and a function to call with the
parameters of that tracepoint and a handler to the associated data.

old_list:
0: func = foo; data = NULL;
1: func = bar; data = &bar_struct;

new_list:
0: func = bar; data = &bar_struct;

CPU 0 CPU 1
----- -----
tp_funcs = old_list;
tp_static_caller = tp_interator

__DO_TRACE()

data = tp_funcs[0].data = NULL;

tp_funcs = new_list;
tracepoint_update_call()
tp_static_caller = tp_funcs[0] = bar;
tp_static_caller(data)
bar(data)
x = data->item = NULL->item

BOOM!

To solve this, add a tracepoint_synchronize_unregister() between
changing tp_funcs and updating the static tracepoint, that does both a
synchronize_rcu() and synchronize_srcu(). This will ensure that when
the static call is updated to the single callback that it will be
receiving the data that it registered with.

Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()")
Reported-by: Naresh Kamboju <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/linux-next/CA+G9fYvPXVRO0NV7yL=FxCmFEMYkCwdz7R=9W+_votpT824YJA@mail.gmail.com
---
kernel/tracepoint.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index e92f3fb..26efd22 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -221,7 +221,7 @@ static void *func_remove(struct tracepoint_func **funcs,
return old;
}

-static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs)
+static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func *tp_funcs, bool sync)
{
void *func = tp->iterator;

@@ -229,8 +229,17 @@ static void tracepoint_update_call(struct tracepoint *tp, struct tracepoint_func
if (!tp->static_call_key)
return;

- if (!tp_funcs[1].func)
+ if (!tp_funcs[1].func) {
func = tp_funcs[0].func;
+ /*
+ * If going from the iterator back to a single caller,
+ * we need to synchronize with __DO_TRACE to make sure
+ * that the data passed to the callback is the one that
+ * belongs to that callback.
+ */
+ if (sync)
+ tracepoint_synchronize_unregister();
+ }

__static_call_update(tp->static_call_key, tp->static_call_tramp, func);
}
@@ -265,7 +274,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
* include/linux/tracepoint.h using rcu_dereference_sched().
*/
rcu_assign_pointer(tp->funcs, tp_funcs);
- tracepoint_update_call(tp, tp_funcs);
+ tracepoint_update_call(tp, tp_funcs, false);
static_key_enable(&tp->key);

release_probes(old);
@@ -297,11 +306,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
tp->unregfunc();

static_key_disable(&tp->key);
+ rcu_assign_pointer(tp->funcs, tp_funcs);
} else {
- tracepoint_update_call(tp, tp_funcs);
+ rcu_assign_pointer(tp->funcs, tp_funcs);
+ tracepoint_update_call(tp, tp_funcs,
+ tp_funcs[0].func != old[0].func);
}
-
- rcu_assign_pointer(tp->funcs, tp_funcs);
release_probes(old);
return 0;
}