2022-10-31 21:14:14

by postix

Subject: Fwd: [REGRESSION 6.0.x / 6.1.x] NULL dereferencing at tracing

> Can you apply this to see if it fixes it?
>
> I'm guessing there's a path to the release of the file descriptor where
> the ring buffer isn't allocated (and this expected it to be).
>
> I'll investigate further to see if I can find that path.
>
> -- Steve
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 199759c73519..c1c7ce4c6ddb 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -937,6 +937,9 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
>  	struct ring_buffer_per_cpu *cpu_buffer;
>  	struct rb_irq_work *rbwork;
>
> +	if (!buffer)
> +		return;
> +
>  	if (cpu == RING_BUFFER_ALL_CPUS) {
>
>  		/* Wake up individual ones too. One level recursion */

Dear Steve,


I have tested your suggested patch using kernel 6.1.0-rc2, but
unfortunately it didn't fix the issue for me.

Thank you for looking into it though!


Best Regards

--AD






2022-11-02 17:14:35

by postix

Subject: Re: Fwd: [REGRESSION 6.0.x / 6.1.x] NULL dereferencing at tracing

Hello everyone,

I have added lots of debug printk's to see what's happening, and I found
that the "cpu" counter, which is used to index the buffer's array
elements (cpu_buffer = buffer->buffers[cpu]) in the ring_buffer_wake_waiters
function, exceeds the total number of cores. In my case that is 24, which
means it should only run from 0..23. However, upon debugging, it runs up
to 31, thus causing a NULL pointer dereference
(&cpu_buffer->irq_work).

After adding a return statement in case cpu > 24, the bug is no longer
reproducible.
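
To illustrate the kind of guard I mean, here is a minimal userspace sketch.
It is not the kernel code: every mock_* name, the array size of 32 and the
return values are assumptions made up for this example. It only models the
situation described above, where the pointer array has more slots than
allocated per-CPU buffers, and shows that checking the slot for NULL (rather
than comparing against a hard-coded core count) avoids the dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
#define MOCK_POSSIBLE_CPUS 32	/* assumed size of the pointer array */

struct mock_cpu_buffer {
	int wakeups;		/* counts simulated wake-ups */
};

struct mock_trace_buffer {
	int nr_allocated;	/* e.g. 24 on the machine described above */
	struct mock_cpu_buffer *buffers[MOCK_POSSIBLE_CPUS];
};

/* Release all per-CPU slots and the buffer itself; NULL is accepted. */
void mock_buffer_free(struct mock_trace_buffer *b)
{
	int cpu;

	if (!b)
		return;
	for (cpu = 0; cpu < MOCK_POSSIBLE_CPUS; cpu++)
		free(b->buffers[cpu]);	/* free(NULL) is a no-op */
	free(b);
}

/* Allocate only the first nr_cpus slots; the remaining ones stay NULL. */
struct mock_trace_buffer *mock_buffer_alloc(int nr_cpus)
{
	struct mock_trace_buffer *b;
	int cpu;

	assert(nr_cpus >= 0 && nr_cpus <= MOCK_POSSIBLE_CPUS);

	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;

	b->nr_allocated = nr_cpus;
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		b->buffers[cpu] = calloc(1, sizeof(*b->buffers[cpu]));
		if (!b->buffers[cpu]) {
			mock_buffer_free(b);
			return NULL;
		}
	}
	return b;
}

/* Returns 0 if the slot was "woken", -1 if it was skipped. */
int mock_wake_waiters(struct mock_trace_buffer *buffer, int cpu)
{
	struct mock_cpu_buffer *cpu_buffer;

	if (!buffer || cpu < 0 || cpu >= MOCK_POSSIBLE_CPUS)
		return -1;

	cpu_buffer = buffer->buffers[cpu];
	if (!cpu_buffer)	/* slot never allocated, e.g. cpu 24..31 */
		return -1;

	cpu_buffer->wakeups++;
	return 0;
}
```

With mock_buffer_alloc(24), calls for cpu 0..23 succeed and calls for cpu
24..31 are skipped instead of touching a NULL slot. Whether a check like this
is the right fix for the real function, or only hides the root cause, is
something I cannot judge.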

You can find the diff between v6.1-rc2 and the patched version with
added debug log in [1].
The corresponding dmesg output can be found in [2].

I hope this gives you a good hint to find the root cause!

[1] https://paste.opensuse.org/e60601aa
[2] https://paste.opensuse.org/bf1398ce