LinuxLists.cc - [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

2023-02-08 09:06:28

Subject: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

Concurrently closing "trace" file and writing into ring buffer [1] can
cause WARNINGs [2]. It has been reported in
Link: https://lore.kernel.org/all/[email protected]/

It seems a data race between ring_buffer writing and integrity check.
That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
was cleared when doing integrity check:
rb_check_pages() rb_handle_head_page():
-------- --------
rb_head_page_deactivate()
rb_head_page_set_normal()
rb_head_page_activate()

Integrity check at end of iter read was added since commit 659f451ff213
("ring-buffer: Add integrity check at end of iter read"). As it's commit
message said:
> As reading via an iterator requires disabling the ring buffer, it
> is a perfect place to have it.
However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
when there is an iterator"), ring buffer was not disabled at that place,
so that integrity check should be removed.

1:
``` read_trace.sh
while true;
do
# the "trace" file is closed after read
head -1 /sys/kernel/tracing/trace > /dev/null
done
```
``` repro.sh
sysctl -w kernel.panic_on_warn=1
# function tracer will writing enough data into ring_buffer
echo function > /sys/kernel/tracing/current_tracer
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
./read_trace.sh &
```

2:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653
rb_move_tail+0x450/0x470
Modules linked in:
CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G W 6.2.0-rc6+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:rb_move_tail+0x450/0x470
Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24
83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83
f8 02 0f 84 ce fb ff ff e9 db
RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18
RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400
RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2
R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000
R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108
FS: 0000000000000000(0000) GS:ffff9db3bdcc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0
Call Trace:
<TASK>
ring_buffer_lock_reserve+0x136/0x360
? __do_softirq+0x287/0x2df
? __pfx_rcu_softirq_qs+0x10/0x10
trace_function+0x21/0x110
? __pfx_rcu_softirq_qs+0x10/0x10
? __do_softirq+0x287/0x2df
function_trace_call+0xf6/0x120
0xffffffffc038f097
? rcu_softirq_qs+0x5/0x140
rcu_softirq_qs+0x5/0x140
__do_softirq+0x287/0x2df
run_ksoftirqd+0x2a/0x30
smpboot_thread_fn+0x188/0x220
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0xe7/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
---[ end trace 0000000000000000 ]---

Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
Signed-off-by: Zheng Yejian <[email protected]>
---
kernel/trace/ring_buffer.c | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c366a0a9ddba..34e955bd1e59 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5203,17 +5203,6 @@ void
ring_buffer_read_finish(struct ring_buffer_iter *iter)
{
struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
- unsigned long flags;
-
- /*
- * Ring buffer is disabled from recording, here's a good place
- * to check the integrity of the ring buffer.
- * Must prevent readers from trying to read, as the check
- * clears the HEAD page and readers require it.
- */
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rb_check_pages(cpu_buffer);
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

atomic_dec(&cpu_buffer->resize_disabled);
kfree(iter->event);
--
2.25.1

2023-02-08 22:36:09

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

On Wed, 8 Feb 2023 17:08:14 +0800
Zheng Yejian <[email protected]> wrote:

> Concurrently closing "trace" file and writing into ring buffer [1] can
> cause WARNINGs [2]. It has been reported in
> Link: https://lore.kernel.org/all/[email protected]/
>
> It seems a data race between ring_buffer writing and integrity check.
> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
> was cleared when doing integrity check:
> rb_check_pages() rb_handle_head_page():
> -------- --------
> rb_head_page_deactivate()
> rb_head_page_set_normal()
> rb_head_page_activate()
>

Good catch!

> Integrity check at end of iter read was added since commit 659f451ff213
> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
> message said:
> > As reading via an iterator requires disabling the ring buffer, it
> > is a perfect place to have it.
> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
> when there is an iterator"), ring buffer was not disabled at that place,
> so that integrity check should be removed.
>
> 1:
> ``` read_trace.sh
> while true;
> do
> # the "trace" file is closed after read
> head -1 /sys/kernel/tracing/trace > /dev/null
> done
> ```
> ``` repro.sh
> sysctl -w kernel.panic_on_warn=1
> # function tracer will writing enough data into ring_buffer
> echo function > /sys/kernel/tracing/current_tracer
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ./read_trace.sh &
> ```
>

> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
> Signed-off-by: Zheng Yejian <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index c366a0a9ddba..34e955bd1e59 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5203,17 +5203,6 @@ void
> ring_buffer_read_finish(struct ring_buffer_iter *iter)
> {
> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
> - unsigned long flags;
> -
> - /*
> - * Ring buffer is disabled from recording, here's a good place
> - * to check the integrity of the ring buffer.
> - * Must prevent readers from trying to read, as the check
> - * clears the HEAD page and readers require it.
> - */
> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> - rb_check_pages(cpu_buffer);
> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

I would rather find a way to make this still work than just removing it.

Perhaps there's no reason to clear the flags, and change rb_check_pages()
to mask them out before testing. Something like:

static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
struct list_head *head = cpu_buffer->pages;
struct buffer_page *bpage, *tmp;

if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
return -1;
if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
return -1;

if (rb_check_list(cpu_buffer, head))
return -1;

list_for_each_entry_safe(bpage, tmp, head, list) {
if (RB_WARN_ON(cpu_buffer,
rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
return -1;
if (RB_WARN_ON(cpu_buffer,
rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
return -1;
if (rb_check_list(cpu_buffer, &bpage->list))
return -1;
}

return 0;
}

I haven't tested the above.

?

-- Steve

>
> atomic_dec(&cpu_buffer->resize_disabled);
> kfree(iter->event);

2023-02-09 03:51:26

by Zheng Yejian

[permalink] [raw]

Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

On 2023/2/9 06:36, Steven Rostedt wrote:
> On Wed, 8 Feb 2023 17:08:14 +0800
> Zheng Yejian <[email protected]> wrote:
>
>> Concurrently closing "trace" file and writing into ring buffer [1] can
>> cause WARNINGs [2]. It has been reported in
>> Link: https://lore.kernel.org/all/[email protected]/
>>
>> It seems a data race between ring_buffer writing and integrity check.
>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
>> was cleared when doing integrity check:
>> rb_check_pages() rb_handle_head_page():
>> -------- --------
>> rb_head_page_deactivate()
>> rb_head_page_set_normal()
>> rb_head_page_activate()
>>
>
> Good catch!

Thanks!

>
>> Integrity check at end of iter read was added since commit 659f451ff213
>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
>> message said:
>> > As reading via an iterator requires disabling the ring buffer, it
>> > is a perfect place to have it.
>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
>> when there is an iterator"), ring buffer was not disabled at that place,
>> so that integrity check should be removed.
>>
>> 1:
>> ``` read_trace.sh
>> while true;
>> do
>> # the "trace" file is closed after read
>> head -1 /sys/kernel/tracing/trace > /dev/null
>> done
>> ```
>> ``` repro.sh
>> sysctl -w kernel.panic_on_warn=1
>> # function tracer will writing enough data into ring_buffer
>> echo function > /sys/kernel/tracing/current_tracer
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ./read_trace.sh &
>> ```
>>
>
>
>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
>> Signed-off-by: Zheng Yejian <[email protected]>
>> ---
>> kernel/trace/ring_buffer.c | 11 -----------
>> 1 file changed, 11 deletions(-)
>>
>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>> index c366a0a9ddba..34e955bd1e59 100644
>> --- a/kernel/trace/ring_buffer.c
>> +++ b/kernel/trace/ring_buffer.c
>> @@ -5203,17 +5203,6 @@ void
>> ring_buffer_read_finish(struct ring_buffer_iter *iter)
>> {
>> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
>> - unsigned long flags;
>> -
>> - /*
>> - * Ring buffer is disabled from recording, here's a good place
>> - * to check the integrity of the ring buffer.
>> - * Must prevent readers from trying to read, as the check
>> - * clears the HEAD page and readers require it.
>> - */
>> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>> - rb_check_pages(cpu_buffer);
>> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>
> I would rather find a way to make this still work than just removing it.

Yes, we can try to find the way.

>
> Perhaps there's no reason to clear the flags, and change rb_check_pages()
> to mask them out before testing. Something like:
>
> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
> {
> struct list_head *head = cpu_buffer->pages;
> struct buffer_page *bpage, *tmp;
>
> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
> return -1;
> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
> return -1;
>
> if (rb_check_list(cpu_buffer, head))

rb_check_list() expect to check a page with RB_FLAG being cleared,
but in this solution, rb_head_page_deactivate() is not called before,
so we may not call it directly? The same problem with below check for
"bpage->list".

> return -1;
>
> list_for_each_entry_safe(bpage, tmp, head, list) {

I'd like to know if there is a case that "head" happens to be a
"reader_page", and the ring buffer is not exactly being traversed?

> if (RB_WARN_ON(cpu_buffer,
> rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
> return -1;
> if (RB_WARN_ON(cpu_buffer,
> rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
> return -1;
> if (rb_check_list(cpu_buffer, &bpage->list))
> return -1;
> }
>
> return 0;
> }
>
> I haven't tested the above.
>
> ?
>
> -- Steve
>
>
>>
>> atomic_dec(&cpu_buffer->resize_disabled);
>> kfree(iter->event);
>

2023-02-10 15:23:43

by Mukesh Ojha

[permalink] [raw]

Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

On 2023/2/9 06:36, Steven Rostedt wrote:
>> On Wed, 8 Feb 2023 17:08:14 +0800
>> Zheng Yejian <[email protected]> wrote:
>>
>>> Concurrently closing "trace" file and writing into ring buffer [1] can
>>> cause WARNINGs [2]. It has been reported in
>>> Link: https://lore.kernel.org/all/[email protected]/
>>>
>>> It seems a data race between ring_buffer writing and integrity check.
>>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
>>> was cleared when doing integrity check:
>>> rb_check_pages() rb_handle_head_page():
>>> -------- --------
>>> rb_head_page_deactivate()
>>> rb_head_page_set_normal()
>>> rb_head_page_activate()
>>>
>>
>> Good catch!
>
>Thanks!
>
>>
>>> Integrity check at end of iter read was added since commit 659f451ff213
>>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
>>> message said:
>>> > As reading via an iterator requires disabling the ring buffer, it
>>> > is a perfect place to have it.
>>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
>>> when there is an iterator"), ring buffer was not disabled at that place,
>>> so that integrity check should be removed.
>>>
>>> 1:
>>> ``` read_trace.sh
>>> while true;
>>> do
>>> # the "trace" file is closed after read
>>> head -1 /sys/kernel/tracing/trace > /dev/null
>>> done
>>> ```
>>> ``` repro.sh
>>> sysctl -w kernel.panic_on_warn=1
>>> # function tracer will writing enough data into ring_buffer
>>> echo function > /sys/kernel/tracing/current_tracer
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ./read_trace.sh &
>>> ```
>>>
>>
>>
>>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
>>> Signed-off-by: Zheng Yejian <[email protected]>
>>> ---
>>> kernel/trace/ring_buffer.c | 11 -----------
>>> 1 file changed, 11 deletions(-)
>>>
>>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>>> index c366a0a9ddba..34e955bd1e59 100644
>>> --- a/kernel/trace/ring_buffer.c
>>> +++ b/kernel/trace/ring_buffer.c
>>> @@ -5203,17 +5203,6 @@ void
>>> ring_buffer_read_finish(struct ring_buffer_iter *iter)
>>> {
>>> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
>>> - unsigned long flags;
>>> -
>>> - /*
>>> - * Ring buffer is disabled from recording, here's a good place
>>> - * to check the integrity of the ring buffer.
>>> - * Must prevent readers from trying to read, as the check
>>> - * clears the HEAD page and readers require it.
>>> - */
>>> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>>> - rb_check_pages(cpu_buffer);
>>> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>>
>> I would rather find a way to make this still work than just removing it.
>
>Yes, we can try to find the way.
>
>>
>> Perhaps there's no reason to clear the flags, and change rb_check_pages()
>> to mask them out before testing. Something like:
>>
>> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
>> {
>> struct list_head *head = cpu_buffer->pages;
>> struct buffer_page *bpage, *tmp;
>>
>> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
>> return -1;
>> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
>> return -1;
>>
>> if (rb_check_list(cpu_buffer, head))
>
>rb_check_list() expect to check a page with RB_FLAG being cleared,
>but in this solution, rb_head_page_deactivate() is not called before,
>so we may not call it directly? The same problem with below check for
>"bpage->list".

Correct.

>> return -1;
>>
>> list_for_each_entry_safe(bpage, tmp, head, list) {
>
>I'd like to know if there is a case that "head" happens to be a
>"reader_page", and the ring buffer is not exactly being traversed?

In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
cpu_buffer->pages has duplicate entry in the list.

-00 |rb_list_head_clear(inline)
-00 |rb_head_page_deactivate(inline)
-00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
-01 |atomic_try_cmpxchg_acquire(inline)
-01 |queued_spin_lock(inline)
-01 |do_raw_spin_lock_flags(inline)
-01 |__raw_spin_lock_irqsave(inline)
-01 |_raw_spin_lock_irqsave(inline)
-01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
-02 |cpumask_next(inline)
-02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
-03 |__fput(file = 0xFFFFFF8A53A63F00)
-04 |____fput(work = ?)
-05 |_raw_spin_unlock_irq(inline)
-05 |task_work_run()
-06 |tracehook_notify_resume(inline)
-06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
-07 |prepare_exit_to_user_mode(inline)
-07 |exit_to_user_mode(inline)
-07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
-08 |el0t_64_sync_handler(regs = ?)
-09 |el0t_64_sync(asm)

...
..
ffffff80359eeb00 --> Duplicate entry
ffffff80359ee300
ffffff80359ee180
ffffff80359eeec0
ffffff80359eec00
ffffff80359ee800 -- Tail page
ffffff80359eedc0 -- Head page
ffffff80359ee640
ffffff80359ee080
ffffff80359ee700
ffffff80359ee7c0
ffffff80359eed80
ffffff80359ee900
ffffff80359ee9c0
ffffff80359eea00
ffffff80359eea80
ffffff80359eec80
ffffff80359ee240
ffffff80359ee6c0
ffffff80359ee0c0
ffffff80359ee8c0
ffffff80359ee940
ffffff80359eee00
ffffff80359ee000
ffffff80359eeb00 ---> Duplicate entry

-Mukesh

>
>> if (RB_WARN_ON(cpu_buffer,
>> rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
>> return -1;
>> if (RB_WARN_ON(cpu_buffer,
>> rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
>> return -1;
>> if (rb_check_list(cpu_buffer, &bpage->list))
>> return -1;
>> }
>>
>> return 0;
>> }
>>
>> I haven't tested the above.
>>
>> ?
>>
>> -- Steve
>>
>>
>>>
>>> atomic_dec(&cpu_buffer->resize_disabled);
>>> kfree(iter->event);

2023-02-11 16:37:20

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

On Fri, 10 Feb 2023 20:52:36 +0530
Mukesh Ojha <[email protected]> wrote:

> >> return -1;
> >>
> >> list_for_each_entry_safe(bpage, tmp, head, list) {
> >
> >I'd like to know if there is a case that "head" happens to be a
> >"reader_page", and the ring buffer is not exactly being traversed?

No, the way it works is that the reader page is found by searching for the
head pointer, and then it is set when swapped. Basically, the pseudo code
is:

reader->next = head_page | HEAD_FLAG
val = head_page->prev->next
val &= ~FLAGS
val |= HEAD_FLAG
cmpxchg(head_page->prev->next, val, reader)

The HEAD_FLAG is always on the pointer that points to the head page that
gets swapped. This will never point to the reader page, as that would mean
the writer has access to it.

>
> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
> cpu_buffer->pages has duplicate entry in the list.
>
> -00 |rb_list_head_clear(inline)
> -00 |rb_head_page_deactivate(inline)
> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
> -01 |atomic_try_cmpxchg_acquire(inline)
> -01 |queued_spin_lock(inline)
> -01 |do_raw_spin_lock_flags(inline)
> -01 |__raw_spin_lock_irqsave(inline)
> -01 |_raw_spin_lock_irqsave(inline)
> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
> -02 |cpumask_next(inline)
> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
> -03 |__fput(file = 0xFFFFFF8A53A63F00)
> -04 |____fput(work = ?)
> -05 |_raw_spin_unlock_irq(inline)
> -05 |task_work_run()
> -06 |tracehook_notify_resume(inline)
> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
> -07 |prepare_exit_to_user_mode(inline)
> -07 |exit_to_user_mode(inline)
> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
> -08 |el0t_64_sync_handler(regs = ?)
> -09 |el0t_64_sync(asm)
>
> ...
> ..
> ffffff80359eeb00 --> Duplicate entry
> ffffff80359ee300
> ffffff80359ee180
> ffffff80359eeec0
> ffffff80359eec00
> ffffff80359ee800 -- Tail page
> ffffff80359eedc0 -- Head page
> ffffff80359ee640
> ffffff80359ee080
> ffffff80359ee700
> ffffff80359ee7c0
> ffffff80359eed80
> ffffff80359ee900
> ffffff80359ee9c0
> ffffff80359eea00
> ffffff80359eea80
> ffffff80359eec80
> ffffff80359ee240
> ffffff80359ee6c0
> ffffff80359ee0c0
> ffffff80359ee8c0
> ffffff80359ee940
> ffffff80359eee00
> ffffff80359ee000
> ffffff80359eeb00 ---> Duplicate entry

So this is a separate issue where the ring buffer is corrupted?

-- Steve

2023-02-14 12:14:49

by Mukesh Ojha

[permalink] [raw]

Subject: Re: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read

On 2/11/2023 10:07 PM, Steven Rostedt wrote:
> On Fri, 10 Feb 2023 20:52:36 +0530
> Mukesh Ojha <[email protected]> wrote:
>
>>>> return -1;
>>>>
>>>> list_for_each_entry_safe(bpage, tmp, head, list) {
>>>
>>> I'd like to know if there is a case that "head" happens to be a
>>> "reader_page", and the ring buffer is not exactly being traversed?
>
> No, the way it works is that the reader page is found by searching for the
> head pointer, and then it is set when swapped. Basically, the pseudo code
> is:
>
> reader->next = head_page | HEAD_FLAG
> val = head_page->prev->next
> val &= ~FLAGS
> val |= HEAD_FLAG
> cmpxchg(head_page->prev->next, val, reader)
>
> The HEAD_FLAG is always on the pointer that points to the head page that
> gets swapped. This will never point to the reader page, as that would mean
> the writer has access to it.
>
>>
>> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
>> cpu_buffer->pages has duplicate entry in the list.
>>
>> -00 |rb_list_head_clear(inline)
>> -00 |rb_head_page_deactivate(inline)
>> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
>> -01 |atomic_try_cmpxchg_acquire(inline)
>> -01 |queued_spin_lock(inline)
>> -01 |do_raw_spin_lock_flags(inline)
>> -01 |__raw_spin_lock_irqsave(inline)
>> -01 |_raw_spin_lock_irqsave(inline)
>> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
>> -02 |cpumask_next(inline)
>> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
>> -03 |__fput(file = 0xFFFFFF8A53A63F00)
>> -04 |____fput(work = ?)
>> -05 |_raw_spin_unlock_irq(inline)
>> -05 |task_work_run()
>> -06 |tracehook_notify_resume(inline)
>> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
>> -07 |prepare_exit_to_user_mode(inline)
>> -07 |exit_to_user_mode(inline)
>> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
>> -08 |el0t_64_sync_handler(regs = ?)
>> -09 |el0t_64_sync(asm)
>>
>> ...
>> ..
>> ffffff80359eeb00 --> Duplicate entry
>> ffffff80359ee300
>> ffffff80359ee180
>> ffffff80359eeec0
>> ffffff80359eec00
>> ffffff80359ee800 -- Tail page
>> ffffff80359eedc0 -- Head page
>> ffffff80359ee640
>> ffffff80359ee080
>> ffffff80359ee700
>> ffffff80359ee7c0
>> ffffff80359eed80
>> ffffff80359ee900
>> ffffff80359ee9c0
>> ffffff80359eea00
>> ffffff80359eea80
>> ffffff80359eec80
>> ffffff80359ee240
>> ffffff80359ee6c0
>> ffffff80359ee0c0
>> ffffff80359ee8c0
>> ffffff80359ee940
>> ffffff80359eee00
>> ffffff80359ee000
>> ffffff80359eeb00 ---> Duplicate entry
>
> So this is a separate issue where the ring buffer is corrupted?

It looks to be different issue and there also i see similar call stack
of tracing_release() but in that issue it is looping forever in
deactivate call due to list corruption.

I am not yet able to root cause a place of corruption as it is
reproduced only once, will need to check more on this.

For this issue, i have posted at
https://lore.kernel.org/lkml/[email protected]/

-Mukesh

>
> -- Steve