2020-09-04 06:13:31

by Gaurav Kohli

Subject: [PATCH] trace: Fix race in trace_open and buffer resize call

The race below can occur if trace_open and a resize of the cpu buffer
run in parallel on different cpus:

CPUX                                    CPUY
                                        ring_buffer_resize
                                        atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
                                        rb_update_pages
                                        remove/insert pages
resetting pointer

This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages and rb_insert_pages while checking the pages for
sanity.

Take the ring buffer lock in trace_open to avoid resetting the cpu
buffer.
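The failure window can be modeled with a small user-space sketch (all names here, such as mini_buffer, mini_resize and mini_reset, are illustrative stand-ins, not kernel API): both the reset path and the resize path serialize on a single mutex, so a reset can never wipe a half-staged page-count update, which is the intent of taking buffer->mutex in the patch below.

```c
#include <pthread.h>

/* Hypothetical miniature of the resize/reset interplay; these names
 * are illustrative, not kernel API. */
struct mini_buffer {
	pthread_mutex_t mutex;   /* plays the role of buffer->mutex */
	int nr_pages;            /* committed page count */
	int nr_pages_to_update;  /* staged delta, as in the cpu buffer */
};

/* With the mutex held for the whole reset, a concurrent resize cannot
 * observe (or clobber) a half-staged update. */
void mini_reset(struct mini_buffer *b)
{
	pthread_mutex_lock(&b->mutex);
	b->nr_pages_to_update = 0;   /* drop any staged update */
	pthread_mutex_unlock(&b->mutex);
}

int mini_resize(struct mini_buffer *b, int new_nr_pages)
{
	pthread_mutex_lock(&b->mutex);
	b->nr_pages_to_update = new_nr_pages - b->nr_pages;
	/* mutex held: no reset can zero the staged delta between the
	 * staging above and the commit below */
	b->nr_pages += b->nr_pages_to_update;
	b->nr_pages_to_update = 0;
	pthread_mutex_unlock(&b->mutex);
	return b->nr_pages;
}
```

Under the shared mutex, any interleaving of the two paths leaves nr_pages consistent and nr_pages_to_update at zero.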

Signed-off-by: Gaurav Kohli <[email protected]>

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 136ea09..55f9115 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);

void ring_buffer_record_disable(struct trace_buffer *buffer);
void ring_buffer_record_enable(struct trace_buffer *buffer);
+void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
+void ring_buffer_mutex_release(struct trace_buffer *buffer);
void ring_buffer_record_off(struct trace_buffer *buffer);
void ring_buffer_record_on(struct trace_buffer *buffer);
bool ring_buffer_record_is_on(struct trace_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 93ef0ab..638ec8f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
EXPORT_SYMBOL_GPL(ring_buffer_record_enable);

/**
+ * ring_buffer_mutex_acquire - prevent resetting of buffer
+ * during resize
+ */
+void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
+{
+ mutex_lock(&buffer->mutex);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
+
+/**
+ * ring_buffer_mutex_release - prevent resetting of buffer
+ * during resize
+ */
+void ring_buffer_mutex_release(struct trace_buffer *buffer)
+{
+ mutex_unlock(&buffer->mutex);
+}
+EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
+/**
* ring_buffer_record_off - stop all writes into the buffer
* @buffer: The ring buffer to stop writes to.
*
@@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
struct ring_buffer_per_cpu *cpu_buffer;
int cpu;

+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];

@@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
}
+ mutex_unlock(&buffer->mutex);
}
EXPORT_SYMBOL_GPL(ring_buffer_reset);

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f40d850..392e9aa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
if (!buffer)
return;

+ ring_buffer_mutex_acquire(buffer);
+
ring_buffer_record_disable(buffer);

/* Make sure all commits have finished */
@@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
ring_buffer_reset_online_cpus(buffer);

ring_buffer_record_enable(buffer);
+
+ ring_buffer_mutex_release(buffer);
}

/* Must have trace_types_lock held */
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project


2020-09-14 04:32:11

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

Hi Steven,

Please let us know, if below change looks good.
Or let us know some other way to solve this.

Thanks,
Gaurav



On 9/4/2020 11:39 AM, Gaurav Kohli wrote:
> The race below can occur if trace_open and a resize of the cpu buffer
> run in parallel on different cpus:
>
> CPUX                                    CPUY
>                                         ring_buffer_resize
>                                         atomic_read(&buffer->resize_disabled)
> tracing_open
> tracing_reset_online_cpus
> ring_buffer_reset_cpu
> rb_reset_cpu
>                                         rb_update_pages
>                                         remove/insert pages
> resetting pointer
>
> This race can cause a data abort or sometimes an infinite loop in
> rb_remove_pages and rb_insert_pages while checking the pages for
> sanity.
>
> Take the ring buffer lock in trace_open to avoid resetting the cpu
> buffer.
>
> Signed-off-by: Gaurav Kohli <[email protected]>
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 136ea09..55f9115 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);
>
> void ring_buffer_record_disable(struct trace_buffer *buffer);
> void ring_buffer_record_enable(struct trace_buffer *buffer);
> +void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
> +void ring_buffer_mutex_release(struct trace_buffer *buffer);
> void ring_buffer_record_off(struct trace_buffer *buffer);
> void ring_buffer_record_on(struct trace_buffer *buffer);
> bool ring_buffer_record_is_on(struct trace_buffer *buffer);
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 93ef0ab..638ec8f 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
> EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
>
> /**
> + * ring_buffer_mutex_acquire - prevent resetting of buffer
> + * during resize
> + */
> +void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
> +{
> + mutex_lock(&buffer->mutex);
> +}
> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
> +
> +/**
> + * ring_buffer_mutex_release - prevent resetting of buffer
> + * during resize
> + */
> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> +{
> + mutex_unlock(&buffer->mutex);
> +}
> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> +/**
> * ring_buffer_record_off - stop all writes into the buffer
> * @buffer: The ring buffer to stop writes to.
> *
> @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> struct ring_buffer_per_cpu *cpu_buffer;
> int cpu;
>
> + /* prevent another thread from changing buffer sizes */
> + mutex_lock(&buffer->mutex);
> for_each_buffer_cpu(buffer, cpu) {
> cpu_buffer = buffer->buffers[cpu];
>
> @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> atomic_dec(&cpu_buffer->record_disabled);
> atomic_dec(&cpu_buffer->resize_disabled);
> }
> + mutex_unlock(&buffer->mutex);
> }
> EXPORT_SYMBOL_GPL(ring_buffer_reset);
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index f40d850..392e9aa 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> if (!buffer)
> return;
>
> + ring_buffer_mutex_acquire(buffer);
> +
> ring_buffer_record_disable(buffer);
>
> /* Make sure all commits have finished */
> @@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> ring_buffer_reset_online_cpus(buffer);
>
> ring_buffer_record_enable(buffer);
> +
> + ring_buffer_mutex_release(buffer);
> }
>
> /* Must have trace_types_lock held */
>


2020-09-14 16:24:42

by Steven Rostedt

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

On Mon, 14 Sep 2020 10:00:50 +0530
Gaurav Kohli <[email protected]> wrote:

> Hi Steven,
>
> Please let us know, if below change looks good.
> Or let us know some other way to solve this.
>
> Thanks,
> Gaurav
>
>

Hmm, for some reason, I don't see this in my INBOX, but it shows up in my
LKML folder. :-/


>
> On 9/4/2020 11:39 AM, Gaurav Kohli wrote:
> > The race below can occur if trace_open and a resize of the cpu buffer
> > run in parallel on different cpus:
> >
> > CPUX                                    CPUY
> >                                         ring_buffer_resize
> >                                         atomic_read(&buffer->resize_disabled)
> > tracing_open
> > tracing_reset_online_cpus
> > ring_buffer_reset_cpu
> > rb_reset_cpu
> >                                         rb_update_pages
> >                                         remove/insert pages
> > resetting pointer
> >
> > This race can cause a data abort or sometimes an infinite loop in
> > rb_remove_pages and rb_insert_pages while checking the pages for
> > sanity.
> >
> > Take the ring buffer lock in trace_open to avoid resetting the cpu
> > buffer.
> >
> > Signed-off-by: Gaurav Kohli <[email protected]>
> >
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index 136ea09..55f9115 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -163,6 +163,8 @@ bool ring_buffer_empty_cpu(struct trace_buffer *buffer, int cpu);
> >
> > void ring_buffer_record_disable(struct trace_buffer *buffer);
> > void ring_buffer_record_enable(struct trace_buffer *buffer);
> > +void ring_buffer_mutex_acquire(struct trace_buffer *buffer);
> > +void ring_buffer_mutex_release(struct trace_buffer *buffer);
> > void ring_buffer_record_off(struct trace_buffer *buffer);
> > void ring_buffer_record_on(struct trace_buffer *buffer);
> > bool ring_buffer_record_is_on(struct trace_buffer *buffer);
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index 93ef0ab..638ec8f 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -3632,6 +3632,25 @@ void ring_buffer_record_enable(struct trace_buffer *buffer)
> > EXPORT_SYMBOL_GPL(ring_buffer_record_enable);
> >
> > /**
> > + * ring_buffer_mutex_acquire - prevent resetting of buffer
> > + * during resize
> > + */
> > +void ring_buffer_mutex_acquire(struct trace_buffer *buffer)
> > +{
> > + mutex_lock(&buffer->mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(ring_buffer_mutex_acquire);
> > +
> > +/**
> > + * ring_buffer_mutex_release - prevent resetting of buffer
> > + * during resize
> > + */
> > +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> > +{
> > + mutex_unlock(&buffer->mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);

I really do not like to export these.

> > +/**
> > * ring_buffer_record_off - stop all writes into the buffer
> > * @buffer: The ring buffer to stop writes to.
> > *
> > @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> > struct ring_buffer_per_cpu *cpu_buffer;
> > int cpu;
> >
> > + /* prevent another thread from changing buffer sizes */
> > + mutex_lock(&buffer->mutex);
> > for_each_buffer_cpu(buffer, cpu) {
> > cpu_buffer = buffer->buffers[cpu];
> >
> > @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
> > atomic_dec(&cpu_buffer->record_disabled);
> > atomic_dec(&cpu_buffer->resize_disabled);
> > }
> > + mutex_unlock(&buffer->mutex);
> > }
> > EXPORT_SYMBOL_GPL(ring_buffer_reset);
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index f40d850..392e9aa 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> > if (!buffer)
> > return;
> >
> > + ring_buffer_mutex_acquire(buffer);
> > +
> > ring_buffer_record_disable(buffer);

Hmm, why do we disable here as it gets disabled again in the call to
ring_buffer_reset_online_cpus()? Perhaps we don't need to disable the
buffer here. The only difference is that we have:

buf->time_start = buffer_ftrace_now(buf, buf->cpu);

And that the above disables the entire buffer, whereas the reset only
resets individual ones.

But I don't think that will make any difference.

-- Steve


> >
> > /* Make sure all commits have finished */
> > @@ -2016,6 +2018,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
> > ring_buffer_reset_online_cpus(buffer);
> >
> > ring_buffer_record_enable(buffer);
> > +
> > + ring_buffer_mutex_release(buffer);
> > }
> >
> > /* Must have trace_types_lock held */
> >
>

2020-09-15 05:11:19

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call



Hi Steven,
Thanks for the reply.

On 9/14/2020 9:49 PM, Steven Rostedt wrote:
> On Mon, 14 Sep 2020 10:00:50 +0530
> Gaurav Kohli <[email protected]> wrote:
>
>> Hi Steven,
>>
>> Please let us know, if below change looks good.
>> Or let us know some other way to solve this.
>>
>> Thanks,
>> Gaurav
>>
>>
>
> Hmm, for some reason, I don't see this in my INBOX, but it shows up in my
> LKML folder. :-/
>
>


>>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>>> +{
>>> + mutex_unlock(&buffer->mutex);
>>> +}
>>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>
> I really do not like to export these.
>

Actually the available reader lock (&cpu_buffer->reader_lock) is not
helping here, so I took the ring buffer mutex lock to resolve this
(this came up on 4.19/5.4; in the latest tip it is the trace buffer
lock). Due to this I have exported the API.
>>> +/**
>>> * ring_buffer_record_off - stop all writes into the buffer
>>> * @buffer: The ring buffer to stop writes to.
>>> *
>>> @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>> struct ring_buffer_per_cpu *cpu_buffer;
>>> int cpu;
>>> + /* prevent another thread from changing buffer sizes */
>>> + mutex_lock(&buffer->mutex);
>>> for_each_buffer_cpu(buffer, cpu) {
>>> cpu_buffer = buffer->buffers[cpu];
>>> @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>> atomic_dec(&cpu_buffer->record_disabled);
>>> atomic_dec(&cpu_buffer->resize_disabled);
>>> }
>>> + mutex_unlock(&buffer->mutex);
>>> }
>>> EXPORT_SYMBOL_GPL(ring_buffer_reset);
>>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>>> index f40d850..392e9aa 100644
>>> --- a/kernel/trace/trace.c
>>> +++ b/kernel/trace/trace.c
>>> @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
>>> if (!buffer)
>>> return;
>>> + ring_buffer_mutex_acquire(buffer);
>>> +
>>> ring_buffer_record_disable(buffer);
>
> Hmm, why do we disable here as it gets disabled again in the call to
> ring_buffer_reset_online_cpus()? Perhaps we don't need to disable the
You mean cpu_buffer->reader_lock in reset_disabled_cpu_buffer?
The reader lock is already there, but it does not help if tracing_open
and ring_buffer_resize are running in parallel on different cpus.

We are seeing the race mainly during removal of extra pages:

ring_buffer_resize                      tracing_open
/* the portion of code below
   runs without any lock */
nr_pages_to_update < 0
INIT_LIST_HEAD(&new_pages)
rb_update_pages
                                        tracing_reset_online_cpus
                                        ring_buffer_reset_cpu
                                        /* cpu buffer reset done */
/* the lock is only taken from here */
warning(nr_removed)

We are seeing cases where the cpu buffer got reset by a tracing_open
call elsewhere, and we then see the issue in rb_remove_pages.

A similar case can occur in rb_insert_pages as well:

rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
	struct list_head *pages = &cpu_buffer->new_pages;
	int retries, success;

	/*
	 * Before the lock is taken, the cpu buffer may get reset on
	 * another cpu; this is why we are seeing infinite-loop cases,
	 * as the new_pages pointer got reset in rb_reset_cpu.
	 */
	raw_spin_lock_irq(&cpu_buffer->reader_lock);
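The stranding described above can be reproduced with a tiny user-space model of the kernel's list primitives (these are illustrative stand-ins, not the real list.h code): once the head is re-initialized by the reset path, the staged entries are unreachable from the head while still chained to each other, which is how a traversal that trusts new_pages can warn or spin.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel's list primitives (illustrative,
 * not the real <linux/list.h> implementation). */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h)
{
	h->next = h->prev = h;   /* what a reset does to new_pages */
}

void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* The kind of traversal rb_insert_pages performs over staged pages. */
int staged_count(const struct list_head *h)
{
	int count = 0;
	const struct list_head *p;

	for (p = h->next; p != h; p = p->next)
		count++;
	return count;
}
```

After a concurrent re-initialization of the head, the count drops to zero even though the staged nodes still point at each other through the old chain.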

> buffer here. The only difference is that we have:
>
> buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> And that the above disables the entire buffer, whereas the reset only
> resets individual ones.
>
> But I don't think that will make any difference.
>
> -- Steve
>



2020-09-15 18:30:16

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

Sorry for the spam; I saw a send failure, so I am sending the mail again.

On 9/15/2020 6:53 PM, Steven Rostedt wrote:
> On Tue, 15 Sep 2020 10:38:03 +0530
> Gaurav Kohli <[email protected]> wrote:
>
>>
>> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>> >>> +{
>> >>> + mutex_unlock(&buffer->mutex);
>> >>> +}
>> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>> >
>> > I really do not like to export these.
>> >
>>
>> Actually the available reader lock (&cpu_buffer->reader_lock) is not
>> helping here, so I took the ring buffer mutex lock to resolve this
>> (this came up on 4.19/5.4; in the latest tip it is the trace buffer
>> lock). Due to this I have exported the API.
>
> I'm saying, why don't you take the buffer->mutex in the
> ring_buffer_reset_online_cpus() function? And remove all the
> protection in tracing_reset_online_cpus()?

Yes, I got your point; then we can avoid the export. Actually we are
seeing the issue on older kernels like 4.19/4.14/5.4, where the patch
below is not present in the stable branches:

commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

Actually I had also thought of taking the mutex lock in
ring_buffer_reset_cpu while doing the individual cpu reset, but that
could cause another problem: different cpu buffers may be in different
states, so I took the lock in tracing_reset_online_cpus.
>
> void tracing_reset_online_cpus(struct array_buffer *buf)
> {
> struct trace_buffer *buffer = buf->buffer;
>
> if (!buffer)
> return;
>
> buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> ring_buffer_reset_online_cpus(buffer);
> }
>
> The reset_online_cpus() is already doing the synchronization, we don't
> need to do it twice.
>
> I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> avoiding synchronize_rcu for each CPU") made the synchronization in
> tracing_reset_online_cpus() obsolete.
>
> -- Steve
>

Yes, with the above patch there is no need to take the lock in tracing_reset_online_cpus.

2020-09-15 18:35:17

by Steven Rostedt

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

On Tue, 15 Sep 2020 22:53:32 +0530
Gaurav Kohli <[email protected]> wrote:

> On 9/15/2020 6:53 PM, Steven Rostedt wrote:
> > On Tue, 15 Sep 2020 10:38:03 +0530
> > Gaurav Kohli <[email protected]> wrote:
> >
> >>
> >> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> >> >>> +{
> >> >>> + mutex_unlock(&buffer->mutex);
> >> >>> +}
> >> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> >> >
> >> > I really do not like to export these.
> >> >
> >>
> >> Actually the available reader lock (&cpu_buffer->reader_lock) is not
> >> helping here, so I took the ring buffer mutex lock to resolve this
> >> (this came up on 4.19/5.4; in the latest tip it is the trace buffer
> >> lock). Due to this I have exported the API.
> >
> > I'm saying, why don't you take the buffer->mutex in the
> > ring_buffer_reset_online_cpus() function? And remove all the protection in
> > tracing_reset_online_cpus()?
>
> Yes, got your point. then we can avoid export. Actually we are seeing
> issue in older kernel like 4.19/4.14/5.4 and there below patch was not
> present in stable branches:
>
> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> avoiding synchronize_rcu for each CPU")

If you mark this patch for stable, you can add:

Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")

>
> Actually i have also thought to take mutex lock in ring_buffer_reset_cpu
> while doing individual cpu reset, but this could cause another problem:

Hmm, I think we should also take the buffer lock in the reset_cpu() call
too, and modify tracing_reset_cpu() the same way.

>
> Different cpu buffer may have different state, so i have taken lock in
> tracing_reset_online_cpus.

Why would different states be an issue in synchronizing?

-- Steve

> >
> > void tracing_reset_online_cpus(struct array_buffer *buf)
> > {
> > struct trace_buffer *buffer = buf->buffer;
> >
> > if (!buffer)
> > return;
> >
> > buf->time_start = buffer_ftrace_now(buf, buf->cpu);
> >
> > ring_buffer_reset_online_cpus(buffer);
> > }
> >
> > The reset_online_cpus() is already doing the synchronization, we don't need
> > to do it twice.
> >
> > I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> > avoiding synchronize_rcu for each CPU") made the synchronization in
> > tracing_reset_online_cpus() obsolete.
> >
> > -- Steve
> >
>
> Yes, with above patch no need to take lock in tracing_reset_online_cpus.

2020-09-15 18:55:55

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call



On 9/15/2020 6:53 PM, Steven Rostedt wrote:
> On Tue, 15 Sep 2020 10:38:03 +0530
> Gaurav Kohli <[email protected]> wrote:
>
>>
>> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>> >>> +{
>> >>> + mutex_unlock(&buffer->mutex);
>> >>> +}
>> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>> >
>> > I really do not like to export these.
>> >
>>
>> Actually the available reader lock (&cpu_buffer->reader_lock) is not
>> helping here, so I took the ring buffer mutex lock to resolve this
>> (this came up on 4.19/5.4; in the latest tip it is the trace buffer
>> lock). Due to this I have exported the API.
>
> I'm saying, why don't you take the buffer->mutex in the
> ring_buffer_reset_online_cpus() function? And remove all the protection in
> tracing_reset_online_cpus()?

Yes, I got your point; then we can avoid the export. Actually we are
seeing the issue on older kernels like 4.19/4.14/5.4, where the patch
below was not present in the stable branches:

commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

Actually I had also thought of taking the mutex lock in
ring_buffer_reset_cpu while doing the individual cpu reset, but that
could cause another problem: different cpu buffers may be in different
states, so I took the lock in tracing_reset_online_cpus.
>
> void tracing_reset_online_cpus(struct array_buffer *buf)
> {
> struct trace_buffer *buffer = buf->buffer;
>
> if (!buffer)
> return;
>
> buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> ring_buffer_reset_online_cpus(buffer);
> }
>
> The reset_online_cpus() is already doing the synchronization, we don't need
> to do it twice.
>
> I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> avoiding synchronize_rcu for each CPU") made the synchronization in
> tracing_reset_online_cpus() obsolete.
>
> -- Steve
>

Yes, with the above patch there is no need to take the lock in tracing_reset_online_cpus.

2020-09-16 00:39:23

by Steven Rostedt

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

On Tue, 15 Sep 2020 10:38:03 +0530
Gaurav Kohli <[email protected]> wrote:

>
> >>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
> >>> +{
> >>> + mutex_unlock(&buffer->mutex);
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
> >
> > I really do not like to export these.
> >
>
> Actually the available reader lock (&cpu_buffer->reader_lock) is not
> helping here, so I took the ring buffer mutex lock to resolve this
> (this came up on 4.19/5.4; in the latest tip it is the trace buffer
> lock). Due to this I have exported the API.

I'm saying, why don't you take the buffer->mutex in the
ring_buffer_reset_online_cpus() function? And remove all the protection in
tracing_reset_online_cpus()?

void tracing_reset_online_cpus(struct array_buffer *buf)
{
struct trace_buffer *buffer = buf->buffer;

if (!buffer)
return;

buf->time_start = buffer_ftrace_now(buf, buf->cpu);

ring_buffer_reset_online_cpus(buffer);
}

The reset_online_cpus() is already doing the synchronization, we don't need
to do it twice.

I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU") made the synchronization in
tracing_reset_online_cpus() obsolete.

-- Steve

2020-09-16 06:35:05

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call



On 9/15/2020 11:43 PM, Steven Rostedt wrote:

>>>> Actually the available reader lock (&cpu_buffer->reader_lock) is not
>>>> helping here, so I took the ring buffer mutex lock to resolve this
>>>> (this came up on 4.19/5.4; in the latest tip it is the trace buffer
>>>> lock). Due to this I have exported the API.
>>>
>>> I'm saying, why don't you take the buffer->mutex in the
>>> ring_buffer_reset_online_cpus() function? And remove all the protection in
>>> tracing_reset_online_cpus()?
>>
>> Yes, got your point. then we can avoid export. Actually we are seeing
>> issue in older kernel like 4.19/4.14/5.4 and there below patch was not
>> present in stable branches:
>>
>> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>> avoiding synchronize_rcu for each CPU")
>
> If you mark this patch for stable, you can add:
>
> Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
>

Thanks Steven. Yes, this needs to be backported. I have tried this on
5.4, but it needs more patches, like commit
13292494379f92f532de71b31a54018336adc589 ("tracing: Make struct
ring_buffer less ambiguous").

Instead of protecting all reset, can we do it individually like below:


+++ b/kernel/trace/ring_buffer.c
@@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
{
unsigned long flags;
+ struct trace_buffer *buffer = cpu_buffer->buffer;

+ mutex_lock(&buffer->mutex);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)

out:
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ mutex_unlock(&buffer->mutex);
}

Please let me know if the above looks good; we will do testing with
this. And we can use this directly in older kernels as well, in
ring_buffer_reset_cpu.

>>
>> Actually i have also thought to take mutex lock in ring_buffer_reset_cpu
>> while doing individual cpu reset, but this could cause another problem:
>
> Hmm, I think we should also take the buffer lock in the reset_cpu() call
> too, and modify tracing_reset_cpu() the same way.
>

If we take the above patch, then this is not required.
Please let me know about the approach.
>>
>> Different cpu buffer may have different state, so i have taken lock in
>> tracing_reset_online_cpus.
>
> Why would different states be an issue in synchronizing?
>
> -- Steve
>

Yes, this should not be a problem.
>>>
>>> void tracing_reset_online_cpus(struct array_buffer *buf)
>>> {
>>> struct trace_buffer *buffer = buf->buffer;
>>>
>>> if (!buffer)
>>> return;
>>>
>>> buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>>>
>>> ring_buffer_reset_online_cpus(buffer);
>>> }
>>>
>>> The reset_online_cpus() is already doing the synchronization, we don't need
>>> to do it twice.
>>>
>>> I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>>> avoiding synchronize_rcu for each CPU") made the synchronization in
>>> tracing_reset_online_cpus() obsolete.
>>>
>>> -- Steve
>>>
>>
>> Yes, with above patch no need to take lock in tracing_reset_online_cpus.
>


2020-09-22 08:35:32

by Gaurav Kohli

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call



On 9/16/2020 12:02 PM, Gaurav Kohli wrote:

>>>
>>> Yes, got your point. then we can avoid export. Actually we are seeing
>>> issue in older kernel like 4.19/4.14/5.4 and there below patch was not
>>> present in stable branches:
>>>
>>> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>>> avoiding synchronize_rcu for each CPU")
>>
>> If you mark this patch for stable, you can add:
>>
>> Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
>> avoiding synchronize_rcu for each CPU")
>>
>
> Thanks Steven. Yes, this needs to be backported. I have tried this on
> 5.4, but it needs more patches, like commit
> 13292494379f92f532de71b31a54018336adc589 ("tracing: Make struct
> ring_buffer less ambiguous").
>
> Instead of protecting all reset, can we do it individually like below:
>
>
> +++ b/kernel/trace/ring_buffer.c
> @@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
>  static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>  {
>         unsigned long flags;
> +       struct trace_buffer *buffer = cpu_buffer->buffer;
>
> +       mutex_lock(&buffer->mutex);
>         raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>
>         if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
> @@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>
>   out:
>         raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +       mutex_unlock(&buffer->mutex);
>  }
>

Hi Steven,
We are not seeing the issue with the above patch on 5.4. Please let me
know if the above approach looks good to you; we will raise a patch for
it.

Otherwise we will raise a patch for the older approach, marking it as
depending on the patch below:
Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

Thanks,
Gaurav
> Please let me know, if above looks good, we will do testing with this.
> And this we can directly use in older kernel as well in
> ring_buffer_reset_cpu.
>
>>>
>>> Actually i have also thought to take mutex lock in ring_buffer_reset_cpu
>>> while doing individual cpu reset, but this could cause another problem:
>>
>> Hmm, I think we should also take the buffer lock in the reset_cpu() call
>> too, and modify tracing_reset_cpu() the same way.
>>
>
> if we take above patch, then this is not required.
> Please let me know for the approach.
>>>
>>> Different cpu buffer may have different state, so i have taken lock in
>>> tracing_reset_online_cpus.
>>
>> Why would different states be an issue in synchronizing?
>>
>> -- Steve
>>
>
> Yes, this should not be a problem.



2020-09-22 14:03:09

by Steven Rostedt

Subject: Re: [PATCH] trace: Fix race in trace_open and buffer resize call

Sorry for not replying sooner, my email is still rather full from my 10 day
vacation, and I'm still in "skimming" mode at looking at it.

On Wed, 16 Sep 2020 12:02:46 +0530
Gaurav Kohli <[email protected]> wrote:


> >> Yes, got your point. then we can avoid export. Actually we are seeing
> >> issue in older kernel like 4.19/4.14/5.4 and there below patch was not
> >> present in stable branches:
> >>
> >> commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> >> avoiding synchronize_rcu for each CPU")
> >
> > If you mark this patch for stable, you can add:
> >
> > Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
> >
>
> Thanks Steven. Yes, this needs to be backported. I have tried this on
> 5.4, but it needs more patches, like commit
> 13292494379f92f532de71b31a54018336adc589 ("tracing: Make struct
> ring_buffer less ambiguous").

No, that is not needed. That's just a trivial renaming of structures. Use
the old structure. Dependency is if the algorithm depends on the change.
Not cosmetic.

>
> Instead of protecting all reset, can we do it individually like below:
>
>
> +++ b/kernel/trace/ring_buffer.c
> @@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
> {
> unsigned long flags;
> + struct trace_buffer *buffer = cpu_buffer->buffer;
>
> + mutex_lock(&buffer->mutex);
> raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>
> if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
> @@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
>
> out:
> raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> + mutex_unlock(&buffer->mutex);
> }
>
> Please let me know, if above looks good, we will do testing with this.
> And this we can directly use in older kernel as well in
> ring_buffer_reset_cpu.

No that will not work. You need the lock around the disabling of the
buffers and the synchronizing with RCU.

-- Steve
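For reference, the scope Steve describes can be sketched in user space (illustrative names only; this paraphrases the shape of the discussed fix, with the mutex held across disabling, synchronizing, and resetting, and is not the actual kernel code):

```c
#include <pthread.h>
#include <stdatomic.h>

/* User-space sketch of the required locking scope: the mutex stands in
 * for buffer->mutex and covers disable + wait + reset. All names are
 * illustrative, not kernel API. */
struct mini_cpu_buffer {
	atomic_int record_disabled;
	atomic_int resize_disabled;
	int pages;
};

struct mini_trace_buffer {
	pthread_mutex_t mutex;          /* stands in for buffer->mutex */
	struct mini_cpu_buffer cpu[2];
	int ncpus;
};

static void wait_for_commits(struct mini_trace_buffer *b)
{
	(void)b;   /* stands in for synchronize_rcu()/commit draining */
}

void mini_reset_online_cpus(struct mini_trace_buffer *b)
{
	int cpu;

	/* A resize blocked on this mutex cannot interleave with any
	 * stage below: not the disable, not the wait, not the reset. */
	pthread_mutex_lock(&b->mutex);
	for (cpu = 0; cpu < b->ncpus; cpu++) {
		atomic_fetch_add(&b->cpu[cpu].record_disabled, 1);
		atomic_fetch_add(&b->cpu[cpu].resize_disabled, 1);
	}
	wait_for_commits(b);
	for (cpu = 0; cpu < b->ncpus; cpu++) {
		b->cpu[cpu].pages = 1;  /* the reset proper */
		atomic_fetch_sub(&b->cpu[cpu].record_disabled, 1);
		atomic_fetch_sub(&b->cpu[cpu].resize_disabled, 1);
	}
	pthread_mutex_unlock(&b->mutex);
}
```

The key design point is the extent of the critical section: taking the mutex only around the per-cpu pointer reset (as in the narrower proposal above) leaves the disable and synchronize steps exposed to a concurrent resize.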