2014-06-26 13:23:00

by Petr Mladek

[permalink] [raw]
Subject: [PATCH] ring-buffer: Race when writing and swapping cpu buffer in parallel

The trace/ring_buffer allows to swap the entire ring buffer. Everything has to
be done lockless. I think that I have found a race when trying to understand
the code. The problematic situation is the following:

CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
-------------------------------------------------------------------------
ring_buffer_write()
if (cpu_buffer->record_disabled)
^^^ test fails and write continues

ring_buffer_swap_cpu()

inc(&cpu_buffer_a->record_disabled);
inc(&cpu_buffer_b->record_disabled);

if (cpu_buffer_a->committing)
^^^ test fails and swap continues
if (cpu_buffer_b->committing)
^^^ test fails and swap continues

rb_reserve_next_event()
rb_start_commit()
local_inc(&cpu_buffer->committing);
local_inc(&cpu_buffer->commits);

if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
^^^ test fails and reserve_next continues

buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;

Pheeee, reservation continues in the removed buffer.

This can be solved by a better check in rb_reserve_next_event(). The reservation
could continue only when "committing" is enabled and there is no swap in
progress or when any swap was not finished in the meantime.

Note that the solution is not perfect. It stops writing also in this situation:

CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
----------------------------------------------------------------------------
ring_buffer_write()
if (cpu_buffer->record_disabled)
^^^ test fails and write continues

ring_buffer_swap_cpu()

inc(&cpu_buffer_a->record_disabled);
inc(&cpu_buffer_b->record_disabled);

rb_reserve_next_event()
rb_start_commit()
local_inc(&cpu_buffer->committing);
local_inc(&cpu_buffer->commits);

if (cpu_buffer_a->committing)
^^^ test passes and swap is canceled

if (cpu_buffer->record_disabled)
^^^ test passes and reserve is canceled

dec(&cpu_buffer_a->record_disabled);
dec(&cpu_buffer_b->record_disabled);

Pheeee, both actions were canceled. The write/reserve could have continued
if the recording was enabled in time.

Well, it is the price for using lockless algorithms. I think that it happens
in more situations here, it is not worth more complicated code, and we could
live with it.

The patch also adds some missing memory barriers. Note that compiler barriers
are not enough because the data can be accessed by different CPUs.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/trace/ring_buffer.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7c56c3d06943..3020060ded7e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2529,13 +2529,15 @@ rb_reserve_next_event(struct ring_buffer *buffer,

#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
/*
- * Due to the ability to swap a cpu buffer from a buffer
- * it is possible it was swapped before we committed.
- * (committing stops a swap). We check for it here and
- * if it happened, we have to fail the write.
+ * The cpu buffer swap could have started before we managed to stop it
+ * by incrementing the "committing" values. If the swap is in progress,
+ * we see disabled recording. If the swap has finished, we see the new
+ * cpu buffer. In both cases, we should go back and stop committing
+ * to the old buffer. See also ring_buffer_swap_cpu()
*/
- barrier();
- if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
+ smp_mb();
+ if (unlikely(atomic_read(&cpu_buffer->record_disabled) ||
+ ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
local_dec(&cpu_buffer->committing);
local_dec(&cpu_buffer->commits);
return NULL;
@@ -4334,6 +4336,13 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
atomic_inc(&cpu_buffer_a->record_disabled);
atomic_inc(&cpu_buffer_b->record_disabled);

+ /*
+ * We could not swap if a commit is in progress. Also any commit could
+ * not start after we have stopped recording. Keep both checks in sync.
+ * The counter part is in rb_reserve_next_event()
+ */
+ smp_mb();
+
ret = -EBUSY;
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
@@ -4348,6 +4357,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,

ret = 0;

+ /*
+ * Mare sure that rb_reserve_next_event() see the right
+ * buffer before we enable recording.
+ */
+ smp_wmb();
+
out_dec:
atomic_dec(&cpu_buffer_a->record_disabled);
atomic_dec(&cpu_buffer_b->record_disabled);
--
1.8.4


2014-06-26 13:58:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Race when writing and swapping cpu buffer in parallel

On Thu, 26 Jun 2014 15:22:38 +0200
Petr Mladek <[email protected]> wrote:

> The trace/ring_buffer allows to swap the entire ring buffer. Everything has to
> be done lockless. I think that I have found a race when trying to understand
> the code. The problematic situation is the following:
>
> CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
> -------------------------------------------------------------------------
> ring_buffer_write()
> if (cpu_buffer->record_disabled)
> ^^^ test fails and write continues
>
> ring_buffer_swap_cpu()

This is a per cpu swap, not a full ring buffer swap.

>
> inc(&cpu_buffer_a->record_disabled);
> inc(&cpu_buffer_b->record_disabled);
>
> if (cpu_buffer_a->committing)
> ^^^ test fails and swap continues
> if (cpu_buffer_b->committing)
> ^^^ test fails and swap continues

Grumble, this was originally written such that the swap can only be
done on the CPU that it is running on.


>
> rb_reserve_next_event()
> rb_start_commit()
> local_inc(&cpu_buffer->committing);
> local_inc(&cpu_buffer->commits);
>
> if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> ^^^ test fails and reserve_next continues
>
> buffer_a->buffers[cpu] = cpu_buffer_b;
> buffer_b->buffers[cpu] = cpu_buffer_a;
> cpu_buffer_b->buffer = buffer_a;
> cpu_buffer_a->buffer = buffer_b;
>
> Pheeee, reservation continues in the removed buffer.
>
> This can be solved by a better check in rb_reserve_next_event(). The reservation
> could continue only when "committing" is enabled and there is no swap in
> progress or when any swap was not finished in the meantime.
>
> Note that the solution is not perfect. It stops writing also in this situation:
>
> CPU 1 (write/reserve event) CPU 2 (swap the cpu buffer)
> ----------------------------------------------------------------------------
> ring_buffer_write()
> if (cpu_buffer->record_disabled)
> ^^^ test fails and write continues
>
> ring_buffer_swap_cpu()
>
> inc(&cpu_buffer_a->record_disabled);
> inc(&cpu_buffer_b->record_disabled);
>
> rb_reserve_next_event()
> rb_start_commit()
> local_inc(&cpu_buffer->committing);
> local_inc(&cpu_buffer->commits);
>
> if (cpu_buffer_a->committing)
> ^^^ test passes and swap is canceled
>
> if (cpu_buffer->record_disabled)
> ^^^ test passes and reserve is canceled
>
> dec(&cpu_buffer_a->record_disabled);
> dec(&cpu_buffer_b->record_disabled);
>
> Pheeee, both actions were canceled. The write/reserve could have continued
> if the recording was enabled in time.
>
> Well, it is the price for using lockless algorithms. I think that it happens
> in more situations here, it is not worth more complicated code, and we could
> live with it.
>
> The patch also adds some missing memory barriers. Note that compiler barriers
> are not enough because the data can be accessed by different CPUs.
>
> Signed-off-by: Petr Mladek <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 7c56c3d06943..3020060ded7e 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2529,13 +2529,15 @@ rb_reserve_next_event(struct ring_buffer *buffer,
>
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
> /*
> - * Due to the ability to swap a cpu buffer from a buffer
> - * it is possible it was swapped before we committed.
> - * (committing stops a swap). We check for it here and
> - * if it happened, we have to fail the write.
> + * The cpu buffer swap could have started before we managed to stop it
> + * by incrementing the "committing" values. If the swap is in progress,
> + * we see disabled recording. If the swap has finished, we see the new
> + * cpu buffer. In both cases, we should go back and stop committing
> + * to the old buffer. See also ring_buffer_swap_cpu()
> */
> - barrier();
> - if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> + smp_mb();
> + if (unlikely(atomic_read(&cpu_buffer->record_disabled) ||
> + ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> local_dec(&cpu_buffer->committing);
> local_dec(&cpu_buffer->commits);
> return NULL;
> @@ -4334,6 +4336,13 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
> atomic_inc(&cpu_buffer_a->record_disabled);
> atomic_inc(&cpu_buffer_b->record_disabled);
>
> + /*
> + * We could not swap if a commit is in progress. Also any commit could
> + * not start after we have stopped recording. Keep both checks in sync.
> + * The counter part is in rb_reserve_next_event()
> + */
> + smp_mb();

This will kill ring buffer performance. So I will not allow this fix.

What we can do is force ring_buffer_swap_cpu() to only work for the CPU
that it is on. As we have snapshot in per_cpu buffers, to make that
work, we will need to change the per_cpu version of snapshot to do a
smp_call_function_single() to the CPU that it wants to take a snapshot
of, and run the swap there.

To force this, we can remove the cpu parameter from the
ring_buffer_swap_cpu(). By doing this, we may be able to remove some of
the CONFIG_RING_BUFFER_ALLOW_SWAP hacks too!

I'm not going to sacrifice the general performance of the ring buffer
for a feature that is seldom (if ever) used.

-- Steve

> +
> ret = -EBUSY;
> if (local_read(&cpu_buffer_a->committing))
> goto out_dec;
> @@ -4348,6 +4357,12 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
>
> ret = 0;
>
> + /*
> + * Mare sure that rb_reserve_next_event() see the right
> + * buffer before we enable recording.
> + */
> + smp_wmb();
> +
> out_dec:
> atomic_dec(&cpu_buffer_a->record_disabled);
> atomic_dec(&cpu_buffer_b->record_disabled);

2014-06-27 00:55:04

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Race when writing and swapping cpu buffer in parallel

On Thu, 26 Jun 2014 09:58:31 -0400
Steven Rostedt <[email protected]> wrote:

> What we can do is force ring_buffer_swap_cpu() to only work for the CPU
> that it is on. As we have snapshot in per_cpu buffers, to make that
> work, we will need to change the per_cpu version of snapshot to do a
> smp_call_function_single() to the CPU that it wants to take a snapshot
> of, and run the swap there.
>
> To force this, we can remove the cpu parameter from the
> ring_buffer_swap_cpu(). By doing this, we may be able to remove some of
> the CONFIG_RING_BUFFER_ALLOW_SWAP hacks too!
>
> I'm not going to sacrifice the general performance of the ring buffer
> for a feature that is seldom (if ever) used.

Did you want to do the above, or do you want me to write something up?

-- Steve

2014-06-27 07:47:01

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH] ring-buffer: Race when writing and swapping cpu buffer in parallel

On Thu 2014-06-26 20:55:00, Steven Rostedt wrote:
> On Thu, 26 Jun 2014 09:58:31 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > What we can do is force ring_buffer_swap_cpu() to only work for the CPU
> > that it is on. As we have snapshot in per_cpu buffers, to make that
> > work, we will need to change the per_cpu version of snapshot to do a
> > smp_call_function_single() to the CPU that it wants to take a snapshot
> > of, and run the swap there.
> >
> > To force this, we can remove the cpu parameter from the
> > ring_buffer_swap_cpu(). By doing this, we may be able to remove some of
> > the CONFIG_RING_BUFFER_ALLOW_SWAP hacks too!
> >
> > I'm not going to sacrifice the general performance of the ring buffer
> > for a feature that is seldom (if ever) used.

Fair enough. I see the point.

> Did you want to do the above, or do you want me to write something up?

I could look at it the following week.

When I think about it, the race is not that critical. In the worst
case, it slightly modifies the swapped buffer but there should not
be danger of any data corruption. Anyway, I will try to get rid of
it the way you suggested. It might be more important when the
ringbuffer has more users.

Best Regards,
Petr