2009-06-03 14:17:30

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

From: Steven Rostedt <[email protected]>

There are times that a race may happen that we add a timestamp in a
nested write. This timestamp would just contain a zero delta and serves
no purpose.

Now that we have a way to discard events, this patch will try to discard
the timestamp instead of just wasting the space in the ring buffer.

Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/trace/ring_buffer.c | 67 +++++++++++++++++++++++++++-----------------
1 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9453023..5092660 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1335,6 +1335,38 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
return event;
}

+static inline int
+rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
+ struct ring_buffer_event *event)
+{
+ unsigned long new_index, old_index;
+ struct buffer_page *bpage;
+ unsigned long index;
+ unsigned long addr;
+
+ new_index = rb_event_index(event);
+ old_index = new_index + rb_event_length(event);
+ addr = (unsigned long)event;
+ addr &= PAGE_MASK;
+
+ bpage = cpu_buffer->tail_page;
+
+ if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
+ /*
+ * This is on the tail page. It is possible that
+ * a write could come in and move the tail page
+ * and write to the next page. That is fine
+ * because we just shorten what is on this page.
+ */
+ index = local_cmpxchg(&bpage->write, old_index, new_index);
+ if (index == old_index)
+ return 1;
+ }
+
+ /* could not discard */
+ return 0;
+}
+
static int
rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
u64 *ts, u64 *delta)
@@ -1384,10 +1416,13 @@ rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
/* let the caller know this was the commit */
ret = 1;
} else {
- /* Darn, this is just wasted space */
- event->time_delta = 0;
- event->array[0] = 0;
- ret = 0;
+ /* Try to discard the event */
+ if (!rb_try_to_discard(cpu_buffer, event)) {
+ /* Darn, this is just wasted space */
+ event->time_delta = 0;
+ event->array[0] = 0;
+ ret = 0;
+ }
}

*delta = 0;
@@ -1682,10 +1717,6 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event)
{
struct ring_buffer_per_cpu *cpu_buffer;
- unsigned long new_index, old_index;
- struct buffer_page *bpage;
- unsigned long index;
- unsigned long addr;
int cpu;

/* The event is discarded regardless */
@@ -1701,24 +1732,8 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
cpu = smp_processor_id();
cpu_buffer = buffer->buffers[cpu];

- new_index = rb_event_index(event);
- old_index = new_index + rb_event_length(event);
- addr = (unsigned long)event;
- addr &= PAGE_MASK;
-
- bpage = cpu_buffer->tail_page;
-
- if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
- /*
- * This is on the tail page. It is possible that
- * a write could come in and move the tail page
- * and write to the next page. That is fine
- * because we just shorten what is on this page.
- */
- index = local_cmpxchg(&bpage->write, old_index, new_index);
- if (index == old_index)
- goto out;
- }
+ if (!rb_try_to_discard(cpu_buffer, event))
+ goto out;

/*
* The commit is still visible by the reader, so we
--
1.6.3.1

--


2009-06-03 18:55:48

by Tim Bird

[permalink] [raw]
Subject: Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

Steven Rostedt wrote:
> There are times that a race may happen that we add a timestamp in a
> nested write. This timestamp would just contain a zero delta and serves
> no purpose.
>
> Now that we have a way to discard events, this patch will try to discard
> the timestamp instead of just wasting the space in the ring buffer.
>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 67 +++++++++++++++++++++++++++-----------------
> 1 files changed, 41 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 9453023..5092660 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1335,6 +1335,38 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> return event;
> }
>
> +static inline int
> +rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
> + struct ring_buffer_event *event)
> +{
> + unsigned long new_index, old_index;
> + struct buffer_page *bpage;
> + unsigned long index;
> + unsigned long addr;
> +
> + new_index = rb_event_index(event);
> + old_index = new_index + rb_event_length(event);
> + addr = (unsigned long)event;
> + addr &= PAGE_MASK;
> +
> + bpage = cpu_buffer->tail_page;
> +
> + if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
> + /*
> + * This is on the tail page. It is possible that
> + * a write could come in and move the tail page
> + * and write to the next page. That is fine
> + * because we just shorten what is on this page.
> + */
> + index = local_cmpxchg(&bpage->write, old_index, new_index);
> + if (index == old_index)
> + return 1;
> + }
> +
> + /* could not discard */
> + return 0;
> +}
> +

Is this new routine only for discarding uncommitted events,
or can it be used on committed events?

I assume the former, since I see nothing about adjusting the
commit position.

In the ring_buffer API I see that there's a function for
discarding events (committed ones), but not for free-ing them.
In function duration filtering, it is desirable to free the
last committed event, which for a function exit of short
duration will be it's entry event 99% of the time.
-- Tim

P.S. I'm very sorry about the missing '>' on the Signed-off-by line.
I ran checkpatch and got a passing score, but missed this.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-06-03 19:14:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps


On Wed, 3 Jun 2009, Tim Bird wrote:
>
> Is this new routine only for discarding uncommitted events,
> or can it be used on committed events?
>
> I assume the former, since I see nothing about adjusting the
> commit position.

Only uncommitted events.

>
> In the ring_buffer API I see that there's a function for
> discarding events (committed ones), but not for free-ing them.
> In function duration filtering, it is desirable to free the
> last committed event, which for a function exit of short
> duration will be it's entry event 99% of the time.

In filtering we deside before commiting if we want to discard or not.
(Note, this is only in tip right now.) Once we commit it, there is no way
to safely remove it from the ring buffer. Additions of items are not under
a lock (only the moving from page to page is).

For the event tracer we check if we want to disard it or not before we
commit.

>
> P.S. I'm very sorry about the missing '>' on the Signed-off-by line.
> I ran checkpatch and got a passing score, but missed this.

No prob, you only made me spam LKML (and others) with about 10 gargage
emails ;-)

-- Steve

2009-06-03 19:37:57

by Tim Bird

[permalink] [raw]
Subject: Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

Steven Rostedt wrote:
> On Wed, 3 Jun 2009, Tim Bird wrote:
>> Is this new routine only for discarding uncommitted events,
>> or can it be used on committed events?
>>
>> I assume the former, since I see nothing about adjusting the
>> commit position.
>
> Only uncommitted events.
OK.

>> In the ring_buffer API I see that there's a function for
>> discarding events (committed ones), but not for free-ing them.
>> In function duration filtering, it is desirable to free the
>> last committed event, which for a function exit of short
>> duration will be it's entry event 99% of the time.
>
> In filtering we deside before commiting if we want to discard or not.
> (Note, this is only in tip right now.) Once we commit it, there is no way
> to safely remove it from the ring buffer. Additions of items are not under
> a lock (only the moving from page to page is).
>
> For the event tracer we check if we want to disard it or not before we
> commit.

Yeah - that's what I thought. I have duration filtering working
(well, the user interface is not done yet), but with the above
limitations, I can only free the exit from the trace, and mark
the entry event as discarded. It would save a whole lot more
space to free the entry event as well.

I'm experimenting with free-ing only the last committed event,
when no other write has occurred in the buffer. But I'm still
not sure I can make it safe. Under normal conditions this
would be sufficient to catch 99% of the cases. I did this
in KFT, but under locks, and I know you want to be lockless with
ftrace.

Are writes the only issue, or is it a problem with readers?
I was thinking of experimenting with allowing it when no readers
were active (or were on a different page).

>> P.S. I'm very sorry about the missing '>' on the Signed-off-by line.
>> I ran checkpatch and got a passing score, but missed this.
>
> No prob, you only made me spam LKML (and others) with about 10 garbage
> emails ;-)
LOL. Sorry again. ;-)

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-06-03 20:56:43

by Tim Bird

[permalink] [raw]
Subject: Re: [PATCH 2/3] ring-buffer: try to discard unneeded timestamps

Steve,

See my last comment for an important fix.

Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
>
> There are times that a race may happen that we add a timestamp in a
> nested write. This timestamp would just contain a zero delta and serves
> no purpose.
>
> Now that we have a way to discard events, this patch will try to discard
> the timestamp instead of just wasting the space in the ring buffer.
>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> kernel/trace/ring_buffer.c | 67 +++++++++++++++++++++++++++-----------------
> 1 files changed, 41 insertions(+), 26 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 9453023..5092660 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1335,6 +1335,38 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
> return event;
> }
>
> +static inline int
> +rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
> + struct ring_buffer_event *event)

It might be worth using the terminology 'rb_try_to_free' (or something else)
to distinguish the case of backing up the write pointer from the
case of marking an event as discarded (pad) in the log.

> +{
> + unsigned long new_index, old_index;
> + struct buffer_page *bpage;
> + unsigned long index;
> + unsigned long addr;
> +
> + new_index = rb_event_index(event);
> + old_index = new_index + rb_event_length(event);
> + addr = (unsigned long)event;
> + addr &= PAGE_MASK;
> +
> + bpage = cpu_buffer->tail_page;
> +
> + if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
> + /*
> + * This is on the tail page. It is possible that
> + * a write could come in and move the tail page
> + * and write to the next page. That is fine
> + * because we just shorten what is on this page.
> + */
> + index = local_cmpxchg(&bpage->write, old_index, new_index);
> + if (index == old_index)
> + return 1;
> + }
> +
> + /* could not discard */
> + return 0;
> +}
> +
> static int
> rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
> u64 *ts, u64 *delta)
> @@ -1384,10 +1416,13 @@ rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
> /* let the caller know this was the commit */
> ret = 1;
> } else {
> - /* Darn, this is just wasted space */
> - event->time_delta = 0;
> - event->array[0] = 0;
> - ret = 0;
> + /* Try to discard the event */
> + if (!rb_try_to_discard(cpu_buffer, event)) {
> + /* Darn, this is just wasted space */
> + event->time_delta = 0;
> + event->array[0] = 0;
> + ret = 0;
> + }
> }
>
> *delta = 0;
> @@ -1682,10 +1717,6 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
> struct ring_buffer_event *event)
> {
> struct ring_buffer_per_cpu *cpu_buffer;
> - unsigned long new_index, old_index;
> - struct buffer_page *bpage;
> - unsigned long index;
> - unsigned long addr;
> int cpu;
>
> /* The event is discarded regardless */
> @@ -1701,24 +1732,8 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
> cpu = smp_processor_id();
> cpu_buffer = buffer->buffers[cpu];
>
> - new_index = rb_event_index(event);
> - old_index = new_index + rb_event_length(event);
> - addr = (unsigned long)event;
> - addr &= PAGE_MASK;
> -
> - bpage = cpu_buffer->tail_page;
> -
> - if (bpage->page == (void *)addr && rb_page_write(bpage) == old_index) {
> - /*
> - * This is on the tail page. It is possible that
> - * a write could come in and move the tail page
> - * and write to the next page. That is fine
> - * because we just shorten what is on this page.
> - */
> - index = local_cmpxchg(&bpage->write, old_index, new_index);
> - if (index == old_index)
> - goto out;
> - }
> + if (!rb_try_to_discard(cpu_buffer, event))
> + goto out;

The sense of this test is wrong. It should be:

if (rb_try_to_discard(cpu_buffer, event))
goto out;

You want to go to 'out' (skipping the increment of entries)
if you successfully freed the event.

In my testing, rb_try_to_discard was almost alway successful.
Since only one statement is jumped over by the goto, the
following might be better:

if (unlikely(!rb_try_to_discard(cpu_buffer, event)))
/*
* The commit is still visible by the reader, so we
* must increment entries.
*/
local_inc(&cpu_buffer->entries);


> /*
> * The commit is still visible by the reader, so we
> -- 1.6.3.1
> --


--
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================