Subject: [PATCH 0/9] oprofile: port to the new ring buffer

This patch set ports the cpu buffers in oprofile to the new ring buffer
provided by the tracing framework. The motivation here is to leave the
pain of implementing ring buffers to others. Oh, no, there are more
advantages. The main reason is the support for different sample sizes
that can be stored in the buffer. Use cases for this are IBS and Cell
spu profiling. Using the new ring buffer ensures valid and complete
samples and allows copying the cpu buffer statelessly without knowing
its content. Second, it uses a generic kernel API and also reduces
code size. And hopefully, there are fewer bugs.

The patch set is also available here:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git ring_buffer

-Robert



Subject: [PATCH 4/9] oprofile: moving cpu_buffer_reset() to cpu_buffer.h

This is in preparation for changes in the cpu buffer implementation.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/buffer_sync.c | 1 +
drivers/oprofile/cpu_buffer.c | 12 ------------
drivers/oprofile/cpu_buffer.h | 28 +++++++++++++---------------
3 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index aed286c..944a583 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -548,6 +548,7 @@ void sync_buffer(int cpu)

/* Remember, only we can modify tail_pos */

+ cpu_buffer_reset(cpu);
#ifndef CONFIG_OPROFILE_IBS
available = cpu_buffer_entries(cpu_buf);

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index d6f5de6..5cf7efe 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -124,18 +124,6 @@ void end_cpu_work(void)
flush_scheduled_work();
}

-/* Resets the cpu buffer to a sane state. */
-void cpu_buffer_reset(struct oprofile_cpu_buffer *cpu_buf)
-{
- /*
- * reset these to invalid values; the next sample collected
- * will populate the buffer with proper values to initialize
- * the buffer
- */
- cpu_buf->last_is_kernel = -1;
- cpu_buf->last_task = NULL;
-}
-
/* compute number of available slots in cpu_buffer queue */
static unsigned long nr_available_slots(struct oprofile_cpu_buffer const *b)
{
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index 6055b56..895763f 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -50,7 +50,19 @@ struct oprofile_cpu_buffer {

DECLARE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);

-void cpu_buffer_reset(struct oprofile_cpu_buffer *cpu_buf);
+/*
+ * Resets the cpu buffer to a sane state.
+ *
+ * reset these to invalid values; the next sample collected will
+ * populate the buffer with proper values to initialize the buffer
+ */
+static inline void cpu_buffer_reset(int cpu)
+{
+ struct oprofile_cpu_buffer *cpu_buf = &per_cpu(cpu_buffer, cpu);
+
+ cpu_buf->last_is_kernel = -1;
+ cpu_buf->last_task = NULL;
+}

static inline
struct op_sample *cpu_buffer_write_entry(struct oprofile_cpu_buffer *cpu_buf)
@@ -88,20 +100,6 @@ unsigned long cpu_buffer_entries(struct oprofile_cpu_buffer *b)
unsigned long head = b->head_pos;
unsigned long tail = b->tail_pos;

- /*
- * Subtle. This resets the persistent last_task
- * and in_kernel values used for switching notes.
- * BUT, there is a small window between reading
- * head_pos, and this call, that means samples
- * can appear at the new head position, but not
- * be prefixed with the notes for switching
- * kernel mode or a task switch. This small hole
- * can lead to mis-attribution or samples where
- * we don't know if it's in the kernel or not,
- * at the start of an event buffer.
- */
- cpu_buffer_reset(b);
-
if (head >= tail)
return head - tail;

--
1.6.0.1
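The reset values in the moved cpu_buffer_reset() matter: last_is_kernel is set to -1 so that the next sample's real value (0 or 1) compares unequal and forces a fresh kernel/user switch note. A minimal userspace sketch of that comparison (a model only, not the kernel per-cpu machinery; needs_kernel_note() is an illustrative helper, not an oprofile function):

```c
#include <assert.h>
#include <stddef.h>

/* Only the two fields the reset touches. */
struct cbuf {
	int last_is_kernel;
	void *last_task;
};

static void reset(struct cbuf *b)
{
	b->last_is_kernel = -1;	/* invalid: neither user (0) nor kernel (1) */
	b->last_task = NULL;
}

/* Returns 1 if a kernel/user switch note would have to be emitted
 * for a sample with the given is_kernel value. */
static int needs_kernel_note(struct cbuf *b, int is_kernel)
{
	if (b->last_is_kernel != is_kernel) {
		b->last_is_kernel = is_kernel;
		return 1;
	}
	return 0;
}
```

After a reset, the first sample always emits a note; repeated samples in the same mode do not.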

Subject: [PATCH 5/9] ring_buffer: add remaining cpu functions to ring_buffer.h

These functions are not yet declared in ring_buffer.h though they seem
to be part of the API.

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
include/linux/ring_buffer.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index e097c2e..de9d8c1 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -116,6 +116,8 @@ void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);

unsigned long ring_buffer_entries(struct ring_buffer *buffer);
unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
+unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
+unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);

u64 ring_buffer_time_stamp(int cpu);
void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
--
1.6.0.1

Subject: [PATCH 2/9] oprofile: adding cpu_buffer_write_commit()

This is in preparation for changes in the cpu buffer implementation.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/cpu_buffer.c | 18 +-----------------
drivers/oprofile/cpu_buffer.h | 17 +++++++++++++++++
2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 7e5e650..d6f5de6 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -148,22 +148,6 @@ static unsigned long nr_available_slots(struct oprofile_cpu_buffer const *b)
return tail + (b->buffer_size - head) - 1;
}

-static void increment_head(struct oprofile_cpu_buffer *b)
-{
- unsigned long new_head = b->head_pos + 1;
-
- /*
- * Ensure anything written to the slot before we increment is
- * visible
- */
- wmb();
-
- if (new_head < b->buffer_size)
- b->head_pos = new_head;
- else
- b->head_pos = 0;
-}
-
static inline void
add_sample(struct oprofile_cpu_buffer *cpu_buf,
unsigned long pc, unsigned long event)
@@ -171,7 +155,7 @@ add_sample(struct oprofile_cpu_buffer *cpu_buf,
struct op_sample *entry = cpu_buffer_write_entry(cpu_buf);
entry->eip = pc;
entry->event = event;
- increment_head(cpu_buf);
+ cpu_buffer_write_commit(cpu_buf);
}

static inline void
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index 0870699..e608976 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -59,6 +59,23 @@ struct op_sample *cpu_buffer_write_entry(struct oprofile_cpu_buffer *cpu_buf)
}

static inline
+void cpu_buffer_write_commit(struct oprofile_cpu_buffer *b)
+{
+ unsigned long new_head = b->head_pos + 1;
+
+ /*
+ * Ensure anything written to the slot before we increment is
+ * visible
+ */
+ wmb();
+
+ if (new_head < b->buffer_size)
+ b->head_pos = new_head;
+ else
+ b->head_pos = 0;
+}
+
+static inline
struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
{
return &cpu_buf->buffer[cpu_buf->tail_pos];
--
1.6.0.1
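The commit logic moved into cpu_buffer_write_commit() above is a plain wrap-around increment. A userspace model of just that arithmetic (the kernel version also issues wmb() before publishing the new head; a single-threaded model needs no barrier):

```c
#include <assert.h>

/* Only the fields the commit touches. */
struct buf {
	unsigned long head_pos;
	unsigned long buffer_size;
};

/* Mirrors cpu_buffer_write_commit(): advance head by one slot,
 * wrapping to 0 when the end of the buffer is reached. */
static void write_commit(struct buf *b)
{
	unsigned long new_head = b->head_pos + 1;

	if (new_head < b->buffer_size)
		b->head_pos = new_head;
	else
		b->head_pos = 0;
}
```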

Subject: [PATCH 3/9] oprofile: adding cpu_buffer_entries()

This is in preparation for changes in the cpu buffer implementation.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/buffer_sync.c | 31 ++-----------------------------
drivers/oprofile/cpu_buffer.h | 27 +++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 44f676c..aed286c 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -464,33 +464,6 @@ static inline int is_code(unsigned long val)
}


-/* "acquire" as many cpu buffer slots as we can */
-static unsigned long get_slots(struct oprofile_cpu_buffer *b)
-{
- unsigned long head = b->head_pos;
- unsigned long tail = b->tail_pos;
-
- /*
- * Subtle. This resets the persistent last_task
- * and in_kernel values used for switching notes.
- * BUT, there is a small window between reading
- * head_pos, and this call, that means samples
- * can appear at the new head position, but not
- * be prefixed with the notes for switching
- * kernel mode or a task switch. This small hole
- * can lead to mis-attribution or samples where
- * we don't know if it's in the kernel or not,
- * at the start of an event buffer.
- */
- cpu_buffer_reset(b);
-
- if (head >= tail)
- return head - tail;
-
- return head + (b->buffer_size - tail);
-}
-
-
/* Move tasks along towards death. Any tasks on dead_tasks
* will definitely have no remaining references in any
* CPU buffers at this point, because we use two lists,
@@ -576,11 +549,11 @@ void sync_buffer(int cpu)
/* Remember, only we can modify tail_pos */

#ifndef CONFIG_OPROFILE_IBS
- available = get_slots(cpu_buf);
+ available = cpu_buffer_entries(cpu_buf);

for (i = 0; i < available; ++i) {
#else
- while (get_slots(cpu_buf)) {
+ while (cpu_buffer_entries(cpu_buf)) {
#endif
struct op_sample *s = cpu_buffer_read_entry(cpu_buf);

diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index e608976..6055b56 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -81,6 +81,33 @@ struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
return &cpu_buf->buffer[cpu_buf->tail_pos];
}

+/* "acquire" as many cpu buffer slots as we can */
+static inline
+unsigned long cpu_buffer_entries(struct oprofile_cpu_buffer *b)
+{
+ unsigned long head = b->head_pos;
+ unsigned long tail = b->tail_pos;
+
+ /*
+ * Subtle. This resets the persistent last_task
+ * and in_kernel values used for switching notes.
+ * BUT, there is a small window between reading
+ * head_pos, and this call, that means samples
+ * can appear at the new head position, but not
+ * be prefixed with the notes for switching
+ * kernel mode or a task switch. This small hole
+ * can lead to mis-attribution or samples where
+ * we don't know if it's in the kernel or not,
+ * at the start of an event buffer.
+ */
+ cpu_buffer_reset(b);
+
+ if (head >= tail)
+ return head - tail;
+
+ return head + (b->buffer_size - tail);
+}
+
/* transient events for the CPU buffer -> event buffer */
#define CPU_IS_KERNEL 1
#define CPU_TRACE_BEGIN 2
--
1.6.0.1
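The slot count in cpu_buffer_entries() distinguishes the wrapped and unwrapped cases: when the writer has wrapped past the end (head < tail), the filled region is the tail-to-end span plus the start-to-head span. A userspace sketch of just that arithmetic (the side effect of calling cpu_buffer_reset() is left out here):

```c
#include <assert.h>

/* Mirrors the counting in cpu_buffer_entries() for a circular
 * buffer of `size` slots with the given head and tail indices. */
static unsigned long entries(unsigned long head, unsigned long tail,
			     unsigned long size)
{
	if (head >= tail)
		return head - tail;

	return head + (size - tail);
}
```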

Subject: [PATCH 9/9] ring_buffer: adding EXPORT_SYMBOLs

Not sure if this should be *_GPL().

I added EXPORT_SYMBOLs for all functions that are part of the API
(ring_buffer.h). This is required since oprofile uses the ring buffer,
and compiling it as a module would fail otherwise.

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
kernel/trace/ring_buffer.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c8996d2..7e0215e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -31,6 +31,7 @@ void tracing_on(void)
{
ring_buffers_off = 0;
}
+EXPORT_SYMBOL(tracing_on);

/**
* tracing_off - turn off all tracing buffers
@@ -44,6 +45,7 @@ void tracing_off(void)
{
ring_buffers_off = 1;
}
+EXPORT_SYMBOL(tracing_off);

/* Up this if you want to test the TIME_EXTENTS and normalization */
#define DEBUG_SHIFT 0
@@ -60,12 +62,14 @@ u64 ring_buffer_time_stamp(int cpu)

return time;
}
+EXPORT_SYMBOL(ring_buffer_time_stamp);

void ring_buffer_normalize_time_stamp(int cpu, u64 *ts)
{
/* Just stupid testing the normalize function and deltas */
*ts >>= DEBUG_SHIFT;
}
+EXPORT_SYMBOL(ring_buffer_normalize_time_stamp);

#define RB_EVNT_HDR_SIZE (sizeof(struct ring_buffer_event))
#define RB_ALIGNMENT_SHIFT 2
@@ -115,6 +119,7 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
{
return rb_event_length(event);
}
+EXPORT_SYMBOL(ring_buffer_event_length);

/* inline for ring buffer fast paths */
static inline void *
@@ -136,6 +141,7 @@ void *ring_buffer_event_data(struct ring_buffer_event *event)
{
return rb_event_data(event);
}
+EXPORT_SYMBOL(ring_buffer_event_data);

#define for_each_buffer_cpu(buffer, cpu) \
for_each_cpu_mask(cpu, buffer->cpumask)
@@ -444,6 +450,7 @@ struct ring_buffer *ring_buffer_alloc(unsigned long size, unsigned flags)
kfree(buffer);
return NULL;
}
+EXPORT_SYMBOL(ring_buffer_alloc);

/**
* ring_buffer_free - free a ring buffer.
@@ -459,6 +466,7 @@ ring_buffer_free(struct ring_buffer *buffer)

kfree(buffer);
}
+EXPORT_SYMBOL(ring_buffer_free);

static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);

@@ -620,6 +628,7 @@ int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size)
mutex_unlock(&buffer->mutex);
return -ENOMEM;
}
+EXPORT_SYMBOL(ring_buffer_resize);

static inline int rb_null_event(struct ring_buffer_event *event)
{
@@ -1220,6 +1229,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer,
preempt_enable_notrace();
return NULL;
}
+EXPORT_SYMBOL(ring_buffer_lock_reserve);

static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
struct ring_buffer_event *event)
@@ -1269,6 +1279,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,

return 0;
}
+EXPORT_SYMBOL(ring_buffer_unlock_commit);

/**
* ring_buffer_write - write data to the buffer without reserving
@@ -1334,6 +1345,7 @@ int ring_buffer_write(struct ring_buffer *buffer,

return ret;
}
+EXPORT_SYMBOL(ring_buffer_write);

static inline int rb_per_cpu_empty(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -1360,6 +1372,7 @@ void ring_buffer_record_disable(struct ring_buffer *buffer)
{
atomic_inc(&buffer->record_disabled);
}
+EXPORT_SYMBOL(ring_buffer_record_disable);

/**
* ring_buffer_record_enable - enable writes to the buffer
@@ -1372,6 +1385,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer)
{
atomic_dec(&buffer->record_disabled);
}
+EXPORT_SYMBOL(ring_buffer_record_enable);

/**
* ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
@@ -1393,6 +1407,7 @@ void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu)
cpu_buffer = buffer->buffers[cpu];
atomic_inc(&cpu_buffer->record_disabled);
}
+EXPORT_SYMBOL(ring_buffer_record_disable_cpu);

/**
* ring_buffer_record_enable_cpu - enable writes to the buffer
@@ -1412,6 +1427,7 @@ void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu)
cpu_buffer = buffer->buffers[cpu];
atomic_dec(&cpu_buffer->record_disabled);
}
+EXPORT_SYMBOL(ring_buffer_record_enable_cpu);

/**
* ring_buffer_entries_cpu - get the number of entries in a cpu buffer
@@ -1428,6 +1444,7 @@ unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu)
cpu_buffer = buffer->buffers[cpu];
return cpu_buffer->entries;
}
+EXPORT_SYMBOL(ring_buffer_entries_cpu);

/**
* ring_buffer_overrun_cpu - get the number of overruns in a cpu_buffer
@@ -1444,6 +1461,7 @@ unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu)
cpu_buffer = buffer->buffers[cpu];
return cpu_buffer->overrun;
}
+EXPORT_SYMBOL(ring_buffer_overrun_cpu);

/**
* ring_buffer_entries - get the number of entries in a buffer
@@ -1466,6 +1484,7 @@ unsigned long ring_buffer_entries(struct ring_buffer *buffer)

return entries;
}
+EXPORT_SYMBOL(ring_buffer_entries);

/**
* ring_buffer_overrun_cpu - get the number of overruns in buffer
@@ -1488,6 +1507,7 @@ unsigned long ring_buffer_overruns(struct ring_buffer *buffer)

return overruns;
}
+EXPORT_SYMBOL(ring_buffer_overruns);

/**
* ring_buffer_iter_reset - reset an iterator
@@ -1513,6 +1533,7 @@ void ring_buffer_iter_reset(struct ring_buffer_iter *iter)
else
iter->read_stamp = iter->head_page->time_stamp;
}
+EXPORT_SYMBOL(ring_buffer_iter_reset);

/**
* ring_buffer_iter_empty - check if an iterator has no more to read
@@ -1527,6 +1548,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
return iter->head_page == cpu_buffer->commit_page &&
iter->head == rb_commit_index(cpu_buffer);
}
+EXPORT_SYMBOL(ring_buffer_iter_empty);

static void
rb_update_read_stamp(struct ring_buffer_per_cpu *cpu_buffer,
@@ -1797,6 +1819,7 @@ ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)

return NULL;
}
+EXPORT_SYMBOL(ring_buffer_peek);

/**
* ring_buffer_iter_peek - peek at the next event to be read
@@ -1867,6 +1890,7 @@ ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)

return NULL;
}
+EXPORT_SYMBOL(ring_buffer_iter_peek);

/**
* ring_buffer_consume - return an event and consume it
@@ -1894,6 +1918,7 @@ ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)

return event;
}
+EXPORT_SYMBOL(ring_buffer_consume);

/**
* ring_buffer_read_start - start a non consuming read of the buffer
@@ -1934,6 +1959,7 @@ ring_buffer_read_start(struct ring_buffer *buffer, int cpu)

return iter;
}
+EXPORT_SYMBOL(ring_buffer_read_start);

/**
* ring_buffer_finish - finish reading the iterator of the buffer
@@ -1950,6 +1976,7 @@ ring_buffer_read_finish(struct ring_buffer_iter *iter)
atomic_dec(&cpu_buffer->record_disabled);
kfree(iter);
}
+EXPORT_SYMBOL(ring_buffer_read_finish);

/**
* ring_buffer_read - read the next item in the ring buffer by the iterator
@@ -1971,6 +1998,7 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts)

return event;
}
+EXPORT_SYMBOL(ring_buffer_read);

/**
* ring_buffer_size - return the size of the ring buffer (in bytes)
@@ -1980,6 +2008,7 @@ unsigned long ring_buffer_size(struct ring_buffer *buffer)
{
return BUF_PAGE_SIZE * buffer->pages;
}
+EXPORT_SYMBOL(ring_buffer_size);

static void
rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
@@ -2022,6 +2051,7 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)

spin_unlock_irqrestore(&cpu_buffer->lock, flags);
}
+EXPORT_SYMBOL(ring_buffer_reset_cpu);

/**
* ring_buffer_reset - reset a ring buffer
@@ -2034,6 +2064,7 @@ void ring_buffer_reset(struct ring_buffer *buffer)
for_each_buffer_cpu(buffer, cpu)
ring_buffer_reset_cpu(buffer, cpu);
}
+EXPORT_SYMBOL(ring_buffer_reset);

/**
* rind_buffer_empty - is the ring buffer empty?
@@ -2052,6 +2083,7 @@ int ring_buffer_empty(struct ring_buffer *buffer)
}
return 1;
}
+EXPORT_SYMBOL(ring_buffer_empty);

/**
* ring_buffer_empty_cpu - is a cpu buffer of a ring buffer empty?
@@ -2068,6 +2100,7 @@ int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu)
cpu_buffer = buffer->buffers[cpu];
return rb_per_cpu_empty(cpu_buffer);
}
+EXPORT_SYMBOL(ring_buffer_empty_cpu);

/**
* ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
@@ -2117,6 +2150,7 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,

return 0;
}
+EXPORT_SYMBOL(ring_buffer_swap_cpu);

static ssize_t
rb_simple_read(struct file *filp, char __user *ubuf,
--
1.6.0.1

Subject: [PATCH 8/9] oprofile: fix lost sample counter

The number of lost samples could be greater than the number of
received samples. This patch fixes this by introducing return values
for add_sample() and add_code().

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/cpu_buffer.c | 83 ++++++++++++++++++++++++++---------------
1 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 7f7fc95..6109096 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -145,32 +145,31 @@ void end_cpu_work(void)
flush_scheduled_work();
}

-static inline void
+static inline int
add_sample(struct oprofile_cpu_buffer *cpu_buf,
unsigned long pc, unsigned long event)
{
struct op_entry entry;
+ int ret;

- if (cpu_buffer_write_entry(&entry))
- goto Error;
+ ret = cpu_buffer_write_entry(&entry);
+ if (ret)
+ return ret;

entry.sample->eip = pc;
entry.sample->event = event;

- if (cpu_buffer_write_commit(&entry))
- goto Error;
+ ret = cpu_buffer_write_commit(&entry);
+ if (ret)
+ return ret;

- return;
-
-Error:
- cpu_buf->sample_lost_overflow++;
- return;
+ return 0;
}

-static inline void
+static inline int
add_code(struct oprofile_cpu_buffer *buffer, unsigned long value)
{
- add_sample(buffer, ESCAPE_CODE, value);
+ return add_sample(buffer, ESCAPE_CODE, value);
}

/* This must be safe from any context. It's safe writing here
@@ -201,17 +200,25 @@ static int log_sample(struct oprofile_cpu_buffer *cpu_buf, unsigned long pc,
/* notice a switch from user->kernel or vice versa */
if (cpu_buf->last_is_kernel != is_kernel) {
cpu_buf->last_is_kernel = is_kernel;
- add_code(cpu_buf, is_kernel);
+ if (add_code(cpu_buf, is_kernel))
+ goto fail;
}

/* notice a task switch */
if (cpu_buf->last_task != task) {
cpu_buf->last_task = task;
- add_code(cpu_buf, (unsigned long)task);
+ if (add_code(cpu_buf, (unsigned long)task))
+ goto fail;
}

- add_sample(cpu_buf, pc, event);
+ if (add_sample(cpu_buf, pc, event))
+ goto fail;
+
return 1;
+
+fail:
+ cpu_buf->sample_lost_overflow++;
+ return 0;
}

static int oprofile_begin_trace(struct oprofile_cpu_buffer *cpu_buf)
@@ -266,37 +273,49 @@ void oprofile_add_ibs_sample(struct pt_regs * const regs,
int is_kernel = !user_mode(regs);
struct oprofile_cpu_buffer *cpu_buf = &__get_cpu_var(cpu_buffer);
struct task_struct *task;
+ int fail = 0;

cpu_buf->sample_received++;

/* notice a switch from user->kernel or vice versa */
if (cpu_buf->last_is_kernel != is_kernel) {
+ if (add_code(cpu_buf, is_kernel))
+ goto fail;
cpu_buf->last_is_kernel = is_kernel;
- add_code(cpu_buf, is_kernel);
}

/* notice a task switch */
if (!is_kernel) {
task = current;
if (cpu_buf->last_task != task) {
+ if (add_code(cpu_buf, (unsigned long)task))
+ goto fail;
cpu_buf->last_task = task;
- add_code(cpu_buf, (unsigned long)task);
}
}

- add_code(cpu_buf, ibs_code);
- add_sample(cpu_buf, ibs_sample[0], ibs_sample[1]);
- add_sample(cpu_buf, ibs_sample[2], ibs_sample[3]);
- add_sample(cpu_buf, ibs_sample[4], ibs_sample[5]);
+ fail = fail || add_code(cpu_buf, ibs_code);
+ fail = fail || add_sample(cpu_buf, ibs_sample[0], ibs_sample[1]);
+ fail = fail || add_sample(cpu_buf, ibs_sample[2], ibs_sample[3]);
+ fail = fail || add_sample(cpu_buf, ibs_sample[4], ibs_sample[5]);

if (ibs_code == IBS_OP_BEGIN) {
- add_sample(cpu_buf, ibs_sample[6], ibs_sample[7]);
- add_sample(cpu_buf, ibs_sample[8], ibs_sample[9]);
- add_sample(cpu_buf, ibs_sample[10], ibs_sample[11]);
+ fail = fail || add_sample(cpu_buf, ibs_sample[6], ibs_sample[7]);
+ fail = fail || add_sample(cpu_buf, ibs_sample[8], ibs_sample[9]);
+ fail = fail || add_sample(cpu_buf, ibs_sample[10], ibs_sample[11]);
}

+ if (fail)
+ goto fail;
+
if (backtrace_depth)
oprofile_ops.backtrace(regs, backtrace_depth);
+
+ return;
+
+fail:
+ cpu_buf->sample_lost_overflow++;
+ return;
}

#endif
@@ -318,13 +337,17 @@ void oprofile_add_trace(unsigned long pc)
* broken frame can give an eip with the same value as an
* escape code, abort the trace if we get it
*/
- if (pc == ESCAPE_CODE) {
- cpu_buf->tracing = 0;
- cpu_buf->backtrace_aborted++;
- return;
- }
+ if (pc == ESCAPE_CODE)
+ goto fail;
+
+ if (add_sample(cpu_buf, pc, 0))
+ goto fail;

- add_sample(cpu_buf, pc, 0);
+ return;
+fail:
+ cpu_buf->tracing = 0;
+ cpu_buf->backtrace_aborted++;
+ return;
}

/*
--
1.6.0.1
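The `fail = fail || add_sample(...)` chains in the IBS path rely on C's short-circuit evaluation: once one write fails, the remaining add_sample() calls are skipped entirely, and only a single sample_lost_overflow increment is recorded for the whole sequence. A userspace illustration of that property (fake_add() and run_sequence() are stand-ins, not oprofile functions):

```c
#include <assert.h>

static int calls;

/* Stand-in for add_sample()/add_code(): returns nonzero on failure
 * and counts how often it was actually invoked. */
static int fake_add(int ret)
{
	calls++;
	return ret;
}

/* Accumulate failure the same way the IBS path does. */
static int run_sequence(int first_fails)
{
	int fail = 0;

	fail = fail || fake_add(first_fails);	/* header */
	fail = fail || fake_add(0);		/* sample 1 */
	fail = fail || fake_add(0);		/* sample 2 */

	return fail;
}
```

If the first write fails, the two later calls never run, so no partial sequence is emitted after the failure point.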

Subject: [PATCH 7/9] oprofile: remove nr_available_slots()

This function is no longer available after the port to the new ring
buffer. Its removal can lead to incomplete sampling sequences since
IBS samples and backtraces are transferred in multiple entries. With a
full buffer, samples could be lost at any time. The userspace daemon
has to live with such incomplete sampling sequences as long as the
data within one sample is consistent.

This will be fixed by changing the internal buffer layout so that all
data of one IBS sample or a backtrace is packed into a single ring
buffer entry. This is possible since the new ring buffer supports
variable data sizes.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/cpu_buffer.c | 34 ----------------------------------
1 files changed, 0 insertions(+), 34 deletions(-)

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index eb280ec..7f7fc95 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -145,18 +145,6 @@ void end_cpu_work(void)
flush_scheduled_work();
}

-/* compute number of available slots in cpu_buffer queue */
-static unsigned long nr_available_slots(struct oprofile_cpu_buffer const *b)
-{
- unsigned long head = b->head_pos;
- unsigned long tail = b->tail_pos;
-
- if (tail > head)
- return (tail - head) - 1;
-
- return tail + (b->buffer_size - head) - 1;
-}
-
static inline void
add_sample(struct oprofile_cpu_buffer *cpu_buf,
unsigned long pc, unsigned long event)
@@ -206,11 +194,6 @@ static int log_sample(struct oprofile_cpu_buffer *cpu_buf, unsigned long pc,
return 0;
}

- if (nr_available_slots(cpu_buf) < 3) {
- cpu_buf->sample_lost_overflow++;
- return 0;
- }
-
is_kernel = !!is_kernel;

task = current;
@@ -233,11 +216,6 @@ static int log_sample(struct oprofile_cpu_buffer *cpu_buf, unsigned long pc,

static int oprofile_begin_trace(struct oprofile_cpu_buffer *cpu_buf)
{
- if (nr_available_slots(cpu_buf) < 4) {
- cpu_buf->sample_lost_overflow++;
- return 0;
- }
-
add_code(cpu_buf, CPU_TRACE_BEGIN);
cpu_buf->tracing = 1;
return 1;
@@ -291,12 +269,6 @@ void oprofile_add_ibs_sample(struct pt_regs * const regs,

cpu_buf->sample_received++;

- if (nr_available_slots(cpu_buf) < MAX_IBS_SAMPLE_SIZE) {
- /* we can't backtrace since we lost the source of this event */
- cpu_buf->sample_lost_overflow++;
- return;
- }
-
/* notice a switch from user->kernel or vice versa */
if (cpu_buf->last_is_kernel != is_kernel) {
cpu_buf->last_is_kernel = is_kernel;
@@ -342,12 +314,6 @@ void oprofile_add_trace(unsigned long pc)
if (!cpu_buf->tracing)
return;

- if (nr_available_slots(cpu_buf) < 1) {
- cpu_buf->tracing = 0;
- cpu_buf->sample_lost_overflow++;
- return;
- }
-
/*
* broken frame can give an eip with the same value as an
* escape code, abort the trace if we get it
--
1.6.0.1

Subject: [PATCH 6/9] oprofile: port to the new ring_buffer

This patch replaces the current oprofile cpu buffer implementation
with the ring buffer provided by the tracing framework. The motivation
here is to leave the pain of implementing ring buffers to others. Oh,
no, there are more advantages. The main reason is the support for
different sample sizes that can be stored in the buffer. Use cases for
this are IBS and Cell spu profiling. Using the new ring buffer ensures
valid and complete samples and allows copying the cpu buffer
statelessly without knowing its content. Second, it uses a generic
kernel API and also reduces code size. And hopefully, there are fewer
bugs.

Since the new tracing ring buffer implementation uses spin locks to
protect the buffer during read/write access, it is difficult to use
the buffer in an NMI handler. In this case, writing to the buffer by
the NMI handler (x86) could also occur during critical sections while
reading the buffer. To avoid this, there are 2 buffers for independent
read and write access. Read access happens in process context only,
write access only in the NMI handler. If the read buffer runs empty,
both buffers are swapped atomically. There is potentially a small
window during swapping where the buffers are disabled and samples
could be lost.

Using 2 buffers adds a little overhead, but the solution is clear and
does not require changes in the ring buffer implementation. It can be
changed to a single-buffer solution when the ring buffer access is
implemented as non-locking atomic code.
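The two-buffer scheme described above can be sketched in userspace as follows (names and the single-threaded swap are illustrative only; the kernel needs the exchange to be atomic with respect to NMIs, and reads in FIFO order via the ring buffer API):

```c
#include <assert.h>

#define N 8

struct rb {
	unsigned long data[N];
	int count;
};

static struct rb a, b;
static struct rb *wbuf = &a;	/* written from "NMI" context only */
static struct rb *rbuf = &b;	/* read from process context only */

/* Writer: touches only wbuf; a full buffer means a lost sample. */
static int rb_write(unsigned long v)
{
	if (wbuf->count == N)
		return -1;
	wbuf->data[wbuf->count++] = v;
	return 0;
}

/* Reader: drain rbuf; when it runs dry, swap the buffer roles so
 * previously written samples become readable. */
static int rb_read(unsigned long *v)
{
	if (rbuf->count == 0) {
		struct rb *tmp = rbuf;
		rbuf = wbuf;
		wbuf = tmp;
	}
	if (rbuf->count == 0)
		return -1;	/* both buffers empty */
	*v = rbuf->data[--rbuf->count];
	return 0;
}
```

The key property is the separation of roles: the writer never sees the buffer the reader is draining, so no lock is shared between NMI and process context outside the swap itself.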

The new buffer requires more space to store the same number of samples
because each sample includes a u32 header. Also, there is more code to
execute for buffer access. Nonetheless, the buffer implementation is
proven in the ftrace environment and worth using in oprofile as well.

Patches that change the internal IBS buffer usage will follow.

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/buffer_sync.c | 65 ++++++++++++++----------------------
drivers/oprofile/cpu_buffer.c | 63 +++++++++++++++++++++++++++--------
drivers/oprofile/cpu_buffer.h | 71 +++++++++++++++++++++++-----------------
3 files changed, 114 insertions(+), 85 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 944a583..737bd94 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -268,18 +268,6 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)
return cookie;
}

-static void increment_tail(struct oprofile_cpu_buffer *b)
-{
- unsigned long new_tail = b->tail_pos + 1;
-
- rmb(); /* be sure fifo pointers are synchronized */
-
- if (new_tail < b->buffer_size)
- b->tail_pos = new_tail;
- else
- b->tail_pos = 0;
-}
-
static unsigned long last_cookie = INVALID_COOKIE;

static void add_cpu_switch(int i)
@@ -331,26 +319,25 @@ static void add_trace_begin(void)

#define IBS_FETCH_CODE_SIZE 2
#define IBS_OP_CODE_SIZE 5
-#define IBS_EIP(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->eip)
-#define IBS_EVENT(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->event)

/*
* Add IBS fetch and op entries to event buffer
*/
-static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
- struct mm_struct *mm)
+static void add_ibs_begin(int cpu, int code, struct mm_struct *mm)
{
unsigned long rip;
int i, count;
unsigned long ibs_cookie = 0;
off_t offset;
+ struct op_sample *sample;

- increment_tail(cpu_buf); /* move to RIP entry */
-
- rip = IBS_EIP(cpu_buf);
+ sample = cpu_buffer_read_entry(cpu);
+ if (!sample)
+ goto Error;
+ rip = sample->eip;

#ifdef __LP64__
- rip += IBS_EVENT(cpu_buf) << 32;
+ rip += sample->event << 32;
#endif

if (mm) {
@@ -374,8 +361,8 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
add_event_entry(offset); /* Offset from Dcookie */

/* we send the Dcookie offset, but send the raw Linear Add also*/
- add_event_entry(IBS_EIP(cpu_buf));
- add_event_entry(IBS_EVENT(cpu_buf));
+ add_event_entry(sample->eip);
+ add_event_entry(sample->event);

if (code == IBS_FETCH_CODE)
count = IBS_FETCH_CODE_SIZE; /*IBS FETCH is 2 int64s*/
@@ -383,10 +370,17 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
count = IBS_OP_CODE_SIZE; /*IBS OP is 5 int64s*/

for (i = 0; i < count; i++) {
- increment_tail(cpu_buf);
- add_event_entry(IBS_EIP(cpu_buf));
- add_event_entry(IBS_EVENT(cpu_buf));
+ sample = cpu_buffer_read_entry(cpu);
+ if (!sample)
+ goto Error;
+ add_event_entry(sample->eip);
+ add_event_entry(sample->event);
}
+
+ return;
+
+Error:
+ return;
}

#endif
@@ -530,33 +524,26 @@ typedef enum {
*/
void sync_buffer(int cpu)
{
- struct oprofile_cpu_buffer *cpu_buf = &per_cpu(cpu_buffer, cpu);
struct mm_struct *mm = NULL;
struct mm_struct *oldmm;
struct task_struct *new;
unsigned long cookie = 0;
int in_kernel = 1;
sync_buffer_state state = sb_buffer_start;
-#ifndef CONFIG_OPROFILE_IBS
unsigned int i;
unsigned long available;
-#endif

mutex_lock(&buffer_mutex);

add_cpu_switch(cpu);

- /* Remember, only we can modify tail_pos */
-
cpu_buffer_reset(cpu);
-#ifndef CONFIG_OPROFILE_IBS
- available = cpu_buffer_entries(cpu_buf);
+ available = cpu_buffer_entries(cpu);

for (i = 0; i < available; ++i) {
-#else
- while (cpu_buffer_entries(cpu_buf)) {
-#endif
- struct op_sample *s = cpu_buffer_read_entry(cpu_buf);
+ struct op_sample *s = cpu_buffer_read_entry(cpu);
+ if (!s)
+ break;

if (is_code(s->eip)) {
switch (s->event) {
@@ -575,11 +562,11 @@ void sync_buffer(int cpu)
#ifdef CONFIG_OPROFILE_IBS
case IBS_FETCH_BEGIN:
state = sb_bt_start;
- add_ibs_begin(cpu_buf, IBS_FETCH_CODE, mm);
+ add_ibs_begin(cpu, IBS_FETCH_CODE, mm);
break;
case IBS_OP_BEGIN:
state = sb_bt_start;
- add_ibs_begin(cpu_buf, IBS_OP_CODE, mm);
+ add_ibs_begin(cpu, IBS_OP_CODE, mm);
break;
#endif
default:
@@ -600,8 +587,6 @@ void sync_buffer(int cpu)
atomic_inc(&oprofile_stats.bt_lost_no_mapping);
}
}
-
- increment_tail(cpu_buf);
}
release_mm(mm);

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 5cf7efe..eb280ec 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -28,6 +28,25 @@
#include "buffer_sync.h"
#include "oprof.h"

+#define OP_BUFFER_FLAGS 0
+
+/*
+ * Read and write access is using spin locking. Thus, writing to the
+ * buffer by NMI handler (x86) could occur also during critical
+ * sections when reading the buffer. To avoid this, there are 2
+ * buffers for independent read and write access. Read access is in
+ * process context only, write access only in the NMI handler. If the
+ * read buffer runs empty, both buffers are swapped atomically. There
+ * is potentially a small window during swapping where the buffers are
+ * disabled and samples could be lost.
+ *
+ * Using 2 buffers is a little bit overhead, but the solution is clear
+ * and does not require changes in the ring buffer implementation. It
+ * can be changed to a single buffer solution when the ring buffer
+ * access is implemented as non-locking atomic code.
+ */
+struct ring_buffer *op_ring_buffer_read;
+struct ring_buffer *op_ring_buffer_write;
DEFINE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);

static void wq_sync_buffer(struct work_struct *work);
@@ -37,12 +56,12 @@ static int work_enabled;

void free_cpu_buffers(void)
{
- int i;
-
- for_each_possible_cpu(i) {
- vfree(per_cpu(cpu_buffer, i).buffer);
- per_cpu(cpu_buffer, i).buffer = NULL;
- }
+ if (op_ring_buffer_read)
+ ring_buffer_free(op_ring_buffer_read);
+ op_ring_buffer_read = NULL;
+ if (op_ring_buffer_write)
+ ring_buffer_free(op_ring_buffer_write);
+ op_ring_buffer_write = NULL;
}

unsigned long oprofile_get_cpu_buffer_size(void)
@@ -64,14 +83,16 @@ int alloc_cpu_buffers(void)

unsigned long buffer_size = fs_cpu_buffer_size;

+ op_ring_buffer_read = ring_buffer_alloc(buffer_size, OP_BUFFER_FLAGS);
+ if (!op_ring_buffer_read)
+ goto fail;
+ op_ring_buffer_write = ring_buffer_alloc(buffer_size, OP_BUFFER_FLAGS);
+ if (!op_ring_buffer_write)
+ goto fail;
+
for_each_possible_cpu(i) {
struct oprofile_cpu_buffer *b = &per_cpu(cpu_buffer, i);

- b->buffer = vmalloc_node(sizeof(struct op_sample) * buffer_size,
- cpu_to_node(i));
- if (!b->buffer)
- goto fail;
-
b->last_task = NULL;
b->last_is_kernel = -1;
b->tracing = 0;
@@ -140,10 +161,22 @@ static inline void
add_sample(struct oprofile_cpu_buffer *cpu_buf,
unsigned long pc, unsigned long event)
{
- struct op_sample *entry = cpu_buffer_write_entry(cpu_buf);
- entry->eip = pc;
- entry->event = event;
- cpu_buffer_write_commit(cpu_buf);
+ struct op_entry entry;
+
+ if (cpu_buffer_write_entry(&entry))
+ goto Error;
+
+ entry.sample->eip = pc;
+ entry.sample->event = event;
+
+ if (cpu_buffer_write_commit(&entry))
+ goto Error;
+
+ return;
+
+Error:
+ cpu_buf->sample_lost_overflow++;
+ return;
}

static inline void
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index 895763f..aacb0f0 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -15,6 +15,7 @@
#include <linux/workqueue.h>
#include <linux/cache.h>
#include <linux/sched.h>
+#include <linux/ring_buffer.h>

struct task_struct;

@@ -32,6 +33,12 @@ struct op_sample {
unsigned long event;
};

+struct op_entry {
+ struct ring_buffer_event *event;
+ struct op_sample *sample;
+ unsigned long irq_flags;
+};
+
struct oprofile_cpu_buffer {
volatile unsigned long head_pos;
volatile unsigned long tail_pos;
@@ -39,7 +46,6 @@ struct oprofile_cpu_buffer {
struct task_struct *last_task;
int last_is_kernel;
int tracing;
- struct op_sample *buffer;
unsigned long sample_received;
unsigned long sample_lost_overflow;
unsigned long backtrace_aborted;
@@ -48,6 +54,8 @@ struct oprofile_cpu_buffer {
struct delayed_work work;
};

+extern struct ring_buffer *op_ring_buffer_read;
+extern struct ring_buffer *op_ring_buffer_write;
DECLARE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);

/*
@@ -64,46 +72,49 @@ static inline void cpu_buffer_reset(int cpu)
cpu_buf->last_task = NULL;
}

-static inline
-struct op_sample *cpu_buffer_write_entry(struct oprofile_cpu_buffer *cpu_buf)
+static inline int cpu_buffer_write_entry(struct op_entry *entry)
{
- return &cpu_buf->buffer[cpu_buf->head_pos];
-}
+ entry->event = ring_buffer_lock_reserve(op_ring_buffer_write,
+ sizeof(struct op_sample),
+ &entry->irq_flags);
+ if (entry->event)
+ entry->sample = ring_buffer_event_data(entry->event);
+ else
+ entry->sample = NULL;

-static inline
-void cpu_buffer_write_commit(struct oprofile_cpu_buffer *b)
-{
- unsigned long new_head = b->head_pos + 1;
+ if (!entry->sample)
+ return -ENOMEM;

- /*
- * Ensure anything written to the slot before we increment is
- * visible
- */
- wmb();
+ return 0;
+}

- if (new_head < b->buffer_size)
- b->head_pos = new_head;
- else
- b->head_pos = 0;
+static inline int cpu_buffer_write_commit(struct op_entry *entry)
+{
+ return ring_buffer_unlock_commit(op_ring_buffer_write, entry->event,
+ entry->irq_flags);
}

-static inline
-struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
+static inline struct op_sample *cpu_buffer_read_entry(int cpu)
{
- return &cpu_buf->buffer[cpu_buf->tail_pos];
+ struct ring_buffer_event *e;
+ e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+ if (e)
+ return ring_buffer_event_data(e);
+ if (ring_buffer_swap_cpu(op_ring_buffer_read,
+ op_ring_buffer_write,
+ cpu))
+ return NULL;
+ e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+ if (e)
+ return ring_buffer_event_data(e);
+ return NULL;
}

/* "acquire" as many cpu buffer slots as we can */
-static inline
-unsigned long cpu_buffer_entries(struct oprofile_cpu_buffer *b)
+static inline unsigned long cpu_buffer_entries(int cpu)
{
- unsigned long head = b->head_pos;
- unsigned long tail = b->tail_pos;
-
- if (head >= tail)
- return head - tail;
-
- return head + (b->buffer_size - tail);
+ return ring_buffer_entries_cpu(op_ring_buffer_read, cpu)
+ + ring_buffer_entries_cpu(op_ring_buffer_write, cpu);
}

/* transient events for the CPU buffer -> event buffer */
--
1.6.0.1

Subject: [PATCH 1/9] oprofile: adding cpu buffer r/w access functions

This is in preparation for changes in the cpu buffer implementation.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/oprofile/buffer_sync.c | 20 +++++++++-----------
drivers/oprofile/cpu_buffer.c | 2 +-
drivers/oprofile/cpu_buffer.h | 12 ++++++++++++
3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 7d61ae8..44f676c 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -331,10 +331,8 @@ static void add_trace_begin(void)

#define IBS_FETCH_CODE_SIZE 2
#define IBS_OP_CODE_SIZE 5
-#define IBS_EIP(offset) \
- (((struct op_sample *)&cpu_buf->buffer[(offset)])->eip)
-#define IBS_EVENT(offset) \
- (((struct op_sample *)&cpu_buf->buffer[(offset)])->event)
+#define IBS_EIP(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->eip)
+#define IBS_EVENT(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->event)

/*
* Add IBS fetch and op entries to event buffer
@@ -349,10 +347,10 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,

increment_tail(cpu_buf); /* move to RIP entry */

- rip = IBS_EIP(cpu_buf->tail_pos);
+ rip = IBS_EIP(cpu_buf);

#ifdef __LP64__
- rip += IBS_EVENT(cpu_buf->tail_pos) << 32;
+ rip += IBS_EVENT(cpu_buf) << 32;
#endif

if (mm) {
@@ -376,8 +374,8 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
add_event_entry(offset); /* Offset from Dcookie */

/* we send the Dcookie offset, but send the raw Linear Add also*/
- add_event_entry(IBS_EIP(cpu_buf->tail_pos));
- add_event_entry(IBS_EVENT(cpu_buf->tail_pos));
+ add_event_entry(IBS_EIP(cpu_buf));
+ add_event_entry(IBS_EVENT(cpu_buf));

if (code == IBS_FETCH_CODE)
count = IBS_FETCH_CODE_SIZE; /*IBS FETCH is 2 int64s*/
@@ -386,8 +384,8 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,

for (i = 0; i < count; i++) {
increment_tail(cpu_buf);
- add_event_entry(IBS_EIP(cpu_buf->tail_pos));
- add_event_entry(IBS_EVENT(cpu_buf->tail_pos));
+ add_event_entry(IBS_EIP(cpu_buf));
+ add_event_entry(IBS_EVENT(cpu_buf));
}
}

@@ -584,7 +582,7 @@ void sync_buffer(int cpu)
#else
while (get_slots(cpu_buf)) {
#endif
- struct op_sample *s = &cpu_buf->buffer[cpu_buf->tail_pos];
+ struct op_sample *s = cpu_buffer_read_entry(cpu_buf);

if (is_code(s->eip)) {
switch (s->event) {
diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 2c4d541..7e5e650 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -168,7 +168,7 @@ static inline void
add_sample(struct oprofile_cpu_buffer *cpu_buf,
unsigned long pc, unsigned long event)
{
- struct op_sample *entry = &cpu_buf->buffer[cpu_buf->head_pos];
+ struct op_sample *entry = cpu_buffer_write_entry(cpu_buf);
entry->eip = pc;
entry->event = event;
increment_head(cpu_buf);
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index d3cc262..0870699 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -52,6 +52,18 @@ DECLARE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);

void cpu_buffer_reset(struct oprofile_cpu_buffer *cpu_buf);

+static inline
+struct op_sample *cpu_buffer_write_entry(struct oprofile_cpu_buffer *cpu_buf)
+{
+ return &cpu_buf->buffer[cpu_buf->head_pos];
+}
+
+static inline
+struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
+{
+ return &cpu_buf->buffer[cpu_buf->tail_pos];
+}
+
/* transient events for the CPU buffer -> event buffer */
#define CPU_IS_KERNEL 1
#define CPU_TRACE_BEGIN 2
--
1.6.0.1

2008-12-11 17:19:47

by Steven Rostedt

Subject: Re: [PATCH 0/9] oprofile: port to the new ring buffer


On Thu, 11 Dec 2008, Robert Richter wrote:

> This patch set ports cpu buffers in oprofile to the new ring buffer
> provided by the tracing framework. The motivation here is to leave the
> pain of implementing ring buffers to others. Oh, no, there are more
> advantages. Main reason is the support of different sample sizes that
> could be stored in the buffer. Use cases for this are IBS and Cell spu
> profiling. Using the new ring buffer ensures valid and complete
> samples and allows copying the cpu buffer stateless without knowing
> its content. Second it will use generic kernel API and also reduce
> code size. And hopefully, there are less bugs.
>
> The patch set is also available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git ring_buffer

Robert,

Thanks for doing this! I'll try to take time out this week to review these
patches.

-- Steve

2008-12-11 19:48:57

by Steven Rostedt

Subject: Re: [PATCH 5/9] ring_buffer: add remaining cpu functions to ring_buffer.h


On Thu, 11 Dec 2008, Robert Richter wrote:

> These functions are not yet in ring_buffer.h, though they seem to be
> part of the API.
>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>

Acked-by: Steven Rostedt <[email protected]>

-- Steve

> ---
> include/linux/ring_buffer.h | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index e097c2e..de9d8c1 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -116,6 +116,8 @@ void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
>
> unsigned long ring_buffer_entries(struct ring_buffer *buffer);
> unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
> +unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu);
> +unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
>
> u64 ring_buffer_time_stamp(int cpu);
> void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
> --
> 1.6.0.1
>
>
>
>

2008-12-11 19:52:15

by Steven Rostedt

Subject: Re: [PATCH 6/9] oprofile: port to the new ring_buffer


On Thu, 11 Dec 2008, Robert Richter wrote:

> This patch replaces the current oprofile cpu buffer implementation
> with the ring buffer provided by the tracing framework. The motivation
> here is to leave the pain of implementing ring buffers to others. Oh,
> no, there are more advantages. Main reason is the support of different
> sample sizes that could be stored in the buffer. Use cases for this
> are IBS and Cell spu profiling. Using the new ring buffer ensures
> valid and complete samples and allows copying the cpu buffer stateless
> without knowing its content. Second it will use generic kernel API and
> also reduce code size. And hopefully, there are less bugs.
>
> Since the new tracing ring buffer implementation uses spin locks to
> protect the buffer during read/write access, it is difficult to use
> the buffer in an NMI handler. In this case, writing to the buffer by
> the NMI handler (x86) could occur also during critical sections when
> reading the buffer. To avoid this, there are 2 buffers for independent
> read and write access. Read access is in process context only, write
> access only in the NMI handler. If the read buffer runs empty, both
> buffers are swapped atomically. There is potentially a small window
> during swapping where the buffers are disabled and samples could be
> lost.

There are plans to remove the spinlock from the write side of the buffer.
But this will take a bit of work and care. Lockless is better, but it also
makes for more complex code, which translates to code more prone to bugs.

>
> Using 2 buffers is a little bit overhead, but the solution is clear
> and does not require changes in the ring buffer implementation. It can
> be changed to a single buffer solution when the ring buffer access is
> implemented as non-locking atomic code.

Agreed.

>
> The new buffer requires more size to store the same amount of samples
> because each sample includes an u32 header. Also, there is more code
> to execute for buffer access. Nonetheless, the buffer implementation
> is proven in the ftrace environment and worth to use also in oprofile.
>
> Patches that changes the internal IBS buffer usage will follow.
>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/oprofile/buffer_sync.c | 65 ++++++++++++++----------------------
> drivers/oprofile/cpu_buffer.c | 63 +++++++++++++++++++++++++++--------
> drivers/oprofile/cpu_buffer.h | 71 +++++++++++++++++++++++-----------------
> 3 files changed, 114 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index 944a583..737bd94 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -268,18 +268,6 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)
> return cookie;
> }
>
> -static void increment_tail(struct oprofile_cpu_buffer *b)
> -{
> - unsigned long new_tail = b->tail_pos + 1;
> -
> - rmb(); /* be sure fifo pointers are synchronized */
> -
> - if (new_tail < b->buffer_size)
> - b->tail_pos = new_tail;
> - else
> - b->tail_pos = 0;
> -}
> -
> static unsigned long last_cookie = INVALID_COOKIE;
>
> static void add_cpu_switch(int i)
> @@ -331,26 +319,25 @@ static void add_trace_begin(void)
>
> #define IBS_FETCH_CODE_SIZE 2
> #define IBS_OP_CODE_SIZE 5
> -#define IBS_EIP(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->eip)
> -#define IBS_EVENT(cpu_buf) ((cpu_buffer_read_entry(cpu_buf))->event)
>
> /*
> * Add IBS fetch and op entries to event buffer
> */
> -static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
> - struct mm_struct *mm)
> +static void add_ibs_begin(int cpu, int code, struct mm_struct *mm)
> {
> unsigned long rip;
> int i, count;
> unsigned long ibs_cookie = 0;
> off_t offset;
> + struct op_sample *sample;
>
> - increment_tail(cpu_buf); /* move to RIP entry */
> -
> - rip = IBS_EIP(cpu_buf);
> + sample = cpu_buffer_read_entry(cpu);
> + if (!sample)
> + goto Error;
> + rip = sample->eip;
>
> #ifdef __LP64__
> - rip += IBS_EVENT(cpu_buf) << 32;
> + rip += sample->event << 32;
> #endif
>
> if (mm) {
> @@ -374,8 +361,8 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
> add_event_entry(offset); /* Offset from Dcookie */
>
> /* we send the Dcookie offset, but send the raw Linear Add also*/
> - add_event_entry(IBS_EIP(cpu_buf));
> - add_event_entry(IBS_EVENT(cpu_buf));
> + add_event_entry(sample->eip);
> + add_event_entry(sample->event);
>
> if (code == IBS_FETCH_CODE)
> count = IBS_FETCH_CODE_SIZE; /*IBS FETCH is 2 int64s*/
> @@ -383,10 +370,17 @@ static void add_ibs_begin(struct oprofile_cpu_buffer *cpu_buf, int code,
> count = IBS_OP_CODE_SIZE; /*IBS OP is 5 int64s*/
>
> for (i = 0; i < count; i++) {
> - increment_tail(cpu_buf);
> - add_event_entry(IBS_EIP(cpu_buf));
> - add_event_entry(IBS_EVENT(cpu_buf));
> + sample = cpu_buffer_read_entry(cpu);
> + if (!sample)
> + goto Error;
> + add_event_entry(sample->eip);
> + add_event_entry(sample->event);
> }
> +
> + return;
> +
> +Error:
> + return;
> }
>
> #endif
> @@ -530,33 +524,26 @@ typedef enum {
> */
> void sync_buffer(int cpu)
> {
> - struct oprofile_cpu_buffer *cpu_buf = &per_cpu(cpu_buffer, cpu);
> struct mm_struct *mm = NULL;
> struct mm_struct *oldmm;
> struct task_struct *new;
> unsigned long cookie = 0;
> int in_kernel = 1;
> sync_buffer_state state = sb_buffer_start;
> -#ifndef CONFIG_OPROFILE_IBS
> unsigned int i;
> unsigned long available;
> -#endif
>
> mutex_lock(&buffer_mutex);
>
> add_cpu_switch(cpu);
>
> - /* Remember, only we can modify tail_pos */
> -
> cpu_buffer_reset(cpu);
> -#ifndef CONFIG_OPROFILE_IBS
> - available = cpu_buffer_entries(cpu_buf);
> + available = cpu_buffer_entries(cpu);
>
> for (i = 0; i < available; ++i) {
> -#else
> - while (cpu_buffer_entries(cpu_buf)) {
> -#endif
> - struct op_sample *s = cpu_buffer_read_entry(cpu_buf);
> + struct op_sample *s = cpu_buffer_read_entry(cpu);
> + if (!s)
> + break;
>
> if (is_code(s->eip)) {
> switch (s->event) {
> @@ -575,11 +562,11 @@ void sync_buffer(int cpu)
> #ifdef CONFIG_OPROFILE_IBS
> case IBS_FETCH_BEGIN:
> state = sb_bt_start;
> - add_ibs_begin(cpu_buf, IBS_FETCH_CODE, mm);
> + add_ibs_begin(cpu, IBS_FETCH_CODE, mm);
> break;
> case IBS_OP_BEGIN:
> state = sb_bt_start;
> - add_ibs_begin(cpu_buf, IBS_OP_CODE, mm);
> + add_ibs_begin(cpu, IBS_OP_CODE, mm);
> break;
> #endif
> default:
> @@ -600,8 +587,6 @@ void sync_buffer(int cpu)
> atomic_inc(&oprofile_stats.bt_lost_no_mapping);
> }
> }
> -
> - increment_tail(cpu_buf);
> }
> release_mm(mm);
>
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 5cf7efe..eb280ec 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -28,6 +28,25 @@
> #include "buffer_sync.h"
> #include "oprof.h"
>
> +#define OP_BUFFER_FLAGS 0
> +
> +/*
> + * Read and write access is using spin locking. Thus, writing to the
> + * buffer by NMI handler (x86) could occur also during critical
> + * sections when reading the buffer. To avoid this, there are 2
> + * buffers for independent read and write access. Read access is in
> + * process context only, write access only in the NMI handler. If the
> + * read buffer runs empty, both buffers are swapped atomically. There
> + * is potentially a small window during swapping where the buffers are
> + * disabled and samples could be lost.
> + *
> + * Using 2 buffers is a little bit overhead, but the solution is clear
> + * and does not require changes in the ring buffer implementation. It
> + * can be changed to a single buffer solution when the ring buffer
> + * access is implemented as non-locking atomic code.
> + */
> +struct ring_buffer *op_ring_buffer_read;
> +struct ring_buffer *op_ring_buffer_write;

Ah, this is similar to the ftrace irq latency tracing method.

> DEFINE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);
>
> static void wq_sync_buffer(struct work_struct *work);
> @@ -37,12 +56,12 @@ static int work_enabled;
>
> void free_cpu_buffers(void)
> {
> - int i;
> -
> - for_each_possible_cpu(i) {
> - vfree(per_cpu(cpu_buffer, i).buffer);
> - per_cpu(cpu_buffer, i).buffer = NULL;
> - }
> + if (op_ring_buffer_read)
> + ring_buffer_free(op_ring_buffer_read);
> + op_ring_buffer_read = NULL;
> + if (op_ring_buffer_write)
> + ring_buffer_free(op_ring_buffer_write);
> + op_ring_buffer_write = NULL;
> }
>
> unsigned long oprofile_get_cpu_buffer_size(void)
> @@ -64,14 +83,16 @@ int alloc_cpu_buffers(void)
>
> unsigned long buffer_size = fs_cpu_buffer_size;
>
> + op_ring_buffer_read = ring_buffer_alloc(buffer_size, OP_BUFFER_FLAGS);
> + if (!op_ring_buffer_read)
> + goto fail;
> + op_ring_buffer_write = ring_buffer_alloc(buffer_size, OP_BUFFER_FLAGS);
> + if (!op_ring_buffer_write)
> + goto fail;
> +
> for_each_possible_cpu(i) {
> struct oprofile_cpu_buffer *b = &per_cpu(cpu_buffer, i);
>
> - b->buffer = vmalloc_node(sizeof(struct op_sample) * buffer_size,
> - cpu_to_node(i));
> - if (!b->buffer)
> - goto fail;
> -
> b->last_task = NULL;
> b->last_is_kernel = -1;
> b->tracing = 0;
> @@ -140,10 +161,22 @@ static inline void
> add_sample(struct oprofile_cpu_buffer *cpu_buf,
> unsigned long pc, unsigned long event)
> {
> - struct op_sample *entry = cpu_buffer_write_entry(cpu_buf);
> - entry->eip = pc;
> - entry->event = event;
> - cpu_buffer_write_commit(cpu_buf);
> + struct op_entry entry;
> +
> + if (cpu_buffer_write_entry(&entry))
> + goto Error;
> +
> + entry.sample->eip = pc;
> + entry.sample->event = event;
> +
> + if (cpu_buffer_write_commit(&entry))
> + goto Error;
> +
> + return;
> +
> +Error:
> + cpu_buf->sample_lost_overflow++;
> + return;
> }
>
> static inline void
> diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
> index 895763f..aacb0f0 100644
> --- a/drivers/oprofile/cpu_buffer.h
> +++ b/drivers/oprofile/cpu_buffer.h
> @@ -15,6 +15,7 @@
> #include <linux/workqueue.h>
> #include <linux/cache.h>
> #include <linux/sched.h>
> +#include <linux/ring_buffer.h>
>
> struct task_struct;
>
> @@ -32,6 +33,12 @@ struct op_sample {
> unsigned long event;
> };
>
> +struct op_entry {
> + struct ring_buffer_event *event;
> + struct op_sample *sample;
> + unsigned long irq_flags;
> +};
> +
> struct oprofile_cpu_buffer {
> volatile unsigned long head_pos;
> volatile unsigned long tail_pos;
> @@ -39,7 +46,6 @@ struct oprofile_cpu_buffer {
> struct task_struct *last_task;
> int last_is_kernel;
> int tracing;
> - struct op_sample *buffer;
> unsigned long sample_received;
> unsigned long sample_lost_overflow;
> unsigned long backtrace_aborted;
> @@ -48,6 +54,8 @@ struct oprofile_cpu_buffer {
> struct delayed_work work;
> };
>
> +extern struct ring_buffer *op_ring_buffer_read;
> +extern struct ring_buffer *op_ring_buffer_write;
> DECLARE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer);
>
> /*
> @@ -64,46 +72,49 @@ static inline void cpu_buffer_reset(int cpu)
> cpu_buf->last_task = NULL;
> }
>
> -static inline
> -struct op_sample *cpu_buffer_write_entry(struct oprofile_cpu_buffer *cpu_buf)
> +static inline int cpu_buffer_write_entry(struct op_entry *entry)
> {
> - return &cpu_buf->buffer[cpu_buf->head_pos];
> -}
> + entry->event = ring_buffer_lock_reserve(op_ring_buffer_write,
> + sizeof(struct op_sample),
> + &entry->irq_flags);
> + if (entry->event)
> + entry->sample = ring_buffer_event_data(entry->event);
> + else
> + entry->sample = NULL;
>
> -static inline
> -void cpu_buffer_write_commit(struct oprofile_cpu_buffer *b)
> -{
> - unsigned long new_head = b->head_pos + 1;
> + if (!entry->sample)
> + return -ENOMEM;
>
> - /*
> - * Ensure anything written to the slot before we increment is
> - * visible
> - */
> - wmb();
> + return 0;
> +}
>
> - if (new_head < b->buffer_size)
> - b->head_pos = new_head;
> - else
> - b->head_pos = 0;
> +static inline int cpu_buffer_write_commit(struct op_entry *entry)
> +{
> + return ring_buffer_unlock_commit(op_ring_buffer_write, entry->event,
> + entry->irq_flags);

Note, ring_buffer_unlock_commit is not expected to receive a NULL entry.
It may work, but it will screw up the accounting. I'll fix that in the
future, but for now, your code may be broken.

> }
>
> -static inline
> -struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
> +static inline struct op_sample *cpu_buffer_read_entry(int cpu)
> {
> - return &cpu_buf->buffer[cpu_buf->tail_pos];
> + struct ring_buffer_event *e;
> + e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
> + if (e)
> + return ring_buffer_event_data(e);
> + if (ring_buffer_swap_cpu(op_ring_buffer_read,
> + op_ring_buffer_write,
> + cpu))
> + return NULL;

Is cpu == smp_processor_id() here? If not, there needs to be more
protection.

> + e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
> + if (e)
> + return ring_buffer_event_data(e);
> + return NULL;
> }
>
> /* "acquire" as many cpu buffer slots as we can */
> -static inline
> -unsigned long cpu_buffer_entries(struct oprofile_cpu_buffer *b)
> +static inline unsigned long cpu_buffer_entries(int cpu)
> {
> - unsigned long head = b->head_pos;
> - unsigned long tail = b->tail_pos;
> -
> - if (head >= tail)
> - return head - tail;
> -
> - return head + (b->buffer_size - tail);
> + return ring_buffer_entries_cpu(op_ring_buffer_read, cpu)
> + + ring_buffer_entries_cpu(op_ring_buffer_write, cpu);

This may have the wrong value if the ring_buffer_lock_reserve failed.

Again, I need to allow for the commit to take a null entry, and ignore it.

-- Steve

> }
>
> /* transient events for the CPU buffer -> event buffer */
> --
> 1.6.0.1
>
>
>
>

2008-12-12 05:57:38

by Ingo Molnar

Subject: Re: [PATCH 0/9] oprofile: port to the new ring buffer


* Robert Richter <[email protected]> wrote:

> This patch set ports cpu buffers in oprofile to the new ring buffer
> provided by the tracing framework. The motivation here is to leave the
> pain of implementing ring buffers to others. Oh, no, there are more
> advantages. Main reason is the support of different sample sizes that
> could be stored in the buffer. Use cases for this are IBS and Cell spu
> profiling. Using the new ring buffer ensures valid and complete samples
> and allows copying the cpu buffer stateless without knowing its
> content. Second it will use generic kernel API and also reduce code
> size. And hopefully, there are less bugs.
>
> The patch set is also available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git ring_buffer

Pulled into tip/oprofile, thanks Robert - very nice stuff!

I've changed the exports to _GPL because these are really internal APIs
with a high flux, so the practical requirement for code relying on it to
be in-tree (or at least be source-available) is evident.

Ingo

2008-12-16 09:38:08

by Andrew Morton

Subject: Re: [PATCH 6/9] oprofile: port to the new ring_buffer

On Thu, 11 Dec 2008 17:42:00 +0100 Robert Richter <[email protected]> wrote:

> -static inline
> -struct op_sample *cpu_buffer_read_entry(struct oprofile_cpu_buffer *cpu_buf)
> +static inline struct op_sample *cpu_buffer_read_entry(int cpu)
> {
> - return &cpu_buf->buffer[cpu_buf->tail_pos];
> + struct ring_buffer_event *e;
> + e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
> + if (e)
> + return ring_buffer_event_data(e);
> + if (ring_buffer_swap_cpu(op_ring_buffer_read,
> + op_ring_buffer_write,
> + cpu))
> + return NULL;
> + e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
> + if (e)
> + return ring_buffer_event_data(e);
> + return NULL;
> }

This file has some really large inlined functions.
cpu_buffer_read_entry() has three callsites...

Subject: Re: [PATCH 6/9] oprofile: port to the new ring_buffer

On 11.12.08 14:48:56, Steven Rostedt wrote:
> > Since the new tracing ring buffer implementation uses spin locks to
> > protect the buffer during read/write access, it is difficult to use
> > the buffer in an NMI handler. In this case, writing to the buffer by
> > the NMI handler (x86) could occur also during critical sections when
> > reading the buffer. To avoid this, there are 2 buffers for independent
> > read and write access. Read access is in process context only, write
> > access only in the NMI handler. If the read buffer runs empty, both
> > buffers are swapped atomically. There is potentially a small window
> > during swapping where the buffers are disabled and samples could be
> > lost.
>
> There are plans to remove the spinlock from the write side of the buffer.
> But this will take a bit of work and care. Lockless is better, but it also
> makes for more complex code, which translates to code more prone to bugs.

To begin with, the use of separate locks for reading and writing
would be sufficient. Then only one atomic comparison would be needed
to check whether the write pointer meets the read pointer. This
should not be too difficult since reads and writes always happen in
different pages (if I am not wrong), and thus only the pointers to
the pages have to be compared.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2008-12-16 19:38:57

by Steven Rostedt

Subject: Re: [PATCH 6/9] oprofile: port to the new ring_buffer


On Tue, 16 Dec 2008, Robert Richter wrote:
>
> In the beginning, the use of separate locks for reading and writing
> would be sufficient. Then, there would be only one atomic comparison
> needed to check, if the write pointer meets the read pointer. This
> should be not as difficult since read and write is always in different
> pages (if a am not wrong) and thus only the pointer to the pages have
> to be compared.
>

Most of the ftrace work uses the ring buffer in overwrite mode. That is,
when the head meets the tail, we move the tail. The way the reader works
is that the first read will swap a page out of the ring with a blank page.
This means that what is on that page is safe from further writes. When the
reader is finished with the page, it will swap the empty page with a tail
of the buffer.

The problem is that there needs to be a lock to protect this change. If
the writer is moving the tail and the reader is about to swap it, things
can get corrupted. There are tricks to protect this with cmpxchg, but for
now we are playing it safe with spin locks. Note, the lock is only taken
by the reader when it needs to go to the next page.

The writers do not need any locks to protect against other writers,
because the buffers are per cpu and a write may only be performed on a
buffer on the local cpu.

-- Steve


Note: I may have tail and head backwards, I do not remember which end I
called what.

2008-12-22 23:54:55

by Carl Love

Subject: Re: [PATCH 0/9] oprofile: port to the new ring buffer

I have tested the new ring buffer patches on an IBM Cell blade. Our
OProfile testsuite ran fine on the patched kernel. I ran the
testsuite three times yesterday/last night. As far as I can tell,
the new code works fine.

Note, I started with a 2.6.28-rc7 kernel from kernel.org. I then
applied the ftrace series of patches followed by the ring buffer
patches. I then had to explicitly enable tracing in the .config
file, as there is now a dependency on it. Robert updated the OProfile
dependency stuff yesterday. I started with the .config file that I
have used for building kernels on my Cell machine.

Carl Love

On Thu, 2008-12-11 at 17:41 +0100, Robert Richter wrote:
> This patch set ports cpu buffers in oprofile to the new ring buffer
> provided by the tracing framework. The motivation here is to leave the
> pain of implementing ring buffers to others. Oh, no, there are more
> advantages. Main reason is the support of different sample sizes that
> could be stored in the buffer. Use cases for this are IBS and Cell spu
> profiling. Using the new ring buffer ensures valid and complete
> samples and allows copying the cpu buffer stateless without knowing
> its content. Second it will use generic kernel API and also reduce
> code size. And hopefully, there are less bugs.
>
> The patch set is also available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git ring_buffer
>
> -Robert
>
>
>
>

2008-12-16 23:50:40

by Andrew Morton

Subject: Re: [PATCH 0/9] oprofile: port to the new ring buffer

On Thu, 11 Dec 2008 17:41:54 +0100
Robert Richter <[email protected]> wrote:

> This patch set ports cpu buffers in oprofile to the new ring buffer
> provided by the tracing framework.

alpha allmodconfig:

arch/alpha/oprofile/../../../drivers/oprofile/buffer_sync.c: In function 'sync_buffer':
arch/alpha/oprofile/../../../drivers/oprofile/buffer_sync.c:398: warning: 'offset' may be used uninitialized in this function
make[1]: *** [arch/alpha/oprofile/../../../drivers/oprofile/buffer_sync.o] Error 1

because stupid gcc thinks that

lookup_dcookie(mm, s->eip, &offset);

didn't initialise `offset', and arch/alpha is compiled with -Werror.

I'm not sure what to do about that, apart from finding (or building) a
new alpha cross-compiler.

What a PITA. I do so wish that someone would maintain a suite of
cross-compilers for kernel developers.

2008-12-17 05:03:57

by Pekka Enberg

Subject: Re: [PATCH 0/9] oprofile: port to the new ring buffer

On Wed, Dec 17, 2008 at 1:49 AM, Andrew Morton
<[email protected]> wrote:
> I'm not sure what to do about that, apart from finding (or building) a
> new alpha cross-compiler.
>
> What a PITA. I do so wish that someone would maintain a suite of
> cross-compilers for kernel developers.

There are quite a few cross-compilers for i386 here:

http://www.kernel.org/pub/tools/crosstool/files/bin/i386/