2008-08-01 20:03:09

by Carl Love

[permalink] [raw]
Subject: [PATCH 2/2] Cell OProfile: SPU mutex lock fix, version 4


If an error occurs on opcontrol start, the event and per cpu buffers
are released. If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer
exist. This results in a kernel oops. The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[email protected]>

Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===================================================================
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -38,8 +38,12 @@ void free_cpu_buffers(void)
{
int i;

- for_each_online_cpu(i)
- vfree(per_cpu(cpu_buffer, i).buffer);
+ for_each_online_cpu(i) {
+ if (per_cpu(cpu_buffer, i).buffer) {
+ vfree(per_cpu(cpu_buffer, i).buffer);
+ per_cpu(cpu_buffer, i).buffer = NULL;
+ }
+ }
}

unsigned long oprofile_get_cpu_buffer_size(void)
Index: Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
===================================================================
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/event_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
@@ -92,7 +92,10 @@ out:

void free_event_buffer(void)
{
- vfree(event_buffer);
+ if (event_buffer)
+ vfree(event_buffer);
+
+ event_buffer = NULL;
}




2008-08-08 13:26:41

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] Cell OProfile: SPU mutex lock fix, version 4

On Friday 01 August 2008, Carl Love wrote:
> If an error occurs on opcontrol start, the event and per cpu buffers
> are released. ?If later opcontrol shutdown is called then the free
> function will be called again to free buffers that no longer
> exist. ?This results in a kernel oops. ?The following changes
> prevent the call to delete buffers that don't exist.
>
> Signed-off-by: Carl Love <[email protected]>
>

vfree(NULL) is defined to be legal, so you don't need to check the
argument for being non-NULL, just set it to NULL after the free.

Arnd <><

2008-08-08 22:40:56

by Carl Love

[permalink] [raw]
Subject: Re: [PATCH 2/2] Repost Cell OProfile: SPU mutex lock fix, version 4

Updated to address Arnd's comments.

If an error occurs on opcontrol start, the event and per cpu buffers
are released. If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer
exist. This results in a kernel oops. The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[email protected]>

Index: Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
===================================================================
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/cpu_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/cpu_buffer.c
@@ -38,8 +38,10 @@ void free_cpu_buffers(void)
{
int i;

- for_each_online_cpu(i)
+ for_each_online_cpu(i) {
vfree(per_cpu(cpu_buffer, i).buffer);
+ per_cpu(cpu_buffer, i).buffer = NULL;
+ }
}

unsigned long oprofile_get_cpu_buffer_size(void)
Index: Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
===================================================================
--- Cell_kernel_6_26_2008.orig/drivers/oprofile/event_buffer.c
+++ Cell_kernel_6_26_2008/drivers/oprofile/event_buffer.c
@@ -93,6 +93,8 @@ out:
void free_event_buffer(void)
{
vfree(event_buffer);
+
+ event_buffer = NULL;
}



2008-08-09 13:13:31

by Arnd Bergmann

[permalink] [raw]
Subject: please pull cell merge branch

Hi Paul,

I've fixed one last bug in Carl's update for cell-oprofile (int flags
instead of unsigned long flags) and made sure the comments fit the
usual style. This fixes a long-standing bug that prevented us from
using oprofile on SPUs in many real-world scenarios. Please
pull from

master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge

so we can get it fixed in 2.6.27-rc3. Sorry for the size of the patch
so late after the merge window, unfortunately it wasn't possible to
fix the problem in a simpler way.

Arnd <><

---

arch/powerpc/oprofile/cell/pr_util.h | 13 +
arch/powerpc/oprofile/cell/spu_profiler.c | 4
arch/powerpc/oprofile/cell/spu_task_sync.c | 236 ++++++++++++++++++++++++---
drivers/oprofile/buffer_sync.c | 24 ++
drivers/oprofile/cpu_buffer.c | 15 +
drivers/oprofile/event_buffer.c | 2
drivers/oprofile/event_buffer.h | 7
include/linux/oprofile.h | 16 +
drivers/oprofile/cpu_buffer.c | 4
9 files changed, 284 insertions(+), 37 deletions(-)

commit f90a87b5f5fa46dc6c556e9267a6f25a95fbef14
Author: Carl Love <[email protected]>
Date: Fri Aug 8 15:39:44 2008 -0700

powerpc/cell/oprofile: avoid double free of profile buffer

If an error occurs on opcontrol start, the event and per cpu buffers
are released. If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer
exist. This results in a kernel oops. The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

commit 70f546a7262c6f67b87e875a542f315f8b9a7c17
Author: Carl Love <[email protected]>
Date: Fri Aug 8 15:38:36 2008 -0700

powerpc/cell/oprofile: fix mutex locking for spu-oprofile

The issue is the SPU code is not holding the kernel mutex lock while
adding samples to the kernel buffer.

This patch creates per SPU buffers to hold the data. Data
is added to the buffers from in interrupt context. The data
is periodically pushed to the kernel buffer via a new Oprofile
function oprofile_put_buff(). The oprofile_put_buff() function
is called via a work queue enabling the funtion to acquire the
mutex lock.

The existing user controls for adjusting the per CPU buffer
size is used to control the size of the per SPU buffers.
Similarly, overflows of the SPU buffers are reported by
incrementing the per CPU buffer stats. This eliminates the
need to have architecture specific controls for the per SPU
buffers which is not acceptable to the OProfile user tool
maintainer.

The export of the oprofile add_event_entry() is removed as it
is no longer needed given this patch.

Note, this patch has not addressed the issue of indexing arrays
by the spu number. This still needs to be fixed as the spu
numbering is not guarenteed to be 0 to max_num_spus-1.

Signed-off-by: Carl Love <[email protected]>
Signed-off-by: Maynard Johnson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

2008-08-10 23:22:06

by Paul Mackerras

[permalink] [raw]
Subject: Re: please pull cell merge branch

Arnd Bergmann writes:

> I've fixed one last bug in Carl's update for cell-oprofile (int flags
> instead of unsigned long flags) and made sure the comments fit the
> usual style. This fixes a long-standing bug that prevented us from
> using oprofile on SPUs in many real-world scenarios. Please
> pull from
>
> master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge
>
> so we can get it fixed in 2.6.27-rc3. Sorry for the size of the patch
> so late after the merge window, unfortunately it wasn't possible to
> fix the problem in a simpler way.
>
> Arnd <><
>
> ---
>
> arch/powerpc/oprofile/cell/pr_util.h | 13 +
> arch/powerpc/oprofile/cell/spu_profiler.c | 4
> arch/powerpc/oprofile/cell/spu_task_sync.c | 236 ++++++++++++++++++++++++---
> drivers/oprofile/buffer_sync.c | 24 ++
> drivers/oprofile/cpu_buffer.c | 15 +
> drivers/oprofile/event_buffer.c | 2
> drivers/oprofile/event_buffer.h | 7
> include/linux/oprofile.h | 16 +
> drivers/oprofile/cpu_buffer.c | 4

Since you're touching generic oprofile code here, do you have an ack
from the oprofile maintainer? Or is the oprofile maintainer listed in
MAINTAINERS (Robert Richter) no longer active or something?

Paul.

2008-08-11 07:19:24

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] please pull cell merge branch

On Monday 11 August 2008, Paul Mackerras wrote:
> Arnd Bergmann writes:
> > I've fixed one last bug in Carl's update for cell-oprofile (int flags
> > instead of unsigned long flags) and made sure the comments fit the
> > usual style. This fixes a long-standing bug that prevented us from
> > using oprofile on SPUs in many real-world scenarios.
>
> Since you're touching generic oprofile code here, do you have an ack
> from the oprofile maintainer? ?Or is the oprofile maintainer listed in
> MAINTAINERS (Robert Richter) no longer active or something?
>

Sorry about that. I had assumed that at the very least Carl had had
the oprofile list on Cc and that people there just didn't care.

Robert has just recently taken over maintainership for oprofile,
so I assume that he is indeed active and interested in the patches.
I'll send them out again for his review on oprofile-list.

Arnd <><

2008-08-11 07:27:08

by Arnd Bergmann

[permalink] [raw]
Subject: powerpc/cell/oprofile: avoid double free of profile buffer

From: Carl Love <[email protected]>

If an error occurs on opcontrol start, the event and per cpu buffers
are released. If later opcontrol shutdown is called then the free
function will be called again to free buffers that no longer
exist. This results in a kernel oops. The following changes
prevent the call to delete buffers that don't exist.

Signed-off-by: Carl Love <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
drivers/oprofile/cpu_buffer.c | 4 +++-
drivers/oprofile/event_buffer.c | 2 ++
2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index b8601dc..366b5d2 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -38,8 +38,10 @@ void free_cpu_buffers(void)
{
int i;

- for_each_online_cpu(i)
+ for_each_online_cpu(i) {
vfree(per_cpu(cpu_buffer, i).buffer);
+ per_cpu(cpu_buffer, i).buffer = NULL;
+ }
}

unsigned long oprofile_get_cpu_buffer_size(void)
diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
index e7fbac5..8d692a5 100644
--- a/drivers/oprofile/event_buffer.c
+++ b/drivers/oprofile/event_buffer.c
@@ -93,6 +93,8 @@ out:
void free_event_buffer(void)
{
vfree(event_buffer);
+
+ event_buffer = NULL;
}


--
1.5.4.3

2008-08-11 07:28:01

by Arnd Bergmann

[permalink] [raw]
Subject: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

From: Carl Love <[email protected]>

The issue is the SPU code is not holding the kernel mutex lock while
adding samples to the kernel buffer.

This patch creates per SPU buffers to hold the data. Data
is added to the buffers from in interrupt context. The data
is periodically pushed to the kernel buffer via a new Oprofile
function oprofile_put_buff(). The oprofile_put_buff() function
is called via a work queue enabling the funtion to acquire the
mutex lock.

The existing user controls for adjusting the per CPU buffer
size is used to control the size of the per SPU buffers.
Similarly, overflows of the SPU buffers are reported by
incrementing the per CPU buffer stats. This eliminates the
need to have architecture specific controls for the per SPU
buffers which is not acceptable to the OProfile user tool
maintainer.

The export of the oprofile add_event_entry() is removed as it
is no longer needed given this patch.

Note, this patch has not addressed the issue of indexing arrays
by the spu number. This still needs to be fixed as the spu
numbering is not guarenteed to be 0 to max_num_spus-1.

Signed-off-by: Carl Love <[email protected]>
Signed-off-by: Maynard Johnson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/powerpc/oprofile/cell/pr_util.h | 13 ++
arch/powerpc/oprofile/cell/spu_profiler.c | 4 +-
arch/powerpc/oprofile/cell/spu_task_sync.c | 236 +++++++++++++++++++++++++---
drivers/oprofile/buffer_sync.c | 24 +++
drivers/oprofile/cpu_buffer.c | 15 ++-
drivers/oprofile/event_buffer.h | 7 +
include/linux/oprofile.h | 16 +-
7 files changed, 279 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/oprofile/cell/pr_util.h b/arch/powerpc/oprofile/cell/pr_util.h
index 22e4e8d..628009c 100644
--- a/arch/powerpc/oprofile/cell/pr_util.h
+++ b/arch/powerpc/oprofile/cell/pr_util.h
@@ -24,6 +24,11 @@
#define SKIP_GENERIC_SYNC 0
#define SYNC_START_ERROR -1
#define DO_GENERIC_SYNC 1
+#define SPUS_PER_NODE 8
+#define DEFAULT_TIMER_EXPIRE (HZ / 10)
+
+extern struct delayed_work spu_work;
+extern int spu_prof_running;

struct spu_overlay_info { /* map of sections within an SPU overlay */
unsigned int vma; /* SPU virtual memory address from elf */
@@ -62,6 +67,14 @@ struct vma_to_fileoffset_map { /* map of sections within an SPU program */

};

+struct spu_buffer {
+ int last_guard_val;
+ int ctx_sw_seen;
+ unsigned long *buff;
+ unsigned int head, tail;
+};
+
+
/* The three functions below are for maintaining and accessing
* the vma-to-fileoffset map.
*/
diff --git a/arch/powerpc/oprofile/cell/spu_profiler.c b/arch/powerpc/oprofile/cell/spu_profiler.c
index 380d7e2..6edaebd 100644
--- a/arch/powerpc/oprofile/cell/spu_profiler.c
+++ b/arch/powerpc/oprofile/cell/spu_profiler.c
@@ -23,12 +23,11 @@

static u32 *samples;

-static int spu_prof_running;
+int spu_prof_running;
static unsigned int profiling_interval;

#define NUM_SPU_BITS_TRBUF 16
#define SPUS_PER_TB_ENTRY 4
-#define SPUS_PER_NODE 8

#define SPU_PC_MASK 0xFFFF

@@ -208,6 +207,7 @@ int start_spu_profiling(unsigned int cycles_reset)

spu_prof_running = 1;
hrtimer_start(&timer, kt, HRTIMER_MODE_REL);
+ schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);

return 0;
}
diff --git a/arch/powerpc/oprofile/cell/spu_task_sync.c b/arch/powerpc/oprofile/cell/spu_task_sync.c
index 2a9b4a0..2949126 100644
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -35,7 +35,102 @@ static DEFINE_SPINLOCK(buffer_lock);
static DEFINE_SPINLOCK(cache_lock);
static int num_spu_nodes;
int spu_prof_num_nodes;
-int last_guard_val[MAX_NUMNODES * 8];
+
+struct spu_buffer spu_buff[MAX_NUMNODES * SPUS_PER_NODE];
+struct delayed_work spu_work;
+static unsigned max_spu_buff;
+
+static void spu_buff_add(unsigned long int value, int spu)
+{
+ /* spu buff is a circular buffer. Add entries to the
+ * head. Head is the index to store the next value.
+ * The buffer is full when there is one available entry
+ * in the queue, i.e. head and tail can't be equal.
+ * That way we can tell the difference between the
+ * buffer being full versus empty.
+ *
+ * ASSUPTION: the buffer_lock is held when this function
+ * is called to lock the buffer, head and tail.
+ */
+ int full = 1;
+
+ if (spu_buff[spu].head >= spu_buff[spu].tail) {
+ if ((spu_buff[spu].head - spu_buff[spu].tail)
+ < (max_spu_buff - 1))
+ full = 0;
+
+ } else if (spu_buff[spu].tail > spu_buff[spu].head) {
+ if ((spu_buff[spu].tail - spu_buff[spu].head)
+ > 1)
+ full = 0;
+ }
+
+ if (!full) {
+ spu_buff[spu].buff[spu_buff[spu].head] = value;
+ spu_buff[spu].head++;
+
+ if (spu_buff[spu].head >= max_spu_buff)
+ spu_buff[spu].head = 0;
+ } else {
+ /* From the user's perspective make the SPU buffer
+ * size management/overflow look like we are using
+ * per cpu buffers. The user uses the same
+ * per cpu parameter to adjust the SPU buffer size.
+ * Increment the sample_lost_overflow to inform
+ * the user the buffer size needs to be increased.
+ */
+ oprofile_cpu_buffer_inc_smpl_lost();
+ }
+}
+
+/* This function copies the per SPU buffers to the
+ * OProfile kernel buffer.
+ */
+void sync_spu_buff(void)
+{
+ int spu;
+ unsigned long flags;
+ int curr_head;
+
+ for (spu = 0; spu < num_spu_nodes; spu++) {
+ /* In case there was an issue and the buffer didn't
+ * get created skip it.
+ */
+ if (spu_buff[spu].buff == NULL)
+ continue;
+
+ /* Hold the lock to make sure the head/tail
+ * doesn't change while spu_buff_add() is
+ * deciding if the buffer is full or not.
+ * Being a little paranoid.
+ */
+ spin_lock_irqsave(&buffer_lock, flags);
+ curr_head = spu_buff[spu].head;
+ spin_unlock_irqrestore(&buffer_lock, flags);
+
+ /* Transfer the current contents to the kernel buffer.
+ * data can still be added to the head of the buffer.
+ */
+ oprofile_put_buff(spu_buff[spu].buff,
+ spu_buff[spu].tail,
+ curr_head, max_spu_buff);
+
+ spin_lock_irqsave(&buffer_lock, flags);
+ spu_buff[spu].tail = curr_head;
+ spin_unlock_irqrestore(&buffer_lock, flags);
+ }
+
+}
+
+static void wq_sync_spu_buff(struct work_struct *work)
+{
+ /* move data from spu buffers to kernel buffer */
+ sync_spu_buff();
+
+ /* only reschedule if profiling is not done */
+ if (spu_prof_running)
+ schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);
+}

/* Container for caching information about an active SPU task. */
struct cached_info {
@@ -305,14 +400,21 @@ static int process_context_switch(struct spu *spu, unsigned long objectId)

/* Record context info in event buffer */
spin_lock_irqsave(&buffer_lock, flags);
- add_event_entry(ESCAPE_CODE);
- add_event_entry(SPU_CTX_SWITCH_CODE);
- add_event_entry(spu->number);
- add_event_entry(spu->pid);
- add_event_entry(spu->tgid);
- add_event_entry(app_dcookie);
- add_event_entry(spu_cookie);
- add_event_entry(offset);
+ spu_buff_add(ESCAPE_CODE, spu->number);
+ spu_buff_add(SPU_CTX_SWITCH_CODE, spu->number);
+ spu_buff_add(spu->number, spu->number);
+ spu_buff_add(spu->pid, spu->number);
+ spu_buff_add(spu->tgid, spu->number);
+ spu_buff_add(app_dcookie, spu->number);
+ spu_buff_add(spu_cookie, spu->number);
+ spu_buff_add(offset, spu->number);
+
+ /* Set flag to indicate SPU PC data can now be written out. If
+ * the SPU program counter data is seen before an SPU context
+ * record is seen, the postprocessing will fail.
+ */
+ spu_buff[spu->number].ctx_sw_seen = 1;
+
spin_unlock_irqrestore(&buffer_lock, flags);
smp_wmb(); /* insure spu event buffer updates are written */
/* don't want entries intermingled... */
@@ -360,6 +462,47 @@ static int number_of_online_nodes(void)
return nodes;
}

+static int oprofile_spu_buff_create(void)
+{
+ int spu;
+
+ max_spu_buff = oprofile_get_cpu_buffer_size();
+
+ for (spu = 0; spu < num_spu_nodes; spu++) {
+ /* create circular buffers to store the data in.
+ * use locks to manage accessing the buffers
+ */
+ spu_buff[spu].head = 0;
+ spu_buff[spu].tail = 0;
+
+ /*
+ * Create a buffer for each SPU. Can't reliably
+ * create a single buffer for all spus due to not
+ * enough contiguous kernel memory.
+ */
+
+ spu_buff[spu].buff = kzalloc((max_spu_buff
+ * sizeof(unsigned long)),
+ GFP_KERNEL);
+
+ if (!spu_buff[spu].buff) {
+ printk(KERN_ERR "SPU_PROF: "
+ "%s, line %d: oprofile_spu_buff_create "
+ "failed to allocate spu buffer %d.\n",
+ __func__, __LINE__, spu);
+
+ /* release the spu buffers that have been allocated */
+ while (spu >= 0) {
+ kfree(spu_buff[spu].buff);
+ spu_buff[spu].buff = 0;
+ spu--;
+ }
+ return -ENOMEM;
+ }
+ }
+ return 0;
+}
+
/* The main purpose of this function is to synchronize
* OProfile with SPUFS by registering to be notified of
* SPU task switches.
@@ -372,20 +515,35 @@ static int number_of_online_nodes(void)
*/
int spu_sync_start(void)
{
- int k;
+ int spu;
int ret = SKIP_GENERIC_SYNC;
int register_ret;
unsigned long flags = 0;

spu_prof_num_nodes = number_of_online_nodes();
num_spu_nodes = spu_prof_num_nodes * 8;
+ INIT_DELAYED_WORK(&spu_work, wq_sync_spu_buff);
+
+ /* create buffer for storing the SPU data to put in
+ * the kernel buffer.
+ */
+ ret = oprofile_spu_buff_create();
+ if (ret)
+ goto out;

spin_lock_irqsave(&buffer_lock, flags);
- add_event_entry(ESCAPE_CODE);
- add_event_entry(SPU_PROFILING_CODE);
- add_event_entry(num_spu_nodes);
+ for (spu = 0; spu < num_spu_nodes; spu++) {
+ spu_buff_add(ESCAPE_CODE, spu);
+ spu_buff_add(SPU_PROFILING_CODE, spu);
+ spu_buff_add(num_spu_nodes, spu);
+ }
spin_unlock_irqrestore(&buffer_lock, flags);

+ for (spu = 0; spu < num_spu_nodes; spu++) {
+ spu_buff[spu].ctx_sw_seen = 0;
+ spu_buff[spu].last_guard_val = 0;
+ }
+
/* Register for SPU events */
register_ret = spu_switch_event_register(&spu_active);
if (register_ret) {
@@ -393,8 +551,6 @@ int spu_sync_start(void)
goto out;
}

- for (k = 0; k < (MAX_NUMNODES * 8); k++)
- last_guard_val[k] = 0;
pr_debug("spu_sync_start -- running.\n");
out:
return ret;
@@ -446,13 +602,20 @@ void spu_sync_buffer(int spu_num, unsigned int *samples,
* use. We need to discard samples taken during the time
* period which an overlay occurs (i.e., guard value changes).
*/
- if (grd_val && grd_val != last_guard_val[spu_num]) {
- last_guard_val[spu_num] = grd_val;
+ if (grd_val && grd_val != spu_buff[spu_num].last_guard_val) {
+ spu_buff[spu_num].last_guard_val = grd_val;
/* Drop the rest of the samples. */
break;
}

- add_event_entry(file_offset | spu_num_shifted);
+ /* We must ensure that the SPU context switch has been written
+ * out before samples for the SPU. Otherwise, the SPU context
+ * information is not available and the postprocessing of the
+ * SPU PC will fail with no available anonymous map information.
+ */
+ if (spu_buff[spu_num].ctx_sw_seen)
+ spu_buff_add((file_offset | spu_num_shifted),
+ spu_num);
}
spin_unlock(&buffer_lock);
out:
@@ -463,20 +626,41 @@ out:
int spu_sync_stop(void)
{
unsigned long flags = 0;
- int ret = spu_switch_event_unregister(&spu_active);
- if (ret) {
+ int ret;
+ int k;
+
+ ret = spu_switch_event_unregister(&spu_active);
+
+ if (ret)
printk(KERN_ERR "SPU_PROF: "
- "%s, line %d: spu_switch_event_unregister returned %d\n",
- __func__, __LINE__, ret);
- goto out;
- }
+ "%s, line %d: spu_switch_event_unregister " \
+ "returned %d\n",
+ __func__, __LINE__, ret);
+
+ /* flush any remaining data in the per SPU buffers */
+ sync_spu_buff();

spin_lock_irqsave(&cache_lock, flags);
ret = release_cached_info(RELEASE_ALL);
spin_unlock_irqrestore(&cache_lock, flags);
-out:
+
+ /* remove scheduled work queue item rather then waiting
+ * for every queued entry to execute. Then flush pending
+ * system wide buffer to event buffer.
+ */
+ cancel_delayed_work(&spu_work);
+
+ for (k = 0; k < num_spu_nodes; k++) {
+ spu_buff[k].ctx_sw_seen = 0;
+
+ /*
+ * spu_sys_buff will be null if there was a problem
+ * allocating the buffer. Only delete if it exists.
+ */
+ kfree(spu_buff[k].buff);
+ spu_buff[k].buff = 0;
+ }
pr_debug("spu_sync_stop -- done.\n");
return ret;
}

-
diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 9304c45..4a70180 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -551,3 +551,27 @@ void sync_buffer(int cpu)

mutex_unlock(&buffer_mutex);
}
+
+/* The function can be used to add a buffer worth of data directly to
+ * the kernel buffer. The buffer is assumed to be a circular buffer.
+ * Take the entries from index start and end at index end, wrapping
+ * at max_entries.
+ */
+void oprofile_put_buff(unsigned long *buf, unsigned int start,
+ unsigned int stop, unsigned int max)
+{
+ int i;
+
+ i = start;
+
+ mutex_lock(&buffer_mutex);
+ while (i != stop) {
+ add_event_entry(buf[i++]);
+
+ if (i >= max)
+ i = 0;
+ }
+
+ mutex_unlock(&buffer_mutex);
+}
+
diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 2450b3a..b8601dc 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -37,11 +37,24 @@ static int work_enabled;
void free_cpu_buffers(void)
{
int i;
-
+
for_each_online_cpu(i)
vfree(per_cpu(cpu_buffer, i).buffer);
}

+unsigned long oprofile_get_cpu_buffer_size(void)
+{
+ return fs_cpu_buffer_size;
+}
+
+void oprofile_cpu_buffer_inc_smpl_lost(void)
+{
+ struct oprofile_cpu_buffer *cpu_buf
+ = &__get_cpu_var(cpu_buffer);
+
+ cpu_buf->sample_lost_overflow++;
+}
+
int alloc_cpu_buffers(void)
{
int i;
diff --git a/drivers/oprofile/event_buffer.h b/drivers/oprofile/event_buffer.h
index 5076ed1..84bf324 100644
--- a/drivers/oprofile/event_buffer.h
+++ b/drivers/oprofile/event_buffer.h
@@ -17,6 +17,13 @@ int alloc_event_buffer(void);

void free_event_buffer(void);

+/**
+ * Add data to the event buffer.
+ * The data passed is free-form, but typically consists of
+ * file offsets, dcookies, context information, and ESCAPE codes.
+ */
+void add_event_entry(unsigned long data);
+
/* wake up the process sleeping on the event file */
void wake_up_buffer_waiter(void);

diff --git a/include/linux/oprofile.h b/include/linux/oprofile.h
index 041bb31..1ef7fce 100644
--- a/include/linux/oprofile.h
+++ b/include/linux/oprofile.h
@@ -84,13 +84,6 @@ int oprofile_arch_init(struct oprofile_operations * ops);
void oprofile_arch_exit(void);

/**
- * Add data to the event buffer.
- * The data passed is free-form, but typically consists of
- * file offsets, dcookies, context information, and ESCAPE codes.
- */
-void add_event_entry(unsigned long data);
-
-/**
* Add a sample. This may be called from any context. Pass
* smp_processor_id() as cpu.
*/
@@ -160,5 +153,14 @@ int oprofilefs_ulong_from_user(unsigned long * val, char const __user * buf, siz

/** lock for read/write safety */
extern spinlock_t oprofilefs_lock;
+
+/**
+ * Add the contents of a circular buffer to the event buffer.
+ */
+void oprofile_put_buff(unsigned long *buf, unsigned int start,
+ unsigned int stop, unsigned int max);
+
+unsigned long oprofile_get_cpu_buffer_size(void);
+void oprofile_cpu_buffer_inc_smpl_lost(void);

#endif /* OPROFILE_H */
--
1.5.4.3

2008-08-11 15:35:55

by Carl Love

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] please pull cell merge branch


On Mon, 2008-08-11 at 09:18 +0200, Arnd Bergmann wrote:
> On Monday 11 August 2008, Paul Mackerras wrote:
> > Arnd Bergmann writes:
> > > I've fixed one last bug in Carl's update for cell-oprofile (int flags
> > > instead of unsigned long flags) and made sure the comments fit the
> > > usual style. This fixes a long-standing bug that prevented us from
> > > using oprofile on SPUs in many real-world scenarios.
> >
> > Since you're touching generic oprofile code here, do you have an ack
> > from the oprofile maintainer? Or is the oprofile maintainer listed in
> > MAINTAINERS (Robert Richter) no longer active or something?
> >
>
> Sorry about that. I had assumed that at the very least Carl had had
> the oprofile list on Cc and that people there just didn't care.
>
> Robert has just recently taken over maintainership for oprofile,
> so I assume that he is indeed active and interested in the patches.
> I'll send them out again for his review on oprofile-list.
>
> Arnd <><

Sorry, my mistake. I did mean to post both patches to the OProfile list
as well. I was planning on following up with Robert on the patches this
week since I had not heard from him.

Carl Love

Subject: Re: powerpc/cell/oprofile: avoid double free of profile buffer

On 11.08.08 09:25:43, Arnd Bergmann wrote:
> From: Carl Love <[email protected]>
>
> If an error occurs on opcontrol start, the event and per cpu buffers
> are released. If later opcontrol shutdown is called then the free
> function will be called again to free buffers that no longer
> exist. This results in a kernel oops. The following changes
> prevent the call to delete buffers that don't exist.
>
> Signed-off-by: Carl Love <[email protected]>
> Signed-off-by: Arnd Bergmann <[email protected]>

Acked-by: Robert Richter <[email protected]>

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

I am fine with the changes with the exception of removing
add_event_entry() from include/linux/oprofile.h. Though there is no
usage of the function also in other architectures anymore, this change
in the API should be discussed on the oprofile mailing list. Please
separate the change in a different patch and submit it to the mailing
list. If there are no objections then, this change can go upstream as
well.

-Robert

On 11.08.08 09:25:07, Arnd Bergmann wrote:
> From: Carl Love <[email protected]>
>
> The issue is the SPU code is not holding the kernel mutex lock while
> adding samples to the kernel buffer.
>
> This patch creates per SPU buffers to hold the data. Data
> is added to the buffers from in interrupt context. The data
> is periodically pushed to the kernel buffer via a new Oprofile
> function oprofile_put_buff(). The oprofile_put_buff() function
> is called via a work queue enabling the funtion to acquire the
> mutex lock.
>
> The existing user controls for adjusting the per CPU buffer
> size is used to control the size of the per SPU buffers.
> Similarly, overflows of the SPU buffers are reported by
> incrementing the per CPU buffer stats. This eliminates the
> need to have architecture specific controls for the per SPU
> buffers which is not acceptable to the OProfile user tool
> maintainer.
>
> The export of the oprofile add_event_entry() is removed as it
> is no longer needed given this patch.
>
> Note, this patch has not addressed the issue of indexing arrays
> by the spu number. This still needs to be fixed as the spu
> numbering is not guarenteed to be 0 to max_num_spus-1.
>
> Signed-off-by: Carl Love <[email protected]>
> Signed-off-by: Maynard Johnson <[email protected]>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> arch/powerpc/oprofile/cell/pr_util.h | 13 ++
> arch/powerpc/oprofile/cell/spu_profiler.c | 4 +-
> arch/powerpc/oprofile/cell/spu_task_sync.c | 236 +++++++++++++++++++++++++---
> drivers/oprofile/buffer_sync.c | 24 +++
> drivers/oprofile/cpu_buffer.c | 15 ++-
> drivers/oprofile/event_buffer.h | 7 +
> include/linux/oprofile.h | 16 +-
> 7 files changed, 279 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/oprofile/cell/pr_util.h b/arch/powerpc/oprofile/cell/pr_util.h
> index 22e4e8d..628009c 100644
> --- a/arch/powerpc/oprofile/cell/pr_util.h
> +++ b/arch/powerpc/oprofile/cell/pr_util.h
> @@ -24,6 +24,11 @@
> #define SKIP_GENERIC_SYNC 0
> #define SYNC_START_ERROR -1
> #define DO_GENERIC_SYNC 1
> +#define SPUS_PER_NODE 8
> +#define DEFAULT_TIMER_EXPIRE (HZ / 10)
> +
> +extern struct delayed_work spu_work;
> +extern int spu_prof_running;
>
> struct spu_overlay_info { /* map of sections within an SPU overlay */
> unsigned int vma; /* SPU virtual memory address from elf */
> @@ -62,6 +67,14 @@ struct vma_to_fileoffset_map { /* map of sections within an SPU program */
>
> };
>
> +struct spu_buffer {
> + int last_guard_val;
> + int ctx_sw_seen;
> + unsigned long *buff;
> + unsigned int head, tail;
> +};
> +
> +
> /* The three functions below are for maintaining and accessing
> * the vma-to-fileoffset map.
> */
> diff --git a/arch/powerpc/oprofile/cell/spu_profiler.c b/arch/powerpc/oprofile/cell/spu_profiler.c
> index 380d7e2..6edaebd 100644
> --- a/arch/powerpc/oprofile/cell/spu_profiler.c
> +++ b/arch/powerpc/oprofile/cell/spu_profiler.c
> @@ -23,12 +23,11 @@
>
> static u32 *samples;
>
> -static int spu_prof_running;
> +int spu_prof_running;
> static unsigned int profiling_interval;
>
> #define NUM_SPU_BITS_TRBUF 16
> #define SPUS_PER_TB_ENTRY 4
> -#define SPUS_PER_NODE 8
>
> #define SPU_PC_MASK 0xFFFF
>
> @@ -208,6 +207,7 @@ int start_spu_profiling(unsigned int cycles_reset)
>
> spu_prof_running = 1;
> hrtimer_start(&timer, kt, HRTIMER_MODE_REL);
> + schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);
>
> return 0;
> }
> diff --git a/arch/powerpc/oprofile/cell/spu_task_sync.c b/arch/powerpc/oprofile/cell/spu_task_sync.c
> index 2a9b4a0..2949126 100644
> --- a/arch/powerpc/oprofile/cell/spu_task_sync.c
> +++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
> @@ -35,7 +35,102 @@ static DEFINE_SPINLOCK(buffer_lock);
> static DEFINE_SPINLOCK(cache_lock);
> static int num_spu_nodes;
> int spu_prof_num_nodes;
> -int last_guard_val[MAX_NUMNODES * 8];
> +
> +struct spu_buffer spu_buff[MAX_NUMNODES * SPUS_PER_NODE];
> +struct delayed_work spu_work;
> +static unsigned max_spu_buff;
> +
> +static void spu_buff_add(unsigned long int value, int spu)
> +{
> + /* spu buff is a circular buffer. Add entries to the
> + * head. Head is the index to store the next value.
> + * The buffer is full when there is one available entry
> + * in the queue, i.e. head and tail can't be equal.
> + * That way we can tell the difference between the
> + * buffer being full versus empty.
> + *
> + * ASSUPTION: the buffer_lock is held when this function
> + * is called to lock the buffer, head and tail.
> + */
> + int full = 1;
> +
> + if (spu_buff[spu].head >= spu_buff[spu].tail) {
> + if ((spu_buff[spu].head - spu_buff[spu].tail)
> + < (max_spu_buff - 1))
> + full = 0;
> +
> + } else if (spu_buff[spu].tail > spu_buff[spu].head) {
> + if ((spu_buff[spu].tail - spu_buff[spu].head)
> + > 1)
> + full = 0;
> + }
> +
> + if (!full) {
> + spu_buff[spu].buff[spu_buff[spu].head] = value;
> + spu_buff[spu].head++;
> +
> + if (spu_buff[spu].head >= max_spu_buff)
> + spu_buff[spu].head = 0;
> + } else {
> + /* From the user's perspective make the SPU buffer
> + * size management/overflow look like we are using
> + * per cpu buffers. The user uses the same
> + * per cpu parameter to adjust the SPU buffer size.
> + * Increment the sample_lost_overflow to inform
> + * the user the buffer size needs to be increased.
> + */
> + oprofile_cpu_buffer_inc_smpl_lost();
> + }
> +}
> +
> +/* This function copies the per SPU buffers to the
> + * OProfile kernel buffer.
> + */
> +void sync_spu_buff(void)
> +{
> + int spu;
> + unsigned long flags;
> + int curr_head;
> +
> + for (spu = 0; spu < num_spu_nodes; spu++) {
> + /* In case there was an issue and the buffer didn't
> + * get created skip it.
> + */
> + if (spu_buff[spu].buff == NULL)
> + continue;
> +
> + /* Hold the lock to make sure the head/tail
> + * doesn't change while spu_buff_add() is
> + * deciding if the buffer is full or not.
> + * Being a little paranoid.
> + */
> + spin_lock_irqsave(&buffer_lock, flags);
> + curr_head = spu_buff[spu].head;
> + spin_unlock_irqrestore(&buffer_lock, flags);
> +
> + /* Transfer the current contents to the kernel buffer.
> + * data can still be added to the head of the buffer.
> + */
> + oprofile_put_buff(spu_buff[spu].buff,
> + spu_buff[spu].tail,
> + curr_head, max_spu_buff);
> +
> + spin_lock_irqsave(&buffer_lock, flags);
> + spu_buff[spu].tail = curr_head;
> + spin_unlock_irqrestore(&buffer_lock, flags);
> + }
> +
> +}
> +
> +static void wq_sync_spu_buff(struct work_struct *work)
> +{
> + /* move data from spu buffers to kernel buffer */
> + sync_spu_buff();
> +
> + /* only reschedule if profiling is not done */
> + if (spu_prof_running)
> + schedule_delayed_work(&spu_work, DEFAULT_TIMER_EXPIRE);
> +}
>
> /* Container for caching information about an active SPU task. */
> struct cached_info {
> @@ -305,14 +400,21 @@ static int process_context_switch(struct spu *spu, unsigned long objectId)
>
> /* Record context info in event buffer */
> spin_lock_irqsave(&buffer_lock, flags);
> - add_event_entry(ESCAPE_CODE);
> - add_event_entry(SPU_CTX_SWITCH_CODE);
> - add_event_entry(spu->number);
> - add_event_entry(spu->pid);
> - add_event_entry(spu->tgid);
> - add_event_entry(app_dcookie);
> - add_event_entry(spu_cookie);
> - add_event_entry(offset);
> + spu_buff_add(ESCAPE_CODE, spu->number);
> + spu_buff_add(SPU_CTX_SWITCH_CODE, spu->number);
> + spu_buff_add(spu->number, spu->number);
> + spu_buff_add(spu->pid, spu->number);
> + spu_buff_add(spu->tgid, spu->number);
> + spu_buff_add(app_dcookie, spu->number);
> + spu_buff_add(spu_cookie, spu->number);
> + spu_buff_add(offset, spu->number);
> +
> + /* Set flag to indicate SPU PC data can now be written out. If
> + * the SPU program counter data is seen before an SPU context
> + * record is seen, the postprocessing will fail.
> + */
> + spu_buff[spu->number].ctx_sw_seen = 1;
> +
> spin_unlock_irqrestore(&buffer_lock, flags);
> smp_wmb(); /* insure spu event buffer updates are written */
> /* don't want entries intermingled... */
> @@ -360,6 +462,47 @@ static int number_of_online_nodes(void)
> return nodes;
> }
>
> +static int oprofile_spu_buff_create(void)
> +{
> + int spu;
> +
> + max_spu_buff = oprofile_get_cpu_buffer_size();
> +
> + for (spu = 0; spu < num_spu_nodes; spu++) {
> + /* create circular buffers to store the data in.
> + * use locks to manage accessing the buffers
> + */
> + spu_buff[spu].head = 0;
> + spu_buff[spu].tail = 0;
> +
> + /*
> + * Create a buffer for each SPU. Can't reliably
> + * create a single buffer for all spus due to not
> + * enough contiguous kernel memory.
> + */
> +
> + spu_buff[spu].buff = kzalloc((max_spu_buff
> + * sizeof(unsigned long)),
> + GFP_KERNEL);
> +
> + if (!spu_buff[spu].buff) {
> + printk(KERN_ERR "SPU_PROF: "
> + "%s, line %d: oprofile_spu_buff_create "
> + "failed to allocate spu buffer %d.\n",
> + __func__, __LINE__, spu);
> +
> + /* release the spu buffers that have been allocated */
> + while (spu >= 0) {
> + kfree(spu_buff[spu].buff);
> + spu_buff[spu].buff = 0;
> + spu--;
> + }
> + return -ENOMEM;
> + }
> + }
> + return 0;
> +}
> +
> /* The main purpose of this function is to synchronize
> * OProfile with SPUFS by registering to be notified of
> * SPU task switches.
> @@ -372,20 +515,35 @@ static int number_of_online_nodes(void)
> */
> int spu_sync_start(void)
> {
> - int k;
> + int spu;
> int ret = SKIP_GENERIC_SYNC;
> int register_ret;
> unsigned long flags = 0;
>
> spu_prof_num_nodes = number_of_online_nodes();
> num_spu_nodes = spu_prof_num_nodes * 8;
> + INIT_DELAYED_WORK(&spu_work, wq_sync_spu_buff);
> +
> + /* create buffer for storing the SPU data to put in
> + * the kernel buffer.
> + */
> + ret = oprofile_spu_buff_create();
> + if (ret)
> + goto out;
>
> spin_lock_irqsave(&buffer_lock, flags);
> - add_event_entry(ESCAPE_CODE);
> - add_event_entry(SPU_PROFILING_CODE);
> - add_event_entry(num_spu_nodes);
> + for (spu = 0; spu < num_spu_nodes; spu++) {
> + spu_buff_add(ESCAPE_CODE, spu);
> + spu_buff_add(SPU_PROFILING_CODE, spu);
> + spu_buff_add(num_spu_nodes, spu);
> + }
> spin_unlock_irqrestore(&buffer_lock, flags);
>
> + for (spu = 0; spu < num_spu_nodes; spu++) {
> + spu_buff[spu].ctx_sw_seen = 0;
> + spu_buff[spu].last_guard_val = 0;
> + }
> +
> /* Register for SPU events */
> register_ret = spu_switch_event_register(&spu_active);
> if (register_ret) {
> @@ -393,8 +551,6 @@ int spu_sync_start(void)
> goto out;
> }
>
> - for (k = 0; k < (MAX_NUMNODES * 8); k++)
> - last_guard_val[k] = 0;
> pr_debug("spu_sync_start -- running.\n");
> out:
> return ret;
> @@ -446,13 +602,20 @@ void spu_sync_buffer(int spu_num, unsigned int *samples,
> * use. We need to discard samples taken during the time
> * period which an overlay occurs (i.e., guard value changes).
> */
> - if (grd_val && grd_val != last_guard_val[spu_num]) {
> - last_guard_val[spu_num] = grd_val;
> + if (grd_val && grd_val != spu_buff[spu_num].last_guard_val) {
> + spu_buff[spu_num].last_guard_val = grd_val;
> /* Drop the rest of the samples. */
> break;
> }
>
> - add_event_entry(file_offset | spu_num_shifted);
> + /* We must ensure that the SPU context switch has been written
> + * out before samples for the SPU. Otherwise, the SPU context
> + * information is not available and the postprocessing of the
> + * SPU PC will fail with no available anonymous map information.
> + */
> + if (spu_buff[spu_num].ctx_sw_seen)
> + spu_buff_add((file_offset | spu_num_shifted),
> + spu_num);
> }
> spin_unlock(&buffer_lock);
> out:
> @@ -463,20 +626,41 @@ out:
> int spu_sync_stop(void)
> {
> unsigned long flags = 0;
> - int ret = spu_switch_event_unregister(&spu_active);
> - if (ret) {
> + int ret;
> + int k;
> +
> + ret = spu_switch_event_unregister(&spu_active);
> +
> + if (ret)
> printk(KERN_ERR "SPU_PROF: "
> - "%s, line %d: spu_switch_event_unregister returned %d\n",
> - __func__, __LINE__, ret);
> - goto out;
> - }
> + "%s, line %d: spu_switch_event_unregister " \
> + "returned %d\n",
> + __func__, __LINE__, ret);
> +
> + /* flush any remaining data in the per SPU buffers */
> + sync_spu_buff();
>
> spin_lock_irqsave(&cache_lock, flags);
> ret = release_cached_info(RELEASE_ALL);
> spin_unlock_irqrestore(&cache_lock, flags);
> -out:
> +
> + /* remove scheduled work queue item rather then waiting
> + * for every queued entry to execute. Then flush pending
> + * system wide buffer to event buffer.
> + */
> + cancel_delayed_work(&spu_work);
> +
> + for (k = 0; k < num_spu_nodes; k++) {
> + spu_buff[k].ctx_sw_seen = 0;
> +
> + /*
> + * spu_sys_buff will be null if there was a problem
> + * allocating the buffer. Only delete if it exists.
> + */
> + kfree(spu_buff[k].buff);
> + spu_buff[k].buff = 0;
> + }
> pr_debug("spu_sync_stop -- done.\n");
> return ret;
> }
>
> -
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index 9304c45..4a70180 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -551,3 +551,27 @@ void sync_buffer(int cpu)
>
> mutex_unlock(&buffer_mutex);
> }
> +
> +/* The function can be used to add a buffer worth of data directly to
> + * the kernel buffer. The buffer is assumed to be a circular buffer.
> + * Take the entries from index start and end at index end, wrapping
> + * at max_entries.
> + */
> +void oprofile_put_buff(unsigned long *buf, unsigned int start,
> + unsigned int stop, unsigned int max)
> +{
> + int i;
> +
> + i = start;
> +
> + mutex_lock(&buffer_mutex);
> + while (i != stop) {
> + add_event_entry(buf[i++]);
> +
> + if (i >= max)
> + i = 0;
> + }
> +
> + mutex_unlock(&buffer_mutex);
> +}
> +
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 2450b3a..b8601dc 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -37,11 +37,24 @@ static int work_enabled;
> void free_cpu_buffers(void)
> {
> int i;
> -
> +
> for_each_online_cpu(i)
> vfree(per_cpu(cpu_buffer, i).buffer);
> }
>
> +unsigned long oprofile_get_cpu_buffer_size(void)
> +{
> + return fs_cpu_buffer_size;
> +}
> +
> +void oprofile_cpu_buffer_inc_smpl_lost(void)
> +{
> + struct oprofile_cpu_buffer *cpu_buf
> + = &__get_cpu_var(cpu_buffer);
> +
> + cpu_buf->sample_lost_overflow++;
> +}
> +
> int alloc_cpu_buffers(void)
> {
> int i;
> diff --git a/drivers/oprofile/event_buffer.h b/drivers/oprofile/event_buffer.h
> index 5076ed1..84bf324 100644
> --- a/drivers/oprofile/event_buffer.h
> +++ b/drivers/oprofile/event_buffer.h
> @@ -17,6 +17,13 @@ int alloc_event_buffer(void);
>
> void free_event_buffer(void);
>
> +/**
> + * Add data to the event buffer.
> + * The data passed is free-form, but typically consists of
> + * file offsets, dcookies, context information, and ESCAPE codes.
> + */
> +void add_event_entry(unsigned long data);
> +
> /* wake up the process sleeping on the event file */
> void wake_up_buffer_waiter(void);
>
> diff --git a/include/linux/oprofile.h b/include/linux/oprofile.h
> index 041bb31..1ef7fce 100644
> --- a/include/linux/oprofile.h
> +++ b/include/linux/oprofile.h
> @@ -84,13 +84,6 @@ int oprofile_arch_init(struct oprofile_operations * ops);
> void oprofile_arch_exit(void);
>
> /**
> - * Add data to the event buffer.
> - * The data passed is free-form, but typically consists of
> - * file offsets, dcookies, context information, and ESCAPE codes.
> - */
> -void add_event_entry(unsigned long data);
> -
> -/**
> * Add a sample. This may be called from any context. Pass
> * smp_processor_id() as cpu.
> */
> @@ -160,5 +153,14 @@ int oprofilefs_ulong_from_user(unsigned long * val, char const __user * buf, siz
>
> /** lock for read/write safety */
> extern spinlock_t oprofilefs_lock;
> +
> +/**
> + * Add the contents of a circular buffer to the event buffer.
> + */
> +void oprofile_put_buff(unsigned long *buf, unsigned int start,
> + unsigned int stop, unsigned int max);
> +
> +unsigned long oprofile_get_cpu_buffer_size(void);
> +void oprofile_cpu_buffer_inc_smpl_lost(void);
>
> #endif /* OPROFILE_H */
> --
> 1.5.4.3
>
>

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2008-08-20 12:07:22

by Arnd Bergmann

[permalink] [raw]
Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Wednesday 20 August 2008, Robert Richter wrote:
> I am fine with the changes with the exception of removing
> add_event_entry() from include/linux/oprofile.h. Though there is no
> usage of the function also in other architectures anymore, this change
> in the API should be discussed on the oprofile mailing list. Please
> separate the change in a different patch and submit it to the mailing
> list. If there are no objections then, this change can go upstream as
> well.

As an explanation, the removal of add_event_entry is the whole point
of this patch. add_event_entry must only be called with buffer_mutex
held, but buffer_mutex itself is not exported.
I'm pretty sure that no other user of add_event_entry exists, as it
was exported specifically for the SPU support and that never worked.
Any other (theoretical) code using it would be broken in the same way
and need a corresponding fix.

We can easily leave the declaration in place, but I'd recommend removing
it eventually. If you prefer to keep it, how about marking it as
__deprecated?

Arnd <><

Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On 20.08.08 14:05:31, Arnd Bergmann wrote:
> On Wednesday 20 August 2008, Robert Richter wrote:
> > I am fine with the changes with the exception of removing
> > add_event_entry() from include/linux/oprofile.h. Though there is no
> > usage of the function also in other architectures anymore, this change
> > in the API should be discussed on the oprofile mailing list. Please
> > separate the change in a different patch and submit it to the mailing
> > list. If there are no objections then, this change can go upstream as
> > well.
>
> As an explanation, the removal of add_event_entry is the whole point
> of this patch. add_event_entry must only be called with buffer_mutex
> held, but buffer_mutex itself is not exported.

Thanks for pointing this out.

> I'm pretty sure that no other user of add_event_entry exists, as it
> was exported specifically for the SPU support and that never worked.
> Any other (theoretical) code using it would be broken in the same way
> and need a corresponding fix.
>
> We can easily leave the declaration in place, but I'd recommend removing
> it eventually. If you prefer to keep it, how about marking it as
> __deprecated?

No, since this is broken by design we remove it. The patch can go
upstream as it is.

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On 11.08.08 09:25:07, Arnd Bergmann wrote:
> From: Carl Love <[email protected]>
>
> The issue is the SPU code is not holding the kernel mutex lock while
> adding samples to the kernel buffer.
>
> This patch creates per SPU buffers to hold the data. Data
> is added to the buffers from in interrupt context. The data
> is periodically pushed to the kernel buffer via a new Oprofile
> function oprofile_put_buff(). The oprofile_put_buff() function
> is called via a work queue enabling the funtion to acquire the
> mutex lock.
>
> The existing user controls for adjusting the per CPU buffer
> size is used to control the size of the per SPU buffers.
> Similarly, overflows of the SPU buffers are reported by
> incrementing the per CPU buffer stats. This eliminates the
> need to have architecture specific controls for the per SPU
> buffers which is not acceptable to the OProfile user tool
> maintainer.
>
> The export of the oprofile add_event_entry() is removed as it
> is no longer needed given this patch.
>
> Note, this patch has not addressed the issue of indexing arrays
> by the spu number. This still needs to be fixed as the spu
> numbering is not guarenteed to be 0 to max_num_spus-1.
>
> Signed-off-by: Carl Love <[email protected]>
> Signed-off-by: Maynard Johnson <[email protected]>
> Signed-off-by: Arnd Bergmann <[email protected]>

Acked-by: Robert Richter <[email protected]>

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2008-08-20 13:20:35

by Arnd Bergmann

[permalink] [raw]
Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Wednesday 20 August 2008, Robert Richter wrote:

> > Signed-off-by: Carl Love <[email protected]>
> > Signed-off-by: Maynard Johnson <[email protected]>
> > Signed-off-by: Arnd Bergmann <[email protected]>
>
> Acked-by: Robert Richter <[email protected]>
>

Thanks Robert.

Paul, any chance we can still get this into 2.6.27?

I've added the Ack and uploaded it again for you to
pull from

master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge

Arnd <><

2008-08-20 14:59:15

by Carl Love

[permalink] [raw]
Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile


On Wed, 2008-08-20 at 14:39 +0200, Robert Richter wrote:
> On 20.08.08 14:05:31, Arnd Bergmann wrote:
> > On Wednesday 20 August 2008, Robert Richter wrote:
> > > I am fine with the changes with the exception of removing
> > > add_event_entry() from include/linux/oprofile.h. Though there is no
> > > usage of the function also in other architectures anymore, this change
> > > in the API should be discussed on the oprofile mailing list. Please
> > > separate the change in a different patch and submit it to the mailing
> > > list. If there are no objections then, this change can go upstream as
> > > well.
> >
> > As an explanation, the removal of add_event_entry is the whole point
> > of this patch. add_event_entry must only be called with buffer_mutex
> > held, but buffer_mutex itself is not exported.
>
> Thanks for pointing this out.
>

We originally added add_event_entry() to include/linux/oprofile.h
specifically because it was needed for the CELL SPU support. As it
turns out it the approach was not completely thought through. We were
using the function call without holding the mutex lock. As we
discovered later, this can result in corrupting the data put into the
event buffer. So exposing the function without a way to hold the mutex
lock is actually a really bad idea as it would encourage others to fall
into the same mistake that we made. So, as Arnd said, the whole point
of this patch is to come up with a correct approach to adding the data.

> > I'm pretty sure that no other user of add_event_entry exists, as it
> > was exported specifically for the SPU support and that never worked.
> > Any other (theoretical) code using it would be broken in the same way
> > and need a corresponding fix.
> >
> > We can easily leave the declaration in place, but I'd recommend removing
> > it eventually. If you prefer to keep it, how about marking it as
> > __deprecated?
>
> No, since this is broken by design we remove it. The patch can go
> upstream as it is.
>
> Thanks,

> -Robert
>

It really is best to remove it. Thank you for taking the time to review
and comment on the patch.

Carl Love

2008-08-21 04:21:45

by Paul Mackerras

[permalink] [raw]
Subject: Re: powerpc/cell/oprofile: fix mutex locking for spu-oprofile

Arnd Bergmann writes:

> Paul, any chance we can still get this into 2.6.27?

Possibly. We'll need a really good explanation for Linus as to why
this is needed (what regression or serious bug this fixes) and why it
is late. Can you send me something explaining that?

> I've added the Ack and uploaded it again for you to
> pull from
>
> master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge

Are you sure you actually managed to update that?

$ git ls-remote master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge
f90a87b5f5fa46dc6c556e9267a6f25a95fbef14 refs/heads/merge

and f90a87b5f5fa46dc6c556e9267a6f25a95fbef14 is the same commit from
2008-08-09 that you asked me to pull previously.

Paul.

2008-08-21 08:20:00

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Thursday 21 August 2008, Paul Mackerras wrote:
> Arnd Bergmann writes:
>
> > Paul, any chance we can still get this into 2.6.27?
>
> Possibly. We'll need a really good explanation for Linus as to why
> this is needed (what regression or serious bug this fixes) and why it
> is late. Can you send me something explaining that?

The patch does not fix a regression, the spu-oprofile code basically never
worked. With the current code in Linux, samples in the profile buffer
can get corrupted because reader and writer to that buffer use different
locks for accessing it. It took us several iterations to come up with
a solution that does not introduce other problems and I didn't want to
push an earlier version that would need more fixups.

Since rc4 is out now, I understand if you feel more comfortable with
putting the patch into -next instead of -merge.
Note that the second patch is trivial and fixes an oopsable condition
of the kernel, so at least that should still go into 2.6.27.

> > I've added the Ack and uploaded it again for you to
> > pull from
> >
> > master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge
>
> Are you sure you actually managed to update that?

No, but it's there now. I was missing the '-f' for git-push.

Arnd <><

2008-08-21 10:20:30

by Michael Ellerman

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Thu, 2008-08-21 at 10:14 +0200, Arnd Bergmann wrote:
> On Thursday 21 August 2008, Paul Mackerras wrote:
> > Arnd Bergmann writes:
> >
> > > Paul, any chance we can still get this into 2.6.27?
> >
> > Possibly. We'll need a really good explanation for Linus as to why
> > this is needed (what regression or serious bug this fixes) and why it
> > is late. Can you send me something explaining that?
>
> The patch does not fix a regression, the spu-oprofile code basically never
> worked. With the current code in Linux, samples in the profile buffer
> can get corrupted because reader and writer to that buffer use different
> locks for accessing it.

Actually for me it worked[1] a reasonable amount of the time, enough to
be useful. So the spu-oprofile code has always been broken in this way,
but it's not always fatal. So the patch doesn't fix a regression, but it
fixes a serious user-visible bug, which makes it legit rc4 material
IMHO.

[1] that was late last year, so possibly a kernel or two ago.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-08-21 14:59:49

by Carl Love

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile


On Thu, 2008-08-21 at 20:20 +1000, Michael Ellerman wrote:
> On Thu, 2008-08-21 at 10:14 +0200, Arnd Bergmann wrote:
> > On Thursday 21 August 2008, Paul Mackerras wrote:
> > > Arnd Bergmann writes:
> > >
> > > > Paul, any chance we can still get this into 2.6.27?
> > >
> > > Possibly. We'll need a really good explanation for Linus as to why
> > > this is needed (what regression or serious bug this fixes) and why it
> > > is late. Can you send me something explaining that?
> >
> > The patch does not fix a regression, the spu-oprofile code basically never
> > worked. With the current code in Linux, samples in the profile buffer
> > can get corrupted because reader and writer to that buffer use different
> > locks for accessing it.
>
> Actually for me it worked[1] a reasonable amount of the time, enough to
> be useful. So the spu-oprofile code has always been broken in this way,
> but it's not always fatal. So the patch doesn't fix a regression, but it
> fixes a serious user-visible bug, which makes it legit rc4 material
> IMHO.
>
> [1] that was late last year, so possibly a kernel or two ago.

The bug came in the original OProfile SPU support that was put out about
2 years ago. The way the code was there was a window in which you may
get corruption. It was not until Jan 08 when we got the first report of
the bug from Michael and identified it. Since then there have been
three or four more people who have hit and reported the bug. I am
seeing the bug show up more frequently with the latest couple of weekly
SDK 3.1 kernels. It would seem that the kernel may have changed such
that the timing is more likely to hit the bug. For the Beta SDK 3.1
release the IVT team was not able to complete their OProfile testing due
to the bug.
>
> cheers
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________ oprofile-list mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/oprofile-list

2008-08-25 07:56:31

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Monday 25 August 2008, Paul Mackerras wrote:
>
> > Since rc4 is out now, I understand if you feel more comfortable with
> > putting the patch into -next instead of -merge.
>
> Linus has been getting stricter about only putting in fixes for
> regressions and serious bugs (see his recent email to Dave Airlie on
> LKML for instance). ?I assume that the corruption is just in the data
> that is supplied to userspace and doesn't extend to any kernel data
> structures.

That's right, please queue it for -next then.

> > Note that the second patch is trivial and fixes an oopsable condition
> > of the kernel, so at least that should still go into 2.6.27.
>
> OK, I'll cherry-pick that one for my next batch for Linus.

Thanks,

Arnd <><

2008-08-25 06:37:15

by Paul Mackerras

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

Arnd Bergmann writes:

> The patch does not fix a regression, the spu-oprofile code basically never
> worked. With the current code in Linux, samples in the profile buffer
> can get corrupted because reader and writer to that buffer use different
> locks for accessing it. It took us several iterations to come up with
> a solution that does not introduce other problems and I didn't want to
> push an earlier version that would need more fixups.
>
> Since rc4 is out now, I understand if you feel more comfortable with
> putting the patch into -next instead of -merge.

Linus has been getting stricter about only putting in fixes for
regressions and serious bugs (see his recent email to Dave Airlie on
LKML for instance). I assume that the corruption is just in the data
that is supplied to userspace and doesn't extend to any kernel data
structures. If it does then we have a much stronger argument for
pushing this stuff for 2.6.27.

> Note that the second patch is trivial and fixes an oopsable condition
> of the kernel, so at least that should still go into 2.6.27.

OK, I'll cherry-pick that one for my next batch for Linus.

Paul.

2008-10-13 14:53:58

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On Monday 25 August 2008, Arnd Bergmann wrote:
> On Monday 25 August 2008, Paul Mackerras wrote:
> >
> > > Since rc4 is out now, I understand if you feel more comfortable with
> > > putting the patch into -next instead of -merge.
> >
> > Linus has been getting stricter about only putting in fixes for
> > regressions and serious bugs (see his recent email to Dave Airlie on
> > LKML for instance). ?I assume that the corruption is just in the data
> > that is supplied to userspace and doesn't extend to any kernel data
> > structures.
>
> That's right, please queue it for -next then.

I just realized that this patch never made it into powerpc-next after
all, neither benh nor paulus version. Whoever is handling it today,
could you please pull

master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge

to get this commit below. I have rebased it on top of the current
benh/powerpc/next branch.

Thanks,

Arnd <><

---

commit aa5810fa545515c9f383e3e649bd120bef9c7f29
Author: Carl Love <[email protected]>
Date: Fri Aug 8 15:38:36 2008 -0700

powerpc/cell/oprofile: fix mutex locking for spu-oprofile

The issue is the SPU code is not holding the kernel mutex lock while
adding samples to the kernel buffer.

This patch creates per SPU buffers to hold the data. Data
is added to the buffers from in interrupt context. The data
is periodically pushed to the kernel buffer via a new Oprofile
function oprofile_put_buff(). The oprofile_put_buff() function
is called via a work queue enabling the funtion to acquire the
mutex lock.

The existing user controls for adjusting the per CPU buffer
size is used to control the size of the per SPU buffers.
Similarly, overflows of the SPU buffers are reported by
incrementing the per CPU buffer stats. This eliminates the
need to have architecture specific controls for the per SPU
buffers which is not acceptable to the OProfile user tool
maintainer.

The export of the oprofile add_event_entry() is removed as it
is no longer needed given this patch.

Note, this patch has not addressed the issue of indexing arrays
by the spu number. This still needs to be fixed as the spu
numbering is not guarenteed to be 0 to max_num_spus-1.

Signed-off-by: Carl Love <[email protected]>
Signed-off-by: Maynard Johnson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Acked-by: Robert Richter <[email protected]>

arch/powerpc/oprofile/cell/pr_util.h | 13 +
arch/powerpc/oprofile/cell/spu_profiler.c | 4
arch/powerpc/oprofile/cell/spu_task_sync.c | 236 ++++++++++++++++++++++++---
drivers/oprofile/buffer_sync.c | 24 ++
drivers/oprofile/cpu_buffer.c | 15 +
drivers/oprofile/event_buffer.c | 2
drivers/oprofile/event_buffer.h | 7
include/linux/oprofile.h | 16 +
drivers/oprofile/cpu_buffer.c | 4
9 files changed, 284 insertions(+), 37 deletions(-)

Subject: Re: [Cbe-oss-dev] powerpc/cell/oprofile: fix mutex locking for spu-oprofile

On 13.10.08 16:53:28, Arnd Bergmann wrote:
> On Monday 25 August 2008, Arnd Bergmann wrote:
> > On Monday 25 August 2008, Paul Mackerras wrote:
> > >
> > > > Since rc4 is out now, I understand if you feel more comfortable with
> > > > putting the patch into -next instead of -merge.
> > >
> > > Linus has been getting stricter about only putting in fixes for
> > > regressions and serious bugs (see his recent email to Dave Airlie on
> > > LKML for instance). ?I assume that the corruption is just in the data
> > > that is supplied to userspace and doesn't extend to any kernel data
> > > structures.
> >
> > That's right, please queue it for -next then.
>
> I just realized that this patch never made it into powerpc-next after
> all, neither benh nor paulus version. Whoever is handling it today,
> could you please pull
>
> master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6.git merge
>
> to get this commit below. I have rebased it on top of the current
> benh/powerpc/next branch.

All powerpc oprofile patches are in this branch:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git powerpc-for-paul

Pending patches are:

Carl Love (1):
powerpc/cell/oprofile: fix mutex locking for spu-oprofile

Roel Kluin (1):
powerpc/cell/oprofile: vma_map: fix test on overlay_tbl_offset

Please pull from there.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]