Hi,
I've been trying to get hardware breakpoints with perf to work on POWER7
but I'm getting the following:
% perf record -e mem:0x10000000 true
Error: sys_perf_event_open() syscall returned with 28 (No space left on device). /bin/dmesg may provide additional information.
Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
true: Terminated
(FWIW, adding -a makes it work fine.)
Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
it thinks there are no free breakpoint slots on this CPU.
I have 2 CPUs, so perf userspace does two perf_event_open syscalls, one
to add a counter on each CPU [1]. The first syscall succeeds but the
second fails.
On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
despite there being no breakpoint on this CPU. This is because the call
to task_bp_pinned() checks all CPUs, rather than just the current CPU.
POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
return ENOSPC.
The following patch fixes this by checking the associated CPU for each
breakpoint in task_bp_pinned(). I'm not familiar with this code, so it's
provided as a reference for the issue above.
Mikey
1. I'm not sure why it doesn't just do one syscall and specify all CPUs,
but that's another issue; using two syscalls should still work.
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index bb38c4d..e092daa 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
  * Count the number of breakpoints of the same type and same task.
  * The given event must be not on the list.
  */
-static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
+static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
 {
 	struct task_struct *tsk = bp->hw.bp_target;
 	struct perf_event *iter;
 	int count = 0;
 
 	list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
-		if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
+		if (iter->hw.bp_target == tsk &&
+		    find_slot_idx(iter) == type &&
+		    cpu == iter->cpu)
 			count += hw_breakpoint_weight(iter);
 	}
 
@@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			slots->pinned += max_task_bp_pinned(cpu, type);
 		else
-			slots->pinned += task_bp_pinned(bp, type);
+			slots->pinned += task_bp_pinned(cpu, bp, type);
 		slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
 
 		return;
@@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
 		if (!tsk)
 			nr += max_task_bp_pinned(cpu, type);
 		else
-			nr += task_bp_pinned(bp, type);
+			nr += task_bp_pinned(cpu, bp, type);
 
 		if (nr > slots->pinned)
 			slots->pinned = nr;
@@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
 	int old_idx = 0;
 	int idx = 0;
 
-	old_count = task_bp_pinned(bp, type);
+	old_count = task_bp_pinned(cpu, bp, type);
 	old_idx = old_count - 1;
 	idx = old_idx + weight;
On Thu, 2012-08-16 at 14:23 +1000, Michael Neuling wrote:
>
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> despite there being no breakpoint on this CPU. This is because the call
> to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.
I think this comes from the ptrace legacy: we register a breakpoint on
all CPUs because, when we migrate a task, migrating the breakpoint must
not fail.
It's one of the things I hate most about the hwbp stuff as it relates to
perf.
Frederic knows more...
Peter,
> > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> > despite there being no breakpoint on this CPU. This is because the call
> > to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> > POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > return ENOSPC.
>
> I think this comes from the ptrace legacy: we register a breakpoint on
> all CPUs because, when we migrate a task, migrating the breakpoint must
> not fail.
>
> It's one of the things I hate most about the hwbp stuff as it relates to
> perf.
>
> Frederic knows more...
Maybe I should wait for Frederic to respond but I'm not sure I
understand what you're saying.
I can see how using ptrace hw breakpoints and perf hw breakpoints at the
same time could be a problem, but I'm not sure how this would stop it.
Are you saying that we need to keep at least 1 slot free at all times,
so that we can use it for ptrace?
Is "perf record -e mem:0x10000000 true" ever going to be able to work on
POWER7 with only one hw breakpoint resource per CPU?
Thanks,
Mikey
On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> Peter,
>
> > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> > > despite there being no breakpoint on this CPU. This is because the call
> > > to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> > > POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > return ENOSPC.
> >
> > I think this comes from the ptrace legacy: we register a breakpoint on
> > all CPUs because, when we migrate a task, migrating the breakpoint must
> > not fail.
> >
> > It's one of the things I hate most about the hwbp stuff as it relates to
> > perf.
> >
> > Frederic knows more...
>
> Maybe I should wait for Frederic to respond but I'm not sure I
> understand what you're saying.
>
> I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> same time could be a problem, but I'm not sure how this would stop it.
ptrace uses perf for hwbp support so we're stuck with all kinds of
stupid ptrace constraints.. or somesuch.
> Are you saying that we need to keep at least 1 slot free at all times,
> so that we can use it for ptrace?
No, I'm saying perf-hwbp is weird because of ptrace; maybe the ptrace
weirdness shouldn't live in perf-hwbp but in the ptrace-perf glue,
however...
> Is "perf record -e mem:0x10000000 true" ever going to be able to work on
> POWER7 with only one hw breakpoint resource per CPU?
I think it should work... but I'm fairly sure it currently doesn't
because of how things are done. 'perf record -ie mem:0x100... true'
might just work.
I always forget all the ptrace details, but I am forever annoyed at the
mess that is perf-hwbp. Frederic, is there really nothing we can do
about this?
The fact that ptrace hwbp semantics are different per architecture
doesn't help of course.
On Thu, 2012-08-16 at 13:44 +0200, Peter Zijlstra wrote:
> On Thu, 2012-08-16 at 21:17 +1000, Michael Neuling wrote:
> > Peter,
> >
> > > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> > > > despite there being no breakpoint on this CPU. This is because the call
> > > > to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> > > > POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > > return ENOSPC.
> > >
> > > I think this comes from the ptrace legacy: we register a breakpoint on
> > > all CPUs because, when we migrate a task, migrating the breakpoint must
> > > not fail.
> > >
> > > It's one of the things I hate most about the hwbp stuff as it relates to
> > > perf.
> > >
> > > Frederic knows more...
> >
> > Maybe I should wait for Frederic to respond but I'm not sure I
> > understand what you're saying.
> >
> > I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> > same time could be a problem, but I'm not sure how this would stop it.
>
> ptrace uses perf for hwbp support so we're stuck with all kinds of
> stupid ptrace constraints.. or somesuch.
>
> > Are you saying that we need to keep at least 1 slot free at all times,
> > so that we can use it for ptrace?
>
> No, I'm saying perf-hwbp is weird because of ptrace; maybe the ptrace
> weirdness shouldn't live in perf-hwbp but in the ptrace-perf glue,
> however...
But how else would it work, even if ptrace wasn't in the picture?
You do want to guarantee that the task will always be subject to the
breakpoint, even if it moves cpus. So is there any way to guarantee that
other than reserving a breakpoint slot on every cpu ahead of time?
Or can a hwbp event go into error state if it can't be installed on the
new cpu, like a pinned event does? I can't see any code that does that.
cheers
On Fri, 2012-08-17 at 00:02 +1000, Michael Ellerman wrote:
> You do want to guarantee that the task will always be subject to the
> breakpoint, even if it moves cpus. So is there any way to guarantee that
> other than reserving a breakpoint slot on every cpu ahead of time?
That's not how regular perf works.. regular perf can overload hw
resources at will and stuff is strictly per-cpu.
So regular perf record has perf_event_attr::inherit enabled by default;
this results in it creating a per-task, per-CPU event for each CPU, and
that succeeds because there is no strict reservation to avoid/detect
starvation against perf_event_attr::pinned events.
For regular (!pinned) events, we'll RR the created events on the
available hardware resources.
HWBP does things completely differently and reserves a slot across all
CPUs for everything, thus stuff completely falls apart.
> > > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> > > > despite there being no breakpoint on this CPU. This is because the call
> > > > to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> > > > POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > > > return ENOSPC.
> > >
> > > I think this comes from the ptrace legacy: we register a breakpoint on
> > > all CPUs because, when we migrate a task, migrating the breakpoint must
> > > not fail.
> > >
> > > It's one of the things I hate most about the hwbp stuff as it relates to
> > > perf.
> > >
> > > Frederic knows more...
> >
> > Maybe I should wait for Frederic to respond but I'm not sure I
> > understand what you're saying.
> >
> > I can see how using ptrace hw breakpoints and perf hw breakpoints at the
> > same time could be a problem, but I'm not sure how this would stop it.
>
> ptrace uses perf for hwbp support so we're stuck with all kinds of
> stupid ptrace constraints.. or somesuch.
OK
> > Are you saying that we need to keep at least 1 slot free at all times,
> > so that we can use it for ptrace?
>
> No, I'm saying perf-hwbp is weird because of ptrace; maybe the ptrace
> weirdness shouldn't live in perf-hwbp but in the ptrace-perf glue,
> however...
OK.
> > Is "perf record -e mem:0x10000000 true" ever going to be able to work on
> > POWER7 with only one hw breakpoint resource per CPU?
>
> I think it should work... but I'm fairly sure it currently doesn't
> because of how things are done. 'perf record -ie mem:0x100... true'
> might just work.
Adding -i doesn't help.
Mikey
On Thu, 2012-08-16 at 16:15 +0200, Peter Zijlstra wrote:
> On Fri, 2012-08-17 at 00:02 +1000, Michael Ellerman wrote:
> > You do want to guarantee that the task will always be subject to the
> > breakpoint, even if it moves cpus. So is there any way to guarantee that
> > other than reserving a breakpoint slot on every cpu ahead of time?
>
> That's not how regular perf works.. regular perf can overload hw
> resources at will and stuff is strictly per-cpu.
..
> For regular (!pinned) events, we'll RR the created events on the
> available hardware resources.
Yeah I know, but that isn't really the semantics you want for a
breakpoint. You don't want to sometimes have the breakpoint active and
sometimes not, it needs to be active at all times when the task is
running.
At the very least you want it to behave like a pinned event, ie. if it
can't be scheduled you get notified and can tell the user.
> HWBP does things completely differently and reserves a slot across all
> CPUs for everything, thus stuff completely falls apart.
So it would seem :)
I guess my point was that reserving a slot on each cpu seems like a
reasonable way of guaranteeing that wherever the task goes we will be
able to install the breakpoint.
But obviously we need some way to make it play nice with perf.
cheers
On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> Hi,
>
> I've been trying to get hardware breakpoints with perf to work on POWER7
> but I'm getting the following:
>
> % perf record -e mem:0x10000000 true
>
> Error: sys_perf_event_open() syscall returned with 28 (No space left on device). /bin/dmesg may provide additional information.
>
> Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
>
> true: Terminated
>
> (FWIW adding -a and it works fine)
>
> Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> it thinks there are no free breakpoint slots on this CPU.
>
> I have 2 CPUs, so perf userspace does two perf_event_open syscalls, one
> to add a counter on each CPU [1]. The first syscall succeeds but the
> second fails.
>
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> despite there being no breakpoint on this CPU. This is because the call
> to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.
>
> The following patch fixes this by checking the associated CPU for each
> breakpoint in task_bp_pinned. I'm not familiar with this code, so it's
> provided as a reference to the above issue.
>
> Mikey
>
> 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> that's another issue. Using two syscalls should work.
This patch seems to make sense. I'll try it and run some tests.
Can I have your Signed-off-by ?
Thanks.
>
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index bb38c4d..e092daa 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
> * Count the number of breakpoints of the same type and same task.
> * The given event must be not on the list.
> */
> -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
> +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
> {
> struct task_struct *tsk = bp->hw.bp_target;
> struct perf_event *iter;
> int count = 0;
>
> list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> - if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
> + if (iter->hw.bp_target == tsk &&
> + find_slot_idx(iter) == type &&
> + cpu == iter->cpu)
> count += hw_breakpoint_weight(iter);
> }
>
> @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> if (!tsk)
> slots->pinned += max_task_bp_pinned(cpu, type);
> else
> - slots->pinned += task_bp_pinned(bp, type);
> + slots->pinned += task_bp_pinned(cpu, bp, type);
> slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
>
> return;
> @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> if (!tsk)
> nr += max_task_bp_pinned(cpu, type);
> else
> - nr += task_bp_pinned(bp, type);
> + nr += task_bp_pinned(cpu, bp, type);
>
> if (nr > slots->pinned)
> slots->pinned = nr;
> @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
> int old_idx = 0;
> int idx = 0;
>
> - old_count = task_bp_pinned(bp, type);
> + old_count = task_bp_pinned(cpu, bp, type);
> old_idx = old_count - 1;
> idx = old_idx + weight;
>
Frederic Weisbecker <[email protected]> wrote:
> On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> > Hi,
> >
> > I've been trying to get hardware breakpoints with perf to work on POWER7
> > but I'm getting the following:
> >
> > % perf record -e mem:0x10000000 true
> >
> > Error: sys_perf_event_open() syscall returned with 28 (No space left on device). /bin/dmesg may provide additional information.
> >
> > Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > true: Terminated
> >
> > (FWIW adding -a and it works fine)
> >
> > Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> > it thinks there are no free breakpoint slots on this CPU.
> >
> > I have 2 CPUs, so perf userspace does two perf_event_open syscalls, one
> > to add a counter on each CPU [1]. The first syscall succeeds but the
> > second fails.
> >
> > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to 1,
> > despite there being no breakpoint on this CPU. This is because the call
> > to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> > POWER7 has only one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> > return ENOSPC.
> >
> > The following patch fixes this by checking the associated CPU for each
> > breakpoint in task_bp_pinned. I'm not familiar with this code, so it's
> > provided as a reference to the above issue.
> >
> > Mikey
> >
> > 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> > that's another issue. Using two syscalls should work.
>
> This patch seems to make sense. I'll try it and run some tests.
> Can I have your Signed-off-by ?
Of course...
Signed-off-by: Michael Neuling <[email protected]>
>
> Thanks.
>
> >
> > diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> > index bb38c4d..e092daa 100644
> > --- a/kernel/events/hw_breakpoint.c
> > +++ b/kernel/events/hw_breakpoint.c
> > @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
> > * Count the number of breakpoints of the same type and same task.
> > * The given event must be not on the list.
> > */
> > -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
> > +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
> > {
> > struct task_struct *tsk = bp->hw.bp_target;
> > struct perf_event *iter;
> > int count = 0;
> >
> > list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> > - if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
> > + if (iter->hw.bp_target == tsk &&
> > + find_slot_idx(iter) == type &&
> > + cpu == iter->cpu)
> > count += hw_breakpoint_weight(iter);
> > }
> >
> > @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> > if (!tsk)
> > slots->pinned += max_task_bp_pinned(cpu, type);
> > else
> > - slots->pinned += task_bp_pinned(bp, type);
> > + slots->pinned += task_bp_pinned(cpu, bp, type);
> > slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
> >
> > return;
> > @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
> > if (!tsk)
> > nr += max_task_bp_pinned(cpu, type);
> > else
> > - nr += task_bp_pinned(bp, type);
> > + nr += task_bp_pinned(cpu, bp, type);
> >
> > if (nr > slots->pinned)
> > slots->pinned = nr;
> > @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
> > int old_idx = 0;
> > int idx = 0;
> >
> > - old_count = task_bp_pinned(bp, type);
> > + old_count = task_bp_pinned(cpu, bp, type);
> > old_idx = old_count - 1;
> > idx = old_idx + weight;
> >
>