2009-12-17 17:20:24

by K.Prasad

[permalink] [raw]
Subject: [Patch 0/1] Enable user-space breakpoint requests using PID

Hi All,
Please find a patch that enables user-space breakpoints to be
requested using the 'pid' of a process.

The breakpoint request, thus received, is active over all threads of
the process (all task_structs having the same 'tgid') and on any new
threads that get spawned eventually.

Given that the 'pid' of a process is more visible (than its
task_struct), such an interface is expected to be useful to
debuggers/tracers (such as SystemTap).

The patch is based on commit 7818b3d0fc68f5c2a85fed86d9fa37131c5a3068 of
-tip tree. It has been tested with a hack to the -tip kernel code (that
comments out all references to event->filp) due to a bug in the
perf-events code (reported here [email protected]).

Thanks,
K.Prasad


2009-12-17 17:23:10

by K.Prasad

[permalink] [raw]
Subject: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

Provide an interface to (un)register user-space breakpoints using a
process' pid.

Signed-off-by: K.Prasad <[email protected]>
---
include/linux/hw_breakpoint.h | 8 +++
kernel/hw_breakpoint.c | 92 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)

Index: linux-2.6-tip.reg_by_pid/include/linux/hw_breakpoint.h
===================================================================
--- linux-2.6-tip.reg_by_pid.orig/include/linux/hw_breakpoint.h
+++ linux-2.6-tip.reg_by_pid/include/linux/hw_breakpoint.h
@@ -54,6 +54,10 @@ register_user_hw_breakpoint(struct perf_
perf_overflow_handler_t triggered,
struct task_struct *tsk);

+extern int register_user_hbp_by_pid(struct perf_event_attr *attr,
+ perf_overflow_handler_t triggered,
+ pid_t pid);
+extern void unregister_user_hbp_by_pid(pid_t pid);
/* FIXME: only change from the attr, and don't unregister */
extern int
modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr);
@@ -91,6 +95,10 @@ static inline struct perf_event *
register_user_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
struct task_struct *tsk) { return NULL; }
+int register_user_hbp_by_pid(struct perf_event_attr *attr,
+ perf_overflow_handler_t triggered,
+ pid_t pid) { return 0; }
+void unregister_user_hbp_by_pid(pid_t pid) {}
static inline int
modify_user_hw_breakpoint(struct perf_event *bp,
struct perf_event_attr *attr) { return -ENOSYS; }
Index: linux-2.6-tip.reg_by_pid/kernel/hw_breakpoint.c
===================================================================
--- linux-2.6-tip.reg_by_pid.orig/kernel/hw_breakpoint.c
+++ linux-2.6-tip.reg_by_pid/kernel/hw_breakpoint.c
@@ -298,6 +298,98 @@ int register_perf_hw_breakpoint(struct p
return ret;
}

+/*
+ * Unregister breakpoints thread-by-thread, for all threads ranging from
+ * @start to @end.
+ */
+static inline void __unregister_user_hbp_for_threads(struct task_struct *start,
+ struct task_struct *end)
+{
+ struct perf_event *bp, *temp_bp;
+
+ do {
+ mutex_lock(&start->perf_event_mutex);
+ list_for_each_entry_safe(bp, temp_bp, &start->perf_event_list,
+ owner_entry) {
+ if (bp->attr.type != PERF_TYPE_BREAKPOINT)
+ continue;
+ unregister_hw_breakpoint(bp);
+ break;
+ }
+ mutex_unlock(&start->perf_event_mutex);
+ } while_each_thread(start, end);
+}
+
+/**
+ * register_user_hbp_by_pid - register a hardware breakpoint for user space using pid
+ * @attr: breakpoint attributes
+ * @triggered: callback to trigger when we hit the breakpoint
+ * @pid: pid of the thread group for which breakpoints must be registered
+ */
+int register_user_hbp_by_pid(struct perf_event_attr *attr,
+ perf_overflow_handler_t triggered,
+ pid_t pid)
+{
+ int ret;
+ struct task_struct *t1, *t2;
+
+ t1 = t2 = find_task_by_vpid(pid);
+ if (t1 == NULL)
+ return -ESRCH;
+
+ /*
+ * Ensure that the breakpoint propogates to every new thread created in
+ * this thread_group.
+ */
+ attr->inherit = 1;
+ /*
+ * Register a breakpoint individually for every thread of the
+ * thread_group using register_user_hw_breakpoint() interface.
+ * Warning: Involves redundant validation checks using
+ * arch_validate_hwbkpt_settings().
+ */
+ do {
+ ret = IS_ERR(register_user_hw_breakpoint(attr, triggered, t1));
+ if (ret)
+ goto fail;
+ t1 = next_thread(t1);
+ } while (t1 != t2);
+
+ return 0;
+fail:
+ /*
+ * Check if the very first register_user_hw_breakpoint() request
+ * failed. If then, do nothing but return the error value.
+ */
+ if (t1 == t2)
+ return ret;
+ /*
+ * Since there exists a thread where the breakpoint request was not
+ * successful, we are unable to provide a process-wide breakpoint. Hence
+ * cleanup the breakpoints from the previously registered threads.
+ */
+ __unregister_user_hbp_for_threads(t2, t1);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_user_hbp_by_pid);
+
+/**
+ * unregister_hbp_by_pid - unregister a user-space hardware breakpoint previously registered using a pid
+ * @pid: pid of the process for which breakpoint must be unregistered
+ */
+void unregister_user_hbp_by_pid(pid_t pid)
+{
+ struct task_struct *t1, *t2;
+
+ t1 = t2 = find_task_by_vpid(pid);
+ if (t1 == NULL)
+ return;
+
+ __unregister_user_hbp_for_threads(t1, t2);
+}
+EXPORT_SYMBOL_GPL(unregister_user_hbp_by_pid);
+
/**
* register_user_hw_breakpoint - register a hardware breakpoint for user space
* @attr: breakpoint attributes

2009-12-18 20:47:55

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

On Thu, Dec 17, 2009 at 10:52:53PM +0530, K.Prasad wrote:
> Provide an interface to (un)register user-space breakpoints using a
> process' pid.
>
> Signed-off-by: K.Prasad <[email protected]>
> ---
> include/linux/hw_breakpoint.h | 8 +++
> kernel/hw_breakpoint.c | 92 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 100 insertions(+)
>
> Index: linux-2.6-tip.reg_by_pid/include/linux/hw_breakpoint.h
> ===================================================================
> --- linux-2.6-tip.reg_by_pid.orig/include/linux/hw_breakpoint.h
> +++ linux-2.6-tip.reg_by_pid/include/linux/hw_breakpoint.h
> @@ -54,6 +54,10 @@ register_user_hw_breakpoint(struct perf_
> perf_overflow_handler_t triggered,
> struct task_struct *tsk);
>
> +extern int register_user_hbp_by_pid(struct perf_event_attr *attr,
> + perf_overflow_handler_t triggered,
> + pid_t pid);
> +extern void unregister_user_hbp_by_pid(pid_t pid);
> /* FIXME: only change from the attr, and don't unregister */
> extern int
> modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *attr);
> @@ -91,6 +95,10 @@ static inline struct perf_event *
> register_user_hw_breakpoint(struct perf_event_attr *attr,
> perf_overflow_handler_t triggered,
> struct task_struct *tsk) { return NULL; }
> +int register_user_hbp_by_pid(struct perf_event_attr *attr,
> + perf_overflow_handler_t triggered,
> + pid_t pid) { return 0; }
> +void unregister_user_hbp_by_pid(pid_t pid) {}
> static inline int
> modify_user_hw_breakpoint(struct perf_event *bp,
> struct perf_event_attr *attr) { return -ENOSYS; }
> Index: linux-2.6-tip.reg_by_pid/kernel/hw_breakpoint.c
> ===================================================================
> --- linux-2.6-tip.reg_by_pid.orig/kernel/hw_breakpoint.c
> +++ linux-2.6-tip.reg_by_pid/kernel/hw_breakpoint.c
> @@ -298,6 +298,98 @@ int register_perf_hw_breakpoint(struct p
> return ret;
> }
>
> +/*
> + * Unregister breakpoints thread-by-thread, for all threads ranging from
> + * @start to @end.
> + */
> +static inline void __unregister_user_hbp_for_threads(struct task_struct *start,
> + struct task_struct *end)



I'm not sure this wants to be inlined. The function is not not
that tiny. May be let the compiler choose?


> +{
> + struct perf_event *bp, *temp_bp;
> +
> + do {
> + mutex_lock(&start->perf_event_mutex);
> + list_for_each_entry_safe(bp, temp_bp, &start->perf_event_list,
> + owner_entry) {
> + if (bp->attr.type != PERF_TYPE_BREAKPOINT)
> + continue;
> + unregister_hw_breakpoint(bp);
> + break;



Do you really want to unregister all of them? What about those
that may have been registered using perf syscall?

> +/**
> + * register_user_hbp_by_pid - register a hardware breakpoint for user space using pid
> + * @attr: breakpoint attributes
> + * @triggered: callback to trigger when we hit the breakpoint
> + * @pid: pid of the thread group for which breakpoints must be registered
> + */
> +int register_user_hbp_by_pid(struct perf_event_attr *attr,
> + perf_overflow_handler_t triggered,
> + pid_t pid)
> +{
> + int ret;
> + struct task_struct *t1, *t2;



This needs rcu_read_lock()



> + t1 = t2 = find_task_by_vpid(pid);
> + if (t1 == NULL)
> + return -ESRCH;
> +
> + /*
> + * Ensure that the breakpoint propogates to every new thread created in
> + * this thread_group.
> + */
> + attr->inherit = 1;
> + /*
> + * Register a breakpoint individually for every thread of the
> + * thread_group using register_user_hw_breakpoint() interface.
> + * Warning: Involves redundant validation checks using
> + * arch_validate_hwbkpt_settings().
> + */
> + do {
> + ret = IS_ERR(register_user_hw_breakpoint(attr, triggered, t1));
> + if (ret)
> + goto fail;
> + t1 = next_thread(t1);
> + } while (t1 != t2);


And this needs rcu_read_unlock()



> + return 0;
> +fail:
> + /*
> + * Check if the very first register_user_hw_breakpoint() request
> + * failed. If then, do nothing but return the error value.
> + */
> + if (t1 == t2)
> + return ret;
> + /*
> + * Since there exists a thread where the breakpoint request was not
> + * successful, we are unable to provide a process-wide breakpoint. Hence
> + * cleanup the breakpoints from the previously registered threads.
> + */
> + __unregister_user_hbp_for_threads(t2, t1);


There too.

Once you play with tasks (if it's not current), and iterate
through these, you need to protect either by read-lock
tasklist_lock or using rcu.

Rcu is the much prefered way here.


> +/**
> + * unregister_hbp_by_pid - unregister a user-space hardware breakpoint previously registered using a pid
> + * @pid: pid of the process for which breakpoint must be unregistered
> + */
> +void unregister_user_hbp_by_pid(pid_t pid)
> +{
> + struct task_struct *t1, *t2;
> +
> + t1 = t2 = find_task_by_vpid(pid);
> + if (t1 == NULL)
> + return;
> +
> + __unregister_user_hbp_for_threads(t1, t2);



And this function needs rcu too.

I don't see any in-kernel user for this new feature.
That would be required to integrate it.

Thanks.

2009-12-21 18:46:44

by K.Prasad

[permalink] [raw]
Subject: Re: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

On Fri, Dec 18, 2009 at 09:47:48PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 17, 2009 at 10:52:53PM +0530, K.Prasad wrote:
> > Provide an interface to (un)register user-space breakpoints using a
> > process' pid.
> >
> > Signed-off-by: K.Prasad <[email protected]>
> > ---
> > include/linux/hw_breakpoint.h | 8 +++
> > kernel/hw_breakpoint.c | 92 ++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 100 insertions(+)
> >
<snipped>
> >
> > +/*
> > + * Unregister breakpoints thread-by-thread, for all threads ranging from
> > + * @start to @end.
> > + */
> > +static inline void __unregister_user_hbp_for_threads(struct task_struct *start,
> > + struct task_struct *end)
>
> I'm not sure this wants to be inlined. The function is not not
> that tiny. May be let the compiler choose?

Okay...will make it just static.

> > +{
> > + struct perf_event *bp, *temp_bp;
> > +
> > + do {
> > + mutex_lock(&start->perf_event_mutex);
> > + list_for_each_entry_safe(bp, temp_bp, &start->perf_event_list,
> > + owner_entry) {
> > + if (bp->attr.type != PERF_TYPE_BREAKPOINT)
> > + continue;
> > + unregister_hw_breakpoint(bp);
> > + break;
>
> Do you really want to unregister all of them? What about those
> that may have been registered using perf syscall?
>

Seems that I got influenced heavily by the PPC64 design (was coding
simultaneously for it :-)...will accept a "struct perf_event_attr *" and
use that to identify appropriate breakpoint.

> > +/**
> > + * register_user_hbp_by_pid - register a hardware breakpoint for user space using pid
> > + * @attr: breakpoint attributes
> > + * @triggered: callback to trigger when we hit the breakpoint
> > + * @pid: pid of the thread group for which breakpoints must be registered
> > + */
> > +int register_user_hbp_by_pid(struct perf_event_attr *attr,
> > + perf_overflow_handler_t triggered,
> > + pid_t pid)
> > +{
> > + int ret;
> > + struct task_struct *t1, *t2;
>
> This needs rcu_read_lock()
>

Ok.

> > + t1 = t2 = find_task_by_vpid(pid);
> > + if (t1 == NULL)
> > + return -ESRCH;
> > +
> > + /*
> > + * Ensure that the breakpoint propogates to every new thread created in
> > + * this thread_group.
> > + */
> > + attr->inherit = 1;
> > + /*
> > + * Register a breakpoint individually for every thread of the
> > + * thread_group using register_user_hw_breakpoint() interface.
> > + * Warning: Involves redundant validation checks using
> > + * arch_validate_hwbkpt_settings().
> > + */
> > + do {
> > + ret = IS_ERR(register_user_hw_breakpoint(attr, triggered, t1));
> > + if (ret)
> > + goto fail;
> > + t1 = next_thread(t1);
> > + } while (t1 != t2);
>
> And this needs rcu_read_unlock()
>

Okay.

> > + return 0;
> > +fail:
> > + /*
> > + * Check if the very first register_user_hw_breakpoint() request
> > + * failed. If then, do nothing but return the error value.
> > + */
> > + if (t1 == t2)
> > + return ret;
> > + /*
> > + * Since there exists a thread where the breakpoint request was not
> > + * successful, we are unable to provide a process-wide breakpoint. Hence
> > + * cleanup the breakpoints from the previously registered threads.
> > + */
> > + __unregister_user_hbp_for_threads(t2, t1);
>
>
> There too.
>
> Once you play with tasks (if it's not current), and iterate
> through these, you need to protect either by read-lock
> tasklist_lock or using rcu.
>
> Rcu is the much prefered way here.
>

Okay. I wasn't sure if I had taken sufficient locks....thanks for
pointing it out.

> > +/**
> > + * unregister_hbp_by_pid - unregister a user-space hardware breakpoint previously registered using a pid
> > + * @pid: pid of the process for which breakpoint must be unregistered
> > + */
> > +void unregister_user_hbp_by_pid(pid_t pid)
> > +{
> > + struct task_struct *t1, *t2;
> > +
> > + t1 = t2 = find_task_by_vpid(pid);
> > + if (t1 == NULL)
> > + return;
> > +
> > + __unregister_user_hbp_for_threads(t1, t2);
>
>
>
> And this function needs rcu too.
>
> I don't see any in-kernel user for this new feature.
> That would be required to integrate it.
>

The proposed interfaces, as obvious, are mere wrappers over existing
(un)register_user_* interfaces, and don't do anything vastly different
in order to demonstrate them separately.

I can get a sample kernel module ready - that consumes pid and user-space
address to track write accesses, if you prefer it.

Thanks for reviewing the code!

-- K.Prasad

2009-12-30 22:28:44

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

On Tue, Dec 22, 2009 at 12:16:31AM +0530, K.Prasad wrote:
> On Fri, Dec 18, 2009 at 09:47:48PM +0100, Frederic Weisbecker wrote:
> > > +
> > > + do {
> > > + mutex_lock(&start->perf_event_mutex);
> > > + list_for_each_entry_safe(bp, temp_bp, &start->perf_event_list,
> > > + owner_entry) {
> > > + if (bp->attr.type != PERF_TYPE_BREAKPOINT)
> > > + continue;
> > > + unregister_hw_breakpoint(bp);
> > > + break;
> >
> > Do you really want to unregister all of them? What about those
> > that may have been registered using perf syscall?
> >
>
> Seems that I got influenced heavily by the PPC64 design (was coding
> simultaneously for it :-)...will accept a "struct perf_event_attr *" and
> use that to identify appropriate breakpoint.


:)


> > And this function needs rcu too.
> >
> > I don't see any in-kernel user for this new feature.
> > That would be required to integrate it.
> >
>
> The proposed interfaces, as obvious, are mere wrappers over existing
> (un)register_user_* interfaces, and don't do anything vastly different
> in order to demonstrate them separately.
>
> I can get a sample kernel module ready - that consumes pid and user-space
> address to track write accesses, if you prefer it.


Ok. The code looks good and useful.

But the usual philosophy in the kernel is to not add code
that is left unused upstream. And samples don't substitute a user.
I'm not sure this is a good idea to merge this.

2009-12-31 18:48:14

by K.Prasad

[permalink] [raw]
Subject: Re: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

On Wed, Dec 30, 2009 at 11:28:39PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 22, 2009 at 12:16:31AM +0530, K.Prasad wrote:
> > On Fri, Dec 18, 2009 at 09:47:48PM +0100, Frederic Weisbecker wrote:
>
> > > And this function needs rcu too.
> > >
> > > I don't see any in-kernel user for this new feature.
> > > That would be required to integrate it.
> > >
> >
> > The proposed interfaces, as obvious, are mere wrappers over existing
> > (un)register_user_* interfaces, and don't do anything vastly different
> > in order to demonstrate them separately.
> >
> > I can get a sample kernel module ready - that consumes pid and user-space
> > address to track write accesses, if you prefer it.
>
>
> Ok. The code looks good and useful.
>
> But the usual philosophy in the kernel is to not add code
> that is left unused upstream. And samples don't substitute a user.
> I'm not sure this is a good idea to merge this.
>

Back to the old trick!...How about an ftrace plugin that accepts pid,
user-space address and memory access type and traces all the IP
addresses that caused access?

echo <pid>:<user_addr><access_type> > usym_trace_filter
echo 567:0x1234567:rw- > usym_trace_filter

Breakpoint IP
------------ ---------
567:0x1234567 0x0abcdef

I'm unsure if it sounds interesting at all, but I suspect it wouldn't be
as easy as above to gather the shown information through any existing
tools.

Thanks,
K.Prasad

2010-01-10 04:00:51

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [Patch 1/1] Introduce register_user_hbp_by_pid() and unregister_user_hbp_by_pid()

On Fri, Jan 01, 2010 at 12:18:04AM +0530, K.Prasad wrote:
> On Wed, Dec 30, 2009 at 11:28:39PM +0100, Frederic Weisbecker wrote:
> > On Tue, Dec 22, 2009 at 12:16:31AM +0530, K.Prasad wrote:
> > > On Fri, Dec 18, 2009 at 09:47:48PM +0100, Frederic Weisbecker wrote:
> >
> > > > And this function needs rcu too.
> > > >
> > > > I don't see any in-kernel user for this new feature.
> > > > That would be required to integrate it.
> > > >
> > >
> > > The proposed interfaces, as obvious, are mere wrappers over existing
> > > (un)register_user_* interfaces, and don't do anything vastly different
> > > in order to demonstrate them separately.
> > >
> > > I can get a sample kernel module ready - that consumes pid and user-space
> > > address to track write accesses, if you prefer it.
> >
> >
> > Ok. The code looks good and useful.
> >
> > But the usual philosophy in the kernel is to not add code
> > that is left unused upstream. And samples don't substitute a user.
> > I'm not sure this is a good idea to merge this.
> >
>
> Back to the old trick!...How about an ftrace plugin that accepts pid,
> user-space address and memory access type and traces all the IP
> addresses that caused access?
>
> echo <pid>:<user_addr><access_type> > usym_trace_filter
> echo 567:0x1234567:rw- > usym_trace_filter
>
> Breakpoint IP
> ------------ ---------
> 567:0x1234567 0x0abcdef
>
> I'm unsure if it sounds interesting at all, but I suspect it wouldn't be
> as easy as above to gather the shown information through any existing
> tools.


That's a good idea to trace userspace breakpoints but:

I think the perf interface to use breakpoints is much more powerful
than the breakpoint ftrace plugin, which is somehow deprecated now.

The fact is that ftrace plugins in general (I mean the plugins based
on struct tracer, not the trace events) are mostly deprecated in favor
of trace events.

There are still some areas where the plugins are necessary though, such as
function tracing, and some other tracers. But these are exceptions.
And breakpoints interface was one of them, but now we have
a much more powerful existing interface for it in perf tools.

When it comes to think about improving or adding a new ftrace plugin,
if we know an existing interface that is proven better, let's rather
improve the latter (that's why I'll try to convert the function graph
tracer into a trace event...not an easy task).

The breakpoint ftrace plugin can only trace the whole kernel, has no
per-cpu or per task (+inheritance) granularity, backtraces,
or whatever the perf tools can already offer.

To sum-up, sure we could improve it but:

- we already have better, as a unified and easier to improve interface.
Extending this ftrace plugin would make it even harder to maintain.

- ftrace plugins are deprecated, except for particular cases


So, concerning breakpoints, I really suggest to focus on the perf tools
and deprecate this breakpoint ftrace plugin.

Also, I'm pretty sure that this would already work:

./perf record -e mem:@addr_in_ls:rw ls /usr
or:
./perf record -e mem:@addr_in_ls:rw --pid $(pid_of_a_running_ls)

I've never tested it, but if that doesn't work, that's probably because
of a guardian inside the kernel that only accepts userspace breakpoints
from ptraced processes. I should check that. But if it doesn't work yet,
that would require very few changes for it to work.