On Nov 13, 2007 12:01 AM, Abhishek Sagar <[email protected]> wrote:
> Hope these simple changes would suffice; all suggestions/corrections are welcome.
Whoops... sorry for the repeated email; mailer trouble.
--
Regards,
Abhishek
On Nov 13, 2007 12:09 AM, Abhishek Sagar <[email protected]> wrote:
> Whoops... sorry for the repeated email; mailer trouble.
Expecting this one to make it to the list. Summary again:
This patch introduces a provision to specify a user-defined callback
to run at function entry to complement the return handler in
kretprobes. Currently, whenever a kretprobe is registered, a user can
specify a callback (return-handler) to be run each time the target
function returns. This is also not guaranteed and is limited by the
number of concurrently pending return instances of the target function
in the current process's context.
This patch will now allow registration of another user defined handler
which is guaranteed to run each time the current return instance is
allocated and the return handler is set-up. Conversely, if the
entry-handler returns an error, it'll cause the current return
instance to be dropped and the return handler will also not run. The
purpose is to provide flexibility to do certain kinds of function
level profiling using kretprobes. By being able to register function
entry and return handlers, kretprobes will now be able to reduce an
extra probe registration (and associated race) for scenarios where an
entry handler is required to capture the function call/entry event
along with the corresponding function exit event.
Hope these simple changes would suffice; all suggestions/corrections
are welcome.
Signed-off-by: Abhishek Sagar <[email protected]>
---
diff -X /home/sagara/kprobes/dontdiff -upNr linux-2.6.24-rc2/include/linux/kprobes.h linux-2.6.24-rc2_kp/include/linux/kprobes.h
--- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-13 16:04:35.000000000 +0530
@@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
+ kretprobe_handler_t entry_handler;
int maxactive;
int nmissed;
struct hlist_head free_instances;
diff -X /home/sagara/kprobes/dontdiff -upNr linux-2.6.24-rc2/kernel/kprobes.c linux-2.6.24-rc2_kp/kernel/kprobes.c
--- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-13 16:04:17.000000000 +0530
@@ -694,12 +694,22 @@ static int __kprobes pre_handler_kretpro
spin_lock_irqsave(&kretprobe_lock, flags);
if (!hlist_empty(&rp->free_instances)) {
struct kretprobe_instance *ri;
+ struct pt_regs copy;
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, uflist);
ri->rp = rp;
ri->task = current;
- arch_prepare_kretprobe(ri, regs);
+
+ if (rp->entry_handler) {
+ copy = *regs;
+ arch_prepare_kretprobe(ri, &copy);
+ if (rp->entry_handler(ri, &copy))
+ goto out; /* skip current kretprobe instance */
+ *regs = copy;
+ } else {
+ arch_prepare_kretprobe(ri, regs);
+ }
/* XXX(hch): why is there no hlist_move_head? */
hlist_del(&ri->uflist);
@@ -707,6 +717,7 @@ static int __kprobes pre_handler_kretpro
hlist_add_head(&ri->hlist, kretprobe_inst_table_head(ri->task));
} else
rp->nmissed++;
+out:
spin_unlock_irqrestore(&kretprobe_lock, flags);
return 0;
}
Abhishek Sagar wrote:
> On Nov 13, 2007 12:09 AM, Abhishek Sagar <[email protected]> wrote:
>> Whoops...sry for the repeated email..emailer trouble.
>
> Expecting this one to make it to the list. Summary again:
>
> This patch introduces a provision to specify a user-defined callback
> to run at function entry to complement the return handler in
> kretprobes. Currently, whenever a kretprobe is registered, a user can
> specify a callback (return-handler) to be run each time the target
> function returns. This is also not guaranteed and is limited by the
> number of concurrently pending return instances of the target function
> in the current process's context.
>
> This patch will now allow registration of another user defined handler
> which is guaranteed to run each time the current return instance is
> allocated and the return handler is set-up. Conversely, if the
> entry-handler returns an error, it'll cause the current return
> instance to be dropped and the return handler will also not run. The
> purpose is to provide flexibility to do certain kinds of function
> level profiling using kretprobes. By being able to register function
> entry and return handlers, kretprobes will now be able to reduce an
> extra probe registration (and associated race) for scenarios where an
> entry handler is required to capture the function call/entry event
> along with the corresponding function exit event.
>
If I understand your intention (to capture information on function call/entry and the corresponding function exit) clearly, I have a few concerns about this.
1) How do you map the entry_handler (which gets executed when a process enters the function) with each instance of the return probe handler?
I accept that entry_handler() will execute each time a process enters the function, but to calculate time, one needs to know the corresponding instance of the return probe handler (there should be a map for each return handler).
Let me explain briefly.
Suppose in an SMP system, 4 processes enter the same function at almost the same time (executing entry_handler()) and return from 4 different locations by executing the return handler. Now how do I match entry_handler() with the corresponding instance of the return handler for calculating time?
Some time back, even I was interested in calculating the function execution time, but I used approach a).
Now what I think is, there could be 2 solutions to this problem:
a) Collect the entry time and exit time, put them in the kretprobe_instance structure, and fetch them before freeing that instance.
b) Or pass ri (the kretprobe_instance address) to the entry_handler and match it with the return probe handler.
On Nov 14, 2007 1:27 PM, Srinivasa Ds <[email protected]> wrote:
> 1) How do you map the entry_handler (which gets executed when a process enters the function) with each instance of the return probe handler?
> I accept that entry_handler() will execute each time a process enters the function, but to calculate time, one needs to know the corresponding instance of the return probe handler (there should be a map for each return handler).
Yes, a one-to-one correspondence between the entry and return handlers
is important. I've tried to address this by passing the same
kretprobe_instance to both the entry and return handlers.
> Let me explain briefly.
> Suppose in an SMP system, 4 processes enter the same function at almost the same time (executing entry_handler()) and return from 4 different locations by executing the return handler. Now how do I match entry_handler() with the corresponding instance of the return handler for calculating time?
Right, and this scenario would also occur on UPs where the kretprobe'd
function has nested calls. This has been taken care of (see below).
> Now what I think is, there could be 2 solutions to this problem:
>
> a) Collect the entry time and exit time, put them in the kretprobe_instance structure, and fetch them before freeing that instance.
In case someone wants to calculate the entry and exit timestamps of a
function using kretprobes, the appropriate place for it is not the
entry and return handlers. That's because the path from when a function
is called or from when it returns, to the point of invocation of the
entry or return handler is not O(1).
Looking at trampoline_handler(), it actually traverses a list of all
pending return instances to reach the correct return instance. So any
kind of exit timestamp must be placed just before that and passed to
the return handler via kretprobe_instance (as you just suggested).
> b) Or pass ri (the kretprobe_instance address) to the entry_handler and match it with the return probe handler.
This is what I'm trying to do with this patch. I hope I've not misread
what you meant here, but as you'll notice from the patch:
+ if (rp->entry_handler) {
+ copy = *regs;
+ arch_prepare_kretprobe(ri, &copy);
+ if (rp->entry_handler(ri, &copy))
<------------ (entry-handler with ri)
+ goto out; /* skip current kretprobe instance */
+ *regs = copy;
+ } else {
+ arch_prepare_kretprobe(ri, regs);
+ }
the entry handler is called with the appropriate return instance. I
haven't put any explicit "match" test here for ri. The reason is that
the correct ri would be passed to both the entry and return handlers
as trampoline_handler() explicitly matches them to the correct task.
Note that all pending return instances of a function are chained in
LIFO order. So the entry-handler which gets called last should have
its return handler called first (in case of multiple pending return
instances).
Another cool thing here is that if the entry handler returns a
non-zero value, the current return instance is aborted altogether. So
if the entry-handler fails, no return handler gets called.
Does this address your concerns?
--
Regards
Abhishek Sagar
Abhishek Sagar wrote:
> the entry handler is called with the appropriate return instance. I
> haven't put any explicit "match" test here for ri. The reason is that
> the correct ri would be passed to both the entry and return handlers
> as trampoline_handler() explicitly matches them to the correct task.
> Note that all pending return instances of a function are chained in
> LIFO order. So the entry-handler which gets called last should have
> its return handler called first (in case of multiple pending return
> instances).
>
No, even though return instances are chained in an order, the order of execution of
the return handlers entirely depends on which process returns first (some process may
return from line 2 of the function and some process may return from the last line
of the function). So the entry_handler() which gets executed last doesn't guarantee
that its return handler will be executed first (because it took a lot of time
to return).
So the only thing that matches the entry_handler() with its return_handler() is
the return probe instance's (ri's) address, which the user has to take care of
explicitly (hence I feel sol a) would be nice).
Thanks
Srinivasa DS
On Nov 14, 2007 3:53 PM, Srinivasa Ds <[email protected]> wrote:
> No, even though return instances are chained in an order, the order of execution of
> the return handlers entirely depends on which process returns first
Right...the LIFO chain analogy holds true for return instances for the
same task only. As you've pointed out, kretprobe_instance is the only
thing that can bind corresponding entry and return handlers together,
which has been taken care of.
> So the entry_handler() which gets executed last doesn't guarantee
> that its return handler will be executed first (because it took a lot of time
> to return).
Only if there are pending return instances belonging to different tasks.
> So the only thing that matches the entry_handler() with its return_handler() is
> the return probe instance's (ri's) address, which the user has to take care of explicitly
Let's see how entry and return handlers can be matched up in three
different scenarios:
1. Multiple function entries from various tasks (the one you've just
pointed out).
2. Multiple kretprobe registration on the same function.
3. Nested calls of kretprobe'd function.
In cases 1 and 3, the following information can be used to match
corresponding entry and return handlers inside a user handler (if
needed):
(ri->task, ri->ret_addr)
where ri is struct kretprobe_instance *
This tuple should uniquely identify a return address (right?).
In case 2, entry and return handlers are anyway called in the right
order (taken care of by trampoline_handler() due to LIFO insertion in
ri->hlist).
The fact that ri is passed to both handlers should allow any user
handler to identify each of these cases and take appropriate
synchronization action pertaining to its private data, if needed.
> (Hence I feel sol a) would be nice).
With an entry-handler, any module aiming to profile running time of a
function (say) can simply do the following without being "return
instance" conscious. Note however that I'm not trying to address just
this scenario but trying to provide a general way to use
entry-handlers in kretprobes:
static unsigned long flag = 0; /* use bit 0 as a global flag */
unsigned long long entry, exit;

int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	if (!test_and_set_bit(0, &flag)) {
		/* this instance claims the first entry to kretprobe'd function */
		entry = sched_clock();
		/* do other stuff */
		return 0; /* right on! */
	}
	return 1; /* error: no return instance to be allocated for this function entry */
}

/* will only be called iff flag == 1 */
int my_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	BUG_ON(!flag);
	exit = sched_clock();
	clear_bit(0, &flag);
	return 0;
}
I think something like this should do the trick for you.
> Thanks
> Srinivasa DS
--
Thanks & Regards
Abhishek Sagar
On Wed, 2007-11-14 at 19:00 +0530, Abhishek Sagar wrote:
First of all, some general comments. We seem to be trying to solve two
problems here:
1. Prevent the asymmetry in entry- vs. return-handler calls that can
develop when we temporarily run out of kretprobe_instances. E.g., if we
have m kretprobe "misses", we may report n calls but only (n-m) returns.
2. Simplify the task of correlating data (e.g., timestamps) between
function entry and function return.
Problem #1 wouldn't exist if we could solve problem #1a:
1a. Ensure that we never run out of kretprobe_instances (for some
appropriate value of "never").
I've thought of various approaches to #1a -- e.g., allocate
kretprobe_instances from GFP_ATOMIC memory when we're out of
preallocated instances -- but I've never found time to pursue them.
This might be a good time to brainstorm solutions to that problem.
Lacking a solution to #1a, I think Abhishek's approach provides a
reasonable solution to problem #1.
I don't think it takes us very far toward solving #2, though. I agree
with Srinivasa that it would be more helpful if we could store the data
collected at function-entry time right in the kretprobe_instance. Kevin
Stafford prototyped this "data pouch" idea a couple of years ago --
http://sourceware.org/ml/systemtap/2005-q3/msg00593.html
-- but an analogous feature was implemented at a higher level in
SystemTap. Either approach -- in kprobes or SystemTap -- would benefit
from a fix to #1 (or #1a).
Review of Abhishek's patch:
I see no reason to save a copy of *regs and pass that to the entry
handler. Passing the real regs pointer is good enough for other kprobe
handlers. And if a handler on i386 uses &regs->esp as the value of the
stack pointer (which is correct -- long story), it'll get the wrong
value if its regs arg points at the copy.
More comments below.
> On Nov 14, 2007 3:53 PM, Srinivasa Ds <[email protected]> wrote:
...
>
> > So the entry_handler() which gets executed last doesn't guarantee
> > that its return handler will be executed first (because it took a lot of time
> > to return).
>
> Only if there are pending return instances belonging to different tasks.
I agree with Abhishek here.
...
>
> Let's see how entry and return handlers can be matched up in three
> different scenarios:
>
> 1. Multiple function entries from various tasks (the one you've just
> pointed out).
> 2. Multiple kretprobe registration on the same function.
> 3. Nested calls of kretprobe'd function.
>
> In cases 1 and 3, the following information can be used to match
> corresponding entry and return handlers inside a user handler (if
> needed):
>
> (ri->task, ri->ret_addr)
> where ri is struct kretprobe_instance *
>
> This tuple should uniquely identify a return address (right?).
But if it's a recursive function, there could be multiple instances in
the same task with the same return address. The stack pointer would be
different, FWIW.
>
>
> In case 2, entry and return handlers are anyway called in the right
> order (taken care of by trampoline_handler() due to LIFO insertion in
> ri->hlist).
Yes. And for case #2, ri->rp will be different for each
kretprobe_instance anyway.
>
> The fact that ri is passed to both handlers should allow any user
> handler to identify each of these cases and take appropriate
> synchronization action pertaining to its private data, if needed.
I don't think Abhishek has made his case here. See below.
>
> > (Hence I feel sol a) would be nice).
>
> With an entry-handler, any module aiming to profile running time of a
> function (say) can simply do the following without being "return
> instance" conscious. Note however that I'm not trying to address just
> this scenario but trying to provide a general way to use
> entry-handlers in kretprobes:
>
> static unsigned long flag = 0; /* use bit 0 as a global flag */
> unsigned long long entry, exit;
>
> int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> 	if (!test_and_set_bit(0, &flag)) {
> 		/* this instance claims the first entry to kretprobe'd function */
> 		entry = sched_clock();
> 		/* do other stuff */
> 		return 0; /* right on! */
> 	}
> 	return 1; /* error: no return instance to be allocated for this function entry */
> }
>
> /* will only be called iff flag == 1 */
> int my_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> 	BUG_ON(!flag);
> 	exit = sched_clock();
> 	clear_bit(0, &flag);
> 	return 0;
> }
>
> I think something like this should do the trick for you.
Since flag is static, it seems to me that if there were instances of the
probed function active concurrently in multiple tasks, only the
first-called instance would be profiled.
>
> > Thanks
> > Srinivasa DS
>
> --
> Thanks & Regards
> Abhishek Sagar
Jim Keniston
On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
> On Wed, 2007-11-14 at 19:00 +0530, Abhishek Sagar wrote:
>
> First of all, some general comments. We seem to be trying to solve two
> problems here:
> 1. Prevent the asymmetry in entry- vs. return-handler calls that can
> develop when we temporarily run out of kretprobe_instances. E.g., if we
> have m kretprobe "misses", we may report n calls but only (n-m) returns.
That has already been taken care of. The entry-handler is called iff
'hlist_empty(&rp->free_instances)' is false. Additionally, if
entry_handler() returns an error then the corresponding return handler
is also not called because that particular "return instance" is
aborted/voluntarily-missed. Hence the following guarantee is implied:
No. of return handler calls = No. of entry_handler calls which
returned 0 (success).
The number of failed entry_handler calls doesn't update any kind of
voluntary-miss count, since the user handlers are internally aware of
such misses (unlike rp->nmissed).
> 2. Simplify the task of correlating data (e.g., timestamps) between
> function entry and function return.
Right. Srinivasa and you have been hinting at the use of per-instance
private data for the same. I think ri should be enough.
> Problem #1 wouldn't exist if we could solve problem #1a:
> 1a. Ensure that we never run out of kretprobe_instances (for some
> appropriate value of "never").
>
> I've thought of various approaches to #1a -- e.g., allocate
> kretprobe_instances from GFP_ATOMIC memory when we're out of
> preallocated instances -- but I've never found time to pursue them.
> This might be a good time to brainstorm solutions to that problem.
>
> Lacking a solution to #1a, I think Abhishek's approach provides a
> reasonable solution to problem #1.
If you're not convinced that problem #1 isn't appropriately handled,
we should look for something like that I guess.
> I don't think it takes us very far toward solving #2, though. I agree
> with Srinivasa that it would be more helpful if we could store the data
> collected at function-entry time right in the kretprobe_instance. Kevin
> Stafford prototyped this "data pouch" idea a couple of years ago --
> http://sourceware.org/ml/systemtap/2005-q3/msg00593.html
> -- but an analogous feature was implemented at a higher level in
> SystemTap. Either approach -- in kprobes or SystemTap -- would benefit
> from a fix to #1 (or #1a).
There are three problems in the "data pouch" approach, which is a
generalized case of Srinivasa's timestamp case:
1. It bloats the size of each return instance to a run-time defined
value. I'm certain that quite a few memory-starved ARM kprobes users
wish they could install more probes on their system without taking
away too much from the precious memory pools, which can impact their
system performance. This is not a deal breaker though, just an
annoyance.
2. Forces user entry/return handlers to use ri->entry_info (see
Kevin's patch) even if their design only wanted such private data to
be used in certain instances. Therefore ideally, any per-instance data
allocation should be left to user entry handlers, IMO. Even if we
allow a pouch for private data in a return instance, the user handlers
would still need to be aware of "return instances" to actually use them
globally.
3. It's redundant. 'ri' can uniquely identify any entry-return handler
pair. This itself solves your problem #2. It only moves the onus of
private data allocation to user handlers.
> Review of Abhishek's patch:
>
> I see no reason to save a copy of *regs and pass that to the entry
> handler. Passing the real regs pointer is good enough for other kprobe
> handlers.
No, a copy is required because if the entry_handler() returns an error (a
voluntary miss), then there has to be a way to roll back all the
changes that arch_prepare_kretprobe() and entry_handler() made to
*regs. Such an instance is considered "aborted".
> And if a handler on i386 uses &regs->esp as the value of the
> stack pointer (which is correct -- long story), it'll get the wrong
> value if its regs arg points at the copy.
That's a catch! I've made the fix (see inlined patch below). It now
passes regs instead of &copy to both the entry_handler() and
arch_prepare_kretprobe().
> More comments below.
[snip]
> > 1. Multiple function entries from various tasks (the one you've just
> > pointed out).
> > 2. Multiple kretprobe registration on the same function.
> > 3. Nested calls of kretprobe'd function.
> >
> > In cases 1 and 3, the following information can be used to match
> > corresponding entry and return handlers inside a user handler (if
> > needed):
> >
> > (ri->task, ri->ret_addr)
> > where ri is struct kretprobe_instance *
> >
> > This tuple should uniquely identify a return address (right?).
>
> But if it's a recursive function, there could be multiple instances in
> the same task with the same return address. The stack pointer would be
> different, FWIW.
Wouldn't the return addresses be different for recursive/nested calls?
I think the only case where the return addresses would be the same is
when multiple return probes are installed on the same function.
> > The fact that ri is passed to both handlers should allow any user
> > handler to identify each of these cases and take appropriate
> > synchronization action pertaining to its private data, if needed.
>
> I don't think Abhishek has made his case here. See below.
>
> >
> > > (Hence I feel sol a) would be nice).
> >
> > With an entry-handler, any module aiming to profile running time of a
> > function (say) can simply do the following without being "return
> > instance" conscious. Note however that I'm not trying to address just
> > this scenario but trying to provide a general way to use
> > entry-handlers in kretprobes:
> >
> > static unsigned long flag = 0; /* use bit 0 as a global flag */
> > unsigned long long entry, exit;
> >
> > int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > {
> > 	if (!test_and_set_bit(0, &flag)) {
> > 		/* this instance claims the first entry to kretprobe'd function */
> > 		entry = sched_clock();
> > 		/* do other stuff */
> > 		return 0; /* right on! */
> > 	}
> > 	return 1; /* error: no return instance to be allocated for this function entry */
> > }
> >
> > /* will only be called iff flag == 1 */
> > int my_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > {
> > 	BUG_ON(!flag);
> > 	exit = sched_clock();
> > 	clear_bit(0, &flag);
> > 	return 0;
> > }
> >
> > I think something like this should do the trick for you.
>
> Since flag is static, it seems to me that if there were instances of the
> probed function active concurrently in multiple tasks, only the
> first-called instance would be profiled.
Oh that's right, or we could use a per-cpu flag which would restrict
us to only one profiling instance per processor.
> Jim Keniston
--
Thanks & Regards
Abhishek Sagar
---
Signed-off-by: Abhishek Sagar <[email protected]>
diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h linux-2.6.24-rc2_kp/include/linux/kprobes.h
--- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-15 15:49:39.000000000 +0530
@@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
+ kretprobe_handler_t entry_handler;
int maxactive;
int nmissed;
struct hlist_head free_instances;
diff -upNr linux-2.6.24-rc2/kernel/kprobes.c linux-2.6.24-rc2_kp/kernel/kprobes.c
--- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-15 16:00:57.000000000 +0530
@@ -694,12 +694,24 @@ static int __kprobes pre_handler_kretpro
spin_lock_irqsave(&kretprobe_lock, flags);
if (!hlist_empty(&rp->free_instances)) {
struct kretprobe_instance *ri;
+ struct pt_regs copy;
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, uflist);
ri->rp = rp;
ri->task = current;
- arch_prepare_kretprobe(ri, regs);
+
+ if (rp->entry_handler) {
+ copy = *regs;
+ arch_prepare_kretprobe(ri, regs);
+ if (rp->entry_handler(ri, regs)) {
+ *regs = copy;
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ return 0; /* skip current kretprobe instance */
+ }
+ } else {
+ arch_prepare_kretprobe(ri, regs);
+ }
/* XXX(hch): why is there no hlist_move_head? */
hlist_del(&ri->uflist);
On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
> 2. Simplify the task of correlating data (e.g., timestamps) between
> function entry and function return.
Would adding data and len fields in ri help? Instead of "pouching"
data in one go at registration time, this would let user handlers do
the allocation and allow them to use different kinds of data
structures per-instance.
- Abhishek Sagar
On Thu, 2007-11-15 at 18:46 +0530, Abhishek Sagar wrote:
> On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
...
> > First of all, some general comments. We seem to be trying to solve two
> > problems here:
> > 1. Prevent the asymmetry in entry- vs. return-handler calls that can
> > develop when we temporarily run out of kretprobe_instances. E.g., if we
> > have m kretprobe "misses", we may report n calls but only (n-m) returns.
>
> That has already been taken care of. ...
[Much discussion snipped]
... in your patch. Yes, I think we're in violent agreement on that
point.
>
> > 2. Simplify the task of correlating data (e.g., timestamps) between
> > function entry and function return.
>
> Right. Srinivasa and you have been hinting at the use of per-instance
> private data for the same. I think ri should be enough.
>
> > Problem #1 wouldn't exist if we could solve problem #1a:
> > 1a. Ensure that we never run out of kretprobe_instances (for some
> > appropriate value of "never").
> >
> > I've thought of various approaches to #1a -- e.g., allocate
> > kretprobe_instances from GFP_ATOMIC memory when we're out of
> > preallocated instances -- but I've never found time to pursue them.
> > This might be a good time to brainstorm solutions to that problem.
> >
> > Lacking a solution to #1a, I think Abhishek's approach provides a
> > reasonable solution to problem #1.
>
> If you're not convinced that problem #1 isn't appropriately handled,
I don't know where you got that idea. Reread my last sentence above.
My concern is that your patch fixes one symptom (number of function
returns reported < number of function entries reported), but not the
root cause (we can fail to report some function returns). If fixing one
symptom is the best we can do, then I have no real objection to that.
Of course, we need to keep in mind that you're adding an externally
visible feature that would need to be maintained, so it demands more
scrutiny than a simple bug fix.
> we should look for something like that I guess.
>
> > I don't think it takes us very far toward solving #2, though. I agree
> > with Srinivasa that it would be more helpful if we could store the data
> > collected at function-entry time right in the kretprobe_instance. Kevin
> > Stafford prototyped this "data pouch" idea a couple of years ago --
> > http://sourceware.org/ml/systemtap/2005-q3/msg00593.html
> > -- but an analogous feature was implemented at a higher level in
> > SystemTap. Either approach -- in kprobes or SystemTap -- would benefit
> > from a fix to #1 (or #1a).
>
> There are three problems in the "data pouch" approach, which is a
> generalized case of Srinivasa's timestamp case:
>
> 1. It bloats the size of each return instance to a run-time defined
> value. I'm certain that quite a few memory-starved ARM kprobes users
> wish they could install more probes on their system without taking
> away too much from the precious memory pools, which can impact their
> system performance. This is not a deal breaker though, just an
> annoyance.
entry_info is, by default, a zero-length array, which adds nothing to
the size of a uretprobe_instance -- at least on the 3 architectures I've
tested on (i386, x86_64, powerpc).
>
> 2. Forces user entry/return handlers to use ri->entry_info (see
> Kevin's patch) even if their design only wanted such private data to
> be used in certain instances.
No, it doesn't. Providing a feature isn't the same as forcing people to
use the feature.
> Therefore ideally, any per-instance data
> allocation should be left to user entry handlers, IMO. Even if we
> allow a pouch for private data in a return instance, the user handlers
> would still need to be aware of "return instances" to actually use them
> globally.
>
> 3. It's redundant. 'ri' can uniquely identify any entry-return handler
> pair. This itself solves your problem #2. It only moves the onus of
> private data allocation to user handlers.
Having ri available at function entry time is definitely a win, but
maintaining separate data structures and using ri to map to the right
one is no trivial task. (It's a lot easier if you pre-allocate the
maximum number of data structures you'll need -- presumably matching
rp->maxactive -- but then you have at least the same amount of "bloat"
as with the data pouch approach.)
I suggest you show us a probe module that captures data at function
entry and reports it upon return, exploiting your proposed patch.
Profiling would be a reasonable example, but something where multiple
values are captured might be more relevant. (Your example below doesn't
count because it doesn't work.) Then I'll code up the same example
using your enhancement + the data pouch enhancement, and we can compare.
Of course, the data pouch enhancement would build nicely atop your
enhancement, so it could be proposed and considered separately.
>
> > Review of Abhishek's patch:
> >
> > I see no reason to save a copy of *regs and pass that to the entry
> > handler. Passing the real regs pointer is good enough for other kprobe
> > handlers.
>
> No, a copy is required because if the entry_handler() returns an error (a
> voluntary miss), then there has to be a way to roll back all the
> changes that arch_prepare_kretprobe() and entry_handler() made to
> *regs. Such an instance is considered "aborted".
No. Handlers shouldn't be writing to the pt_regs struct unless they
want to change the operation of the probed code. If an entry handler
scribbles on *regs and then decides to "abort", it's up to that handler
to restore the contents of *regs. It's that way with all kprobe and
kretprobe handlers, and the proposed entry handler is no different, as
far as I can see.
> > And if a handler on i386 uses &regs->esp as the value of the
> > stack pointer (which is correct -- long story), it'll get the wrong
> > value if its regs arg points at the copy.
>
> That's a catch! I've made the fix (see inlined patch below). It now
> passes regs instead of &copy to both the entry_handler() and
> arch_prepare_kretprobe().
>
> > More comments below.
> [snip]
> > > 1. Multiple function entries from various tasks (the one you've just
> > > pointed out).
> > > 2. Multiple kretprobe registration on the same function.
> > > 3. Nested calls of kretprobe'd function.
> > >
> > > In cases 1 and 3, the following information can be used to match
> > > corresponding entry and return handlers inside a user handler (if
> > > needed):
> > >
> > > (ri->task, ri->ret_addr)
> > > where ri is struct kretprobe_instance *
> > >
> > > This tuple should uniquely identify a return address (right?).
> >
> > But if it's a recursive function, there could be multiple instances in
> > the same task with the same return address. The stack pointer would be
> > different, FWIW.
>
> Wouldn't the return addresses be different for recursive/nested calls?
Not for recursive calls. Think about it. Or write a little program
that calls a recursive function like factorial(), and have factorial()
report the value of __builtin_return_address(0) each time it's called.
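Such a program might look like the following sketch (userspace C, GCC builtins; noinline keeps each invocation a genuine call, since aggressive inlining could break the demonstration):

```c
#include <assert.h>

/* All recursive calls to factorial() come from the single call site
 * inside factorial() itself, so __builtin_return_address(0) yields the
 * same value for every level below the first. */
static void *ret_addrs[16];
static int calls;

__attribute__((noinline))
static unsigned long long factorial(unsigned int n)
{
	ret_addrs[calls++] = __builtin_return_address(0);
	if (n <= 1)
		return 1;
	return n * factorial(n - 1);	/* not a tail call */
}
```

After factorial(5), ret_addrs[1..4] hold one repeated address, distinct from ret_addrs[0] -- so (ri->task, ri->ret_addr) cannot tell the recursive instances apart.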
> I think the only case where the return addresses would be same is when
> multiple return probes are installed on the same function.
>
> > > The fact that ri is passed to both handlers should allow any user
> > > handler to identify each of these cases and take appropriate
> > > synchronization action pertaining to its private data, if needed.
> >
> > I don't think Abhishek has made his case here. See below.
> >
> > >
> > > > (Hence I feel sol a) would be nice).
> > >
> > > With an entry-handler, any module aiming to profile running time of a
> > > function (say) can simply do the following without being "return
> > > instance" conscious. Note however that I'm not trying to address just
> > > this scenario but trying to provide a general way to use
> > > entry-handlers in kretprobes:
> > >
> > > static unsigned long flag = 0; /* use bit 0 as a global flag */
> > > unsigned long long entry, exit;
> > >
> > > int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > > {
> > > if (!test_and_set_bit(0, &flag)) {
> > > /* this instance claims the first entry to kretprobe'd function */
> > > entry = sched_clock();
> > > /* do other stuff */
> > > return 0; /* right on! */
> > > }
> > > return 1; /* error: no return instance to be allocated for this
> > > function entry */
> > > }
> > >
> > > /* will only be called iff flag == 1 */
> > > int my_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > > {
> > > BUG_ON(!flag);
> > > exit = sched_clock();
> > > clear_bit(0, &flag);
> > > return 0;
> > > }
> > >
> > > I think something like this should do the trick for you.
> >
> > Since flag is static, it seems to me that if there were instances of the
> > probed function active concurrently in multiple tasks, only the
> > first-called instance would be profiled.
>
> Oh that's right, or we could use a per-cpu flag which would restrict
> us to only one profiling instance per processor.
If the probed function can sleep, then it could return on a different
CPU; a per-cpu flag wouldn't work in such cases.
>
> > Jim Keniston
>
> --
> Thanks & Regards
> Abhishek Sagar
>
> ---
> Signed-off-by: Abhishek Sagar <[email protected]>
>
> diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
> linux-2.6.24-rc2_kp/include/linux/kprobes.h
> --- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-15
> 15:49:39.000000000 +0530
> @@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
> struct kretprobe {
> struct kprobe kp;
> kretprobe_handler_t handler;
> + kretprobe_handler_t entry_handler;
> int maxactive;
> int nmissed;
> struct hlist_head free_instances;
> diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
> linux-2.6.24-rc2_kp/kernel/kprobes.c
> --- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-15 16:00:57.000000000 +0530
> @@ -694,12 +694,24 @@ static int __kprobes pre_handler_kretpro
> spin_lock_irqsave(&kretprobe_lock, flags);
> if (!hlist_empty(&rp->free_instances)) {
> struct kretprobe_instance *ri;
> + struct pt_regs copy;
NAK. Saving and restoring regs is expensive and inconsistent with
existing kprobes usage.
>
> ri = hlist_entry(rp->free_instances.first,
> struct kretprobe_instance, uflist);
> ri->rp = rp;
> ri->task = current;
> - arch_prepare_kretprobe(ri, regs);
> +
> + if (rp->entry_handler) {
> + copy = *regs;
> + arch_prepare_kretprobe(ri, regs);
> + if (rp->entry_handler(ri, regs)) {
> + *regs = copy;
> + spin_unlock_irqrestore(&kretprobe_lock, flags);
> + return 0; /* skip current kretprobe instance */
> + }
> + } else {
> + arch_prepare_kretprobe(ri, regs);
> + }
>
> /* XXX(hch): why is there no hlist_move_head? */
> hlist_del(&ri->uflist);
Jim
On Thu, 2007-11-15 at 20:30 +0530, Abhishek Sagar wrote:
> On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
> > 2. Simplify the task of correlating data (e.g., timestamps) between
> > function entry and function return.
>
> Would adding data and len fields to ri help? Instead of "pouching"
> data in one go at registration time, this would let user handlers do
> the allocation
Yes and no. Adding just a data field -- void*, or maybe unsigned long
long so it's big enough to accommodate big timestamps -- would be a big
improvement on your current proposal. That would save the user the
drudgery of mapping the ri pointer to his/her per-instance data.
There's plenty of precedent for passing "private_data" values to
callbacks.
I don't think a len field would help much. If such info were needed, it
could be stored in the data structure pointed to by the data field.
I still don't think "letting [i.e., requiring that] user handlers do the
allocation" is a win. I'm still interested to see how this plays out in
real examples.
> and allow them to use different kinds of data
> structures per-instance.
I haven't been able to think of any scenarios where this would be
useful. A "data pouch" could always contain a union, FWIW.
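For instance, a pouch carrying a union might look like this sketch; the member names are invented for illustration and are not from Kevin's patch:

```c
#include <assert.h>

/* One fixed-size pouch per return instance, interpreted per-module via
 * a union: a profiling module uses the timestamp view, a counting
 * module the counters view. */
struct pouch {
	union {
		unsigned long long timestamp;	/* profiling module */
		struct {
			unsigned long nr_calls;	/* counting module */
			long last_retval;
		} counters;
	} u;
};
```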
>
> - Abhishek Sagar
Jim
On Nov 16, 2007 2:46 AM, Jim Keniston <[email protected]> wrote:
> > > Lacking a solution to #1a, I think Abhishek's approach provides a
> > > reasonable solution to problem #1.
> >
> > If you're not convinced that problem #1 isn't appropriately handled,
>
> I don't know where you got that idea. Reread my last sentence above.
>
> My concern is that your patch fixes one symptom (number of function
> returns reported < number of function entries reported), but not the
> root cause (we can fail to report some function returns). If fixing one
> symptom is the best we can do, then I have no real objection to that.
>
> Of course, we need to keep in mind that you're adding an externally
> visible feature that would need to be maintained, so it demands more
> scrutiny than a simple bug fix.
Agreed.
> > There are three problems in the "data pouch" approach, which is a
> > generalized case of Srinivasa's timestamp case:
> >
> > 1. It bloats the size of each return instance to a run-time defined
> > value. I'm certain that quite a few memory starved ARM kprobes users
> > would certainly be wishing they could install more probes on their
> > system without taking away too much from the precious memory pools
> > which can impact their system performance. This is not a deal breaker
> > though, just an annoyance.
>
> entry_info is, by default, a zero-length array, which adds nothing to
> the size of a uretprobe_instance -- at least on the 3 architectures I've
> tested on (i386, x86_64, powerpc).
Strange, because from what I could gather, the data pouch patch had
the following in the kretprobe registration routine:
for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
+ inst = kmalloc((sizeof(struct kretprobe_instance)
+ + rp->entry_info_sz), GFP_KERNEL);
rp->entry_info_sz is presumably the size of the private data structure
of the registering module. This is the bloat I was referring to. But
this difference should've showed up in your tests...?
> >
> > 2. Forces user entry/return handlers to use ri->entry_info (see
> > Kevin's patch) even if their design only wanted such private data to
> > be used in certain instances.
>
> No, it doesn't. Providing a feature isn't the same as forcing people to
> use the feature.
From the portion of the patch quoted above, it seems that there is
private data allocation at registration time per-instance in one go.
First of all, not all maxactive instances get used frequently. Even if
they do, the fact that this private data would be used by the user
handlers for a particular instance is also an assumption. So I guess
it is better to leave allocation to the handlers and provide them with
a data pointer in ri.
> > Therefore ideally, any per-instance data
> > allocation should be left to user entry handlers, IMO. Even if we
> > allow a pouch for private data in a return instance, the user handlers
> > would still need to be aware of "return instances" to actually use them
> > globally.
> >
> > 3. It's redundant. 'ri' can uniquely identify any entry-return handler
> > pair. This itself solves your problem #2. It only moves the onus of
> > private data allocation to user handlers.
>
> Having ri available at function entry time is definitely a win, but
> maintaining separate data structures and using ri to map to the right
> one is no trivial task. (It's a lot easier if you pre-allocate the
> maximum number of data structures you'll need -- presumably matching
> rp->maxactive -- but then you have at least the same amount of "bloat"
> as with the data pouch approach.)
True, some kind of data pointer/pouch is essential.
> I suggest you show us a probe module that captures data at function
> entry and reports it upon return, exploiting your proposed patch.
> Profiling would be a reasonable example, but something where multiple
> values are captured might be more relevant. (Your example below doesn't
> count because it doesn't work.) Then I'll code up the same example
> using your enhancement + the data pouch enhancement, and we can compare.
I'll send a sample module which uses data pointer provided in ri in my
next response.
> Of course, the data pouch enhancement would build nicely atop your
> enhancement, so it could be proposed and considered separately.
Ok.
> >
> > > Review of Abhishek's patch:
> > >
> > > I see no reason to save a copy of *regs and pass that to the entry
> > > handler. Passing the real regs pointer is good enough for other kprobe
> > > handlers.
> >
> > No, a copy is required because if the entry_handler() returns error (a
> > voluntary miss), then there has to be a way to roll-back all the
> > changes that arch_prepare_kretprobe() and entry_handler() made to
> > *regs. Such an instance is considered "aborted".
>
> No. Handlers shouldn't be writing to the pt_regs struct unless they
> want to change the operation of the probed code. If an entry handler
> scribbles on *regs and then decides to "abort", it's up to that handler
> to restore the contents of *regs. It's that way with all kprobe and
> kretprobe handlers, and the proposed entry handler is no different, as
> far as I can see.
I had the copy mainly to undo the effect of arch_prepare_kretprobe()
on regs if the entry_handler decided to abort. But it didn't really serve
that purpose anyway, since arch_prepare_kretprobe() also ends up
modifying the stack on x86. So I've gotten rid of the copy now. The
entry_handler now gets called before arch_prepare_kretprobe() and
therefore shouldn't expect to use ri->ret_addr. In effect, the
entry_handler is an early decision maker on whether the return-instance
setup should take place at all for the current function hit.
> > Wouldn't the return addresses be different for recursive/nested calls?
>
> Not for recursive calls. Think about it. Or write a little program
> that calls a recursive function like factorial(), and have factorial()
> report the value of __builtin_return_address(0) each time it's called.
Oh yeah, got it... I was stuck on the nested-call case and forgot about
direct recursion :)
> > > > The fact that ri is passed to both handlers should allow any user
> > > > handler to identify each of these cases and take appropriate
> > > > synchronization action pertaining to its private data, if needed.
> > >
> > > I don't think Abhishek has made his case here. See below.
> > >
> > > >
> > > > > (Hence I feel sol a) would be nice).
> > > >
> > > > With an entry-handler, any module aiming to profile running time of a
> > > > function (say) can simply do the following without being "return
> > > > instance" conscious. Note however that I'm not trying to address just
> > > > this scenario but trying to provide a general way to use
> > > > entry-handlers in kretprobes:
> > > >
> > > > static unsigned long flag = 0; /* use bit 0 as a global flag */
> > > > unsigned long long entry, exit;
> > > >
> > > > int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > > > {
> > > > if (!test_and_set_bit(0, &flag)) {
> > > > /* this instance claims the first entry to kretprobe'd function */
> > > > entry = sched_clock();
> > > > /* do other stuff */
> > > > return 0; /* right on! */
> > > > }
> > > > return 1; /* error: no return instance to be allocated for this
> > > > function entry */
> > > > }
> > > >
> > > > /* will only be called iff flag == 1 */
> > > > int my_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> > > > {
> > > > BUG_ON(!flag);
> > > > exit = sched_clock();
> > > > clear_bit(0, &flag);
> > > > return 0;
> > > > }
> > > >
> > > > I think something like this should do the trick for you.
> > >
> > > Since flag is static, it seems to me that if there were instances of the
> > > probed function active concurrently in multiple tasks, only the
> > > first-called instance would be profiled.
> >
> > Oh that's right, or we could use a per-cpu flag which would restrict
> > us to only one profiling instance per processor.
>
> If the probed function can sleep, then it could return on a different
> CPU; a per-cpu flag wouldn't work in such cases.
Hmmm...since disabling preemption from entry_handler won't work, a
more practical approach would be to associate all return instances to
tasks. I'll try to come up with an example to exploit the per-task
return instance associativity.
> >
> > > Jim Keniston
> >
> > --
> > Thanks & Regards
> > Abhishek Sagar
> >
> > ---
> > Signed-off-by: Abhishek Sagar <[email protected]>
> >
> > diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
> > linux-2.6.24-rc2_kp/include/linux/kprobes.h
> > --- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
> > +++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-15
> > 15:49:39.000000000 +0530
> > @@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
> > struct kretprobe {
> > struct kprobe kp;
> > kretprobe_handler_t handler;
> > + kretprobe_handler_t entry_handler;
> > int maxactive;
> > int nmissed;
> > struct hlist_head free_instances;
> > diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
> > linux-2.6.24-rc2_kp/kernel/kprobes.c
> > --- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
> > +++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-15 16:00:57.000000000 +0530
> > @@ -694,12 +694,24 @@ static int __kprobes pre_handler_kretpro
> > spin_lock_irqsave(&kretprobe_lock, flags);
> > if (!hlist_empty(&rp->free_instances)) {
> > struct kretprobe_instance *ri;
> > + struct pt_regs copy;
>
> NAK. Saving and restoring regs is expensive and inconsistent with
> existing kprobes usage.
Revising the patch with the suggested change:
---
Signed-off-by: Abhishek Sagar <[email protected]>
diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
linux-2.6.24-rc2_kp/include/linux/kprobes.h
--- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-16
22:50:24.000000000 +0530
@@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
+ kretprobe_handler_t entry_handler;
int maxactive;
int nmissed;
struct hlist_head free_instances;
@@ -164,6 +165,7 @@ struct kretprobe_instance {
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
+ void *data;
};
struct kretprobe_blackpoint {
diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
linux-2.6.24-rc2_kp/kernel/kprobes.c
--- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-16 22:53:46.000000000 +0530
@@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
struct kretprobe_instance, uflist);
ri->rp = rp;
ri->task = current;
+ ri->data = NULL;
+
+ if (rp->entry_handler) {
+ if (rp->entry_handler(ri, regs)) {
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ return 0; /* voluntary miss */
+ }
+ }
arch_prepare_kretprobe(ri, regs);
/* XXX(hch): why is there no hlist_move_head? */
On Nov 16, 2007 5:37 AM, Jim Keniston <[email protected]> wrote:
> On Thu, 2007-11-15 at 20:30 +0530, Abhishek Sagar wrote:
> > On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
> > > 2. Simplify the task of correlating data (e.g., timestamps) between
> > > function entry and function return.
> >
> > Would adding data and len fields to ri help? Instead of "pouching"
> > data in one go at registration time, this would let user handlers do
> > the allocation
>
> Yes and no. Adding just a data field -- void*, or maybe unsigned long
> long so it's big enough to accommodate big timestamps -- would be a big
> improvement on your current proposal. That would save the user the
> drudgery of mapping the ri pointer to his/her per-instance data.
> There's plenty of precedent for passing "private_data" values to
> callbacks.
>
> I don't think a len field would help much. If such info were needed, it
> could be stored in the data structure pointed to by the data field.
>
> I still don't think "letting [i.e., requiring that] user handlers do the
> allocation" is a win. I'm still interested to see how this plays out in
> real examples.
>
> > and allow them to use different kinds of data
> > structures per-instance.
>
> I haven't been able to think of any scenarios where this would be
> useful. A "data pouch" could always contain a union, FWIW.
I'm inlining a sample module which uses a data pointer in ri. I didn't
go for a timestamp because it's not reliable. Some platforms might
simply not have any h/w timestamp counters. For the same reason a lot
of platforms (on ARM, say) have their sched_clock() mapped on jiffies.
This may prevent timestamps from being distinct across function entry
and exit. Plus a data pointer looks pretty harmless.
--- test module ---
#include <linux/kernel.h>
#include <linux/version.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/ktime.h>
#define PRINT_DELAY (10 * HZ)
/* This module calculates the total time and instances of func being called
* across all cpu's. An average is calculated every 10 seconds and displayed.
* Only one function instance per-task is monitored. This cuts out the
* possibility of measuring time for recursive and nested function
* invocations.
*
* Note: If compiling as a standalone module, make sure sched_clock() is
* exported in the kernel. */
/* per-task data */
struct prof_data {
struct task_struct *task;
struct list_head list;
unsigned long long entry_stamp;
};
static const char *func = "sys_open";
static spinlock_t time_lock;
static ktime_t total_time;
static unsigned long hits;
static LIST_HEAD(data_nodes); /* list of per-task data */
static struct delayed_work work_print;
static struct prof_data *get_per_task_data(struct task_struct *tsk)
{
struct prof_data *p;
/* lookup prof_data corresponding to tsk */
list_for_each_entry(p, &data_nodes, list) {
if (p->task == tsk)
return p;
}
return NULL;
}
/* called with kretprobe_lock held */
static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
struct prof_data *stats;
stats = get_per_task_data(current);
if (stats)
return 1; /* recursive/nested call */
stats = kmalloc(sizeof(struct prof_data), GFP_ATOMIC);
if (!stats)
return 1;
stats->entry_stamp = sched_clock();
stats->task = current;
INIT_LIST_HEAD(&stats->list);
list_add(&stats->list, &data_nodes);
ri->data = stats;
return 0;
}
/* called with kretprobe_lock held */
static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
unsigned long flags;
struct prof_data *stats = (struct prof_data *)ri->data;
u64 elapsed;
BUG_ON(ri->data == NULL);
elapsed = (long long)sched_clock() - (long long)stats->entry_stamp;
/* update stats */
spin_lock_irqsave(&time_lock, flags);
++hits;
total_time = ktime_add_ns(total_time, elapsed);
spin_unlock_irqrestore(&time_lock, flags);
list_del(&stats->list);
kfree(stats);
return 0;
}
static struct kretprobe my_kretprobe = {
.handler = return_handler,
.entry_handler = entry_handler,
};
/* called after every PRINT_DELAY seconds */
static void print_time(struct work_struct *work)
{
unsigned long flags;
s64 time_ns;
struct timespec ts;
BUG_ON(work != &work_print.work);
spin_lock_irqsave(&time_lock, flags);
time_ns = ktime_to_ns(total_time);
if (hits)
do_div(time_ns, hits);
spin_unlock_irqrestore(&time_lock, flags);
ts = ns_to_timespec(time_ns);
printk(KERN_DEBUG "Avg. running time of %s = %ld sec, %ld nsec\n",
func, ts.tv_sec, ts.tv_nsec);
schedule_delayed_work(&work_print, PRINT_DELAY);
}
static int __init test_module_init(void)
{
int ret;
my_kretprobe.kp.symbol_name = (char *)func;
spin_lock_init(&time_lock);
if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
printk("Failed to register test kretprobe!\n");
return ret;
}
printk("Kretprobe active on %s\n", my_kretprobe.kp.symbol_name);
INIT_DELAYED_WORK(&work_print, print_time);
schedule_delayed_work(&work_print, PRINT_DELAY);
return 0;
}
static void __exit test_module_exit(void)
{
unregister_kretprobe(&my_kretprobe);
printk("kretprobe unregistered\n");
printk("Missed probing %d instances of %s\n",
my_kretprobe.nmissed, func);
}
module_init(test_module_init)
module_exit(test_module_exit)
MODULE_LICENSE("GPL");
--
Abhishek Sagar
On Sat, 2007-11-17 at 00:23 +0530, Abhishek Sagar wrote:
> On Nov 16, 2007 5:37 AM, Jim Keniston <[email protected]> wrote:
> > On Thu, 2007-11-15 at 20:30 +0530, Abhishek Sagar wrote:
> > > On Nov 15, 2007 4:21 AM, Jim Keniston <[email protected]> wrote:
> > > > 2. Simplify the task of correlating data (e.g., timestamps) between
> > > > function entry and function return.
> > >
> > > Would adding data and len fields to ri help? Instead of "pouching"
> > > data in one go at registration time, this would let user handlers do
> > > the allocation
> >
> > Yes and no. Adding just a data field -- void*, or maybe unsigned long
> > long so it's big enough to accommodate big timestamps -- would be a big
> > improvement on your current proposal. That would save the user the
> > drudgery of mapping the ri pointer to his/her per-instance data.
> > There's plenty of precedent for passing "private_data" values to
> > callbacks.
> >
> > I don't think a len field would help much. If such info were needed, it
> > could be stored in the data structure pointed to by the data field.
> >
> > I still don't think "letting [i.e., requiring that] user handlers do the
> > allocation" is a win. I'm still interested to see how this plays out in
> > real examples.
> >
> ...
>
> I'm inlining a sample module which uses a data pointer in ri. I didn't
> go for a timestamp because it's not reliable. Some platforms might
> simply not have any h/w timestamp counters. For the same reason a lot
> of platforms (on ARM, say) have their sched_clock() mapped on jiffies.
> This may prevent timestamps from being distinct across function entry
> and exit. Plus a data pointer looks pretty harmless.
>
> --- test module ---
>
> #include <linux/kernel.h>
> #include <linux/version.h>
> #include <linux/module.h>
> #include <linux/kprobes.h>
> #include <linux/ktime.h>
>
> #define PRINT_DELAY (10 * HZ)
>
> /* This module calculates the total time and instances of func being called
> * across all cpu's. An average is calculated every 10 seconds and displayed.
> * Only one function instance per-task is monitored. This cuts out the
> * possibility of measuring time for recursive and nested function
> * invocations.
> *
> * Note: If compiling as a standalone module, make sure sched_clock() is
> * exported in the kernel. */
>
> /* per-task data */
> struct prof_data {
> struct task_struct *task;
> struct list_head list;
> unsigned long long entry_stamp;
> };
>
> static const char *func = "sys_open";
> static spinlock_t time_lock;
> static ktime_t total_time;
> static unsigned long hits;
> static LIST_HEAD(data_nodes); /* list of per-task data */
> static struct delayed_work work_print;
>
> static struct prof_data *get_per_task_data(struct task_struct *tsk)
> {
> struct prof_data *p;
>
> /* lookup prof_data corresponding to tsk */
> list_for_each_entry(p, &data_nodes, list) {
> if (p->task == tsk)
> return p;
> }
> return NULL;
> }
>
> /* called with kretprobe_lock held */
> static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> struct prof_data *stats;
>
> stats = get_per_task_data(current);
> if (stats)
> return 1; /* recursive/nested call */
>
> stats = kmalloc(sizeof(struct prof_data), GFP_ATOMIC);
> if (!stats)
> return 1;
>
> stats->entry_stamp = sched_clock();
> stats->task = current;
> INIT_LIST_HEAD(&stats->list);
> list_add(&stats->list, &data_nodes);
> ri->data = stats;
> return 0;
> }
>
> /* called with kretprobe_lock held */
> static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> unsigned long flags;
> struct prof_data *stats = (struct prof_data *)ri->data;
> u64 elapsed;
>
> BUG_ON(ri->data == NULL);
> elapsed = (long long)sched_clock() - (long long)stats->entry_stamp;
>
> /* update stats */
> spin_lock_irqsave(&time_lock, flags);
> ++hits;
> total_time = ktime_add_ns(total_time, elapsed);
> spin_unlock_irqrestore(&time_lock, flags);
>
> list_del(&stats->list);
> kfree(stats);
> return 0;
> }
>
> static struct kretprobe my_kretprobe = {
> .handler = return_handler,
> .entry_handler = entry_handler,
> };
>
> /* called after every PRINT_DELAY seconds */
> static void print_time(struct work_struct *work)
> {
> unsigned long flags;
> s64 time_ns;
> struct timespec ts;
>
> BUG_ON(work != &work_print.work);
> spin_lock_irqsave(&time_lock, flags);
> time_ns = ktime_to_ns(total_time);
> if (hits)
> do_div(time_ns, hits);
> spin_unlock_irqrestore(&time_lock, flags);
>
> ts = ns_to_timespec(time_ns);
> printk(KERN_DEBUG "Avg. running time of %s = %ld sec, %ld nsec\n",
> func, ts.tv_sec, ts.tv_nsec);
> schedule_delayed_work(&work_print, PRINT_DELAY);
> }
>
> static int __init test_module_init(void)
> {
> int ret;
> my_kretprobe.kp.symbol_name = (char *)func;
>
> spin_lock_init(&time_lock);
> if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
> printk("Failed to register test kretprobe!\n");
> return ret;
> }
> printk("Kretprobe active on %s\n", my_kretprobe.kp.symbol_name);
> INIT_DELAYED_WORK(&work_print, print_time);
> schedule_delayed_work(&work_print, PRINT_DELAY);
> return 0;
> }
>
> static void __exit test_module_exit(void)
> {
> unregister_kretprobe(&my_kretprobe);
> printk("kretprobe unregistered\n");
> printk("Missed probing %d instances of %s\n",
> my_kretprobe.nmissed, func);
> }
>
> module_init(test_module_init)
> module_exit(test_module_exit)
> MODULE_LICENSE("GPL");
>
> --
> Abhishek Sagar
First of all, as promised, here's what would be different if it were
implemented using the data-pouch approach:
--- abhishek1.c 2007-11-16 13:57:13.000000000 -0800
+++ jim1.c 2007-11-16 14:20:39.000000000 -0800
@@ -50,15 +50,12 @@
if (stats)
return 1; /* recursive/nested call */
- stats = kmalloc(sizeof(struct prof_data), GFP_ATOMIC);
- if (!stats)
- return 1;
+ stats = (struct prof_data *) ri->entry_info;
stats->entry_stamp = sched_clock();
stats->task = current;
INIT_LIST_HEAD(&stats->list);
list_add(&stats->list, &data_nodes);
- ri->data = stats;
return 0;
}
@@ -66,10 +63,9 @@
static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
unsigned long flags;
- struct prof_data *stats = (struct prof_data *)ri->data;
+ struct prof_data *stats = (struct prof_data *)ri->entry_info;
u64 elapsed;
- BUG_ON(ri->data == NULL);
elapsed = (long long)sched_clock() - (long long)stats->entry_stamp;
/* update stats */
@@ -79,13 +75,13 @@
spin_unlock_irqrestore(&time_lock, flags);
list_del(&stats->list);
- kfree(stats);
return 0;
}
static struct kretprobe my_kretprobe = {
.handler = return_handler,
.entry_handler = entry_handler,
+ .entry_info_sz = sizeof(struct prof_data)
};
/* called after every PRINT_DELAY seconds */
So the data-pouch approach saves you a little code and a kmalloc/kfree
round trip on each kretprobe hit. A kmalloc/kfree round trip is about
80 ns on my system, or about 20% of the base cost of a kretprobe hit. I
don't know how important that is to people.
I also note that this particular example maintains a per-task list of
prof_data objects to avoid overcounting the time spent in a recursive
function. That adds about 30% to the size of your module source (136
lines vs. 106, by my count). I suspect that many instrumentation
modules wouldn't need such a list. However, without your ri->data
pointer (or Kevin's ri->entry_info pouch), every instrumentation module
that uses your enhancement would need such a list in order to map the ri
to the per-instance data.
Jim
It'd be helpful to see others (especially kprobes maintainers) chime in
on this. In particular, if doing kmalloc/kfree of GFP_ATOMIC data at
kretprobe-hit time is OK, as in Abhishek's approach, then we could also
use GFP_ATOMIC (or at least GFP_NOWAIT) allocations to make up the
difference when we run low on kretprobe_instances.
More comments below.
On Fri, 2007-11-16 at 23:20 +0530, Abhishek Sagar wrote:
> On Nov 16, 2007 2:46 AM, Jim Keniston <[email protected]> wrote:
...
>
> > > There are three problems in the "data pouch" approach, which is a
> > > generalized case of Srinivasa's timestamp case:
> > >
> > > 1. It bloats the size of each return instance to a run-time defined
> > > value. I'm certain that quite a few memory starved ARM kprobes users
> > > would certainly be wishing they could install more probes on their
> > > system without taking away too much from the precious memory pools
> > > which can impact their system performance. This is not a deal breaker
> > > though, just an annoyance.
> >
> > entry_info is, by default, a zero-length array, which adds nothing to
> > the size of a uretprobe_instance -- at least on the 3 architectures I've
> > tested on (i386, x86_64, powerpc).
>
> Strange, because from what I could gather, the data pouch patch had
> the following in the kretprobe registration routine:
>
>
> for (i = 0; i < rp->maxactive; i++) {
> - inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
> + inst = kmalloc((sizeof(struct kretprobe_instance)
> + + rp->entry_info_sz), GFP_KERNEL);
>
>
> rp->entry_info_sz is presumably the size of the private data structure
> of the registering module.
... which is zero for kretprobes that don't use the data pouch.
> This is the bloat I was referring to. But
> this difference should've showed up in your tests...?
What bloat? On my 32-bit system, the pouch to hold struct prof_data in
your test_module example would be 20 bytes. (For comparison,
sizeof(struct kretprobe_instance) = 28, btw.) Except for functions like
schedule(), where a lot of tasks can be sleeping at the same time, an
rp->maxactive value of 5 or 10 is typically plenty. That's 100-200
bytes of "bloat" spent at registration time (GFP_KERNEL), at least some
of which will be saved at probe-hit time (GFP_ATOMIC). (And if somebody
says, "I always use a much higher value of rp->maxactive," then he/she's
probably not really worried about bloat.)
>
> > >
> > > 2. Forces user entry/return handlers to use ri->entry_info (see
> > > Kevin's patch) even if their design only wanted such private data to
> > > be used in certain instances.
> >
> > No, it doesn't. Providing a feature isn't the same as forcing people to
> > use the feature.
>
> From the portion of the patch quoted above, it seems that private
> data is allocated per-instance at registration time, in one go.
> First of all, not all maxactive instances get used frequently. Even
> if they do, it is only an assumption that the user handlers would
> use this private data for a particular instance. So I guess it is
> better to leave allocation to the handlers and provide them with a
> data pointer in ri.
But as noted in my review of your sample module, hand-crafted, per-hit
allocation is more expensive both in terms of execution time and code
size/complexity.
>
...
> > >
> > > 3. It's redundant. 'ri' can uniquely identify any entry-return handler
> > > pair. This itself solves your problem #2. It only moves the onus of
> > > private data allocation to user handlers.
> >
> > Having ri available at function entry time is definitely a win, but
> > maintaining separate data structures and using ri to map to the right
> > one is no trivial task.
...
>
> True, some kind of data pointer/pouch is essential.
Yes. If the pouch idea is too weird, then the data pointer is a good
compromise.
With the above reservations, your enclosed patch looks OK.
You should provide a patch #2 that updates Documentation/kprobes.txt.
Maybe that will yield a little more review from other folks.
>
> > I suggest you show us a probe module that captures data at function
> > entry and reports it upon return, exploiting your proposed patch.
> > Profiling would be a reasonable example, but something where multiple
> > values are captured might be more relevant. (Your example below doesn't
> > count because it doesn't work.) Then I'll code up the same example
> > using your enhancement + the data pouch enhancement, and we can compare.
>
> I'll send a sample module which uses data pointer provided in ri in my
> next response.
Got it. Thanks. I've already commented.
Jim
>
> > Of course, the data pouch enhancement would build nicely atop your
> > enhancement, so it could be proposed and considered separately.
>
> Ok.
>
> > >
> > > > Review of Abhishek's patch:
...
> Revising the patch with the suggested change:
>
> ---
> Signed-off-by: Abhishek Sagar <[email protected]>
>
> diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
> linux-2.6.24-rc2_kp/include/linux/kprobes.h
> --- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-16
> 22:50:24.000000000 +0530
> @@ -152,6 +152,7 @@ static inline int arch_trampoline_kprobe
> struct kretprobe {
> struct kprobe kp;
> kretprobe_handler_t handler;
> + kretprobe_handler_t entry_handler;
> int maxactive;
> int nmissed;
> struct hlist_head free_instances;
> @@ -164,6 +165,7 @@ struct kretprobe_instance {
> struct kretprobe *rp;
> kprobe_opcode_t *ret_addr;
> struct task_struct *task;
> + void *data;
> };
>
> struct kretprobe_blackpoint {
> diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
> linux-2.6.24-rc2_kp/kernel/kprobes.c
> --- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-16 22:53:46.000000000 +0530
> @@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
> struct kretprobe_instance, uflist);
> ri->rp = rp;
> ri->task = current;
> + ri->data = NULL;
> +
> + if (rp->entry_handler) {
> + if (rp->entry_handler(ri, regs)) {
> + spin_unlock_irqrestore(&kretprobe_lock, flags);
> + return 0; /* voluntary miss */
> + }
> + }
> arch_prepare_kretprobe(ri, regs);
>
> /* XXX(hch): why is there no hlist_move_head? */
On Nov 17, 2007 4:39 AM, Jim Keniston <[email protected]> wrote:
> First of all, as promised, here's what would be different if it were
> implemented using the data-pouch approach:
>
> --- abhishek1.c 2007-11-16 13:57:13.000000000 -0800
> +++ jim1.c 2007-11-16 14:20:39.000000000 -0800
> @@ -50,15 +50,12 @@
> if (stats)
> return 1; /* recursive/nested call */
>
> - stats = kmalloc(sizeof(struct prof_data), GFP_ATOMIC);
> - if (!stats)
> - return 1;
> + stats = (struct prof_data *) ri->entry_info;
>
> stats->entry_stamp = sched_clock();
> stats->task = current;
> INIT_LIST_HEAD(&stats->list);
> list_add(&stats->list, &data_nodes);
> - ri->data = stats;
> return 0;
> }
>
> @@ -66,10 +63,9 @@
> static int return_handler(struct kretprobe_instance *ri, struct pt_regs
> *regs)
> {
> unsigned long flags;
> - struct prof_data *stats = (struct prof_data *)ri->data;
> + struct prof_data *stats = (struct prof_data *)ri->entry_info;
> u64 elapsed;
>
> - BUG_ON(ri->data == NULL);
> elapsed = (long long)sched_clock() - (long long)stats->entry_stamp;
>
> /* update stats */
> @@ -79,13 +75,13 @@
> spin_unlock_irqrestore(&time_lock, flags);
>
> list_del(&stats->list);
> - kfree(stats);
> return 0;
> }
>
> static struct kretprobe my_kretprobe = {
> .handler = return_handler,
> .entry_handler = entry_handler,
> + .entry_info_sz = sizeof(struct prof_data)
> };
>
> /* called after every PRINT_DELAY seconds */
>
> So the data-pouch approach saves you a little code and a kmalloc/kfree
> round trip on each kretprobe hit. A kmalloc/kfree round trip is about
> 80 ns on my system, or about 20% of the base cost of a kretprobe hit. I
> don't know how important that is to people.
>
> I also note that this particular example maintains a per-task list of
> prof_data objects to avoid overcounting the time spent in a recursive
> function. That adds about 30% to the size of your module source (136
> lines vs. 106, by my count). I suspect that many instrumentation
> modules wouldn't need such a list. However, without your ri->data
> pointer (or Kevin's ri->entry_info pouch), every instrumentation module
> that uses your enhancement would need such a list in order to map the ri
> to the per-instance data.
Those are interesting numbers. I'll incorporate pouching in the next
patch. Even with a data pointer or pouch, mapping the ri (or
ri->data) would sometimes be necessary; it's required to catch
recursive/nested invocations. In the case of the time-measurement
test module, these invocations needed to be weeded out, so such a
list was required. Other scenarios might not care for it, e.g. a
module which measures the change in some global system state across
every call.
Thanks for the comments.
> Jim
- Abhishek
On Nov 17, 2007 6:24 AM, Jim Keniston <[email protected]> wrote:
> It'd be helpful to see others (especially kprobes maintainers) chime in
> on this. In particular, if doing kmalloc/kfree of GFP_ATOMIC data at
> kretprobe-hit time is OK, as in Abhishek's approach, then we could also
> use GFP_ATOMIC (or at least GFP_NOWAIT) allocations to make up the
> difference when we run low on kretprobe_instances.
It might cause a problem if return instances with a large
entry_info_sz are allocated in the page-frame reclamation code
path.
> > > entry_info is, by default, a zero-length array, which adds nothing to
> > > the size of a uretprobe_instance -- at least on the 3 architectures I've
> > > tested on (i386, x86_64, powerpc).
> >
> > Strange, because from what I could gather, the data pouch patch had
> > the following in the kretprobe registration routine:
> >
> >
> > for (i = 0; i < rp->maxactive; i++) {
> > - inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
> > + inst = kmalloc((sizeof(struct kretprobe_instance)
> > + + rp->entry_info_sz), GFP_KERNEL);
> >
> >
> > rp->entry_info_sz is presumably the size of the private data structure
> > of the registering module.
>
> ... which is zero for kretprobes that don't use the data pouch.
>
> > This is the bloat I was referring to. But
> > this difference should've showed up in your tests...?
>
> What bloat? On my 32-bit system, the pouch to hold struct prof_data in
> your test_module example would be 20 bytes. (For comparison,
> sizeof(struct kretprobe_instance) = 28, btw.) Except for functions like
> schedule(), where a lot of tasks can be sleeping at the same time, an
> rp->maxactive value of 5 or 10 is typically plenty. That's 100-200
> bytes of "bloat" spent at registration time (GFP_KERNEL), at least some
> of which will be saved at probe-hit time (GFP_ATOMIC). (And if somebody
> says, "I always use a much higher value of rp->maxactive," then he/she's
> probably not really worried about bloat.)
Ok. Will make the necessary transition to registration-time
allocation of private data.
> Yes. If the pouch idea is too weird, then the data pointer is a good
> compromise.
>
> With the above reservations, your enclosed patch looks OK.
>
> You should provide a patch #2 that updates Documentation/kprobes.txt.
> Maybe that will yield a little more review from other folks.
Will incorporate changes to kprobes.txt as well.
- Abhishek
On Nov 17, 2007 6:24 AM, Jim Keniston <[email protected]> wrote:
> > True, some kind of data pointer/pouch is essential.
>
> Yes. If the pouch idea is too weird, then the data pointer is a good
> compromise.
>
> With the above reservations, your enclosed patch looks OK.
>
> You should provide a patch #2 that updates Documentation/kprobes.txt.
> Maybe that will yield a little more review from other folks.
The inlined patch adds support for an optional user entry-handler
in kretprobes. It also adds provision for private data to be held in
each return instance, based on Kevin Stafford's "data pouch" approach.
The kretprobe usage example in Documentation/kprobes.txt has also been
updated to demonstrate the use of entry-handlers.
Signed-off-by: Abhishek Sagar <[email protected]>
---
diff -upNr linux-2.6.24-rc2/Documentation/kprobes.txt
linux-2.6.24-rc2_kp/Documentation/kprobes.txt
--- linux-2.6.24-rc2/Documentation/kprobes.txt 2007-11-07
03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/Documentation/kprobes.txt 2007-11-19
17:41:27.000000000 +0530
@@ -100,16 +100,21 @@ prototype matches that of the probed fun
When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function. When the probed function is called and this
-probe is hit, Kprobes saves a copy of the return address, and replaces
-the return address with the address of a "trampoline." The trampoline
-is an arbitrary piece of code -- typically just a nop instruction.
-At boot time, Kprobes registers a kprobe at the trampoline.
-
-When the probed function executes its return instruction, control
-passes to the trampoline and that probe is hit. Kprobes' trampoline
-handler calls the user-specified handler associated with the kretprobe,
-then sets the saved instruction pointer to the saved return address,
-and that's where execution resumes upon return from the trap.
+probe is hit, the user defined entry_handler is invoked (optional). If
+the entry_handler returns 0 (success) or is not present, then Kprobes
+saves a copy of the return address, and replaces the return address
+with the address of a "trampoline." If the entry_handler returns a
+non-zero error, the function executes as normal, as if no probe was
+installed on it. The trampoline is an arbitrary piece of code --
+typically just a nop instruction. At boot time, Kprobes registers a
+kprobe at the trampoline.
+
+After the trampoline return address is planted, when the probed function
+executes its return instruction, control passes to the trampoline and
+that probe is hit. Kprobes' trampoline handler calls the user-specified
+return handler associated with the kretprobe, then sets the saved
+instruction pointer to the saved return address, and that's where
+execution resumes upon return from the trap.
While the probed function is executing, its return address is
stored in an object of type kretprobe_instance. Before calling
@@ -117,6 +122,9 @@ register_kretprobe(), the user sets the
kretprobe struct to specify how many instances of the specified
function can be probed simultaneously. register_kretprobe()
pre-allocates the indicated number of kretprobe_instance objects.
+Additionally, a user may also specify per-instance private data to
+be part of each return instance. This is useful when using kretprobes
+with a user entry_handler (see "register_kretprobe" for details).
For example, if the function is non-recursive and is called with a
spinlock held, maxactive = 1 should be enough. If the function is
@@ -129,7 +137,8 @@ It's not a disaster if you set maxactive
some probes. In the kretprobe struct, the nmissed field is set to
zero when the return probe is registered, and is incremented every
time the probed function is entered but there is no kretprobe_instance
-object available for establishing the return probe.
+object available for establishing the return probe. A miss also prevents
+the user entry_handler from being invoked.
2. Architectures Supported
@@ -258,6 +267,16 @@ Establishes a return probe for the funct
rp->kp.addr. When that function returns, Kprobes calls rp->handler.
You must set rp->maxactive appropriately before you call
register_kretprobe(); see "How Does a Return Probe Work?" for details.
+An optional entry-handler can also be specified by initializing
+rp->entry_handler. This handler is called at the beginning of the
+probed function (except for instances exceeding rp->maxactive). On
+success the entry_handler returns 0, which guarantees invocation of
+a corresponding return handler. Corresponding entry and return handler
+invocations can be matched using the return instance (ri) passed to them.
+Also, each ri can hold per-instance private data (ri->data), whose size
+is determined by rp->data_size. If the entry_handler returns a non-zero
+error, the current return instance is skipped and no return handler is
+called for the current instance.
register_kretprobe() returns 0 on success, or a negative errno
otherwise.
@@ -555,23 +574,46 @@ report failed calls to sys_open().
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
+#include <linux/ktime.h>
static const char *probed_func = "sys_open";
-/* Return-probe handler: If the probed function fails, log the return value. */
-static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
+/* per-instance private data */
+struct my_data {
+ ktime_t entry_stamp;
+};
+
+static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
+{
+ struct my_data *data;
+
+	if (!current->mm)
+ return 1; /* skip kernel threads */
+
+ data = (struct my_data *)ri->data;
+ data->entry_stamp = ktime_get();
+ return 0;
+}
+
+static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
int retval = regs_return_value(regs);
- if (retval < 0) {
- printk("%s returns %d\n", probed_func, retval);
- }
+ struct my_data *data = (struct my_data *)ri->data;
+ s64 delta;
+ ktime_t now = ktime_get();
+
+ delta = ktime_to_ns(ktime_sub(now, data->entry_stamp));
+ if (retval < 0) /* probed function failed; log retval and duration */
+ printk("%s: return val = %d (duration = %lld ns)\n",
+ probed_func, retval, delta);
return 0;
}
static struct kretprobe my_kretprobe = {
- .handler = ret_handler,
- /* Probe up to 20 instances concurrently. */
- .maxactive = 20
+ .handler = return_handler,
+ .entry_handler = entry_handler,
+ .data_size = sizeof(struct my_data),
+ .maxactive = 1, /* profile one invocation at a time */
};
static int __init kretprobe_init(void)
@@ -580,10 +622,10 @@ static int __init kretprobe_init(void)
my_kretprobe.kp.symbol_name = (char *)probed_func;
if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
- printk("register_kretprobe failed, returned %d\n", ret);
+ printk("Failed to register kretprobe!\n");
return -1;
}
- printk("Planted return probe at %p\n", my_kretprobe.kp.addr);
+ printk("Kretprobe active on %s\n", my_kretprobe.kp.symbol_name);
return 0;
}
@@ -591,7 +633,6 @@ static void __exit kretprobe_exit(void)
{
unregister_kretprobe(&my_kretprobe);
printk("kretprobe unregistered\n");
- /* nmissed > 0 suggests that maxactive was set too low. */
printk("Missed probing %d instances of %s\n",
my_kretprobe.nmissed, probed_func);
}
diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
linux-2.6.24-rc2_kp/include/linux/kprobes.h
--- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-19
15:56:20.000000000 +0530
@@ -152,8 +152,10 @@ static inline int arch_trampoline_kprobe
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
+ kretprobe_handler_t entry_handler;
int maxactive;
int nmissed;
+ size_t data_size;
struct hlist_head free_instances;
struct hlist_head used_instances;
};
@@ -164,6 +166,7 @@ struct kretprobe_instance {
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
+ char data[0];
};
struct kretprobe_blackpoint {
diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
linux-2.6.24-rc2_kp/kernel/kprobes.c
--- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
+++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-19 15:34:18.000000000 +0530
@@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
struct kretprobe_instance, uflist);
ri->rp = rp;
ri->task = current;
+
+ if (rp->entry_handler) {
+ if (rp->entry_handler(ri, regs)) {
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ return 0;
+ }
+ }
+
arch_prepare_kretprobe(ri, regs);
/* XXX(hch): why is there no hlist_move_head? */
@@ -745,7 +753,8 @@ int __kprobes register_kretprobe(struct
INIT_HLIST_HEAD(&rp->used_instances);
INIT_HLIST_HEAD(&rp->free_instances);
for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
+ inst = kmalloc(sizeof(struct kretprobe_instance) +
+ rp->data_size, GFP_KERNEL);
if (inst == NULL) {
free_rp_inst(rp);
return -ENOMEM;
On Mon, 2007-11-19 at 17:56 +0530, Abhishek Sagar wrote:
> On Nov 17, 2007 6:24 AM, Jim Keniston <[email protected]> wrote:
> > > True, some kind of data pointer/pouch is essential.
> >
> > Yes. If the pouch idea is too weird, then the data pointer is a good
> > compromise.
> >
> > With the above reservations, your enclosed patch looks OK.
> >
> > You should provide a patch #2 that updates Documentation/kprobes.txt.
> > Maybe that will yield a little more review from other folks.
>
> The inlined patch provides support for an optional user entry-handler
> in kretprobes. It also adds provision for private data to be held in
> each return instance based on Kevin Stafford's "data pouch" approach.
> Kretprobe usage example in Documentation/kprobes.txt has also been
> updated to demonstrate the usage of entry-handlers.
>
> Signed-off-by: Abhishek Sagar <[email protected]>
Thanks for doing this.
I have one minor suggestion on the code -- see below -- but I'm willing
to ack that with or without the suggested change. Please also see
suggestions on kprobes.txt and the demo program.
Jim Keniston
> ---
> diff -upNr linux-2.6.24-rc2/Documentation/kprobes.txt
> linux-2.6.24-rc2_kp/Documentation/kprobes.txt
> --- linux-2.6.24-rc2/Documentation/kprobes.txt 2007-11-07
> 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/Documentation/kprobes.txt 2007-11-19
> 17:41:27.000000000 +0530
> @@ -100,16 +100,21 @@ prototype matches that of the probed fun
>
> When you call register_kretprobe(), Kprobes establishes a kprobe at
> the entry to the function. When the probed function is called and this
> -probe is hit, Kprobes saves a copy of the return address, and replaces
> -the return address with the address of a "trampoline." The trampoline
> -is an arbitrary piece of code -- typically just a nop instruction.
> -At boot time, Kprobes registers a kprobe at the trampoline.
> -
> -When the probed function executes its return instruction, control
> -passes to the trampoline and that probe is hit. Kprobes' trampoline
> -handler calls the user-specified handler associated with the kretprobe,
> -then sets the saved instruction pointer to the saved return address,
> -and that's where execution resumes upon return from the trap.
> +probe is hit, the user defined entry_handler is invoked (optional). If
probe is hit, the user-defined entry_handler, if any, is invoked. If
> +the entry_handler returns 0 (success) or is not present, then Kprobes
> +saves a copy of the return address, and replaces the return address
> +with the address of a "trampoline." If the entry_handler returns a
> +non-zero error, the function executes as normal, as if no probe was
> +installed on it.
non-zero value, Kprobes leaves the return address as is, and the
kretprobe has no further effect for that particular function instance.
> The trampoline is an arbitrary piece of code --
> +typically just a nop instruction. At boot time, Kprobes registers a
> +kprobe at the trampoline.
> +
> +After the trampoline return address is planted, when the probed function
> +executes its return instruction, control passes to the trampoline and
> +that probe is hit. Kprobes' trampoline handler calls the user-specified
> +return handler associated with the kretprobe, then sets the saved
> +instruction pointer to the saved return address, and that's where
> +execution resumes upon return from the trap.
>
> While the probed function is executing, its return address is
> stored in an object of type kretprobe_instance. Before calling
> @@ -117,6 +122,9 @@ register_kretprobe(), the user sets the
> kretprobe struct to specify how many instances of the specified
> function can be probed simultaneously. register_kretprobe()
> pre-allocates the indicated number of kretprobe_instance objects.
> +Additionally, a user may also specify per-instance private data to
> +be part of each return instance. This is useful when using kretprobes
> +with a user entry_handler (see "register_kretprobe" for details).
>
> For example, if the function is non-recursive and is called with a
> spinlock held, maxactive = 1 should be enough. If the function is
> @@ -129,7 +137,8 @@ It's not a disaster if you set maxactive
> some probes. In the kretprobe struct, the nmissed field is set to
> zero when the return probe is registered, and is incremented every
> time the probed function is entered but there is no kretprobe_instance
> -object available for establishing the return probe.
> +object available for establishing the return probe. A miss also prevents
> +the user entry_handler from being invoked.
>
> 2. Architectures Supported
>
> @@ -258,6 +267,16 @@ Establishes a return probe for the funct
> rp->kp.addr. When that function returns, Kprobes calls rp->handler.
> You must set rp->maxactive appropriately before you call
> register_kretprobe(); see "How Does a Return Probe Work?" for details.
It would be more consistent with the existing text in kprobes.txt to add
a subsection labeled "User's entry handler (rp->entry_handler):" and
document the entry_handler there.
> +An optional entry-handler can also be specified by initializing
> +rp->entry_handler. This handler is called at the beginning of the
> +probed function (except for instances exceeding rp->maxactive). On
> +success the entry_handler returns 0, which guarantees invocation of
> +a corresponding return handler. Corresponding entry and return handler
> +invocations can be matched using the return instance (ri) passed to them.
> +Also, each ri can hold per-instance private data (ri->data), whose size
> +is determined by rp->data_size. If the entry_handler returns a non-zero
> +error, the current return instance is skipped
Unclear? It might be clearer to say that "the current
kretprobe_instance is recycled."
> and no return handler is
> +called for the current instance.
>
> register_kretprobe() returns 0 on success, or a negative errno
> otherwise.
> @@ -555,23 +574,46 @@ report failed calls to sys_open().
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/kprobes.h>
> +#include <linux/ktime.h>
>
> static const char *probed_func = "sys_open";
>
> -/* Return-probe handler: If the probed function fails, log the return value. */
> -static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> +/* per-instance private data */
> +struct my_data {
> + ktime_t entry_stamp;
> +};
> +
> +static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> +{
> + struct my_data *data;
> +
> +	if (!current->mm)
> + return 1; /* skip kernel threads */
> +
> + data = (struct my_data *)ri->data;
> + data->entry_stamp = ktime_get();
> + return 0;
> +}
> +
> +static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> int retval = regs_return_value(regs);
> - if (retval < 0) {
> - printk("%s returns %d\n", probed_func, retval);
> - }
> + struct my_data *data = (struct my_data *)ri->data;
> + s64 delta;
> + ktime_t now = ktime_get();
> +
> + delta = ktime_to_ns(ktime_sub(now, data->entry_stamp));
> + if (retval < 0) /* probed function failed; log retval and duration */
> + printk("%s: return val = %d (duration = %lld ns)\n",
> + probed_func, retval, delta);
> return 0;
> }
>
> static struct kretprobe my_kretprobe = {
> - .handler = ret_handler,
> - /* Probe up to 20 instances concurrently. */
> - .maxactive = 20
> + .handler = return_handler,
> + .entry_handler = entry_handler,
> + .data_size = sizeof(struct my_data),
> + .maxactive = 1, /* profile one invocation at a time */
I don't like the idea of setting maxactive = 1 here. That's not normal
kretprobes usage, which is what we're trying to illustrate here. This
is no place for splitting hairs about profiling recursive functions.
> };
>
> static int __init kretprobe_init(void)
> @@ -580,10 +622,10 @@ static int __init kretprobe_init(void)
> my_kretprobe.kp.symbol_name = (char *)probed_func;
>
> if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
> - printk("register_kretprobe failed, returned %d\n", ret);
> + printk("Failed to register kretprobe!\n");
> return -1;
> }
> - printk("Planted return probe at %p\n", my_kretprobe.kp.addr);
> + printk("Kretprobe active on %s\n", my_kretprobe.kp.symbol_name);
> return 0;
> }
>
> @@ -591,7 +633,6 @@ static void __exit kretprobe_exit(void)
> {
> unregister_kretprobe(&my_kretprobe);
> printk("kretprobe unregistered\n");
> - /* nmissed > 0 suggests that maxactive was set too low. */
> printk("Missed probing %d instances of %s\n",
> my_kretprobe.nmissed, probed_func);
> }
> diff -upNr linux-2.6.24-rc2/include/linux/kprobes.h
> linux-2.6.24-rc2_kp/include/linux/kprobes.h
> --- linux-2.6.24-rc2/include/linux/kprobes.h 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/include/linux/kprobes.h 2007-11-19
> 15:56:20.000000000 +0530
> @@ -152,8 +152,10 @@ static inline int arch_trampoline_kprobe
> struct kretprobe {
> struct kprobe kp;
> kretprobe_handler_t handler;
> + kretprobe_handler_t entry_handler;
> int maxactive;
> int nmissed;
> + size_t data_size;
> struct hlist_head free_instances;
> struct hlist_head used_instances;
> };
> @@ -164,6 +166,7 @@ struct kretprobe_instance {
> struct kretprobe *rp;
> kprobe_opcode_t *ret_addr;
> struct task_struct *task;
> + char data[0];
> };
>
> struct kretprobe_blackpoint {
> diff -upNr linux-2.6.24-rc2/kernel/kprobes.c
> linux-2.6.24-rc2_kp/kernel/kprobes.c
> --- linux-2.6.24-rc2/kernel/kprobes.c 2007-11-07 03:27:46.000000000 +0530
> +++ linux-2.6.24-rc2_kp/kernel/kprobes.c 2007-11-19 15:34:18.000000000 +0530
> @@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
> struct kretprobe_instance, uflist);
> ri->rp = rp;
> ri->task = current;
> +
> + if (rp->entry_handler) {
> + if (rp->entry_handler(ri, regs)) {
Could also be
if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> + spin_unlock_irqrestore(&kretprobe_lock, flags);
> + return 0;
> + }
> + }
> +
> arch_prepare_kretprobe(ri, regs);
>
> /* XXX(hch): why is there no hlist_move_head? */
> @@ -745,7 +753,8 @@ int __kprobes register_kretprobe(struct
> INIT_HLIST_HEAD(&rp->used_instances);
> INIT_HLIST_HEAD(&rp->free_instances);
> for (i = 0; i < rp->maxactive; i++) {
> - inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
> + inst = kmalloc(sizeof(struct kretprobe_instance) +
> + rp->data_size, GFP_KERNEL);
> if (inst == NULL) {
> free_rp_inst(rp);
> return -ENOMEM;
On 11/21/07, Jim Keniston <[email protected]> wrote:
> I have one minor suggestion on the code -- see below -- but I'm willing
> to ack that with or without the suggested change. Please also see
> suggestions on kprobes.txt and the demo program.
Thanks...I've made the necessary changes. More comments below.
> It would be more consistent with the existing text in kprobes.txt to add
> a subsection labeled "User's entry handler (rp->entry_handler):" and
> document the entry_handler there.
I've moved almost all of the entry_handler discussion into a separate section.
> > static struct kretprobe my_kretprobe = {
> > - .handler = ret_handler,
> > - /* Probe up to 20 instances concurrently. */
> > - .maxactive = 20
> > + .handler = return_handler,
> > + .entry_handler = entry_handler,
> > + .data_size = sizeof(struct my_data),
> > + .maxactive = 1, /* profile one invocation at a time */
>
> I don't like the idea of setting maxactive = 1 here. That's not normal
> kretprobes usage, which is what we're trying to illustrate here. This
> is no place for splitting hairs about profiling recursive functions.
I've reverted to ".maxactive = 20". In any case, there is no need
to protect against recursive instances of sys_open.
> > @@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
> > struct kretprobe_instance, uflist);
> > ri->rp = rp;
> > ri->task = current;
> > +
> > + if (rp->entry_handler) {
> > + if (rp->entry_handler(ri, regs)) {
>
> Could also be
> if (rp->entry_handler && rp->entry_handler(ri, regs)) {
Done.
---
Signed-off-by: Abhishek Sagar <[email protected]>
diff -upNr linux-2.6.24-rc3/Documentation/kprobes.txt
linux-2.6.24-rc3_kp/Documentation/kprobes.txt
--- linux-2.6.24-rc3/Documentation/kprobes.txt 2007-11-17
10:46:36.000000000 +0530
+++ linux-2.6.24-rc3_kp/Documentation/kprobes.txt 2007-11-21
15:20:53.000000000 +0530
@@ -96,7 +96,9 @@ or in registers (e.g., for x86_64 or for
The jprobe will work in either case, so long as the handler's
prototype matches that of the probed function.
-1.3 How Does a Return Probe Work?
+1.3 Return Probes
+
+1.3.1 How Does a Return Probe Work?
When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function. When the probed function is called and this
@@ -107,7 +109,7 @@ At boot time, Kprobes registers a kprobe
When the probed function executes its return instruction, control
passes to the trampoline and that probe is hit. Kprobes' trampoline
-handler calls the user-specified handler associated with the kretprobe,
+handler calls the user-specified return handler associated with the kretprobe,
then sets the saved instruction pointer to the saved return address,
and that's where execution resumes upon return from the trap.
@@ -131,6 +133,30 @@ zero when the return probe is registered
time the probed function is entered but there is no kretprobe_instance
object available for establishing the return probe.
+1.3.2 Kretprobe entry-handler
+
+Kretprobes also provides an optional user-specified handler which runs
+on function entry. This handler is specified by setting the entry_handler
+field of the kretprobe struct. Whenever the kprobe placed by the kretprobe
+at function entry is hit, the user-defined entry_handler, if any, is invoked.
+If the entry_handler returns 0 (success), then a corresponding return handler
+is guaranteed to be called upon function return. If the entry_handler
+returns a non-zero error, then Kprobes leaves the return address as is, and
+the kretprobe has no further effect for that particular function instance.
+
+Multiple entry and return handler invocations are matched using the unique
+kretprobe_instance object associated with them. Additionally, a user
+may specify per return-instance private data to be made part of each
+kretprobe_instance object. This is especially useful for sharing private
+data between corresponding entry and return handlers. The size of each
+private data object can be specified at kretprobe registration time by
+setting the data_size field of the kretprobe struct. This data can be
+accessed through the data field of each kretprobe_instance object.
+
+If the probed function is entered but no kretprobe_instance object is
+available, then in addition to incrementing the nmissed count, the
+invocation of the user-specified entry_handler is also skipped.
+
2. Architectures Supported
Kprobes, jprobes, and return probes are implemented on the following
@@ -273,6 +299,8 @@ of interest:
- ret_addr: the return address
- rp: points to the corresponding kretprobe object
- task: points to the corresponding task struct
+- data: points to per return-instance private data; see "Kretprobe entry-
+ handler" for details.
The regs_return_value(regs) macro provides a simple abstraction to
extract the return value from the appropriate register as defined by
@@ -555,23 +583,46 @@ report failed calls to sys_open().
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
+#include <linux/ktime.h>
static const char *probed_func = "sys_open";
-/* Return-probe handler: If the probed function fails, log the return value. */
-static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
+/* per-instance private data */
+struct my_data {
+ ktime_t entry_stamp;
+};
+
+static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
+{
+ struct my_data *data;
+
+ if (!current->mm)
+ return 1; /* skip kernel threads */
+
+ data = (struct my_data *)ri->data;
+ data->entry_stamp = ktime_get();
+ return 0;
+}
+
+static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
int retval = regs_return_value(regs);
- if (retval < 0) {
- printk("%s returns %d\n", probed_func, retval);
- }
+ struct my_data *data = (struct my_data *)ri->data;
+ s64 delta;
+ ktime_t now = ktime_get();
+
+ delta = ktime_to_ns(ktime_sub(now, data->entry_stamp));
+ if (retval < 0) /* probed function failed; log retval and duration */
+ printk("%s: return val = %d (duration = %lld ns)\n",
+ probed_func, retval, delta);
return 0;
}
static struct kretprobe my_kretprobe = {
- .handler = ret_handler,
- /* Probe up to 20 instances concurrently. */
- .maxactive = 20
+ .handler = return_handler,
+ .entry_handler = entry_handler,
+ .data_size = sizeof(struct my_data),
+ .maxactive = 20, /* probe up to 20 instances concurrently */
};
static int __init kretprobe_init(void)
@@ -580,10 +631,10 @@ static int __init kretprobe_init(void)
my_kretprobe.kp.symbol_name = (char *)probed_func;
if ((ret = register_kretprobe(&my_kretprobe)) < 0) {
- printk("register_kretprobe failed, returned %d\n", ret);
+ printk("Failed to register kretprobe!\n");
return -1;
}
- printk("Planted return probe at %p\n", my_kretprobe.kp.addr);
+ printk("Kretprobe active on %s\n", my_kretprobe.kp.symbol_name);
return 0;
}
@@ -591,7 +642,7 @@ static void __exit kretprobe_exit(void)
{
unregister_kretprobe(&my_kretprobe);
printk("kretprobe unregistered\n");
- /* nmissed > 0 suggests that maxactive was set too low. */
+ /* nmissed > 0 suggests that maxactive was set too low */
printk("Missed probing %d instances of %s\n",
my_kretprobe.nmissed, probed_func);
}
diff -upNr linux-2.6.24-rc3/include/linux/kprobes.h
linux-2.6.24-rc3_kp/include/linux/kprobes.h
--- linux-2.6.24-rc3/include/linux/kprobes.h 2007-11-17 10:46:36.000000000 +0530
+++ linux-2.6.24-rc3_kp/include/linux/kprobes.h 2007-11-21
13:40:48.000000000 +0530
@@ -152,8 +152,10 @@ static inline int arch_trampoline_kprobe
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
+ kretprobe_handler_t entry_handler;
int maxactive;
int nmissed;
+ size_t data_size;
struct hlist_head free_instances;
struct hlist_head used_instances;
};
@@ -164,6 +166,7 @@ struct kretprobe_instance {
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
+ char data[0];
};
struct kretprobe_blackpoint {
diff -upNr linux-2.6.24-rc3/kernel/kprobes.c
linux-2.6.24-rc3_kp/kernel/kprobes.c
--- linux-2.6.24-rc3/kernel/kprobes.c 2007-11-17 10:46:36.000000000 +0530
+++ linux-2.6.24-rc3_kp/kernel/kprobes.c 2007-11-21 13:41:57.000000000 +0530
@@ -699,6 +699,12 @@ static int __kprobes pre_handler_kretpro
struct kretprobe_instance, uflist);
ri->rp = rp;
ri->task = current;
+
+ if (rp->entry_handler && rp->entry_handler(ri, regs)) {
+ spin_unlock_irqrestore(&kretprobe_lock, flags);
+ return 0;
+ }
+
arch_prepare_kretprobe(ri, regs);
/* XXX(hch): why is there no hlist_move_head? */
@@ -745,7 +751,8 @@ int __kprobes register_kretprobe(struct
INIT_HLIST_HEAD(&rp->used_instances);
INIT_HLIST_HEAD(&rp->free_instances);
for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
+ inst = kmalloc(sizeof(struct kretprobe_instance) +
+ rp->data_size, GFP_KERNEL);
if (inst == NULL) {
free_rp_inst(rp);
return -ENOMEM;
On Wed, 2007-11-21 at 15:50 +0530, Abhishek Sagar wrote:
> On 11/21/07, Jim Keniston <[email protected]> wrote:
> > I have one minor suggestion on the code -- see below -- but I'm willing
> > to ack that with or without the suggested change. Please also see
> > suggestions on kprobes.txt and the demo program.
>
> Thanks...I've made the necessary changes. More comments below.
>
> > It would be more consistent with the existing text in kprobes.txt to add
> > a subsection labeled "User's entry handler (rp->entry_handler):" and
> > document the entry_handler there.
>
> I've moved almost all of the entry_handler discussion in a separate section.
>
> > > static struct kretprobe my_kretprobe = {
> > > - .handler = ret_handler,
> > > - /* Probe up to 20 instances concurrently. */
> > > - .maxactive = 20
> > > + .handler = return_handler,
> > > + .entry_handler = entry_handler,
> > > + .data_size = sizeof(struct my_data),
> > > + .maxactive = 1, /* profile one invocation at a time */
> >
> > I don't like the idea of setting maxactive = 1 here. That's not normal
> > kretprobes usage, which is what we're trying to illustrate here. This
> > is no place for splitting hairs about profiling recursive functions.
>
> I've reverted back to ".maxactive = 20". In any case, there is no need
> to protect against recursive instances of sys_open.
>
> > > @@ -699,6 +699,14 @@ static int __kprobes pre_handler_kretpro
> > > struct kretprobe_instance, uflist);
> > > ri->rp = rp;
> > > ri->task = current;
> > > +
> > > + if (rp->entry_handler) {
> > > + if (rp->entry_handler(ri, regs)) {
> >
> > Could also be
> > if (rp->entry_handler && rp->entry_handler(ri, regs)) {
>
> Done.
>
> ---
> Signed-off-by: Abhishek Sagar <[email protected]>
Acked-by: Jim Keniston <[email protected]>
This works for me with the revised kretprobe-example.c and with another
test program I had lying around.
Jim