2012-05-02 19:34:32

by Steven Rostedt

Subject: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

From: Steven Rostedt <[email protected]>

If a probe is placed on an ftrace nop (or ftrace_caller), simply move the
probe to the next instruction instead of rejecting it. This keeps kprobes
unaffected by ftrace using the -mfentry gcc option, which puts the ftrace
nop at the very beginning of the function. Since a very common use of
kprobes is to add a probe at the start of a function, we need a way to
handle the case where ftrace owns the first instruction.

Added KPROBE_FLAG_MOVED (as suggested by Masami), which is set when the
address is moved to get around an ftrace nop.
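
For example (illustrative only, not part of this change), a caller probing
a function entry can detect the move after registration:

/*
 * Illustrative sketch: probe a function entry that now starts with the
 * -mfentry nop.  The symbol name is only an example.
 */
static struct kprobe kp = {
	.symbol_name	= "do_fork",
};

static int __init example_init(void)
{
	int ret = register_kprobe(&kp);

	if (!ret && (kp.flags & KPROBE_FLAG_MOVED))
		pr_info("probe moved past the ftrace nop to %p\n", kp.addr);
	return ret;
}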

Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/ftrace.h | 2 ++
include/linux/kprobes.h | 1 +
kernel/kprobes.c | 13 ++++++++++++-
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9310993..211ae45 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -414,6 +414,8 @@ static inline int ftrace_text_reserved(void *start, void *end)
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)

+static inline unsigned long ftrace_location(unsigned long ip) { return 0; }
+
static inline ssize_t ftrace_filter_write(struct file *file, const char __user *ubuf,
size_t cnt, loff_t *ppos) { return -ENODEV; }
static inline ssize_t ftrace_notrace_write(struct file *file, const char __user *ubuf,
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index b6e1f8c..23cf41e 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -128,6 +128,7 @@ struct kprobe {
* NOTE:
* this flag is only for optimized_kprobe.
*/
+#define KPROBE_FLAG_MOVED 8 /* probe was moved past ftrace nop */

/* Has this kprobe gone ? */
static inline int kprobe_gone(struct kprobe *p)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c62b854..952619b 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1319,10 +1319,22 @@ int __kprobes register_kprobe(struct kprobe *p)
struct kprobe *old_p;
struct module *probed_mod;
kprobe_opcode_t *addr;
+ unsigned long ftrace_addr;

addr = kprobe_addr(p);
if (IS_ERR(addr))
return PTR_ERR(addr);
+
+ /*
+ * If the address is located on a ftrace nop, set the
+ * breakpoint to the following instruction.
+ */
+ ftrace_addr = ftrace_location((unsigned long)addr);
+ if (unlikely(ftrace_addr)) {
+ addr = (kprobe_opcode_t *)(ftrace_addr + MCOUNT_INSN_SIZE);
+ p->flags |= KPROBE_FLAG_MOVED;
+ }
+
p->addr = addr;

ret = check_kprobe_rereg(p);
@@ -1333,7 +1345,6 @@ int __kprobes register_kprobe(struct kprobe *p)
preempt_disable();
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr) ||
- ftrace_text_reserved(p->addr, p->addr) ||
jump_label_text_reserved(p->addr, p->addr)) {
ret = -EINVAL;
goto cannot_probe;
--
1.7.9.5


2012-05-02 20:40:30

by Frank Ch. Eigler

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)


rostedt wrote:

> [...] Added KPROBE_FLAG_MOVED (as suggested by Masami) that is set
> when the address is moved to get around an ftrace nop. [...]

Steve, perhaps my earlier comments on this got lost during the mailing
list outage.

The gist is that a KPROBE_FLAG_MOVED being set this way accomplishes
very little since nothing is looking for that flag. Instead, you
should patch {arch/*}/kernel/kprobe.c kprobe_handler() to subtract
MCOUNT_INSN_SIZE back out from pt_regs->ip if KPROBE_FLAG_MOVED was
set. That way, kprobes clients need not perceive the int3 movement.
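
Roughly, such an adjustment might look like the sketch below (illustrative
only; the helper name is made up, and it assumes the KPROBE_FLAG_MOVED bit
from the patch above):

/*
 * Sketch only: hide the probe movement from kprobes clients by reporting
 * the originally requested address.
 */
static void kprobe_fixup_moved_ip(struct kprobe *p, struct pt_regs *regs)
{
	if (p->flags & KPROBE_FLAG_MOVED)
		regs->ip -= MCOUNT_INSN_SIZE;
}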

- FChE

2012-05-02 23:40:36

by Steven Rostedt

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Wed, 2012-05-02 at 16:40 -0400, Frank Ch. Eigler wrote:
> rostedt wrote:
>
> > [...] Added KPROBE_FLAG_MOVED (as suggested by Masami) that is set
> > when the address is moved to get around an ftrace nop. [...]
>
> Steve, perhaps my earlier comments on this got lost during the mailing
> list outage.

I saw it, but it didn't really specify what you wanted. Here's your
comment:


> I suspect Masami intended that this flag is later used during int3
> processing to subtract MCOUNT_INSN_SIZE back out from the pt_regs->ip
> during kprobe_handler() if this flag was set.

This is what I thought too, but to me it sounded like Masami could do
the work. I was just setting up a flag to make it possible.

>
> The gist is that a KPROBE_FLAG_MOVED being set this way accomplishes
> very little since nothing is looking for that flag. Instead, you
> should patch {arch/*}/kernel/kprobe.c kprobe_handler() to subtract
> MCOUNT_INSN_SIZE back out from pt_regs->ip if KPROBE_FLAG_MOVED was
> set. That way, kprobes clients need do not perceive the int3 movement.

I basically thought that Masami wanted me to add the flag, and then
others could look for this and do the adjustment. I'm not the kprobes
author. I was just adding a flag that Masami and others could use to do
such updates.

I'm not sure if the adjustment is fine with everyone, as it may cause
repercussions that I don't know about.

Perhaps that could be another patch (want to write it?)

-- Steve

Subject: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/03 8:40), Steven Rostedt wrote:
> On Wed, 2012-05-02 at 16:40 -0400, Frank Ch. Eigler wrote:
>> rostedt wrote:
>>
>>> [...] Added KPROBE_FLAG_MOVED (as suggested by Masami) that is set
>>> when the address is moved to get around an ftrace nop. [...]
>>
>> Steve, perhaps my earlier comments on this got lost during the mailing
>> list outage.
>
> I saw it, but it didn't really specify what you wanted. Here's your
> comment:
>
>
>> I suspect Masami intended that this flag is later used during int3
>> processing to subtract MCOUNT_INSN_SIZE back out from the pt_regs->ip
>> during kprobe_handler() if this flag was set.
>
> This is what I thought too, but to me it sounded like Masami could do
> the work. I was just setting up a flag to make it possible.
>
>>
>> The gist is that a KPROBE_FLAG_MOVED being set this way accomplishes
>> very little since nothing is looking for that flag. Instead, you
>> should patch {arch/*}/kernel/kprobe.c kprobe_handler() to subtract
>> MCOUNT_INSN_SIZE back out from pt_regs->ip if KPROBE_FLAG_MOVED was
>> set. That way, kprobes clients need do not perceive the int3 movement.
>
> I basically thought that Masami wanted me to add the flag, and then
> others could look for this and do the adjustment. I'm not the kprobes
> author. I was just adding a flag that Masami and others could use to do
> such updates.

Right, that was what I thought. Since kp->addr is changed when the
kprobe is set, kprobes itself doesn't need to adjust pt_regs->ip.
I mean, the struct kprobe itself puts the probe on the instruction
following the mcount entry, even if the caller tried to put a probe
on the mcount entry.

This change may be unintended, and the caller may wonder why
kp->addr was changed automatically. So KPROBE_FLAG_MOVED gives
a hint to callers who know the originally intended probe address.

> I'm not sure if the adjustment is fine with everyone, as it may cause
> repercussions that I don't know about.

Yeah, that's a point. If the adjustment is done transparently, there
is no problem. But it changes kp->addr when registering a probe.
Even if the adjustment is done, the following code (still) doesn't work:

---
int func(struct kprobe *kp, struct pt_regs *regs)
{
	BUG_ON((unsigned long)kp->addr != regs->ip);
	/* or */
	store_probed_address(kp->addr); /* since regs->ip depends on x86 */
}

kp->pre_handler = func;
kp->addr = <somewhere on ftrace>;
register_kprobe(kp);
---

But if the adjustment is not done, at least the kprobes behavior itself
looks the same (the probe is just moved if it is placed on ftrace).

Yeah, I know the systemtap people would like regs->ip to be adjusted, but
there may be someone who uses raw kprobes.

> Perhaps that could be another patch (want to write it?)

Oh, so I think we need to show the new flag in debugfs for
anyone who wants to know why the probe has been moved. :)


By the way, there is another way to do this transparently: add a
"real_addr" member to struct kprobe and store the real probed address
in that member (without changing kp->addr). This would keep API
compatibility.
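
For illustration only (real_addr is not an existing member; this is just a
sketch of the registration side):

/*
 * Sketch: record the actually-armed address separately so that kp->addr
 * stays exactly as the caller wrote it.  "real_addr" is hypothetical.
 */
static void kprobe_set_real_addr(struct kprobe *p)
{
	unsigned long ftrace_addr = ftrace_location((unsigned long)p->addr);

	p->real_addr = p->addr;
	if (ftrace_addr)	/* requested address sits on the ftrace nop */
		p->real_addr = (kprobe_opcode_t *)(ftrace_addr + MCOUNT_INSN_SIZE);
}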

Thank you,


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/07 20:37), Masami Hiramatsu wrote:
> (2012/05/03 8:40), Steven Rostedt wrote:
>> On Wed, 2012-05-02 at 16:40 -0400, Frank Ch. Eigler wrote:
>>> rostedt wrote:
>>>
>>>> [...] Added KPROBE_FLAG_MOVED (as suggested by Masami) that is set
>>>> when the address is moved to get around an ftrace nop. [...]
>>>
>>> Steve, perhaps my earlier comments on this got lost during the mailing
>>> list outage.
>>
>> I saw it, but it didn't really specify what you wanted. Here's your
>> comment:
>>
>>
>>> I suspect Masami intended that this flag is later used during int3
>>> processing to subtract MCOUNT_INSN_SIZE back out from the pt_regs->ip
>>> during kprobe_handler() if this flag was set.
>>
>> This is what I thought too, but to me it sounded like Masami could do
>> the work. I was just setting up a flag to make it possible.
>>
>>>
>>> The gist is that a KPROBE_FLAG_MOVED being set this way accomplishes
>>> very little since nothing is looking for that flag. Instead, you
>>> should patch {arch/*}/kernel/kprobe.c kprobe_handler() to subtract
>>> MCOUNT_INSN_SIZE back out from pt_regs->ip if KPROBE_FLAG_MOVED was
>>> set. That way, kprobes clients need do not perceive the int3 movement.
>>
>> I basically thought that Masami wanted me to add the flag, and then
>> others could look for this and do the adjustment. I'm not the kprobes
>> author. I was just adding a flag that Masami and others could use to do
>> such updates.
>
> Right, that was what I thought. Since the kp->addr is changed when
> kprobe is set, kprobes itself don't need to adjust the pt_regs->ip.
> I mean, struct kprobe itself puts a probe on the next to the mcount
> entry, even if the caller tries to put a probe on the mcount entry.
>
> This change may be unintended and caller will doubt that why the
> kp->addr is automatically changed. So this KPROBE_FLAG_MOVED gives
> a hint for the caller who knows the original intended probed address.
>
>> I'm not sure if the adjustment is fine with everyone, as it may cause
>> repercussions that I don't know about.
>
> Yeah, that's a point. if the adjustment is transparently done, there
> is no problem. But it changes kp->addr when registering a probe.
> If adjustment is done, following code (still) doesn't work.
>
> ---
> int func(struct kprobe *kp, strcut pt_regs *regs)
> {
> BUG_ON(kp->addr != regs->ip);
> /* or */
> store_probed_address(kp->addr); /* since regs->ip depends on x86*/
> }
>
> kp->handler = func;
> kp->addr = <somewhere on ftrace>
> register_kprobe(kp);
> ---
>
> but if adjustment is not done, at least, kprobes behavior itself
> looks same. (but just be moved if probed on ftrace)
>
> Yeah, I know systemtap people likes regs->ip to be adjusted, but
> there may be someone who use raw kprobes.
>
>> Perhaps that could be another patch (want to write it?)
>
> Oh, so I think we need to show the new flag on debugfs for
> someone who want to know why the probe has been moved. :)

Hmm, I hit on another good idea. :)

Add an optional flag for kprobes, something like KPROBE_FLAG_ALLOWMOVE,
and only if it is set does kprobes move a probe placed on ftrace and
adjust pt_regs (on arches which support both dynamic ftrace and kprobes).
If not, it rejects the probe.

This will not break any backward compatibility and also encapsulates the
arch-dependent address adjustment. (And it can be done as separate
patches.)
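
A rough sketch of how register_kprobe() could act on such a flag
(illustrative only; KPROBE_FLAG_ALLOWMOVE does not exist yet):

/*
 * Sketch only: opt-in movement.  KPROBE_FLAG_ALLOWMOVE is hypothetical.
 */
static int kprobe_handle_ftrace_site(struct kprobe *p, unsigned long ftrace_addr)
{
	if (!(p->flags & KPROBE_FLAG_ALLOWMOVE))
		return -EINVAL;		/* caller did not opt in: reject */

	p->addr = (kprobe_opcode_t *)(ftrace_addr + MCOUNT_INSN_SIZE);
	p->flags |= KPROBE_FLAG_MOVED;	/* arch code may adjust regs->ip later */
	return 0;
}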

BTW, Steven, is this series already available in some git repository?
I'd like to pull it and work on top of it.

Thank you,


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2012-05-07 12:43:27

by Steven Rostedt

Subject: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Mon, 2012-05-07 at 20:37 +0900, Masami Hiramatsu wrote:

> By the way, there is another way to do that transparently which
> we add a "real_addr" member to struct kprobes and put the real
> probed address to the member (without changing kp->addr). This
> will keep API compatibility.

I like this a lot. Perhaps we don't need a flag at all, and just use
these two addrs instead?

-- Steve

2012-05-07 12:44:45

by Steven Rostedt

Subject: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Mon, 2012-05-07 at 20:59 +0900, Masami Hiramatsu wrote:

> BTW, Steven, is this series already put on some git repository?
> I'd like to pull it to work on that.

It's currently only on my local machines because I rebase it a lot. I
can push it out for you to play with, but it is not in its final form
(yet) and may be rebased once again. Thus, don't rely on it ;-)

-- Steve

2012-05-07 12:51:18

by Steven Rostedt

Subject: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Mon, 2012-05-07 at 08:44 -0400, Steven Rostedt wrote:
> On Mon, 2012-05-07 at 20:59 +0900, Masami Hiramatsu wrote:
>
> > BTW, Steven, is this series already put on some git repository?
> > I'd like to pull it to work on that.
>
> It's currently only on my local machines because I rebase it a lot. I
> can push it out for you to play with, but it is not in its final form
> (yet) and may be rebased once again. Thus, don't rely on it ;-)
>

I just pushed this branch to my repo:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git

tip/perf/next

-- Steve

Subject: Re: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/07 21:43), Steven Rostedt wrote:
> On Mon, 2012-05-07 at 20:37 +0900, Masami Hiramatsu wrote:
>
>> By the way, there is another way to do that transparently which
>> we add a "real_addr" member to struct kprobes and put the real
>> probed address to the member (without changing kp->addr). This
>> will keep API compatibility.
>
> I like this a lot. Perhaps we don't need a flag at all, and just use
> these two addrs instead?

Yes, just replace all the "*p->addr" and "kp.addr" uses with real_addr
and copy addr to real_addr when registering. That's the basic idea.

I'm just concerned about the cost balance... this is feasible, but
we need to change many arch-dependent parts. Do we really need to
pay that cost just for backward compatibility?

I mean, kprobes itself can be changed because we don't guarantee the
stability of kernel APIs. If so, it seems enough to change the behavior
of kprobes and give users an upgrade hint (the current patch),
doesn't it?

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2012-05-08 13:04:58

by Steven Rostedt

Subject: Re: Re: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Tue, 2012-05-08 at 12:08 +0900, Masami Hiramatsu wrote:
> (2012/05/07 21:43), Steven Rostedt wrote:
> > On Mon, 2012-05-07 at 20:37 +0900, Masami Hiramatsu wrote:
> >
> >> By the way, there is another way to do that transparently which
> >> we add a "real_addr" member to struct kprobes and put the real
> >> probed address to the member (without changing kp->addr). This
> >> will keep API compatibility.
> >
> > I like this a lot. Perhaps we don't need a flag at all, and just use
> > these two addrs instead?
>
> Yes, just replacing all "*p->addr" and "kp.addr" with real_addr
> and copying addr to real_addr when registering. That's the basic
> idea.
>
> I just concern about the cost balance... this is feasible, but
> we need to change many arch-dependent parts. Do we really need to
> pay that cost just for the backward compatibility?
>
> I mean, the kprobes itself can be changed because we don't ensure
> the stability of kernel APIs. If so, it looks enough to change the
> behavior of kprobes and give an upgrade hint for users (current patch),
> isn't it?
>
> Thank you,

I guess the question is what's best long term. That's what I'd like to
aim for. If a flag is "good enough" for both now and the long term, then
that's fine with me. But if we find that it would be better to have a
"real_addr", then we should do it now and bite the bullet with all archs.

Otherwise, we'll have all the archs doing something special with the
MOVE flag, and that would cause even more pain to update later.

I also like the real addr because it helps with optimized probes. We
only need to search one location. This doesn't matter with this patch
set, but it does with the code I have that uses ftrace hooks. One solution
there is to have the optimization code see that the probe was moved (or
that its real addr was on an ftrace nop) and then just use the ftrace code
for the optimization instead of the normal optimization (replacing with a
jump).

Note, the big difference between using ftrace optimization and normal
kprobe jump optimization is that the ftrace one can be used on a preempt
kernel. But this code is still under development. I want to get a
solution for the current code (this patch set) now. It would be nice if
it were ready for 3.5.


-- Steve

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/08 22:04), Steven Rostedt wrote:
> I guess the question is what's best long term. That's what I would like
> to do. If a flag is "good enough" for both now and long term, than
> that's fine with me. But if we find that it would be better to have a
> "real_addr" then we should do it now and bite the bullet with all archs.

Well, I was not sure whether the probe-address-moving approach was meant
as the short-term solution. Maybe I was wrong about that.

>
> Otherwise, we'll have all the archs doing something special with the
> MOVE flag and that would cause even more pain to update it later.

Just a comment. If a user finds that the MOVE flag is set, they can
choose to either:
- reject the probing request that lands on ftrace, or
- store the original IP in another variable and use that instead of
kp->regs (sketched below).
So they don't need to adjust the address for each arch. :)
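
As a sketch of the second option (illustrative only; the handler name is
made up):

/*
 * Sketch only: recover the originally requested address in the caller's
 * pre-handler when the probe was moved past the ftrace nop.
 */
static int my_pre_handler(struct kprobe *kp, struct pt_regs *regs)
{
	unsigned long orig = (unsigned long)kp->addr;

	if (kp->flags & KPROBE_FLAG_MOVED)
		orig -= MCOUNT_INSN_SIZE;	/* the mcount/fentry call site */

	pr_info("probe hit, requested address was 0x%lx\n", orig);
	return 0;
}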


> I also like the real addr because it helps with the optimize probes. We
> only need to search one location. This doesn't matter with this patch
> set, but with the code I have that uses ftrace hooks. One solution with
> that is to have the optimize code see that the probe was moved, (or its
> real addr was on a ftrace nop) and then just use the ftrace code on
^^^^^^^^^ would you mean addr? :)
> optimization instead of normal optimizations (replacing with a jump).

OK, I misunderstood. I thought that ftrace optimization could replace
the probe-address-moving solution, but it can't.
For example, a jprobe, which puts a probe at the entry of a function and
changes the IP to a special handler, cannot be optimized even with ftrace.
Thus, we still need to move the probe address to the next instruction.
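
(For context, a minimal jprobe looks roughly like the sketch below, modeled
on the in-tree jprobe sample; the point is that the whole mechanism depends
on intercepting the real function entry.)

static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
		     struct pt_regs *regs, unsigned long stack_size,
		     int __user *parent_tidptr, int __user *child_tidptr)
{
	pr_info("do_fork: clone_flags=0x%lx\n", clone_flags);
	jprobe_return();	/* always required at the end of a jprobe handler */
	return 0;
}

static struct jprobe my_jprobe = {
	.entry			= jdo_fork,
	.kp = {
		.symbol_name	= "do_fork",
	},
};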

So, I agree with you. We need the real_addr solution for transparently
moving the probe point.

> Note, the big difference with using ftrace optimization and normal
> kprobe jump optimization is that the ftrace one can be used on a preempt
> kernel. But this code is still under development. I want to get a
> solution for the current code (this patch set) now. It would be nice if
> it was ready for 3.5.

I doubt that we can really do this. If that were possible, I could make
jump optimization work on a preemptive kernel.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/09 14:53), Masami Hiramatsu wrote:
> (2012/05/08 22:04), Steven Rostedt wrote:
>> Note, the big difference with using ftrace optimization and normal
>> kprobe jump optimization is that the ftrace one can be used on a preempt
>> kernel. But this code is still under development. I want to get a
>> solution for the current code (this patch set) now. It would be nice if
>> it was ready for 3.5.
>
> I doubt that we can really do this. If this is possible, I can make
> jump optimization work with preemptive kernel.

Now I'm clear that it can be done, but only with the ftrace nop.
What I was concerned about is the timing of releasing the out-of-line
execution buffer, but with the ftrace nop we don't need that at all.

Theoretically, it would be done in the following steps:

1. call mcount (which pushes the call-site address+5)
2. mcount calls the ftrace handlers
3. mcount makes pt_regs on the stack
 - on x86-32, regs->cs should be the call-site address+5, since
 regs->sp must be the top of the original stack.
 - on x86-64, it is enough that regs->sp points to the top of the
 original stack.
4. mcount stores the call-site address+5 to %ax
5. mcount calls the new optprobe handler (which returns the call-site address+5)
6. mcount restores the registers
7. mcount returns to the call-site address+5.

Between 6 and 7, the original optprobe has to do out-of-line execution,
but this ftrace-based one doesn't need to. In that case, we don't need
to allocate or release an out-of-line execution buffer for this probe.

Thank you!

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

(2012/05/09 17:12), Masami Hiramatsu wrote:
> (2012/05/09 14:53), Masami Hiramatsu wrote:
>> (2012/05/08 22:04), Steven Rostedt wrote:
>>> Note, the big difference with using ftrace optimization and normal
>>> kprobe jump optimization is that the ftrace one can be used on a preempt
>>> kernel. But this code is still under development. I want to get a
>>> solution for the current code (this patch set) now. It would be nice if
>>> it was ready for 3.5.
>>
>> I doubt that we can really do this. If this is possible, I can make
>> jump optimization work with preemptive kernel.
>
> Now I'm clear that it can be done only with ftrace-nop.
> What I concerned with is out-of-line execution buffer releasing
> timing. but with the ftrace-nop, we don't need to do that.
>
> Theoretically, that will be done in following steps;
>
> 1. call mcount (w/ push call-site address+5)
> 2. mcount calls ftrace handlers
> 3. mcount makes pt_regs on the stack
> - on x86-32, regs->cs should be the call-site address+5, since
> regs->sp must be the top of original stack.

Oops, no: the flags are stored in regs->cs and regs->flags holds the
call-site address. So we have to fix it up in the optprobe handler.

> - on x86-64, it is enough that regs->sp points the top of original
> stack.
> 4. mcount store call-site address+5 to %ax
> 5. mcount calls new-optprobe handler (which returns call-site address+5)
> 6. mcount restores registers
> 7. mcount returns to the call-site address+5.
>
> Between 6 and 7, original opt-probe has to do out-of-line execution,
> but this ftrace-based one doesn't need to do that. In that case,
> we don't need to allocate nor to release out-of-line execution buffer
> for this probe.

And I've got a concrete idea for kprobes call optimization, which
uses a call instead of a jump. That will make ftrace-based optimization
very easy. :)

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2012-05-09 14:50:15

by Steven Rostedt

Subject: Re: [PATCH 6/9][RFC] kprobes: Allow probe on ftrace reserved text (but move it)

On Wed, 2012-05-09 at 18:02 +0900, Masami Hiramatsu wrote:

> And I got an concrete idea of kprobes call-optimization which
> uses call instead of jump. That will make ftrace-based optimization
> very easy. :)

I just uploaded my devel branch. If you want to make suggestions, or
even modify and add to what I have, feel free:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git

rfc/kprobes/ftrace-v2

-- Steve

Subject: [RFC PATCH -tip ] x86/kprobes: kprobes call optimization

Note: this code is still under development, but it should
be useful for the discussion to show it here.

Use a relative call instead of a relative jump for optimizing
kprobes. This reduces the x86-dependent magic code and the
runtime memory size.

The current jump-based optimization copies the regions below
into an out-of-line buffer and adds some portions to make a
detour path for each optimized probe.

ffffffff815f7009 T optprobe_template_entry
ffffffff815f7029 T optprobe_template_val
ffffffff815f7033 T optprobe_template_call
ffffffff815f7068 T optprobe_template_end

This actually consumes 0x5f == 95 bytes + the copied
instructions (20 bytes) + the jump back (5 bytes).

Since call-based optimization lets all probes share the
trampoline code above, each optimized probe requires just
25 bytes.

Steven, I saw your ftrace-based optimization code. This
will do a similar thing, but in a different way. I'd like
to reuse this code for getting pt_regs on both i386 and x86-64.
So, that would be done as follows:

- the ftrace call site calls mcount
- mcount calls the ftrace handlers
- mcount checks if there is a kprobe;
  if so, it jumps into the trampoline code
- the trampoline code behaves the same way as when it is called from
  the probe point directly, but if it was called from mcount,
  it returns directly to the ftrace caller site.

Thank you,

Cc: Steven Rostedt <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---

Documentation/kprobes.txt | 39 +++++---
arch/x86/include/asm/kprobes.h | 15 +--
arch/x86/kernel/kprobes-common.h | 31 ++++--
arch/x86/kernel/kprobes-opt.c | 191 ++++++++++++++++----------------------
arch/x86/kernel/kprobes.c | 4 -
kernel/kprobes.c | 6 -
6 files changed, 131 insertions(+), 155 deletions(-)

diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 0cfb00f..316d5d2 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -45,7 +45,7 @@ can speed up unregistration process when you have to unregister
a lot of probes at once.

The next four subsections explain how the different types of
-probes work and how jump optimization works. They explain certain
+probes work and how call optimization works. They explain certain
things that you'll need to know in order to make the best use of
Kprobes -- e.g., the difference between a pre_handler and
a post_handler, and how to use the maxactive and nmissed fields of
@@ -163,12 +163,12 @@ In case probed function is entered but there is no kretprobe_instance
object available, then in addition to incrementing the nmissed count,
the user entry_handler invocation is also skipped.

-1.4 How Does Jump Optimization Work?
+1.4 How Does Call Optimization Work?

If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
is automatically set 'y' on x86/x86-64, non-preemptive kernel) and
the "debug.kprobes_optimization" kernel parameter is set to 1 (see
-sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
+sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a call
instruction instead of a breakpoint instruction at each probepoint.

1.4.1 Init a Kprobe
@@ -182,9 +182,9 @@ probepoint, there'll be a probe there.

Before optimizing a probe, Kprobes performs the following safety checks:

-- Kprobes verifies that the region that will be replaced by the jump
+- Kprobes verifies that the region that will be replaced by the call
instruction (the "optimized region") lies entirely within one function.
-(A jump instruction is multiple bytes, and so may overlay multiple
+(A call instruction is multiple bytes, and so may overlay multiple
instructions.)

- Kprobes analyzes the entire function and verifies that there is no
@@ -204,9 +204,6 @@ the instruction can be executed out of line.

Next, Kprobes prepares a "detour" buffer, which contains the following
instruction sequence:
-- code to push the CPU's registers (emulating a breakpoint trap)
-- a call to the trampoline code which calls user's probe handlers.
-- code to restore registers
- the instructions from the optimized region
- a jump back to the original execution path.

@@ -231,7 +228,7 @@ the CPU's instruction pointer to the copied code in the detour buffer

1.4.5 Optimization

-The Kprobe-optimizer doesn't insert the jump instruction immediately;
+The Kprobe-optimizer doesn't insert the call instruction immediately;
rather, it calls synchronize_sched() for safety first, because it's
possible for a CPU to be interrupted in the middle of executing the
optimized region(*). As you know, synchronize_sched() can ensure
@@ -240,10 +237,20 @@ was called are done, but only if CONFIG_PREEMPT=n. So, this version
of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)

After that, the Kprobe-optimizer calls stop_machine() to replace
-the optimized region with a jump instruction to the detour buffer,
-using text_poke_smp().
+the optimized region with a call instruction to the trampoline code,
+by using text_poke_smp().

-1.4.6 Unoptimization
+1.4.6 Trampoline code
+
+The trampoline code does the following:
+- saves all registers (to make pt_regs on the stack)
+- passes the call address (which was on the top of the original stack)
+ and pt_regs to the optprobe callback
+- the optprobe callback finds the kprobe at the call address and
+ returns the address of the detour buffer
+- restores all registers and returns to that address
+
+1.4.7 Unoptimization

When an optimized kprobe is unregistered, disabled, or blocked by
another kprobe, it will be unoptimized. If this happens before
@@ -571,7 +578,7 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

-On x86/x86-64, since the Jump Optimization of Kprobes modifies
+On x86/x86-64, since the Call Optimization of Kprobes modifies
instructions widely, there are some limitations to optimization. To
explain it, we introduce some terminology. Imagine a 3-instruction
sequence consisting of a two 2-byte instructions and one 3-byte
@@ -593,7 +600,7 @@ DCR: Detoured Code Region

The instructions in DCR are copied to the out-of-line buffer
of the kprobe, because the bytes in DCR are replaced by
-a 5-byte jump instruction. So there are several limitations.
+a 5-byte call instruction. So there are several limitations.

a) The instructions in DCR must be relocatable.
b) The instructions in DCR must not include a call instruction.
@@ -704,8 +711,8 @@ Appendix B: The kprobes sysctl interface
/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.

When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
-a knob to globally and forcibly turn jump optimization (see section
-1.4) ON or OFF. By default, jump optimization is allowed (ON).
+a knob to globally and forcibly turn the call optimization (see section
+1.4) ON or OFF. By default, call optimization is allowed (ON).
If you echo "0" to this file or set "debug.kprobes_optimization" to
0 via sysctl, all optimized probes will be unoptimized, and any new
probes registered after that will not be optimized. Note that this
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 5478825..c6e0dec 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -36,6 +36,7 @@ typedef u8 kprobe_opcode_t;
#define RELATIVEJUMP_OPCODE 0xe9
#define RELATIVEJUMP_SIZE 5
#define RELATIVECALL_OPCODE 0xe8
+#define RELATIVECALL_SIZE 5
#define RELATIVE_ADDR_SIZE 4
#define MAX_STACK_SIZE 64
#define MIN_STACK_SIZE(ADDR) \
@@ -47,16 +48,10 @@ typedef u8 kprobe_opcode_t;

#define flush_insn_slot(p) do { } while (0)

-/* optinsn template addresses */
-extern kprobe_opcode_t optprobe_template_entry;
-extern kprobe_opcode_t optprobe_template_val;
-extern kprobe_opcode_t optprobe_template_call;
-extern kprobe_opcode_t optprobe_template_end;
-#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
-#define MAX_OPTINSN_SIZE \
- (((unsigned long)&optprobe_template_end - \
- (unsigned long)&optprobe_template_entry) + \
- MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)
+/* optprobe trampoline addresses */
+void optprobe_trampoline(void);
+#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
+#define MAX_OPTINSN_SIZE (MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)

extern const int kretprobe_blacklist_size;

diff --git a/arch/x86/kernel/kprobes-common.h b/arch/x86/kernel/kprobes-common.h
index 3230b68..fedc8a3 100644
--- a/arch/x86/kernel/kprobes-common.h
+++ b/arch/x86/kernel/kprobes-common.h
@@ -4,9 +4,8 @@
/* Kprobes and Optprobes common header */

#ifdef CONFIG_X86_64
-#define SAVE_REGS_STRING \
- /* Skip cs, ip, orig_ax. */ \
- " subq $24, %rsp\n" \
+#define SAVE_REGS_STRING_OFF(offs) \
+ " subq $" #offs ", %rsp\n" \
" pushq %rdi\n" \
" pushq %rsi\n" \
" pushq %rdx\n" \
@@ -22,7 +21,7 @@
" pushq %r13\n" \
" pushq %r14\n" \
" pushq %r15\n"
-#define RESTORE_REGS_STRING \
+#define RESTORE_REGS_STRING_OFF(offs) \
" popq %r15\n" \
" popq %r14\n" \
" popq %r13\n" \
@@ -38,12 +37,15 @@
" popq %rdx\n" \
" popq %rsi\n" \
" popq %rdi\n" \
- /* Skip orig_ax, ip, cs */ \
- " addq $24, %rsp\n"
+ " addq $" #offs ", %rsp\n"
+
+/* skip cs, ip, orig_ax */
+#define SAVE_REGS_STRING SAVE_REGS_STRING_OFF(24)
+#define RESTORE_REGS_STRING RESTORE_REGS_STRING_OFF(24)
+
#else
-#define SAVE_REGS_STRING \
- /* Skip cs, ip, orig_ax and gs. */ \
- " subl $16, %esp\n" \
+#define SAVE_REGS_STRING_OFF(offs) \
+ " subl $" #offs ", %esp\n" \
" pushl %fs\n" \
" pushl %es\n" \
" pushl %ds\n" \
@@ -54,7 +56,7 @@
" pushl %edx\n" \
" pushl %ecx\n" \
" pushl %ebx\n"
-#define RESTORE_REGS_STRING \
+#define RESTORE_REGS_STRING_OFF(offs) \
" popl %ebx\n" \
" popl %ecx\n" \
" popl %edx\n" \
@@ -62,8 +64,13 @@
" popl %edi\n" \
" popl %ebp\n" \
" popl %eax\n" \
- /* Skip ds, es, fs, gs, orig_ax, and ip. Note: don't pop cs here*/\
- " addl $24, %esp\n"
+ " addl $" #offs ", %esp\n"
+
+/* skip cs, ip, orig_ax and gs. */
+#define SAVE_REGS_STRING SAVE_REGS_STRING_OFF(16)
+/* skip ds, es, fs, gs, orig_ax, and ip. Note: don't pop cs here*/
+#define RESTORE_REGS_STRING RESTORE_REGS_STRING_OFF(24)
+
#endif

/* Ensure if the instruction can be boostable */
diff --git a/arch/x86/kernel/kprobes-opt.c b/arch/x86/kernel/kprobes-opt.c
index c5e410e..0b0a382 100644
--- a/arch/x86/kernel/kprobes-opt.c
+++ b/arch/x86/kernel/kprobes-opt.c
@@ -1,5 +1,5 @@
/*
- * Kernel Probes Jump Optimization (Optprobes)
+ * Kernel Probes Call Optimization (Optprobes)
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -46,9 +46,9 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
long offs;
int i;

- for (i = 0; i < RELATIVEJUMP_SIZE; i++) {
+ for (i = 0; i < RELATIVECALL_SIZE; i++) {
kp = get_kprobe((void *)addr - i);
- /* This function only handles jump-optimized kprobe */
+ /* This function only handles call-optimized kprobe */
if (kp && kprobe_optimized(kp)) {
op = container_of(kp, struct optimized_kprobe, kp);
/* If op->list is not empty, op is under optimizing */
@@ -61,7 +61,7 @@ unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr)
found:
/*
* If the kprobe can be optimized, original bytes which can be
- * overwritten by jump destination address. In this case, original
+ * overwritten by call destination address. In this case, original
* bytes must be recovered from op->optinsn.copied_insn buffer.
*/
memcpy(buf, (void *)addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
@@ -76,83 +76,69 @@ found:
return (unsigned long)buf;
}

-/* Insert a move instruction which sets a pointer to eax/rdi (1st arg). */
-static void __kprobes synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
-{
-#ifdef CONFIG_X86_64
- *addr++ = 0x48;
- *addr++ = 0xbf;
-#else
- *addr++ = 0xb8;
-#endif
- *(unsigned long *)addr = val;
-}
-
-static void __used __kprobes kprobes_optinsn_template_holder(void)
+static void __used __kprobes optprobe_trampoline_holder(void)
{
asm volatile (
- ".global optprobe_template_entry\n"
- "optprobe_template_entry:\n"
+ ".global optprobe_trampoline\n"
+ "optprobe_trampoline:\n"
+ /* When call to here, the stack has call address+5 */
#ifdef CONFIG_X86_64
/* We don't bother saving the ss register */
" pushq %rsp\n"
" pushfq\n"
SAVE_REGS_STRING
- " movq %rsp, %rsi\n"
- ".global optprobe_template_val\n"
- "optprobe_template_val:\n"
- ASM_NOP5
- ASM_NOP5
- ".global optprobe_template_call\n"
- "optprobe_template_call:\n"
- ASM_NOP5
- /* Move flags to rsp */
- " movq 144(%rsp), %rdx\n"
- " movq %rdx, 152(%rsp)\n"
+ " movq %rsp, %rsi\n" /* pt_regs */
+ " movq 160(%rsp), %rdi\n" /* call address */
+ " call optimized_callback\n"
+ /* Replace call address with stub address */
+ " movq %rax, 160(%rsp)\n"
RESTORE_REGS_STRING
- /* Skip flags entry */
- " addq $8, %rsp\n"
" popfq\n"
+ /* Skip rsp entry */
+ " addq $8, %rsp\n"
#else /* CONFIG_X86_32 */
" pushf\n"
- SAVE_REGS_STRING
- " movl %esp, %edx\n"
- ".global optprobe_template_val\n"
- "optprobe_template_val:\n"
- ASM_NOP5
- ".global optprobe_template_call\n"
- "optprobe_template_call:\n"
- ASM_NOP5
+ /* skip only ip, orig_ax, and gs (flags saved on cs) */
+ SAVE_REGS_STRING_OFF(12)
+ " movl 56(%esp), %eax\n" /* call address */
+ /* recover flags from cs */
+ " movl 52(%esp), %edx\n"
+ " movl %edx, 56(%esp)\n"
+ " movl %esp, %edx\n" /* pt_regs */
+ " call optimized_callback\n"
+ /* Move flags to cs */
+ " movl 56(%esp), %edx\n"
+ " movl %edx, 52(%esp)\n"
+ /* Replace flags with stub address */
+ " movl %eax, 56(%esp)\n"
RESTORE_REGS_STRING
- " addl $4, %esp\n" /* skip cs */
" popf\n"
#endif
- ".global optprobe_template_end\n"
- "optprobe_template_end:\n");
+ " ret\n");
}

-#define TMPL_MOVE_IDX \
- ((long)&optprobe_template_val - (long)&optprobe_template_entry)
-#define TMPL_CALL_IDX \
- ((long)&optprobe_template_call - (long)&optprobe_template_entry)
-#define TMPL_END_IDX \
- ((long)&optprobe_template_end - (long)&optprobe_template_entry)
-
#define INT3_SIZE sizeof(kprobe_opcode_t)

/* Optimized kprobe call back function: called from optinsn */
-static void __kprobes optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
+static __used __kprobes kprobe_opcode_t *optimized_callback(kprobe_opcode_t *addr, struct pt_regs *regs)
{
- struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ struct optimized_kprobe *op;
+ struct kprobe_ctlblk *kcb;
unsigned long flags;
+ struct kprobe *p;
+
+ local_irq_save(flags);
+
+ kcb = get_kprobe_ctlblk();
+ p = get_kprobe(addr - RELATIVECALL_SIZE);
+ BUG_ON(!p);

/* This is possible if op is under delayed unoptimizing */
- if (kprobe_disabled(&op->kp))
- return;
+ if (kprobe_disabled(p))
+ goto end;

- local_irq_save(flags);
if (kprobe_running()) {
- kprobes_inc_nmissed_count(&op->kp);
+ kprobes_inc_nmissed_count(p);
} else {
/* Save skipped registers */
#ifdef CONFIG_X86_64
@@ -161,22 +147,26 @@ static void __kprobes optimized_callback(struct optimized_kprobe *op, struct pt_
regs->cs = __KERNEL_CS | get_kernel_rpl();
regs->gs = 0;
#endif
- regs->ip = (unsigned long)op->kp.addr + INT3_SIZE;
+ regs->ip = (unsigned long)p->addr + INT3_SIZE;
regs->orig_ax = ~0UL;

- __this_cpu_write(current_kprobe, &op->kp);
+ __this_cpu_write(current_kprobe, p);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
- opt_pre_handler(&op->kp, regs);
+ opt_pre_handler(p, regs);
__this_cpu_write(current_kprobe, NULL);
}
+end:
+ op = container_of(p, struct optimized_kprobe, kp);
+ addr = (kprobe_opcode_t *)(op->optinsn.insn);
local_irq_restore(flags);
+ return addr;
}

static int __kprobes copy_optimized_instructions(u8 *dest, u8 *src)
{
int len = 0, ret;

- while (len < RELATIVEJUMP_SIZE) {
+ while (len < RELATIVECALL_SIZE) {
ret = __copy_instruction(dest + len, src + len);
if (!ret || !can_boost(dest + len))
return -EINVAL;
@@ -245,8 +235,8 @@ static int __kprobes can_optimize(unsigned long paddr)
(paddr < (unsigned long)__entry_text_end))
return 0;

- /* Check there is enough space for a relative jump. */
- if (size - offset < RELATIVEJUMP_SIZE)
+ /* Check there is enough space for a relative call. */
+ if (size - offset < RELATIVECALL_SIZE)
return 0;

/* Decode instructions */
@@ -325,7 +315,6 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
{
u8 *buf;
int ret;
- long rel;

if (!can_optimize((unsigned long)op->kp.addr))
return -EILSEQ;
@@ -334,70 +323,52 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
if (!op->optinsn.insn)
return -ENOMEM;

- /*
- * Verify if the address gap is in 2GB range, because this uses
- * a relative jump.
- */
- rel = (long)op->optinsn.insn - (long)op->kp.addr + RELATIVEJUMP_SIZE;
- if (abs(rel) > 0x7fffffff)
- return -ERANGE;
-
buf = (u8 *)op->optinsn.insn;

/* Copy instructions into the out-of-line buffer */
- ret = copy_optimized_instructions(buf + TMPL_END_IDX, op->kp.addr);
+ ret = copy_optimized_instructions(buf, op->kp.addr);
if (ret < 0) {
__arch_remove_optimized_kprobe(op, 0);
return ret;
}
op->optinsn.size = ret;

- /* Copy arch-dep-instance from template */
- memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);
-
- /* Set probe information */
- synthesize_set_arg1(buf + TMPL_MOVE_IDX, (unsigned long)op);
-
- /* Set probe function call */
- synthesize_relcall(buf + TMPL_CALL_IDX, optimized_callback);
-
/* Set returning jmp instruction at the tail of out-of-line buffer */
- synthesize_reljump(buf + TMPL_END_IDX + op->optinsn.size,
+ synthesize_reljump(buf + op->optinsn.size,
(u8 *)op->kp.addr + op->optinsn.size);

- flush_icache_range((unsigned long) buf,
- (unsigned long) buf + TMPL_END_IDX +
+ flush_icache_range((unsigned long) buf, (unsigned long) buf +
op->optinsn.size + RELATIVEJUMP_SIZE);
return 0;
}

#define MAX_OPTIMIZE_PROBES 256
-static struct text_poke_param *jump_poke_params;
-static struct jump_poke_buffer {
- u8 buf[RELATIVEJUMP_SIZE];
-} *jump_poke_bufs;
+static struct text_poke_param *call_poke_params;
+static struct call_poke_buffer {
+ u8 buf[RELATIVECALL_SIZE];
+} *call_poke_bufs;

static void __kprobes setup_optimize_kprobe(struct text_poke_param *tprm,
u8 *insn_buf,
struct optimized_kprobe *op)
{
- s32 rel = (s32)((long)op->optinsn.insn -
- ((long)op->kp.addr + RELATIVEJUMP_SIZE));
+ s32 rel = (s32)((long)&optprobe_trampoline -
+ ((long)op->kp.addr + RELATIVECALL_SIZE));

/* Backup instructions which will be replaced by jump address */
memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
RELATIVE_ADDR_SIZE);

- insn_buf[0] = RELATIVEJUMP_OPCODE;
+ insn_buf[0] = RELATIVECALL_OPCODE;
*(s32 *)(&insn_buf[1]) = rel;

tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
- tprm->len = RELATIVEJUMP_SIZE;
+ tprm->len = RELATIVECALL_SIZE;
}

/*
- * Replace breakpoints (int3) with relative jumps.
+ * Replace breakpoints (int3) with relative calls.
* Caller must call with locking kprobe_mutex and text_mutex.
*/
void __kprobes arch_optimize_kprobes(struct list_head *oplist)
@@ -408,8 +379,8 @@ void __kprobes arch_optimize_kprobes(struct list_head *oplist)
list_for_each_entry_safe(op, tmp, oplist, list) {
WARN_ON(kprobe_disabled(&op->kp));
/* Setup param */
- setup_optimize_kprobe(&jump_poke_params[c],
- jump_poke_bufs[c].buf, op);
+ setup_optimize_kprobe(&call_poke_params[c],
+ call_poke_bufs[c].buf, op);
list_del_init(&op->list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
@@ -420,7 +391,7 @@ void __kprobes arch_optimize_kprobes(struct list_head *oplist)
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
- text_poke_smp_batch(jump_poke_params, c);
+ text_poke_smp_batch(call_poke_params, c);
}

static void __kprobes setup_unoptimize_kprobe(struct text_poke_param *tprm,
@@ -433,11 +404,11 @@ static void __kprobes setup_unoptimize_kprobe(struct text_poke_param *tprm,

tprm->addr = op->kp.addr;
tprm->opcode = insn_buf;
- tprm->len = RELATIVEJUMP_SIZE;
+ tprm->len = RELATIVECALL_SIZE;
}

/*
- * Recover original instructions and breakpoints from relative jumps.
+ * Recover original instructions and breakpoints from relative calls.
* Caller must call with locking kprobe_mutex.
*/
extern void arch_unoptimize_kprobes(struct list_head *oplist,
@@ -448,8 +419,8 @@ extern void arch_unoptimize_kprobes(struct list_head *oplist,

list_for_each_entry_safe(op, tmp, oplist, list) {
/* Setup param */
- setup_unoptimize_kprobe(&jump_poke_params[c],
- jump_poke_bufs[c].buf, op);
+ setup_unoptimize_kprobe(&call_poke_params[c],
+ call_poke_bufs[c].buf, op);
list_move(&op->list, done_list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
@@ -460,18 +431,18 @@ extern void arch_unoptimize_kprobes(struct list_head *oplist,
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
- text_poke_smp_batch(jump_poke_params, c);
+ text_poke_smp_batch(call_poke_params, c);
}

/* Replace a relative jump with a breakpoint (int3). */
void __kprobes arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
- u8 buf[RELATIVEJUMP_SIZE];
+ u8 buf[RELATIVECALL_SIZE];

/* Set int3 to first byte for kprobes */
buf[0] = BREAKPOINT_INSTRUCTION;
memcpy(buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
- text_poke_smp(op->kp.addr, buf, RELATIVEJUMP_SIZE);
+ text_poke_smp(op->kp.addr, buf, RELATIVECALL_SIZE);
}

int __kprobes
@@ -483,7 +454,7 @@ setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter)
/* This kprobe is really able to run optimized path. */
op = container_of(p, struct optimized_kprobe, kp);
/* Detour through copied instructions */
- regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
+ regs->ip = (unsigned long)op->optinsn.insn;
if (!reenter)
reset_current_kprobe();
preempt_enable_no_resched();
@@ -495,16 +466,16 @@ setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter)
int __kprobes arch_init_optprobes(void)
{
/* Allocate code buffer and parameter array */
- jump_poke_bufs = kmalloc(sizeof(struct jump_poke_buffer) *
+ call_poke_bufs = kmalloc(sizeof(struct call_poke_buffer) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
- if (!jump_poke_bufs)
+ if (!call_poke_bufs)
return -ENOMEM;

- jump_poke_params = kmalloc(sizeof(struct text_poke_param) *
+ call_poke_params = kmalloc(sizeof(struct text_poke_param) *
MAX_OPTIMIZE_PROBES, GFP_KERNEL);
- if (!jump_poke_params) {
- kfree(jump_poke_bufs);
- jump_poke_bufs = NULL;
+ if (!call_poke_params) {
+ kfree(call_poke_bufs);
+ call_poke_bufs = NULL;
return -ENOMEM;
}

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index e2f751e..2fe982c 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -276,8 +276,8 @@ static int __kprobes can_probe(unsigned long paddr)
* Check if the instruction has been modified by another
* kprobe, in which case we replace the breakpoint by the
* original instruction in our buffer.
- * Also, jump optimization will change the breakpoint to
- * relative-jump. Since the relative-jump itself is
+ * Also, call optimization will change the breakpoint to
+ * relative-call. Since the relative-call itself is
* normally used, we just go through if there is no kprobe.
*/
__addr = recover_probed_instruction(buf, addr);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c62b854..b1552dd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -290,7 +290,7 @@ void __kprobes free_insn_slot(kprobe_opcode_t * slot, int dirty)
static DEFINE_MUTEX(kprobe_optinsn_mutex); /* Protects kprobe_optinsn_slots */
static struct kprobe_insn_cache kprobe_optinsn_slots = {
.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
- /* .insn_size is initialized later */
+ .insn_size = MAX_OPTINSN_SIZE,
.nr_garbage = 0,
};
/* Get a slot for optimized_kprobe buffer */
@@ -2002,10 +2002,6 @@ static int __init init_kprobes(void)
}

#if defined(CONFIG_OPTPROBES)
-#if defined(__ARCH_WANT_KPROBES_INSN_SLOT)
- /* Init kprobe_optinsn_slots */
- kprobe_optinsn_slots.insn_size = MAX_OPTINSN_SIZE;
-#endif
/* By default, kprobes can be optimized */
kprobes_allow_optimization = true;
#endif

Subject: Re: [RFC PATCH -tip ] x86/kprobes: kprobes call optimization

(2012/05/10 20:54), Masami Hiramatsu wrote:
> Note: this code is still under development, but it will
> be useful for the discussion to show this.
>
> Use relative call instead of relative jump for optimizing
> kprobes. This reduces x86-depend magic code and runtime
> memory size.

Hmm, I've found that this patch may increase the execution
time of probing by more than 20%. I think that is more than
is acceptable...


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]