2022-02-24 16:24:44

by Peter Zijlstra

Subject: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

With IBT on, sym+0 is no longer the __fentry__ site.

NOTE: the architecture has a special case and *does* allow placing an
INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
and as such we don't need to disallow probing these instructions.

NOTE: irrespective of the above; there is a complication in that
direct branches to functions are rewritten to not execute ENDBR, so
any breakpoint thereon might miss lots of actual function executions.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 11 +++++++++++
kernel/kprobes.c | 15 ++++++++++++---
2 files changed, 23 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
{
return 0;
}
+
+bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+ return offset <= 4*HAS_KERNEL_IBT;
+}
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
static DEFINE_MUTEX(kprobe_mutex);
static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);

-kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
- unsigned int __unused)
+kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
{
- return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
+ kprobe_opcode_t *addr = NULL;
+
+ addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
+#ifdef CONFIG_KPROBES_ON_FTRACE
+ if (addr && !offset) {
+ unsigned long faddr = ftrace_location((unsigned long)addr);
+ if (faddr)
+ addr = (kprobe_opcode_t *)faddr;
+ }
+#endif
+ return addr;
}

/*



2022-02-25 03:02:38

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Hi Peter,

On Thu, 24 Feb 2022 15:51:53 +0100
Peter Zijlstra <[email protected]> wrote:

> With IBT on, sym+0 is no longer the __fentry__ site.
>
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.

Does this mean we can still put a probe on sym+0??

If so, NAK this patch, since KPROBES_ON_FTRACE is not meant
to accelerate the function entry probe; it just allows the user to
put a probe on 'call _mcount' (which can be modified by ftrace).

func:
endbr <- sym+0 : INT3 is used. (kp->addr = func+0)
nop5 <- sym+4? : ftrace is used. (kp->addr = func+4?)
...

And anyway, in some cases (e.g. perf probe) the symbol will be a base
symbol like '_text' and @offset will be the function addr - _text addr,
so that we can put a probe on a local-scope function.

If you think we should not probe on the endbr, we should treat the
pair of endbr and nop5 (or call _mcount) instructions as a single
virtual instruction. This means kp->addr should point to sym+0, but
use ftrace to probe.

func:
endbr <- sym+0 : ftrace is used. (kp->addr = func+0)
nop5 <- sym+4? : This is not able to be probed.
...

Thank you,

>
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/kernel/kprobes/core.c | 11 +++++++++++
> kernel/kprobes.c | 15 ++++++++++++---
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
> {
> return 0;
> }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> + return offset <= 4*HAS_KERNEL_IBT;
> +}
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
> static DEFINE_MUTEX(kprobe_mutex);
> static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> - unsigned int __unused)
> +kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
> {
> - return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> + kprobe_opcode_t *addr = NULL;
> +
> + addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + if (addr && !offset) {
> + unsigned long faddr = ftrace_location((unsigned long)addr);
> + if (faddr)
> + addr = (kprobe_opcode_t *)faddr;
> + }
> +#endif
> + return addr;
> }
>
> /*
>
>


--
Masami Hiramatsu <[email protected]>

2022-02-25 03:15:44

by Kees Cook

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Thu, Feb 24, 2022 at 03:51:53PM +0100, Peter Zijlstra wrote:
> With IBT on, sym+0 is no longer the __fentry__ site.
>
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.
>
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/kernel/kprobes/core.c | 11 +++++++++++
> kernel/kprobes.c | 15 ++++++++++++---
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
> {
> return 0;
> }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> + return offset <= 4*HAS_KERNEL_IBT;
> +}

Let's avoid magic (though obvious right now) literal values. Can the "4"
be changed to a new ENDBR_INSTR_SIZE macro or something?
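
Something like this, say (just a sketch; the macro name follows the
suggestion above and where it would live, e.g. <asm/ibt.h>, is an
assumption):

bool arch_kprobe_on_func_entry(unsigned long offset)
{
	/* ENDBR_INSTR_SIZE: hypothetical macro for the 4-byte ENDBR */
	return offset <= ENDBR_INSTR_SIZE * HAS_KERNEL_IBT;
}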

--
Kees Cook

2022-02-25 16:46:43

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> Hi Peter,
>
> On Thu, 24 Feb 2022 15:51:53 +0100
> Peter Zijlstra <[email protected]> wrote:
>
> > With IBT on, sym+0 is no longer the __fentry__ site.
> >
> > NOTE: the architecture has a special case and *does* allow placing an
> > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > and as such we don't need to disallow probing these instructions.
>
> Does this mean we can still putting a probe on sym+0??

I'm not sure... Possibly not. I'm not sure if there's an ABI that
bypasses kprobe_lookup_name(). Arguably you could give it a direct
address instead of a name and still hit the ENDBR, I think. But the ABI
surface of this thing is too big for me to easily tell.
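
For example, something like this (sketch only; my_target_func is a
stand-in) skips the name lookup entirely and, with IBT on, lands right
on the ENDBR:

	static struct kprobe kp = {
		/* raw address, no symbol lookup done by the core */
		.addr = (kprobe_opcode_t *)my_target_func,
	};

	register_kprobe(&kp);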

> If so, NAK this patch, since the KPROBES_ON_FTRACE is not meaning
> to accelerate the function entry probe, but just allows user to
> put a probe on 'call _mcount' (which can be modified by ftrace).
>
> func:
> endbr <- sym+0 : INT3 is used. (kp->addr = func+0)
> nop5 <- sym+4? : ftrace is used. (kp->addr = func+4?)
> ...
>
> And anyway, in some case (e.g. perf probe) symbol will be a basement
> symbol like '_text' and @offset will be the function addr - _text addr
> so that we can put a probe on local-scope function.
>
> If you think we should not probe on the endbr, we should treat the
> pair of endbr and nop5 (or call _mcount) instructions as a virtual
> single instruction. This means kp->addr should point sym+0, but use
> ftrace to probe.
>
> func:
> endbr <- sym+0 : ftrace is used. (kp->addr = func+0)
> nop5 <- sym+4? : This is not able to be probed.
> ...

Well, it's all a bit crap :/

This patch came from the kernel/trace/trace_kprobe.c selftest failing at
boot. That tries to set a kprobe on kprobe_trace_selftest_target, which
the whole kprobe machinery translates into kprobe_trace_selftest_target+0,
and then doesn't actually hit the __fentry__ site.

IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
which just isn't true in general (arm64, powerpc are architectures that
come to mind) and now also might not be true on x86.

Calling the selftest broken works for me and I'll drop the patch.


Note that with these patches:

- Not every function starts with ENDBR; the compiler is free to omit
this instruction if it can determine the function address is never
taken (and as such there's never an indirect call to it).

- If there is an ENDBR, not every function entry will actually execute
it. This first instruction is used exclusively as an indirect entry
point. All direct calls should be to the next instruction.

- If there was an ENDBR, it might be turned into a 4 byte UD1
instruction to ensure any indirect call *will* fail.

Given all that, kprobe users are in a bit of a bind. Determining the
__fentry__ point basically means they *have* to first read the function
assembly to figure out where it is.

This patch takes the approach that sym+0 means __fentry__, irrespective
of where it might actually live. I *think* that's more or less
consistent with what other architectures do; specifically see
arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
what ARM64 does when it has BTI on (which is then very similar to what
we have here).

What do you think makes most sense here?

2022-02-26 00:17:26

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, 25 Feb 2022 11:46:23 +0100
Peter Zijlstra <[email protected]> wrote:

> On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> > Hi Peter,
> >
> > On Thu, 24 Feb 2022 15:51:53 +0100
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > With IBT on, sym+0 is no longer the __fentry__ site.
> > >
> > > NOTE: the architecture has a special case and *does* allow placing an
> > > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > > and as such we don't need to disallow probing these instructions.
> >
> > Does this mean we can still putting a probe on sym+0??
>
> I'm not sure... Possibly not. I'm not sure if there's an ABI that
> by-passes kprobes_lookup_name(). Arguably you could give it a direct
> address instead of a name and still hit the ENDBR I think. But the ABI
> surface of this thing it too big for me to easily tell.
>
> > If so, NAK this patch, since the KPROBES_ON_FTRACE is not meaning
> > to accelerate the function entry probe, but just allows user to
> > put a probe on 'call _mcount' (which can be modified by ftrace).
> >
> > func:
> > endbr <- sym+0 : INT3 is used. (kp->addr = func+0)
> > nop5 <- sym+4? : ftrace is used. (kp->addr = func+4?)
> > ...
> >
> > And anyway, in some case (e.g. perf probe) symbol will be a basement
> > symbol like '_text' and @offset will be the function addr - _text addr
> > so that we can put a probe on local-scope function.
> >
> > If you think we should not probe on the endbr, we should treat the
> > pair of endbr and nop5 (or call _mcount) instructions as a virtual
> > single instruction. This means kp->addr should point sym+0, but use
> > ftrace to probe.
> >
> > func:
> > endbr <- sym+0 : ftrace is used. (kp->addr = func+0)
> > nop5 <- sym+4? : This is not able to be probed.
> > ...
>
> Well, it's all a bit crap :/
>
> This patch came from kernel/trace/trace_kprobe.c selftest failing at
> boot. That tries to set a kprobe on kprobe_trace_selftest_target which
> the whole kprobe machinery translates into
> kprobe_trace_selftest_target+0 and then not actually hitting the fentry.

OK.

>
> IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
> which just isn't true in general (arm64, powerpc are architectures that
> come to mind) and now also might not be true on x86.

Yeah, right. But if we can handle this as above, maybe we can continue
to put the probe on the entry of the function.

>
> Calling the selftest broken works for me and I'll drop the patch.
>
>
> Note that with these patches:
>
> - Not every function starts with ENDBR; the compiler is free to omit
> this instruction if it can determine the function address is never
> taken (and as such there's never an indirect call to it).
>
> - If there is an ENDBR, not every function entry will actually execute
> it. This first instruction is used exclusively as an indirect entry
> point. All direct calls should be to the next instruction.
>
> - If there was an ENDBR, it might be turned into a 4 byte UD1
> instruction to ensure any indirect call *will* fail.

Ah, I see. So that is a booby trap for the cracker.

>
> Given all that, kprobe users are in a bit of a bind. Determining the
> __fentry__ point basically means they *have* to first read the function
> assembly to figure out where it is.

OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.

>
> This patch takes the approach that sym+0 means __fentry__, irrespective
> of where it might actually live. I *think* that's more or less
> consistent with what other architectures do; specifically see
> arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> what ARM64 does when it has BTI on (which is then very similar to what
> we have here).

Yeah, I know powerpc does such a thing, but I think that is not what
the user expects. I actually would like to fix that, because on powerpc
and other non-x86 cases (without BTI/IBT), the instruction at sym+0 is
actually executed.

>
> What do you think makes most sense here?

Is there any way to distinguish the "preparing instructions" (part of
calling mcount) from this kind of trap instruction online[1]? If possible,
I would like to skip such traps, but put the probe on the preparing
instructions.
It seems we are currently using the ftrace address as the end marker of
the trap instruction, but we actually need another marker to separate
the end of the ENDBR from the preparing instructions.

[1]
On x86, we have

func:
endbr
call __fentry__ <-- ftrace location

But on other arch,

func:
[BTI instruction]
push return address <--- preparing instruction(s)
call __fentry__ <-- ftrace location



Thank you,

--
Masami Hiramatsu <[email protected]>

2022-02-26 01:04:43

by Steven Rostedt

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, 25 Feb 2022 11:46:23 +0100
Peter Zijlstra <[email protected]> wrote:

> Given all that, kprobe users are in a bit of a bind. Determining the
> __fentry__ point basically means they *have* to first read the function
> assembly to figure out where it is.

Technically I think that's what kprobes has been designed for. But
realistically, I do not think anyone actually does that (outside of
academic and niche uses).

Really, when people use func+0 they just want to trace the function, and
ftrace is the fastest way to do so, and if it's not *exactly* at function
entry, but includes the arguments, then it should be fine.

That said, perhaps we should add a config to know if the architecture
uses function entry or the old mcount that is after the frame set up (that
is, you can not get to the arguments).

CONFIG_HAVE_FTRACE_FUNCTION_START ?

Because, if the arch still uses the old mcount method (where it's after the
frame set up), then a kprobe at func+0 really wants the breakpoint.
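
Roughly, the generic code could then key off it like this (sketch only,
using the config name suggested above):

	if (IS_ENABLED(CONFIG_HAVE_FTRACE_FUNCTION_START)) {
		/* fentry is at the function start: a func+0 probe can go
		 * via ftrace and still see the arguments */
	} else {
		/* old mcount, after the frame set up: a func+0 probe
		 * really wants the INT3 breakpoint */
	}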

-- Steve

2022-02-26 02:40:51

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, Feb 25, 2022 at 10:42:49PM +0900, Masami Hiramatsu wrote:

> OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.
>
> >
> > This patch takes the approach that sym+0 means __fentry__, irrespective
> > of where it might actually live. I *think* that's more or less
> > consistent with what other architectures do; specifically see
> > arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> > what ARM64 does when it has BTI on (which is then very similar to what
> > we have here).
>
> Yeah, I know the powerpc does such thing, but I think that is not what
> user expected. I actually would like to fix that, because in powerpc
> and other non-x86 case (without BTI/IBT), the instructions on sym+0 is
> actually executed.
>
> >
> > What do you think makes most sense here?
>
> Are there any way to distinguish the "preparing instructions" (part of
> calling mcount) and this kind of trap instruction online[1]? If possible,
> I would like to skip such traps, but put the probe on preparing
> instructions.

None that exist, but we could easily create one. See also my email here:

https://lkml.kernel.org/r/[email protected]

That skip_endbr() function is basically what you're looking for; it just
needs a better name and a Power/ARM64 implementation to get what you
want, right?

The alternative 'hack' I've been contemplating is (ab)using
INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
is something we'll probably have to grow when CET-SHSTK or backward-edge
CFI gets to be done, because then ROP tricks as used by function-graph
and kretprobes are out the window).

That way sym+[0..size) is still a valid reference to the actual
instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
while sym+INT_MAX will get us __fexit__.

2022-02-26 02:45:51

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, 25 Feb 2022 16:41:15 +0100
Peter Zijlstra <[email protected]> wrote:

> On Fri, Feb 25, 2022 at 10:42:49PM +0900, Masami Hiramatsu wrote:
>
> > OK, this sounds like kp->addr should be "call fentry" if there is ENDBR.
> >
> > >
> > > This patch takes the approach that sym+0 means __fentry__, irrespective
> > > of where it might actually live. I *think* that's more or less
> > > consistent with what other architectures do; specifically see
> > > arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> > > what ARM64 does when it has BTI on (which is then very similar to what
> > > we have here).
> >
> > Yeah, I know the powerpc does such thing, but I think that is not what
> > user expected. I actually would like to fix that, because in powerpc
> > and other non-x86 case (without BTI/IBT), the instructions on sym+0 is
> > actually executed.
> >
> > >
> > > What do you think makes most sense here?
> >
> > Are there any way to distinguish the "preparing instructions" (part of
> > calling mcount) and this kind of trap instruction online[1]? If possible,
> > I would like to skip such traps, but put the probe on preparing
> > instructions.
>
> None that exist, but we could easily create one. See also my email here:
>
> https://lkml.kernel.org/r/[email protected]
>
> That skip_endbr() function is basically what you're looking for; it just
> needs a better name and a Power/ARM64 implementation to get what you
> want, right?

Great! That's what I need. I think is_endbr() is also useful :)

> The alternative 'hack' I've been contemplating is (ab)using
> INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
> is something we'll probably have to grow when CET-SHSTK or backward-edge
> CFI gets to be done, because then ROP tricks as used by function-graph
> and kretprobes are out the window).
>
> That way sym+[0..size) is still a valid reference to the actual
> instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
> while sym+INT_MAX will get us __fexit__.

Interesting, is that done by another series?
Maybe I have to check that change for kprobe jump optimization.

Thank you,

--
Masami Hiramatsu <[email protected]>

2022-02-26 07:26:47

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, 25 Feb 2022 09:14:09 -0500
Steven Rostedt <[email protected]> wrote:

> On Fri, 25 Feb 2022 11:46:23 +0100
> Peter Zijlstra <[email protected]> wrote:
>
> > Given all that, kprobe users are in a bit of a bind. Determining the
> > __fentry__ point basically means they *have* to first read the function
> > assembly to figure out where it is.
>
> Technically I think that's what kprobes has been designed for. But
> realistically, I do not think anyone actually does that (outside of
> academic and niche uses).

Yeah, raw kprobe users must check the instruction boundary anyway.
And if possible, I would like to keep kprobes (at the kprobe level) as they are.

> Really, when people use func+0 they just want to trace the function, and
> ftrace is the fastest way to do so, and if it's not *exactly* at function
> entry, but includes the arguments, then it should be fine.

Yes, that is another (sub) reason why I introduced fprobe. ;-)

OK, I understand that we should not allow probing on the endbr unless the
user really wants it. Let me add a KPROBE_FLAG_RAW_ENTRY for that special
purpose. If the flag is not set (the default), kprobe::addr will be
shifted automatically.
Anyway, this address translation must be done in check_ftrace_location()
instead of kprobe_lookup_name(). Let me make another patch.
Also, the selftest and documentation must be updated with that.
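
Something like this on the user side (just a sketch; the flag is only my
proposal at this point):

	struct kprobe kp = {
		.symbol_name	= "func",
		.offset		= 0,
		/* proposed flag: keep kp->addr on the raw entry (the endbr),
		 * do not auto-shift it to the ftrace location */
		.flags		= KPROBE_FLAG_RAW_ENTRY,
	};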

> That said, perhaps we should add a config to know if the architecture
> uses function entry or the old mcount that is after the frame set up (that
> is, you can not get to the arguments).
>
> CONFIG_HAVE_FTRACE_FUNCTION_START ?

Hmm, ENDBR is not always there, and apart from x86, most of the archs will
make it 'n'. (x86 is a special case.)

>
> Because, if the arch still uses the old mcount method (where it's after the
> frame set up), then a kprobe at func+0 really wants the breakpoint.
>
> -- Steve
>

Thank you,

--
Masami Hiramatsu <[email protected]>

2022-02-26 15:22:20

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Sat, Feb 26, 2022 at 11:10:40AM +0900, Masami Hiramatsu wrote:

> > The alternative 'hack' I've been contemplating is (ab)using
> > INT_MIN/INT_MAX offset for __fentry__ and __fexit__ points (that latter
> > is something we'll probably have to grow when CET-SHSTK or backward-edge
> > CFI gets to be done, because then ROP tricks as used by function-graph
> > and kretprobes are out the window).
> >
> > That way sym+[0..size) is still a valid reference to the actual
> > instruction in the symbol, but sym+INT_MIN will hard map to __fentry__
> > while sym+INT_MAX will get us __fexit__.
>
> Interesting, is that done by another series?

Not yet, that was just a crazy idea I had ;-)

2022-02-28 06:54:30

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Hi Peter,

So, instead of this change, can you try the below?
This introduces arch_adjust_kprobe_addr() and uses it in kprobe_addr()
so that it can handle the case where the user passes the probe address in
_text+OFFSET format.

From: Masami Hiramatsu <[email protected]>
Date: Mon, 28 Feb 2022 15:01:48 +0900
Subject: [PATCH] x86: kprobes: Skip ENDBR instruction probing

This adjusts the kprobe probe address to skip the ENDBR and puts the kprobe
right after the ENDBR so that the kprobe doesn't disturb IBT.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 7 +++++++
include/linux/kprobes.h | 2 ++
kernel/kprobes.c | 11 ++++++++++-
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 745f42cf82dc..a90cfe50d800 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
#include <asm/insn.h>
#include <asm/debugreg.h>
#include <asm/set_memory.h>
+"include <asm/ibt.h>

#include "common.h"

@@ -301,6 +302,12 @@ static int can_probe(unsigned long paddr)
return (addr == paddr);
}

+/* If x86 supports IBT (ENDBR), it must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr)
+{
+ return (kprobe_opcode_t *)skip_endbr((void *)addr);
+}
+
/*
* Copy an instruction with recovering modified instruction by kprobes
* and adjust the displacement if the instruction uses the %rip-relative
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 19b884353b15..485d7832a613 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -384,6 +384,8 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
}

kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr);
+
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 94cab8c9ce56..312f10e85c93 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1488,6 +1488,15 @@ bool within_kprobe_blacklist(unsigned long addr)
return false;
}

+/*
+ * If the arch supports a feature like IBT, which puts a trap at
+ * the entry of the symbol, it must be adjusted in this function.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr)
+{
+ return (kprobe_opcode_t *)addr;
+}
+
/*
* If 'symbol_name' is specified, look it up and add the 'offset'
* to it. This way, we can specify a relative address to a symbol.
@@ -1506,7 +1515,7 @@ static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
return ERR_PTR(-ENOENT);
}

- addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+ addr = arch_adjust_kprobe_addr((unsigned long)addr + offset);
if (addr)
return addr;

--
2.25.1


On Thu, 24 Feb 2022 15:51:53 +0100
Peter Zijlstra <[email protected]> wrote:

> With IBT on, sym+0 is no longer the __fentry__ site.
>
> NOTE: the architecture has a special case and *does* allow placing an
> INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> and as such we don't need to disallow probing these instructions.
>
> NOTE: irrespective of the above; there is a complication in that
> direct branches to functions are rewritten to not execute ENDBR, so
> any breakpoint thereon might miss lots of actual function executions.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/kernel/kprobes/core.c | 11 +++++++++++
> kernel/kprobes.c | 15 ++++++++++++---
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -1156,3 +1162,8 @@ int arch_trampoline_kprobe(struct kprobe
> {
> return 0;
> }
> +
> +bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> + return offset <= 4*HAS_KERNEL_IBT;
> +}
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,10 +67,19 @@ static bool kprobes_all_disarmed;
> static DEFINE_MUTEX(kprobe_mutex);
> static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> - unsigned int __unused)
> +kprobe_opcode_t * __weak kprobe_lookup_name(const char *name, unsigned int offset)
> {
> - return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> + kprobe_opcode_t *addr = NULL;
> +
> + addr = ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + if (addr && !offset) {
> + unsigned long faddr = ftrace_location((unsigned long)addr);
> + if (faddr)
> + addr = (kprobe_opcode_t *)faddr;
> + }
> +#endif
> + return addr;
> }
>
> /*
>
>


--
Masami Hiramatsu <[email protected]>

2022-02-28 23:44:04

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
> Hi Peter,
>
> So, instead of this change, can you try below?
> This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
> so that it can handle the case that user passed the probe address in
> _text+OFFSET format.

It works a little... at the very least it still needs
arch_kprobe_on_func_entry() allowing offset 4.

But looking at this, we've got:

kprobe_on_func_entry(addr, sym, offset)
_kprobe_addr(addr, sym, offset)
if (sym)
addr = kprobe_lookup_name()
= kallsyms_lookup_name()
arch_adjust_kprobe_addr(addr+offset)
skip_endbr()
kallsyms_lookup_size_offset(addr, ...)
kallsyms_lookup_size_offset(addr, NULL, &offset)
arch_kprobe_on_func_entry(offset)

Which is _3_ kallsyms lookups and 3 weak/arch hooks.

Surely we can make this a little more streamlined? The below seems to
work.

I think with a little care and testing it should be possible to fold all
the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
meaning we can get rid of kprobe_lookup_name() entirely. Naveen?

This then gets us down to 1 kallsyms call and 1 arch hook. Hmm?

---
arch/powerpc/kernel/kprobes.c | 34 +++++++++++++++---------
arch/x86/kernel/kprobes/core.c | 17 ++++++++++++
include/linux/kprobes.h | 3 +-
kernel/kprobes.c | 56 ++++++++++++++++++++++++++++++++++-------
4 files changed, 87 insertions(+), 23 deletions(-)

--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
return addr;
}

+static bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+#ifdef PPC64_ELF_ABI_v2
+#ifdef CONFIG_KPROBES_ON_FTRACE
+ return offset <= 16;
+#else
+ return offset <= 8;
+#endif
+#else
+ return !offset;
+#endif
+}
+
+/* XXX try and fold the magic of kprobe_lookup_name() in this */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+ bool *on_func_entry)
+{
+ *on_func_entry = arch_kprobe_on_func_entry(offset);
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
void *alloc_insn_page(void)
{
void *page;
@@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
kcb->kprobe_saved_msr = regs->msr;
}

-bool arch_kprobe_on_func_entry(unsigned long offset)
-{
-#ifdef PPC64_ELF_ABI_v2
-#ifdef CONFIG_KPROBES_ON_FTRACE
- return offset <= 16;
-#else
- return offset <= 8;
-#endif
-#else
- return !offset;
-#endif
-}
-
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->link;
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
#include <asm/insn.h>
#include <asm/debugreg.h>
#include <asm/set_memory.h>
+#include <asm/ibt.h>

#include "common.h"

@@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
return (addr == paddr);
}

+/* If x86 supports IBT (ENDBR), it must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+ bool *on_func_entry)
+{
+ if (is_endbr(*(u32 *)addr)) {
+ *on_func_entry = !offset || offset == 4;
+ if (*on_func_entry)
+ offset = 4;
+
+ } else {
+ *on_func_entry = !offset;
+ }
+
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
/*
* Copy an instruction with recovering modified instruction by kprobes
* and adjust the displacement if the instruction uses the %rip-relative
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
-extern bool arch_kprobe_on_func_entry(unsigned long offset);
extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);

extern bool within_kprobe_blacklist(unsigned long addr);
@@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
}

kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
+
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1489,24 +1489,63 @@ bool within_kprobe_blacklist(unsigned lo
}

/*
+ * arch_adjust_kprobe_addr - adjust the address
+ * @addr: symbol base address
+ * @offset: offset within the symbol
+ * @on_func_entry: was this @addr+@offset on the function entry
+ *
+ * Typically returns @addr + @offset, except for special cases where the
+ * function might be prefixed by a CFI landing pad, in that case any offset
+ * inside the landing pad is mapped to the first 'real' instruction of the
+ * symbol.
+ *
+ * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
+ * instruction at +0.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
+ unsigned long offset,
+ bool *on_func_entry)
+{
+ *on_func_entry = !offset;
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
+/*
* If 'symbol_name' is specified, look it up and add the 'offset'
* to it. This way, we can specify a relative address to a symbol.
* This returns encoded errors if it fails to look up symbol or invalid
* combination of parameters.
*/
-static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
- const char *symbol_name, unsigned int offset)
+static kprobe_opcode_t *
+_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
+ unsigned long offset, bool *on_func_entry)
{
if ((symbol_name && addr) || (!symbol_name && !addr))
goto invalid;

if (symbol_name) {
+ /*
+ * Input: @sym + @offset
+ * Output: @addr + @offset
+ *
+ * NOTE: kprobe_lookup_name() does *NOT* fold the offset
+ * argument into its output!
+ */
addr = kprobe_lookup_name(symbol_name, offset);
if (!addr)
return ERR_PTR(-ENOENT);
+ } else {
+ /*
+ * Input: @addr + @offset
+ * Output: @addr' + @offset'
+ */
+ if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
+ NULL, &offset))
+ return ERR_PTR(-ENOENT);
+ addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
}

- addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+ addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
if (addr)
return addr;

@@ -1516,7 +1555,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr

static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
{
- return _kprobe_addr(p->addr, p->symbol_name, p->offset);
+ bool on_func_entry;
+ return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
}

/*
@@ -2067,15 +2107,13 @@ bool __weak arch_kprobe_on_func_entry(un
*/
int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
- kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+ bool on_func_entry;
+ kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);

if (IS_ERR(kp_addr))
return PTR_ERR(kp_addr);

- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
- return -ENOENT;
-
- if (!arch_kprobe_on_func_entry(offset))
+ if (!on_func_entry)
return -EINVAL;

return 0;

2022-03-01 03:04:21

by Masami Hiramatsu

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Tue, 1 Mar 2022 00:25:13 +0100
Peter Zijlstra <[email protected]> wrote:

> On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
> > Hi Peter,
> >
> > So, instead of this change, can you try below?
> > This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
> > so that it can handle the case that user passed the probe address in
> > _text+OFFSET format.
>
> It works a little... at the very least it still needs
> arch_kprobe_on_func_entry() allowing offset 4.
>
> But looking at this, we've got:
>
> kprobe_on_func_entry(addr, sym, offset)
> _kprobe_addr(addr, sym, offset)
> if (sym)
> addr = kprobe_lookup_name()
> = kallsyms_lookup_name()
> arch_adjust_kprobe_addr(addr+offset)
> skip_endbr()
> kallsyms_loopup_size_offset(addr, ...)
> kallsyms_lookup_size_offset(addr, NULL, &offset)
> arch_kprobe_on_func_entry(offset)
>
> Which is _3_ kallsyms lookups and 3 weak/arch hooks.

Yeah.

>
> Surely we can make this a little more streamlined? The below seems to
> work.

OK, let me check.

>
> I think with a little care and testing it should be possible to fold all
> the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
> meaning we can get rid of kprobe_lookup_name() entirely. Naveen?

Agreed. My previous patch just focused on x86, but powerpc's
kprobe_lookup_name() must be updated too.

>
> This then gets us down to a 1 kallsyms call and 1 arch hook. Hmm?
>
> ---
> arch/powerpc/kernel/kprobes.c | 34 +++++++++++++++---------
> arch/x86/kernel/kprobes/core.c | 17 ++++++++++++
> include/linux/kprobes.h | 3 +-
> kernel/kprobes.c | 56 ++++++++++++++++++++++++++++++++++-------
> 4 files changed, 87 insertions(+), 23 deletions(-)
>
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
> return addr;
> }
>
> +static bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +#ifdef PPC64_ELF_ABI_v2
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + return offset <= 16;
> +#else
> + return offset <= 8;
> +#endif
> +#else
> + return !offset;
> +#endif
> +}
> +
> +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> + bool *on_func_entry)
> +{
> + *on_func_entry = arch_kprobe_on_func_entry(offset);
> + return (kprobe_opcode_t *)(addr + offset);
> +}
> +
> void *alloc_insn_page(void)
> {
> void *page;
> @@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
> kcb->kprobe_saved_msr = regs->msr;
> }
>
> -bool arch_kprobe_on_func_entry(unsigned long offset)
> -{
> -#ifdef PPC64_ELF_ABI_v2
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> - return offset <= 16;
> -#else
> - return offset <= 8;
> -#endif
> -#else
> - return !offset;
> -#endif
> -}
> -
> void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> ri->ret_addr = (kprobe_opcode_t *)regs->link;
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -52,6 +52,7 @@
> #include <asm/insn.h>
> #include <asm/debugreg.h>
> #include <asm/set_memory.h>
> +#include <asm/ibt.h>
>
> #include "common.h"
>
> @@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
> return (addr == paddr);
> }
>
> +/* If the x86 support IBT (ENDBR) it must be skipped. */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> + bool *on_func_entry)
> +{
> + if (is_endbr(*(u32 *)addr)) {
> + *on_func_entry = !offset || offset == 4;
> + if (*on_func_entry)
> + offset = 4;
> +
> + } else {
> + *on_func_entry = !offset;
> + }
> +
> + return (kprobe_opcode_t *)(addr + offset);
> +}
> +
> /*
> * Copy an instruction with recovering modified instruction by kprobes
> * and adjust the displacement if the instruction uses the %rip-relative
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
> extern void kprobes_inc_nmissed_count(struct kprobe *p);
> extern bool arch_within_kprobe_blacklist(unsigned long addr);
> extern int arch_populate_kprobe_blacklist(void);
> -extern bool arch_kprobe_on_func_entry(unsigned long offset);
> extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
>
> extern bool within_kprobe_blacklist(unsigned long addr);
> @@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
> }
>
> kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
> +
> int register_kprobe(struct kprobe *p);
> void unregister_kprobe(struct kprobe *p);
> int register_kprobes(struct kprobe **kps, int num);
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1489,24 +1489,63 @@ bool within_kprobe_blacklist(unsigned lo
> }
>
> /*
> + * arch_adjust_kprobe_addr - adjust the address
> + * @addr: symbol base address
> + * @offset: offset within the symbol
> + * @on_func_entry: was this @addr+@offset on the function entry
> + *
> + * Typically returns @addr + @offset, except for special cases where the
> + * function might be prefixed by a CFI landing pad, in that case any offset
> + * inside the landing pad is mapped to the first 'real' instruction of the
> + * symbol.
> + *
> + * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
> + * instruction at +0.
> + */
> +kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
> + unsigned long offset,
> + bool *on_func_entry)
> +{
> + *on_func_entry = !offset;
> + return (kprobe_opcode_t *)(addr + offset);
> +}
> +
> +/*
> * If 'symbol_name' is specified, look it up and add the 'offset'
> * to it. This way, we can specify a relative address to a symbol.
> * This returns encoded errors if it fails to look up symbol or invalid
> * combination of parameters.
> */
> -static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
> - const char *symbol_name, unsigned int offset)
> +static kprobe_opcode_t *
> +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> + unsigned long offset, bool *on_func_entry)
> {
> if ((symbol_name && addr) || (!symbol_name && !addr))
> goto invalid;
>
> if (symbol_name) {
> + /*
> + * Input: @sym + @offset
> + * Output: @addr + @offset
> + *
> + * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> + * argument into it's output!
> + */
> addr = kprobe_lookup_name(symbol_name, offset);

Hmm, there are 2 issues.

- the 'addr' includes the 'offset' here.
- the 'offset' is NOT limited to within the symbol size.
(e.g. symbol_name = "_text" and @offset is the offset of the target symbol from _text)

This means we need to call kallsyms_lookup_size_offset() in this case too.

> if (!addr)
> return ERR_PTR(-ENOENT);
> + } else {
> + /*
> + * Input: @addr + @offset
> + * Output: @addr' + @offset'
> + */
> + if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> + NULL, &offset))
> + return ERR_PTR(-ENOENT);
> + addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> }
>
> - addr = (kprobe_opcode_t *)(((char *)addr) + offset);
> + addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);

Thus we can ensure the 'offset' here is the real offset from the target function entry.

Thank you,

> if (addr)
> return addr;
>
> @@ -1516,7 +1555,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr
>
> static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
> {
> - return _kprobe_addr(p->addr, p->symbol_name, p->offset);
> + bool on_func_entry;
> + return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
> }
>
> /*
> @@ -2067,15 +2107,13 @@ bool __weak arch_kprobe_on_func_entry(un
> */
> int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
> {
> - kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
> + bool on_func_entry;
> + kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);
>
> if (IS_ERR(kp_addr))
> return PTR_ERR(kp_addr);
>
> - if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
> - return -ENOENT;
> -
> - if (!arch_kprobe_on_func_entry(offset))
> + if (!on_func_entry)
> return -EINVAL;
>
> return 0;


--
Masami Hiramatsu <[email protected]>

2022-03-01 08:46:39

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> > +static kprobe_opcode_t *
> > +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> > + unsigned long offset, bool *on_func_entry)
> > {
> > if ((symbol_name && addr) || (!symbol_name && !addr))
> > goto invalid;
> >
> > if (symbol_name) {
> > + /*
> > + * Input: @sym + @offset
> > + * Output: @addr + @offset
> > + *
> > + * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> > + * argument into it's output!
> > + */
> > addr = kprobe_lookup_name(symbol_name, offset);
>
> Hmm, there are 2 issues.
>
> - the 'addr' includes the 'offset' here.

AFAICT it doesn't (I even wrote that in the comment on top). There are two
implementations of kprobe_lookup_name(): the weak version doesn't even
use the offset argument, and the PowerPC implementation only checks for
!offset and doesn't fold it.

> - the 'offset' is NOT limited under the symbol size.
> (e.g. symbol_name = "_text" and @offset points the offset of target symbol from _text)
>
> This means we need to call kallsyms_lookup_size_offset() in this case too.

I'm feeling we should error out in that case. Using sym+offset beyond
the limits of sym is just daft.

But if you really want/need to retain that, then yes, we need that
else branch unconditionally :/

2022-03-01 17:29:33

by Naveen N. Rao

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Hi Peter,

Peter Zijlstra wrote:
> On Mon, Feb 28, 2022 at 03:07:05PM +0900, Masami Hiramatsu wrote:
>> Hi Peter,
>>
>> So, instead of this change, can you try below?
>> This introduce the arch_adjust_kprobe_addr() and use it in the kprobe_addr()
>> so that it can handle the case that user passed the probe address in
>> _text+OFFSET format.
>
> It works a little... at the very least it still needs
> arch_kprobe_on_func_entry() allowing offset 4.
>
> But looking at this, we've got:
>
> kprobe_on_func_entry(addr, sym, offset)
> _kprobe_addr(addr, sym, offset)
> if (sym)
> addr = kprobe_lookup_name()
> = kallsyms_lookup_name()
> arch_adjust_kprobe_addr(addr+offset)
> skip_endbr()
> kallsyms_loopup_size_offset(addr, ...)
> kallsyms_lookup_size_offset(addr, NULL, &offset)
> arch_kprobe_on_func_entry(offset)
>
> Which is _3_ kallsyms lookups and 3 weak/arch hooks.
>
> Surely we can make this a little more streamlined? The below seems to
> work.
>
> I think with a little care and testing it should be possible to fold all
> the magic of PowerPC's kprobe_lookup_name() into this one hook as well,
> meaning we can get rid of kprobe_lookup_name() entirely. Naveen?

This is timely. I've been looking at addressing a similar set of issues
on powerpc:
http://lkml.kernel.org/r/[email protected]

>
> This then gets us down to a 1 kallsyms call and 1 arch hook. Hmm?

I was going to propose making _kprobe_addr() into a weak function in
place of kprobe_lookup_name() in response to Masami in the other thread,
but this is looking better.

>
> ---
> arch/powerpc/kernel/kprobes.c | 34 +++++++++++++++---------
> arch/x86/kernel/kprobes/core.c | 17 ++++++++++++
> include/linux/kprobes.h | 3 +-
> kernel/kprobes.c | 56 ++++++++++++++++++++++++++++++++++-------
> 4 files changed, 87 insertions(+), 23 deletions(-)

I will take a closer look at this tomorrow and revert.


Thanks,
- Naveen

2022-03-01 17:38:33

by Naveen N. Rao

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Peter Zijlstra wrote:
> On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
>
>> - the 'offset' is NOT limited under the symbol size.
>> (e.g. symbol_name = "_text" and @offset points the offset of target symbol from _text)
>>
>> This means we need to call kallsyms_lookup_size_offset() in this case too.
>
> I'm feeling we should error out in that case. Using sym+offset beyond
> the limits of sym is just daft.
>
> But if you really want/need to retain that, then yes, we need that
> else branch unconditionally :/

I think we will need this. perf always specifies an offset from _text.

Also, I just noticed:

> -static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
> - const char *symbol_name, unsigned int offset)
> +static kprobe_opcode_t *
> +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> + unsigned long offset, bool *on_func_entry)
> {
> if ((symbol_name && addr) || (!symbol_name && !addr))
> goto invalid;
>
> if (symbol_name) {
> + /*
> + * Input: @sym + @offset
> + * Output: @addr + @offset
> + *
> + * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> + * argument into it's output!
> + */
> addr = kprobe_lookup_name(symbol_name, offset);
> if (!addr)
> return ERR_PTR(-ENOENT);
> + } else {
> + /*
> + * Input: @addr + @offset
> + * Output: @addr' + @offset'
> + */
> + if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> + NULL, &offset))
> + return ERR_PTR(-ENOENT);
> + addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> }

This looks wrong. I think you need to retain offset to calculate the
proper function entry address so that you can do:
addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
offset = func_offset;

>
> - addr = (kprobe_opcode_t *)(((char *)addr) + offset);
> + addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
> if (addr)
> return addr;
>

- Naveen

2022-03-01 19:21:06

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Tue, Mar 01, 2022 at 10:49:09PM +0530, Naveen N. Rao wrote:
> Peter Zijlstra wrote:
> > On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> >
> > > - the 'offset' is NOT limited under the symbol size.
> > > (e.g. symbol_name = "_text" and @offset points the offset of target symbol from _text)
> > >
> > > This means we need to call kallsyms_lookup_size_offset() in this case too.
> >
> > I'm feeling we should error out in that case. Using sym+offset beyond
> > the limits of sym is just daft.
> >
> > But if you really want/need to retain that, then yes, we need that
> > else branch unconditionally :/
>
> I think we will need this. perf always specifies an offset from _text.

The _text section symbol should have an adequate size, no?

> Also, I just noticed:

> > + if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> > + NULL, &offset))
> > + return ERR_PTR(-ENOENT);
> > + addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> > }
>
> This looks wrong. I think you need to retain offset to calculate the proper
> function entry address so that you can do:
> addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
> offset = func_offset;


Right you are, it needs to be:

addr += offset;
kallsyms_lookup_size_offset(addr, &size, &offset);
addr -= offset;

with all the extra unreadable casts on.
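
Spelled out with the casts, the else branch then becomes something like
(sketch):

	unsigned long size;

	/* fold the caller's offset in, then split on the containing symbol */
	addr = (kprobe_opcode_t *)((unsigned long)addr + offset);
	if (!kallsyms_lookup_size_offset((unsigned long)addr, &size, &offset))
		return ERR_PTR(-ENOENT);
	addr = (kprobe_opcode_t *)((unsigned long)addr - offset);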

2022-03-02 14:10:21

by Peter Zijlstra

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Wed, Mar 02, 2022 at 09:11:50AM +0900, Masami Hiramatsu wrote:

> > But if you really want/need to retain that, then yes, we need that
> > else branch unconditionally :/
>
> Thank you,

That's what I ended up doing in the latest version; I realized that
irrespective of symbol size, it is required when symbols overlap, as per
the case mentioned by Naveen.

https://lkml.kernel.org/r/[email protected]

2022-03-02 17:44:32

by Naveen N. Rao

Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Peter Zijlstra wrote:
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
> return addr;
> }
>
> +static bool arch_kprobe_on_func_entry(unsigned long offset)
> +{
> +#ifdef PPC64_ELF_ABI_v2
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + return offset <= 16;
> +#else
> + return offset <= 8;
> +#endif
> +#else
> + return !offset;
> +#endif
> +}
> +
> +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> + bool *on_func_entry)
> +{
> + *on_func_entry = arch_kprobe_on_func_entry(offset);
> + return (kprobe_opcode_t *)(addr + offset);
> +}
> +

With respect to kprobe_lookup_name(), one of the primary motivations there was
the issue with function descriptors for the older ELF v1 ABI (it likely also
affects ia64/parisc). I'm thinking it'll be simpler if we have a way to obtain
the function entry point. Something like this:

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
/* Lookup the address for a symbol. Returns 0 if not found. */
unsigned long kallsyms_lookup_name(const char *name);

+/* Return function entry point by additionally dereferencing function descriptor */
+static inline unsigned long kallsyms_lookup_function(const char *name)
+{
+ return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
+}
+
extern int kallsyms_lookup_size_offset(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset);
@@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
return 0;
}

+static inline unsigned long kallsyms_lookup_function(const char *name)
+{
+ return 0;
+}
+
static inline int kallsyms_lookup_size_offset(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset)


With that, we can fold some of the code from kprobe_lookup_name() into
arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also
address Masami's concerns with powerpc promoting all probes at function entry
into a probe at the ftrace location.


- Naveen


---
arch/powerpc/kernel/kprobes.c | 70 +++--------------------------------
include/linux/kprobes.h | 1 -
kernel/kprobes.c | 19 ++--------
kernel/trace/trace_kprobe.c | 2 +-
4 files changed, 9 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 7dae0b01abfbd6..46aa2b9e44c27c 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -41,70 +41,6 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
addr < (unsigned long)__head_end);
}

-kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
-{
- kprobe_opcode_t *addr = NULL;
-
-#ifdef PPC64_ELF_ABI_v2
- /* PPC64 ABIv2 needs local entry point */
- addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
- if (addr && !offset) {
-#ifdef CONFIG_KPROBES_ON_FTRACE
- unsigned long faddr;
- /*
- * Per livepatch.h, ftrace location is always within the first
- * 16 bytes of a function on powerpc with -mprofile-kernel.
- */
- faddr = ftrace_location_range((unsigned long)addr,
- (unsigned long)addr + 16);
- if (faddr)
- addr = (kprobe_opcode_t *)faddr;
- else
-#endif
- addr = (kprobe_opcode_t *)ppc_function_entry(addr);
- }
-#elif defined(PPC64_ELF_ABI_v1)
- /*
- * 64bit powerpc ABIv1 uses function descriptors:
- * - Check for the dot variant of the symbol first.
- * - If that fails, try looking up the symbol provided.
- *
- * This ensures we always get to the actual symbol and not
- * the descriptor.
- *
- * Also handle <module:symbol> format.
- */
- char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
- bool dot_appended = false;
- const char *c;
- ssize_t ret = 0;
- int len = 0;
-
- if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
- c++;
- len = c - name;
- memcpy(dot_name, name, len);
- } else
- c = name;
-
- if (*c != '\0' && *c != '.') {
- dot_name[len++] = '.';
- dot_appended = true;
- }
- ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
- if (ret > 0)
- addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
-
- /* Fallback to the original non-dot symbol lookup */
- if (!addr && dot_appended)
- addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-#else
- addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-#endif
-
- return addr;
-}
-
static bool arch_kprobe_on_func_entry(unsigned long offset)
{
#ifdef PPC64_ELF_ABI_v2
@@ -118,11 +54,15 @@ static bool arch_kprobe_on_func_entry(unsigned long offset)
#endif
}

-/* XXX try and fold the magic of kprobe_lookup_name() in this */
kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
bool *on_func_entry)
{
*on_func_entry = arch_kprobe_on_func_entry(offset);
+#ifdef PPC64_ELF_ABI_v2
+ /* Promote probes on the GEP to the LEP */
+ if (!offset)
+ addr = ppc_function_entry((void *)addr);
+#endif
return (kprobe_opcode_t *)(addr + offset);
}

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9c28f7a0ef4268..dad375056ba049 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -382,7 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
return this_cpu_ptr(&kprobe_ctlblk);
}

-kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);

int register_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8be57fdc19bdc0..066fa644e9dfa3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -67,12 +67,6 @@ static bool kprobes_all_disarmed;
static DEFINE_MUTEX(kprobe_mutex);
static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);

-kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
- unsigned int __unused)
-{
- return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
-}
-
/*
* Blacklist -- list of 'struct kprobe_blacklist_entry' to store info where
* kprobes can not probe.
@@ -1481,7 +1475,7 @@ bool within_kprobe_blacklist(unsigned long addr)
if (!p)
return false;
*p = '\0';
- addr = (unsigned long)kprobe_lookup_name(symname, 0);
+ addr = kallsyms_lookup_function(symname);
if (addr)
return __within_kprobe_blacklist(addr);
}
@@ -1524,14 +1518,7 @@ _kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
goto invalid;

if (symbol_name) {
- /*
- * Input: @sym + @offset
- * Output: @addr + @offset
- *
- * NOTE: kprobe_lookup_name() does *NOT* fold the offset
- * argument into it's output!
- */
- addr = kprobe_lookup_name(symbol_name, offset);
+ addr = (kprobe_opcode_t *)kallsyms_lookup_function(symbol_name);
if (!addr)
return ERR_PTR(-ENOENT);
}
@@ -2621,7 +2608,7 @@ static int __init init_kprobes(void)
/* lookup the function address from its name */
for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
kretprobe_blacklist[i].addr =
- kprobe_lookup_name(kretprobe_blacklist[i].name, 0);
+ (void *)kallsyms_lookup_function(kretprobe_blacklist[i].name);
if (!kretprobe_blacklist[i].addr)
pr_err("Failed to lookup symbol '%s' for kretprobe blacklist. Maybe the target function is removed or renamed.\n",
kretprobe_blacklist[i].name);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 508f14af4f2c7e..a8d01954051e60 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -461,7 +461,7 @@ static bool within_notrace_func(struct trace_kprobe *tk)
if (!p)
return true;
*p = '\0';
- addr = (unsigned long)kprobe_lookup_name(symname, 0);
+ addr = kallsyms_lookup_function(symname);
if (addr)
return __within_notrace_func(addr);
}
--
2.35.1


2022-03-02 18:35:30

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Peter Zijlstra wrote:
>
> How does this look?

I gave this a quick test on powerpc and this looks good to me.

> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
> extern void kprobes_inc_nmissed_count(struct kprobe *p);
> extern bool arch_within_kprobe_blacklist(unsigned long addr);
> extern int arch_populate_kprobe_blacklist(void);
> -extern bool arch_kprobe_on_func_entry(unsigned long offset);

There is a __weak definition of this function in kernel/kprobes.c which
should also be removed.


Thanks,
Naveen

2022-03-02 19:02:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Tue, Mar 01, 2022 at 08:12:45PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2022 at 10:49:09PM +0530, Naveen N. Rao wrote:
> > Peter Zijlstra wrote:

> > > But if you really want/need to retain that, then yes, we need that
> > > else branch unconditionally :/
> >
> > I think we will need this. perf always specifies an offset from _text.
>
> The _text section symbol should have an adequate size, no?

n/m, I should really go get some sleep it seems. Even if the size is
correct, that isn't relevant.

> > Also, I just noticed:
>
> > > + if (!kallsyms_lookup_size_offset((unsigned long)addr + offset,
> > > + NULL, &offset))
> > > + return ERR_PTR(-ENOENT);
> > > + addr = (kprobe_opcode_t *)((unsigned long)addr - offset);
> > > }
> >
> > This looks wrong. I think you need to retain offset to calculate the proper
> > function entry address so that you can do:
> > addr = (kprobe_opcode_t *)((unsigned long)(addr + offset) - func_offset);
> > offset = func_offset;
>
>
> Right you are, it needs to be:
>
> addr += offset;
> kallsyms_lookup_size_offset(addr, &size, &offset);
> addr -= offset;
>
> with all the extra unreadable casts on.

How does this look?

--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
return addr;
}

+static bool arch_kprobe_on_func_entry(unsigned long offset)
+{
+#ifdef PPC64_ELF_ABI_v2
+#ifdef CONFIG_KPROBES_ON_FTRACE
+ return offset <= 16;
+#else
+ return offset <= 8;
+#endif
+#else
+ return !offset;
+#endif
+}
+
+/* XXX try and fold the magic of kprobe_lookup_name() in this */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+ bool *on_func_entry)
+{
+ *on_func_entry = arch_kprobe_on_func_entry(offset);
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
void *alloc_insn_page(void)
{
void *page;
@@ -218,19 +239,6 @@ static nokprobe_inline void set_current_
kcb->kprobe_saved_msr = regs->msr;
}

-bool arch_kprobe_on_func_entry(unsigned long offset)
-{
-#ifdef PPC64_ELF_ABI_v2
-#ifdef CONFIG_KPROBES_ON_FTRACE
- return offset <= 16;
-#else
- return offset <= 8;
-#endif
-#else
- return !offset;
-#endif
-}
-
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->link;
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -52,6 +52,7 @@
#include <asm/insn.h>
#include <asm/debugreg.h>
#include <asm/set_memory.h>
+#include <asm/ibt.h>

#include "common.h"

@@ -301,6 +302,22 @@ static int can_probe(unsigned long paddr
return (addr == paddr);
}

+/* If x86 supports IBT (ENDBR), it must be skipped. */
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
+ bool *on_func_entry)
+{
+ if (is_endbr(*(u32 *)addr)) {
+ *on_func_entry = !offset || offset == 4;
+ if (*on_func_entry)
+ offset = 4;
+
+ } else {
+ *on_func_entry = !offset;
+ }
+
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
/*
* Copy an instruction with recovering modified instruction by kprobes
* and adjust the displacement if the instruction uses the %rip-relative
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern int arch_populate_kprobe_blacklist(void);
-extern bool arch_kprobe_on_func_entry(unsigned long offset);
extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);

extern bool within_kprobe_blacklist(unsigned long addr);
@@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_
}

kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
+kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
+
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1489,24 +1489,68 @@ bool within_kprobe_blacklist(unsigned lo
}

/*
+ * arch_adjust_kprobe_addr - adjust the address
+ * @addr: symbol base address
+ * @offset: offset within the symbol
+ * @on_func_entry: was this @addr+@offset on the function entry
+ *
+ * Typically returns @addr + @offset, except for special cases where the
+ * function might be prefixed by a CFI landing pad, in that case any offset
+ * inside the landing pad is mapped to the first 'real' instruction of the
+ * symbol.
+ *
+ * Specifically, for things like IBT/BTI, skip the resp. ENDBR/BTI.C
+ * instruction at +0.
+ */
+kprobe_opcode_t *__weak arch_adjust_kprobe_addr(unsigned long addr,
+ unsigned long offset,
+ bool *on_func_entry)
+{
+ *on_func_entry = !offset;
+ return (kprobe_opcode_t *)(addr + offset);
+}
+
+/*
* If 'symbol_name' is specified, look it up and add the 'offset'
* to it. This way, we can specify a relative address to a symbol.
* This returns encoded errors if it fails to look up symbol or invalid
* combination of parameters.
*/
-static kprobe_opcode_t *_kprobe_addr(kprobe_opcode_t *addr,
- const char *symbol_name, unsigned int offset)
+static kprobe_opcode_t *
+_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
+ unsigned long offset, bool *on_func_entry)
{
if ((symbol_name && addr) || (!symbol_name && !addr))
goto invalid;

if (symbol_name) {
+ /*
+ * Input: @sym + @offset
+ * Output: @addr + @offset
+ *
+ * NOTE: kprobe_lookup_name() does *NOT* fold the offset
+ * argument into it's output!
+ */
addr = kprobe_lookup_name(symbol_name, offset);
if (!addr)
return ERR_PTR(-ENOENT);
}

- addr = (kprobe_opcode_t *)(((char *)addr) + offset);
+ /*
+ * So here we have @addr + @offset, displace it into a new
+ * @addr' + @offset' where @addr' is the symbol start address.
+ */
+ addr = (void *)addr + offset;
+ if (!kallsyms_lookup_size_offset((unsigned long)addr, NULL, &offset))
+ return ERR_PTR(-ENOENT);
+ addr = (void *)addr - offset;
+
+ /*
+ * Then ask the architecture to re-combine them, taking care of
+ * magical function entry details while telling us if this was indeed
+ * at the start of the function.
+ */
+ addr = arch_adjust_kprobe_addr((unsigned long)addr, offset, on_func_entry);
if (addr)
return addr;

@@ -1516,7 +1560,8 @@ static kprobe_opcode_t *_kprobe_addr(kpr

static kprobe_opcode_t *kprobe_addr(struct kprobe *p)
{
- return _kprobe_addr(p->addr, p->symbol_name, p->offset);
+ bool on_func_entry;
+ return _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry);
}

/*
@@ -2067,15 +2112,13 @@ bool __weak arch_kprobe_on_func_entry(un
*/
int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset)
{
- kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset);
+ bool on_func_entry;
+ kprobe_opcode_t *kp_addr = _kprobe_addr(addr, sym, offset, &on_func_entry);

if (IS_ERR(kp_addr))
return PTR_ERR(kp_addr);

- if (!kallsyms_lookup_size_offset((unsigned long)kp_addr, NULL, &offset))
- return -ENOENT;
-
- if (!arch_kprobe_on_func_entry(offset))
+ if (!on_func_entry)
return -EINVAL;

return 0;
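
As an illustration, here is a minimal standalone C sketch (not kernel code) of the
addr/offset displacement done in the reworked _kprobe_addr() above. The symbol
layout, the addresses and the IBT-style +4 adjustment are all made up, and the
small stubs stand in for kallsyms_lookup_size_offset() and arch_adjust_kprobe_addr():

/*
 * Hypothetical symbol layout: _text at 0x1000, two functions behind it.
 * A user asks for "_text" + 0x42, which is really func_b + 0x2.
 */
#include <stdbool.h>
#include <stdio.h>

struct sym { const char *name; unsigned long base, size; };

static const struct sym syms[] = {
	{ "func_a", 0x1000, 0x40 },
	{ "func_b", 0x1040, 0x40 },
};

/* stand-in for kallsyms_lookup_size_offset(): find the containing symbol */
static bool lookup_size_offset(unsigned long addr, unsigned long *offset)
{
	for (unsigned int i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
		if (addr >= syms[i].base && addr < syms[i].base + syms[i].size) {
			*offset = addr - syms[i].base;
			return true;
		}
	}
	return false;
}

/*
 * Stand-in for an IBT-style arch_adjust_kprobe_addr(): pretend every
 * function starts with a 4-byte landing pad (the real code checks
 * is_endbr() first), so +0 and +4 both count as function entry.
 */
static unsigned long adjust(unsigned long addr, unsigned long offset, bool *on_func_entry)
{
	*on_func_entry = !offset || offset == 4;
	if (*on_func_entry)
		offset = 4;
	return addr + offset;
}

int main(void)
{
	unsigned long addr = 0x1000, offset = 0x42;	/* "_text" + 0x42 */
	bool on_func_entry;

	addr += offset;					/* absolute address */
	if (!lookup_size_offset(addr, &offset))
		return 1;
	addr -= offset;					/* symbol start + offset within it */
	addr = adjust(addr, offset, &on_func_entry);

	/* prints: probe at 0x1042, on_func_entry=0 */
	printf("probe at %#lx, on_func_entry=%d\n", addr, on_func_entry);
	return 0;
}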

2022-03-02 22:25:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Wed, Mar 02, 2022 at 08:32:45PM +0100, Peter Zijlstra wrote:
> I wonder if you also want to tighten up on_func_entry? Wouldn't the
> above suggest something like:
>
> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> bool *on_func_entry)
> {
> #ifdef PPC64_ELF_ABI_v2
> unsigned long entry = ppc_function_entry((void *)addr) - addr;
> *on_func_entry = !offset || offset == entry;
> if (*on_func_entry)
> offset = entry;
> #else
> *on_func_entry = !offset;
> #endif
> return (void *)(addr + offset);
> }

One question though; the above seems to work for +0 or +8 (IIRC your
instructions are 4 bytes each and the GEP is 2 instructions).

But what do we want to happen for +4 ?

2022-03-02 23:08:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Tue, 1 Mar 2022 09:28:49 +0100
Peter Zijlstra <[email protected]> wrote:

> On Tue, Mar 01, 2022 at 11:49:05AM +0900, Masami Hiramatsu wrote:
> > > +static kprobe_opcode_t *
> > > +_kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> > > + unsigned long offset, bool *on_func_entry)
> > > {
> > > if ((symbol_name && addr) || (!symbol_name && !addr))
> > > goto invalid;
> > >
> > > if (symbol_name) {
> > > + /*
> > > + * Input: @sym + @offset
> > > + * Output: @addr + @offset
> > > + *
> > > + * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> > > + * argument into it's output!
> > > + */
> > > addr = kprobe_lookup_name(symbol_name, offset);
> >
> > Hmm, there are 2 issues.
> >
> > - the 'addr' includes the 'offset' here.
>
> AFAICT it doesn't (I even wrote that in the comment on top). There's two
> implementations of kprobe_lookup_name(), the weak version doesn't even
> use the offset argument, and the PowerPC implementation only checks for
> !offset and doesn't fold it.

Oops, OK.

>
> > - the 'offset' is NOT limited under the symbol size.
> > (e.g. symbol_name = "_text" and @offset points the offset of target symbol from _text)
> >
> > This means we need to call kallsyms_lookup_size_offset() in this case too.
>
> I'm feeling we should error out in that case. Using sym+offset beyond
> the limits of sym is just daft.

No, this is required for pointing at local-scope functions which share the
same name, e.g. two static functions called 'setup' in different files can
only be distinguished as '_text' + offset. (And perf-probe does that.)

>
> But if you really want/need to retain that, then yes, we need that
> else branch unconditionally :/

Thank you,

--
Masami Hiramatsu <[email protected]>

2022-03-02 23:29:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Wed, Mar 02, 2022 at 09:29:04PM +0530, Naveen N. Rao wrote:
> Peter Zijlstra wrote:
> >
> > How does this look?
>
> I gave this a quick test on powerpc and this looks good to me.

Thanks!

> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void);
> > extern void kprobes_inc_nmissed_count(struct kprobe *p);
> > extern bool arch_within_kprobe_blacklist(unsigned long addr);
> > extern int arch_populate_kprobe_blacklist(void);
> > -extern bool arch_kprobe_on_func_entry(unsigned long offset);
>
> There is a __weak definition of this function in kernel/kprobes.c which
> should also be removed.

*poof*, gone.

2022-03-03 00:27:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Wed, Mar 02, 2022 at 09:47:03PM +0530, Naveen N. Rao wrote:

> With respect to kprobe_lookup_name(), one of the primary motivations there
> was the issue with function descriptors for the previous elf v1 ABI (it
> likely also affects ia64/parisc). I'm thinking it'll be simpler if we have a
> way to obtain function entry point. Something like this:
>
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
> /* Lookup the address for a symbol. Returns 0 if not found. */
> unsigned long kallsyms_lookup_name(const char *name);
>
> +/* Return function entry point by additionally dereferencing function descriptor */
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> + return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
> +}
> +
> extern int kallsyms_lookup_size_offset(unsigned long addr,
> unsigned long *symbolsize,
> unsigned long *offset);
> @@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
> return 0;
> }
>
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> + return 0;
> +}
> +
> static inline int kallsyms_lookup_size_offset(unsigned long addr,
> unsigned long *symbolsize,
> unsigned long *offset)
>
>
> With that, we can fold some of the code from kprobe_lookup_name() into
> arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also
> address Masami's concerns with powerpc promoting all probes at function
> entry into a probe at the ftrace location.

Yes, this looks entirely reasonable to me.

> ---

> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> bool *on_func_entry)
> {
> *on_func_entry = arch_kprobe_on_func_entry(offset);
> +#ifdef PPC64_ELF_ABI_v2
> + /* Promote probes on the GEP to the LEP */
> + if (!offset)
> + addr = ppc_function_entry((void *)addr);
> +#endif
> return (kprobe_opcode_t *)(addr + offset);
> }

I wonder if you also want to tighten up on_func_entry? Wouldn't the
above suggest something like:

kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
bool *on_func_entry)
{
#ifdef PPC64_ELF_ABI_v2
unsigned long entry = ppc_function_entry((void *)addr) - addr;
*on_func_entry = !offset || offset == entry;
if (*on_func_entry)
offset = entry;
#else
*on_func_entry = !offset;
#endif
return (void *)(addr + offset);
}

Then you can also go and delete the arch_kprobe_on_func_entry rudiment.

For the rest, Yay at cleanups!, lots of magic code gone.
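
As an illustration, a minimal standalone walkthrough of the arch_adjust_kprobe_addr()
sketch above, assuming 4-byte instructions and a 2-instruction GEP so that
ppc_function_entry() would return sym+8; the addresses and the LEP_OFFSET constant
standing in for ppc_function_entry() are hypothetical:

#include <stdbool.h>
#include <stdio.h>

#define LEP_OFFSET 8UL	/* assumed: ppc_function_entry(addr) - addr */

/* same logic as the sketch above, with the entry offset hard-coded */
static unsigned long adjust(unsigned long addr, unsigned long offset, bool *on_func_entry)
{
	unsigned long entry = LEP_OFFSET;

	*on_func_entry = !offset || offset == entry;
	if (*on_func_entry)
		offset = entry;
	return addr + offset;
}

int main(void)
{
	const unsigned long offsets[] = { 0, 4, 8 };
	bool on_func_entry;

	for (unsigned int i = 0; i < 3; i++) {
		unsigned long addr = adjust(0x1000, offsets[i], &on_func_entry);
		/* prints: +0 -> 0x1008 (entry), +4 -> 0x1004, +8 -> 0x1008 (entry) */
		printf("+%lu -> %#lx%s\n", offsets[i], addr,
		       on_func_entry ? " (entry)" : "");
	}
	return 0;
}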

2022-03-03 01:56:15

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Wed, 02 Mar 2022 21:47:03 +0530
"Naveen N. Rao" <[email protected]> wrote:

> Peter Zijlstra wrote:
> > --- a/arch/powerpc/kernel/kprobes.c
> > +++ b/arch/powerpc/kernel/kprobes.c
> > @@ -105,6 +105,27 @@ kprobe_opcode_t *kprobe_lookup_name(cons
> > return addr;
> > }
> >
> > +static bool arch_kprobe_on_func_entry(unsigned long offset)
> > +{
> > +#ifdef PPC64_ELF_ABI_v2
> > +#ifdef CONFIG_KPROBES_ON_FTRACE
> > + return offset <= 16;
> > +#else
> > + return offset <= 8;
> > +#endif
> > +#else
> > + return !offset;
> > +#endif
> > +}
> > +
> > +/* XXX try and fold the magic of kprobe_lookup_name() in this */
> > +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> > + bool *on_func_entry)
> > +{
> > + *on_func_entry = arch_kprobe_on_func_entry(offset);
> > + return (kprobe_opcode_t *)(addr + offset);
> > +}
> > +
>
> With respect to kprobe_lookup_name(), one of the primary motivations there was
> the issue with function descriptors for the previous elf v1 ABI (it likely also
> affects ia64/parisc). I'm thinking it'll be simpler if we have a way to obtain
> function entry point. Something like this:
>
> diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
> index 4176c7eca7b5aa..8c57cc5b77f9ae 100644
> --- a/include/linux/kallsyms.h
> +++ b/include/linux/kallsyms.h
> @@ -73,6 +73,12 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
> /* Lookup the address for a symbol. Returns 0 if not found. */
> unsigned long kallsyms_lookup_name(const char *name);
>
> +/* Return function entry point by additionally dereferencing function descriptor */
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> + return (unsigned long)dereference_symbol_descriptor((void *)kallsyms_lookup_name(name));
> +}
> +
> extern int kallsyms_lookup_size_offset(unsigned long addr,
> unsigned long *symbolsize,
> unsigned long *offset);
> @@ -103,6 +109,11 @@ static inline unsigned long kallsyms_lookup_name(const char *name)
> return 0;
> }
>
> +static inline unsigned long kallsyms_lookup_function(const char *name)
> +{
> + return 0;
> +}
> +
> static inline int kallsyms_lookup_size_offset(unsigned long addr,
> unsigned long *symbolsize,
> unsigned long *offset)
>
>
> With that, we can fold some of the code from kprobe_lookup_name() into
> arch_adjust_kprobe_addr() and remove kprobe_lookup_name(). This should also
> address Masami's concerns with powerpc promoting all probes at function entry
> into a probe at the ftrace location.

Good point, this looks good to me.
And "kallsyms_lookup_entry_address()" will be the preferable name.

Thank you,
>
>
> - Naveen
>
>
> ---
> arch/powerpc/kernel/kprobes.c | 70 +++--------------------------------
> include/linux/kprobes.h | 1 -
> kernel/kprobes.c | 19 ++--------
> kernel/trace/trace_kprobe.c | 2 +-
> 4 files changed, 9 insertions(+), 83 deletions(-)
>
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 7dae0b01abfbd6..46aa2b9e44c27c 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -41,70 +41,6 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
> addr < (unsigned long)__head_end);
> }
>
> -kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
> -{
> - kprobe_opcode_t *addr = NULL;
> -
> -#ifdef PPC64_ELF_ABI_v2
> - /* PPC64 ABIv2 needs local entry point */
> - addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> - if (addr && !offset) {
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> - unsigned long faddr;
> - /*
> - * Per livepatch.h, ftrace location is always within the first
> - * 16 bytes of a function on powerpc with -mprofile-kernel.
> - */
> - faddr = ftrace_location_range((unsigned long)addr,
> - (unsigned long)addr + 16);
> - if (faddr)
> - addr = (kprobe_opcode_t *)faddr;
> - else
> -#endif
> - addr = (kprobe_opcode_t *)ppc_function_entry(addr);
> - }
> -#elif defined(PPC64_ELF_ABI_v1)
> - /*
> - * 64bit powerpc ABIv1 uses function descriptors:
> - * - Check for the dot variant of the symbol first.
> - * - If that fails, try looking up the symbol provided.
> - *
> - * This ensures we always get to the actual symbol and not
> - * the descriptor.
> - *
> - * Also handle <module:symbol> format.
> - */
> - char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
> - bool dot_appended = false;
> - const char *c;
> - ssize_t ret = 0;
> - int len = 0;
> -
> - if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
> - c++;
> - len = c - name;
> - memcpy(dot_name, name, len);
> - } else
> - c = name;
> -
> - if (*c != '\0' && *c != '.') {
> - dot_name[len++] = '.';
> - dot_appended = true;
> - }
> - ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
> - if (ret > 0)
> - addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
> -
> - /* Fallback to the original non-dot symbol lookup */
> - if (!addr && dot_appended)
> - addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> -#else
> - addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
> -#endif
> -
> - return addr;
> -}
> -
> static bool arch_kprobe_on_func_entry(unsigned long offset)
> {
> #ifdef PPC64_ELF_ABI_v2
> @@ -118,11 +54,15 @@ static bool arch_kprobe_on_func_entry(unsigned long offset)
> #endif
> }
>
> -/* XXX try and fold the magic of kprobe_lookup_name() in this */
> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
> bool *on_func_entry)
> {
> *on_func_entry = arch_kprobe_on_func_entry(offset);
> +#ifdef PPC64_ELF_ABI_v2
> + /* Promote probes on the GEP to the LEP */
> + if (!offset)
> + addr = ppc_function_entry((void *)addr);
> +#endif
> return (kprobe_opcode_t *)(addr + offset);
> }
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9c28f7a0ef4268..dad375056ba049 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -382,7 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
> return this_cpu_ptr(&kprobe_ctlblk);
> }
>
> -kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry);
>
> int register_kprobe(struct kprobe *p);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8be57fdc19bdc0..066fa644e9dfa3 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -67,12 +67,6 @@ static bool kprobes_all_disarmed;
> static DEFINE_MUTEX(kprobe_mutex);
> static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>
> -kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> - unsigned int __unused)
> -{
> - return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
> -}
> -
> /*
> * Blacklist -- list of 'struct kprobe_blacklist_entry' to store info where
> * kprobes can not probe.
> @@ -1481,7 +1475,7 @@ bool within_kprobe_blacklist(unsigned long addr)
> if (!p)
> return false;
> *p = '\0';
> - addr = (unsigned long)kprobe_lookup_name(symname, 0);
> + addr = kallsyms_lookup_function(symname);
> if (addr)
> return __within_kprobe_blacklist(addr);
> }
> @@ -1524,14 +1518,7 @@ _kprobe_addr(kprobe_opcode_t *addr, const char *symbol_name,
> goto invalid;
>
> if (symbol_name) {
> - /*
> - * Input: @sym + @offset
> - * Output: @addr + @offset
> - *
> - * NOTE: kprobe_lookup_name() does *NOT* fold the offset
> - * argument into it's output!
> - */
> - addr = kprobe_lookup_name(symbol_name, offset);
> + addr = (kprobe_opcode_t *)kallsyms_lookup_function(symbol_name);
> if (!addr)
> return ERR_PTR(-ENOENT);
> }
> @@ -2621,7 +2608,7 @@ static int __init init_kprobes(void)
> /* lookup the function address from its name */
> for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
> kretprobe_blacklist[i].addr =
> - kprobe_lookup_name(kretprobe_blacklist[i].name, 0);
> + (void *)kallsyms_lookup_function(kretprobe_blacklist[i].name);
> if (!kretprobe_blacklist[i].addr)
> pr_err("Failed to lookup symbol '%s' for kretprobe blacklist. Maybe the target function is removed or renamed.\n",
> kretprobe_blacklist[i].name);
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 508f14af4f2c7e..a8d01954051e60 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -461,7 +461,7 @@ static bool within_notrace_func(struct trace_kprobe *tk)
> if (!p)
> return true;
> *p = '\0';
> - addr = (unsigned long)kprobe_lookup_name(symname, 0);
> + addr = kallsyms_lookup_function(symname);
> if (addr)
> return __within_notrace_func(addr);
> }
> --
> 2.35.1
>
>


--
Masami Hiramatsu <[email protected]>

2022-03-03 15:20:02

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Peter Zijlstra wrote:
> On Wed, Mar 02, 2022 at 08:32:45PM +0100, Peter Zijlstra wrote:
>> I wonder if you also want to tighten up on_func_entry? Wouldn't the
>> above suggest something like:

Good question ;)
I noticed this yesterday, but held off on making changes so that I can
think this through.

>>
>> kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset,
>> bool *on_func_entry)
>> {
>> #ifdef PPC64_ELF_ABI_v2
>> unsigned long entry = ppc_function_entry((void *)addr) - addr;
>> *on_func_entry = !offset || offset == entry;
>> if (*on_func_entry)
>> offset = entry;
>> #else
>> *on_func_entry = !offset;
>> #endif
>> return (void *)(addr + offset);
>> }

This is more accurate and probably something we should do in the long
term. The main issue I see today is userspace, specifically perf (and
perhaps others too).

Historically, we have worked around the issue of probes on a function's
global entry point not working by having userspace adjust the offset at
which probes are placed. This works well if those object files have
either the symbol table, or debuginfo capturing if functions have a
separate local entry point. In the absence of those, we are left
guessing and we chose to just offset all probes at function entry by 8
(GEP almost always has the same two instructions) so that perf "just
works". This still works well for functions without a GEP since we
expect to see the two ftrace instructions at function entry, so we are
ok to probe after that. As an added bonus, this also allows uprobes to
work, for the most part.

On the kernel side, we only implemented logic to adjust the probe address
if a function name was specified without an offset. This went for a toss
once perf probe moved to using _text as the base symbol for kprobes
though, and we weren't handling scenarios where addr+offset was
provided. With the changes in this series, we can now adjust the kprobe
address properly across all those scenarios.

If we update perf to not pass an offset any more, then newer perf will
stop working on older kernels. If we make the logic to determine
function entry strict in the kernel, then we risk breaking existing
userspace.

I'm not sure how best to address this.

>
> One question though; the above seems to work for +0 or +8 (IIRC your
> instructions are 4 bytes each and the GEP is 2 instructions).
>
> But what do we want to happen for +4 ?

We don't want to change the behavior of probes at the second instruction
of the GEP. The thinking is that it allows the rare scenario (if any) of
wanting to catch indirect and/or cross-module function calls --
especially since we now promote probes at the GEP to the LEP. I frankly
know of no such scenario so far, but in any case, if the user is
specifying an offset, they had better know what they are asking for :)

For the same reason, we should allow a kretprobe at +4.


Thanks,
Naveen