2022-10-21 16:08:54

by Mark Rutland

[permalink] [raw]
Subject: kCFI && patchable-function-entry=M,N

Hi,

For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
ftrace implementation, which instruments *some* but not all functions.
Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
and non-instrumented functions don't agree on where the type hash should live
relative to the function entry point, making them incompatible with one another.
AFAICT, there's no mechanism today to get them to agree.

Today we use -fatchable-function-entry=2, which happens to avoid this.

For example, in the below, functions with N=0 expect the type hash to be placed
4 bytes before the entry point, and functions with N=2 expect the type hash to
be placed 12 bytes before the entry point.

| % cat test.c
| #define __notrace __attribute__((patchable_function_entry(0, 0)))
|
| void callee_patchable(void)
| {
| }
|
| void __notrace callee_non_patchable(void)
| {
| }
|
| typedef void (*callee_fn_t)(void);
|
| void caller_patchable(callee_fn_t callee)
| {
| callee();
| }
|
| void __notrace caller_non_patchable(callee_fn_t callee)
| {
| callee();
| }
| % clang --target=aarch64-linux -c test.c -fpatchable-function-entry=2,2 -fsanitize=kcfi -O2
| % aarch64-linux-objdump -d test.o
|
| test.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <callee_patchable-0xc>:
| 0: a540670c .word 0xa540670c
| 4: d503201f nop
| 8: d503201f nop
|
| 000000000000000c <callee_patchable>:
| c: d65f03c0 ret
| 10: a540670c .word 0xa540670c
|
| 0000000000000014 <callee_non_patchable>:
| 14: d65f03c0 ret
| 18: 07d85f31 .word 0x07d85f31
| 1c: d503201f nop
| 20: d503201f nop
|
| 0000000000000024 <caller_patchable>:
| 24: b85f4010 ldur w16, [x0, #-12]
| 28: 728ce191 movk w17, #0x670c
| 2c: 72b4a811 movk w17, #0xa540, lsl #16
| 30: 6b11021f cmp w16, w17
| 34: 54000040 b.eq 3c <caller_patchable+0x18> // b.none
| 38: d4304400 brk #0x8220
| 3c: d61f0000 br x0
| 40: 07d85f31 .word 0x07d85f31
|
| 0000000000000044 <caller_non_patchable>:
| 44: b85fc010 ldur w16, [x0, #-4]
| 48: 728ce191 movk w17, #0x670c
| 4c: 72b4a811 movk w17, #0xa540, lsl #16
| 50: 6b11021f cmp w16, w17
| 54: 54000040 b.eq 5c <caller_non_patchable+0x18> // b.none
| 58: d4304400 brk #0x8220
| 5c: d61f0000 br x0

On arm64, I'd like to use -fpatchable-function-entry=4,2 on arm64, along with
-falign-functions=8, so that we can place a naturally-aligned 8-byte literal
before the function (e.g. a pointer value). That allows us to implement an
efficient per-callsite hook without hitting branch range limitations and/or
requiring trampolines. I have a PoC that works without kCFI at:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops

I mentioned this in the #clangbuiltlinux IRQ channel, and Sami wrote up a
github issue at: https://github.com/ClangBuiltLinux/linux/issues/1744

Currently clang generates:

| < HASH >
| NOP
| NOP
| func: BTI // optional
| NOP
| NOP

... and to make this consistent with non-instrumented functions, the
non-instrumented functions would need pre-function padding before their hashes.

For my use-case, it doesn't matter where the pre-function NOPs are placed
relative to the type hash, so long as the location is consistent, and it might
be nicer to have the option to place the pre-function NOPs before the hash,
which would avoid needing to change non-instrumented functions (and save some
space) e.g.

| NOP
| NOP
| < HASH >
| func: BTI // optional
| NOP
| NOP

... but I understand that for x86, folk want the pre-function NOPs to
fall-through into the body of the function.

Is there any mechanism today that we could use to solve this, or could we
extend clang to have some options to control this behaviour?

It would also be helpful to have a symbol before both the hash and pre-function
NOPs so that we can filter those out of probes patching (I see that x86 does
this with the __cfi_function symbol).

Thanks,
Mark.


2022-10-21 18:19:57

by Sami Tolvanen

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Fri, Oct 21, 2022 at 8:56 AM Mark Rutland <[email protected]> wrote:
>
> Hi,
>
> For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> ftrace implementation, which instruments *some* but not all functions.
> Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> and non-instrumented functions don't agree on where the type hash should live
> relative to the function entry point, making them incompatible with one another.

Yes, the current implementation assumes that if prefix nops are used,
all functions have the same number of them.

> Is there any mechanism today that we could use to solve this, or could we
> extend clang to have some options to control this behaviour?

I don't think there's a mechanism to work around the issue right now,
but we could just change where the hash is emitted on arm64.

> It would also be helpful to have a symbol before both the hash and pre-function
> NOPs so that we can filter those out of probes patching (I see that x86 does
> this with the __cfi_function symbol).

Adding a symbol before the hash isn't a problem, but if we move the
hash and want the symbol to be placed before the prefix nops as well,
we might need a flag to control this. Fangrui, what do you think?

Sami

2022-10-22 04:29:55

by Fangrui Song

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Fri, Oct 21, 2022 at 10:39 AM Sami Tolvanen <[email protected]> wrote:
>
> On Fri, Oct 21, 2022 at 8:56 AM Mark Rutland <[email protected]> wrote:
> >
> > Hi,
> >
> > For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> > ftrace implementation, which instruments *some* but not all functions.
> > Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> > and non-instrumented functions don't agree on where the type hash should live
> > relative to the function entry point, making them incompatible with one another.
>
> Yes, the current implementation assumes that if prefix nops are used,
> all functions have the same number of them.
>
> > Is there any mechanism today that we could use to solve this, or could we
> > extend clang to have some options to control this behaviour?
>
> I don't think there's a mechanism to work around the issue right now,
> but we could just change where the hash is emitted on arm64.
>
> > It would also be helpful to have a symbol before both the hash and pre-function
> > NOPs so that we can filter those out of probes patching (I see that x86 does
> > this with the __cfi_function symbol).
>
> Adding a symbol before the hash isn't a problem, but if we move the
> hash and want the symbol to be placed before the prefix nops as well,
> we might need a flag to control this. Fangrui, what do you think?
>
> Sami

Since the kcfi code expects the hash to appear in a specific location
so that an instrumented indirect jump can reliably obtain the hash.
For a translation unit `-fpatchable-function-entry=N,M` may be
specified or not, and we want both to work. Therefore, I agree that a
consistent hash location will help. This argument favors placing M
nops before the hash. The downside is a restriction on how the M nops
can be used. Previously if M>0, the runtime code needs to check
whether a BTI exists to locate the N-M after-function-entry NOPs. If
the hash appears after the M nops, the runtime code needs to
additionally knows whether the hash exists. My question is how to
reliably detect this.

If there is motivation using M>0, I'd like to know the concrete code
sequence for `-fpatchable-function-entry=N,M` and how the runtime code
detects the NOPs with optional hash and optional BTI.


--
宋方睿

2022-10-22 15:05:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Fri, Oct 21, 2022 at 04:56:20PM +0100, Mark Rutland wrote:
> Hi,
>
> For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> ftrace implementation, which instruments *some* but not all functions.
> Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> and non-instrumented functions don't agree on where the type hash should live
> relative to the function entry point, making them incompatible with one another.
> AFAICT, there's no mechanism today to get them to agree.
>
> Today we use -fatchable-function-entry=2, which happens to avoid this.

> ... but I understand that for x86, folk want the pre-function NOPs to
> fall-through into the body of the function.

Yep.

> Is there any mechanism today that we could use to solve this, or could we
> extend clang to have some options to control this behaviour?

So the main pain-point for you is differentiating between function with
notrace and those without it, right?

That is; suppose you (like x86) globally do:
-fpatchable-function-entry=4,2 to get a consistent function signature,
you're up a creek because you use the __patchable_function_entries
section to drive ftrace and now every function will have it.

So perhaps something like:

-fpatchable-function-entry=N,M,sectionname

would help, then you can have notrace be the same layout, except a
different section. Eg. something like:

#define notrace __attribute__((patchable_function_entry(4,2,__notrace_function_entries)))

It does make the whole: CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)
a bit of a pain, but I've long favoured removing all that and having
explitic notrace attributes on all relevant functions.

Then again; perhaps it could be made to work by ensuring CFLAGS starts
with:

-fpatchable-function-entry=4,2,__notrace_function_entries

and have CC_FLAGS_FTRACE include (and hence override with)

-fpatchable-function-entry=4,2,__ftrace_function_entries

assuming that with duplicate argument the last is effective.

2022-10-24 11:21:37

by Mark Rutland

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Fri, Oct 21, 2022 at 09:14:41PM -0700, Fangrui Song wrote:
> On Fri, Oct 21, 2022 at 10:39 AM Sami Tolvanen <[email protected]> wrote:
> >
> > On Fri, Oct 21, 2022 at 8:56 AM Mark Rutland <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> > > ftrace implementation, which instruments *some* but not all functions.
> > > Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> > > and non-instrumented functions don't agree on where the type hash should live
> > > relative to the function entry point, making them incompatible with one another.
> >
> > Yes, the current implementation assumes that if prefix nops are used,
> > all functions have the same number of them.
> >
> > > Is there any mechanism today that we could use to solve this, or could we
> > > extend clang to have some options to control this behaviour?
> >
> > I don't think there's a mechanism to work around the issue right now,
> > but we could just change where the hash is emitted on arm64.
> >
> > > It would also be helpful to have a symbol before both the hash and pre-function
> > > NOPs so that we can filter those out of probes patching (I see that x86 does
> > > this with the __cfi_function symbol).
> >
> > Adding a symbol before the hash isn't a problem, but if we move the
> > hash and want the symbol to be placed before the prefix nops as well,
> > we might need a flag to control this. Fangrui, what do you think?
> >
> > Sami
>
> Since the kcfi code expects the hash to appear in a specific location
> so that an instrumented indirect jump can reliably obtain the hash.
> For a translation unit `-fpatchable-function-entry=N,M` may be
> specified or not, and we want both to work. Therefore, I agree that a
> consistent hash location will help. This argument favors placing M
> nops before the hash. The downside is a restriction on how the M nops
> can be used. Previously if M>0, the runtime code needs to check
> whether a BTI exists to locate the N-M after-function-entry NOPs. If
> the hash appears after the M nops, the runtime code needs to
> additionally knows whether the hash exists. My question is how to
> reliably detect this.

That's a fair point.

For detecting BTI we can scan the binary for BTI/NOP at M instructions into the
patch-site, but a similar approach won't be reliable for the type hash since
the type hash itself could have the same bit pattern as an instruction.

> If there is motivation using M>0, I'd like to know the concrete code
> sequence for `-fpatchable-function-entry=N,M` and how the runtime code
> detects the NOPs with optional hash and optional BTI.

For the BTI case alone, I have code at:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/ftrace/per-callsite-ops&id=272a580fd5b7acc31747505d71530cee7cc2837d

... the gist being that it checks the instruction M insns after the start of
the patch site.

For the type hash, I think there are a few options, e.g.

* Restricting the type hash to a set of values that can be identified (e.g.
encoding those as a permanently-undefined UDF with a 16-bit immediate).

* Adding options to record additional metadata along with the pointer to the
patch-site in the __patchable_function_entries section.

* Adding an option to record patch-site variants to sub-sections of the
__patchable_function_entries section, so that at link time these can be
grouped separately, e.g.

* __patchable_function_entries.??? // no BTI, no type hash
* __patchable_function_entries.bti // has BTI
* __patchable_function_entries.bti_cfi // has BTI and type hash
* __patchable_function_entries.cfi // has type hash

Do any of those approaches sound plausible to you?

Thanks,
Mark.

2022-10-24 12:03:00

by Mark Rutland

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Sat, Oct 22, 2022 at 04:57:18PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 21, 2022 at 04:56:20PM +0100, Mark Rutland wrote:
> > Hi,
> >
> > For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> > ftrace implementation, which instruments *some* but not all functions.
> > Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> > and non-instrumented functions don't agree on where the type hash should live
> > relative to the function entry point, making them incompatible with one another.
> > AFAICT, there's no mechanism today to get them to agree.
> >
> > Today we use -fatchable-function-entry=2, which happens to avoid this.
>
> > ... but I understand that for x86, folk want the pre-function NOPs to
> > fall-through into the body of the function.
>
> Yep.
>
> > Is there any mechanism today that we could use to solve this, or could we
> > extend clang to have some options to control this behaviour?
>
> So the main pain-point for you is differentiating between function with
> notrace and those without it, right?
>
> That is; suppose you (like x86) globally do:
> -fpatchable-function-entry=4,2 to get a consistent function signature,
> you're up a creek because you use the __patchable_function_entries
> section to drive ftrace and now every function will have it.
>
> So perhaps something like:
>
> -fpatchable-function-entry=N,M,sectionname
>
> would help, then you can have notrace be the same layout, except a
> different section. Eg. something like:
>
> #define notrace __attribute__((patchable_function_entry(4,2,__notrace_function_entries)))

FWIW, I think that'd work for me, and that was roughly my original proposal on
IRC. My only concern with this approach is code size, since all uninstrumented
functions gain some point less prefix NOPs.

We can make that slghtly better as:

#define notrace __attribute__((patchable_function_entry(2,2,__notrace_function_entries)))

... since we don't care about placing NOPs *within* the function

> It does make the whole: CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)
> a bit of a pain, but I've long favoured removing all that and having
> explitic notrace attributes on all relevant functions.
>
> Then again; perhaps it could be made to work by ensuring CFLAGS starts
> with:
>
> -fpatchable-function-entry=4,2,__notrace_function_entries
>
> and have CC_FLAGS_FTRACE include (and hence override with)
>
> -fpatchable-function-entry=4,2,__ftrace_function_entries
>
> assuming that with duplicate argument the last is effective.

TBH, it'd be nice to move ftrace to the `CFLAGS_WHATEVER_obj.o := n` approach
the other instrumentation uses, which IIUC would allow us to define different
flags for the two cases (though I'll need to go check that).

Thanks,
Mark.

2022-10-24 21:46:46

by Sami Tolvanen

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Mon, Oct 24, 2022 at 4:18 AM Mark Rutland <[email protected]> wrote:
>
> On Fri, Oct 21, 2022 at 09:14:41PM -0700, Fangrui Song wrote:
> > On Fri, Oct 21, 2022 at 10:39 AM Sami Tolvanen <[email protected]> wrote:
> > >
> > > On Fri, Oct 21, 2022 at 8:56 AM Mark Rutland <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> > > > ftrace implementation, which instruments *some* but not all functions.
> > > > Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> > > > and non-instrumented functions don't agree on where the type hash should live
> > > > relative to the function entry point, making them incompatible with one another.
> > >
> > > Yes, the current implementation assumes that if prefix nops are used,
> > > all functions have the same number of them.
> > >
> > > > Is there any mechanism today that we could use to solve this, or could we
> > > > extend clang to have some options to control this behaviour?
> > >
> > > I don't think there's a mechanism to work around the issue right now,
> > > but we could just change where the hash is emitted on arm64.
> > >
> > > > It would also be helpful to have a symbol before both the hash and pre-function
> > > > NOPs so that we can filter those out of probes patching (I see that x86 does
> > > > this with the __cfi_function symbol).
> > >
> > > Adding a symbol before the hash isn't a problem, but if we move the
> > > hash and want the symbol to be placed before the prefix nops as well,
> > > we might need a flag to control this. Fangrui, what do you think?
> > >
> > > Sami
> >
> > Since the kcfi code expects the hash to appear in a specific location
> > so that an instrumented indirect jump can reliably obtain the hash.
> > For a translation unit `-fpatchable-function-entry=N,M` may be
> > specified or not, and we want both to work. Therefore, I agree that a
> > consistent hash location will help. This argument favors placing M
> > nops before the hash. The downside is a restriction on how the M nops
> > can be used. Previously if M>0, the runtime code needs to check
> > whether a BTI exists to locate the N-M after-function-entry NOPs. If
> > the hash appears after the M nops, the runtime code needs to
> > additionally knows whether the hash exists. My question is how to
> > reliably detect this.
>
> That's a fair point.
>
> For detecting BTI we can scan the binary for BTI/NOP at M instructions into the
> patch-site, but a similar approach won't be reliable for the type hash since
> the type hash itself could have the same bit pattern as an instruction.

We could always change the compiler to ensure the hash doesn't match
specific instructions. For example, if the hash can never be a NOP,
you could just skip any non-NOP instructions after the initial prefix
NOPs, right?

> > If there is motivation using M>0, I'd like to know the concrete code
> > sequence for `-fpatchable-function-entry=N,M` and how the runtime code
> > detects the NOPs with optional hash and optional BTI.
>
> For the BTI case alone, I have code at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/ftrace/per-callsite-ops&id=272a580fd5b7acc31747505d71530cee7cc2837d
>
> ... the gist being that it checks the instruction M insns after the start of
> the patch site.
>
> For the type hash, I think there are a few options, e.g.
>
> * Restricting the type hash to a set of values that can be identified (e.g.
> encoding those as a permanently-undefined UDF with a 16-bit immediate).

Which would be quite similar, except instead of restricting hash
values to a specific range, we'd just have a list of unallowed hash
values.

Sami

2023-01-04 17:46:55

by Fangrui Song

[permalink] [raw]
Subject: Re: kCFI && patchable-function-entry=M,N

On Mon, Oct 24, 2022 at 4:24 AM Mark Rutland <[email protected]> wrote:
>
> On Sat, Oct 22, 2022 at 04:57:18PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 21, 2022 at 04:56:20PM +0100, Mark Rutland wrote:
> > > Hi,
> > >
> > > For arm64, I'd like to use -fatchable-function-entry=M,N (where N > 0), for our
> > > ftrace implementation, which instruments *some* but not all functions.
> > > Unfortuntately, this doesn't play nicely with -fsanitize=kcfi, as instrumented
> > > and non-instrumented functions don't agree on where the type hash should live
> > > relative to the function entry point, making them incompatible with one another.
> > > AFAICT, there's no mechanism today to get them to agree.
> > >
> > > Today we use -fatchable-function-entry=2, which happens to avoid this.
> >
> > > ... but I understand that for x86, folk want the pre-function NOPs to
> > > fall-through into the body of the function.
> >
> > Yep.
> >
> > > Is there any mechanism today that we could use to solve this, or could we
> > > extend clang to have some options to control this behaviour?
> >
> > So the main pain-point for you is differentiating between function with
> > notrace and those without it, right?
> >
> > That is; suppose you (like x86) globally do:
> > -fpatchable-function-entry=4,2 to get a consistent function signature,
> > you're up a creek because you use the __patchable_function_entries
> > section to drive ftrace and now every function will have it.
> >
> > So perhaps something like:
> >
> > -fpatchable-function-entry=N,M,sectionname
> >
> > would help, then you can have notrace be the same layout, except a
> > different section. Eg. something like:
> >
> > #define notrace __attribute__((patchable_function_entry(4,2,__notrace_function_entries)))
>
> FWIW, I think that'd work for me, and that was roughly my original proposal on
> IRC. My only concern with this approach is code size, since all uninstrumented
> functions gain some point less prefix NOPs.
>
> We can make that slghtly better as:
>
> #define notrace __attribute__((patchable_function_entry(2,2,__notrace_function_entries)))
>
> ... since we don't care about placing NOPs *within* the function
>
> > It does make the whole: CFLAGS_REMOVE_file.o = $(CC_FLAGS_FTRACE)
> > a bit of a pain, but I've long favoured removing all that and having
> > explitic notrace attributes on all relevant functions.
> >
> > Then again; perhaps it could be made to work by ensuring CFLAGS starts
> > with:
> >
> > -fpatchable-function-entry=4,2,__notrace_function_entries
> >
> > and have CC_FLAGS_FTRACE include (and hence override with)
> >
> > -fpatchable-function-entry=4,2,__ftrace_function_entries
> >
> > assuming that with duplicate argument the last is effective.
>
> TBH, it'd be nice to move ftrace to the `CFLAGS_WHATEVER_obj.o := n` approach
> the other instrumentation uses, which IIUC would allow us to define different
> flags for the two cases (though I'll need to go check that).
>
> Thanks,
> Mark.

Hi Mark and Peter,

Sami asked my opinion (as the main -fpatchable-function-entry=
implementer on the llvm-project side) on this extension
(-fpatchable-function-entry=4,2,__ftrace_function_entries).
I think this is fine.

You may consider bringing this up as a GCC feature request
(https://gcc.gnu.org/bugzilla/show_bug.cgi) and CCing the author/the
committer of https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=417ca0117a1a9a8aaf5bc5ca530adfd68cb00399
(original -fpatchable-function-entry= support) and the author of
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=c23b5006d3ffeda1a9edf5fd817765a6da3696ca
(powerpc64 ELFv2 support).
On the feature request, a summary (so that toolchain people don't have
to read every message in this thread) will help:)



--
宋方睿