2022-08-15 08:51:50

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC] ftrace: Add support to keep some functions out of ftrace

On Mon, Aug 15, 2022 at 10:07:54AM +0800, Chen Zhongjin wrote:
>
> On 2022/8/14 23:22, Jiri Olsa wrote:
> > On Sat, Aug 13, 2022 at 03:02:52PM -0400, Steven Rostedt wrote:
> > > On Fri, 12 Aug 2022 23:18:15 +0200
> > > Jiri Olsa <[email protected]> wrote:
> > >
> > > > the patch below moves the bpf function into sepatate object and switches
> > > > off the -mrecord-mcount for it.. so the function gets profile call
> > > > generated but it's not visible to ftrace
> > > >
> > > > this seems to work, but it depends on -mrecord-mcount support in gcc and
> > > > it's x86 specific... other archs seems to use -fpatchable-function-entry,
> > > > which does not seem to have option to omit symbol from being collected
> > > > to the section
> > > As I stated. the __mcount_loc section was created by ftrace. It has
> > > nothing to do with -fpatchable-function-entry. It's just that the archs
> > > that use that, do not have a gcc that creates the __mcount_loc section.
> > >
> > > > disabling specific ftrace symbol with FTRACE_FL_DISABLED flag seems to
> > > > be easir and generic solution.. I'll send RFC for that
> > > It's not easier.
> > >
> > > Here, I have a POC for you and some more history.
> > >
> > > The recordmcount.pl Perl script was the first to create the
> > > __mcount_loc section in all objects that ftrace needed to hook to. It
> > > did an objdump, found the locations of the calls to mcount, created
> > > another file that had a section __mcount_loc that referenced all those
> > > locations. Compiled and relinked that back into the original object.
> > >
> > > This was performed on all object files during the build, and had an
> > > impact on build times. But this is when I also created and introduced
> > > "make localmodconfig", which shrunk the build times for many people, so
> > > nobody noticed the build time increase! :-)
> > >
> > > Then John Reiser sent me a patch that created recordmcount.c that did
> > > the same work but directly modified the ELF object files without having
> > > to run scripts. This got rid of that horrible overhead from the scripts.
> > >
> > > Then, the gcc folks decided to be helpful here as well and created the
> > > --mrecord-mcount option that would create the __mcount_loc section for
> > > us, where we no longer needed the recordmcount scripts/C program. But
> > > is not available across the board.
> > >
> > > Today, objtool has also got involved, and added an "--mcount" option
> > > that will create the section too.
> > I overlooked that objtool is involved as well,
> > will check on that
>
> objtool --mcount option only involves mcount_loc generation (see
> annotate_call_site) and other validation check call destination directly
> (see is_fentry_call).
>
> Some simply removing --mcount option dose work for this.
>
>
> Another question, it seems we can export and use DEFINE_BPF_DISPATCHER out
> of kernel, does that means we should add NO_MCOUNT_FILES for these single
> uages as well?

yes, cc-ing Bj?rn to make sure it's valid use case for dispatcher

jirka

>
> I dont think it can be made automatically. If ignored, this can be a
> trouble.
>
>
> Best,
>
> Chen
>
> > > Below is a patch that extends yours by adding a NO_MCOUNT_FILES list,
> > > that you add the object file to and it will prevent the other methods
> > > from adding an mcount_loc location.
> > thanks,
> > jirka
> >
> > > I'm adding the objtool folks to make sure this is fine with them.
> > > Again, this is a proof of concept, but works. It may need to be cleaned
> > > a bit before it is final.
> > >
> > > -- Steve
> > >
> > > Index: linux-trace.git/scripts/Makefile.build
> > > ===================================================================
> > > --- linux-trace.git.orig/scripts/Makefile.build
> > > +++ linux-trace.git/scripts/Makefile.build
> > > @@ -206,8 +206,10 @@ sub_cmd_record_mcount = perl $(srctree)/
> > > "$(if $(part-of-module),1,0)" "$(@)";
> > > recordmcount_source := $(srctree)/scripts/recordmcount.pl
> > > endif # BUILD_C_RECORDMCOUNT
> > > -cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), \
> > > +chk_sub_cmd_record_mcount = $(if $(filter $(shell basename $@),$(NO_MCOUNT_FILES)),, \
> > > $(sub_cmd_record_mcount))
> > > +cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), \
> > > + $(chk_sub_cmd_record_mcount))
> > > endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
> > > # 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
> > > Index: linux-trace.git/scripts/Makefile.lib
> > > ===================================================================
> > > --- linux-trace.git.orig/scripts/Makefile.lib
> > > +++ linux-trace.git/scripts/Makefile.lib
> > > @@ -233,7 +233,8 @@ objtool_args = \
> > > $(if $(CONFIG_HAVE_JUMP_LABEL_HACK), --hacks=jump_label) \
> > > $(if $(CONFIG_HAVE_NOINSTR_HACK), --hacks=noinstr) \
> > > $(if $(CONFIG_X86_KERNEL_IBT), --ibt) \
> > > - $(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount) \
> > > + $(if $(filter $(shell basename $@),$(NO_MCOUNT_FILES)),, \
> > > + $(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount)) \
> > > $(if $(CONFIG_UNWINDER_ORC), --orc) \
> > > $(if $(CONFIG_RETPOLINE), --retpoline) \
> > > $(if $(CONFIG_SLS), --sls) \
> > > Index: linux-trace.git/net/core/Makefile
> > > ===================================================================
> > > --- linux-trace.git.orig/net/core/Makefile
> > > +++ linux-trace.git/net/core/Makefile
> > > @@ -11,10 +11,15 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.
> > > obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
> > > neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
> > > sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
> > > - fib_notifier.o xdp.o flow_offload.o gro.o
> > > + fib_notifier.o xdp.o flow_offload.o gro.o \
> > > + dispatcher.o
> > > obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
> > > +# remove dispatcher function from ftrace sight
> > > +CFLAGS_REMOVE_dispatcher.o = -mrecord-mcount
> > > +NO_MCOUNT_FILES += dispatcher.o
> > > +
> > > obj-y += net-sysfs.o
> > > obj-$(CONFIG_PAGE_POOL) += page_pool.o
> > > obj-$(CONFIG_PROC_FS) += net-procfs.o
> > > Index: linux-trace.git/net/core/dispatcher.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ linux-trace.git/net/core/dispatcher.c
> > > @@ -0,0 +1,3 @@
> > > +#include <linux/bpf.h>
> > > +
> > > +DEFINE_BPF_DISPATCHER(xdp)
> > > Index: linux-trace.git/net/core/filter.c
> > > ===================================================================
> > > --- linux-trace.git.orig/net/core/filter.c
> > > +++ linux-trace.git/net/core/filter.c
> > > @@ -11162,8 +11162,6 @@ const struct bpf_verifier_ops sk_lookup_
> > > #endif /* CONFIG_INET */
> > > -DEFINE_BPF_DISPATCHER(xdp)
> > > -
> > > void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog)
> > > {
> > > bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(xdp), prev_prog, prog);
> > >
>


2022-08-15 11:04:38

by Björn Töpel

[permalink] [raw]
Subject: Re: [RFC] ftrace: Add support to keep some functions out of ftrace

On Mon, 15 Aug 2022 at 10:04, Jiri Olsa <[email protected]> wrote:
[...]
> > > >
> > > > Today, objtool has also got involved, and added an "--mcount" option
> > > > that will create the section too.
> > > I overlooked that objtool is involved as well,
> > > will check on that
> >
> > objtool --mcount option only involves mcount_loc generation (see
> > annotate_call_site) and other validation check call destination directly
> > (see is_fentry_call).
> >
> > Some simply removing --mcount option dose work for this.
> >
> >
> > Another question, it seems we can export and use DEFINE_BPF_DISPATCHER out
> > of kernel, does that means we should add NO_MCOUNT_FILES for these single
> > uages as well?
>
> yes, cc-ing Björn to make sure it's valid use case for dispatcher
>

Hmm, could you expand a bit on how this would work?

2022-08-15 11:31:45

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC] ftrace: Add support to keep some functions out of ftrace

On Mon, Aug 15, 2022 at 01:01:06PM +0200, Bj?rn T?pel wrote:
> On Mon, 15 Aug 2022 at 10:04, Jiri Olsa <[email protected]> wrote:
> [...]
> > > > >
> > > > > Today, objtool has also got involved, and added an "--mcount" option
> > > > > that will create the section too.
> > > > I overlooked that objtool is involved as well,
> > > > will check on that
> > >
> > > objtool --mcount option only involves mcount_loc generation (see
> > > annotate_call_site) and other validation check call destination directly
> > > (see is_fentry_call).
> > >
> > > Some simply removing --mcount option dose work for this.
> > >
> > >
> > > Another question, it seems we can export and use DEFINE_BPF_DISPATCHER out
> > > of kernel, does that means we should add NO_MCOUNT_FILES for these single
> > > uages as well?
> >
> > yes, cc-ing Bj?rn to make sure it's valid use case for dispatcher
> >
>
> Hmm, could you expand a bit on how this would work?

the goal here is to remove bpf_dispatcher_<FUNC>_func functions from
ftrace, because it's updated by dispatcher code with bpf_arch_text_poke,
but it's also visible and attachable to ftrace.. and will cause problems
when these 2 updates will race

question was if DEFINE_BPF_DISPATCHER can be used in kernel module,
which would bring another realm of problems ;-)

jirka

2022-08-15 12:28:17

by Björn Töpel

[permalink] [raw]
Subject: Re: [RFC] ftrace: Add support to keep some functions out of ftrace

On Mon, 15 Aug 2022 at 13:29, Jiri Olsa <[email protected]> wrote:
>
> On Mon, Aug 15, 2022 at 01:01:06PM +0200, Björn Töpel wrote:
> > On Mon, 15 Aug 2022 at 10:04, Jiri Olsa <[email protected]> wrote:
> > [...]
> > > > > >
> > > > > > Today, objtool has also got involved, and added an "--mcount" option
> > > > > > that will create the section too.
> > > > > I overlooked that objtool is involved as well,
> > > > > will check on that
> > > >
> > > > objtool --mcount option only involves mcount_loc generation (see
> > > > annotate_call_site) and other validation check call destination directly
> > > > (see is_fentry_call).
> > > >
> > > > Some simply removing --mcount option dose work for this.
> > > >
> > > >
> > > > Another question, it seems we can export and use DEFINE_BPF_DISPATCHER out
> > > > of kernel, does that means we should add NO_MCOUNT_FILES for these single
> > > > uages as well?
> > >
> > > yes, cc-ing Björn to make sure it's valid use case for dispatcher
> > >
> >
> > Hmm, could you expand a bit on how this would work?
>
> the goal here is to remove bpf_dispatcher_<FUNC>_func functions from
> ftrace, because it's updated by dispatcher code with bpf_arch_text_poke,
> but it's also visible and attachable to ftrace.. and will cause problems
> when these 2 updates will race
>
> question was if DEFINE_BPF_DISPATCHER can be used in kernel module,
> which would bring another realm of problems ;-)
>

Oh, now I follow. AFAIK there is only one flavor of BPF dispatcher in
use, and that's the XDP dispatcher, which does not reside in module
code, but is typically *called* by module code.

A module could define a BPF dispatcher, but it wouldn't be able to
update it, since bpf_arch_text_poke() does not support poking in
modules.


Björn

2022-08-15 13:06:06

by Björn Töpel

[permalink] [raw]
Subject: Re: [RFC] ftrace: Add support to keep some functions out of ftrace

On Mon, 15 Aug 2022 at 14:19, Björn Töpel <[email protected]> wrote:
>
> On Mon, 15 Aug 2022 at 13:29, Jiri Olsa <[email protected]> wrote:
> >
> > On Mon, Aug 15, 2022 at 01:01:06PM +0200, Björn Töpel wrote:
> > > On Mon, 15 Aug 2022 at 10:04, Jiri Olsa <[email protected]> wrote:
> > > [...]
> > > > > > >
> > > > > > > Today, objtool has also got involved, and added an "--mcount" option
> > > > > > > that will create the section too.
> > > > > > I overlooked that objtool is involved as well,
> > > > > > will check on that
> > > > >
> > > > > objtool --mcount option only involves mcount_loc generation (see
> > > > > annotate_call_site) and other validation check call destination directly
> > > > > (see is_fentry_call).
> > > > >
> > > > > Some simply removing --mcount option dose work for this.
> > > > >
> > > > >
> > > > > Another question, it seems we can export and use DEFINE_BPF_DISPATCHER out
> > > > > of kernel, does that means we should add NO_MCOUNT_FILES for these single
> > > > > uages as well?
> > > >
> > > > yes, cc-ing Björn to make sure it's valid use case for dispatcher
> > > >
> > >
> > > Hmm, could you expand a bit on how this would work?
> >
> > the goal here is to remove bpf_dispatcher_<FUNC>_func functions from
> > ftrace, because it's updated by dispatcher code with bpf_arch_text_poke,
> > but it's also visible and attachable to ftrace.. and will cause problems
> > when these 2 updates will race
> >
> > question was if DEFINE_BPF_DISPATCHER can be used in kernel module,
> > which would bring another realm of problems ;-)
> >
>
> Oh, now I follow. AFAIK there is only one flavor of BPF dispatcher in
> use, and that's the XDP dispatcher, which does not reside in module
> code, but is typically *called* by module code.
>

Some history why the EXPORT is required:
https://lore.kernel.org/bpf/CAADnVQ+eD-=FZrg8L+YcdCyAS+E30W=Z-ShtEXAXVFjmxV4usg@mail.gmail.com/