2018-02-27 10:08:30

by Alan Kao

[permalink] [raw]
Subject: ftrace: Proposal for an Alternative RecordMcount framework

Hi Steven,

Current recordmcount framework collects the mcount call-sites by grep'ing
the relocation info in each *.o file right after it is compiled, and then
puts them into the __mcount_loc_start array. This works fine in many
architectures, but as mentioned in this riscv/ftrace patch[1], aggressive
relaxing optimizations corrupt the collected offsets, resulting in panics
due to wrong call-site patching in runtime.

Meanwhile, some architectures, such as RISC-V and the on-going nds32, highly
rely on linker relaxation as a link-time optimization to reduce code size
and improve performance. It would be very undesirable to sacrifice them for
ftrace only. But, why can't we collect the call-sites after all of them
are fixed?

We propose an alternative framework, for architectures that cannot
properly record call-sites because of relaxing. Here is the rough
procedure:

1. During the final linking stages, do "objdump vmlinux.o | grep ..." [2]
2. Form the output as an ELF objecj
3. Link the object to __mcount_loc_start symbol
4. Done

With the similar reason as the patch [3], we should mark _mcount to be
a weak symbol to prevent it from being relaxed later.

We would like to know your opinion and comments on this.
Thanks!

Alan Kao

[1] riscv/ftrace dynamic support [patch v4 1/6]:
https://lkml.org/lkml/2018/2/13/12
[2] This used to not collect some call-sites since their jumps has no
target symbol hint. It becomes possible after the fix in 2.30 release.
See https://github.com/riscv/riscv-binutils-gdb/issues/129 for more
details.
[3] https://lkml.org/lkml/2016/5/14/101


2018-02-27 21:13:54

by Steven Rostedt

[permalink] [raw]
Subject: Re: ftrace: Proposal for an Alternative RecordMcount framework

On Tue, 27 Feb 2018 18:04:26 +0800
Alan Kao <[email protected]> wrote:

> 1. During the final linking stages, do "objdump vmlinux.o | grep ..." [2]

Note, doing it at that stage takes the longest time. It makes small
changes much longer to compile. That said...

> 2. Form the output as an ELF objecj
> 3. Link the object to __mcount_loc_start symbol
> 4. Done
>
> With the similar reason as the patch [3], we should mark _mcount to be
> a weak symbol to prevent it from being relaxed later.
>
> We would like to know your opinion and comments on this.
> Thanks!

What about just having your arch use recordmcount.c instead? It doesn't
do any grepping. It is an elf reader and modifier and modifies the .o
file directly.

Note, I will be rewriting that code in the near future too, to
consolidate it with objtool.

-- Steve

2018-03-01 02:09:07

by Alan Kao

[permalink] [raw]
Subject: Re: ftrace: Proposal for an Alternative RecordMcount framework

On Wed, Feb 28, 2018 at 05:12:52AM +0800, Steven Rostedt wrote:
> On Tue, 27 Feb 2018 18:04:26 +0800
> Alan Kao <[email protected]> wrote:
>
> > 1. During the final linking stages, do "objdump vmlinux.o | grep ..." [2]
>
> Note, doing it at that stage takes the longest time. It makes small
> changes much longer to compile. That said...
>

What if we can have some option to *disable* all the recordmcount.pl lauches
after every .o? There will be only an oneshot grep for a near-complete
vmlinux binary.

> > 2. Form the output as an ELF objecj
> > 3. Link the object to __mcount_loc_start symbol
> > 4. Done
> >
> > With the similar reason as the patch [3], we should mark _mcount to be
> > a weak symbol to prevent it from being relaxed later.
> >
> > We would like to know your opinion and comments on this.
> > Thanks!
>
> What about just having your arch use recordmcount.c instead? It doesn't
> do any grepping. It is an elf reader and modifier and modifies the .o
> file directly.

Thanks for the hint. But after a quick scan, it seems that recordmcount.c
processes .o files in a per-file basis, which means that we will still
suffer from the linker relaxation problem.

>
> Note, I will be rewriting that code in the near future too, to
> consolidate it with objtool.
>
> -- Steve

Please allow me to state the problem more clearly here. I hope this helps.

1. locations of mcount are recorded in a per-file basis.
2. to optimize the binary, the linker turns on some aggressive
options, including relaxation.
3. the optimizations changes the original offset.
4. already recorded mcount call-sites no longer point to their
real positions.
5. still a linked vmlinux is made.
6. dynamic ftrace breaks the real logic in the kernel space,
panics happen.

Thanks,
Alan

2018-03-07 01:52:56

by Alan Kao

[permalink] [raw]
Subject: Re: ftrace: Proposal for an Alternative RecordMcount framework

On Thu, Mar 01, 2018 at 10:05:07AM +0800, Alan Kao wrote:
> On Wed, Feb 28, 2018 at 05:12:52AM +0800, Steven Rostedt wrote:
> > On Tue, 27 Feb 2018 18:04:26 +0800
> > Alan Kao <[email protected]> wrote:
> >
> > > 1. During the final linking stages, do "objdump vmlinux.o | grep ..." [2]
> >
> > Note, doing it at that stage takes the longest time. It makes small
> > changes much longer to compile. That said...
> >
>
> What if we can have some option to *disable* all the recordmcount.pl lauches
> after every .o? There will be only an oneshot grep for a near-complete
> vmlinux binary.
>
> > > 2. Form the output as an ELF objecj
> > > 3. Link the object to __mcount_loc_start symbol
> > > 4. Done
> > >
> > > With the similar reason as the patch [3], we should mark _mcount to be
> > > a weak symbol to prevent it from being relaxed later.
> > >
> > > We would like to know your opinion and comments on this.
> > > Thanks!
> >
> > What about just having your arch use recordmcount.c instead? It doesn't
> > do any grepping. It is an elf reader and modifier and modifies the .o
> > file directly.
>
> Thanks for the hint. But after a quick scan, it seems that recordmcount.c
> processes .o files in a per-file basis, which means that we will still
> suffer from the linker relaxation problem.
>
> >
> > Note, I will be rewriting that code in the near future too, to
> > consolidate it with objtool.
> >
> > -- Steve
>
> Please allow me to state the problem more clearly here. I hope this helps.
>
> 1. locations of mcount are recorded in a per-file basis.
> 2. to optimize the binary, the linker turns on some aggressive
> options, including relaxation.
> 3. the optimizations changes the original offset.
> 4. already recorded mcount call-sites no longer point to their
> real positions.
> 5. still a linked vmlinux is made.
> 6. dynamic ftrace breaks the real logic in the kernel space,
> panics happen.
>
> Thanks,
> Alan

Any comments on this?

BTW, the introduced framework has no effects on any other architectures that
works fine. This feature should be configurable and turned on only when the
arch has aggressive link-time optimizations.

If you consider this appropriate, I will send the patch to this once it gets
ready. Currently this is targeting at RISC-V and upcoming NDS32.

Thanks,
Alan

2018-03-07 02:01:53

by Steven Rostedt

[permalink] [raw]
Subject: Re: ftrace: Proposal for an Alternative RecordMcount framework

On Wed, 7 Mar 2018 09:47:47 +0800
Alan Kao <[email protected]> wrote:


> > Please allow me to state the problem more clearly here. I hope this helps.
> >
> > 1. locations of mcount are recorded in a per-file basis.
> > 2. to optimize the binary, the linker turns on some aggressive
> > options, including relaxation.
> > 3. the optimizations changes the original offset.
> > 4. already recorded mcount call-sites no longer point to their
> > real positions.
> > 5. still a linked vmlinux is made.
> > 6. dynamic ftrace breaks the real logic in the kernel space,
> > panics happen.
> >
> > Thanks,
> > Alan
>
> Any comments on this?

Sorry, I've been traveling and falling behind in this. I did read this
when I was offline but forgot to reply when I was back online.

>
> BTW, the introduced framework has no effects on any other architectures that
> works fine. This feature should be configurable and turned on only when the
> arch has aggressive link-time optimizations.
>
> If you consider this appropriate, I will send the patch to this once it gets
> ready. Currently this is targeting at RISC-V and upcoming NDS32.

I see the issue you explained above and it makes sense. Perhaps make it
an option that can be enabled by anyone, and archs that require
aggressive link time optimizations would just select it.

-- Steve