2022-04-22 17:13:21

by Namhyung Kim

Subject: Re: [RFC 0/4] perf record: Implement off-cpu profiling with BPF (v1)

Hi Milian,

On Fri, Apr 22, 2022 at 3:21 AM Milian Wolff <[email protected]> wrote:
>
> On Freitag, 22. April 2022 07:33:57 CEST Namhyung Kim wrote:
> > Hello,
> >
> > This is the first version of off-cpu profiling support. Together with
> > (PMU-based) cpu profiling, it can show holistic view of the performance
> > characteristics of your application or system.
>
> Hey Namhyung,
>
> this is awesome news! In hotspot, I've long done off-cpu profiling manually by
> looking at the time between --switch-events. The downside is that we also need
> to track the sched:sched_switch event to get a call stack. But this approach
> also works with dwarf based unwinding, and also includes kernel stacks.

Thanks, I've also briefly thought about the switch event based off-cpu
profiling as it doesn't require root. But collecting call stacks is hard and
I'd like to do it in kernel/bpf to reduce the overhead.

>
> > With BPF, it can aggregate scheduling stats for interested tasks
> > and/or states and convert the data into a form of perf sample records.
> > I chose the bpf-output event which is a software event supposed to be
> > consumed by BPF programs and renamed it as "offcpu-time". So it
> > requires no change on the perf report side except for setting sample
> > types of bpf-output event.
> >
> > Basically it collects userspace callstack for tasks as it's what users
> > want mostly. Maybe we can add support for the kernel stacks but I'm
> > afraid that it'd cause more overhead. So the offcpu-time event will
> > always have callchains regardless of the command line option, and it
> > enables the children mode in perf report by default.
>
> Has anything changed wrt perf/bpf and user applications not compiled with
> `-fno-omit-frame-pointer`? I.e. does this new utility only work for specially
> compiled applications, or do we also get backtraces for "normal" binaries that
> we can install through package managers?

I am not aware of such changes, it still needs a frame pointer to get
backtraces.

Thanks,
Namhyung
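
The switch-event approach discussed above boils down to pairing each switch-out of a task with its next switch-in and summing the gaps. A minimal sketch of that bookkeeping, assuming a hypothetical simplified event tuple `(ts_ns, prev_tid, next_tid)` rather than the real perf record layout:

```python
from collections import defaultdict

def off_cpu_intervals(switch_events):
    """Return {tid: total off-cpu ns} from (ts_ns, prev_tid, next_tid) events.

    The tuple layout is a hypothetical simplification, not perf's record format.
    """
    switched_out_at = {}        # tid -> timestamp when it left the CPU
    totals = defaultdict(int)   # tid -> accumulated off-cpu time
    for ts, prev_tid, next_tid in sorted(switch_events):
        # next_tid is coming back on CPU: close its pending off-cpu interval
        if next_tid in switched_out_at:
            totals[next_tid] += ts - switched_out_at.pop(next_tid)
        # prev_tid is leaving the CPU: open a new interval
        switched_out_at[prev_tid] = ts
    return dict(totals)

events = [(100, 1, 2), (250, 2, 1), (400, 1, 2)]
result = off_cpu_intervals(events)
```

Attributing each interval to a call stack still needs a stack captured at switch-out time, which is the costly part this thread is about.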


2022-04-22 23:19:25

by Arnaldo Carvalho de Melo

Subject: Re: [RFC 0/4] perf record: Implement off-cpu profiling with BPF (v1)

Em Fri, Apr 22, 2022 at 08:01:15AM -0700, Namhyung Kim escreveu:
> Hi Milian,

> On Fri, Apr 22, 2022 at 3:21 AM Milian Wolff <[email protected]> wrote:
> > On Freitag, 22. April 2022 07:33:57 CEST Namhyung Kim wrote:
> > > This is the first version of off-cpu profiling support. Together with
> > > (PMU-based) cpu profiling, it can show holistic view of the performance
> > > characteristics of your application or system.

> > Hey Namhyung,

> > this is awesome news! In hotspot, I've long done off-cpu profiling manually by
> > looking at the time between --switch-events. The downside is that we also need
> > to track the sched:sched_switch event to get a call stack. But this approach
> > also works with dwarf based unwinding, and also includes kernel stacks.
>
> Thanks, I've also briefly thought about the switch event based off-cpu
> profiling as it doesn't require root. But collecting call stacks is hard and
> I'd like to do it in kernel/bpf to reduce the overhead.

It would be great to have both in perf. Right now since we have one in
hotspot that is working, perfecting the other method, Namhyung's, using
BPF to reduce the amount of data to postprocess in userspace, looks
great.

> > > With BPF, it can aggregate scheduling stats for interested tasks
> > > and/or states and convert the data into a form of perf sample records.
> > > I chose the bpf-output event which is a software event supposed to be
> > > consumed by BPF programs and renamed it as "offcpu-time". So it
> > > requires no change on the perf report side except for setting sample
> > > types of bpf-output event.
> > >
> > > Basically it collects userspace callstack for tasks as it's what users
> > > want mostly. Maybe we can add support for the kernel stacks but I'm
> > > afraid that it'd cause more overhead. So the offcpu-time event will
> > > always have callchains regardless of the command line option, and it
> > > enables the children mode in perf report by default.
> >
> > Has anything changed wrt perf/bpf and user applications not compiled with
> > `-fno-omit-frame-pointer`? I.e. does this new utility only work for specially
> > compiled applications, or do we also get backtraces for "normal" binaries that
> > we can install through package managers?
>
> I am not aware of such changes, it still needs a frame pointer to get
> backtraces.

I see this as an initial limitation, one that we can lift later?

- Arnaldo
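
The data reduction mentioned here comes from aggregating before anything reaches userspace: off-cpu durations are summed per task and call stack, and only the totals are emitted as sample-like records. A rough Python model of that idea follows. The real work would happen in a BPF map inside the kernel; the function names and the `period` field are assumptions made for illustration, not the patch's actual data layout.

```python
from collections import Counter

def aggregate_off_cpu(intervals):
    """Sum off-cpu durations per (tid, user stack) key.

    `intervals` is an iterable of (tid, stack, duration_ns), where `stack`
    is a tuple of frame names. This mimics what a hash map keyed by task
    and stack would accumulate (hypothetical layout).
    """
    totals = Counter()
    for tid, stack, duration_ns in intervals:
        totals[(tid, stack)] += duration_ns
    return totals

def to_samples(totals):
    """Turn aggregated totals into sample-like dicts, largest total first."""
    return [
        {"tid": tid, "callchain": list(stack), "period": total}
        for (tid, stack), total in totals.most_common()
    ]

intervals = [
    (1, ("main", "read_file", "read"), 300),
    (1, ("main", "read_file", "read"), 200),
    (1, ("main", "wait_lock", "futex"), 50),
]
samples = to_samples(aggregate_off_cpu(intervals))
```

Three scheduling intervals collapse into two records here; over a long run the reduction grows with how often the same stacks block.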

2022-04-26 05:28:43

by Milian Wolff

Subject: Re: [RFC 0/4] perf record: Implement off-cpu profiling with BPF (v1)

On Freitag, 22. April 2022 17:01:15 CEST Namhyung Kim wrote:
> Hi Milian,
>
> On Fri, Apr 22, 2022 at 3:21 AM Milian Wolff <[email protected]> wrote:
> > On Freitag, 22. April 2022 07:33:57 CEST Namhyung Kim wrote:
> > > Hello,
> > >
> > > This is the first version of off-cpu profiling support. Together with
> > > (PMU-based) cpu profiling, it can show holistic view of the performance
> > > characteristics of your application or system.
> >
> > Hey Namhyung,
> >
> > this is awesome news! In hotspot, I've long done off-cpu profiling
> > manually by looking at the time between --switch-events. The downside is
> > that we also need to track the sched:sched_switch event to get a call
> > stack. But this approach also works with dwarf based unwinding, and also
> > includes kernel stacks.
>
> Thanks, I've also briefly thought about the switch event based off-cpu
> profiling as it doesn't require root. But collecting call stacks is hard
> and I'd like to do it in kernel/bpf to reduce the overhead.

I'm all for reducing the overhead, I just wonder about the practicality. At
the very least, please make sure to note this limitation explicitly to end
users. As a preacher for perf, I have come across lots of people stumbling
over `perf record -g` not producing any sensible output because they are
simply not aware that this requires frame pointers which are basically
nonexistent on most "normal" distributions. Nowadays `man perf record` tries to
educate people, please do the same for the new `--off-cpu` switch.

> > > With BPF, it can aggregate scheduling stats for interested tasks
> > > and/or states and convert the data into a form of perf sample records.
> > > I chose the bpf-output event which is a software event supposed to be
> > > consumed by BPF programs and renamed it as "offcpu-time". So it
> > > requires no change on the perf report side except for setting sample
> > > types of bpf-output event.
> > >
> > > Basically it collects userspace callstack for tasks as it's what users
> > > want mostly. Maybe we can add support for the kernel stacks but I'm
> > > afraid that it'd cause more overhead. So the offcpu-time event will
> > > always have callchains regardless of the command line option, and it
> > > enables the children mode in perf report by default.
> >
> > Has anything changed wrt perf/bpf and user applications not compiled with
> > `-fno-omit-frame-pointer`? I.e. does this new utility only work for
> > specially compiled applications, or do we also get backtraces for
> > "normal" binaries that we can install through package managers?
>
> I am not aware of such changes, it still needs a frame pointer to get
> backtraces.

May I ask what kind of setup you are using this on? Do you use something like
Gentoo or Yocto where you compile your whole system with
`-fno-omit-frame-pointer`? Because otherwise, any kind of off-cpu time in
system libraries will not be resolved properly, no?

Thanks
--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


2022-04-26 09:13:26

by Namhyung Kim

Subject: Re: [RFC 0/4] perf record: Implement off-cpu profiling with BPF (v1)

On Mon, Apr 25, 2022 at 5:42 AM Milian Wolff <[email protected]> wrote:
>
> On Freitag, 22. April 2022 17:01:15 CEST Namhyung Kim wrote:
> > Hi Milian,
> >
> > On Fri, Apr 22, 2022 at 3:21 AM Milian Wolff <[email protected]> wrote:
> > > On Freitag, 22. April 2022 07:33:57 CEST Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > This is the first version of off-cpu profiling support. Together with
> > > > (PMU-based) cpu profiling, it can show holistic view of the performance
> > > > characteristics of your application or system.
> > >
> > > Hey Namhyung,
> > >
> > > this is awesome news! In hotspot, I've long done off-cpu profiling
> > > manually by looking at the time between --switch-events. The downside is
> > > that we also need to track the sched:sched_switch event to get a call
> > > stack. But this approach also works with dwarf based unwinding, and also
> > > includes kernel stacks.
> >
> > Thanks, I've also briefly thought about the switch event based off-cpu
> > profiling as it doesn't require root. But collecting call stacks is hard
> > and I'd like to do it in kernel/bpf to reduce the overhead.
>
> I'm all for reducing the overhead, I just wonder about the practicality. At
> the very least, please make sure to note this limitation explicitly to end
> users. As a preacher for perf, I have come across lots of people stumbling
> over `perf record -g` not producing any sensible output because they are
> simply not aware that this requires frame pointers which are basically
> nonexistent on most "normal" distributions. Nowadays `man perf record` tries to
> educate people, please do the same for the new `--off-cpu` switch.

Good point, will add it.

>
> > > > With BPF, it can aggregate scheduling stats for interested tasks
> > > > and/or states and convert the data into a form of perf sample records.
> > > > I chose the bpf-output event which is a software event supposed to be
> > > > consumed by BPF programs and renamed it as "offcpu-time". So it
> > > > requires no change on the perf report side except for setting sample
> > > > types of bpf-output event.
> > > >
> > > > Basically it collects userspace callstack for tasks as it's what users
> > > > want mostly. Maybe we can add support for the kernel stacks but I'm
> > > > afraid that it'd cause more overhead. So the offcpu-time event will
> > > > always have callchains regardless of the command line option, and it
> > > > enables the children mode in perf report by default.
> > >
> > > Has anything changed wrt perf/bpf and user applications not compiled with
> > > > `-fno-omit-frame-pointer`? I.e. does this new utility only work for
> > > specially compiled applications, or do we also get backtraces for
> > > "normal" binaries that we can install through package managers?
> >
> > I am not aware of such changes, it still needs a frame pointer to get
> > backtraces.
>
> May I ask what kind of setup you are using this on? Do you use something like
> Gentoo or Yocto where you compile your whole system with
> `-fno-omit-frame-pointer`? Because otherwise, any kind of off-cpu time in
> system libraries will not be resolved properly, no?

In my work environment, everything is built with the frame pointer.
It's unfortunate most distros build without it, but as Ian said, I hope
we can lift the limitation with recent technologies soon.

Thanks,
Namhyung
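
To make the frame pointer limitation concrete: a frame-pointer unwinder just follows a linked list of saved frame pointers on the stack, so it yields nothing useful when the compiler has repurposed that register. A small Python simulation of the walk, assuming the usual x86-64 layout (`[fp]` holds the caller's frame pointer, `[fp + 8]` the return address); the memory contents and addresses are made up for illustration:

```python
def walk_frame_pointers(memory, fp, max_depth=64):
    """Collect return addresses by following the saved-frame-pointer chain.

    `memory` maps addresses to 8-byte values. The chain ends when the saved
    frame pointer is 0 or a frame cannot be read.
    """
    chain = []
    while fp != 0 and len(chain) < max_depth:
        if fp not in memory or fp + 8 not in memory:
            break                    # unreadable frame: stop rather than guess
        chain.append(memory[fp + 8])
        fp = memory[fp]
    return chain

# Hypothetical stack contents for three nested calls.
stack_memory = {
    0x7000: 0x7020, 0x7008: 0x401136,
    0x7020: 0x7040, 0x7028: 0x401172,
    0x7040: 0x0,    0x7048: 0x4011a0,
}
frames = walk_frame_pointers(stack_memory, 0x7000)

# If the register holds unrelated data (code built without frame pointers),
# the walk finds no valid frame and the callchain comes back empty.
broken = walk_frame_pointers(stack_memory, 0x1234)
```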