LinuxLists.cc - [GIT PULL] perf tools changes for v6.4

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 3, 2023 at 8:12 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, May 3, 2023 at 8:00 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I did consider it, but the end result doesn't even build, so I unpulled again..
> >
> > I get some libbpf error, and I'm just not interested in trying to
> > debug it. This has clearly not been tested well enough to be merged.
>
> Side note: its' not even about testing.
>
> The error message makes it clear that this is garbage and should never
> be merged even if it were to compile.
>
> There is not a way in hell that it is correct that a 'perf' tool build
> should ever even look at the vmlinux binary to build.
>
> The fact that it does shows that something is seriously wrong in
> perf-tool land, and I will not be touching any pulls until that
> fundamental mistake is entirely gone.
>
> The vmlinux image that is present in my tree (ie
> /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> config. And the fact that the perf tool even looks at it is seriously
> broken.
>
> Whatever you are doing - stop it right now.
>
> Linus

I think the error you gave makes it pretty clear what is going on and
Arnaldo's e-mail explains the motivation. Perhaps we can check a
vmlinux.h into the perf tree so that we don't default to generating
it. This would avoid the binary dependency but we may need different
flavors for different architectures because of structs like pt_regs.

Thanks,
Ian

2023-05-04 11:32:12

Subject: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Wed, May 03, 2023 at 08:12:20PM -0700, Linus Torvalds escreveu:
> On Wed, May 3, 2023 at 8:00 PM Linus Torvalds <[email protected]> wrote:
> > I did consider it, but the end result doesn't even build, so I unpulled again..

> > I get some libbpf error, and I'm just not interested in trying to
> > debug it. This has clearly not been tested well enough to be merged.

It has been the default (opt-out) in the development branch for a while
and stayed in linux-next, but as it had been opt-in until then it hasn't
received the same amount of testing as the default build in past
development cycles, even though the first feature that uses it was
introduced back in 2020 :-\

> Side note: its' not even about testing.

> The error message makes it clear that this is garbage and should never
> be merged even if it were to compile.

> There is not a way in hell that it is correct that a 'perf' tool build
> should ever even look at the vmlinux binary to build.

> The fact that it does shows that something is seriously wrong in
> perf-tool land, and I will not be touching any pulls until that
> fundamental mistake is entirely gone.

> The vmlinux image that is present in my tree (ie
> /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> config. And the fact that the perf tool even looks at it is seriously
> broken.

Humm,

Does building runqslower work for you in this same environment where
building perf failed?

I ask this because it uses the same libbpf technique (CO-RE) to allow
tools that access kernel data structures from BPF to work with multiple
kernels, even those where the layout of the accessed kernel structures
changed.

To build it:

$ make -C tools/bpf/runqslower/
make: Entering directory '/home/acme/git/perf-tools/tools/bpf/runqslower'
MKDIR /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/bpf_helper_defs.h
CC /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/staticobjs/libbpf.o
<SNIP>
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/libbpf.a
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/bpf.h
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/libbpf.h
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/btf.h
<SNIP>
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/bpf_helper_defs.h
INSTALL libbpf_headers
MKDIR /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/libbpf/include/bpf
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/libbpf/include/bpf/hashmap.h
<SNIP>
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/bpftool
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.bpf.o
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.skel.h
CC /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.o
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower
make: Leaving directory '/home/acme/git/perf-tools/tools/bpf/runqslower'
$

# strip tools/bpf/runqslower/.output/runqslower
# file tools/bpf/runqslower/.output/runqslower
tools/bpf/runqslower/.output/runqslower: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=ab0306ee3f6cdc671d9aac7006457a3646e1c266, for GNU/Linux 3.2.0, stripped
[root@quaco perf-tools]# tools/bpf/runqslower/.output/runqslower 100
Tracing run queue latency higher than 100 us
TIME COMM PID LAT(us)
08:00:17 swapper/4 30951 1001
08:00:18 kworker/7:8 2604 101
08:00:19 ksoftirqd/2 15665 105
08:00:22 swapper/2 904400 179
08:00:22 swapper/2 1247 102
08:00:23 TaskCon~ller #7 643789 102
08:00:26 ksoftirqd/2 849302 109
08:00:26 gmain 3107 238
08:00:26 rcu_tasks_trace 152972 634
08:00:27 systemd-oomd 899470 4887
08:00:28 Timer 30951 262
08:00:28 ksoftirqd/2 15665 103
08:00:29 systemd-resolve 895022 104
08:00:29 ksoftirqd/2 15665 145
08:00:30 ksoftirqd/2 15665 117
08:00:30 ksoftirqd/2 640315 149
08:00:30 gmain 3074 122
08:00:30 goa-identity-se 3107 109
^C
#

It builds and uses the tools/bpf/bpftool tool to generate the vmlinux.h
file to build the tool:

$ strace -f -e access,open,openat make -C tools/bpf/runqslower/ |& grep vmlinux
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h
[pid 902901] openat(AT_FDCWD, "/home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
[pid 902901] openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
[pid 902909] openat(AT_FDCWD, "/home/acme/git/perf-tools/tools/bpf/runqslower/.output/vmlinux.h", O_RDONLY) = 4
$

But here it is using /sys/kernel/btf/vmlinux, which is way more sensible
than what you noticed.

Looking at tools/bpf/runqslower/Makefile:

# Try to detect best kernel BTF source
KERNEL_REL := $(shell uname -r)
VMLINUX_BTF_PATHS := $(if $(O),$(O)/vmlinux) \
                     $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
                     ../../../vmlinux /sys/kernel/btf/vmlinux \
                     /boot/vmlinux-$(KERNEL_REL)
VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword \
                    $(wildcard $(VMLINUX_BTF_PATHS))))

It tries to use a vmlinux file first, with /sys/kernel/btf/vmlinux only
further down the list.
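
The lookup order in that Makefile fragment can be restated in plain C.
This is only an illustrative sketch: pick_btf_source() is a hypothetical
name, not something in the tree, and it assumes a POSIX access(). It
returns the explicit override if one is set (like $(VMLINUX_BTF)),
otherwise the first candidate that exists (like
$(firstword $(wildcard ...))).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any kernel tool: return the explicit
 * override if one is given, else the first candidate path that exists,
 * else NULL.
 */
static const char *pick_btf_source(const char *override,
                                   const char *const *candidates, size_t n)
{
    assert(candidates != NULL || n == 0);

    /* $(or $(VMLINUX_BTF),...): a non-empty override wins unchecked. */
    if (override && override[0] != '\0')
        return override;

    for (size_t i = 0; i < n; i++) {
        /* Skip empty entries, like an unset $(O) or $(KBUILD_OUTPUT). */
        if (candidates[i] && candidates[i][0] != '\0' &&
            access(candidates[i], F_OK) == 0)
            return candidates[i];
    }
    return NULL;
}
```

With the candidates in the same order as in the Makefile, an in-tree
../../../vmlinux, if present, is returned before /sys/kernel/btf/vmlinux
is ever considered.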

I'll check why it isn't using the same technique, possibly you don't
generate BTF?

In this Fedora 37 kernel:

$ grep DEBUG_INFO_BTF /boot/config-6.2.9-200.fc37.x86_64
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
$

Having said that, we should probably go back to making the build with
BPF skels an opt-in feature, as it has been since the first feature
using it was introduced, in:

commit fbcdaa1908e8f61aa56c71a1db9a9deb72110a9d
Author: Song Liu <[email protected]>
Date: Tue Dec 29 13:42:13 2020 -0800

perf build: Support build BPF skeletons with perf

BPF programs are useful in perf to profile BPF programs.

BPF skeleton is by far the easiest way to write BPF tools. Enable
building BPF skeletons in util/bpf_skel. A dummy bpf skeleton is added.
More bpf skeletons will be added for different use cases.

Signed-off-by: Song Liu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Until we sort out these build robustness issues.

- Arnaldo

2023-05-04 17:37:07

by Linus Torvalds

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>>
> Does building runqslower works for you in this same environment where
> building perf failed?

I don't know, and I don't care. I've never used that thing, and I'm
not going to.

And it's irrelevant. Two wrongs do not make a right.

I'm going to ignore perf tools pulls going forward if this is the kind
of argument for garbage that you use.

Because a billion flies *can* be wrong.

Linus

2023-05-04 18:23:50

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 10:25:30AM -0700, Linus Torvalds escreveu:
> On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Does building runqslower works for you in this same environment where
> > building perf failed?

> I don't know, and I don't care. I've never used that thing, and I'm
> not going to.

> And it's irrelevant. Two wrongs do not make a right.

> I'm going to ignore perf tools pulls going forward if this is the kind
> of argument for garbage that you use.

> Because a billion flies *can* be wrong.

I pushed two reverts there that turn this back into an
opt-in/experimental feature until we fix the issue you reported:

⬢[acme@toolbox perf-tools]$ git log --oneline -3
e7b7a54767a71c67 (HEAD -> perf-tools, acme/perf-tools) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
6957bdf37a1e6eca Revert "perf build: Warn for BPF skeletons if endian mismatches"
1f85d016768ff19f (tag: perf-tools-for-v6.4-1-2023-05-03) perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
⬢[acme@toolbox perf-tools]$

It's in:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-tools

A vmlinux.h file built by bpftool from the BTF info, be it in a
vmlinux file or in /sys/kernel/btf/vmlinux (a raw BTF file), is used for
building the BPF bytecode with clang:

⬢[acme@toolbox perf-tools]$ head tools/perf/util/bpf_skel/sample_filter.bpf.c
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2023 Google
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#include "sample-filter.h"

/* BPF map that will be filled by user space */
⬢[acme@toolbox perf-tools]$

This lets the BPF code access kernel types and store the type info for
those types together with the BPF bytecode, as BTF info. libbpf later
uses this, plus relocation records, to adjust accesses when the accessed
data structures have changed in the kernel, based on both the kernel BTF
info (/sys/kernel/btf/vmlinux) and the BTF of the BPF bytecode being
loaded (in its .BTF ELF section).

Andrii, can you add some more information about the usage of vmlinux.h
instead of using kernel headers?

- Arnaldo

2023-05-04 19:03:27

by Andrii Nakryiko

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Thu, May 04, 2023 at 10:25:30AM -0700, Linus Torvalds escreveu:
> > On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > Does building runqslower works for you in this same environment where
> > > building perf failed?
>
> > I don't know, and I don't care. I've never used that thing, and I'm
> > not going to.
>
> > And it's irrelevant. Two wrongs do not make a right.
>
> > I'm going to ignore perf tools pulls going forward if this is the kind
> > of argument for garbage that you use.
>
> > Because a billion flies *can* be wrong.
>
> I pushed two reverts there that make this back into a
> opt-in/experimental feature till we fix the issue you reported:
>
> ⬢[acme@toolbox perf-tools]$ git log --oneline -3
> e7b7a54767a71c67 (HEAD -> perf-tools, acme/perf-tools) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
> 6957bdf37a1e6eca Revert "perf build: Warn for BPF skeletons if endian mismatches"
> 1f85d016768ff19f (tag: perf-tools-for-v6.4-1-2023-05-03) perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
> ⬢[acme@toolbox perf-tools]$
>
> Its in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-tools
>
> Using a vmlinux.h file built by bpftool from the BTF info, be it in a
> vmlinux file or in /sys/kernel/btf/vmlinux (a RAW BTF file) is used for
> building the BPF bytecode, using clang:
>
> ⬢[acme@toolbox perf-tools]$ head tools/perf/util/bpf_skel/sample_filter.bpf.c
> // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> // Copyright (c) 2023 Google
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> #include <bpf/bpf_core_read.h>
>
> #include "sample-filter.h"
>
> /* BPF map that will be filled by user space */
> ⬢[acme@toolbox perf-tools]$
>
> So that it can access kernel types and store the type info for those
> types together with the BPF bytecode, as BTF info, and later use this
> and relocation records for libbpf to be able to adjust things when
> accessed data structures change in the kernel and needs adjustments
> based in both the kernel BTF info (/sys/kernel/btf/vmlinux) and the
> BPF bytecode being loaded (in its .BTF ELF section).
>
> Andrii, can you add some more information about the usage of vmlinux.h
> instead of using kernel headers?
>

I'll just say that vmlinux.h is not a hard requirement to build BPF
programs, it's more a convenience allowing easy access to definitions
of both UAPI and kernel-internal structures for tracing needs and
marking them relocatable using BPF CO-RE machinery. Lots of real-world
applications just check-in pregenerated vmlinux.h to avoid build-time
dependency on up-to-date host kernel and such.

If vmlinux.h generation and usage is causing issues, though, given
that perf's BPF programs don't seem to be using many different kernel
types, it might be a better option to just use UAPI headers for public
kernel type definitions, and just define CO-RE-relocatable minimal
definitions locally in perf's BPF code for the other types necessary.
E.g., if perf needs only pid and tgid from task_struct, this would
suffice:

struct task_struct {
    int pid;
    int tgid;
} __attribute__((preserve_access_index));

> - Arnaldo

2023-05-04 19:03:44

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 03, 2023 at 10:51:15PM -0700, Ian Rogers wrote:
> On Wed, May 3, 2023 at 8:12 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Wed, May 3, 2023 at 8:00 PM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > I did consider it, but the end result doesn't even build, so I unpulled again..
> > >
> > > I get some libbpf error, and I'm just not interested in trying to
> > > debug it. This has clearly not been tested well enough to be merged.
> >
> > Side note: its' not even about testing.
> >
> > The error message makes it clear that this is garbage and should never
> > be merged even if it were to compile.
> >
> > There is not a way in hell that it is correct that a 'perf' tool build
> > should ever even look at the vmlinux binary to build.
> >
> > The fact that it does shows that something is seriously wrong in
> > perf-tool land, and I will not be touching any pulls until that
> > fundamental mistake is entirely gone.
> >
> > The vmlinux image that is present in my tree (ie
> > /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> > config. And the fact that the perf tool even looks at it is seriously
> > broken.
> >
> > Whatever you are doing - stop it right now.
> >
> > Linus
>
> I think the error you gave makes it pretty clear what is going on and
> Arnaldo's e-mail explains the motivation. Perhaps we can check a
> vmlinux.h into the perf tree so that we don't default to generating
> it. This would avoid the binary dependency but we may need different
> flavors for different architectures because of structs like pt_regs.

I think we could check that a vmlinux with .BTF is present before
allowing skeletons to be built

jirka

2023-05-04 19:39:54

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Andrii, can you add some more information about the usage of vmlinux.h
> > instead of using kernel headers?

> I'll just say that vmlinux.h is not a hard requirement to build BPF
> programs, it's more a convenience allowing easy access to definitions
> of both UAPI and kernel-internal structures for tracing needs and
> marking them relocatable using BPF CO-RE machinery. Lots of real-world
> applications just check-in pregenerated vmlinux.h to avoid build-time
> dependency on up-to-date host kernel and such.

> If vmlinux.h generation and usage is causing issues, though, given
> that perf's BPF programs don't seem to be using many different kernel
> types, it might be a better option to just use UAPI headers for public
> kernel type definitions, and just define CO-RE-relocatable minimal
> definitions locally in perf's BPF code for the other types necessary.
> E.g., if perf needs only pid and tgid from task_struct, this would
> suffice:

> struct task_struct {
> int pid;
> int tgid;
> } __attribute__((preserve_access_index));

Yeah, that seems like a way better approach: no vmlinux involved, libbpf
CO-RE notices that task_struct differs from this two-integer version
(of course) and relocates the accesses to where the fields are in the
running kernel by using /sys/kernel/btf/vmlinux.
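
To make the idea concrete, here is a userspace toy in plain C. It is not
how libbpf is implemented and uses no libbpf API; every name in it
(field_info, relocate_field(), read_int_at(), task_local, task_kernel)
is made up for illustration. It only shows the principle: the program
names the fields it wants, and the offsets come from a description of
the layout it actually runs against.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for BTF member info: a field name and its byte offset. */
struct field_info {
    const char *name;
    size_t offset;
};

/* Local view: only the fields the program uses (hypothetical). */
struct task_local { int pid; int tgid; };

/* Pretend "running kernel" layout: same names, other members around. */
struct task_kernel {
    long state;
    void *stack;
    int pid;
    int tgid;
    char comm[16];
};

/* What the toy "kernel BTF" says about task_kernel. */
static const struct field_info task_kernel_fields[] = {
    { "state", offsetof(struct task_kernel, state) },
    { "stack", offsetof(struct task_kernel, stack) },
    { "pid",   offsetof(struct task_kernel, pid) },
    { "tgid",  offsetof(struct task_kernel, tgid) },
    { "comm",  offsetof(struct task_kernel, comm) },
};

/* Look a field up by name in the target layout; -1 if it is absent. */
static long relocate_field(const struct field_info *target, size_t n,
                           const char *name)
{
    assert(target != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(target[i].name, name) == 0)
            return (long)target[i].offset;
    return -1;
}

/* Read an int at a relocated offset from a raw object image. */
static int read_int_at(const void *obj, long offset)
{
    int v;

    assert(obj != NULL && offset >= 0);
    memcpy(&v, (const char *)obj + offset, sizeof(v));
    return v;
}
```

In the toy, pid sits at offset 0 in task_local but further in for
task_kernel; reading through relocate_field(..., "pid") gets the right
value from a task_kernel object anyway. The real mechanism does much
more (type compatibility checks, multiple candidate types, and so on).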

I looked and the creation of vmlinux.h was introduced in:

commit 944138f048f7d7591ec7568c94b21de8df2724d4
Author: Namhyung Kim <[email protected]>
Date: Thu Jul 1 14:12:27 2021 -0700

perf stat: Enable BPF counter with --for-each-cgroup

Recently bperf was added to use BPF to count perf events for various
purposes. This is an extension for the approach and targetting to
cgroup usages.

Unlike the other bperf, it doesn't share the events with other
processes but it'd reduces unnecessary events (and the overhead of
multiplexing) for each monitored cgroup within the perf session.

When --for-each-cgroup is used with --bpf-counters, it will open
cgroup-switches event per cpu internally and attach the new BPF
program to read given perf_events and to aggregate the results for
cgroups. It's only called when task is switched to a task in a
different cgroup.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Which I think was the first BPF skel to access a kernel data structure,
yeah:

tools/perf/util/bpf_skel/bperf_cgroup.bpf.c

For things like:

+static inline int get_cgroup_v1_idx(__u32 *cgrps, int size)
+{
+ struct task_struct *p = (void *)bpf_get_current_task();
+ struct cgroup *cgrp;
+ register int i = 0;
+ __u32 *elem;
+ int level;
+ int cnt;
+
+ cgrp = BPF_CORE_READ(p, cgroups, subsys[perf_event_cgrp_id], cgroup);
+ level = BPF_CORE_READ(cgrp, level);

So we can completely remove touching vmlinux from the perf building
process.

If we can get the revert of the patches making BPF skels build by
default into v6.4, then we would do this work, test it thoroughly and
have it available for v6.5.

Linus, would that be a way forward?

- Arnaldo

For reference, here is the definition for BPF_CORE_READ() from tools/lib/bpf/bpf_core_read.h

/*
 * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
 * when there are few pointer chasing steps.
 * E.g., what in non-BPF world (or in BPF w/ BCC) would be something like:
 *     int x = s->a.b.c->d.e->f->g;
 * can be succinctly achieved using BPF_CORE_READ as:
 *     int x = BPF_CORE_READ(s, a.b.c, d.e, f, g);
 *
 * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF
 * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically
 * equivalent to:
 * 1. const void *__t = s->a.b.c;
 * 2. __t = __t->d.e;
 * 3. __t = __t->f;
 * 4. return __t->g;
 *
 * Equivalence is logical, because there is a heavy type casting/preservation
 * involved, as well as all the reads are happening through
 * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to
 * emit CO-RE relocations.
 *
 * N.B. Only up to 9 "field accessors" are supported, which should be more
 * than enough for any practical purpose.
 */
#define BPF_CORE_READ(src, a, ...) ({                       \
    ___type((src), a, ##__VA_ARGS__) __r;                   \
    BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);      \
    __r;                                                    \
})
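
The four-step decomposition described in that comment can be written
out by hand in ordinary userspace C. This is a sketch under loud
assumptions: mock_probe_read() is a made-up stand-in that just does a
memcpy() (the real bpf_probe_read_kernel() reads kernel memory safely
and the real macro also emits CO-RE relocations), and the struct names
s, x, y, z are invented to match the s->a.b.c->d.e->f->g example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented types shaped like the comment's s->a.b.c->d.e->f->g chain. */
struct z { int g; };
struct y { struct z *f; };
struct x { struct { struct y *e; } d; };
struct s { struct { struct { struct x *c; } b; } a; };

/* Hypothetical stand-in for a probe read: copy sz bytes, return 0. */
static int mock_probe_read(void *dst, size_t sz, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sz);
    return 0;
}

/* One read per pointer hop; each step only computes a field address. */
static int read_g(const struct s *s)
{
    const struct x *px;
    const struct y *py;
    const struct z *pz;
    int g;

    mock_probe_read(&px, sizeof(px), &s->a.b.c); /* 1. __t = s->a.b.c  */
    mock_probe_read(&py, sizeof(py), &px->d.e);  /* 2. __t = __t->d.e  */
    mock_probe_read(&pz, sizeof(pz), &py->f);    /* 3. __t = __t->f    */
    mock_probe_read(&g, sizeof(g), &pz->g);      /* 4. return __t->g   */
    return g;
}
```

Each hop copies a pointer (or the final value) out before the next
field address is computed, so no step dereferences more than one level
at a time.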

2023-05-04 21:55:15

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > Andrii, can you add some more information about the usage of vmlinux.h
> > > instead of using kernel headers?
>
> > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > programs, it's more a convenience allowing easy access to definitions
> > of both UAPI and kernel-internal structures for tracing needs and
> > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > applications just check-in pregenerated vmlinux.h to avoid build-time
> > dependency on up-to-date host kernel and such.
>
> > If vmlinux.h generation and usage is causing issues, though, given
> > that perf's BPF programs don't seem to be using many different kernel
> > types, it might be a better option to just use UAPI headers for public
> > kernel type definitions, and just define CO-RE-relocatable minimal
> > definitions locally in perf's BPF code for the other types necessary.
> > E.g., if perf needs only pid and tgid from task_struct, this would
> > suffice:
>
> > struct task_struct {
> > int pid;
> > int tgid;
> > } __attribute__((preserve_access_index));
>
> Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> CO-RE notices that task_struct changed from this two integers version
> (of course) and does the relocation to where it is in the running kernel
> by using /sys/kernel/btf/vmlinux.

Doing it for one of the skels: build tested, runtime untested, but not
using any vmlinux, BTF to help. Not that bad: more verbose, but at least
we state which fields we actually use, and the attributes document that
those offsets will be recorded for future use, etc.

Namhyung, can you please check that this works?

Thanks,

- Arnaldo

diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
index 6a438e0102c5a2cb..f376d162549ebd74 100644
--- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
+++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
@@ -1,11 +1,40 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2021 Facebook
// Copyright (c) 2021 Google
-#include "vmlinux.h"
+#include <linux/types.h>
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

+// libbpf's CO-RE will take care of the relocations so that these fields match
+// the layout of these structs in the kernel where this ends up running on.
+
+struct cgroup_subsys_state {
+ struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
+struct css_set {
+ struct cgroup_subsys_state *subsys[13];
+} __attribute__((preserve_access_index));
+
+struct task_struct {
+ struct css_set *cgroups;
+} __attribute__((preserve_access_index));
+
+struct kernfs_node {
+ __u64 id;
+} __attribute__((preserve_access_index));
+
+struct cgroup {
+ struct kernfs_node *kn;
+ int level;
+} __attribute__((preserve_access_index));
+
+enum cgroup_subsys_id {
+ perf_event_cgrp_id = 8,
+};
+
#define MAX_LEVELS 10 // max cgroup hierarchy level: arbitrary
#define MAX_EVENTS 32 // max events per cgroup: arbitrary

@@ -52,7 +81,7 @@ struct cgroup___new {
/* old kernel cgroup definition */
struct cgroup___old {
int level;
- u64 ancestor_ids[];
+ __u64 ancestor_ids[];
} __attribute__((preserve_access_index));

const volatile __u32 num_events = 1;

2023-05-04 22:16:07

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?

Second case was simpler:

diff --git a/tools/perf/util/bpf_skel/bperf_follower.bpf.c b/tools/perf/util/bpf_skel/bperf_follower.bpf.c
index f193998530d431d8..1ab06f2ff5ad7548 100644
--- a/tools/perf/util/bpf_skel/bperf_follower.bpf.c
+++ b/tools/perf/util/bpf_skel/bperf_follower.bpf.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2021 Facebook
-#include "vmlinux.h"
+#include <linux/types.h>
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bperf_u.h"

2023-05-04 22:16:57

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?
>
> Thanks,
>
> - Arnaldo
>
> diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> index 6a438e0102c5a2cb..f376d162549ebd74 100644
> --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> @@ -1,11 +1,40 @@
> // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> // Copyright (c) 2021 Facebook
> // Copyright (c) 2021 Google
> -#include "vmlinux.h"
> +#include <linux/types.h>
> +#include <linux/bpf.h>

Compared to vmlinux.h, here be dragons. It is easy to start dragging in
all of libc, and that may not work due to missing #ifdefs, etc. Could
we check in a vmlinux.h like libbpf-tools does?
https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64

This would also remove some of the errors that could be introduced by
copy+pasting enums, etc., and would surface issues such as things being
renamed as build-time rather than runtime failures.
Could this be some shared resource for the different linux tools
projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
install_headers target that builds a vmlinux.h.

Thanks,
Ian

> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> #include <bpf/bpf_core_read.h>
>
> +// libbpf's CO-RE will take care of the relocations so that these fields match
> +// the layout of these structs in the kernel where this ends up running on.
> +
> +struct cgroup_subsys_state {
> + struct cgroup *cgroup;
> +} __attribute__((preserve_access_index));
> +
> +struct css_set {
> + struct cgroup_subsys_state *subsys[13];
> +} __attribute__((preserve_access_index));
> +
> +struct task_struct {
> + struct css_set *cgroups;
> +} __attribute__((preserve_access_index));
> +
> +struct kernfs_node {
> + __u64 id;
> +} __attribute__((preserve_access_index));
> +
> +struct cgroup {
> + struct kernfs_node *kn;
> + int level;
> +} __attribute__((preserve_access_index));
> +
> +enum cgroup_subsys_id {
> + perf_event_cgrp_id = 8,
> +};
> +
> #define MAX_LEVELS 10 // max cgroup hierarchy level: arbitrary
> #define MAX_EVENTS 32 // max events per cgroup: arbitrary
>
> @@ -52,7 +81,7 @@ struct cgroup___new {
> /* old kernel cgroup definition */
> struct cgroup___old {
> int level;
> - u64 ancestor_ids[];
> + __u64 ancestor_ids[];
> } __attribute__((preserve_access_index));
>
> const volatile __u32 num_events = 1;

2023-05-04 22:55:25

by Namhyung Kim

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?

Yep, it works great!

$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,user.slice,system.slice sleep 1

 Performance counter stats for 'system wide':

     64,110.41 msec cpu-clock                /            # 64.004 CPUs utilized
        15,787      context-switches         /            # 246.247 /sec
            72      cpu-migrations           /            # 1.123 /sec
         1,236      page-faults              /            # 19.279 /sec
   848,608,137      cycles                   /            # 0.013 GHz (83.23%)
   106,928,070      stalled-cycles-frontend  /            # 12.60% frontend cycles idle (83.23%)
   209,204,795      stalled-cycles-backend   /            # 24.65% backend cycles idle (83.23%)
   645,183,025      instructions             /            # 0.76 insn per cycle
                                                          # 0.32 stalled cycles per insn (83.24%)
   141,776,876      branches                 /            # 2.211 M/sec (83.63%)
     3,001,078      branch-misses            /            # 2.12% of all branches (83.44%)
         66.67 msec cpu-clock                user.slice   # 0.067 CPUs utilized
           695      context-switches         user.slice   # 10.424 K/sec
            22      cpu-migrations           user.slice   # 329.966 /sec
         1,202      page-faults              user.slice   # 18.028 K/sec
   150,514,330      cycles                   user.slice   # 2.257 GHz (90.17%)
    13,504,605      stalled-cycles-frontend  user.slice   # 8.97% frontend cycles idle (69.71%)
    38,859,376      stalled-cycles-backend   user.slice   # 25.82% backend cycles idle (95.28%)
   189,382,145      instructions             user.slice   # 1.26 insn per cycle
                                                          # 0.21 stalled cycles per insn (88.92%)
    36,019,878      branches                 user.slice   # 540.242 M/sec (90.16%)
       697,723      branch-misses            user.slice   # 1.94% of all branches (65.77%)
         44.33 msec cpu-clock                system.slice # 0.044 CPUs utilized
         2,382      context-switches         system.slice # 53.732 K/sec
            42      cpu-migrations           system.slice # 947.418 /sec
            34      page-faults              system.slice # 766.958 /sec
   100,383,549      cycles                   system.slice # 2.264 GHz (87.27%)
    10,165,225      stalled-cycles-frontend  system.slice # 10.13% frontend cycles idle (71.73%)
    29,964,682      stalled-cycles-backend   system.slice # 29.85% backend cycles idle (84.94%)
   101,210,743      instructions             system.slice # 1.01 insn per cycle
                                                          # 0.30 stalled cycles per insn (80.68%)
    19,893,831      branches                 system.slice # 448.757 M/sec (86.94%)
       397,854      branch-misses            system.slice # 2.00% of all branches (88.42%)

       1.001667221 seconds time elapsed

Thanks,
Namhyung

2023-05-05 00:07:12

by Namhyung Kim

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Hi Jiri,

On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old
>
> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far

Cool. But I think you missed this.

diff --git a/tools/perf/util/bpf_skel/perf-defs.h b/tools/perf/util/bpf_skel/perf-defs.h
index 1320e1be03b8..4cfa8a9fce39 100644
--- a/tools/perf/util/bpf_skel/perf-defs.h
+++ b/tools/perf/util/bpf_skel/perf-defs.h
@@ -253,6 +253,7 @@ typedef struct {
} atomic64_t;

struct rw_semaphore {
+ atomic_long_t owner;
} __attribute__((preserve_access_index));

typedef atomic64_t atomic_long_t;

Thanks,
Namhyung

2023-05-05 00:10:43

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 4:03 PM Jiri Olsa <[email protected]> wrote:
>
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old

We do, but the way I detected the problems in the first place was by
building against older kernels. Now the build will always succeed but
fail at runtime.
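
To make the build-time versus runtime trade-off concrete, here is a minimal userspace sketch in plain C. It is not libbpf code: `kernel_btf`, `relocate_field`, `read_int_field` and both struct layouts are hypothetical stand-ins for what /sys/kernel/btf/vmlinux and the CO-RE relocation step provide. The point it tries to show is that a minimal local struct always compiles, and whether a field still exists is only discovered when the name lookup runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal local view, in the spirit of the task_struct example quoted above. */
struct task_struct_local {
	int pid;
	int tgid;
};

/* Hypothetical "running kernel" layout: same fields, different offsets. */
struct task_struct_kernel {
	long state;
	int tgid;
	int pid;
};

/* Hypothetical stand-in for BTF: field names and their real offsets. */
struct field_info {
	const char *name;
	size_t offset;
};

static const struct field_info kernel_btf[] = {
	{ "state", offsetof(struct task_struct_kernel, state) },
	{ "tgid",  offsetof(struct task_struct_kernel, tgid)  },
	{ "pid",   offsetof(struct task_struct_kernel, pid)   },
};

/* Resolve a field by name; returns 0 and sets *offset, or -1 if missing. */
int relocate_field(const char *name, size_t *offset)
{
	size_t i;

	for (i = 0; i < sizeof(kernel_btf) / sizeof(kernel_btf[0]); i++) {
		if (strcmp(kernel_btf[i].name, name) == 0) {
			*offset = kernel_btf[i].offset;
			return 0;
		}
	}
	return -1; /* the failure shows up here, at run time, not at build time */
}

/* Read an int field through the relocated offset. */
int read_int_field(const void *obj, const char *name, int *value)
{
	size_t offset;

	if (relocate_field(name, &offset) != 0)
		return -1;
	memcpy(value, (const char *)obj + offset, sizeof(*value));
	return 0;
}
```

In this model, reading "pid" or "tgid" from a `struct task_struct_kernel` succeeds even though the local struct places them at different offsets, while asking for a name the table lacks (say "comm") returns -1 only when the call actually runs.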

> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far
>
> jirka

Cool, could we just call it vmlinux.h rather than perf-defs.h?

I notice cgroup_subsys_id is in there, which is called out in Andrii's
CO-RE guide/blog:
https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
Perhaps we can do something with names/types to make sure a helper is
being used for these enum values.
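
As a rough illustration of why a copied enum value is fragile, here is a userspace sketch in plain C. In real BPF code the lookup would go through libbpf's `bpf_core_enum_value_exists()` and `bpf_core_enum_value()` from bpf_core_read.h; the names `kernel_enum`, `enum_value_lookup` and `perf_cgrp_index`, and the numbering in the table, are hypothetical and only model that behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value copied into the local header, as in the patch quoted earlier. */
enum cgroup_subsys_id_local {
	perf_event_cgrp_id_local = 8,
};

/*
 * Hypothetical stand-in for the enum as recorded in a running kernel's BTF.
 * The numbering depends on which cgroup controllers that kernel enables,
 * so the values here are illustrative only.
 */
struct enum_info {
	const char *name;
	int value;
};

static const struct enum_info kernel_enum[] = {
	{ "cpuset_cgrp_id",     0 },
	{ "cpu_cgrp_id",        1 },
	{ "perf_event_cgrp_id", 5 },
};

/* Look an enumerator up by name; returns 1 and sets *value if present. */
int enum_value_lookup(const char *name, int *value)
{
	size_t i;

	for (i = 0; i < sizeof(kernel_enum) / sizeof(kernel_enum[0]); i++) {
		if (strcmp(kernel_enum[i].name, name) == 0) {
			*value = kernel_enum[i].value;
			return 1;
		}
	}
	return 0;
}

/* Prefer the kernel's value; fall back to the compiled-in constant. */
int perf_cgrp_index(void)
{
	int value;

	if (enum_value_lookup("perf_event_cgrp_id", &value))
		return value;
	return perf_event_cgrp_id_local;
}
```

With this table, `perf_cgrp_index()` yields 5 rather than the copied 8, which is the kind of silent mismatch a helper-based lookup is meant to avoid. Falling back to the local constant when the name is missing is a design choice of this sketch, not something the thread prescribes.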

Thanks,
Ian

2023-05-05 00:12:08

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> >
> > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.
> >
> > Namhyung, can you please check that this works?
> >
> > Thanks,
> >
> > - Arnaldo
> >
> > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > @@ -1,11 +1,40 @@
> > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > // Copyright (c) 2021 Facebook
> > // Copyright (c) 2021 Google
> > -#include "vmlinux.h"
> > +#include <linux/types.h>
> > +#include <linux/bpf.h>
>
> Compared to vmlinux.h here be dragons. It is easy to start dragging in
> all of libc and that may not work due to missing #ifdefs, etc.. Could
> we check in a vmlinux.h like libbpf-tools does?
> https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
>
> This would also remove some of the errors that could be introduced by
> copy+pasting enums, etc. and also highlight issues with things being
> renamed as build time rather than runtime failures.

we already have to deal with that, right? doing checks on fields in
structs like mm_struct___old

> Could this be some shared resource for the different linux tools
> projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> install_headers target that builds a vmlinux.h.

I tried to do the minimal header and it's not too big,
I pushed it in here:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h

compile tested so far

jirka

2023-05-05 09:44:15

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 04:15:08PM -0700, Namhyung Kim wrote:
> Hi Jiri,
>
> On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
> >
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
>
> Cool. But I think you missed this.
>
> diff --git a/tools/perf/util/bpf_skel/perf-defs.h b/tools/perf/util/bpf_skel/perf-defs.h
> index 1320e1be03b8..4cfa8a9fce39 100644
> --- a/tools/perf/util/bpf_skel/perf-defs.h
> +++ b/tools/perf/util/bpf_skel/perf-defs.h
> @@ -253,6 +253,7 @@ typedef struct {
> } atomic64_t;
>
> struct rw_semaphore {
> + atomic_long_t owner;
> } __attribute__((preserve_access_index));

ah right, I did not see that because my clang took another #ifdef leg

thanks,
jirka

>
> typedef atomic64_t atomic_long_t;
>
>
> Thanks,
> Namhyung

2023-05-05 10:09:26

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 04:19:47PM -0700, Ian Rogers wrote:
> On Thu, May 4, 2023 at 4:03 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
>
> We do, but the way I detected the problems in the first place was by
> building against older kernels. Now the build will always succeed but
> fail at runtime.
>
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
> >
> > jirka
>
> Cool, could we just call it vmlinux.h rather than perf-defs.h?

right, it also makes the change smaller

>
> I notice cgroup_subsys_id is in there which is called out in Andrii's
> CO-RE guide/blog:
> https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
> perhaps we can do something with names/types to make sure a helper is
> being used for these enum values.

ok, I'll check on that. So far I have made some cleanups and updated the branch

thanks,
jirka

2023-05-05 11:48:19

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 05, 2023 at 11:39:19AM +0200, Jiri Olsa wrote:
> On Thu, May 04, 2023 at 04:19:47PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 4:03 PM Jiri Olsa <[email protected]> wrote:
> > >
> > > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > >
> > > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > > instead of using kernel headers?
> > > > > >
> > > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > > dependency on up-to-date host kernel and such.
> > > > > >
> > > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > > suffice:
> > > > > >
> > > > > > > struct task_struct {
> > > > > > > int pid;
> > > > > > > int tgid;
> > > > > > > } __attribute__((preserve_access_index));
> > > > > >
> > > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > > by using /sys/kernel/btf/vmlinux.
> > > > >
> > > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > > we state what are the fields we actually use, have those attribute
> > > > > documenting that those offsets will be recorded for future use, etc.
> > > > >
> > > > > Namhyung, can you please check that this works?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > - Arnaldo
> > > > >
> > > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > @@ -1,11 +1,40 @@
> > > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > // Copyright (c) 2021 Facebook
> > > > > // Copyright (c) 2021 Google
> > > > > -#include "vmlinux.h"
> > > > > +#include <linux/types.h>
> > > > > +#include <linux/bpf.h>
> > > >
> > > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > > we check in a vmlinux.h like libbpf-tools does?
> > > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > > >
> > > > This would also remove some of the errors that could be introduced by
> > > > copy+pasting enums, etc. and also highlight issues with things being
> > > > renamed as build time rather than runtime failures.
> > >
> > > we already have to deal with that, right? doing checks on fields in
> > > structs like mm_struct___old
> >
> > We do, but the way I detected the problems in the first place was by
> > building against older kernels. Now the build will always succeed but
> > fail at runtime.
> >
> > > > Could this be some shared resource for the different linux tools
> > > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > > install_headers target that builds a vmlinux.h.
> > >
> > > I tried to do the minimal header and it's not too big,
> > > I pushed it in here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> > >
> > > compile tested so far
> > >
> > > jirka
> >
> > Cool, could we just call it vmlinux.h rather than perf-defs.h?
>
> right, it also makes the change smaller
>
> >
> > I notice cgroup_subsys_id is in there which is called out in Andrii's
> > CO-RE guide/blog:
> > https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
> > perhaps we can do something with names/types to make sure a helper is
> > being used for these enum values.

both bperf_cgroup and off_cpu programs use bpf_core_enum_value, so we should be fine

jirka
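
A rough sketch of the pattern referred to above, for readers who have not
seen it. This is an illustration and not the actual perf code:
perf_subsys_id() is a hypothetical helper name, and the #else branch is a
host-only stand-in added here so the snippet also compiles without
clang -target bpf. bpf_core_enum_value_exists() and bpf_core_enum_value()
are the libbpf macros from bpf/bpf_core_read.h.

```c
#if defined(__bpf__) || defined(__BPF__)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>	/* bpf_core_enum_value{,_exists}() */
#else
#include <assert.h>	/* host build only, for quick native checks */
/* Host-only stand-ins (an assumption of this sketch): no relocation. */
#define bpf_core_enum_value_exists(type, value) 1
#define bpf_core_enum_value(type, value) ((long)(value))
#endif

/* Local copy of the enum; 8 is the value used in the trimmed header. */
enum cgroup_subsys_id {
	perf_event_cgrp_id = 8,
};

/*
 * Hypothetical helper: ask CO-RE for the value the running kernel uses
 * instead of trusting the number compiled into the local enum.
 */
static inline int perf_subsys_id(void)
{
	if (bpf_core_enum_value_exists(enum cgroup_subsys_id, perf_event_cgrp_id))
		return bpf_core_enum_value(enum cgroup_subsys_id, perf_event_cgrp_id);

	return perf_event_cgrp_id;	/* fall back to the compile-time value */
}
```

On a BPF build the value comes from the kernel BTF at load time, so a
stale number in the local enum should not matter as long as every user
goes through a helper like this one.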

2023-05-05 13:24:44

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 07:01:51PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko wrote:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.

Yang, can you please check that this works?

From bd6289bc3ffc89aecad3bd8798d76626c8c16d39 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri, 5 May 2023 10:13:09 -0300
Subject: [PATCH 1/1] perf kwork_trace.bpf: Stop using vmlinux.h, grab copies
of used structs

And mark them with __attribute__((preserve_access_index)) so that
libbpf's CO-RE code can fix up the offsets if they differ from those of
the kernel data structure.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf_skel/kwork_trace.bpf.c | 70 +++++++++++++++++++++-
1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/kwork_trace.bpf.c b/tools/perf/util/bpf_skel/kwork_trace.bpf.c
index 063c124e099938ed..e38fe54c7667fa74 100644
--- a/tools/perf/util/bpf_skel/kwork_trace.bpf.c
+++ b/tools/perf/util/bpf_skel/kwork_trace.bpf.c
@@ -1,13 +1,81 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2022, Huawei

-#include "vmlinux.h"
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define KWORK_COUNT 100
#define MAX_KWORKNAME 128

+
+// non-UAPI kernel data structures, just the fields used in this tool,
+// preserving the access index so that libbpf can fixup offsets with the ones
+// used in the kernel when loading the BPF bytecode, if they differ from what
+// is used here.
+
+enum {
+ HI_SOFTIRQ = 0,
+ TIMER_SOFTIRQ,
+ NET_TX_SOFTIRQ,
+ NET_RX_SOFTIRQ,
+ BLOCK_SOFTIRQ,
+ IRQ_POLL_SOFTIRQ,
+ TASKLET_SOFTIRQ,
+ SCHED_SOFTIRQ,
+ HRTIMER_SOFTIRQ,
+ RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
+
+ NR_SOFTIRQS
+};
+
+struct trace_entry {
+ short unsigned int type;
+ unsigned char flags;
+ unsigned char preempt_count;
+ int pid;
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_entry {
+ struct trace_entry ent;
+ int irq;
+ __u32 __data_loc_name;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_exit {
+ struct trace_entry ent;
+ int irq;
+ int ret;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_softirq {
+ struct trace_entry ent;
+ unsigned int vec;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_start {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_end {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_activate_work {
+ struct trace_entry ent;
+ void *work;
+ char __data[];
+} __attribute__((preserve_access_index));
+
/*
* This should be in sync with "util/kwork.h"
*/
--
2.39.2
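
A minimal sketch of how a handler would use one of the trimmed structs
from the patch above. Assumptions, none of which come from the patch:
CORE_RELOC is a hypothetical macro that expands to the attribute only on
BPF builds (so the snippet also compiles natively), softirq_vec() is a
made-up helper, and the trailing __data[] member is left out for brevity.

```c
#if defined(__bpf__) || defined(__BPF__)
#define CORE_RELOC __attribute__((preserve_access_index))
#else
#include <assert.h>	/* host build only, for quick native checks */
#define CORE_RELOC	/* the attribute is BPF-specific; no-op on the host */
#endif

enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,

	NR_SOFTIRQS
};

struct trace_entry {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
} CORE_RELOC;

struct trace_event_raw_softirq {
	struct trace_entry ent;
	unsigned int vec;
} CORE_RELOC;

/*
 * Hypothetical helper: return the softirq vector, or -1 if it is out of
 * range. On a BPF build the ctx->vec load is the access whose offset is
 * recorded for relocation.
 */
static inline int softirq_vec(const struct trace_event_raw_softirq *ctx)
{
	unsigned int vec = ctx->vec;

	return vec < (unsigned int)NR_SOFTIRQS ? (int)vec : -1;
}
```

Only the members that are actually read need to be listed; their
positions in the local struct do not have to match the kernel's.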

2023-05-05 13:52:17

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 07:01:51PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko wrote:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.
> >

Namhyung, can you please check that this one for the recent sample works?

From c6972dae6c962d7be5ba006ab90c9955268debc5 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri, 5 May 2023 09:55:18 -0300
Subject: [PATCH 1/2] perf sample_filter.bpf: Stop using vmlinux.h generated by
bpftool, use CO-RE

By including linux/bpf.h and linux/perf_event.h we get the UAPI structs,
and then define a subset of 'struct perf_sample_data' with just the
fields we use in this tool, marked with
__attribute__((preserve_access_index)) so that at load time libbpf can
fix up the offsets according to the 'struct perf_sample_data' obtained
from the running kernel's BTF (/sys/kernel/btf/vmlinux).

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf_skel/sample_filter.bpf.c | 37 +++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index cffe493af1ed5f31..045532c2366d74ef 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -1,12 +1,47 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2023 Google
-#include "vmlinux.h"
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#include "sample-filter.h"

+// non-UAPI kernel data structures, just the fields used in this tool,
+// preserving the access index so that libbpf can fixup offsets with the ones
+// used in the kernel when loading the BPF bytecode, if they differ from what
+// is used here.
+
+struct perf_sample_data {
+ __u64 addr;
+ __u64 period;
+ union perf_sample_weight weight;
+ __u64 txn;
+ union perf_mem_data_src data_src;
+ __u64 ip;
+ struct {
+ __u32 pid;
+ __u32 tid;
+ } tid_entry;
+ __u64 time;
+ __u64 id;
+ struct {
+ __u32 cpu;
+ } cpu_entry;
+ __u64 phys_addr;
+ __u64 data_page_size;
+ __u64 code_page_size;
+} __attribute__((__aligned__(64))) __attribute__((preserve_access_index));
+
+struct bpf_perf_event_data_kern {
+ struct perf_sample_data * data;
+ struct perf_event * event;
+
+ /* size: 24, cachelines: 1, members: 3 */
+ /* last cacheline: 24 bytes */
+} __attribute__((preserve_access_index));
+
/* BPF map that will be filled by user space */
struct filters {
__uint(type, BPF_MAP_TYPE_ARRAY);
--
2.39.2
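
To make the "fix up the offsets at load time" idea concrete, here is a
toy, host-side model of it. It is not how libbpf is implemented and it
does not parse BTF: every name below (local_sample_data,
kernel_sample_data, kernel_members, relocate_member(), load_u64()) is
made up for this sketch, and the "kernel" layout is invented. It only
shows that a member compiled at one offset can be found by name at
another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout the BPF object was compiled against: only the fields it uses. */
struct local_sample_data {
	uint64_t addr;
	uint64_t period;
};

/* Invented stand-in for the running kernel's layout (not the real one). */
struct kernel_sample_data {
	uint64_t sample_flags;
	uint64_t ip;
	uint64_t period;
	uint64_t addr;
};

/* Toy substitute for the kernel BTF: member names and their offsets. */
struct member_info {
	const char *name;
	size_t offset;
};

static const struct member_info kernel_members[] = {
	{ "sample_flags", offsetof(struct kernel_sample_data, sample_flags) },
	{ "ip",           offsetof(struct kernel_sample_data, ip) },
	{ "period",       offsetof(struct kernel_sample_data, period) },
	{ "addr",         offsetof(struct kernel_sample_data, addr) },
};

/* Resolve a member by name in the "kernel" layout; -1 if it is missing. */
static long relocate_member(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(kernel_members) / sizeof(kernel_members[0]); i++) {
		if (strcmp(kernel_members[i].name, name) == 0)
			return (long)kernel_members[i].offset;
	}
	return -1;
}

/* Load a u64 at an already-relocated offset. */
static uint64_t load_u64(const void *base, long offset)
{
	uint64_t value;

	assert(offset >= 0);	/* caller must have resolved the member */
	memcpy(&value, (const char *)base + offset, sizeof(value));
	return value;
}
```

In this model 'period' sits at offset 8 in the local struct and at 16 in
the invented kernel struct; resolving it by name is what lets the same
compiled access read the right bytes.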

2023-05-05 14:31:04

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > On Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko wrote:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old
>
> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far

I see it, and it keeps the change minimal, which is good at the
current stage, but I wonder if it wouldn't be better for us to define
just the types that are not in UAPI and #include <linux/bpf.h> and
<linux/perf_event.h>, as I did in the patches I posted here (Namhyung
tested at least one). That way the added vmlinux.h file gets even
smaller by not including things like:

[acme@quaco perf-tools]$ egrep -w '(perf_event_sample_format|bpf_perf_event_value|perf_sample_weight|perf_mem_data_src) {' include/uapi/linux/*.h
include/uapi/linux/bpf.h:struct bpf_perf_event_value {
include/uapi/linux/perf_event.h:enum perf_event_sample_format {
include/uapi/linux/perf_event.h:union perf_mem_data_src {
include/uapi/linux/perf_event.h:union perf_mem_data_src {
include/uapi/linux/perf_event.h:union perf_sample_weight {
[acme@quaco perf-tools]$

Also why do we need these:

+struct mm_struct {
+} __attribute__((preserve_access_index));
+
+struct raw_spinlock {
+} __attribute__((preserve_access_index));
+
+typedef struct raw_spinlock raw_spinlock_t;
+
+struct spinlock {
+} __attribute__((preserve_access_index));
+
+typedef struct spinlock spinlock_t;
+
+struct sighand_struct {
+ spinlock_t siglock;
+} __attribute__((preserve_access_index));

We don't use them, they're just pointers you kept in:

+struct task_struct {
+ struct css_set *cgroups;
+ pid_t pid;
+ pid_t tgid;
+ char comm[16];
+ struct mm_struct *mm;
+ struct sighand_struct *sighand;
+ unsigned int flags;
+} __attribute__((preserve_access_index));

That with the preserve_access_index isn't needed, we need just the
fields that we access in the tools, right?

- Arnaldo
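
A small sketch of the point about the empty structs. Plain C already
allows a pointer member to an incomplete type, so a pointed-to struct
needs a definition only if one of its fields is read. Assumptions made
here and not taken from the thread: CORE_RELOC is a hypothetical macro
that expands to the attribute only on BPF builds, task_has_mm() is a
made-up helper, and a real BPF program would typically do this load via
BPF_CORE_READ() or a BTF-typed pointer rather than a plain dereference.

```c
#include <stddef.h>

#if defined(__bpf__) || defined(__BPF__)
#define CORE_RELOC __attribute__((preserve_access_index))
#else
#include <assert.h>	/* host build only, for quick native checks */
#define CORE_RELOC	/* the attribute is BPF-specific; no-op on the host */
#endif

struct task_struct {
	int pid;
	int tgid;
	struct mm_struct *mm;	/* incomplete type: never dereferenced here */
	unsigned int flags;
} CORE_RELOC;

/* Hypothetical helper: only the pointer value is inspected. */
static inline int task_has_mm(const struct task_struct *t)
{
	return t->mm != NULL;
}
```

No definition of struct mm_struct appears anywhere in the snippet and it
still compiles, since only the pointer itself is accessed.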

2023-05-05 15:19:32

by Alexei Starovoitov

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 5, 2023 at 6:33 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > On Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > On Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko wrote:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
> >
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
>
> I see it, and it keeps the change minimal, which is good at the
> current stage, but I wonder if it wouldn't be better for us to define
> just the types that are not in UAPI and #include <linux/bpf.h> and
> <linux/perf_event.h>, as I did in the patches I posted here (Namhyung
> tested at least one). That way the added vmlinux.h file gets even
> smaller by not including things like:
>
> [acme@quaco perf-tools]$ egrep -w '(perf_event_sample_format|bpf_perf_event_value|perf_sample_weight|perf_mem_data_src) {' include/uapi/linux/*.h
> include/uapi/linux/bpf.h:struct bpf_perf_event_value {
> include/uapi/linux/perf_event.h:enum perf_event_sample_format {
> include/uapi/linux/perf_event.h:union perf_mem_data_src {
> include/uapi/linux/perf_event.h:union perf_mem_data_src {
> include/uapi/linux/perf_event.h:union perf_sample_weight {
> [acme@quaco perf-tools]$
>
> Also why do we need these:
>
> +struct mm_struct {
> +} __attribute__((preserve_access_index));
> +
> +struct raw_spinlock {
> +} __attribute__((preserve_access_index));
> +
> +typedef struct raw_spinlock raw_spinlock_t;
> +
> +struct spinlock {
> +} __attribute__((preserve_access_index));
> +
> +typedef struct spinlock spinlock_t;
> +
> +struct sighand_struct {
> + spinlock_t siglock;
> +} __attribute__((preserve_access_index));
>
> We don't use them, they're just pointers you kept in:
>
> +struct task_struct {
> + struct css_set *cgroups;
> + pid_t pid;
> + pid_t tgid;
> + char comm[16];
> + struct mm_struct *mm;
> + struct sighand_struct *sighand;
> + unsigned int flags;
> +} __attribute__((preserve_access_index));
>
> That with the preserve_access_index isn't needed, we need just the
> fields that we access in the tools, right?

Aside from that you probably want to take a look at BTFgen.
Old doc:
https://github.com/aquasecurity/btfhub/blob/main/docs/btfgen-internals.md
which landed as
"bpftool gen min_core_btf"
man bpftool-gen

It addresses the use case for kernels _without_ CONFIG_DEBUG_INFO_BTF.

2023-05-05 17:15:17

Subject: [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 05, 2023 at 10:33:15AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> That with the preserve_access_index isn't needed, we need just the
> fields that we access in the tools, right?

I'm now build testing this in many distro containers, without the two
reverts, i.e. BPF skels remain opt-out as in my pull request, both to
test the build and to run the functionality tests on the tools that use
these BPF skels; see below. The build touches neither vmlinux nor BTF
data.

- Arnaldo

From 882adaee50bc27f85374aeb2fbaa5b76bef60d05 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Thu, 4 May 2023 19:03:51 -0300
Subject: [PATCH 1/1] perf bpf skels: Stop using vmlinux.h generated from BTF,
use subset of used structs + CO-RE

Linus reported a build break caused by using a vmlinux without a BTF ELF
section to generate the vmlinux.h header with bpftool, for use in the BPF
tools in tools/perf/util/bpf_skel/*.bpf.c.

Instead, add a vmlinux.h file with just the structs and the fields the
tools need, marking the structs with __attribute__((preserve_access_index))
so that libbpf's CO-RE code can fix up the struct field offsets.

In some cases the vmlinux.h file that was being generated by bpftool
from the kernel BTF information was not needed at all: including
linux/bpf.h, and sometimes linux/perf_event.h, was enough, as non-UAPI
types were not being used.

To keep the patch small, include those UAPI headers from the trimmed-down
vmlinux.h file, which then provides the tools with just the structs, and
the subset of their fields, that they need.

Testing it:

# perf lock contention -b find / > /dev/null
^C contended total wait max wait avg wait type caller

7 53.59 us 10.86 us 7.66 us rwlock:R start_this_handle+0xa0
2 30.35 us 21.99 us 15.17 us rwsem:R iterate_dir+0x52
1 9.04 us 9.04 us 9.04 us rwlock:W start_this_handle+0x291
1 8.73 us 8.73 us 8.73 us spinlock raw_spin_rq_lock_nested+0x1e
#
# perf lock contention -abl find / > /dev/null
^C contended total wait max wait avg wait address symbol

1 262.96 ms 262.96 ms 262.96 ms ffff8e67502d0170 (mutex)
12 244.24 us 39.91 us 20.35 us ffff8e6af56f8070 mmap_lock (rwsem)
7 30.28 us 6.85 us 4.33 us ffff8e6c865f1d40 rq_lock (spinlock)
3 7.42 us 4.03 us 2.47 us ffff8e6c864b1d40 rq_lock (spinlock)
2 3.72 us 2.19 us 1.86 us ffff8e6c86571d40 rq_lock (spinlock)
1 2.42 us 2.42 us 2.42 us ffff8e6c86471d40 rq_lock (spinlock)
4 2.11 us 559 ns 527 ns ffffffff9a146c80 rcu_state (spinlock)
3 1.45 us 818 ns 482 ns ffff8e674ae8384c (rwlock)
1 870 ns 870 ns 870 ns ffff8e68456ee060 (rwlock)
1 663 ns 663 ns 663 ns ffff8e6c864f1d40 rq_lock (spinlock)
1 573 ns 573 ns 573 ns ffff8e6c86531d40 rq_lock (spinlock)
1 472 ns 472 ns 472 ns ffff8e6c86431740 (spinlock)
1 397 ns 397 ns 397 ns ffff8e67413a4f04 (spinlock)
#
# perf test offcpu
95: perf record offcpu profiling tests : Ok
#
# perf kwork latency --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
--------------------------------------------------------------------------------------------------------------------------------
(w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s |
(w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s |
(w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s |
(w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s |
(w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s |
(w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s |
(w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s |
(w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s |
(w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s |
(w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s |
(w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s |
(w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s |
(w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s |
<SNIP>
# strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1

Performance counter stats for 'sleep 1':

6,197,983 cycles

1.003922848 seconds time elapsed

0.000000000 seconds user
0.002032000 seconds sys

# head -7 perf-stat-bpf-counters.output
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0
bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4
#

Reported-by: Linus Torvalds <[email protected]>
Suggested-by: Andrii Nakryiko <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Co-developed-by: Jiri Olsa <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Makefile.perf | 20 +---
tools/perf/util/bpf_skel/.gitignore | 1 -
tools/perf/util/bpf_skel/vmlinux.h | 173 ++++++++++++++++++++++++++++
3 files changed, 174 insertions(+), 20 deletions(-)
create mode 100644 tools/perf/util/bpf_skel/vmlinux.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 48aba186ceb50792..61c33d100b2bcc90 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -1063,25 +1063,7 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
$(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
OUTPUT=$(SKEL_TMP_OUT)/ bootstrap

-VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux) \
- $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
- ../../vmlinux \
- /sys/kernel/btf/vmlinux \
- /boot/vmlinux-$(shell uname -r)
-VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
-
-$(SKEL_OUT)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL)
-ifeq ($(VMLINUX_H),)
- $(QUIET_GEN)$(BPFTOOL) btf dump file $< format c > $@ || \
- (echo "Failure to generate vmlinux.h needed for the recommended BPF skeleton support." && \
- echo "To disable this use the build option NO_BPF_SKEL=1." && \
- echo "Alternatively point at a pre-generated vmlinux.h with VMLINUX_H=<path>." && \
- false)
-else
- $(Q)cp "$(VMLINUX_H)" $@
-endif
-
-$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) $(SKEL_OUT)/vmlinux.h | $(SKEL_TMP_OUT)
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) \
-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@

diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
index cd01455e1b53c3d9..7a1c832825de8445 100644
--- a/tools/perf/util/bpf_skel/.gitignore
+++ b/tools/perf/util/bpf_skel/.gitignore
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
.tmp
*.skel.h
-vmlinux.h
diff --git a/tools/perf/util/bpf_skel/vmlinux.h b/tools/perf/util/bpf_skel/vmlinux.h
new file mode 100644
index 0000000000000000..449b1ea91fc48143
--- /dev/null
+++ b/tools/perf/util/bpf_skel/vmlinux.h
@@ -0,0 +1,173 @@
+#ifndef __VMLINUX_H
+#define __VMLINUX_H
+
+#include <linux/bpf.h>
+#include <linux/types.h>
+#include <linux/perf_event.h>
+#include <stdbool.h>
+
+// non-UAPI kernel data structures, used in the .bpf.c BPF tool component.
+
+// Just the fields used in these tools preserving the access index so that
+// libbpf can fixup offsets with the ones used in the kernel when loading the
+// BPF bytecode, if they differ from what is used here.
+
+typedef __u8 u8;
+typedef __u32 u32;
+typedef __u64 u64;
+typedef __s64 s64;
+
+typedef int pid_t;
+
+enum cgroup_subsys_id {
+ perf_event_cgrp_id = 8,
+};
+
+enum {
+ HI_SOFTIRQ = 0,
+ TIMER_SOFTIRQ,
+ NET_TX_SOFTIRQ,
+ NET_RX_SOFTIRQ,
+ BLOCK_SOFTIRQ,
+ IRQ_POLL_SOFTIRQ,
+ TASKLET_SOFTIRQ,
+ SCHED_SOFTIRQ,
+ HRTIMER_SOFTIRQ,
+ RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
+
+ NR_SOFTIRQS
+};
+
+typedef struct {
+ s64 counter;
+} __attribute__((preserve_access_index)) atomic64_t;
+
+typedef atomic64_t atomic_long_t;
+
+struct raw_spinlock {
+ int rawlock;
+} __attribute__((preserve_access_index));
+
+typedef struct raw_spinlock raw_spinlock_t;
+
+typedef struct {
+ struct raw_spinlock rlock;
+} __attribute__((preserve_access_index)) spinlock_t;
+
+struct sighand_struct {
+ spinlock_t siglock;
+} __attribute__((preserve_access_index));
+
+struct rw_semaphore {
+ atomic_long_t owner;
+} __attribute__((preserve_access_index));
+
+struct mutex {
+ atomic_long_t owner;
+} __attribute__((preserve_access_index));
+
+struct kernfs_node {
+ u64 id;
+} __attribute__((preserve_access_index));
+
+struct cgroup {
+ struct kernfs_node *kn;
+ int level;
+} __attribute__((preserve_access_index));
+
+struct cgroup_subsys_state {
+ struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
+struct css_set {
+ struct cgroup_subsys_state *subsys[13];
+ struct cgroup *dfl_cgrp;
+} __attribute__((preserve_access_index));
+
+struct mm_struct {
+ struct rw_semaphore mmap_lock;
+} __attribute__((preserve_access_index));
+
+struct task_struct {
+ unsigned int flags;
+ struct mm_struct *mm;
+ pid_t pid;
+ pid_t tgid;
+ char comm[16];
+ struct sighand_struct *sighand;
+ struct css_set *cgroups;
+} __attribute__((preserve_access_index));
+
+struct trace_entry {
+ short unsigned int type;
+ unsigned char flags;
+ unsigned char preempt_count;
+ int pid;
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_entry {
+ struct trace_entry ent;
+ int irq;
+ u32 __data_loc_name;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_exit {
+ struct trace_entry ent;
+ int irq;
+ int ret;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_softirq {
+ struct trace_entry ent;
+ unsigned int vec;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_start {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_end {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_activate_work {
+ struct trace_entry ent;
+ void *work;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct perf_sample_data {
+ u64 addr;
+ u64 period;
+ union perf_sample_weight weight;
+ u64 txn;
+ union perf_mem_data_src data_src;
+ u64 ip;
+ struct {
+ u32 pid;
+ u32 tid;
+ } tid_entry;
+ u64 time;
+ u64 id;
+ struct {
+ u32 cpu;
+ } cpu_entry;
+ u64 phys_addr;
+ u64 data_page_size;
+ u64 code_page_size;
+} __attribute__((__aligned__(64))) __attribute__((preserve_access_index));
+
+struct bpf_perf_event_data_kern {
+ struct perf_sample_data *data;
+ struct perf_event *event;
+} __attribute__((preserve_access_index));
+#endif // __VMLINUX_H
--
2.39.2
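
For readers following the Makefile hunk above: the removed VMLINUX_BTF rule picked the first existing file from an ordered candidate list via $(firstword $(wildcard ...)), which is how a stray ../../vmlinux in the source tree could be chosen ahead of /sys/kernel/btf/vmlinux. Below is a minimal C sketch of that selection logic only. The helper name first_existing_path is hypothetical and is not part of perf or libbpf.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Illustration only: return the first candidate that exists on disk,
 * or NULL if none does. This mimics the effect of
 * $(firstword $(wildcard $(VMLINUX_BTF_PATHS))) in the removed rule.
 * The function name is hypothetical.
 */
const char *first_existing_path(const char *const *candidates, size_t n)
{
	assert(candidates != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Skip empty slots, e.g. when $(O) or $(KBUILD_OUTPUT) is unset. */
		if (candidates[i] && access(candidates[i], F_OK) == 0)
			return candidates[i];
	}
	return NULL;
}
```

Called with the candidates in the order the old rule listed them, an in-tree vmlinux wins whenever it exists, whether or not it carries BTF. The patch avoids the problem by not searching at all.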

2023-05-05 20:59:31

Subject: Re: [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 05, 2023 at 01:46:30PM -0700, Ian Rogers wrote:
> On Fri, May 5, 2023 at 1:43 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Fri, May 05, 2023 at 10:04:47AM -0700, Ian Rogers wrote:
> > > On Fri, May 5, 2023 at 9:56 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > On Fri, May 05, 2023 at 10:33:15AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> > > > > That with the preserve_access_index isn't needed, we need just the
> > > > > fields that we access in the tools, right?
> > > >
> > > > I'm now build testing this in many distro containers, without the two
> > > > reverts, i.e. BPF skels continue as opt-out as in my pull request, to
> > > > test the build and also the functionality of the tools using such
> > > > bpf skels, see below, no touching of vmlinux nor BTF data during the
> > > > build.
> > > >
> > > > - Arnaldo
> > > >
> > > > From 882adaee50bc27f85374aeb2fbaa5b76bef60d05 Mon Sep 17 00:00:00 2001
> > > > From: Arnaldo Carvalho de Melo <[email protected]>
> > > > Date: Thu, 4 May 2023 19:03:51 -0300
> > > > Subject: [PATCH 1/1] perf bpf skels: Stop using vmlinux.h generated from BTF,
> > > > use subset of used structs + CO-RE
> > > >
> > > > Linus reported a build break caused by using a vmlinux without a BTF ELF
> > > > section to generate the vmlinux.h header with bpftool for use in the BPF
> > > > tools in tools/perf/util/bpf_skel/*.bpf.c.
> > > >
> > > > Instead, add a vmlinux.h file containing only the structs and fields the
> > > > tools need, marking the structs with __attribute__((preserve_access_index)),
> > > > so that libbpf's CO-RE code can fix up the struct field offsets.
> > > >
> > > > In some cases the vmlinux.h file that was being generated by bpftool
> > > > from the kernel BTF information was not needed at all: including
> > > > linux/bpf.h, and sometimes linux/perf_event.h, was enough, as non-UAPI
> > > > types were not being used.
> > > >
> > > > To keep the patch small, include those UAPI headers from the trimmed down
> > > > vmlinux.h file, which then provides the tools with just the structs and
> > > > the subset of their fields that they need.
> > > >
> > > > Testing it:
> > > >
> > > > # perf lock contention -b find / > /dev/null
> >
> > I tested perf lock con -abv -L rcu_state sleep 1
> > and needed the fix below
> >
> > jirka
>
> I thought this was fixed by:
> https://lore.kernel.org/lkml/[email protected]/
> but I think that is just in perf-tools-next.

Nope, we have it in perf-tools:

commit e53de7b65a3ca59af268c78df2d773f277f717fd
Author: Namhyung Kim <[email protected]>
Date: Thu Apr 27 16:48:32 2023 -0700

perf lock contention: Fix struct rq lock access
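
As an aside on the commit message quoted above: a trimmed struct plus preserve_access_index works because field accesses are recorded by name and patched at load time against the running kernel's layout. Below is a rough userspace simulation of that idea in plain C. It is not libbpf's API. Every name in it (tool_task, kernel_task, kernel_btf, access_reloc, relocate, read_int_at) is hypothetical, and the "kernel" layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed view, as a hand-written header might declare it: used fields only. */
struct tool_task {
	int pid;
	int tgid;
};

/* Invented stand-in for the real layout: extra fields shift the offsets. */
struct kernel_task {
	unsigned long state;
	unsigned int flags;
	void *stack;
	int pid;
	int tgid;
};

/* Stand-in for the target's type information: field name -> real offset. */
struct field_info {
	const char *name;
	size_t offset;
};

static const struct field_info kernel_btf[] = {
	{ "pid",  offsetof(struct kernel_task, pid)  },
	{ "tgid", offsetof(struct kernel_task, tgid) },
};

/* One recorded access: the field name and the offset compiled in. */
struct access_reloc {
	const char *name;
	size_t offset;
};

/* Patch each recorded offset by field name; return -1 if a field is missing. */
int relocate(struct access_reloc *relocs, size_t n)
{
	size_t nfields = sizeof(kernel_btf) / sizeof(kernel_btf[0]);

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < nfields; j++) {
			if (strcmp(relocs[i].name, kernel_btf[j].name) == 0) {
				relocs[i].offset = kernel_btf[j].offset;
				break;
			}
		}
		if (j == nfields)
			return -1;
	}
	return 0;
}

/* Read an int at a byte offset, the way a relocated load would. */
int read_int_at(const void *base, size_t off)
{
	int v;

	assert(base != NULL);
	memcpy(&v, (const char *)base + off, sizeof(v));
	return v;
}
```

In this sketch an access compiled against struct tool_task starts with offsetof(struct tool_task, pid) and, after relocate(), carries the offset of pid in struct kernel_task, so read_int_at() returns the right value even though the two layouts differ. In the real toolchain the recorded accesses are emitted by clang and resolved by libbpf against the kernel's BTF when the program is loaded. The simulation keeps only the match-by-field-name step.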

2023-05-07 19:17:14

by pr-tracker-bot

Subject: Re: [GIT PULL] perf tools changes for v6.4

The pull request you sent on Wed, 3 May 2023 18:18:01 -0300:

> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v6.4-1-2023-05-03

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ecc68ee216c6c5b2f84915e1441adf436f1b019b

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2023-05-08 22:39:32