2023-05-03 21:22:14

by Arnaldo Carvalho de Melo

Subject: [GIT PULL] perf tools changes for v6.4

Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo

The following changes since commit 55a21105ecc156495446d8ae75d7d73f66baed7b:

Merge tag 'riscv-for-linus-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux (2023-03-10 09:19:30 -0800)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v6.4-1-2023-05-03

for you to fetch changes up to 1f85d016768ff19f060f3cce014a43c761de8259:

perf test record+probe_libc_inet_pton: Fix call chain match on x86_64 (2023-05-03 11:02:21 -0300)

----------------------------------------------------------------
perf tools changes and fixes for v6.4:

Build:

- Require libtraceevent to build, one can disable it using NO_LIBTRACEEVENT=1.

It is required for tools like 'perf sched', 'perf kvm', 'perf trace', etc.

libtraceevent is available in most distros so installing 'libtraceevent-devel' should
be a one-time event to continue building perf as usual.

Using NO_LIBTRACEEVENT=1 produces tooling that is functional and sufficient
for lots of users not interested in those libtraceevent dependent features.

- Allow Python support in 'perf script' when libtraceevent isn't linked, as not
all features require it; for instance, Intel PT does not use tracepoints.

- Error if the python interpreter needed for jevents to work isn't available
and NO_JEVENTS=1 isn't set, preventing a build without support for JSON
vendor events, which is a rare but possible condition. The two checks emit
these error messages:

$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

- Make libbpf 1.0 the minimum required when building with out of tree, distro
provided libbpf.

- Use libstdc++'s and LLVM's libcxx's __cxa_demangle, a portable C++ demangler, add
'perf test' entry for it.

- Make binutils libraries opt-in, as distros disable building with them due to licensing;
they were used for C++ demangling, for instance.

- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or equivalent) isn't installed, we'll
just have a build warning:

Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

- Add a feature test for scandirat(), that is not implemented so far in musl and uclibc,
disabling features that need it, such as scanning for tracepoints in /sys/kernel/tracing/events.

- Build by default with BPF skels, to disable it use NO_BPF_SKEL=1 in the make
command line.

This makes the following features available by default. They were introduced
in previous kernel versions but required building with BUILD_BPF_SKEL=1:

- perf lock contention
- perf kwork
- off-cpu profiling
- Filtering events using BPF
- 'perf stat' for counting events on BPF programs

- Features are being implemented using BPF skels, where BPF is built and linked
with the native perf tool to then be loaded and attached to places in the
kernel where information can be obtained to implement features that previously
required kernel changes.

These load and attach operations are only performed when the feature that
uses it is requested by the user in the perf command line.

One can look at the git history for the files in the
tools/perf/util/bpf_skel/ directory to find out about them, in addition to the
perf tools man pages. As of this writing, the contents of this directory are:

$ cd tools/perf/util/bpf_skel/
$ ls
bperf_follower.bpf.c bperf_leader.bpf.c
bpf_prog_profiler.bpf.c func_latency.bpf.c
kwork_trace.bpf.c lock_contention.bpf.c
off_cpu.bpf.c sample_filter.bpf.c
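
The jevents interpreter check described earlier in the Build notes can be
sketched in Python. This is only an illustration of the two error conditions,
not the actual Makefile logic: the helper name check_jevents_python() is
hypothetical, and the messages are copied from the two $(error ...) lines.

```python
import sys

# Minimum interpreter version quoted in the second error message.
MIN_VERSION = (3, 6)


def check_jevents_python(version, no_jevents=False):
    """Return an error string, or None when the build may proceed.

    `version` is a (major, minor) tuple, or None when no interpreter was found.
    `no_jevents` stands in for building with NO_JEVENTS=1 (hypothetical helper).
    """
    if no_jevents:
        return None  # JSON vendor events were explicitly disabled
    if version is None:
        return ("ERROR: No python interpreter needed for jevents generation. "
                "Install python or build with NO_JEVENTS=1.")
    if tuple(version) < MIN_VERSION:
        return ("ERROR: Python interpreter needed for jevents generation too old "
                "(older than 3.6). Install a newer python or build with NO_JEVENTS=1.")
    return None


if __name__ == "__main__":
    # Check the interpreter running this sketch.
    print(check_jevents_python(sys.version_info[:2]))
```

The point of the ordering is that NO_JEVENTS=1 short-circuits both checks, so
a build without JSON vendor events only happens when explicitly requested.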

perf BPF filters:

- New feature where BPF can be used to filter samples, for instance:

$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

- In addition to 'period' (PERF_SAMPLE_PERIOD), the other PERF_SAMPLE_* fields can be
used for filtering, as can some other values accessible from the sample. From
tools/perf/Documentation/perf-record.txt:

Essentially the BPF filter expression is:

<term> <operator> <value> (("," | "||") <term> <operator> <value>)*

The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops

The <operator> can be one of:
==, !=, >, >=, <, <=, &

The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_lock)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
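
To make the filter grammar above concrete, here is a minimal Python sketch
that tokenizes an expression and checks it against the
<term> <operator> <value> shape. It is not the parser perf uses, and
parse_filter() and tokenize() are hypothetical names. It only checks that each
value is a single word or number and does not validate symbolic values
against their term.

```python
import re

TERMS = {
    "ip", "id", "tid", "pid", "cpu", "time", "addr", "period", "txn", "weight",
    "phys_addr", "code_pgsz", "data_pgsz", "weight1", "weight2", "weight3",
    "ins_lat", "retire_lat", "p_stage_cyc", "mem_op", "mem_lvl", "mem_snoop",
    "mem_remote", "mem_lock", "mem_dtlb", "mem_blk", "mem_hops",
}
OPERATORS = {"==", "!=", ">", ">=", "<", "<=", "&"}
SEPARATORS = {",", "||"}

# Two-character tokens come first so '>=' is not split into '>' and '='.
TOKEN_RE = re.compile(r"\s*(\|\||==|!=|>=|<=|>|<|&|,|\w+)")


def tokenize(expr):
    """Split a filter expression into operator, separator and word tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_filter(expr):
    """Return [(separator, term, operator, value), ...].

    The separator is None for the first clause and ',' or '||' afterwards.
    """
    tokens = tokenize(expr)
    clauses, pos, separator = [], 0, None
    while True:
        clause = tokens[pos:pos + 3]
        if len(clause) != 3:
            raise ValueError("expected <term> <operator> <value>")
        term, operator, value = clause
        if term not in TERMS:
            raise ValueError(f"unknown term: {term!r}")
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        if not re.fullmatch(r"\w+", value):
            raise ValueError(f"bad value: {value!r}")
        clauses.append((separator, term, operator, value))
        pos += 3
        if pos == len(tokens):
            return clauses
        separator = tokens[pos]
        if separator not in SEPARATORS:
            raise ValueError(f"expected ',' or '||', got {separator!r}")
        pos += 1


if __name__ == "__main__":
    print(parse_filter("period > 1000"))
```

Keeping the separator with each clause leaves the combination semantics to the
caller, since this sketch makes no claim about how perf evaluates them.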

perf lock contention:

- Show lock type with address.

- Track and show mmap_lock, siglock and per-cpu rq_lock with address. This is
done for mmap_lock by following the current->mm pointer:

$ sudo ./perf lock con -abl -- sleep 10
 contended   total wait     max wait     avg wait            address   symbol
...
     16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
     17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
         3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
      3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
         1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
         9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
     14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
      3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
     16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
        11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
         1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
      1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
       581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
         5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
       112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
       381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
       255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

- Update default map size to 16384.

- Allocate single letter option -M for --map-nr-entries, as it is proving to be
frequently used.

- Fix struct rq lock access for older kernels with BPF's CO-RE (Compile once,
run everywhere).

- Fix problems found with MSan.
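
For post-processing, the 'perf lock con -abl' output shown earlier in this
section can be summarized per symbol. The following Python sketch assumes
whitespace-separated columns exactly as in that sample. The helper names are
hypothetical, and only the 'ms' and 'us' units are confirmed by the sample;
'ns' and 's' are assumptions.

```python
from collections import defaultdict

# 'ms' and 'us' appear in the sample output; 'ns' and 's' are assumed here.
UNIT_TO_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_ns(value, unit):
    """Convert a printed duration such as ('84.14', 'ms') to integer nanoseconds."""
    return round(float(value) * UNIT_TO_NS[unit])


def parse_row(line):
    """Parse one data row; return None for headers, '...' and unknown lines."""
    fields = line.split()
    # contended, total + unit, max + unit, avg + unit, address, optional symbol
    if len(fields) not in (8, 9) or not fields[0].isdigit():
        return None
    if any(unit not in UNIT_TO_NS for unit in fields[2:7:2]):
        return None
    return {
        "contended": int(fields[0]),
        "total_ns": to_ns(fields[1], fields[2]),
        "max_ns": to_ns(fields[3], fields[4]),
        "avg_ns": to_ns(fields[5], fields[6]),
        "address": int(fields[7], 16),
        "symbol": fields[8] if len(fields) == 9 else "",
    }


def wait_by_symbol(lines):
    """Sum the total wait time per symbol; rows without one go to '(unnamed)'."""
    totals = defaultdict(int)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            totals[row["symbol"] or "(unnamed)"] += row["total_ns"]
    return dict(totals)
```

Grouping by symbol rather than by address folds the several per-process
mmap_lock entries into one figure, which is often the number of interest.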

perf report/top:

- Add inline information when using --call-graph=fp or lbr, as was already done
to the --call-graph=dwarf callchain mode.

- Improve the 'srcfile' sort key performance by really using an optimization
introduced in 6.2 for the 'srcline' sort key that avoids calling addr2line for
comparison with each sample.

perf sched:

- Make 'perf sched latency/map/replay' use "sched:sched_waking" instead of
"sched:sched_wakeup", consistent with 'perf record' since d566a9c2d482 ("perf
sched: Prefer sched_waking event when it exists").

perf ftrace:

- Make system wide the default target for latency subcommand, run the following
command then generate some network traffic and press control+C:

# perf ftrace latency -T __kfree_skb
^C# DURATION   | COUNT | GRAPH         |
   0 - 1    us |    27 | ############# |
   1 - 2    us |    22 | ###########   |
   2 - 4    us |     8 | ####          |
   4 - 8    us |     5 | ##            |
   8 - 16   us |    24 | ############  |
  16 - 32   us |     2 | #             |
  32 - 64   us |     1 |               |
  64 - 128  us |     0 |               |
 128 - 256  us |     0 |               |
 256 - 512  us |     0 |               |
 512 - 1024 us |     0 |               |
   1 - 2    ms |     0 |               |
   2 - 4    ms |     0 |               |
   4 - 8    ms |     0 |               |
   8 - 16   ms |     0 |               |
  16 - 32   ms |     0 |               |
  32 - 64   ms |     0 |               |
  64 - 128  ms |     0 |               |
 128 - 256  ms |     0 |               |
 256 - 512  ms |     0 |               |
 512 - 1024 ms |     0 |               |
   1 - ...   s |     0 |               |
#
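
The 22 rows of the histogram above follow a power-of-two pattern. As a rough
Python sketch of such bucketing (not taken from perf's source), assume buckets
double in width in microseconds and that 1024 us is displayed as 1 ms, as the
labels suggest. bucket_index(), bucket_label() and histogram() are
hypothetical names.

```python
NUM_BUCKETS = 22  # rows in the histogram above, from "0 - 1 us" to "1 - ... s"


def bucket_index(latency_ns):
    """Map a latency in integer nanoseconds to a histogram row index."""
    usecs = latency_ns // 1000
    if usecs < 1:
        return 0
    # bit_length() is k + 1 for values in [2**k, 2**(k + 1)),
    # so 1 us -> 1, 2..3 us -> 2, 4..7 us -> 3, and so on.
    return min(usecs.bit_length(), NUM_BUCKETS - 1)


def bucket_label(index):
    """Return the range label for a row, switching to ms at 1024 us."""
    if index == 0:
        return "0 - 1 us"
    if index == NUM_BUCKETS - 1:
        return "1 - ... s"
    low, unit = 1 << (index - 1), "us"
    if low >= 1024:
        low, unit = low >> 10, "ms"
    return f"{low} - {low * 2} {unit}"


def histogram(latencies_ns):
    """Count how many latencies fall into each bucket."""
    counts = [0] * NUM_BUCKETS
    for latency in latencies_ns:
        counts[bucket_index(latency)] += 1
    return counts
```

A fixed number of logarithmic buckets keeps the table the same size whether
the traced function takes nanoseconds or seconds.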

perf top:

- Add --branch-history (LBR: Last Branch Record) option, just like already
available for 'perf record'.

- Fix segfault in thread__comm_len() where thread->comm was being used outside
thread->comm_lock.

perf annotate:

- Allow configuring objdump and addr2line in ~/.perfconfig, so that you can
use alternative binaries, such as llvm's.

perf kvm:

- Add TUI mode for 'perf kvm stat report'.

Reference counting:

- Add reference count checking infrastructure to check for use after free, done
to the 'cpumap', 'namespaces', 'maps' and 'map' structs, more to come. To
build with it use -DREFCNT_CHECKING=1 in the make command line to build
tools/perf.

Documented at: https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

- This caught, for instance, the following bug, fixed in this series:

- Fix maps use after put in 'perf test "Share thread maps"':

'maps' is copied from leader, but the leader is put on line 79 and then
'maps' is used to read the reference count below - so a use after put, with the
put of maps happening within thread__put. Fix by reversing the order of puts so
that the leader is put last.

- Also several fixes were made to places where reference counts were not being held.

- Make this one of the tests in 'make -C tools/perf build-test' to regularly
build test it and to make sure no direct access to the reference counted
structs is made, with accesses going through accessors that check the validity
of the struct pointer.
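
The "use after put" described above, and why reversing the order of puts fixes
it, can be modelled in a few lines of Python. This is a simplified sketch, not
the perf implementation: it assumes that every reference is a separate handle
that becomes invalid once put, which is one way such checking can be done.
Shared, Handle, buggy_order() and fixed_order() are hypothetical names.

```python
class Shared:
    """The reference counted object itself (stands in for 'maps')."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0


class Handle:
    """One reference to a Shared object; it is unusable after put()."""

    def __init__(self, obj):
        self._obj = obj
        obj.refcnt += 1

    def _checked(self):
        # Every accessor goes through this validity check.
        if self._obj is None:
            raise RuntimeError("use after put")
        return self._obj

    def get(self):
        return Handle(self._checked())

    def put(self):
        self._checked().refcnt -= 1
        self._obj = None  # invalidate this particular reference

    def refcnt(self):
        return self._checked().refcnt


def buggy_order():
    leader = Handle(Shared("maps"))
    follower = leader.get()
    maps = leader          # copied from the leader without taking a reference
    leader.put()           # the leader is put first...
    return maps.refcnt()   # ...so this read raises RuntimeError


def fixed_order():
    leader = Handle(Shared("maps"))
    follower = leader.get()
    maps = leader
    follower.put()
    count = maps.refcnt()  # still valid: the leader has not been put yet
    leader.put()           # the leader is put last
    return count
```

Without the check, buggy_order() would appear to work because the follower
still keeps the object alive, which is why this class of bug is easy to miss.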

ARM64:

- Fix 'perf report' segfault when filtering coresight traces by sparse lists of CPUs.

- Add support for 'simd' as a sort field for 'perf report', to show ARM's NEON SIMD's
predicate flags: "partial" and "empty".

Vendor events (JSON):

arm64:

- Add N1 metrics.

Intel:

- Add graniterapids, grandridge and sierraforest events.

- Refresh events for: alderlake, alderlaken, broadwell, broadwellde,
broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, jaketown,
meteorlake, knightslanding, sandybridge, sapphirerapids, silvermont, skylake,
tigerlake and westmereep-dp.

- Refresh metrics for alderlake-n, broadwell, broadwellde, broadwellx, haswell,
haswellx, icelakex, ivybridge, ivytown and skylakex.

perf stat:

- Implement --topdown using JSON metrics.

- Add TopdownL1 JSON metric as a default if present, but disable it for
now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.

- Use metrics for --smi-cost.

- Update topdown documentation.

Vendor events (JSON) infrastructure:

- Add support for computing and printing metric threshold values. For instance,
here is one found in the tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
file:

{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},

- Test parsing metric thresholds with the fake PMU in 'perf test pmu-events'.

- Support for printing metric thresholds in 'perf list'.

- Add --metric-no-threshold option to 'perf stat'.

- Add rand (reverse and) and has_pmem (optane memory) support to metrics.

- Sort list of input files to avoid depending on the order from readdir(),
which helps in obtaining reproducible builds.
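
As a worked example of the smi_cycles metric and its threshold quoted above,
the following Python sketch evaluates the MetricExpr by hand for given counter
values. It only illustrates the arithmetic. The function names are
hypothetical, the counter values in the usage note are made up, and perf's
own expression evaluator is not involved.

```python
def smi_cycles(aperf, cycles, smi):
    """MetricExpr: (msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0."""
    return (aperf - cycles) / aperf if smi > 0 else 0.0


def exceeds_threshold(value, limit=0.1):
    """MetricThreshold: smi_cycles > 0.1."""
    return value > limit


def format_metric(value):
    """ScaleUnit "100%": scale the ratio by 100 and print it with a '%' unit."""
    return f"{value * 100:.1f}%"
```

With illustrative counts of aperf=1000, cycles=800 and smi=3, the metric is
0.2, which is printed as 20.0% and exceeds the 0.1 threshold. With smi=0 the
expression short-circuits to 0 and the threshold is not exceeded.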

S/390:

- Add common metrics: CPI (cycles per instruction), prbstate (ratio of
instructions executed in problem state compared to total number of
instructions), l1mp (Level one instruction and data cache misses per 100
instructions).

- Add cache metrics for z13, z14, z15 and z16.

- Add metric for TLB and cache.

ARM:

- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE (Memory
Tagging Extension) and MOPS (Memory Operations) load/store.

Hardware tracing:

Intel PT:

- Add event type names UINTR (User interrupt delivered) and UIRET (Exiting from
user interrupt routine), documented in table 32-50 "CFE Packet Type and Vector
Fields Details" in the Intel Processor Trace chapter of The Intel SDM
Volume 3 version 078.

- Add support for new branch instructions ERETS and ERETU.

- Fix CYC timestamps after standalone CBR.

ARM CoreSight:

- Allow user to override timestamp and contextid settings.

- Fix segfault in dso lookup.

- Fix timeless decode mode detection.

- Add separate decode paths for timeless and per-thread modes.

auxtrace:

- Fix address filter entire kernel size.

Miscellaneous:

- Fix use-after-free and unaligned bugs in the PLT handling routines.

- Use zfree() to reduce chances of use after free.

- Add missing 0x prefix for addresses printed in hexadecimal in 'perf probe'.

- Suppress massive unsupported target platform errors in the unwind code.

- Fix incorrect build_id size returned by elf_read_build_id().

- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2.

- Add missing new parameter in kfree_skb tracepoint to the python scripts using it.

- Add 'perf bench syscall fork' benchmark.

- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in 'perf mem'.

- Fix wrong size expectation for perf test 'Setup struct perf_event_attr' caused by
the patch adding perf_event_attr::config3.

- Fix some spelling mistakes.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

----------------------------------------------------------------
Adrian Hunter (10):
perf script: Fix Python support when no libtraceevent
perf intel-pt: Add event type names UINTR and UIRET
perf intel-pt: Add support for new branch instructions ERETS and ERETU
perf symbols: Fix use-after-free in get_plt_got_name()
perf symbols: Fix unaligned access in get_x86_64_plt_disp()
perf tools: Avoid warning in do_realloc_array_as_needed()
perf top: Add --branch-history option
perf symbol: Remove unused branch_callstack
perf auxtrace: Fix address filter entire kernel size
perf intel-pt: Fix CYC timestamps after standalone CBR

Alexander Pantyukhin (1):
perf scripts python intel-pt-events: Delete unused 'event_attr variable

Andreas Herrmann (1):
perf bench numa: Fix type of loop iterator in do_work, it should be 'long'

Arnaldo Carvalho de Melo (41):
Merge remote-tracking branch 'acme/perf-tools' into perf-tools-next
perf tools bpf: Add vmlinux.h to .gitignore
tools build: Add a feature test for scandirat(), that is not implemented so far in musl and uclibc
perf inject: Use zfree() to reduce chances of use after free
perf daemon: Use zfree() to reduce chances of use after free
perf trace: Use zfree() to reduce chances of use after free
perf c2c: Use zfree() to reduce chances of use after free
perf list: Use zfree() to reduce chances of use after free
perf symbol: Use zfree() to reduce chances of use after free
perf x86 iostat: Use zfree() to reduce chances of use after free
perf env: Use zfree() to reduce chances of use after free
perf pmu: Use zfree() to reduce chances of use after free
perf evsel: Use zfree() to reduce chances of use after free
perf expr: Use zfree() to reduce chances of use after free
perf parse-events: Use zfree() to reduce chances of use after free
perf annotate: Use zfree() to reduce chances of use after free
perf evlist: Use zfree() to reduce chances of use after free
perf genelf: Use zfree() to reduce chances of use after free
perf bench inject-buildid: Use zfree() to reduce chances of use after free
perf tests api-io: Use zfree() to reduce chances of use after free
perf arm-spe: Use zfree() to reduce chances of use after free
perf metricgroups: Use zfree() to reduce chances of use after free
perf pmu: zfree() expects a pointer to a pointer to zero it after freeing its contents
perf map: Add map__refcnt() accessor to use in the maps test
libperf: Add a perf_cpu_map__set_nr() available as an internal function for tools/perf to use
perf pmu: Use perf_cpu_map__set_nr() in perf_pmu__cpus_match() to allow for refcnt checking
perf test: Simplify for_each_test() to avoid tripping on -Werror=array-bounds
libperf: Add perf_cpu_map__refcnt() interanl accessor to use in the maps test
perf cpumap: Remove initializations done in perf_cpu_map__alloc()
perf dso: Add dso__filename_with_chroot() to reduce number of accesses to dso->nsinfo members
perf namespaces: Use the need_setns() accessors instead of accessing ->need_setns directly
perf namespaces: Introduce nsinfo__refcnt() accessor to avoid accessing ->refcnt directly
perf namespaces: Introduce nsinfo__mntns_path() accessor to avoid accessing ->mntns_path directly
perf dso: Fix use before NULL check introduced by map__dso() introduction
perf maps: Add maps__refcnt() accessor to allow checking maps pointer
perf maps: Use maps__nr_maps() instead of open coded maps->nr_maps
perf map: Add missing conversions to map__refcnt()
perf map: Add set_ methods for map->{start,end,pgoff,pgoff,reloc,erange_warned,dso,map_ip,unmap_ip,priv}
perf build: Test the refcnt check build
perf probe: Add missing 0x prefix for addresses printed in hexadecimal
perf evsel: Introduce evsel__name_is() method to check if the evsel name is equal to a given string

Artem Savkov (1):
perf report: Append inlines to non-DWARF callchains

Bernhard M. Wiedemann (1):
perf jevents: Sort list of input files

Changbin Du (4):
perf record: Reuse target::initial_delay
perf ftrace: Reuse target::initial_delay
perf script: Print raw ip instead of binary offset for callchain
perf unwind: Suppress massive unsupported target platform errors

Chunxin Zang (1):
perf sched: Fix sched latency analysis incorrection when using 'sched:sched_wakeup'

Colin Ian King (1):
perf script task-analyzer: Fix spelling mistake "miliseconds" -> "milliseconds"

Ganapatrao Kulkarni (1):
perf cs-etm: Add fix for coresight trace for any range of CPUs

German Gomez (4):
perf event: Add 'simd_flags' field to 'struct perf_sample'
perf arm-spe: Refactor arm-spe to support operation packet type
perf arm-spe: Add SVE flags to the SPE samples
perf report: Add 'simd' sort field

Hangliang Lai (1):
perf top: Expand the range of multithreaded phase

Ian Rogers (188):
perf tools: Ensure evsel name is initialized
perf metrics: Improve variable names
perf pmu-events: Remove aggr_mode from pmu_event
perf pmu-events: Change aggr_mode to be an enum
perf pmu-events: Change deprecated to be a bool
perf pmu-events: Change perpkg to be a bool
perf expr: Make the online topology accessible globally
perf pmu-events: Make the metric_constraint an enum
perf pmu-events: Don't '\0' terminate enum values
perf vendor events intel: Refresh alderlake events
perf vendor events intel: Refresh alderlake-n metrics
perf vendor events intel: Refresh broadwell metrics
perf vendor events intel: Refresh broadwellde metrics
perf vendor events intel: Refresh broadwellx metrics
perf vendor events intel: Refresh cascadelakex events
perf vendor events intel: Add graniterapids events
perf vendor events intel: Refresh haswell metrics
perf vendor events intel: Refresh haswellx metrics
perf vendor events intel: Refresh icelake events
perf vendor events intel: Refresh icelakex metrics
perf vendor events intel: Refresh ivybridge metrics
perf vendor events intel: Refresh ivytown metrics
perf vendor events intel: Refresh jaketown events
perf vendor events intel: Refresh knightslanding events
perf vendor events intel: Refresh sandybridge events
perf vendor events intel: Refresh sapphirerapids events
perf vendor events intel: Refresh silvermont events
perf vendor events intel: Refresh skylake events
perf vendor events intel: Refresh skylakex metrics
perf vendor events intel: Refresh tigerlake events
perf vendor events intel: Refresh westmereep-dp events
perf jevents: Add rand support to metrics
perf jevent: Parse metric thresholds
perf pmu-events: Test parsing metric thresholds with the fake PMU
perf list: Support for printing metric thresholds
perf metric: Compute and print threshold values
perf expr: More explicit NAN handling
perf metric: Add --metric-no-threshold option
perf stat: Add TopdownL1 metric as a default if present
perf stat: Implement --topdown using json metrics
perf stat: Remove topdown event special handling
perf doc: Refresh topdown documentation
perf stat: Remove hard coded transaction events
perf stat: Use metrics for --smi-cost
perf stat: Remove perf_stat_evsel_id
perf stat: Move enums from header
perf stat: Hide runtime_stat
perf stat: Add cpu_aggr_map for loop
perf metric: Directly use counts rather than saved_value
perf stat: Use counts rather than saved_value
perf stat: Remove saved_value/runtime_stat
perf vendor events intel: Update alderlake to v1.19
perf vendor events intel: Update alderlaken to v1.19
perf vendor events intel: Update icelakex to v1.19
libperf evlist: Avoid a use of evsel idx
perf stat: Don't remove all grouped events when CPU maps disagree
perf pmu: Earlier PMU auxtrace initialization
perf stat: Modify the group test
perf evsel: Allow const evsel for certain accesses
perf evsel: Add function to compute group PMU name
perf parse-events: Pass ownership of the group name
perf parse-events: Sort and group parsed events
perf evsel: Remove use_uncore_alias
perf evlist: Remove nr_groups
perf parse-events: Warn when events are regrouped
perf lock contention: Fix compiler builtin detection
tools build: Pass libbpf feature only if libbpf 1.0+
perf build: Remove libbpf pre-1.0 feature tests
perf bpf: Remove pre libbpf 1.0 conditional logic
perf build: Support python/perf.so testing
perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL
perf build: Remove unused HAVE_GLIBC_SUPPORT
perf util: Remove weak sched_getcpu
perf build: Error if jevents won't work and NO_JEVENTS=1 isn't set
perf build: Make binutil libraries opt in
tools build: Add feature test for abi::__cxa_demangle
perf symbol: Add abi::__cxa_demangle C++ demangling support
perf build: Switch libpfm4 to opt-out rather than opt-in
perf build: If libtraceevent isn't present error the build
perf build: Remove redundant NO_NEWT build option
perf build: Error if no libelf and NO_LIBELF isn't set
perf test: Fix "PMU event table sanity" for NO_JEVENTS=1
perf vendor events intel: Update graniterapids events
perf vendor events intel: Update meteorlake events
perf vendor events intel: Update skylake events
perf symbol: Avoid memory leak from abi::__cxa_demangle
perf bpf_counter: Use public cpumap accessors
perf tests: Add common error route for code-reading
perf test: Fix memory leak in symbols
perf symbol: Sort names under write lock
perf build: Add warning for when vmlinux.h generation fails
perf vendor events intel: Broadwell v27 events
perf vendor events intel: Broadwellde v9 events
perf vendor events intel: Broadwellx v20 events
perf vendor events intel: Haswell v33 events
perf vendor events intel: Haswellx v27 events
perf vendor events intel: Jaketown v23 events
perf vendor events intel: Sandybridge v19 events
perf metrics: Add has_pmem literal
perf vendor events intel: Update metrics to detect pmem at runtime
perf annotate: Delete session for debug builds
perf report: Additional config warnings
perf annotate: Add init/exit to annotation_options remove default
perf annotate: Own objdump_path and disassembler_style strings
perf annotate: Allow objdump to be set in perfconfig
perf symbol: Add command line support for addr2line path
perf vendor events: Update Alderlake for E-Core TMA v2.3
perf bench: Avoid NDEBUG warning
perf block-range: Move debug code behind ifndef NDEBUG
perf build: Conditionally define NDEBUG
perf vendor events intel: Update ivybridge and ivytown
tools api: Add io__getline
perf srcline: Simplify addr2line subprocess
perf srcline: Support for llvm-addr2line
perf srcline: Avoid addr2line SIGPIPEs
perf build: Allow C++ demangle without libelf
perf jit: Fix a few memory leaks
perf map: Move map list node into symbol
perf maps: Remove rb_node from struct map
perf maps: Add functions to access maps
perf map: Add accessor for dso
perf map: Add accessor for start and end
perf pmu: Make parser reentrant
perf pmu: Fix a few potential fd leaks
perf pmu: Fewer const casts
perf pmu: Improve name/comments, avoid a memory allocation
perf pmu: Sort and remove duplicates using JSON PMU name
perf vendor events intel: Update free running alderlake events
perf vendor events intel: Update free running icelakex events
perf vendor events intel: Correct knightslanding memory topic
perf vendor events intel: Update free running snowridgex events
perf vendor events intel: Update free running tigerlake events
perf map: Rename map_ip() and unmap_ip()
perf map: Add helper for ->map_ip() and ->unmap_ip()
perf map: Add accessors for ->prot, ->priv and ->flags
perf map: Add accessors for ->pgoff and ->reloc
perf test: Add extra diagnostics to maps test
perf maps: Modify maps_by_name to hold a reference to a map
perf map: Changes to reference counting
perf lock contention: Support pre-5.14 kernels
perf bpf filter: Support pre-5.16 kernels where 'mem_hops' isn't in 'union perf_mem_data_src'
perf test stat+csv_output: Write CSV output to a file
perf stat: Don't write invalid "started on" comment for JSON output
perf test stat+json_output: Write JSON output to a file
perf ui: Move window resize signal functions
perf usage: Move usage strings
perf header: Move perf_version_string declaration
perf version: Use regular verbose flag
perf util: Move input_name to util
perf util: Move perf_guest/host declarations
perf build: Warn for BPF skeletons if endian mismatches
perf evsel: Avoid SEGV if delete is called on NULL
perf bperf: Avoid use after free via unrelated 'struct evsel' anonymous union field
perf vendor events: Update alderlake to v1.20
perf vendor events: Update icelakex to v1.20
perf cpumap: Use perf_cpu_map__nr(cpus) to access cpus->nr
libperf: Make perf_cpu_map__alloc() available as an internal function for tools/perf to use
perf vendor events intel: Update sapphirerapids to v1.12
perf vendor events intel: Add grandridge
perf vendor events intel: Add sierraforest
perf vendor events intel: Fix uncore topics for alderlake
perf vendor events intel: Fix uncore topics for broadwell
perf vendor events intel: Fix uncore topics for broadwellde
perf vendor events intel: Fix uncore topics for broadwellx
perf vendor events intel: Fix uncore topics for cascadelakex
perf vendor events intel: Fix uncore topics for haswell
perf vendor events intel: Fix uncore topics for haswellx
perf vendor events intel: Fix uncore topics for icelake
perf vendor events intel: Fix uncore topics for icelakex
perf vendor events intel: Fix uncore topics for ivybridge
perf vendor events intel: Fix uncore topics for ivytown
perf vendor events intel: Fix uncore topics for jaketown
perf vendor events intel: Fix uncore topics for knightslanding
perf vendor events intel: Fix uncore topics for sandybridge
perf vendor events intel: Fix uncore topics for skylake
perf vendor events intel: Fix uncore topics for skylakex
perf vendor events intel: Fix uncore topics for snowridgex
perf vendor events intel: Fix uncore topics for tigerlake
libperf: Add reference count checking macros
perf cpumap: Use perf_cpu_map__cpu(map, cpu) instead of accessing map->map[cpu] directly
perf cpumap: Add reference count checking
perf namespaces: Add reference count checking
perf maps: Add reference count checking
perf map: Add reference count checking
perf test: Fix maps use after put
libperf rc_check: Enable implicitly with sanitizers
perf stat: Avoid SEGV on counter->name
perf stat: Disable TopdownL1 on hybrid

James Clark (12):
perf cs-etm: Reduce verbosity of ts_source warning
perf cs-etm: Avoid printing warning in cs_etm_is_ete() check
perf vendor events arm64: Add N1 metrics
perf cs-etm: Fix segfault in dso lookup
perf cs-etm: Fix timeless decode mode detection
perf tools: Add util function for overriding user set config values
perf cs-etm: Don't test full_auxtrace because it's always set
perf cs-etm: Validate options after applying them
perf cs-etm: Allow user to override timestamp and contextid settings
perf cs-etm: Use bool type for boolean values
perf cs-etm: Add separate decode paths for timeless and per-thread modes
perf build: Fix unescaped # in perf build-test

Kajol Jain (1):
perf vendor events power9: Remove UTF-8 characters from JSON files

Kan Liang (1):
perf record: Fix "read LOST count failed" msg with sample read

Leo Yan (18):
perf kvm: Refactor overall statistics
perf kvm: Add pointer to 'perf_kvm_stat' in kvm event
perf kvm: Move up metrics helpers
perf kvm: Use subtraction for comparison metrics
perf kvm: Use macro to replace variable 'decode_str_len'
perf kvm: Introduce histograms data structures
perf kvm: Pass argument 'sample' to kvm_alloc_init_event()
perf kvm: Parse address location for samples
perf hist: Add 'kvm_info' field in histograms entry
perf kvm: Add dimensions for KVM event statistics
perf kvm: Use histograms list to replace cached list
perf kvm: Polish sorting key
perf kvm: Support printing attributions for dimensions
perf kvm: Add dimensions for percentages
perf kvm: Add TUI mode for stat report
perf kvm: Update documentation to reflect new changes
perf kvm: Reference count 'struct kvm_info'
perf kvm: Delete histograms entries before exiting

Liam Howlett (1):
tools: Rename __fallthrough to fallthrough

Markus Elfring (1):
perf map: Delete two variable initialisations before null pointer checks in sort__sym_from_cmp()

Mike Leach (3):
perf cs-etm: Move mapping of Trace ID and cpu into helper function
perf cs-etm: Update record event to use new Trace ID protocol
perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet

Namhyung Kim (39):
perf test: Fix offcpu test prev_state check
perf lock contention: Track and show mmap_lock with address
perf lock contention: Track and show siglock with address
perf lock contention: Show per-cpu rq_lock with address
perf lock contention: Show lock type with address
perf bpf filter: Introduce basic BPF filter expression
perf bpf filter: Implement event sample filtering
perf record: Add BPF event filter support
perf record: Record dropped sample count
perf bpf filter: Add 'pid' sample data support
perf bpf filter: Add more weight sample data support
perf bpf filter: Add data_src sample data support
perf bpf filter: Add logical OR operator
perf bpf filter: Show warning for missing sample flags
perf record: Update documentation for BPF filters
perf hist: Improve srcfile sort key performance (really)
perf lock contention: Fix msan issue in lock_contention_read()
perf lock contention: Fix debug stat if no contention
perf lock contention: Show detail failure reason for BPF
perf list: Use relative path for tracepoint scan
perf tools: Fix a asan issue in parse_events_multi_pmu_add()
perf pmu: Add perf_pmu__destroy() function
perf bench: Add pmu-scan benchmark
perf pmu: Use relative path for sysfs scan
perf pmu: Use relative path in perf_pmu__caps_parse()
perf pmu: Use relative path in setup_pmu_alias_list()
perf pmu: Add perf_pmu__{open,scan}_file_at()
perf intel-pt: Use perf_pmu__scan_file_at() if possible
perf lock contention: Simplify parse_lock_type()
perf lock contention: Use -M for --map-nr-entries
perf lock contention: Update default map size to 16384
perf lock contention: Add data failure stat
perf lock contention: Update total/bad stats for hidden entries
perf lock contention: Revise needs_callstack() condition
perf lock contention: Do not try to update if hash map is full
perf lock contention: Fix struct rq lock access
perf lock contention: Rework offset calculation with BPF CO-RE
perf list: Fix memory leaks in print_tracepoint_events()
perf list: Modify the warning message about scandirat(3)

Patrice Duroux (2):
perf tests record_offcpu.sh: Fix redirection of stderr to stdin
perf tests test_bridge_fdb_stress.sh: Fix redirection of stderr to stdin

Ravi Bangoria (6):
tools include UAPI: Sync uapi/linux/perf_event.h with the kernel sources
perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
perf mem: Refactor perf_mem__lvl_scnprintf() to process 'union perf_mem_data_src' more intuitively
perf mem: Increase HISTC_MEM_LVL column size to 39 chars
perf script ibs: Change bit description according to latest AMD PPR ("Processor Programming Reference")

Rob Herring (2):
perf tools: Add support for perf_event_attr::config3
perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store

Roman Lozko (1):
perf scripts intel-pt-events.py: Fix IPC output for Python 2

Sriram Yagnaraman (1):
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it

Thomas Richter (12):
perf vendor events s390: Add common metrics
perf vendor events s390: Add cache metrics for z16
perf list: Add PMU pai_ext event description for IBM z16
perf vendor events s390: Add cache metrics for z15
perf vendor events s390: Add cache metrics for z14
perf vendor events s390: Add cache metrics for z13
perf vendor events s390: Add metric for TLB and cache
perf test: Fix wrong size expectation for 'Setup struct perf_event_attr'
perf vendor events s390: Remove UTF-8 characters from JSON file
perf stat: Suppress warning when using cpum_cf events on s390
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64

Tiezhu Yang (2):
perf bench syscall: Add fork syscall benchmark
tools headers: Remove s390 ptrace.h in check-headers.sh

Yang Jihong (3):
perf ftrace: Make system wide the default target for latency subcommand
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf tracepoint: Fix memory leak in is_valid_tracepoint()

liuwenyu (1):
perf top: Fix rare segfault in thread__comm_len()

tools/arch/x86/include/uapi/asm/unistd_32.h | 4 +-
tools/arch/x86/include/uapi/asm/unistd_64.h | 3 +
tools/build/Makefile.feature | 2 +
tools/build/feature/Makefile | 15 +-
tools/build/feature/test-all.c | 5 +
tools/build/feature/test-cxa-demangle.cpp | 17 +
tools/build/feature/test-libbpf-bpf_map_create.c | 8 -
.../feature/test-libbpf-bpf_object__next_map.c | 8 -
.../feature/test-libbpf-bpf_object__next_program.c | 8 -
tools/build/feature/test-libbpf-bpf_prog_load.c | 9 -
.../feature/test-libbpf-bpf_program__set_insns.c | 8 -
.../test-libbpf-btf__load_from_kernel_by_id.c | 8 -
tools/build/feature/test-libbpf-btf__raw_data.c | 8 -
tools/build/feature/test-libbpf.c | 4 +
tools/build/feature/test-scandirat.c | 13 +
tools/include/linux/compiler-gcc.h | 6 +-
tools/include/linux/compiler.h | 4 -
tools/include/linux/coresight-pmu.h | 47 +-
tools/include/uapi/linux/perf_event.h | 3 +-
tools/lib/api/io.h | 45 +
tools/lib/perf/Makefile | 2 +-
tools/lib/perf/cpumap.c | 94 +-
tools/lib/perf/evlist.c | 31 +-
tools/lib/perf/include/internal/cpumap.h | 10 +-
tools/lib/perf/include/internal/evlist.h | 1 -
tools/lib/perf/include/internal/rc_check.h | 102 +
tools/lib/perf/include/perf/event.h | 2 +
tools/lib/perf/include/perf/evlist.h | 1 +
tools/perf/Build | 2 +-
tools/perf/Documentation/perf-annotate.txt | 3 +
tools/perf/Documentation/perf-config.txt | 8 +-
tools/perf/Documentation/perf-kvm.txt | 9 +-
tools/perf/Documentation/perf-lock.txt | 4 +-
tools/perf/Documentation/perf-record.txt | 60 +-
tools/perf/Documentation/perf-report.txt | 4 +
tools/perf/Documentation/perf-stat.txt | 27 +-
tools/perf/Documentation/perf-top.txt | 10 +
tools/perf/Documentation/topdown.txt | 70 +-
tools/perf/Makefile.config | 126 +-
tools/perf/Makefile.perf | 29 +-
tools/perf/arch/arm/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/arm/util/cs-etm.c | 264 +-
tools/perf/arch/arm/util/pmu.c | 2 +
tools/perf/arch/arm64/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/arm64/util/arm-spe.c | 28 +-
tools/perf/arch/arm64/util/kvm-stat.c | 5 +-
tools/perf/arch/common.c | 4 +-
tools/perf/arch/common.h | 2 +-
tools/perf/arch/powerpc/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/powerpc/util/header.c | 2 +-
tools/perf/arch/powerpc/util/kvm-stat.c | 7 +-
tools/perf/arch/powerpc/util/skip-callchain-idx.c | 4 +-
tools/perf/arch/powerpc/util/sym-handling.c | 4 +-
tools/perf/arch/s390/annotate/instructions.c | 2 +-
tools/perf/arch/s390/util/Build | 1 +
tools/perf/arch/s390/util/kvm-stat.c | 1 -
tools/perf/arch/s390/util/pmu.c | 23 +
tools/perf/arch/x86/tests/dwarf-unwind.c | 2 +-
tools/perf/arch/x86/tests/insn-x86.c | 4 +
tools/perf/arch/x86/util/auxtrace.c | 4 -
tools/perf/arch/x86/util/event.c | 13 +-
tools/perf/arch/x86/util/evlist.c | 45 +-
tools/perf/arch/x86/util/intel-pt.c | 72 +-
tools/perf/arch/x86/util/iostat.c | 7 +-
tools/perf/arch/x86/util/kvm-stat.c | 15 +-
tools/perf/arch/x86/util/pmu.c | 21 +-
tools/perf/arch/x86/util/topdown.c | 78 +-
tools/perf/arch/x86/util/topdown.h | 1 -
tools/perf/bench/Build | 1 +
tools/perf/bench/bench.h | 2 +
tools/perf/bench/find-bit-bench.c | 8 +-
tools/perf/bench/inject-buildid.c | 3 +-
tools/perf/bench/numa.c | 2 +-
tools/perf/bench/pmu-scan.c | 184 +
tools/perf/bench/syscall.c | 35 +
tools/perf/builtin-annotate.c | 60 +-
tools/perf/builtin-bench.c | 2 +
tools/perf/builtin-buildid-list.c | 6 +-
tools/perf/builtin-c2c.c | 20 +-
tools/perf/builtin-daemon.c | 14 +-
tools/perf/builtin-data.c | 2 +-
tools/perf/builtin-diff.c | 6 +-
tools/perf/builtin-evlist.c | 2 +-
tools/perf/builtin-ftrace.c | 16 +-
tools/perf/builtin-help.c | 1 +
tools/perf/builtin-inject.c | 20 +-
tools/perf/builtin-kallsyms.c | 6 +-
tools/perf/builtin-kmem.c | 6 +-
tools/perf/builtin-kvm.c | 870 +-
tools/perf/builtin-kwork.c | 2 +-
tools/perf/builtin-list.c | 21 +-
tools/perf/builtin-lock.c | 144 +-
tools/perf/builtin-mem.c | 12 +-
tools/perf/builtin-probe.c | 2 +-
tools/perf/builtin-record.c | 56 +-
tools/perf/builtin-report.c | 63 +-
tools/perf/builtin-sched.c | 17 +-
tools/perf/builtin-script.c | 39 +-
tools/perf/builtin-stat.c | 275 +-
tools/perf/builtin-timechart.c | 2 +-
tools/perf/builtin-top.c | 67 +-
tools/perf/builtin-trace.c | 18 +-
tools/perf/builtin-version.c | 9 +-
tools/perf/builtin.h | 3 -
tools/perf/check-headers.sh | 1 -
tools/perf/perf.c | 27 +-
tools/perf/perf.h | 9 -
.../arm/{cortex-a76-n1 => cortex-a76}/branch.json | 0
.../arm/{cortex-a76-n1 => cortex-a76}/bus.json | 0
.../arm/{cortex-a76-n1 => cortex-a76}/cache.json | 0
.../{cortex-a76-n1 => cortex-a76}/exception.json | 0
.../{cortex-a76-n1 => cortex-a76}/instruction.json | 0
.../arm/{cortex-a76-n1 => cortex-a76}/memory.json | 0
.../{cortex-a76-n1 => cortex-a76}/pipeline.json | 0
.../pmu-events/arch/arm64/arm/neoverse-n1/bus.json | 18 +
.../arch/arm64/arm/neoverse-n1/exception.json | 62 +
.../arch/arm64/arm/neoverse-n1/general.json | 6 +
.../arch/arm64/arm/neoverse-n1/l1d_cache.json | 50 +
.../arch/arm64/arm/neoverse-n1/l1i_cache.json | 10 +
.../arch/arm64/arm/neoverse-n1/l2_cache.json | 46 +
.../arch/arm64/arm/neoverse-n1/l3_cache.json | 18 +
.../arch/arm64/arm/neoverse-n1/ll_cache.json | 10 +
.../arch/arm64/arm/neoverse-n1/memory.json | 22 +
.../arch/arm64/arm/neoverse-n1/metrics.json | 219 +
.../arch/arm64/arm/neoverse-n1/retired.json | 26 +
.../pmu-events/arch/arm64/arm/neoverse-n1/spe.json | 18 +
.../arch/arm64/arm/neoverse-n1/spec_operation.json | 102 +
.../arch/arm64/arm/neoverse-n1/stall.json | 10 +
.../pmu-events/arch/arm64/arm/neoverse-n1/tlb.json | 66 +
tools/perf/pmu-events/arch/arm64/mapfile.csv | 4 +-
.../perf/pmu-events/arch/powerpc/power9/other.json | 4 +-
.../pmu-events/arch/powerpc/power9/pipeline.json | 2 +-
.../pmu-events/arch/s390/cf_z13/transaction.json | 70 +
.../pmu-events/arch/s390/cf_z14/transaction.json | 65 +
.../pmu-events/arch/s390/cf_z15/transaction.json | 65 +
.../perf/pmu-events/arch/s390/cf_z16/extended.json | 10 +-
.../perf/pmu-events/arch/s390/cf_z16/pai_ext.json | 178 +
.../pmu-events/arch/s390/cf_z16/transaction.json | 65 +
.../pmu-events/arch/x86/alderlake/adl-metrics.json | 3230 +-
.../perf/pmu-events/arch/x86/alderlake/cache.json | 36 +-
.../arch/x86/alderlake/floating-point.json | 27 +
.../pmu-events/arch/x86/alderlake/frontend.json | 9 +
.../perf/pmu-events/arch/x86/alderlake/memory.json | 11 +-
.../perf/pmu-events/arch/x86/alderlake/other.json | 3 +-
.../pmu-events/arch/x86/alderlake/pipeline.json | 28 +-
.../arch/x86/alderlake/uncore-interconnect.json | 90 +
.../arch/x86/alderlake/uncore-memory.json | 16 +-
.../arch/x86/alderlake/uncore-other.json | 64 -
.../arch/x86/alderlaken/adln-metrics.json | 825 +-
.../pmu-events/arch/x86/alderlaken/memory.json | 7 +
.../arch/x86/alderlaken/uncore-interconnect.json | 26 +
.../arch/x86/alderlaken/uncore-memory.json | 16 +-
.../arch/x86/alderlaken/uncore-other.json | 24 -
.../pmu-events/arch/x86/broadwell/bdw-metrics.json | 1439 +-
.../perf/pmu-events/arch/x86/broadwell/cache.json | 296 +-
.../arch/x86/broadwell/floating-point.json | 7 +
.../pmu-events/arch/x86/broadwell/frontend.json | 18 +-
.../perf/pmu-events/arch/x86/broadwell/memory.json | 248 +-
.../pmu-events/arch/x86/broadwell/pipeline.json | 22 +-
.../arch/x86/broadwell/uncore-cache.json | 30 +-
.../arch/x86/broadwell/uncore-interconnect.json | 61 +
.../arch/x86/broadwell/uncore-other.json | 59 -
.../arch/x86/broadwellde/bdwde-metrics.json | 1405 +-
.../pmu-events/arch/x86/broadwellde/cache.json | 105 +-
.../arch/x86/broadwellde/floating-point.json | 45 +-
.../pmu-events/arch/x86/broadwellde/frontend.json | 18 +-
.../pmu-events/arch/x86/broadwellde/memory.json | 64 +-
.../pmu-events/arch/x86/broadwellde/pipeline.json | 79 +-
.../arch/x86/broadwellde/uncore-cache.json | 396 +-
.../arch/x86/broadwellde/uncore-interconnect.json | 614 +
.../{uncore-other.json => uncore-io.json} | 595 +-
.../arch/x86/broadwellde/uncore-memory.json | 256 +-
.../arch/x86/broadwellde/uncore-power.json | 10 +-
.../arch/x86/broadwellx/bdx-metrics.json | 1626 +-
.../perf/pmu-events/arch/x86/broadwellx/cache.json | 16 +-
.../pmu-events/arch/x86/broadwellx/frontend.json | 18 +-
.../pmu-events/arch/x86/broadwellx/pipeline.json | 20 +-
.../arch/x86/broadwellx/uncore-cache.json | 456 +-
.../arch/x86/broadwellx/uncore-interconnect.json | 4305 ++-
.../pmu-events/arch/x86/broadwellx/uncore-io.json | 555 +
.../arch/x86/broadwellx/uncore-memory.json | 522 +-
.../arch/x86/broadwellx/uncore-other.json | 3250 --
.../arch/x86/broadwellx/uncore-power.json | 10 +-
.../pmu-events/arch/x86/cascadelakex/cache.json | 24 +-
.../arch/x86/cascadelakex/clx-metrics.json | 2204 +-
.../pmu-events/arch/x86/cascadelakex/frontend.json | 8 +-
.../pmu-events/arch/x86/cascadelakex/pipeline.json | 16 +
.../arch/x86/cascadelakex/uncore-cache.json | 10764 ++++++
.../arch/x86/cascadelakex/uncore-interconnect.json | 11334 +++++++
.../arch/x86/cascadelakex/uncore-io.json | 4250 +++
.../arch/x86/cascadelakex/uncore-memory.json | 18 +-
.../arch/x86/cascadelakex/uncore-other.json | 26336 ---------------
.../arch/x86/cascadelakex/uncore-power.json | 8 +-
.../perf/pmu-events/arch/x86/grandridge/cache.json | 155 +
.../pmu-events/arch/x86/grandridge/frontend.json | 16 +
.../pmu-events/arch/x86/grandridge/memory.json | 20 +
.../perf/pmu-events/arch/x86/grandridge/other.json | 20 +
.../pmu-events/arch/x86/grandridge/pipeline.json | 96 +
.../arch/x86/grandridge/virtual-memory.json | 24 +
.../pmu-events/arch/x86/graniterapids/cache.json | 54 +
.../arch/x86/graniterapids/frontend.json | 10 +
.../pmu-events/arch/x86/graniterapids/memory.json | 174 +
.../pmu-events/arch/x86/graniterapids/other.json | 29 +
.../arch/x86/graniterapids/pipeline.json | 102 +
.../arch/x86/graniterapids/virtual-memory.json | 26 +
tools/perf/pmu-events/arch/x86/haswell/cache.json | 38 +-
.../pmu-events/arch/x86/haswell/hsw-metrics.json | 1220 +-
tools/perf/pmu-events/arch/x86/haswell/memory.json | 38 +-
.../perf/pmu-events/arch/x86/haswell/pipeline.json | 8 +
.../pmu-events/arch/x86/haswell/uncore-cache.json | 50 +-
.../arch/x86/haswell/uncore-interconnect.json | 52 +
.../pmu-events/arch/x86/haswell/uncore-other.json | 50 -
tools/perf/pmu-events/arch/x86/haswellx/cache.json | 2 +-
.../pmu-events/arch/x86/haswellx/hsx-metrics.json | 1397 +-
.../pmu-events/arch/x86/haswellx/pipeline.json | 8 +
.../pmu-events/arch/x86/haswellx/uncore-cache.json | 376 +-
.../arch/x86/haswellx/uncore-interconnect.json | 4242 ++-
.../pmu-events/arch/x86/haswellx/uncore-io.json | 528 +
.../pmu-events/arch/x86/haswellx/uncore-other.json | 3160 --
tools/perf/pmu-events/arch/x86/icelake/cache.json | 16 +
.../arch/x86/icelake/floating-point.json | 31 +
.../pmu-events/arch/x86/icelake/icl-metrics.json | 1932 +-
.../perf/pmu-events/arch/x86/icelake/pipeline.json | 23 +-
.../arch/x86/icelake/uncore-interconnect.json | 74 +
.../pmu-events/arch/x86/icelake/uncore-other.json | 16 -
tools/perf/pmu-events/arch/x86/icelakex/cache.json | 8 +
.../arch/x86/icelakex/floating-point.json | 31 +
.../pmu-events/arch/x86/icelakex/icx-metrics.json | 2153 +-
.../pmu-events/arch/x86/icelakex/pipeline.json | 10 +
.../pmu-events/arch/x86/icelakex/uncore-cache.json | 9860 ++++++
.../arch/x86/icelakex/uncore-interconnect.json | 14571 ++++++++
.../pmu-events/arch/x86/icelakex/uncore-io.json | 9270 +++++
.../arch/x86/icelakex/uncore-memory.json | 6 +-
.../pmu-events/arch/x86/icelakex/uncore-other.json | 33727 -------------------
.../pmu-events/arch/x86/ivybridge/ivb-metrics.json | 1270 +-
.../pmu-events/arch/x86/ivybridge/pipeline.json | 8 +
.../arch/x86/ivybridge/uncore-cache.json | 50 +-
...{uncore-other.json => uncore-interconnect.json} | 0
.../pmu-events/arch/x86/ivytown/ivt-metrics.json | 1311 +-
.../perf/pmu-events/arch/x86/ivytown/pipeline.json | 8 +
.../pmu-events/arch/x86/ivytown/uncore-cache.json | 314 +-
.../arch/x86/ivytown/uncore-interconnect.json | 2025 +-
.../pmu-events/arch/x86/ivytown/uncore-io.json | 549 +
.../pmu-events/arch/x86/ivytown/uncore-other.json | 2174 --
tools/perf/pmu-events/arch/x86/jaketown/cache.json | 6 +-
.../arch/x86/jaketown/floating-point.json | 2 +-
.../pmu-events/arch/x86/jaketown/frontend.json | 12 +-
.../pmu-events/arch/x86/jaketown/jkt-metrics.json | 602 +-
.../pmu-events/arch/x86/jaketown/pipeline.json | 10 +-
.../pmu-events/arch/x86/jaketown/uncore-cache.json | 216 +-
.../arch/x86/jaketown/uncore-interconnect.json | 1311 +-
.../pmu-events/arch/x86/jaketown/uncore-io.json | 324 +
.../arch/x86/jaketown/uncore-memory.json | 4 +-
.../pmu-events/arch/x86/jaketown/uncore-other.json | 1393 -
.../pmu-events/arch/x86/jaketown/uncore-power.json | 8 +-
.../pmu-events/arch/x86/knightslanding/cache.json | 94 +-
.../arch/x86/knightslanding/pipeline.json | 8 +-
.../{uncore-other.json => uncore-cache.json} | 304 +-
.../arch/x86/knightslanding/uncore-io.json | 194 +
.../arch/x86/knightslanding/uncore-memory.json | 106 +
tools/perf/pmu-events/arch/x86/mapfile.csv | 47 +-
.../perf/pmu-events/arch/x86/meteorlake/cache.json | 8 +
.../pmu-events/arch/x86/meteorlake/frontend.json | 9 +
.../pmu-events/arch/x86/meteorlake/memory.json | 13 +-
.../perf/pmu-events/arch/x86/meteorlake/other.json | 4 +-
.../pmu-events/arch/x86/meteorlake/pipeline.json | 36 +-
.../arch/x86/meteorlake/virtual-memory.json | 4 +
.../pmu-events/arch/x86/sandybridge/cache.json | 8 +-
.../arch/x86/sandybridge/floating-point.json | 2 +-
.../pmu-events/arch/x86/sandybridge/frontend.json | 12 +-
.../pmu-events/arch/x86/sandybridge/pipeline.json | 10 +-
.../arch/x86/sandybridge/snb-metrics.json | 601 +-
.../arch/x86/sandybridge/uncore-cache.json | 50 +-
...{uncore-other.json => uncore-interconnect.json} | 0
.../pmu-events/arch/x86/sapphirerapids/cache.json | 24 +-
.../arch/x86/sapphirerapids/floating-point.json | 32 +
.../arch/x86/sapphirerapids/frontend.json | 8 +
.../pmu-events/arch/x86/sapphirerapids/other.json | 3 +-
.../arch/x86/sapphirerapids/pipeline.json | 23 +-
.../arch/x86/sapphirerapids/spr-metrics.json | 2293 +-
.../arch/x86/sapphirerapids/uncore-cache.json | 5644 ++++
.../arch/x86/sapphirerapids/uncore-cxl.json | 450 +
.../x86/sapphirerapids/uncore-interconnect.json | 6199 ++++
.../arch/x86/sapphirerapids/uncore-io.json | 3651 ++
.../arch/x86/sapphirerapids/uncore-memory.json | 3283 +-
.../arch/x86/sapphirerapids/uncore-other.json | 4465 ---
.../arch/x86/sapphirerapids/uncore-power.json | 107 +
.../pmu-events/arch/x86/sierraforest/cache.json | 155 +
.../pmu-events/arch/x86/sierraforest/frontend.json | 16 +
.../pmu-events/arch/x86/sierraforest/memory.json | 20 +
.../pmu-events/arch/x86/sierraforest/other.json | 20 +
.../pmu-events/arch/x86/sierraforest/pipeline.json | 96 +
.../arch/x86/sierraforest/virtual-memory.json | 24 +
.../pmu-events/arch/x86/silvermont/frontend.json | 2 +-
.../pmu-events/arch/x86/silvermont/pipeline.json | 2 +-
tools/perf/pmu-events/arch/x86/skylake/cache.json | 17 +-
.../arch/x86/skylake/floating-point.json | 15 +
.../perf/pmu-events/arch/x86/skylake/frontend.json | 8 +-
tools/perf/pmu-events/arch/x86/skylake/other.json | 1 +
.../perf/pmu-events/arch/x86/skylake/pipeline.json | 26 +
.../pmu-events/arch/x86/skylake/skl-metrics.json | 1877 +-
.../pmu-events/arch/x86/skylake/uncore-cache.json | 28 +-
.../arch/x86/skylake/uncore-interconnect.json | 67 +
.../pmu-events/arch/x86/skylake/uncore-other.json | 64 -
tools/perf/pmu-events/arch/x86/skylakex/cache.json | 8 +-
.../pmu-events/arch/x86/skylakex/frontend.json | 8 +-
.../pmu-events/arch/x86/skylakex/pipeline.json | 16 +
.../pmu-events/arch/x86/skylakex/skx-metrics.json | 2097 +-
.../pmu-events/arch/x86/skylakex/uncore-cache.json | 10649 ++++++
.../arch/x86/skylakex/uncore-interconnect.json | 11248 +++++++
.../pmu-events/arch/x86/skylakex/uncore-io.json | 4250 +++
.../arch/x86/skylakex/uncore-memory.json | 2 +-
.../pmu-events/arch/x86/skylakex/uncore-other.json | 26135 --------------
.../pmu-events/arch/x86/skylakex/uncore-power.json | 6 +-
.../arch/x86/snowridgex/uncore-cache.json | 7100 ++++
.../arch/x86/snowridgex/uncore-interconnect.json | 6016 ++++
.../pmu-events/arch/x86/snowridgex/uncore-io.json | 8944 +++++
.../arch/x86/snowridgex/uncore-memory.json | 4 +-
.../arch/x86/snowridgex/uncore-other.json | 22094 ------------
.../arch/x86/tigerlake/floating-point.json | 31 +
.../pmu-events/arch/x86/tigerlake/pipeline.json | 18 +
.../pmu-events/arch/x86/tigerlake/tgl-metrics.json | 1942 +-
.../arch/x86/tigerlake/uncore-interconnect.json | 90 +
.../arch/x86/tigerlake/uncore-memory.json | 50 +
.../arch/x86/tigerlake/uncore-other.json | 100 -
.../pmu-events/arch/x86/westmereep-dp/cache.json | 2 +-
.../arch/x86/westmereep-dp/virtual-memory.json | 2 +-
tools/perf/pmu-events/empty-pmu-events.c | 6 +-
tools/perf/pmu-events/jevents.py | 61 +-
tools/perf/pmu-events/metric.py | 8 +-
tools/perf/pmu-events/pmu-events.h | 35 +-
tools/perf/scripts/Build | 4 +-
tools/perf/scripts/python/Perf-Trace-Util/Build | 2 +-
.../perf/scripts/python/Perf-Trace-Util/Context.c | 17 +-
tools/perf/scripts/python/intel-pt-events.py | 8 +-
tools/perf/scripts/python/net_dropmonitor.py | 4 +-
tools/perf/scripts/python/netdev-times.py | 6 +-
tools/perf/scripts/python/task-analyzer.py | 2 +-
tools/perf/tests/api-io.c | 39 +-
tools/perf/tests/attr/base-record | 2 +-
tools/perf/tests/attr/base-stat | 2 +-
tools/perf/tests/attr/system-wide-dummy | 2 +-
tools/perf/tests/bpf.c | 1 -
tools/perf/tests/builtin-test.c | 4 +-
tools/perf/tests/code-reading.c | 76 +-
tools/perf/tests/cpumap.c | 4 +-
tools/perf/tests/expand-cgroup.c | 5 +-
tools/perf/tests/expr.c | 7 +-
tools/perf/tests/hists_common.c | 8 +-
tools/perf/tests/hists_cumulate.c | 14 +-
tools/perf/tests/hists_filter.c | 14 +-
tools/perf/tests/hists_link.c | 22 +-
tools/perf/tests/hists_output.c | 12 +-
tools/perf/tests/make | 28 +-
tools/perf/tests/maps.c | 69 +-
tools/perf/tests/mmap-thread-lookup.c | 3 +-
tools/perf/tests/parse-events.c | 49 +-
tools/perf/tests/parse-metric.c | 23 +-
tools/perf/tests/pfm.c | 12 +-
tools/perf/tests/pmu-events.c | 53 +-
tools/perf/tests/pmu.c | 9 +-
.../perf/tests/shell/lib/perf_json_output_lint.py | 3 +-
.../tests/shell/record+probe_libc_inet_pton.sh | 3 -
tools/perf/tests/shell/record_offcpu.sh | 4 +-
tools/perf/tests/shell/stat+csv_output.sh | 58 +-
tools/perf/tests/shell/stat+json_output.sh | 48 +-
tools/perf/tests/shell/test_arm_coresight.sh | 24 +
tools/perf/tests/symbols.c | 7 +-
tools/perf/tests/thread-maps-share.c | 28 +-
tools/perf/tests/vmlinux-kallsyms.c | 54 +-
tools/perf/ui/browsers/annotate.c | 9 +-
tools/perf/ui/browsers/hists.c | 22 +-
tools/perf/ui/browsers/map.c | 4 +-
tools/perf/ui/gtk/annotate.c | 11 +-
tools/perf/ui/gtk/browser.c | 2 +-
tools/perf/ui/gtk/gtk.h | 2 +
tools/perf/ui/gtk/helpline.c | 2 +-
tools/perf/ui/gtk/hists.c | 2 +-
tools/perf/ui/hist.c | 2 +-
tools/perf/ui/setup.c | 19 +
tools/perf/ui/tui/setup.c | 1 -
tools/perf/ui/ui.h | 3 +
tools/perf/util/Build | 19 +-
tools/perf/util/amd-sample-raw.c | 14 +-
tools/perf/util/annotate.c | 85 +-
tools/perf/util/annotate.h | 9 +-
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 30 +-
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 47 +-
.../util/arm-spe-decoder/arm-spe-pkt-decoder.c | 9 +
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 3 +
tools/perf/util/arm-spe.c | 28 +-
tools/perf/util/auxtrace.c | 7 +-
tools/perf/util/block-info.c | 4 +-
tools/perf/util/block-range.c | 6 +-
tools/perf/util/bpf-event.c | 76 +-
tools/perf/util/bpf-filter.c | 197 +
tools/perf/util/bpf-filter.h | 49 +
tools/perf/util/bpf-filter.l | 159 +
tools/perf/util/bpf-filter.y | 78 +
tools/perf/util/bpf-loader.c | 18 -
tools/perf/util/bpf_counter.c | 28 +-
tools/perf/util/bpf_lock_contention.c | 54 +-
tools/perf/util/bpf_skel/.gitignore | 3 +-
tools/perf/util/bpf_skel/lock_contention.bpf.c | 136 +-
tools/perf/util/bpf_skel/lock_data.h | 17 +
tools/perf/util/bpf_skel/sample-filter.h | 27 +
tools/perf/util/bpf_skel/sample_filter.bpf.c | 196 +
tools/perf/util/build-id.c | 2 +-
tools/perf/util/callchain.c | 28 +-
tools/perf/util/cloexec.c | 13 -
tools/perf/util/cpumap.c | 43 +-
tools/perf/util/cpumap.h | 3 +
tools/perf/util/cputopo.c | 14 +
tools/perf/util/cputopo.h | 5 +
tools/perf/util/cs-etm-base.c | 3 +-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +
tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 8 +-
tools/perf/util/cs-etm.c | 568 +-
tools/perf/util/cs-etm.h | 20 +-
tools/perf/util/data-convert-json.c | 10 +-
tools/perf/util/db-export.c | 16 +-
tools/perf/util/demangle-cxx.cpp | 49 +
tools/perf/util/demangle-cxx.h | 16 +
tools/perf/util/dlfilter.c | 28 +-
tools/perf/util/dso.c | 13 +-
tools/perf/util/dso.h | 2 +
tools/perf/util/dsos.c | 3 +-
tools/perf/util/env.c | 2 +-
tools/perf/util/event.c | 29 +-
tools/perf/util/event.h | 3 +-
tools/perf/util/evlist.c | 42 +-
tools/perf/util/evlist.h | 8 +-
tools/perf/util/evsel.c | 46 +-
tools/perf/util/evsel.h | 21 +-
tools/perf/util/evsel_fprintf.c | 13 +-
tools/perf/util/expr.c | 49 +-
tools/perf/util/expr.y | 12 +-
tools/perf/util/ftrace.h | 1 -
tools/perf/util/genelf_debug.c | 46 +-
tools/perf/util/header.c | 3 +-
tools/perf/util/header.h | 2 +
tools/perf/util/hist.c | 49 +-
tools/perf/util/hist.h | 4 +
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 8 +-
.../util/intel-pt-decoder/intel-pt-insn-decoder.c | 18 +
.../util/intel-pt-decoder/intel-pt-insn-decoder.h | 2 +
.../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 +-
tools/perf/util/intel-pt.c | 63 +-
tools/perf/util/jitdump.c | 7 +-
tools/perf/util/kvm-stat.h | 73 +-
tools/perf/util/lock-contention.h | 10 +-
tools/perf/util/machine.c | 257 +-
tools/perf/util/map.c | 219 +-
tools/perf/util/map.h | 144 +-
tools/perf/util/maps.c | 317 +-
tools/perf/util/maps.h | 72 +-
tools/perf/util/mem-events.c | 90 +-
tools/perf/util/metricgroup.c | 199 +-
tools/perf/util/metricgroup.h | 5 +-
tools/perf/util/namespaces.c | 141 +-
tools/perf/util/namespaces.h | 3 +-
tools/perf/util/ordered-events.c | 2 +-
tools/perf/util/parse-events.c | 295 +-
tools/perf/util/parse-events.h | 15 +-
tools/perf/util/parse-events.l | 1 +
tools/perf/util/parse-events.y | 28 +-
tools/perf/util/pfm.c | 1 -
tools/perf/util/pmu.c | 461 +-
tools/perf/util/pmu.h | 23 +-
tools/perf/util/pmu.l | 17 +-
tools/perf/util/pmu.y | 5 +-
tools/perf/util/print-events.c | 50 +-
tools/perf/util/print-events.h | 1 +
tools/perf/util/probe-event.c | 62 +-
tools/perf/util/probe-finder.c | 2 +-
tools/perf/util/python.c | 21 +-
tools/perf/util/record.h | 1 -
tools/perf/util/sample.h | 13 +
tools/perf/util/scripting-engines/Build | 2 +-
.../perf/util/scripting-engines/trace-event-perl.c | 10 +-
.../util/scripting-engines/trace-event-python.c | 101 +-
tools/perf/util/session.c | 5 +-
tools/perf/util/smt.c | 11 +-
tools/perf/util/smt.h | 12 +-
tools/perf/util/sort.c | 126 +-
tools/perf/util/sort.h | 3 +
tools/perf/util/srcline.c | 183 +-
tools/perf/util/stat-display.c | 119 +-
tools/perf/util/stat-shadow.c | 1287 +-
tools/perf/util/stat.c | 74 -
tools/perf/util/stat.h | 96 +-
tools/perf/util/strfilter.c | 2 +-
tools/perf/util/string.c | 2 +-
tools/perf/util/symbol-elf.c | 94 +-
tools/perf/util/symbol.c | 314 +-
tools/perf/util/symbol_conf.h | 2 +-
tools/perf/util/symbol_fprintf.c | 2 +-
tools/perf/util/synthetic-events.c | 36 +-
tools/perf/util/thread-stack.c | 4 +-
tools/perf/util/thread.c | 69 +-
tools/perf/util/top.c | 2 +-
tools/perf/util/topdown.c | 68 +-
tools/perf/util/topdown.h | 11 +-
tools/perf/util/trace-event-scripting.c | 9 +-
tools/perf/util/tracepoint.c | 1 +
tools/perf/util/unwind-libdw.c | 20 +-
tools/perf/util/unwind-libunwind-local.c | 68 +-
tools/perf/util/unwind-libunwind.c | 39 +-
tools/perf/util/usage.c | 6 +
tools/perf/util/util.c | 21 +-
tools/perf/util/util.h | 8 +
tools/perf/util/vdso.c | 7 +-
.../drivers/net/dsa/test_bridge_fdb_stress.sh | 2 +-
513 files changed, 169251 insertions(+), 146289 deletions(-)
create mode 100644 tools/build/feature/test-cxa-demangle.cpp
delete mode 100644 tools/build/feature/test-libbpf-bpf_map_create.c
delete mode 100644 tools/build/feature/test-libbpf-bpf_object__next_map.c
delete mode 100644 tools/build/feature/test-libbpf-bpf_object__next_program.c
delete mode 100644 tools/build/feature/test-libbpf-bpf_prog_load.c
delete mode 100644 tools/build/feature/test-libbpf-bpf_program__set_insns.c
delete mode 100644 tools/build/feature/test-libbpf-btf__load_from_kernel_by_id.c
delete mode 100644 tools/build/feature/test-libbpf-btf__raw_data.c
create mode 100644 tools/build/feature/test-scandirat.c
create mode 100644 tools/lib/perf/include/internal/rc_check.h
create mode 100644 tools/perf/arch/s390/util/pmu.c
create mode 100644 tools/perf/bench/pmu-scan.c
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/branch.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/bus.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/cache.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/exception.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/instruction.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/memory.json (100%)
rename tools/perf/pmu-events/arch/arm64/arm/{cortex-a76-n1 => cortex-a76}/pipeline.json (100%)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/general.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/ll_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/spe.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n1/tlb.json
create mode 100644 tools/perf/pmu-events/arch/s390/cf_z16/pai_ext.json
create mode 100644 tools/perf/pmu-events/arch/x86/alderlake/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/alderlaken/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
rename tools/perf/pmu-events/arch/x86/broadwellde/{uncore-other.json => uncore-io.json} (54%)
create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/grandridge/virtual-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/graniterapids/virtual-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/haswell/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/haswellx/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelake/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/icelakex/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/icelakex/uncore-other.json
rename tools/perf/pmu-events/arch/x86/ivybridge/{uncore-other.json => uncore-interconnect.json} (100%)
create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/ivytown/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/jaketown/uncore-other.json
rename tools/perf/pmu-events/arch/x86/knightslanding/{uncore-other.json => uncore-cache.json} (90%)
create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-io.json
create mode 100644 tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json
rename tools/perf/pmu-events/arch/x86/sandybridge/{uncore-other.json => uncore-interconnect.json} (100%)
create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cxl.json
create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/frontend.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/other.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/sierraforest/virtual-memory.json
create mode 100644 tools/perf/pmu-events/arch/x86/skylake/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/skylakex/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-io.json
delete mode 100644 tools/perf/pmu-events/arch/x86/snowridgex/uncore-other.json
create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-interconnect.json
create mode 100644 tools/perf/pmu-events/arch/x86/tigerlake/uncore-memory.json
create mode 100644 tools/perf/util/bpf-filter.c
create mode 100644 tools/perf/util/bpf-filter.h
create mode 100644 tools/perf/util/bpf-filter.l
create mode 100644 tools/perf/util/bpf-filter.y
create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
create mode 100644 tools/perf/util/demangle-cxx.cpp
create mode 100644 tools/perf/util/demangle-cxx.h


2023-05-04 03:14:25

by Linus Torvalds

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 3, 2023 at 2:18 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Please consider pulling,

I did consider it, but the end result doesn't even build, so I unpulled again..

I get some libbpf error, and I'm just not interested in trying to
debug it. This has clearly not been tested well enough to be merged.

libbpf: failed to find '.BTF' ELF section in /home/torvalds/v2.6/linux/vmlinux
Error: failed to load BTF from /home/torvalds/v2.6/linux/vmlinux: No data available
Failure to generate vmlinux.h needed for the recommended BPF skeleton support.
To disable this use the build option NO_BPF_SKEL=1.
Alternatively point at a pre-generated vmlinux.h with VMLINUX_H=<path>.

the build system assumptions are clearly completely broken.

Linus

2023-05-04 03:29:36

by Linus Torvalds

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 3, 2023 at 8:00 PM Linus Torvalds
<[email protected]> wrote:
>
> I did consider it, but the end result doesn't even build, so I unpulled again..
>
> I get some libbpf error, and I'm just not interested in trying to
> debug it. This has clearly not been tested well enough to be merged.

Side note: it's not even about testing.

The error message makes it clear that this is garbage and should never
be merged even if it were to compile.

There is not a way in hell that it is correct that a 'perf' tool build
should ever even look at the vmlinux binary to build.

The fact that it does shows that something is seriously wrong in
perf-tool land, and I will not be touching any pulls until that
fundamental mistake is entirely gone.

The vmlinux image that is present in my tree (ie
/home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
config. And the fact that the perf tool even looks at it is seriously
broken.

Whatever you are doing - stop it right now.

Linus

2023-05-04 05:56:44

by Ian Rogers

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 3, 2023 at 8:12 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, May 3, 2023 at 8:00 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I did consider it, but the end result doesn't even build, so I unpulled again..
> >
> > I get some libbpf error, and I'm just not interested in trying to
> > debug it. This has clearly not been tested well enough to be merged.
>
> Side note: it's not even about testing.
>
> The error message makes it clear that this is garbage and should never
> be merged even if it were to compile.
>
> There is not a way in hell that it is correct that a 'perf' tool build
> should ever even look at the vmlinux binary to build.
>
> The fact that it does shows that something is seriously wrong in
> perf-tool land, and I will not be touching any pulls until that
> fundamental mistake is entirely gone.
>
> The vmlinux image that is present in my tree (ie
> /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> config. And the fact that the perf tool even looks at it is seriously
> broken.
>
> Whatever you are doing - stop it right now.
>
> Linus

I think the error you gave makes it pretty clear what is going on and
Arnaldo's e-mail explains the motivation. Perhaps we can check a
vmlinux.h into the perf tree so that we don't default to generating
it. This would avoid the binary dependency but we may need different
flavors for different architectures because of structs like pt_regs.

Thanks,
Ian

2023-05-04 11:32:12

by Arnaldo Carvalho de Melo

Subject: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Wed, May 03, 2023 at 08:12:20PM -0700, Linus Torvalds escreveu:
> On Wed, May 3, 2023 at 8:00 PM Linus Torvalds <[email protected]> wrote:
> > I did consider it, but the end result doesn't even build, so I unpulled again..

> > I get some libbpf error, and I'm just not interested in trying to
> > debug it. This has clearly not been tested well enough to be merged.

It's been the default (opt-out) in the development branch for a while and
stayed in linux-next, but as it had been opt-in before that, it hasn't
received the same amount of testing as the default build in the past
development cycles, even with the first feature that uses it having been
introduced back in 2020 :-\

> Side note: its' not even about testing.

> The error message makes it clear that this is garbage and should never
> be merged even if it were to compile.

> There is not a way in hell that it is correct that a 'perf' tool build
> should ever even look at the vmlinux binary to build.

> The fact that it does shows that something is seriously wrong in
> perf-tool land, and I will not be touching any pulls until that
> fundamental mistake is entirely gone.

> The vmlinux image that is present in my tree (ie
> /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> config. And the fact that the perf tool even looks at it is seriously
> broken.

Humm,

Does building runqslower work for you in this same environment where
building perf failed?

I ask this because it uses the same libbpf technique (CO-RE) to allow
tools that access kernel data structures from BPF to work with multiple
kernels, even those where the layout of the accessed kernel structures
changed.

To build it:

$ make -C tools/bpf/runqslower/
make: Entering directory '/home/acme/git/perf-tools/tools/bpf/runqslower'
MKDIR /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/bpf_helper_defs.h
CC /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/staticobjs/libbpf.o
<SNIP>
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf/libbpf.a
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/bpf.h
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/libbpf.h
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/btf.h
<SNIP>
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/libbpf//include/bpf/bpf_helper_defs.h
INSTALL libbpf_headers
MKDIR /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/libbpf/include/bpf
INSTALL /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/libbpf/include/bpf/hashmap.h
<SNIP>
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output/bpftool/bootstrap/bpftool
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.bpf.o
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.skel.h
CC /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower.o
LINK /home/acme/git/perf-tools/tools/bpf/runqslower/.output//runqslower
make: Leaving directory '/home/acme/git/perf-tools/tools/bpf/runqslower'
$

# strip tools/bpf/runqslower/.output/runqslower
# file tools/bpf/runqslower/.output/runqslower
tools/bpf/runqslower/.output/runqslower: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=ab0306ee3f6cdc671d9aac7006457a3646e1c266, for GNU/Linux 3.2.0, stripped
[root@quaco perf-tools]# tools/bpf/runqslower/.output/runqslower 100
Tracing run queue latency higher than 100 us
TIME COMM PID LAT(us)
08:00:17 swapper/4 30951 1001
08:00:18 kworker/7:8 2604 101
08:00:19 ksoftirqd/2 15665 105
08:00:22 swapper/2 904400 179
08:00:22 swapper/2 1247 102
08:00:23 TaskCon~ller #7 643789 102
08:00:26 ksoftirqd/2 849302 109
08:00:26 gmain 3107 238
08:00:26 rcu_tasks_trace 152972 634
08:00:27 systemd-oomd 899470 4887
08:00:28 Timer 30951 262
08:00:28 ksoftirqd/2 15665 103
08:00:29 systemd-resolve 895022 104
08:00:29 ksoftirqd/2 15665 145
08:00:30 ksoftirqd/2 15665 117
08:00:30 ksoftirqd/2 640315 149
08:00:30 gmain 3074 122
08:00:30 goa-identity-se 3107 109
^C
#

It builds and uses the tools/bpf/bpftool tool to generate the vmlinux.h
file to build the tool:

$ strace -f -e access,open,openat make -C tools/bpf/runqslower/ |& grep vmlinux
GEN /home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h
[pid 902901] openat(AT_FDCWD, "/home/acme/git/perf-tools/tools/bpf/runqslower/.output//vmlinux.h", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
[pid 902901] openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
[pid 902909] openat(AT_FDCWD, "/home/acme/git/perf-tools/tools/bpf/runqslower/.output/vmlinux.h", O_RDONLY) = 4
$

But here it is using /sys/kernel/btf/vmlinux, which is way more sensible
than what you noticed.

Looking at tools/bpf/runqslower/Makefile:

# Try to detect best kernel BTF source
KERNEL_REL := $(shell uname -r)
VMLINUX_BTF_PATHS := $(if $(O),$(O)/vmlinux) \
$(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
../../../vmlinux /sys/kernel/btf/vmlinux \
/boot/vmlinux-$(KERNEL_REL)
VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword \
$(wildcard $(VMLINUX_BTF_PATHS))))

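It tries a vmlinux file first (from $(O), $(KBUILD_OUTPUT) or the top of
the source tree), and only then falls back to /sys/kernel/btf/vmlinux.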
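
To make the ordering issue concrete, here is a minimal userspace sketch of what $(firstword $(wildcard ...)) does with that list. The helper name first_existing() is made up for this illustration and is not part of any Makefile or of libbpf; it only shows that the first path that exists wins, whether or not it carries BTF.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Return the first candidate that exists on disk, or NULL if none does.
 * Order is the only tie-breaker, so an earlier entry (say, a vmlinux at the
 * top of the source tree) always shadows a later one (say,
 * /sys/kernel/btf/vmlinux), even if the earlier file has no .BTF section.
 */
static const char *first_existing(const char *const *paths, size_t n)
{
	assert(paths != NULL);

	for (size_t i = 0; i < n; i++) {
		/* F_OK only checks existence, like $(wildcard) does */
		if (paths[i] && access(paths[i], F_OK) == 0)
			return paths[i];
	}
	return NULL;
}
```

With the candidates in the order used by the Makefile above, a test-build vmlinux in the tree is returned before /sys/kernel/btf/vmlinux is ever considered, which would explain the path in the reported error.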

I'll check why it isn't using the same technique, possibly you don't
generate BTF?

In this Fedora 37 kernel:

$ grep DEBUG_INFO_BTF /boot/config-6.2.9-200.fc37.x86_64
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
$

Having said that, we should probably go back to making building with BPF
skels an opt-in feature, as it has been since the first feature using it
was introduced, in:

commit fbcdaa1908e8f61aa56c71a1db9a9deb72110a9d
Author: Song Liu <[email protected]>
Date: Tue Dec 29 13:42:13 2020 -0800

perf build: Support build BPF skeletons with perf

BPF programs are useful in perf to profile BPF programs.

BPF skeleton is by far the easiest way to write BPF tools. Enable
building BPF skeletons in util/bpf_skel. A dummy bpf skeleton is added.
More bpf skeletons will be added for different use cases.

Signed-off-by: Song Liu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Until we sort out these build robustness issues.

- Arnaldo

2023-05-04 17:37:07

by Linus Torvalds

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>>
> Does building runqslower work for you in this same environment where
> building perf failed?

I don't know, and I don't care. I've never used that thing, and I'm
not going to.

And it's irrelevant. Two wrongs do not make a right.

I'm going to ignore perf tools pulls going forward if this is the kind
of argument for garbage that you use.

Because a billion flies *can* be wrong.

Linus

2023-05-04 18:23:50

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 10:25:30AM -0700, Linus Torvalds escreveu:
> On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Does building runqslower work for you in this same environment where
> > building perf failed?

> I don't know, and I don't care. I've never used that thing, and I'm
> not going to.

> And it's irrelevant. Two wrongs do not make a right.

> I'm going to ignore perf tools pulls going forward if this is the kind
> of argument for garbage that you use.

> Because a billion flies *can* be wrong.

I pushed two reverts there that turn this back into an
opt-in/experimental feature until we fix the issue you reported:

⬢[acme@toolbox perf-tools]$ git log --oneline -3
e7b7a54767a71c67 (HEAD -> perf-tools, acme/perf-tools) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
6957bdf37a1e6eca Revert "perf build: Warn for BPF skeletons if endian mismatches"
1f85d016768ff19f (tag: perf-tools-for-v6.4-1-2023-05-03) perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
⬢[acme@toolbox perf-tools]$

It's in:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-tools

A vmlinux.h file built by bpftool from the BTF info, be it in a
vmlinux file or in /sys/kernel/btf/vmlinux (a raw BTF file), is used for
building the BPF bytecode, using clang:

⬢[acme@toolbox perf-tools]$ head tools/perf/util/bpf_skel/sample_filter.bpf.c
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2023 Google
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#include "sample-filter.h"

/* BPF map that will be filled by user space */
⬢[acme@toolbox perf-tools]$

That way it can access kernel types and store the type info for those
types together with the BPF bytecode, as BTF info. Later, libbpf uses
this info and the relocation records to adjust accesses when the data
structures have changed in the kernel, based on both the kernel BTF
info (/sys/kernel/btf/vmlinux) and the BPF bytecode being loaded (in
its .BTF ELF section).

Andrii, can you add some more information about the usage of vmlinux.h
instead of using kernel headers?

- Arnaldo

2023-05-04 19:03:27

by Andrii Nakryiko

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Thu, May 04, 2023 at 10:25:30AM -0700, Linus Torvalds escreveu:
> > On Thu, May 4, 2023 at 4:09 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > Does building runqslower work for you in this same environment where
> > > building perf failed?
>
> > I don't know, and I don't care. I've never used that thing, and I'm
> > not going to.
>
> > And it's irrelevant. Two wrongs do not make a right.
>
> > I'm going to ignore perf tools pulls going forward if this is the kind
> > of argument for garbage that you use.
>
> > Because a billion flies *can* be wrong.
>
> I pushed two reverts there that make this back into a
> opt-in/experimental feature till we fix the issue you reported:
>
> ⬢[acme@toolbox perf-tools]$ git log --oneline -3
> e7b7a54767a71c67 (HEAD -> perf-tools, acme/perf-tools) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
> 6957bdf37a1e6eca Revert "perf build: Warn for BPF skeletons if endian mismatches"
> 1f85d016768ff19f (tag: perf-tools-for-v6.4-1-2023-05-03) perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
> ⬢[acme@toolbox perf-tools]$
>
> Its in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf-tools
>
> Using a vmlinux.h file built by bpftool from the BTF info, be it in a
> vmlinux file or in /sys/kernel/btf/vmlinux (a RAW BTF file) is used for
> building the BPF bytecode, using clang:
>
> ⬢[acme@toolbox perf-tools]$ head tools/perf/util/bpf_skel/sample_filter.bpf.c
> // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> // Copyright (c) 2023 Google
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> #include <bpf/bpf_core_read.h>
>
> #include "sample-filter.h"
>
> /* BPF map that will be filled by user space */
> ⬢[acme@toolbox perf-tools]$
>
> So that it can access kernel types and store the type info for those
> types together with the BPF bytecode, as BTF info, and later use this
> and relocation records for libbpf to be able to adjust things when
> accessed data structures change in the kernel and needs adjustments
> based in both the kernel BTF info (/sys/kernel/btf/vmlinux) and the
> BPF bytecode being loaded (in its .BTF ELF section).
>
> Andrii, can you add some more information about the usage of vmlinux.h
> instead of using kernel headers?
>

I'll just say that vmlinux.h is not a hard requirement to build BPF
programs; it's more a convenience allowing easy access to definitions
of both UAPI and kernel-internal structures for tracing needs and
marking them relocatable using BPF CO-RE machinery. Lots of real-world
applications just check in a pregenerated vmlinux.h to avoid a build-time
dependency on an up-to-date host kernel and such.

If vmlinux.h generation and usage is causing issues, though, given
that perf's BPF programs don't seem to be using many different kernel
types, it might be a better option to just use UAPI headers for public
kernel type definitions, and just define CO-RE-relocatable minimal
definitions locally in perf's BPF code for the other types necessary.
E.g., if perf needs only pid and tgid from task_struct, this would
suffice:

struct task_struct {
	int pid;
	int tgid;
} __attribute__((preserve_access_index));

> - Arnaldo

2023-05-04 19:03:44

by Jiri Olsa

Subject: Re: [GIT PULL] perf tools changes for v6.4

On Wed, May 03, 2023 at 10:51:15PM -0700, Ian Rogers wrote:
> On Wed, May 3, 2023 at 8:12 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Wed, May 3, 2023 at 8:00 PM Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > I did consider it, but the end result doesn't even build, so I unpulled again..
> > >
> > > I get some libbpf error, and I'm just not interested in trying to
> > > debug it. This has clearly not been tested well enough to be merged.
> >
> > Side note: it's not even about testing.
> >
> > The error message makes it clear that this is garbage and should never
> > be merged even if it were to compile.
> >
> > There is not a way in hell that it is correct that a 'perf' tool build
> > should ever even look at the vmlinux binary to build.
> >
> > The fact that it does shows that something is seriously wrong in
> > perf-tool land, and I will not be touching any pulls until that
> > fundamental mistake is entirely gone.
> >
> > The vmlinux image that is present in my tree (ie
> > /home/torvalds/v2.6/linux/vmlinux) is a test build with an insane
> > config. And the fact that the perf tool even looks at it is seriously
> > broken.
> >
> > Whatever you are doing - stop it right now.
> >
> > Linus
>
> I think the error you gave makes it pretty clear what is going on and
> Arnaldo's e-mail explains the motivation. Perhaps we can check a
> vmlinux.h into the perf tree so that we don't default to generating
> it. This would avoid the binary dependency but we may need different
> flavors for different architectures because of structs like pt_regs.

I think we could check that a vmlinux with a .BTF section is present
before allowing skeletons to be built

jirka
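
A check like that could look roughly like the sketch below. It is only an illustration under assumptions: the helper name elf64_has_section() is hypothetical, it relies on the Linux <elf.h> definitions, and it handles only native-endian 64-bit ELF files. A real build check would more likely be done from the Makefile with existing tools.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if the 64-bit ELF file at 'path' has a section called 'name'
 * (e.g. ".BTF"), 0 if it does not, and -1 if the file cannot be read or
 * is not a 64-bit ELF file.
 */
static int elf64_has_section(const char *path, const char *name)
{
	Elf64_Ehdr eh;
	Elf64_Shdr *sh = NULL;
	char *strtab = NULL;
	size_t len = 0;
	int ret = -1;
	FILE *f;

	assert(path != NULL && name != NULL);

	f = fopen(path, "rb");
	if (!f)
		return -1;

	/* Validate the ELF header before trusting any of its fields */
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
		goto out;

	/* Load the whole section header table */
	sh = calloc(eh.e_shnum, sizeof(*sh));
	if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
	    fread(sh, sizeof(*sh), eh.e_shnum, f) != (size_t)eh.e_shnum)
		goto out;

	/* Section names live in the section header string table */
	len = (size_t)sh[eh.e_shstrndx].sh_size;
	strtab = malloc(len + 1);
	if (!strtab ||
	    fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
	    fread(strtab, 1, len, f) != len)
		goto out;
	strtab[len] = '\0';

	ret = 0;
	for (unsigned int i = 0; i < eh.e_shnum; i++) {
		if (sh[i].sh_name < len &&
		    strcmp(strtab + sh[i].sh_name, name) == 0) {
			ret = 1;
			break;
		}
	}
out:
	free(strtab);
	free(sh);
	fclose(f);
	return ret;
}
```

Calling elf64_has_section(path, ".BTF") on a vmlinux built without CONFIG_DEBUG_INFO_BTF would return 0, so the build could skip that candidate instead of failing on it.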

2023-05-04 19:39:54

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > Andrii, can you add some more information about the usage of vmlinux.h
> > instead of using kernel headers?

> I'll just say that vmlinux.h is not a hard requirement to build BPF
> programs, it's more a convenience allowing easy access to definitions
> of both UAPI and kernel-internal structures for tracing needs and
> marking them relocatable using BPF CO-RE machinery. Lots of real-world
> applications just check-in pregenerated vmlinux.h to avoid build-time
> dependency on up-to-date host kernel and such.

> If vmlinux.h generation and usage is causing issues, though, given
> that perf's BPF programs don't seem to be using many different kernel
> types, it might be a better option to just use UAPI headers for public
> kernel type definitions, and just define CO-RE-relocatable minimal
> definitions locally in perf's BPF code for the other types necessary.
> E.g., if perf needs only pid and tgid from task_struct, this would
> suffice:

> struct task_struct {
> int pid;
> int tgid;
> } __attribute__((preserve_access_index));

Yeah, that seems like a way better approach: no vmlinux involved, libbpf
CO-RE notices that task_struct changed from this two-integer version
(of course) and relocates each access to where the field is in the
running kernel by using /sys/kernel/btf/vmlinux.
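
As a rough mental model of that relocation step, here is a plain userspace toy. It is not libbpf code: the kernel_btf[] table, relocate_field() and read_int_field() are invented for this sketch, and the ___local/___kernel struct names just stand for "what the BPF object was built against" and "what the running kernel has".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the BPF program was compiled against: only the fields it uses. */
struct task_struct___local {
	int pid;
	int tgid;
};

/* Stand-in for the running kernel's (much larger) task_struct. */
struct task_struct___kernel {
	long state;
	void *stack;
	int prio;
	int pid;
	int tgid;
};

/* Toy stand-in for kernel BTF: field names and their real offsets. */
struct field_info {
	const char *name;
	size_t offset;
};

static const struct field_info kernel_btf[] = {
	{ "state", offsetof(struct task_struct___kernel, state) },
	{ "stack", offsetof(struct task_struct___kernel, stack) },
	{ "prio",  offsetof(struct task_struct___kernel, prio) },
	{ "pid",   offsetof(struct task_struct___kernel, pid) },
	{ "tgid",  offsetof(struct task_struct___kernel, tgid) },
};

/*
 * Resolve a field name recorded at build time to its offset in the target
 * layout. Returns 0 on success, -1 if the target has no such field.
 */
static int relocate_field(const char *name, size_t *offset)
{
	assert(name != NULL && offset != NULL);

	for (size_t i = 0; i < sizeof(kernel_btf) / sizeof(kernel_btf[0]); i++) {
		if (strcmp(kernel_btf[i].name, name) == 0) {
			*offset = kernel_btf[i].offset;
			return 0;
		}
	}
	return -1;
}

/* Read an int field by name from an opaque "kernel" object. */
static int read_int_field(const void *obj, const char *name, int *val)
{
	size_t off;

	assert(obj != NULL && val != NULL);

	if (relocate_field(name, &off))
		return -1;
	memcpy(val, (const char *)obj + off, sizeof(*val));
	return 0;
}
```

The point of the toy is that the access is keyed by field name, so it does not matter that pid sits at offset 0 in the local definition and somewhere else in the target layout.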

I looked and the creation of vmlinux.h was introduced in:

commit 944138f048f7d7591ec7568c94b21de8df2724d4
Author: Namhyung Kim <[email protected]>
Date: Thu Jul 1 14:12:27 2021 -0700

perf stat: Enable BPF counter with --for-each-cgroup

Recently bperf was added to use BPF to count perf events for various
purposes. This is an extension for the approach and targetting to
cgroup usages.

Unlike the other bperf, it doesn't share the events with other
processes but it'd reduces unnecessary events (and the overhead of
multiplexing) for each monitored cgroup within the perf session.

When --for-each-cgroup is used with --bpf-counters, it will open
cgroup-switches event per cpu internally and attach the new BPF
program to read given perf_events and to aggregate the results for
cgroups. It's only called when task is switched to a task in a
different cgroup.

Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Which I think was the first BPF skel to access a kernel data structure,
yeah:

tools/perf/util/bpf_skel/bperf_cgroup.bpf.c

For things like:

+static inline int get_cgroup_v1_idx(__u32 *cgrps, int size)
+{
+ struct task_struct *p = (void *)bpf_get_current_task();
+ struct cgroup *cgrp;
+ register int i = 0;
+ __u32 *elem;
+ int level;
+ int cnt;
+
+ cgrp = BPF_CORE_READ(p, cgroups, subsys[perf_event_cgrp_id], cgroup);
+ level = BPF_CORE_READ(cgrp, level);

So we can completely remove touching vmlinux from the perf building
process.

If we can get the revert of the patches that make BPF skels build by
default into v6.4, then we would do this work, test it thoroughly and
have it available for v6.5.

Linus, would that be a way forward?

- Arnaldo

For reference, here is the definition of BPF_CORE_READ() from tools/lib/bpf/bpf_core_read.h:

/*
 * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
 * when there are few pointer chasing steps.
 * E.g., what in non-BPF world (or in BPF w/ BCC) would be something like:
 *	int x = s->a.b.c->d.e->f->g;
 * can be succinctly achieved using BPF_CORE_READ as:
 *	int x = BPF_CORE_READ(s, a.b.c, d.e, f, g);
 *
 * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF
 * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically
 * equivalent to:
 * 1. const void *__t = s->a.b.c;
 * 2. __t = __t->d.e;
 * 3. __t = __t->f;
 * 4. return __t->g;
 *
 * Equivalence is logical, because there is a heavy type casting/preservation
 * involved, as well as all the reads are happening through
 * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to
 * emit CO-RE relocations.
 *
 * N.B. Only up to 9 "field accessors" are supported, which should be more
 * than enough for any practical purpose.
 */
#define BPF_CORE_READ(src, a, ...) ({					    \
	___type((src), a, ##__VA_ARGS__) __r;				    \
	BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);		    \
	__r;								    \
})
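
The four-step decomposition described in that comment can be mimicked in plain userspace C. This is only a sketch of the logic: toy_probe_read() is a made-up stand-in for bpf_probe_read_kernel() (a NULL check replaces fault handling), the struct names are invented, and no CO-RE relocations are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Types shaped so that s->a.b.c->d.e->f->g is a valid expression. */
struct leaf { int g; };
struct mid2 { struct leaf *f; };
struct mid1 { struct { struct mid2 *e; } d; };
struct root { struct { struct { struct mid1 *c; } b; } a; };

/*
 * Copy 'size' bytes from base + off into dst. A NULL base makes the read
 * fail instead of crashing, loosely like a faulting kernel read would.
 */
static int toy_probe_read(void *dst, size_t size, const void *base, size_t off)
{
	assert(dst != NULL);

	if (!base)
		return -1;
	memcpy(dst, (const char *)base + off, size);
	return 0;
}

/* Equivalent of "int x = s->a.b.c->d.e->f->g;" as four separate reads. */
static int read_g(const struct root *s, int *out)
{
	const struct mid1 *c;
	const struct mid2 *e;
	const struct leaf *f;

	/* 1. __t = s->a.b.c */
	if (toy_probe_read(&c, sizeof(c), s, offsetof(struct root, a.b.c)))
		return -1;
	/* 2. __t = __t->d.e */
	if (toy_probe_read(&e, sizeof(e), c, offsetof(struct mid1, d.e)))
		return -1;
	/* 3. __t = __t->f */
	if (toy_probe_read(&f, sizeof(f), e, offsetof(struct mid2, f)))
		return -1;
	/* 4. return __t->g */
	return toy_probe_read(out, sizeof(*out), f, offsetof(struct leaf, g));
}
```

Embedded members (a.b, d) cost nothing extra, since they only add to the offset within one read; each pointer dereference is what forces a new read, hence four reads for this expression.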

2023-05-04 21:55:15

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > Andrii, can you add some more information about the usage of vmlinux.h
> > > instead of using kernel headers?
>
> > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > programs, it's more a convenience allowing easy access to definitions
> > of both UAPI and kernel-internal structures for tracing needs and
> > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > applications just check-in pregenerated vmlinux.h to avoid build-time
> > dependency on up-to-date host kernel and such.
>
> > If vmlinux.h generation and usage is causing issues, though, given
> > that perf's BPF programs don't seem to be using many different kernel
> > types, it might be a better option to just use UAPI headers for public
> > kernel type definitions, and just define CO-RE-relocatable minimal
> > definitions locally in perf's BPF code for the other types necessary.
> > E.g., if perf needs only pid and tgid from task_struct, this would
> > suffice:
>
> > struct task_struct {
> > int pid;
> > int tgid;
> > } __attribute__((preserve_access_index));
>
> Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> CO-RE notices that task_struct changed from this two integers version
> (of course) and does the relocation to where it is in the running kernel
> by using /sys/kernel/btf/vmlinux.

I did it for one of the skels. It is build tested, runtime untested, and
doesn't use any vmlinux or BTF file to help with the build. It is not that
bad: more verbose, but at least we state which fields we actually use, and
the attributes document that those offsets will be recorded for future
use, etc.

Namhyung, can you please check that this works?

Thanks,

- Arnaldo

diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
index 6a438e0102c5a2cb..f376d162549ebd74 100644
--- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
+++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
@@ -1,11 +1,40 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2021 Facebook
// Copyright (c) 2021 Google
-#include "vmlinux.h"
+#include <linux/types.h>
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

+// libbpf's CO-RE will take care of the relocations so that these fields match
+// the layout of these structs in the kernel where this ends up running on.
+
+struct cgroup_subsys_state {
+ struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
+struct css_set {
+ struct cgroup_subsys_state *subsys[13];
+} __attribute__((preserve_access_index));
+
+struct task_struct {
+ struct css_set *cgroups;
+} __attribute__((preserve_access_index));
+
+struct kernfs_node {
+ __u64 id;
+} __attribute__((preserve_access_index));
+
+struct cgroup {
+ struct kernfs_node *kn;
+ int level;
+} __attribute__((preserve_access_index));
+
+enum cgroup_subsys_id {
+ perf_event_cgrp_id = 8,
+};
+
#define MAX_LEVELS 10 // max cgroup hierarchy level: arbitrary
#define MAX_EVENTS 32 // max events per cgroup: arbitrary

@@ -52,7 +81,7 @@ struct cgroup___new {
/* old kernel cgroup definition */
struct cgroup___old {
int level;
- u64 ancestor_ids[];
+ __u64 ancestor_ids[];
} __attribute__((preserve_access_index));

const volatile __u32 num_events = 1;

2023-05-04 22:16:07

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?

The second case was simpler:

diff --git a/tools/perf/util/bpf_skel/bperf_follower.bpf.c b/tools/perf/util/bpf_skel/bperf_follower.bpf.c
index f193998530d431d8..1ab06f2ff5ad7548 100644
--- a/tools/perf/util/bpf_skel/bperf_follower.bpf.c
+++ b/tools/perf/util/bpf_skel/bperf_follower.bpf.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2021 Facebook
-#include "vmlinux.h"
+#include <linux/types.h>
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bperf_u.h"

2023-05-04 22:16:57

by Ian Rogers

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?
>
> Thanks,
>
> - Arnaldo
>
> diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> index 6a438e0102c5a2cb..f376d162549ebd74 100644
> --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> @@ -1,11 +1,40 @@
> // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> // Copyright (c) 2021 Facebook
> // Copyright (c) 2021 Google
> -#include "vmlinux.h"
> +#include <linux/types.h>
> +#include <linux/bpf.h>

Compared to vmlinux.h, here be dragons. It is easy to start dragging in
all of libc, and that may not work due to missing #ifdefs, etc. Could
we check in a vmlinux.h like libbpf-tools does?
https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64

This would also remove some of the errors that could be introduced by
copy+pasting enums, etc., and would surface renames as build-time
rather than runtime failures.
Could this be a shared resource for the different Linux tools
projects using a vmlinux.h? E.g. tools/lib/vmlinuxh with an
install_headers target that builds a vmlinux.h.

Thanks,
Ian

> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> #include <bpf/bpf_core_read.h>
>
> +// libbpf's CO-RE will take care of the relocations so that these fields match
> +// the layout of these structs in the kernel where this ends up running on.
> +
> +struct cgroup_subsys_state {
> + struct cgroup *cgroup;
> +} __attribute__((preserve_access_index));
> +
> +struct css_set {
> + struct cgroup_subsys_state *subsys[13];
> +} __attribute__((preserve_access_index));
> +
> +struct task_struct {
> + struct css_set *cgroups;
> +} __attribute__((preserve_access_index));
> +
> +struct kernfs_node {
> + __u64 id;
> +} __attribute__((preserve_access_index));
> +
> +struct cgroup {
> + struct kernfs_node *kn;
> + int level;
> +} __attribute__((preserve_access_index));
> +
> +enum cgroup_subsys_id {
> + perf_event_cgrp_id = 8,
> +};
> +
> #define MAX_LEVELS 10 // max cgroup hierarchy level: arbitrary
> #define MAX_EVENTS 32 // max events per cgroup: arbitrary
>
> @@ -52,7 +81,7 @@ struct cgroup___new {
> /* old kernel cgroup definition */
> struct cgroup___old {
> int level;
> - u64 ancestor_ids[];
> + __u64 ancestor_ids[];
> } __attribute__((preserve_access_index));
>
> const volatile __u32 num_events = 1;

2023-05-04 22:55:25

by Namhyung Kim

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > instead of using kernel headers?
> >
> > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > programs, it's more a convenience allowing easy access to definitions
> > > of both UAPI and kernel-internal structures for tracing needs and
> > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > dependency on up-to-date host kernel and such.
> >
> > > If vmlinux.h generation and usage is causing issues, though, given
> > > that perf's BPF programs don't seem to be using many different kernel
> > > types, it might be a better option to just use UAPI headers for public
> > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > definitions locally in perf's BPF code for the other types necessary.
> > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > suffice:
> >
> > > struct task_struct {
> > > int pid;
> > > int tgid;
> > > } __attribute__((preserve_access_index));
> >
> > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > CO-RE notices that task_struct changed from this two integers version
> > (of course) and does the relocation to where it is in the running kernel
> > by using /sys/kernel/btf/vmlinux.
>
> Doing it for one of the skels, build tested, runtime untested, but not
> using any vmlinux, BTF to help, not that bad, more verbose, but at least
> we state what are the fields we actually use, have those attribute
> documenting that those offsets will be recorded for future use, etc.
>
> Namhyung, can you please check that this works?

Yep, it works great!

$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,user.slice,system.slice sleep 1

Performance counter stats for 'system wide':

     64,110.41  msec cpu-clock           /            # 64.004 CPUs utilized
        15,787  context-switches         /            # 246.247 /sec
            72  cpu-migrations           /            # 1.123 /sec
         1,236  page-faults              /            # 19.279 /sec
   848,608,137  cycles                   /            # 0.013 GHz (83.23%)
   106,928,070  stalled-cycles-frontend  /            # 12.60% frontend cycles idle (83.23%)
   209,204,795  stalled-cycles-backend   /            # 24.65% backend cycles idle (83.23%)
   645,183,025  instructions             /            # 0.76 insn per cycle
                                                      # 0.32 stalled cycles per insn (83.24%)
   141,776,876  branches                 /            # 2.211 M/sec (83.63%)
     3,001,078  branch-misses            /            # 2.12% of all branches (83.44%)
         66.67  msec cpu-clock           user.slice   # 0.067 CPUs utilized
           695  context-switches         user.slice   # 10.424 K/sec
            22  cpu-migrations           user.slice   # 329.966 /sec
         1,202  page-faults              user.slice   # 18.028 K/sec
   150,514,330  cycles                   user.slice   # 2.257 GHz (90.17%)
    13,504,605  stalled-cycles-frontend  user.slice   # 8.97% frontend cycles idle (69.71%)
    38,859,376  stalled-cycles-backend   user.slice   # 25.82% backend cycles idle (95.28%)
   189,382,145  instructions             user.slice   # 1.26 insn per cycle
                                                      # 0.21 stalled cycles per insn (88.92%)
    36,019,878  branches                 user.slice   # 540.242 M/sec (90.16%)
       697,723  branch-misses            user.slice   # 1.94% of all branches (65.77%)
         44.33  msec cpu-clock           system.slice # 0.044 CPUs utilized
         2,382  context-switches         system.slice # 53.732 K/sec
            42  cpu-migrations           system.slice # 947.418 /sec
            34  page-faults              system.slice # 766.958 /sec
   100,383,549  cycles                   system.slice # 2.264 GHz (87.27%)
    10,165,225  stalled-cycles-frontend  system.slice # 10.13% frontend cycles idle (71.73%)
    29,964,682  stalled-cycles-backend   system.slice # 29.85% backend cycles idle (84.94%)
   101,210,743  instructions             system.slice # 1.01 insn per cycle
                                                      # 0.30 stalled cycles per insn (80.68%)
    19,893,831  branches                 system.slice # 448.757 M/sec (86.94%)
       397,854  branch-misses            system.slice # 2.00% of all branches (88.42%)

1.001667221 seconds time elapsed

Thanks,
Namhyung
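
As a quick sanity check on the output above, the derived columns can be recomputed from the raw counts. The sketch below is plain C with hypothetical helper names (insn_per_cycle, percent_of). It is not how perf itself is implemented, just the arithmetic behind "insn per cycle", "of all branches" and "frontend cycles idle", checked against the figures quoted for the root cgroup and the two slices.

```c
#include <assert.h>

/* Instructions per cycle; 0.0 when no cycles were counted. */
static double insn_per_cycle(unsigned long long instructions,
			     unsigned long long cycles)
{
	return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* part / whole as a percentage; 0.0 when whole is zero. */
static double percent_of(unsigned long long part, unsigned long long whole)
{
	return whole ? 100.0 * (double)part / (double)whole : 0.0;
}

/* Cross-check against the numbers printed in the quoted perf stat run. */
static void check_against_quoted_output(void)
{
	double v;

	v = insn_per_cycle(645183025ULL, 848608137ULL);	/* "/": 0.76 */
	assert(v > 0.755 && v < 0.765);

	v = insn_per_cycle(189382145ULL, 150514330ULL);	/* user.slice: 1.26 */
	assert(v > 1.255 && v < 1.265);

	v = insn_per_cycle(101210743ULL, 100383549ULL);	/* system.slice: 1.01 */
	assert(v > 1.005 && v < 1.015);

	v = percent_of(3001078ULL, 141776876ULL);	/* "/": 2.12% of all branches */
	assert(v > 2.115 && v < 2.125);

	v = percent_of(106928070ULL, 848608137ULL);	/* "/": 12.60% frontend idle */
	assert(v > 12.595 && v < 12.605);
}
```

The ranges are just rounding windows around the two-decimal values perf printed; the multiplexing percentages in parentheses are not reproduced here.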


> diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> index 6a438e0102c5a2cb..f376d162549ebd74 100644
> --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> @@ -1,11 +1,40 @@
> // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> // Copyright (c) 2021 Facebook
> // Copyright (c) 2021 Google
> -#include "vmlinux.h"
> +#include <linux/types.h>
> +#include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> #include <bpf/bpf_core_read.h>
>
> +// libbpf's CO-RE will take care of the relocations so that these fields match
> +// the layout of these structs in the kernel where this ends up running on.
> +
> +struct cgroup_subsys_state {
> + struct cgroup *cgroup;
> +} __attribute__((preserve_access_index));
> +
> +struct css_set {
> + struct cgroup_subsys_state *subsys[13];
> +} __attribute__((preserve_access_index));
> +
> +struct task_struct {
> + struct css_set *cgroups;
> +} __attribute__((preserve_access_index));
> +
> +struct kernfs_node {
> + __u64 id;
> +} __attribute__((preserve_access_index));
> +
> +struct cgroup {
> + struct kernfs_node *kn;
> + int level;
> +} __attribute__((preserve_access_index));
> +
> +enum cgroup_subsys_id {
> + perf_event_cgrp_id = 8,
> +};
> +
> #define MAX_LEVELS 10 // max cgroup hierarchy level: arbitrary
> #define MAX_EVENTS 32 // max events per cgroup: arbitrary
>
> @@ -52,7 +81,7 @@ struct cgroup___new {
> /* old kernel cgroup definition */
> struct cgroup___old {
> int level;
> - u64 ancestor_ids[];
> + __u64 ancestor_ids[];
> } __attribute__((preserve_access_index));
>
> const volatile __u32 num_events = 1;
>

2023-05-05 00:07:12

by Namhyung Kim

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Hi Jiri,

On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old
>
> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far

Cool. But I think you missed this.

diff --git a/tools/perf/util/bpf_skel/perf-defs.h b/tools/perf/util/bpf_skel/perf-defs.h
index 1320e1be03b8..4cfa8a9fce39 100644
--- a/tools/perf/util/bpf_skel/perf-defs.h
+++ b/tools/perf/util/bpf_skel/perf-defs.h
@@ -253,6 +253,7 @@ typedef struct {
} atomic64_t;

struct rw_semaphore {
+ atomic_long_t owner;
} __attribute__((preserve_access_index));

typedef atomic64_t atomic_long_t;


Thanks,
Namhyung

2023-05-05 00:10:43

by Ian Rogers

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 4, 2023 at 4:03 PM Jiri Olsa wrote:
>
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old

We do, but the way I detected the problems in the first place was by
building against older kernels. With local definitions the build will
always succeed and the breakage only shows up at runtime.

> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far
>
> jirka

Cool, could we just call it vmlinux.h rather than perf-defs.h?

I notice cgroup_subsys_id is in there, which is called out in Andrii's
CO-RE guide/blog:
https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
Perhaps we can do something with names/types to make sure a helper is
being used for these enum values.

Thanks,
Ian
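
To make the enum concern concrete, here is a small userspace sketch in plain C. The table standing in for the running kernel's BTF, the value 6, and the helper name enum_value_by_name are hypothetical. In real BPF code the lookup would go through libbpf's bpf_core_enum_value() and bpf_core_enum_value_exists() macros from bpf_core_read.h instead. The point is only that the value copied into a local header (8 in the patch above) can differ from the value on the kernel where the program runs.

```c
#include <assert.h>
#include <string.h>

/* Value copied into the local header at build time, as in the patch. */
enum cgroup_subsys_id {
	perf_event_cgrp_id = 8,
};

/* Hypothetical stand-in for the enum as the running kernel describes it;
 * a different kernel config has shifted the numbering here. */
struct enumerator {
	const char *name;
	int value;
};

static const struct enumerator running_kernel_enum[] = {
	{ "cpuset_cgrp_id", 0 },
	{ "cpu_cgrp_id", 1 },
	{ "perf_event_cgrp_id", 6 },
};

/* Resolve an enumerator by name, falling back to the build-time value. */
static int enum_value_by_name(const char *name, int fallback)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(running_kernel_enum) / sizeof(running_kernel_enum[0]); i++) {
		if (strcmp(running_kernel_enum[i].name, name) == 0)
			return running_kernel_enum[i].value;
	}
	return fallback;
}
```

Code that indexes subsys[] with the bare constant would use 8 on this hypothetical kernel, while a name-based lookup returns 6, which is why forcing such values through a helper is attractive.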

2023-05-05 00:12:08

by Jiri Olsa

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> >
> > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.
> >
> > Namhyung, can you please check that this works?
> >
> > Thanks,
> >
> > - Arnaldo
> >
> > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > @@ -1,11 +1,40 @@
> > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > // Copyright (c) 2021 Facebook
> > // Copyright (c) 2021 Google
> > -#include "vmlinux.h"
> > +#include <linux/types.h>
> > +#include <linux/bpf.h>
>
> Compared to vmlinux.h here be dragons. It is easy to start dragging in
> all of libc and that may not work due to missing #ifdefs, etc.. Could
> we check in a vmlinux.h like libbpf-tools does?
> https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
>
> This would also remove some of the errors that could be introduced by
> copy+pasting enums, etc. and also highlight issues with things being
> renamed as build time rather than runtime failures.

we already have to deal with that, right? doing checks on fields in
structs like mm_struct___old

> Could this be some shared resource for the different linux tools
> projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> install_headers target that builds a vmlinux.h.

I tried to do the minimal header and it's not too big,
I pushed it in here:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h

compile tested so far

jirka
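
A sketch of the kind of field check mentioned above, written as plain userspace C and not as BPF code. The struct kernel_desc table and the helper names are hypothetical, and the field name "ancestors" for the newer cgroup layout is an assumption (only ancestor_ids appears in the patch quoted in this thread). In a real BPF program the test would be done with bpf_core_field_exists() against the ___new/___old struct flavors.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of which cgroup fields a given kernel has. */
struct kernel_desc {
	const char *const *fields;
	size_t nr_fields;
};

/* Where the ancestor ids should be read from on this kernel. */
enum ancestor_source {
	FROM_ANCESTORS,		/* newer layout (field name assumed) */
	FROM_ANCESTOR_IDS,	/* older layout, as in cgroup___old */
	UNSUPPORTED,		/* neither field present: fail at run time */
};

/* Return 1 if the kernel description lists the named field, 0 otherwise. */
static int field_exists(const struct kernel_desc *k, const char *name)
{
	size_t i;

	assert(k && name);
	for (i = 0; i < k->nr_fields; i++) {
		if (strcmp(k->fields[i], name) == 0)
			return 1;
	}
	return 0;
}

/* Prefer the newer field, fall back to the older one. */
static enum ancestor_source pick_ancestor_source(const struct kernel_desc *k)
{
	if (field_exists(k, "ancestors"))
		return FROM_ANCESTORS;
	if (field_exists(k, "ancestor_ids"))
		return FROM_ANCESTOR_IDS;
	return UNSUPPORTED;
}
```

The UNSUPPORTED branch is the case raised in this thread: the source builds regardless of the kernel, and a layout that matches neither flavor is only noticed when the program is loaded or run.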

2023-05-05 09:44:15

by Jiri Olsa

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 04:15:08PM -0700, Namhyung Kim wrote:
> Hi Jiri,
>
> On Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa wrote:
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
> >
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
>
> Cool. But I think you missed this.
>
> diff --git a/tools/perf/util/bpf_skel/perf-defs.h b/tools/perf/util/bpf_skel/perf-defs.h
> index 1320e1be03b8..4cfa8a9fce39 100644
> --- a/tools/perf/util/bpf_skel/perf-defs.h
> +++ b/tools/perf/util/bpf_skel/perf-defs.h
> @@ -253,6 +253,7 @@ typedef struct {
> } atomic64_t;
>
> struct rw_semaphore {
> + atomic_long_t owner;
> } __attribute__((preserve_access_index));

ah right, I did not see that because my clang took another #ifdef leg

thanks,
jirka

>
> typedef atomic64_t atomic_long_t;
>
>
> Thanks,
> Namhyung

2023-05-05 10:09:26

by Jiri Olsa

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Thu, May 04, 2023 at 04:19:47PM -0700, Ian Rogers wrote:
> On Thu, May 4, 2023 at 4:03 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
>
> We do, but the way I detected the problems in the first place was by
> building against older kernels. Now the build will always succeed but
> fail at runtime.
>
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
> >
> > jirka
>
> Cool, could we just call it vmlinux.h rather than perf-defs.h?

right, it also makes the change smaller

>
> I notice cgroup_subsys_id is in there which is called out in Andrii's
> CO-RE guide/blog:
> https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
> perhaps we can do something with names/types to make sure a helper is
> being used for these enum values.

ok, I'll check on that.. so far I made some clean ups and updated the branch

thanks,
jirka

2023-05-05 11:48:19

by Jiri Olsa

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 05, 2023 at 11:39:19AM +0200, Jiri Olsa wrote:
> On Thu, May 04, 2023 at 04:19:47PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 4:03 PM Jiri Olsa <[email protected]> wrote:
> > >
> > > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > >
> > > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > > instead of using kernel headers?
> > > > > >
> > > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > > dependency on up-to-date host kernel and such.
> > > > > >
> > > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > > suffice:
> > > > > >
> > > > > > > struct task_struct {
> > > > > > > int pid;
> > > > > > > int tgid;
> > > > > > > } __attribute__((preserve_access_index));
> > > > > >
> > > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > > by using /sys/kernel/btf/vmlinux.
> > > > >
> > > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > > we state what are the fields we actually use, have those attribute
> > > > > documenting that those offsets will be recorded for future use, etc.
> > > > >
> > > > > Namhyung, can you please check that this works?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > - Arnaldo
> > > > >
> > > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > > @@ -1,11 +1,40 @@
> > > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > // Copyright (c) 2021 Facebook
> > > > > // Copyright (c) 2021 Google
> > > > > -#include "vmlinux.h"
> > > > > +#include <linux/types.h>
> > > > > +#include <linux/bpf.h>
> > > >
> > > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > > we check in a vmlinux.h like libbpf-tools does?
> > > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > > >
> > > > This would also remove some of the errors that could be introduced by
> > > > copy+pasting enums, etc. and also highlight issues with things being
> > > > renamed as build time rather than runtime failures.
> > >
> > > we already have to deal with that, right? doing checks on fields in
> > > structs like mm_struct___old
> >
> > We do, but the way I detected the problems in the first place was by
> > building against older kernels. Now the build will always succeed but
> > fail at runtime.
> >
> > > > Could this be some shared resource for the different linux tools
> > > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > > install_headers target that builds a vmlinux.h.
> > >
> > > I tried to do the minimal header and it's not too big,
> > > I pushed it in here:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> > >
> > > compile tested so far
> > >
> > > jirka
> >
> > Cool, could we just call it vmlinux.h rather than perf-defs.h?
>
> right, it also makes the change smaller
>
> >
> > I notice cgroup_subsys_id is in there which is called out in Andrii's
> > CO-RE guide/blog:
> > https://nakryiko.com/posts/bpf-core-reference-guide/#relocatable-enums
> > perhaps we can do something with names/types to make sure a helper is
> > being used for these enum values.

both bperf_cgroup and off_cpu programs use bpf_core_enum_value, so we should be fine

jirka
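
The concern about cgroup_subsys_id is that an enumerator's numeric value can depend on the kernel configuration, so a value copied into a header at build time may not match the running kernel. The following is only a toy userspace model of the by-name lookup that bpf_core_enum_value() asks libbpf to resolve at load time. The table running_kernel_enum, the function toy_enum_value() and all numeric values are hypothetical and are not libbpf API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value this source file was compiled against (made up for illustration). */
enum cgroup_subsys_id_local { perf_event_cgrp_id_local = 8 };

/* Hypothetical stand-in for the enumerators found in the running kernel's BTF. */
struct toy_enumerator {
	const char *name;
	int value;
};

static const struct toy_enumerator running_kernel_enum[] = {
	{ "cpuset_cgrp_id", 0 },
	{ "cpu_cgrp_id", 1 },
	{ "perf_event_cgrp_id", 5 },	/* differs from the build-time 8 */
};

/* Look the enumerator up by name; fall back to the build-time value. */
static int toy_enum_value(const char *name, int build_time_value)
{
	size_t n = sizeof(running_kernel_enum) / sizeof(running_kernel_enum[0]);

	assert(name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(running_kernel_enum[i].name, name) == 0)
			return running_kernel_enum[i].value;
	}
	return build_time_value;
}
```

In this model toy_enum_value("perf_event_cgrp_id", perf_event_cgrp_id_local) yields the running kernel's value, and an unknown name falls back to whatever the caller compiled in.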

2023-05-05 13:24:44

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 07:01:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.

Yang, can you please check that this works?


From bd6289bc3ffc89aecad3bd8798d76626c8c16d39 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri, 5 May 2023 10:13:09 -0300
Subject: [PATCH 1/1] perf kwork_trace.bpf: Stop using vmlinux.h, grab copies
of used structs

And mark them with __attribute__((preserve_access_index)) so that
libbpf's CO-RE code can fix up the offsets if they differ from the
kernel data structure.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf_skel/kwork_trace.bpf.c | 70 +++++++++++++++++++++-
1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/kwork_trace.bpf.c b/tools/perf/util/bpf_skel/kwork_trace.bpf.c
index 063c124e099938ed..e38fe54c7667fa74 100644
--- a/tools/perf/util/bpf_skel/kwork_trace.bpf.c
+++ b/tools/perf/util/bpf_skel/kwork_trace.bpf.c
@@ -1,13 +1,81 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2022, Huawei

-#include "vmlinux.h"
+#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define KWORK_COUNT 100
#define MAX_KWORKNAME 128

+
+// non-UAPI kernel data structures, just the fields used in this tool,
+// preserving the access index so that libbpf can fixup offsets with the ones
+// used in the kernel when loading the BPF bytecode, if they differ from what
+// is used here.
+
+enum {
+ HI_SOFTIRQ = 0,
+ TIMER_SOFTIRQ,
+ NET_TX_SOFTIRQ,
+ NET_RX_SOFTIRQ,
+ BLOCK_SOFTIRQ,
+ IRQ_POLL_SOFTIRQ,
+ TASKLET_SOFTIRQ,
+ SCHED_SOFTIRQ,
+ HRTIMER_SOFTIRQ,
+ RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
+
+ NR_SOFTIRQS
+};
+
+struct trace_entry {
+ short unsigned int type;
+ unsigned char flags;
+ unsigned char preempt_count;
+ int pid;
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_entry {
+ struct trace_entry ent;
+ int irq;
+ __u32 __data_loc_name;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_exit {
+ struct trace_entry ent;
+ int irq;
+ int ret;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_softirq {
+ struct trace_entry ent;
+ unsigned int vec;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_start {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_end {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_activate_work {
+ struct trace_entry ent;
+ void *work;
+ char __data[];
+} __attribute__((preserve_access_index));
+
/*
* This should be in sync with "util/kwork.h"
*/
--
2.39.2
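
As a side note on the __data_loc_name member copied above: a __data_loc word packs the location of a variable-length field, with the offset from the start of the event record in the low 16 bits and the length in the high 16 bits. A minimal userspace sketch of decoding it follows; the helper names data_loc_offset(), data_loc_len() and data_loc_str() are made up for illustration and do not come from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the dynamic data, relative to the start of the event record. */
static inline uint32_t data_loc_offset(uint32_t loc)
{
	return loc & 0xffff;
}

/* Length in bytes of the dynamic data. */
static inline uint32_t data_loc_len(uint32_t loc)
{
	return loc >> 16;
}

/* Resolve a __data_loc word against the start of a record buffer. */
static const char *data_loc_str(const void *record, uint32_t loc)
{
	assert(record != NULL);
	return (const char *)record + data_loc_offset(loc);
}
```

A BPF program would do the same arithmetic on its context pointer and then copy the string out with a probe-read helper rather than dereferencing it directly.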

2023-05-05 13:52:17

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Thu, May 04, 2023 at 07:01:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, May 04, 2023 at 06:48:50PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > instead of using kernel headers?
> > >
> > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > programs, it's more a convenience allowing easy access to definitions
> > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > dependency on up-to-date host kernel and such.
> > >
> > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > that perf's BPF programs don't seem to be using many different kernel
> > > > types, it might be a better option to just use UAPI headers for public
> > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > definitions locally in perf's BPF code for the other types necessary.
> > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > suffice:
> > >
> > > > struct task_struct {
> > > > int pid;
> > > > int tgid;
> > > > } __attribute__((preserve_access_index));
> > >
> > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > CO-RE notices that task_struct changed from this two integers version
> > > (of course) and does the relocation to where it is in the running kernel
> > > by using /sys/kernel/btf/vmlinux.
> >
> > Doing it for one of the skels, build tested, runtime untested, but not
> > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > we state what are the fields we actually use, have those attribute
> > documenting that those offsets will be recorded for future use, etc.
> >

Namhyung, can you please check that this one for the recent sample works?

From c6972dae6c962d7be5ba006ab90c9955268debc5 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Fri, 5 May 2023 09:55:18 -0300
Subject: [PATCH 1/2] perf sample_filter.bpf: Stop using vmlinux.h generated by
bpftool, use CO-RE

Including linux/bpf.h and linux/perf_event.h we get the UAPI structs
and then define a subset 'struct perf_sample_data' with the fields we
use in this tool while using __attribute__((preserve_access_index)) so
that at libbpf load time it can fixup the offsets according to the
'struct perf_sample_data' obtained from the running kernel BTF
(/sys/kernel/btf/vmlinux).

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf_skel/sample_filter.bpf.c | 37 +++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index cffe493af1ed5f31..045532c2366d74ef 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -1,12 +1,47 @@
// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
// Copyright (c) 2023 Google
-#include "vmlinux.h"
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#include "sample-filter.h"

+// non-UAPI kernel data structures, just the fields used in this tool,
+// preserving the access index so that libbpf can fixup offsets with the ones
+// used in the kernel when loading the BPF bytecode, if they differ from what
+// is used here.
+
+struct perf_sample_data {
+ __u64 addr;
+ __u64 period;
+ union perf_sample_weight weight;
+ __u64 txn;
+ union perf_mem_data_src data_src;
+ __u64 ip;
+ struct {
+ __u32 pid;
+ __u32 tid;
+ } tid_entry;
+ __u64 time;
+ __u64 id;
+ struct {
+ __u32 cpu;
+ } cpu_entry;
+ __u64 phys_addr;
+ __u64 data_page_size;
+ __u64 code_page_size;
+} __attribute__((__aligned__(64))) __attribute__((preserve_access_index));
+
+struct bpf_perf_event_data_kern {
+ struct perf_sample_data * data;
+ struct perf_event * event;
+
+ /* size: 24, cachelines: 1, members: 3 */
+ /* last cacheline: 24 bytes */
+} __attribute__((preserve_access_index));
+
/* BPF map that will be filled by user space */
struct filters {
__uint(type, BPF_MAP_TYPE_ARRAY);
--
2.39.2
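
To make the "fixup the offsets" step concrete, here is a toy userspace model of it, not libbpf's implementation. The local struct lists only the fields the tool reads, a hypothetical "running kernel" struct has extra members that shift them, and a name-to-offset table stands in for the kernel BTF. All type and function names (sample_local, sample_kernel, kernel_btf, toy_relocate, toy_read_u64) are assumptions made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal local view: only the fields the tool reads. */
struct sample_local {
	uint64_t addr;
	uint64_t period;
};

/* Hypothetical "running kernel" layout: extra members shift the offsets. */
struct sample_kernel {
	uint64_t sample_flags;
	uint64_t period;
	uint64_t type;
	uint64_t addr;
};

/* Stand-in for the kernel's BTF: field names and their real offsets. */
struct toy_field {
	const char *name;
	size_t offset;
};

static const struct toy_field kernel_btf[] = {
	{ "period", offsetof(struct sample_kernel, period) },
	{ "addr",   offsetof(struct sample_kernel, addr) },
};

/* Resolve a field name to its offset in the running kernel, or -1. */
static long toy_relocate(const char *field)
{
	assert(field != NULL);
	for (size_t i = 0; i < sizeof(kernel_btf) / sizeof(kernel_btf[0]); i++) {
		if (strcmp(kernel_btf[i].name, field) == 0)
			return (long)kernel_btf[i].offset;
	}
	return -1;
}

/* Read a u64 field from a kernel object using the relocated offset. */
static uint64_t toy_read_u64(const void *obj, const char *field)
{
	long off = toy_relocate(field);
	uint64_t val;

	assert(off >= 0);
	memcpy(&val, (const char *)obj + off, sizeof(val));
	return val;
}
```

Here offsetof(struct sample_local, addr) is 0 while the field sits at offset 24 in struct sample_kernel, yet toy_read_u64(&k, "addr") still returns the right value because the lookup goes by field name. A name that is missing from the table returns -1, which in this model corresponds to the load-time failure Ian mentions.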

2023-05-05 14:31:04

by Arnaldo Carvalho de Melo

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa escreveu:
> On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > instead of using kernel headers?
> > > >
> > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > dependency on up-to-date host kernel and such.
> > > >
> > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > types, it might be a better option to just use UAPI headers for public
> > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > suffice:
> > > >
> > > > > struct task_struct {
> > > > > int pid;
> > > > > int tgid;
> > > > > } __attribute__((preserve_access_index));
> > > >
> > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > CO-RE notices that task_struct changed from this two integers version
> > > > (of course) and does the relocation to where it is in the running kernel
> > > > by using /sys/kernel/btf/vmlinux.
> > >
> > > Doing it for one of the skels, build tested, runtime untested, but not
> > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > we state what are the fields we actually use, have those attribute
> > > documenting that those offsets will be recorded for future use, etc.
> > >
> > > Namhyung, can you please check that this works?
> > >
> > > Thanks,
> > >
> > > - Arnaldo
> > >
> > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > @@ -1,11 +1,40 @@
> > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > // Copyright (c) 2021 Facebook
> > > // Copyright (c) 2021 Google
> > > -#include "vmlinux.h"
> > > +#include <linux/types.h>
> > > +#include <linux/bpf.h>
> >
> > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > we check in a vmlinux.h like libbpf-tools does?
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> >
> > This would also remove some of the errors that could be introduced by
> > copy+pasting enums, etc. and also highlight issues with things being
> > renamed as build time rather than runtime failures.
>
> we already have to deal with that, right? doing checks on fields in
> structs like mm_struct___old
>
> > Could this be some shared resource for the different linux tools
> > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > install_headers target that builds a vmlinux.h.
>
> I tried to do the minimal header and it's not too big,
> I pushed it in here:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
>
> compile tested so far

I see it, and it keeps the change minimal, which is good at the
current stage, but I wonder if it wouldn't be better for us to define
just the types not in UAPI and #include <linux/bpf.h> and
<linux/perf_event.h>, as I did in the patches I posted here (Namhyung
tested at least one). That way the added vmlinux.h file gets even
smaller by not including things like:

[acme@quaco perf-tools]$ egrep -w '(perf_event_sample_format|bpf_perf_event_value|perf_sample_weight|perf_mem_data_src) {' include/uapi/linux/*.h
include/uapi/linux/bpf.h:struct bpf_perf_event_value {
include/uapi/linux/perf_event.h:enum perf_event_sample_format {
include/uapi/linux/perf_event.h:union perf_mem_data_src {
include/uapi/linux/perf_event.h:union perf_mem_data_src {
include/uapi/linux/perf_event.h:union perf_sample_weight {
[acme@quaco perf-tools]$

Also why do we need these:

+struct mm_struct {
+} __attribute__((preserve_access_index));
+
+struct raw_spinlock {
+} __attribute__((preserve_access_index));
+
+typedef struct raw_spinlock raw_spinlock_t;
+
+struct spinlock {
+} __attribute__((preserve_access_index));
+
+typedef struct spinlock spinlock_t;
+
+struct sighand_struct {
+ spinlock_t siglock;
+} __attribute__((preserve_access_index));

We don't use them, they're just pointers you kept on:

+struct task_struct {
+ struct css_set *cgroups;
+ pid_t pid;
+ pid_t tgid;
+ char comm[16];
+ struct mm_struct *mm;
+ struct sighand_struct *sighand;
+ unsigned int flags;
+} __attribute__((preserve_access_index));

Those definitions with preserve_access_index aren't needed, we need just
the fields that we access in the tools, right?

- Arnaldo
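
A small plain C sketch of that point, assuming the tool only stores or compares such pointers and never dereferences them: a forward declaration is then enough and no empty struct definition is required. The names task_struct_min and task_has_mm() are hypothetical, chosen for this sketch only:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declarations: enough for members that are only pointers. */
struct css_set;
struct mm_struct;
struct sighand_struct;

/* Hypothetical trimmed view, with no definition of the pointed-to structs. */
struct task_struct_min {
	struct css_set *cgroups;
	int pid;
	int tgid;
	char comm[16];
	struct mm_struct *mm;
	struct sighand_struct *sighand;
	unsigned int flags;
};

/* Test a pointer member without ever dereferencing it. */
static int task_has_mm(const struct task_struct_min *t)
{
	assert(t != NULL);
	return t->mm != NULL;
}
```

As soon as a tool reads a member through one of these pointers, for instance a lock embedded in the pointed-to struct, that struct needs a definition carrying at least that field.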

2023-05-05 15:19:32

by Alexei Starovoitov

Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 5, 2023 at 6:33 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa escreveu:
> > On Thu, May 04, 2023 at 03:03:42PM -0700, Ian Rogers wrote:
> > > On Thu, May 4, 2023 at 2:48 PM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Thu, May 04, 2023 at 04:07:29PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> > > > > > On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > > > > > Andrii, can you add some more information about the usage of vmlinux.h
> > > > > > > instead of using kernel headers?
> > > > >
> > > > > > I'll just say that vmlinux.h is not a hard requirement to build BPF
> > > > > > programs, it's more a convenience allowing easy access to definitions
> > > > > > of both UAPI and kernel-internal structures for tracing needs and
> > > > > > marking them relocatable using BPF CO-RE machinery. Lots of real-world
> > > > > > applications just check-in pregenerated vmlinux.h to avoid build-time
> > > > > > dependency on up-to-date host kernel and such.
> > > > >
> > > > > > If vmlinux.h generation and usage is causing issues, though, given
> > > > > > that perf's BPF programs don't seem to be using many different kernel
> > > > > > types, it might be a better option to just use UAPI headers for public
> > > > > > kernel type definitions, and just define CO-RE-relocatable minimal
> > > > > > definitions locally in perf's BPF code for the other types necessary.
> > > > > > E.g., if perf needs only pid and tgid from task_struct, this would
> > > > > > suffice:
> > > > >
> > > > > > struct task_struct {
> > > > > > int pid;
> > > > > > int tgid;
> > > > > > } __attribute__((preserve_access_index));
> > > > >
> > > > > Yeah, that seems like a way better approach, no vmlinux involved, libbpf
> > > > > CO-RE notices that task_struct changed from this two integers version
> > > > > (of course) and does the relocation to where it is in the running kernel
> > > > > by using /sys/kernel/btf/vmlinux.
> > > >
> > > > Doing it for one of the skels, build tested, runtime untested, but not
> > > > using any vmlinux, BTF to help, not that bad, more verbose, but at least
> > > > we state what are the fields we actually use, have those attribute
> > > > documenting that those offsets will be recorded for future use, etc.
> > > >
> > > > Namhyung, can you please check that this works?
> > > >
> > > > Thanks,
> > > >
> > > > - Arnaldo
> > > >
> > > > diff --git a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > index 6a438e0102c5a2cb..f376d162549ebd74 100644
> > > > --- a/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > +++ b/tools/perf/util/bpf_skel/bperf_cgroup.bpf.c
> > > > @@ -1,11 +1,40 @@
> > > > // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > // Copyright (c) 2021 Facebook
> > > > // Copyright (c) 2021 Google
> > > > -#include "vmlinux.h"
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf.h>
> > >
> > > Compared to vmlinux.h here be dragons. It is easy to start dragging in
> > > all of libc and that may not work due to missing #ifdefs, etc.. Could
> > > we check in a vmlinux.h like libbpf-tools does?
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools#vmlinuxh-generation
> > > https://github.com/iovisor/bcc/tree/master/libbpf-tools/arm64
> > >
> > > This would also remove some of the errors that could be introduced by
> > > copy+pasting enums, etc. and also highlight issues with things being
> > > renamed as build time rather than runtime failures.
> >
> > we already have to deal with that, right? doing checks on fields in
> > structs like mm_struct___old
> >
> > > Could this be some shared resource for the different linux tools
> > > projects using a vmlinux.h? e.g. tools/lib/vmlinuxh with an
> > > install_headers target that builds a vmlinux.h.
> >
> > I tried to do the minimal header and it's not too big,
> > I pushed it in here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=perf/vmlinux_h
> >
> > compile tested so far
>
> I see it and it makes the change to be minimal, which is good at the
> current stage, but I wonder if it wouldn't be better for us to define
> just the ones not in UAPI and use the #include <linux/bpf.h>,
> <linux/perf_event.h> as I did in the patches I posted here and Namhyung
> tested at least one, this way the added vmlinux.h file get even smaller
> by not including things like:
>
> [acme@quaco perf-tools]$ egrep -w '(perf_event_sample_format|bpf_perf_event_value|perf_sample_weight|perf_mem_data_src) {' include/uapi/linux/*.h
> include/uapi/linux/bpf.h:struct bpf_perf_event_value {
> include/uapi/linux/perf_event.h:enum perf_event_sample_format {
> include/uapi/linux/perf_event.h:union perf_mem_data_src {
> include/uapi/linux/perf_event.h:union perf_mem_data_src {
> include/uapi/linux/perf_event.h:union perf_sample_weight {
> [acme@quaco perf-tools]$
>
> Also why do we need these:
>
> +struct mm_struct {
> +} __attribute__((preserve_access_index));
> +
> +struct raw_spinlock {
> +} __attribute__((preserve_access_index));
> +
> +typedef struct raw_spinlock raw_spinlock_t;
> +
> +struct spinlock {
> +} __attribute__((preserve_access_index));
> +
> +typedef struct spinlock spinlock_t;
> +
> +struct sighand_struct {
> + spinlock_t siglock;
> +} __attribute__((preserve_access_index));
>
> We don't use them, they're just pointers you kept on:
>
> +struct task_struct {
> + struct css_set *cgroups;
> + pid_t pid;
> + pid_t tgid;
> + char comm[16];
> + struct mm_struct *mm;
> + struct sighand_struct *sighand;
> + unsigned int flags;
> +} __attribute__((preserve_access_index));
>
> That with the preserve_access_index isn't needed, we need just the
> fields that we access in the tools, right?

Aside from that you probably want to take a look at BTFgen.
Old doc:
https://github.com/aquasecurity/btfhub/blob/main/docs/btfgen-internals.md
which landed as
"bpftool gen min_core_btf"
man bpftool-gen

It addresses the use case for kernels _without_ CONFIG_DEBUG_INFO_BTF.

2023-05-05 17:15:17

by Arnaldo Carvalho de Melo

Subject: [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Fri, May 05, 2023 at 10:33:15AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa escreveu:
> That with the preserve_access_index isn't needed, we need just the
> fields that we access in the tools, right?

I'm now build testing this in many distro containers, without the two
reverts, i.e. BPF skels continue as opt-out as in my pull request. This
covers both the build and the functionality tests for the tools using
such BPF skels, see below. The build doesn't touch vmlinux or BTF data.

- Arnaldo

From 882adaee50bc27f85374aeb2fbaa5b76bef60d05 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <[email protected]>
Date: Thu, 4 May 2023 19:03:51 -0300
Subject: [PATCH 1/1] perf bpf skels: Stop using vmlinux.h generated from BTF,
use subset of used structs + CO-RE

Linus reported a build break due to using a vmlinux without a BTF ELF
section to generate the vmlinux.h header with bpftool for use in the BPF
tools in tools/perf/util/bpf_skel/*.bpf.c.

Instead add a vmlinux.h file with the structs needed with the fields the
tools need, marking the structs with __attribute__((preserve_access_index)),
so that libbpf's CO-RE code can fixup the struct field offsets.

In some cases the vmlinux.h file that was being generated by bpftool
from the kernel BTF information was not needed at all, just including
linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI
types were not being used.

To keep the patch small, include those UAPI headers from the trimmed down
vmlinux.h file, that then provides the tools with just the structs and
the subset of its fields needed for them.

Testing it:

# perf lock contention -b find / > /dev/null
^C contended total wait max wait avg wait type caller

7 53.59 us 10.86 us 7.66 us rwlock:R start_this_handle+0xa0
2 30.35 us 21.99 us 15.17 us rwsem:R iterate_dir+0x52
1 9.04 us 9.04 us 9.04 us rwlock:W start_this_handle+0x291
1 8.73 us 8.73 us 8.73 us spinlock raw_spin_rq_lock_nested+0x1e
#
# perf lock contention -abl find / > /dev/null
^C contended total wait max wait avg wait address symbol

1 262.96 ms 262.96 ms 262.96 ms ffff8e67502d0170 (mutex)
12 244.24 us 39.91 us 20.35 us ffff8e6af56f8070 mmap_lock (rwsem)
7 30.28 us 6.85 us 4.33 us ffff8e6c865f1d40 rq_lock (spinlock)
3 7.42 us 4.03 us 2.47 us ffff8e6c864b1d40 rq_lock (spinlock)
2 3.72 us 2.19 us 1.86 us ffff8e6c86571d40 rq_lock (spinlock)
1 2.42 us 2.42 us 2.42 us ffff8e6c86471d40 rq_lock (spinlock)
4 2.11 us 559 ns 527 ns ffffffff9a146c80 rcu_state (spinlock)
3 1.45 us 818 ns 482 ns ffff8e674ae8384c (rwlock)
1 870 ns 870 ns 870 ns ffff8e68456ee060 (rwlock)
1 663 ns 663 ns 663 ns ffff8e6c864f1d40 rq_lock (spinlock)
1 573 ns 573 ns 573 ns ffff8e6c86531d40 rq_lock (spinlock)
1 472 ns 472 ns 472 ns ffff8e6c86431740 (spinlock)
1 397 ns 397 ns 397 ns ffff8e67413a4f04 (spinlock)
#
# perf test offcpu
95: perf record offcpu profiling tests : Ok
#
# perf kwork latency --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
--------------------------------------------------------------------------------------------------------------------------------
(w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s |
(w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s |
(w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s |
(w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s |
(w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s |
(w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s |
(w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s |
(w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s |
(w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s |
(w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s |
(w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s |
(w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s |
(w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s |
<SNIP>
# strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1

Performance counter stats for 'sleep 1':

6,197,983 cycles

1.003922848 seconds time elapsed

0.000000000 seconds user
0.002032000 seconds sys

# head -7 perf-stat-bpf-counters.output
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0
bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4
#

Reported-by: Linus Torvalds <[email protected]>
Suggested-by: Andrii Nakryiko <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Co-developed-by: Jiri Olsa <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/Makefile.perf | 20 +---
tools/perf/util/bpf_skel/.gitignore | 1 -
tools/perf/util/bpf_skel/vmlinux.h | 173 ++++++++++++++++++++++++++++
3 files changed, 174 insertions(+), 20 deletions(-)
create mode 100644 tools/perf/util/bpf_skel/vmlinux.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 48aba186ceb50792..61c33d100b2bcc90 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -1063,25 +1063,7 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
$(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
OUTPUT=$(SKEL_TMP_OUT)/ bootstrap

-VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux) \
- $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
- ../../vmlinux \
- /sys/kernel/btf/vmlinux \
- /boot/vmlinux-$(shell uname -r)
-VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
-
-$(SKEL_OUT)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL)
-ifeq ($(VMLINUX_H),)
- $(QUIET_GEN)$(BPFTOOL) btf dump file $< format c > $@ || \
- (echo "Failure to generate vmlinux.h needed for the recommended BPF skeleton support." && \
- echo "To disable this use the build option NO_BPF_SKEL=1." && \
- echo "Alternatively point at a pre-generated vmlinux.h with VMLINUX_H=<path>." && \
- false)
-else
- $(Q)cp "$(VMLINUX_H)" $@
-endif
-
-$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) $(SKEL_OUT)/vmlinux.h | $(SKEL_TMP_OUT)
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) \
-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@

diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
index cd01455e1b53c3d9..7a1c832825de8445 100644
--- a/tools/perf/util/bpf_skel/.gitignore
+++ b/tools/perf/util/bpf_skel/.gitignore
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
.tmp
*.skel.h
-vmlinux.h
diff --git a/tools/perf/util/bpf_skel/vmlinux.h b/tools/perf/util/bpf_skel/vmlinux.h
new file mode 100644
index 0000000000000000..449b1ea91fc48143
--- /dev/null
+++ b/tools/perf/util/bpf_skel/vmlinux.h
@@ -0,0 +1,173 @@
+#ifndef __VMLINUX_H
+#define __VMLINUX_H
+
+#include <linux/bpf.h>
+#include <linux/types.h>
+#include <linux/perf_event.h>
+#include <stdbool.h>
+
+// non-UAPI kernel data structures, used in the .bpf.c BPF tool component.
+
+// Just the fields used in these tools preserving the access index so that
+// libbpf can fixup offsets with the ones used in the kernel when loading the
+// BPF bytecode, if they differ from what is used here.
+
+typedef __u8 u8;
+typedef __u32 u32;
+typedef __u64 u64;
+typedef __s64 s64;
+
+typedef int pid_t;
+
+enum cgroup_subsys_id {
+ perf_event_cgrp_id = 8,
+};
+
+enum {
+ HI_SOFTIRQ = 0,
+ TIMER_SOFTIRQ,
+ NET_TX_SOFTIRQ,
+ NET_RX_SOFTIRQ,
+ BLOCK_SOFTIRQ,
+ IRQ_POLL_SOFTIRQ,
+ TASKLET_SOFTIRQ,
+ SCHED_SOFTIRQ,
+ HRTIMER_SOFTIRQ,
+ RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */
+
+ NR_SOFTIRQS
+};
+
+typedef struct {
+ s64 counter;
+} __attribute__((preserve_access_index)) atomic64_t;
+
+typedef atomic64_t atomic_long_t;
+
+struct raw_spinlock {
+ int rawlock;
+} __attribute__((preserve_access_index));
+
+typedef struct raw_spinlock raw_spinlock_t;
+
+typedef struct {
+ struct raw_spinlock rlock;
+} __attribute__((preserve_access_index)) spinlock_t;
+
+struct sighand_struct {
+ spinlock_t siglock;
+} __attribute__((preserve_access_index));
+
+struct rw_semaphore {
+ atomic_long_t owner;
+} __attribute__((preserve_access_index));
+
+struct mutex {
+ atomic_long_t owner;
+} __attribute__((preserve_access_index));
+
+struct kernfs_node {
+ u64 id;
+} __attribute__((preserve_access_index));
+
+struct cgroup {
+ struct kernfs_node *kn;
+ int level;
+} __attribute__((preserve_access_index));
+
+struct cgroup_subsys_state {
+ struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
+struct css_set {
+ struct cgroup_subsys_state *subsys[13];
+ struct cgroup *dfl_cgrp;
+} __attribute__((preserve_access_index));
+
+struct mm_struct {
+ struct rw_semaphore mmap_lock;
+} __attribute__((preserve_access_index));
+
+struct task_struct {
+ unsigned int flags;
+ struct mm_struct *mm;
+ pid_t pid;
+ pid_t tgid;
+ char comm[16];
+ struct sighand_struct *sighand;
+ struct css_set *cgroups;
+} __attribute__((preserve_access_index));
+
+struct trace_entry {
+ short unsigned int type;
+ unsigned char flags;
+ unsigned char preempt_count;
+ int pid;
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_entry {
+ struct trace_entry ent;
+ int irq;
+ u32 __data_loc_name;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_irq_handler_exit {
+ struct trace_entry ent;
+ int irq;
+ int ret;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_softirq {
+ struct trace_entry ent;
+ unsigned int vec;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_start {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_execute_end {
+ struct trace_entry ent;
+ void *work;
+ void *function;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_workqueue_activate_work {
+ struct trace_entry ent;
+ void *work;
+ char __data[];
+} __attribute__((preserve_access_index));
+
+struct perf_sample_data {
+ u64 addr;
+ u64 period;
+ union perf_sample_weight weight;
+ u64 txn;
+ union perf_mem_data_src data_src;
+ u64 ip;
+ struct {
+ u32 pid;
+ u32 tid;
+ } tid_entry;
+ u64 time;
+ u64 id;
+ struct {
+ u32 cpu;
+ } cpu_entry;
+ u64 phys_addr;
+ u64 data_page_size;
+ u64 code_page_size;
+} __attribute__((__aligned__(64))) __attribute__((preserve_access_index));
+
+struct bpf_perf_event_data_kern {
+ struct perf_sample_data *data;
+ struct perf_event *event;
+} __attribute__((preserve_access_index));
+#endif // __VMLINUX_H
--
2.39.2

2023-05-05 20:59:31

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

Em Fri, May 05, 2023 at 01:46:30PM -0700, Ian Rogers escreveu:
> On Fri, May 5, 2023 at 1:43 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Fri, May 05, 2023 at 10:04:47AM -0700, Ian Rogers wrote:
> > > On Fri, May 5, 2023 at 9:56 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > Em Fri, May 05, 2023 at 10:33:15AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa escreveu:
> > > > > That with the preserve_access_index isn't needed, we need just the
> > > > > fields that we access in the tools, right?
> > > >
> > > > I'm now build testing this in many distro containers, without the two
> > > > reverts, i.e. BPF skels continue as opt-out as in my pull request, to
> > > > test the build and also to run the functionality tests on the tools
> > > > using such BPF skels, see below, with no touching of vmlinux nor BTF
> > > > data during the build.
> > > >
> > > > - Arnaldo
> > > >
> > > > From 882adaee50bc27f85374aeb2fbaa5b76bef60d05 Mon Sep 17 00:00:00 2001
> > > > From: Arnaldo Carvalho de Melo <[email protected]>
> > > > Date: Thu, 4 May 2023 19:03:51 -0300
> > > > Subject: [PATCH 1/1] perf bpf skels: Stop using vmlinux.h generated from BTF,
> > > > use subset of used structs + CO-RE
> > > >
> > > > Linus reported a build break due to using a vmlinux without a BTF elf
> > > > section to generate the vmlinux.h header with bpftool for use in the BPF
> > > > tools in tools/perf/util/bpf_skel/*.bpf.c.
> > > >
> > > > Instead add a vmlinux.h file with the structs needed with the fields the
> > > > tools need, marking the structs with __attribute__((preserve_access_index)),
> > > > so that libbpf's CO-RE code can fixup the struct field offsets.
> > > >
> > > > In some cases the vmlinux.h file that was being generated by bpftool
> > > > from the kernel BTF information was not needed at all, just including
> > > > linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI
> > > > types were not being used.
> > > >
> > > > To keep the patch small, include those UAPI headers from the trimmed down
> > > > vmlinux.h file, which then provides the tools with just the structs and
> > > > the subset of their fields that the tools need.
> > > >
> > > > Testing it:
> > > >
> > > > # perf lock contention -b find / > /dev/null
> >
> > I tested perf lock con -abv -L rcu_state sleep 1
> > and needed fix below
> >
> > jirka
>
> I thought this was fixed by:
> https://lore.kernel.org/lkml/[email protected]/
> but I think that is just in perf-tools-next.

Nope, we have it in perf-tools:

commit e53de7b65a3ca59af268c78df2d773f277f717fd
Author: Namhyung Kim <[email protected]>
Date: Thu Apr 27 16:48:32 2023 -0700

perf lock contention: Fix struct rq lock access

2023-05-07 19:17:14

by pr-tracker-bot

Subject: Re: [GIT PULL] perf tools changes for v6.4

The pull request you sent on Wed, 3 May 2023 18:18:01 -0300:

> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v6.4-1-2023-05-03

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ecc68ee216c6c5b2f84915e1441adf436f1b019b

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2023-05-08 22:39:32

by Ian Rogers

Subject: Re: [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4

On Fri, May 5, 2023 at 9:56 AM Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Fri, May 05, 2023 at 10:33:15AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, May 05, 2023 at 01:03:14AM +0200, Jiri Olsa escreveu:
> > That with the preserve_access_index isn't needed, we need just the
> > fields that we access in the tools, right?
>
> I'm now build testing this in many distro containers, without the two
> reverts, i.e. BPF skels continue as opt-out as in my pull request, to
> test the build and also to run the functionality tests on the tools
> using such BPF skels, see below, with no touching of vmlinux nor BTF
> data during the build.
>
> - Arnaldo
>
> From 882adaee50bc27f85374aeb2fbaa5b76bef60d05 Mon Sep 17 00:00:00 2001
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Thu, 4 May 2023 19:03:51 -0300
> Subject: [PATCH 1/1] perf bpf skels: Stop using vmlinux.h generated from BTF,
> use subset of used structs + CO-RE
>
> Linus reported a build break due to using a vmlinux without a BTF elf
> section to generate the vmlinux.h header with bpftool for use in the BPF
> tools in tools/perf/util/bpf_skel/*.bpf.c.
>
> Instead add a vmlinux.h file with the structs needed with the fields the
> tools need, marking the structs with __attribute__((preserve_access_index)),
> so that libbpf's CO-RE code can fixup the struct field offsets.
>
> In some cases the vmlinux.h file that was being generated by bpftool
> from the kernel BTF information was not needed at all, just including
> linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI
> types were not being used.
>
> To keep the patch small, include those UAPI headers from the trimmed down
> vmlinux.h file, which then provides the tools with just the structs and
> the subset of their fields that the tools need.
>
> Testing it:
>
> # perf lock contention -b find / > /dev/null
> ^C contended   total wait     max wait     avg wait         type   caller
>
>            7     53.59 us     10.86 us      7.66 us     rwlock:R   start_this_handle+0xa0
>            2     30.35 us     21.99 us     15.17 us      rwsem:R   iterate_dir+0x52
>            1      9.04 us      9.04 us      9.04 us     rwlock:W   start_this_handle+0x291
>            1      8.73 us      8.73 us      8.73 us     spinlock   raw_spin_rq_lock_nested+0x1e
> #
> # perf lock contention -abl find / > /dev/null
> ^C contended   total wait     max wait     avg wait            address   symbol
>
>            1    262.96 ms    262.96 ms    262.96 ms   ffff8e67502d0170   (mutex)
>           12    244.24 us     39.91 us     20.35 us   ffff8e6af56f8070   mmap_lock (rwsem)
>            7     30.28 us      6.85 us      4.33 us   ffff8e6c865f1d40   rq_lock (spinlock)
>            3      7.42 us      4.03 us      2.47 us   ffff8e6c864b1d40   rq_lock (spinlock)
>            2      3.72 us      2.19 us      1.86 us   ffff8e6c86571d40   rq_lock (spinlock)
>            1      2.42 us      2.42 us      2.42 us   ffff8e6c86471d40   rq_lock (spinlock)
>            4      2.11 us       559 ns       527 ns   ffffffff9a146c80   rcu_state (spinlock)
>            3      1.45 us       818 ns       482 ns   ffff8e674ae8384c   (rwlock)
>            1       870 ns       870 ns       870 ns   ffff8e68456ee060   (rwlock)
>            1       663 ns       663 ns       663 ns   ffff8e6c864f1d40   rq_lock (spinlock)
>            1       573 ns       573 ns       573 ns   ffff8e6c86531d40   rq_lock (spinlock)
>            1       472 ns       472 ns       472 ns   ffff8e6c86431740   (spinlock)
>            1       397 ns       397 ns       397 ns   ffff8e67413a4f04   (spinlock)
> #
> # perf test offcpu
> 95: perf record offcpu profiling tests : Ok
> #
> # perf kwork latency --use-bpf
> Starting trace, Hit <Ctrl+C> to stop and report
> ^C
> Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
> --------------------------------------------------------------------------------------------------------------------------------
> (w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s |
> (w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s |
> (w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s |
> (w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s |
> (w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s |
> (w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s |
> (w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s |
> (w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s |
> (w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s |
> (w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s |
> (w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s |
> (w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s |
> (w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s |
> <SNIP>
> # strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 6,197,983 cycles
>
> 1.003922848 seconds time elapsed
>
> 0.000000000 seconds user
> 0.002032000 seconds sys
>
> # head -7 perf-stat-bpf-counters.output
> bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3
> bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0
> bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0
> bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory)
> bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4
> bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4
> bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4
> #
>
> Reported-by: Linus Torvalds <[email protected]>
> Suggested-by: Andrii Nakryiko <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Cc: Ian Rogers <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Co-developed-by: Jiri Olsa <[email protected]>
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> ---
> tools/perf/Makefile.perf | 20 +---
> tools/perf/util/bpf_skel/.gitignore | 1 -
> tools/perf/util/bpf_skel/vmlinux.h | 173 ++++++++++++++++++++++++++++
> 3 files changed, 174 insertions(+), 20 deletions(-)
> create mode 100644 tools/perf/util/bpf_skel/vmlinux.h
>
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index 48aba186ceb50792..61c33d100b2bcc90 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -1063,25 +1063,7 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
> $(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
> OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
>
> -VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux) \
> - $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
> - ../../vmlinux \
> - /sys/kernel/btf/vmlinux \
> - /boot/vmlinux-$(shell uname -r)
> -VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
> -
> -$(SKEL_OUT)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL)
> -ifeq ($(VMLINUX_H),)
> - $(QUIET_GEN)$(BPFTOOL) btf dump file $< format c > $@ || \
> - (echo "Failure to generate vmlinux.h needed for the recommended BPF skeleton support." && \
> - echo "To disable this use the build option NO_BPF_SKEL=1." && \
> - echo "Alternatively point at a pre-generated vmlinux.h with VMLINUX_H=<path>." && \
> - false)
> -else
> - $(Q)cp "$(VMLINUX_H)" $@
> -endif
> -
> -$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) $(SKEL_OUT)/vmlinux.h | $(SKEL_TMP_OUT)
> +$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
> $(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) \
> -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
>
> diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
> index cd01455e1b53c3d9..7a1c832825de8445 100644
> --- a/tools/perf/util/bpf_skel/.gitignore
> +++ b/tools/perf/util/bpf_skel/.gitignore
> @@ -1,4 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> .tmp
> *.skel.h
> -vmlinux.h
> diff --git a/tools/perf/util/bpf_skel/vmlinux.h b/tools/perf/util/bpf_skel/vmlinux.h
> new file mode 100644
> index 0000000000000000..449b1ea91fc48143
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/vmlinux.h
> @@ -0,0 +1,173 @@
> +#ifndef __VMLINUX_H
> +#define __VMLINUX_H
> +
> +#include <linux/bpf.h>
> +#include <linux/types.h>

Is this inclusion tools/include/linux/types.h or
tools/include/uapi/linux/types.h? The former is the norm in the perf
tree:
https://lore.kernel.org/linux-perf-users/CAP-5=fXKi+VAr-_n5tAaJ7Z2fvU7jc5N-CKCjkCAh_01_pzMfA@mail.gmail.com/
and that has the definitions:
typedef uint64_t u64;
typedef int64_t s64;

> +#include <linux/perf_event.h>
> +#include <stdbool.h>
> +
> +// non-UAPI kernel data structures, used in the .bpf.c BPF tool component.
> +
> +// Just the fields used in these tools preserving the access index so that
> +// libbpf can fixup offsets with the ones used in the kernel when loading the
> +// BPF bytecode, if they differ from what is used here.
> +
> +typedef __u8 u8;
> +typedef __u32 u32;
> +typedef __u64 u64;
> +typedef __s64 s64;

which then collide with these two definitions. On my builds this triggers:
error: typedef redefinition with different types ('__u64' (aka
'unsigned long long') vs 'uint64_t' (aka 'unsigned long'))
I'm working around the issue by going back to using a generated vmlinux.h.

Thanks,
Ian
