Hi Linus,
Please consider pulling, there is one patch for updating
the tools/lib/ (edited) copy of rbtree.c and rbtree_augmented.h that is
already in your tree, as I didn't notice that Andrew Morton had picked
it until Stephen Rothwell mentioned it to me, no biggie, I think.
Best regards,
- Arnaldo
The following changes since commit ed30a4a51bb196781c8058073ea720133a65596f:
Linux 6.9-rc5 (2024-04-21 12:35:54 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools.git tags/perf-tools-for-v6.10-1-2024-05-21
for you to fetch changes up to ea558c86248b4955e5c5f3c0c921df450880605e:
tools lib subcmd: Show parent options in help (2024-05-12 21:09:52 -0300)
----------------------------------------------------------------
perf tools fixes and improvements for v6.10:
- Add Kan Liang to MAINTAINERS as a perf tools reviewer.
- Add support for using the 'capstone' disassembler library in various tools,
such as 'perf script' and 'perf annotate'. This is an alternative for the
use of the 'xed' and 'objdump' disassemblers.
- Data-type profiling improvements:
Resolve types for a->b->c by backtracking the assignments until it finds
DWARF info for one of those members
Support for global variables, keeping a cache to speed up lookups.
Handle the 'call' instruction, dealing with effects on registers and handling
its return when tracking register data types.
Handle x86's segment based addressing like %gs:0x28, to support things like
per CPU variables, the stack canary, etc.
Data-type profiling got big speedups when using capstone for disassembling.
The objdump outoput parsing method is left as a fallback when capstone fails or
isn't available. There are patches posted for 6.11 that to use a LLVM
disassembler.
Support event group display in the TUI when annotating types with --data-type,
for instance to show memory load and store events for the data type fields.
Optimize the 'perf annotate' data structures, reducing memory usage.
Add a initial 'perf test' for 'perf annotate', checking that a target symbol
appears on the output, specifying objdump via the command line, etc.
- Integrate the shellcheck utility with the build of perf to allow catching
shell problems early in areas such as 'perf test', 'perf trace' scrape
scripts, etc.
- Add 'uretprobe' variant in the 'perf bench uprobe' tool.
- Add script to run instances of 'perf script' in parallel.
- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems
where those tracepoints aren't available.
Vendor Events:
- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice
Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X,
Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1.
- Add AMD's Zen 5 core and uncore events and metrics. Those come from the
"Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors"
document, with events that capture information on op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc.
- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/AmpereOneX.
Miscellaneous:
- Sync header copies with the kernel sources.
- Move some header copies used only for generating translation string tables
for ioctl cmds and other syscall integer arguments to a new directory under
tools/perf/beauty/, to separate from copies in tools/include/ that are used
to build the tools.
- Introduce scrape script for several syscall 'flags'/'mask' arguments.
- Improve cpumap utilization, fixing up pairing of refcounts, using the right
iterators (perf_cpu_map__for_each_cpu), etc.
- Give more details about raw event encodings in 'perf list', show tracepoint
encoding in the detailed output.
- Refactor the DSOs handling code, reducing memory usage.
- Document the BPF event modifier and add a 'perf test' for it.
- Improve the event parser, better error messages and add further 'perf test's
for it.
- Add reference count checking to 'struct comm_str' and 'struct mem_info'.
- Make ARM64's 'perf test' entries for the Neoverse N1 more robust.
- Tweak the ARM64's Coresight 'perf test's.
- Improve ARM64's CoreSight ETM version detection and error reporting.
- Fix handling of symbols when using kcore.
- Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual
machines in 'perf report'.
- Fix -g/--call-graph option failure in 'perf sched timehist'.
- Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent
installed in non-standard directories, such as when doing cross builds.
- Various 'perf test' and 'perf bench' fixes.
- Improve 'perf probe' error message for long C++ probe names.
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
----------------------------------------------------------------
Adrian Hunter (6):
perf script: Show also errors for --insn-trace option
perf auxtrace: Fix multiple use of --itrace option
perf script: Consolidate capstone print functions
perf record: Fix debug message placement for test consumption
perf scripts python: Add a script to run instances of 'perf script' in parallel
perf intel-pt: Fix unassigned instruction op (discovered by MemorySanitizer)
Andi Kleen (2):
perf script: Support 32bit code under 64bit OS with capstone
perf script: Add capstone support for '-F +brstackdisasm'
Anne Macedo (1):
perf lock contention: Trim backtrace by skipping traceiter functions
Arnaldo Carvalho de Melo (36):
perf trace: Collect sys_nanosleep first argument
perf beauty: Fix dependency of tables using uapi/linux/mount.h
perf beauty: Move uapi/linux/fs.h copy out of the directory used to build perf
perf beauty: Don't include uapi/linux/mount.h, use sys/mount.h instead
perf beauty: Move uapi/linux/mount.h copy out of the directory used to build perf
perf beauty: Move uapi/linux/usbdevice_fs.h copy out of the directory used to build perf
perf beauty: Move uapi/sound/asound.h copy out of the directory used to build perf
perf beauty: Move arch/x86/include/asm/irq_vectors.h copy out of the directory used to build perf
perf beauty: Stop using the copy of uapi/linux/prctl.h
perf beauty: Move prctl.h files (uapi/linux and x86's) copy out of the directory used to build perf
perf beauty: Use the system linux/fcntl.h instead of a copy from the kernel
tools headers: Remove now unused copies of uapi/{fcntl,openat2}.h and asm/fcntl.h
tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines
perf beauty: Introduce scrape script for 'clone' syscall 'flags' argument
perf beauty: Introduce scrape script for various fs syscalls 'flags' arguments
perf beauty: Introduce scrape script for the 'statx' syscall 'mask' argument
perf beauty: Introduce faccessat2 flags scnprintf routine
perf trace: Beautify the 'flags' arg of unlinkat
perf trace: Fix 'newfstatat'/'fstatat' argument pretty printing
perf probe: Add missing libgen.h header needed for using basename()
perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf
perf tools: Add Kan Liang to MAINTAINERS as a reviewer
perf annotate: Initialize 'arch' variable not to trip some -Werror=maybe-uninitialized
Revert "tools headers: Remove almost unused copy of uapi/stat.h, add few conditional defines"
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
tools include UAPI: Sync linux/vhost.h with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf tests shell kprobes: Add missing description as used by 'perf test' output
tools lib rbtree: Pick some improvements from the kernel rbtree code
tools headers x86 cpufeatures: Sync with the kernel sources to pick BHI mitigation changes
tools headers: Synchronize linux/bits.h with the kernel sources
perf test: Reintroduce -p/--parallel and make -S/--sequential the default
perf annotate: Use zfree() to avoid possibly accessing dangling pointers
perf callchain: Use zfree() to avoid possibly accessing dangling pointers
perf kwork: Use zfree() to avoid possibly accessing dangling pointers
perf probe: Use zfree() to avoid possibly accessing dangling pointers
Athira Rajeev (1):
perf annotate: Fix a comment about multi_regs in extract_reg_offset function
Bhaskar Chowdhury (1):
perf c2c: Fix a punctuation
Chaitanya S Prakash (1):
perf tools: Enable configs required for test_uprobe_from_different_cu.sh
Chen Pei (1):
perf genelf: Fix compiling with libelf on rv32
Dima Kogan (2):
perf probe-event: Un-hardcode sizeof(buf)
perf probe-event: Better error message for a too-long probe name
Dominique Martinet (3):
perf parse-events: pass parse_state to add_tracepoint
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse: Allow tracepoint names to start with digits
Ethan Adams (1):
perf build: Fix out of tree build related to installation of sysreg-defs
He Zhe (1):
perf bench internals inject-build-id: Fix trap divide when collecting just one DSO
Howard Chu (1):
perf record: Fix comment misspellings
Ian Rogers (100):
libperf cpumap: Add any, empty and min helpers
libperf cpumap: Ensure empty cpumap is NULL from alloc
perf arm-spe/cs-etm: Directly iterate CPU maps
perf intel-pt/intel-bts: Switch perf_cpu_map__has_any_cpu_or_is_empty use
perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty
perf arm64 header: Remove unnecessary CPU map get and put
perf stat: Remove duplicate cpus_map_matched function
perf cpumap: Use perf_cpu_map__for_each_cpu when possible
perf list: Add tracepoint encoding to detailed output
perf pmu: Drop "default_core" from alias names
perf list: Allow wordwrap to wrap on commas
perf list: Give more details about raw event encodings
perf tools: Use pmus to describe type from attribute
perf tools: Add/use PMU reverse lookup from config to name
perf record: Delete session after stopping sideband thread
perf test: Stat output per thread of just the parent process
perf test: Use a single fd for the child process out/err
perf test: Read child test 10 times a second rather than 1
perf tools: Suggest inbuilt commands for unknown command
perf help: Lower levenshtein penality for deleting character
perf tests: Run tests in parallel by default
perf vendor events intel: Update cascadelakex to 1.21
perf vendor events intel: Update emeraldrapids to 1.06
perf vendor events intel: Update grandridge to 1.02
perf vendor events intel: Update icelakex to 1.24
perf vendor events intel: Update lunarlake to 1.01
perf vendor events intel: Update meteorlake to 1.08
perf vendor events intel: Update sapphirerapids to 1.20
perf vendor events intel: Update sierraforest to 1.02
perf vendor events intel: Update skylakex to 1.33
perf vendor events intel: Update skylake to v58
perf vendor events intel: Update snowridgex to 1.22
perf vendor events intel: Remove info metrics erroneously in TopdownL1
perf dso: Reorder members to save space in 'struct dso'
tools subcmd: Add check_if_command_finished()
perf metrics: Remove the "No_group" metric group
perf dsos: Attempt to better abstract DSOs internals
perf dsos: Tidy reference counting and locking
perf dsos: Introduce dsos__for_each_dso()
perf dso: Move dso functions out of dsos.c
perf dsos: Switch more loops to dsos__for_each_dso()
perf list: Escape '\r' in JSON output
perf build: Add shellcheck to tools/perf scripts
perf arch x86: Add shellcheck to build
perf util: Add shellcheck to generate-cmdlist.sh
perf trace beauty: Add shellcheck to scripts
perf bench uprobe: Remove lib64 from libc.so.6 binary path
perf bench uprobe: Add uretprobe variant of uprobe benchmarks
perf docs: Document bpf event modifier
perf test bpf-counters: Add test for BPF event modifier
perf parse-events: Factor out '<event_or_pmu>/.../' parsing
perf parse-events: Directly pass PMU to parse_events_add_pmu()
perf parse-events: Avoid copying an empty list
perf pmu: Refactor perf_pmu__match()
perf tests parse-events: Use "branches" rather than "cache-references"
perf parse-events: Legacy cache names on all PMUs and lower priority
perf parse-events: Handle PE_TERM_HW in name_or_raw
perf parse-events: Constify parse_events_add_numeric
perf parse-events: Prefer sysfs/JSON hardware events over legacy
perf parse-events: Inline parse_events_update_lists
perf parse-events: Improve error message for bad numbers
perf parse-events: Inline parse_events_evlist_error
perf parse-events: Improvements to modifier parsing
perf parse-event: Constify event_symbol arrays
perf parse-events: Minor grouping tidy up
perf parse-events: Tidy the setting of the default event name
perf build: Pretend scandirat is missing with msan
perf test pmu-events: Make it clearer that pmu-events tests JSON events
perf Document: Sysfs event names must be lower or upper case
perf test pmu: Refactor format test and exposed test APIs
perf test pmu: Add an eagerly loaded event test
perf test pmu: Test all sysfs PMU event names are the same case
perf pmu: Assume sysfs events are always the same case
perf trace: Disable syscall augmentation with record
perf dsos: Switch backing storage to array from rbtree/list
perf dsos: Remove __dsos__addnew()
perf dsos: Remove __dsos__findnew_link_by_longname_id()
perf dsos: Switch hand crafted code to bsearch()
perf dso: Add reference count checking and accessor functions
perf map: Add missing dso__put() in map__new()
perf symbol-elf: Ensure dso__put() in machine__process_ksymbol_register()
perf symbol-elf: dso__load_sym_internal() reference count fixes
perf dso: Use container_of() to avoid a pointer in 'struct dso_data'
perf ui browser: Don't save pointer to stack memory
perf annotate: Fix memory leak in annotated_source
perf block-info: Remove unused refcount
perf cpumap: Remove refcnt from 'struct cpu_aggr_map'
perf comm: Add reference count checking to 'struct comm_str'
perf mem-info: Move mem-info out of mem-events and symbol
perf mem-info: Add reference count checking
perf hist: Avoid 'struct hist_entry_iter' mem_info memory leak
perf ui browser: Avoid SEGV on title
perf comm: Fix comm_str__put() for reference count checking
perf report: Avoid SEGV in report__setup_sample_type()
perf thread: Fixes to thread__new() related to initializing comm
perf tracepoint: Don't scan all tracepoints to test if one exists
perf lock: Avoid memory leaks from strdup()
libsubcmd: Fix parse-options memory leak
perf stat: Don't display metric header for non-leader uncore events
perf pmu: Count sys and cpuid JSON events separately
Ilkka Koskinen (1):
perf vendor events arm64: AmpereOne/AmpereOneX: Mark L1D_CACHE_INVAL impacted by errata
James Clark (16):
perf docs arm_spe: Clarify more SPE requirements related to KPTI
perf tests: Make "test data symbol" more robust on Neoverse N1
perf tests: Apply attributes to all events in object code reading test
perf map: Remove kernel map before updating start and end addresses
perf tests: Remove dependency on lscpu
perf test shell arm_coresight: Increase buffer size for Coresight basic tests
perf cs-etm: Use struct perf_cpu as much as possible
perf cs-etm: Remove repeated fetches of the ETM PMU
perf cs-etm: Improve version detection and error reporting
perf cs-etm: Print error for new PERF_RECORD_AUX_OUTPUT_HW_ID versions
perf auxtrace: Allow number of queues to be specified
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf symbols: Remove map from list before updating addresses
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Update kcore map before merging in remaining symbols
perf symbols: Fix ownership of string in dso__load_vmlinux()
Madadi Vineeth Reddy (1):
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
Namhyung Kim (74):
perf dwarf-aux: Remove unused pc argument
perf dwarf-aux: Add die_collect_vars()
perf dwarf-aux: Handle type transfer for memory access
perf dwarf-aux: Add die_find_func_rettype()
perf map: Add map__objdump_2rip()
perf annotate-data: Introduce 'struct data_loc_info'
perf annotate: Add annotate_get_basic_blocks()
perf annotate-data: Add debug messages
perf annotate-data: Maintain variable type info
perf annotate-data: Add update_insn_state()
perf annotate-data: Add get_global_var_type()
perf annotate-data: Handle global variable access
perf annotate-data: Handle call instructions
perf annotate-data: Implement instruction tracking
perf annotate-data: Check register state for type
perf annotate: Parse x86 segment register location
perf annotate-data: Handle this-cpu variables in kernel
perf annotate-data: Track instructions with a this-cpu variable
perf annotate-data: Support general per-cpu access
perf annotate-data: Handle ADD instructions
perf annotate-data: Add stack canary type
perf annotate-data: Add a cache for global variable types
perf annotate-data: Do not retry for invalid types
perf annotate: Get rid of duplicate --group option item
perf annotate: Honor output options with --data-type
perf annotate: Use ins__is_xxx() if possible
perf annotate: Add and use ins__is_nop()
perf annotate: Split out util/disasm.c
perf annotate: Use libcapstone to disassemble
perf annotate: Add symbol name when using capstone
perf annotate: Fix annotation_calc_lines() to pass correct address to get_srcline()
perf annotate: Staticize some local functions
perf annotate: Introduce annotated_source__get_line()
perf annotate: Check annotation lines more efficiently
perf annotate: Get rid of offsets array
perf annotate: Move 'widths' struct to 'struct annotated_source'
perf annotate: Move 'max_jump_sources' struct to 'struct annotated_source'
perf annotate: Move nr_events struct to 'struct annotated_source'
perf annotate: Move 'start' field struct to 'struct annotated_source'
perf annotate-data: Fix global variable lookup
perf annotate-data: Do not delete non-asm lines
perf annotate: Get rid of symbol__ensure_annotate()
perf annotate-data: Skip sample histogram for stack canary
perf annotate: Show progress of sample processing
perf annotate-data: Add hist_entry__annotate_data_tty()
perf annotate-data: Add hist_entry__annotate_data_tui()
perf annotate-data: Support event group display in TUI
perf report: Add a menu item to annotate data type in TUI
perf report: Do not collect sample histogram unnecessarily
perf annotate: Skip DSOs not found
perf annotate-data: Improve debug message with location info
perf dwarf-aux: Check pointer offset when checking variables
perf dwarf-aux: Check variable address range properly
perf annotate-data: Handle RSP if it's not the FB register
perf hist: Move histogram related code to hist.h
perf hist: Add weight fields to hist entry stats
perf report: Add weight[123] output fields
perf test: Add a new test for 'perf annotate'
perf annotate-data: Check if 'struct annotation_source' was allocated on 'perf report' TUI
perf annotate: Fallback disassemble to objdump when capstone fails
perf annotate: Update DSO binary type when trying build-id
perf annotate: Fix data type profiling on stdio
perf dwarf-aux: Add die_collect_global_vars()
perf annotate-data: Collect global variables in advance
perf annotate-data: Handle direct global variable access
perf annotate-data: Check memory access with two registers
perf annotate-data: Handle multi regs in find_data_type_block()
perf annotate-data: Check kind of stack variables
perf maps: Remove check_invariants() from maps__lock()
perf dwarf-aux: Print array type name with "[]"
perf tools: Ignore deleted cgroups
perf annotate: Fix segfault on sample histogram
perf annotate-data: Ensure the number of type histograms
tools lib subcmd: Show parent options in help
Samasth Norway Ananda (1):
perf daemon: Fix file leak in daemon_session__control
Sandipan Das (4):
perf vendor events amd: Add Zen 5 core events
perf vendor events amd: Add Zen 5 uncore events
perf vendor events amd: Add Zen 5 metrics
perf vendor events amd: Add Zen 5 mapping
Thomas Richter (2):
perf report: Fix PAI counter names for s390 virtual machines
perf stat: Do not fail on metrics on s390 z/VM systems
Weilin Wang (1):
perf stat: Add new field in stat_config to enable hardware aware grouping
Yang Jihong (4):
perf sched timehist: Fix -g/--call-graph option failure
perf evsel: Use evsel__name_is() helper
perf beauty: Fix AT_EACCESS undeclared build error for system with kernel versions lower than v5.8
perf build: Add LIBTRACEEVENT_DIR build option
.../testing/sysfs-bus-event_source-devices-events | 6 +
MAINTAINERS | 1 +
tools/arch/x86/include/asm/cpufeatures.h | 7 +-
tools/arch/x86/include/asm/msr-index.h | 9 +-
tools/include/linux/bits.h | 8 +-
tools/include/linux/rbtree_augmented.h | 4 +-
tools/include/uapi/asm-generic/bitsperlong.h | 4 +
tools/include/uapi/asm-generic/fcntl.h | 221 --
tools/include/uapi/linux/bits.h | 15 +
tools/include/uapi/linux/openat2.h | 43 -
tools/lib/perf/cpumap.c | 33 +-
tools/lib/perf/include/perf/cpumap.h | 16 +
tools/lib/perf/libperf.map | 4 +
tools/lib/perf/mmap.c | 2 +-
tools/lib/rbtree.c | 2 +-
tools/lib/subcmd/parse-options.c | 44 +-
tools/lib/subcmd/run-command.c | 70 +-
tools/lib/subcmd/run-command.h | 3 +
tools/perf/Build | 14 +
tools/perf/Documentation/perf-arm-spe.txt | 12 +-
tools/perf/Documentation/perf-list.txt | 1 +
tools/perf/Documentation/perf-report.txt | 9 +-
tools/perf/Documentation/perf-sched.txt | 36 +
tools/perf/Documentation/perf-script.txt | 7 +-
tools/perf/Documentation/perf-test.txt | 13 +-
tools/perf/Makefile.config | 25 +-
tools/perf/Makefile.perf | 100 +-
tools/perf/arch/arm/util/cs-etm.c | 381 ++--
tools/perf/arch/arm64/util/arm-spe.c | 4 +-
tools/perf/arch/arm64/util/header.c | 13 +-
tools/perf/arch/x86/Build | 14 +
tools/perf/arch/x86/tests/Build | 14 +
tools/perf/arch/x86/tests/gen-insn-x86-dat.sh | 2 +-
tools/perf/arch/x86/util/intel-bts.c | 4 +-
tools/perf/arch/x86/util/intel-pt.c | 10 +-
tools/perf/bench/bench.h | 2 +
tools/perf/bench/inject-buildid.c | 2 +-
tools/perf/bench/uprobe.c | 22 +-
tools/perf/builtin-annotate.c | 128 +-
tools/perf/builtin-bench.c | 2 +
tools/perf/builtin-buildid-cache.c | 2 +-
tools/perf/builtin-buildid-list.c | 18 +-
tools/perf/builtin-c2c.c | 21 +-
tools/perf/builtin-daemon.c | 4 +-
tools/perf/builtin-inject.c | 96 +-
tools/perf/builtin-kallsyms.c | 2 +-
tools/perf/builtin-kmem.c | 2 +-
tools/perf/builtin-kwork.c | 2 +-
tools/perf/builtin-list.c | 24 +-
tools/perf/builtin-lock.c | 18 +-
tools/perf/builtin-mem.c | 4 +-
tools/perf/builtin-probe.c | 2 +-
tools/perf/builtin-record.c | 14 +-
tools/perf/builtin-report.c | 18 +-
tools/perf/builtin-sched.c | 13 +-
tools/perf/builtin-script.c | 86 +-
tools/perf/builtin-stat.c | 52 +-
tools/perf/builtin-top.c | 4 +-
tools/perf/builtin-trace.c | 35 +-
tools/perf/builtin.h | 4 +-
tools/perf/check-headers.sh | 23 +-
tools/perf/perf-archive.sh | 2 +-
tools/perf/perf-completion.sh | 23 +-
tools/perf/perf.c | 23 +-
.../arch/arm64/ampere/ampereone/cache.json | 4 +-
.../arch/arm64/ampere/ampereonex/cache.json | 4 +-
.../pmu-events/arch/s390/cf_z16/transaction.json | 28 +-
tools/perf/pmu-events/arch/s390/mapfile.csv | 2 +-
.../arch/x86/amdzen5/branch-prediction.json | 93 +
tools/perf/pmu-events/arch/x86/amdzen5/decode.json | 115 +
.../pmu-events/arch/x86/amdzen5/execution.json | 174 ++
.../arch/x86/amdzen5/floating-point.json | 812 +++++++
.../pmu-events/arch/x86/amdzen5/inst-cache.json | 72 +
.../perf/pmu-events/arch/x86/amdzen5/l2-cache.json | 266 +++
.../perf/pmu-events/arch/x86/amdzen5/l3-cache.json | 177 ++
.../pmu-events/arch/x86/amdzen5/load-store.json | 451 ++++
.../arch/x86/amdzen5/memory-controller.json | 101 +
.../perf/pmu-events/arch/x86/amdzen5/pipeline.json | 99 +
.../pmu-events/arch/x86/amdzen5/recommended.json | 345 +++
.../arch/x86/broadwellx/bdx-metrics.json | 35 +-
.../arch/x86/cascadelakex/clx-metrics.json | 85 +-
.../pmu-events/arch/x86/cascadelakex/frontend.json | 10 +-
.../pmu-events/arch/x86/cascadelakex/memory.json | 2 +-
.../pmu-events/arch/x86/cascadelakex/other.json | 2 +-
.../pmu-events/arch/x86/cascadelakex/pipeline.json | 2 +-
.../arch/x86/cascadelakex/uncore-interconnect.json | 14 +-
.../arch/x86/cascadelakex/virtual-memory.json | 2 +-
.../arch/x86/emeraldrapids/frontend.json | 2 +-
.../pmu-events/arch/x86/emeraldrapids/memory.json | 1 +
.../arch/x86/emeraldrapids/pipeline.json | 3 +
.../arch/x86/emeraldrapids/uncore-cache.json | 112 +-
.../x86/emeraldrapids/uncore-interconnect.json | 26 +-
.../pmu-events/arch/x86/grandridge/pipeline.json | 43 +-
.../arch/x86/grandridge/uncore-cache.json | 28 +-
.../pmu-events/arch/x86/haswellx/hsx-metrics.json | 35 +-
.../pmu-events/arch/x86/icelakex/frontend.json | 2 +-
.../pmu-events/arch/x86/icelakex/icx-metrics.json | 95 +-
.../perf/pmu-events/arch/x86/icelakex/memory.json | 1 +
.../pmu-events/arch/x86/icelakex/uncore-cache.json | 22 +-
.../arch/x86/icelakex/uncore-interconnect.json | 64 +-
.../pmu-events/arch/x86/icelakex/uncore-io.json | 11 -
.../perf/pmu-events/arch/x86/lunarlake/cache.json | 24 +-
.../pmu-events/arch/x86/lunarlake/frontend.json | 2 +-
.../perf/pmu-events/arch/x86/lunarlake/memory.json | 4 +-
.../perf/pmu-events/arch/x86/lunarlake/other.json | 4 +-
.../pmu-events/arch/x86/lunarlake/pipeline.json | 109 +-
tools/perf/pmu-events/arch/x86/mapfile.csv | 21 +-
.../perf/pmu-events/arch/x86/meteorlake/cache.json | 30 +
.../pmu-events/arch/x86/meteorlake/frontend.json | 4 +-
.../pmu-events/arch/x86/meteorlake/memory.json | 20 +
.../perf/pmu-events/arch/x86/meteorlake/other.json | 42 +-
.../pmu-events/arch/x86/meteorlake/pipeline.json | 44 +-
.../arch/x86/meteorlake/uncore-interconnect.json | 22 +-
.../pmu-events/arch/x86/sapphirerapids/cache.json | 1 +
.../arch/x86/sapphirerapids/frontend.json | 2 +-
.../pmu-events/arch/x86/sapphirerapids/memory.json | 1 +
.../arch/x86/sapphirerapids/pipeline.json | 19 +-
.../arch/x86/sapphirerapids/spr-metrics.json | 119 +-
.../arch/x86/sapphirerapids/uncore-cache.json | 112 +-
.../x86/sapphirerapids/uncore-interconnect.json | 26 +-
.../pmu-events/arch/x86/sierraforest/pipeline.json | 36 +-
.../perf/pmu-events/arch/x86/skylake/frontend.json | 10 +-
tools/perf/pmu-events/arch/x86/skylakex/cache.json | 9 +
.../pmu-events/arch/x86/skylakex/frontend.json | 10 +-
.../perf/pmu-events/arch/x86/skylakex/memory.json | 2 +-
tools/perf/pmu-events/arch/x86/skylakex/other.json | 2 +-
.../pmu-events/arch/x86/skylakex/pipeline.json | 2 +-
.../pmu-events/arch/x86/skylakex/skx-metrics.json | 85 +-
.../arch/x86/skylakex/uncore-interconnect.json | 14 +-
.../pmu-events/arch/x86/skylakex/uncore-io.json | 2 +-
.../arch/x86/skylakex/virtual-memory.json | 2 +-
.../arch/x86/snowridgex/uncore-cache.json | 4 +-
.../arch/x86/snowridgex/uncore-interconnect.json | 6 +-
.../pmu-events/arch/x86/snowridgex/uncore-io.json | 11 -
tools/perf/scripts/python/parallel-perf.py | 988 ++++++++
tools/perf/tests/bitmap.c | 13 +-
tools/perf/tests/builtin-test.c | 62 +-
tools/perf/tests/code-reading.c | 18 +-
tools/perf/tests/config-fragments/config | 3 +
tools/perf/tests/dso-data.c | 67 +-
tools/perf/tests/evsel-roundtrip-name.c | 4 +-
tools/perf/tests/hists_common.c | 6 +-
tools/perf/tests/hists_cumulate.c | 4 +-
tools/perf/tests/hists_output.c | 2 +-
tools/perf/tests/maps.c | 4 +-
tools/perf/tests/mem.c | 11 +-
tools/perf/tests/parse-events.c | 58 +-
tools/perf/tests/pmu-events.c | 4 +-
tools/perf/tests/pmu.c | 467 ++--
tools/perf/tests/shell/annotate.sh | 83 +
.../tests/shell/base_probe/test_adding_kernel.sh | 1 +
tools/perf/tests/shell/lib/stat_output.sh | 2 +-
tools/perf/tests/shell/script.sh | 26 +-
tools/perf/tests/shell/stat+json_output.sh | 2 +-
tools/perf/tests/shell/stat_bpf_counters.sh | 75 +-
tools/perf/tests/shell/test_arm_callgraph_fp.sh | 4 +-
tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
tools/perf/tests/symbols.c | 8 +-
tools/perf/tests/topology.c | 46 +-
tools/perf/tests/vmlinux-kallsyms.c | 6 +-
tools/perf/tests/workloads/datasym.c | 16 +
tools/perf/trace/beauty/Build | 15 +
.../beauty}/arch/x86/include/asm/irq_vectors.h | 0
.../beauty}/arch/x86/include/uapi/asm/prctl.h | 0
tools/perf/trace/beauty/arch_errno_names.sh | 8 +-
tools/perf/trace/beauty/beauty.h | 7 +-
tools/perf/trace/beauty/clone.c | 46 +-
tools/perf/trace/beauty/clone.sh | 17 +
tools/perf/trace/beauty/fcntl.c | 2 +-
tools/perf/trace/beauty/flock.c | 2 +-
tools/perf/trace/beauty/fs_at_flags.c | 58 +
tools/perf/trace/beauty/fs_at_flags.sh | 21 +
tools/perf/trace/beauty/fsconfig.sh | 6 +-
tools/perf/trace/beauty/fsmount.c | 9 +-
tools/perf/trace/beauty/fsmount.sh | 6 +-
tools/perf/trace/beauty/fspick.sh | 6 +-
.../trace/beauty}/include/uapi/linux/fcntl.h | 0
.../trace/beauty}/include/uapi/linux/fs.h | 0
.../trace/beauty}/include/uapi/linux/mount.h | 0
.../trace/beauty}/include/uapi/linux/prctl.h | 0
.../trace/beauty}/include/uapi/linux/sched.h | 0
tools/perf/trace/beauty/include/uapi/linux/stat.h | 195 ++
.../beauty}/include/uapi/linux/usbdevice_fs.h | 0
.../trace/beauty}/include/uapi/linux/vhost.h | 20 +-
.../trace/beauty}/include/uapi/sound/asound.h | 0
tools/perf/trace/beauty/mount_flags.sh | 6 +-
tools/perf/trace/beauty/move_mount_flags.sh | 6 +-
tools/perf/trace/beauty/prctl.c | 2 +-
tools/perf/trace/beauty/prctl_option.sh | 6 +-
tools/perf/trace/beauty/rename_flags.sh | 2 +-
tools/perf/trace/beauty/sndrv_ctl_ioctl.sh | 4 +-
tools/perf/trace/beauty/sndrv_pcm_ioctl.sh | 4 +-
tools/perf/trace/beauty/statx.c | 67 +-
tools/perf/trace/beauty/statx_mask.sh | 23 +
tools/perf/trace/beauty/sync_file_range.c | 11 +-
tools/perf/trace/beauty/sync_file_range.sh | 2 +-
.../trace/beauty/tracepoints/x86_irq_vectors.sh | 6 +-
tools/perf/trace/beauty/usbdevfs_ioctl.sh | 6 +-
tools/perf/trace/beauty/vhost_virtio_ioctl.sh | 6 +-
tools/perf/trace/beauty/x86_arch_prctl.sh | 4 +-
tools/perf/ui/browser.c | 6 +-
tools/perf/ui/browser.h | 2 +-
tools/perf/ui/browsers/Build | 1 +
tools/perf/ui/browsers/annotate-data.c | 313 +++
tools/perf/ui/browsers/annotate.c | 21 +-
tools/perf/ui/browsers/hists.c | 39 +-
tools/perf/ui/browsers/map.c | 4 +-
tools/perf/ui/hist.c | 92 +-
tools/perf/util/Build | 16 +
tools/perf/util/annotate-data.c | 1648 +++++++++++++-
tools/perf/util/annotate-data.h | 74 +-
tools/perf/util/annotate.c | 2378 +++++---------------
tools/perf/util/annotate.h | 129 +-
tools/perf/util/auxtrace.c | 19 +-
tools/perf/util/auxtrace.h | 1 +
tools/perf/util/block-info.c | 24 +-
tools/perf/util/block-info.h | 15 +-
tools/perf/util/bpf-event.c | 8 +-
tools/perf/util/bpf_counter_cgroup.c | 5 +-
tools/perf/util/bpf_kwork.c | 16 +-
tools/perf/util/bpf_kwork_top.c | 12 +-
.../util/bpf_skel/augmented_raw_syscalls.bpf.c | 21 +
tools/perf/util/bpf_skel/bench_uprobe.bpf.c | 16 +
tools/perf/util/build-id.c | 136 +-
tools/perf/util/build-id.h | 2 -
tools/perf/util/callchain.c | 4 +-
tools/perf/util/cgroup.c | 4 +-
tools/perf/util/comm.c | 207 +-
tools/perf/util/cpumap.c | 14 +-
tools/perf/util/cpumap.h | 2 -
tools/perf/util/cs-etm.c | 5 +-
tools/perf/util/data-convert-json.c | 2 +-
tools/perf/util/db-export.c | 6 +-
tools/perf/util/debug.c | 3 +
tools/perf/util/debug.h | 1 +
tools/perf/util/disasm.c | 1837 +++++++++++++++
tools/perf/util/disasm.h | 112 +
tools/perf/util/dlfilter.c | 12 +-
tools/perf/util/dso.c | 472 ++--
tools/perf/util/dso.h | 560 ++++-
tools/perf/util/dsos.c | 529 +++--
tools/perf/util/dsos.h | 40 +-
tools/perf/util/dump-insn.h | 1 +
tools/perf/util/dwarf-aux.c | 442 +++-
tools/perf/util/dwarf-aux.h | 41 +-
tools/perf/util/event.c | 8 +-
tools/perf/util/evlist.c | 3 +-
tools/perf/util/evsel.c | 20 +-
tools/perf/util/evsel.h | 4 +-
tools/perf/util/genelf.h | 3 +
tools/perf/util/header.c | 8 +-
tools/perf/util/help-unknown-cmd.c | 51 +-
tools/perf/util/hist.c | 78 +-
tools/perf/util/hist.h | 217 +-
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +
tools/perf/util/intel-pt.c | 24 +-
tools/perf/util/machine.c | 227 +-
tools/perf/util/machine.h | 4 +-
tools/perf/util/map.c | 91 +-
tools/perf/util/map.h | 3 +
tools/perf/util/maps.c | 44 +-
tools/perf/util/mem-events.c | 36 +-
tools/perf/util/mem-events.h | 29 +-
tools/perf/util/mem-info.c | 35 +
tools/perf/util/mem-info.h | 54 +
tools/perf/util/metricgroup.c | 10 +-
tools/perf/util/metricgroup.h | 1 +
tools/perf/util/parse-events.c | 522 ++---
tools/perf/util/parse-events.h | 60 +-
tools/perf/util/parse-events.l | 200 +-
tools/perf/util/parse-events.y | 263 +--
tools/perf/util/perf_event_attr_fprintf.c | 26 +-
tools/perf/util/pmu.c | 291 ++-
tools/perf/util/pmu.h | 16 +-
tools/perf/util/pmus.c | 110 +-
tools/perf/util/pmus.h | 3 +
tools/perf/util/print-events.c | 55 +-
tools/perf/util/print_insn.c | 75 +-
tools/perf/util/print_insn.h | 8 +-
tools/perf/util/probe-event.c | 32 +-
tools/perf/util/python.c | 10 +
tools/perf/util/record.c | 2 +-
.../perf/util/scripting-engines/trace-event-perl.c | 6 +-
.../util/scripting-engines/trace-event-python.c | 45 +-
tools/perf/util/session.c | 26 +-
tools/perf/util/session.h | 2 +
tools/perf/util/sort.c | 116 +-
tools/perf/util/sort.h | 190 +-
tools/perf/util/srcline.c | 65 +-
tools/perf/util/stat-display.c | 3 +
tools/perf/util/stat.c | 2 +-
tools/perf/util/stat.h | 1 +
tools/perf/util/svghelper.c | 20 +-
tools/perf/util/symbol-elf.c | 145 +-
tools/perf/util/symbol-minimal.c | 4 +-
tools/perf/util/symbol.c | 261 +--
tools/perf/util/symbol.h | 12 -
tools/perf/util/symbol_fprintf.c | 4 +-
tools/perf/util/synthetic-events.c | 24 +-
tools/perf/util/thread.c | 18 +-
tools/perf/util/tracepoint.c | 56 +-
tools/perf/util/tracepoint.h | 3 +-
tools/perf/util/unwind-libunwind-local.c | 18 +-
tools/perf/util/unwind-libunwind.c | 2 +-
tools/perf/util/values.h | 1 +
tools/perf/util/vdso.c | 56 +-
306 files changed, 15214 insertions(+), 6278 deletions(-)
delete mode 100644 tools/include/uapi/asm-generic/fcntl.h
create mode 100644 tools/include/uapi/linux/bits.h
delete mode 100644 tools/include/uapi/linux/openat2.h
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/branch-prediction.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/decode.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/execution.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/floating-point.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/l2-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/l3-cache.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/load-store.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/memory-controller.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/pipeline.json
create mode 100644 tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
create mode 100755 tools/perf/scripts/python/parallel-perf.py
create mode 100755 tools/perf/tests/shell/annotate.sh
rename tools/{ => perf/trace/beauty}/arch/x86/include/asm/irq_vectors.h (100%)
rename tools/{ => perf/trace/beauty}/arch/x86/include/uapi/asm/prctl.h (100%)
create mode 100755 tools/perf/trace/beauty/clone.sh
create mode 100644 tools/perf/trace/beauty/fs_at_flags.c
create mode 100755 tools/perf/trace/beauty/fs_at_flags.sh
rename tools/{ => perf/trace/beauty}/include/uapi/linux/fcntl.h (100%)
rename tools/{ => perf/trace/beauty}/include/uapi/linux/fs.h (100%)
rename tools/{ => perf/trace/beauty}/include/uapi/linux/mount.h (100%)
rename tools/{ => perf/trace/beauty}/include/uapi/linux/prctl.h (100%)
rename tools/{ => perf/trace/beauty}/include/uapi/linux/sched.h (100%)
create mode 100644 tools/perf/trace/beauty/include/uapi/linux/stat.h
rename tools/{ => perf/trace/beauty}/include/uapi/linux/usbdevice_fs.h (100%)
rename tools/{ => perf/trace/beauty}/include/uapi/linux/vhost.h (97%)
rename tools/{ => perf/trace/beauty}/include/uapi/sound/asound.h (100%)
create mode 100755 tools/perf/trace/beauty/statx_mask.sh
create mode 100644 tools/perf/ui/browsers/annotate-data.c
create mode 100644 tools/perf/util/disasm.c
create mode 100644 tools/perf/util/disasm.h
create mode 100644 tools/perf/util/mem-info.c
create mode 100644 tools/perf/util/mem-info.h
The pull request you sent on Tue, 21 May 2024 16:26:14 -0300:
> git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools.git tags/perf-tools-for-v6.10-1-2024-05-21
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/29c73fc794c83505066ee6db893b2a83ac5fac63
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
On Tue, 21 May 2024 at 12:26, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> perf tools fixes and improvements for v6.10:
This actually broke 'perf' completely for me on arm64.
With a 6.9 version of 'perf', I can do this:
perf record -e cycles:pp make -j199
and it all works fine.
With the current -git version, when I do the same, I instead get
Error:
cycles:pp: PMU Hardware doesn't support
sampling/overflow-interrupts. Try 'perf stat'
and after trying desperately to chase down what went wrong on the
kernel side, I finally figured out that it wasn't a kernel change at
all, it was the tooling that had changed.
I did a 'git bisect', and it says
617824a7f0f73e4de325cf8add58e55b28c12493 is the first bad commit
commit 617824a7f0f73e4de325cf8add58e55b28c12493
Author: Ian Rogers <[email protected]>
Date: Mon Apr 15 23:15:25 2024 -0700
perf parse-events: Prefer sysfs/JSON hardware events over legacy
and very clearly this does *NOT* work at all for me.
I didn't notice until now, simply because I had been busy with the
merge window, so I hadn't been doing any profiles, but the merge
window is calming down and the end is nigh, and I just wasted more
time than I care to admit trying to figure out what went wrong in the
kernel.
And no, I don't speak JSON, and I have *no* idea what the legacy
events are. Plus I'm not very familiar with the arm64 profiling etc
anyway, so I'm just a clueless user here.
I *can* confirm that just reverting that commit makes that trivial
"perf record" work for me. So the bisect was good, and it reverts
cleanly, but I don't know _why_ my arm64 setup hates it so much.
Linus
On Fri, May 24, 2024 at 06:31:52PM -0700, Linus Torvalds wrote:
> On Tue, 21 May 2024 at 12:26, Arnaldo Carvalho de Melo <[email protected]> wrote:
> >
> > perf tools fixes and improvements for v6.10:
>
> This actually broke 'perf' completely for me on arm64.
>
> With a 6.9 version of 'perf', I can do this:
>
> perf record -e cycles:pp make -j199
>
> and it all works fine.
>
> With the current -git version, when I do the same, I instead get
>
> Error:
> cycles:pp: PMU Hardware doesn't support
> sampling/overflow-interrupts. Try 'perf stat'
>
> and after trying desperately to chase down what went wrong on the
> kernel side, I finally figured out that it wasn't a kernel change at
> all, it was the tooling that had changed.
>
> I did a 'git bisect', and it says
>
> 617824a7f0f73e4de325cf8add58e55b28c12493 is the first bad commit
> commit 617824a7f0f73e4de325cf8add58e55b28c12493
> Author: Ian Rogers <[email protected]>
> Date: Mon Apr 15 23:15:25 2024 -0700
>
> perf parse-events: Prefer sysfs/JSON hardware events over legacy
>
> and very clearly this does *NOT* work at all for me.
>
> I didn't notice until now, simply because I had been busy with the
> merge window, so I hadn't been doing any profiles, but the merge
> window is calming down and the end is nigh, and I just wasted more
> time than I care to admit trying to figure out what went wrong in the
> kernel.
>
> And no, I don't speak JSON, and I have *no* idea what the legacy
> events are. Plus I'm not very familiar with the arm64 profiling etc
> anyway, so I'm just a clueless user here.
>
> I *can* confirm that just reverting that commit makes that trivial
> "perf record" work for me. So the bisect was good, and it reverts
> cleanly, but I don't know _why_ my arm64 setup hates it so much.
That is a good data point, we probably could go with the revert, but I
think Ian submitted a few patches fixing this issue that came up close
to LSFMM/BPF and the merge window, so didn't have time to sit on
linux-next for a while, I'm looking those up now.
ARM64 eyes on this would also be good. Adding Mark Rutland and Leo Yan
to the CC list, maybe they can help us here with the best course of
action.
- Arnaldo
On Fri, May 24, 2024 at 10:55:11PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, May 24, 2024 at 06:31:52PM -0700, Linus Torvalds wrote:
> > On Tue, 21 May 2024 at 12:26, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > perf tools fixes and improvements for v6.10:
> >
> > This actually broke 'perf' completely for me on arm64.
> >
> > With a 6.9 version of 'perf', I can do this:
> >
> > perf record -e cycles:pp make -j199
> >
> > and it all works fine.
> >
> > With the current -git version, when I do the same, I instead get
> >
> > Error:
> > cycles:pp: PMU Hardware doesn't support
> > sampling/overflow-interrupts. Try 'perf stat'
> >
> > and after trying desperately to chase down what went wrong on the
> > kernel side, I finally figured out that it wasn't a kernel change at
> > all, it was the tooling that had changed.
> >
> > I did a 'git bisect', and it says
> >
> > 617824a7f0f73e4de325cf8add58e55b28c12493 is the first bad commit
> > commit 617824a7f0f73e4de325cf8add58e55b28c12493
> > Author: Ian Rogers <[email protected]>
> > Date: Mon Apr 15 23:15:25 2024 -0700
> >
> > perf parse-events: Prefer sysfs/JSON hardware events over legacy
> >
> > and very clearly this does *NOT* work at all for me.
> >
> > I didn't notice until now, simply because I had been busy with the
> > merge window, so I hadn't been doing any profiles, but the merge
> > window is calming down and the end is nigh, and I just wasted more
> > time than I care to admit trying to figure out what went wrong in the
> > kernel.
> >
> > And no, I don't speak JSON, and I have *no* idea what the legacy
> > events are. Plus I'm not very familiar with the arm64 profiling etc
> > anyway, so I'm just a clueless user here.
> >
> > I *can* confirm that just reverting that commit makes that trivial
> > "perf record" work for me. So the bisect was good, and it reverts
> > cleanly, but I don't know _why_ my arm64 setup hates it so much.
> That is a good data point, we probably could go with the revert, but I
> think Ian submitted a few patches fixing this issue that came up close
> to LSFMM/BPF and the merge window, so didn't have time to sit on
> linux-next for a while, I'm looking those up now.
Couldn't find it quickly, its late here, perhaps Ian can chime in and
point those fixes here. I'll try and continue tomorrow.
- Arnaldo
> ARM64 eyes on this would also be good. Adding Mark Rutland and Leo Yan
> to the CC list, maybe they can help us here with the best course of
> action.
On Fri, May 24, 2024 at 7:02 PM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Fri, May 24, 2024 at 10:55:11PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, May 24, 2024 at 06:31:52PM -0700, Linus Torvalds wrote:
> > > On Tue, 21 May 2024 at 12:26, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >
> > > > perf tools fixes and improvements for v6.10:
> > >
> > > This actually broke 'perf' completely for me on arm64.
> > >
> > > With a 6.9 version of 'perf', I can do this:
> > >
> > > perf record -e cycles:pp make -j199
> > >
> > > and it all works fine.
> > >
> > > With the current -git version, when I do the same, I instead get
> > >
> > > Error:
> > > cycles:pp: PMU Hardware doesn't support
> > > sampling/overflow-interrupts. Try 'perf stat'
> > >
> > > and after trying desperately to chase down what went wrong on the
> > > kernel side, I finally figured out that it wasn't a kernel change at
> > > all, it was the tooling that had changed.
> > >
> > > I did a 'git bisect', and it says
> > >
> > > 617824a7f0f73e4de325cf8add58e55b28c12493 is the first bad commit
> > > commit 617824a7f0f73e4de325cf8add58e55b28c12493
> > > Author: Ian Rogers <[email protected]>
> > > Date: Mon Apr 15 23:15:25 2024 -0700
> > >
> > > perf parse-events: Prefer sysfs/JSON hardware events over legacy
> > >
> > > and very clearly this does *NOT* work at all for me.
> > >
> > > I didn't notice until now, simply because I had been busy with the
> > > merge window, so I hadn't been doing any profiles, but the merge
> > > window is calming down and the end is nigh, and I just wasted more
> > > time than I care to admit trying to figure out what went wrong in the
> > > kernel.
> > >
> > > And no, I don't speak JSON, and I have *no* idea what the legacy
> > > events are. Plus I'm not very familiar with the arm64 profiling etc
> > > anyway, so I'm just a clueless user here.
> > >
> > > I *can* confirm that just reverting that commit makes that trivial
> > > "perf record" work for me. So the bisect was good, and it reverts
> > > cleanly, but I don't know _why_ my arm64 setup hates it so much.
Thanks for the report. TL;DR try putting the PMU name with the event
name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
armv8_pmuv3_0 is the name of the PMU from /sys/devices.
The perf tool has a number of inbuilt "legacy" event names, cycles is
one. Most events these days are either found in sysfs (this is on a
Raspberry Pi 5):
```
$ ls /sys/devices/armv8_cortex_a76/events/
br_mis_pred exc_return l1d_tlb_refill l2d_tlb
remote_access
br_mis_pred_retired exc_taken l1i_cache
l2d_tlb_refill stall_backend
br_pred inst_retired l1i_cache_refill l3d_cache
stall_frontend
br_retired inst_spec l1i_tlb
l3d_cache_allocate sw_incr
bus_access itlb_walk l1i_tlb_refill
l3d_cache_refill ttbr_write_retired
bus_cycles l1d_cache l2d_cache ll_cache_miss_rd
cid_write_retired l1d_cache_refill l2d_cache_allocate ll_cache_rd
cpu_cycles l1d_cache_wb l2d_cache_refill mem_access
dtlb_walk l1d_tlb l2d_cache_wb memory_error
```
or json that is turned into tables built into the perf tool.
ARM requested that sysfs and json events take priority over legacy
events. So 'armv8_cortex_a76/cycles/' should first try to open an
event in either sysfs or json, and if not present fallback on using
the legacy constants. Note that the event name has the PMU name first.
What should the behavior of the 'cycles' event with no PMU be? In
Linux 6.9 the behavior was that cycles without a PMU would be a legacy
encoding and only try to be on the core's PMU (generally cpu on Intel
or armv8... on ARM), but with a PMU it would prefer sysfs and json
tables.
RISC-V had asked that in the no PMU case they'd also like sysfs/json
to have priority so the driver could be more ignorant of event
encodings. It was also inconsistent that in Linux 6.9:
$ perf stat -e inst_retired.any true
was a sysfs/json event that we'd try to open on every PMU but:
$ perf stat -e instructions true
was a legacy event that would only be opened on the core PMU. (I'm
ignoring the complexity that BIG.little/hybrid adds).
The blamed patch does away with the inconsistency and makes it so that
legacy events are always the 2nd choice and we try to open every event
on every PMU. It is the 2nd point that I think is breaking you. I
think you have a PMU on your system with a cycles event, maybe an
uncore PMU with memory controller data or CXL, and the perf record is
trying to open the cycles event on that and the error message
correctly reports that sampling wouldn't be possible on that PMU.
Putting verbose flags on perf record:
$ perf record -vv ...
should hopefully give more breadcrumbs and confirm this. You could also do:
$ ls /sys/devices/*/events/cycles
If there are more than 1 sysfs cycles event then probably one of them
is the problem. Adding the PMU name removes the trying every PMU
behavior and so should be a fix. We could also change it so that when
we open multiple events for perf record we don't fail in a case like
this. Maybe it is a feature to fail.
Thanks,
Ian
> > That is a good data point, we probably could go with the revert, but I
> > think Ian submitted a few patches fixing this issue that came up close
> > to LSFMM/BPF and the merge window, so didn't have time to sit on
> > linux-next for a while, I'm looking those up now.
>
> Couldn't find it quickly, its late here, perhaps Ian can chime in and
> point those fixes here. I'll try and continue tomorrow.
>
> - Arnaldo
>
> > ARM64 eyes on this would also be good. Adding Mark Rutland and Leo Yan
> > to the CC list, maybe they can help us here with the best course of
> > action.
On Fri, 24 May 2024 at 20:48, Ian Rogers <[email protected]> wrote:
>
> Thanks for the report. TL;DR try putting the PMU name with the event
> name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
> armv8_pmuv3_0 is the name of the PMU from /sys/devices.
What? No.
If that is the new rule, then I'm just going to revert. I'm not going
to use some random different PMU names across architectures when all I
want is just "cycles".
In fact, the "cycles:pp" is a complete red herring. Just doing a simple
$ perf record sleep 1
with no explicit expression at all, results in that same
Error:
cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
because perf *ITSELF* uses the sane format by default.
So ABSOLUTELY NO. We're not changing the rules to "You have to know
some idiotic per-architecture JSON magic".
I don't want any excuses like this. No "You are holding it wrong".
This is a BUG. Treat it as such.
And yes, "perf record -vv" shows that it apparently picked some insane
arm_dsu_0 event. Which just shows that trying to parse the JSON rules
instead of just having sane defaults is clearly not working.
So here's the deal: I will revert your commit tomorrow unless you show
that you are taking it seriously and have a sane fix.
Because no, "You are holding it wrong" is not the answer. Never was,
never will be. Things used to work, they don't work any more. That
shit is not acceptable.
Linus
Hi guys,
On Fri, May 24, 2024 at 9:20 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, 24 May 2024 at 20:48, Ian Rogers <[email protected]> wrote:
> >
> > Thanks for the report. TL;DR try putting the PMU name with the event
> > name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
> > armv8_pmuv3_0 is the name of the PMU from /sys/devices.
>
> What? No.
>
> If that is the new rule, then I'm just going to revert. I'm not going
> to use some random different PMU names across architectures when all I
> want is just "cycles".
>
> In fact, the "cycles:pp" is a complete red herring. Just doing a simple
>
> $ perf record sleep 1
>
> with no explicit expression at all, results in that same
>
> Error:
> cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts.
> Try 'perf stat'
>
> because perf *ITSELF* uses the sane format by default.
Yep, so I'm curious what makes it fail. IIUC the commit in question
would convert "cycles" event to "${whatever_pmu}/cycles/" if the pmu
has "events/cycles" alias in sysfs or in JSON. But it should work after
that too. :(
It seems my system doesn't have the alias (both in x86_64 and arm64)
at least in sysfs. I think that's why I don't see the failure and maybe
there's a specific hardware issue - like a M1 macbook issue? IIRC it
required the exclude_guest bit to be set, but I think we handled it already
so this must be a different issue.
The error message is for EOPNOTSUPP, and I don't think it will set any
special attributes for the default event. So I have no clue yet..
Thanks,
Namhyung
>
> So ABSOLUTELY NO. We're not changing the rules to "You have to know
> some idiotic per-architecture JSON magic".
>
> I don't want any excuses like this. No "You are holding it wrong".
> This is a BUG. Treat it as such.
>
> And yes, "perf record -vv" shows that it apparently picked some insane
> arm_dsu_0 event. Which just shows that trying to parse the JSON rules
> instead of just having sane defaults is clearly not working.
>
> So here's the deal: I will revert your commit tomorrow unless you show
> that you are taking it seriously and have a sane fix.
>
> Because no, "You are holding it wrong" is not the answer. Never was,
> never will be. Things used to work, they don't work any more. That
> shit is not acceptable.
>
> Linus
On Fri, May 24, 2024 at 9:20 PM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, 24 May 2024 at 20:48, Ian Rogers <[email protected]> wrote:
> >
> > Thanks for the report. TL;DR try putting the PMU name with the event
> > name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
> > armv8_pmuv3_0 is the name of the PMU from /sys/devices.
>
> What? No.
>
> If that is the new rule, then I'm just going to revert. I'm not going
> to use some random different PMU names across architectures when all I
> want is just "cycles".
The issue is that perf previously would do two things, if you asked
for a 'cycles' event then as the name was a legacy one it would encode
the type in the perf_event_attr as PERF_TYPE_HARDWARE and the config
as PERF_COUNT_HW_CPU_CYCLES job done. With BIG.little/hybrid/.. those
events now need opening on multiple PMUs otherwise you end up only
sampling on either the big or the little core. On Intel with hybrid,
cycles had to become cpu_core/cycles/ and cpu_atom/cycles/.
If we look at an event name that isn't a legacy one, like
inst_retired.any, then when no PMU is specified perf's behavior is to
try to open this event on every PMU it can. On Intel this event will
be found on the 'cpu' PMU, or on a hybrid Intel on cpu_core and
cpu_atom. Were the event 'data_read' then it could be found on the
uncore_imc_free_running_0 and uncore_imc_free_running_1 PMUs. My point
here is that wildcarding an event to every PMU that supports it is
arguably a feature as it avoids users needing to remember a specific
PMU name. PMU names are often duplicated and it would be laborious to
name them all.
> In fact, the "cycles:pp" is a complete red herring. Just doing a simple
>
> $ perf record sleep 1
>
> with no explicit expression at all, results in that same
>
> Error:
> cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts.
> Try 'perf stat'
>
> because perf *ITSELF* uses the sane format by default.
>
> So ABSOLUTELY NO. We're not changing the rules to "You have to know
> some idiotic per-architecture JSON magic".
>
> I don't want any excuses like this. No "You are holding it wrong".
> This is a BUG. Treat it as such.
>
> And yes, "perf record -vv" shows that it apparently picked some insane
> arm_dsu_0 event. Which just shows that trying to parse the JSON rules
> instead of just having sane defaults is clearly not working.
>
> So here's the deal: I will revert your commit tomorrow unless you show
> that you are taking it seriously and have a sane fix.
>
> Because no, "You are holding it wrong" is not the answer. Never was,
> never will be. Things used to work, they don't work any more. That
> shit is not acceptable.
I wasn't trying to say you were holding it wrong, I was trying to give
you practical advice that would solve your problem whilst we determine
what the right fix is.
The question is what to do about events when no PMU is specified, here
are the alternatives I see:
1) make legacy event names special and only open them on core PMUs,
2) make sure PMUs, like arm_dsu_0 in your case, don't advertise events
matching legacy names like cycles,
3) make perf record only scan PMUs that support sampling,
4) in the case that we're using cycles:P as a default event, only open
that on core PMUs (on Intel make cycles:P ->
cpu_core/cycles/P,cpu_atom/cycles/P).
A revert achieves (1) and I consider it a regression as now the
behavior of 'cycles' doesn't match 'inst_retired.any' and the user
somehow needs to know that certain event names are special.
Changing the driver (2) is a workaround, but it feels like a burden on
the driver writers.
In order to determine whether a PMU supports sampling (3) we'd need to
try probe using perf_event_open. The perf_event_open may fail due to
things like host/guest permissions, etc. so we need fallbacks. We do
something similar here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/print-events.c#n242
(4) is the simplest change and I think it lowers the surprise from the
behavior change in the patch. It means that were you to do "perf
record -e cycles ..." you would still need to specify a PMU on Linus'
special ARM box. We may lack permissions to read sysfs and so then it
is hard to determine the PMU name, but the code was likely to fail
then anyway.
I'll write a patch to do (4), hopefully this qualifies as being
serious, but it would be useful if the interested parties could work
out what they think the behavior should be.
Thanks,
Ian
> Linus
>
On Sat, May 25, 2024 at 7:09 AM Ian Rogers <[email protected]> wrote:
>
> On Fri, May 24, 2024 at 9:20 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Fri, 24 May 2024 at 20:48, Ian Rogers <[email protected]> wrote:
> > >
> > > Thanks for the report. TL;DR try putting the PMU name with the event
> > > name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
> > > armv8_pmuv3_0 is the name of the PMU from /sys/devices.
> >
> > What? No.
> >
> > If that is the new rule, then I'm just going to revert. I'm not going
> > to use some random different PMU names across architectures when all I
> > want is just "cycles".
>
> The issue is that perf previously would do two things, if you asked
> for a 'cycles' event then as the name was a legacy one it would encode
> the type in the perf_event_attr as PERF_TYPE_HARDWARE and the config
> as PERF_COUNT_HW_CPU_CYCLES job done. With BIG.little/hybrid/.. those
> events now need opening on multiple PMUs otherwise you end up only
> sampling on either the big or the little core. On Intel with hybrid,
> cycles had to become cpu_core/cycles/ and cpu_atom/cycles/.
>
> If we look at an event name that isn't a legacy one, like
> inst_retired.any, then when no PMU is specified perf's behavior is to
> try to open this event on every PMU it can. On Intel this event will
> be found on the 'cpu' PMU, or on a hybrid Intel on cpu_core and
> cpu_atom. Were the event 'data_read' then it could be found on the
> uncore_imc_free_running_0 and uncore_imc_free_running_1 PMUs. My point
> here is that wildcarding an event to every PMU that supports it is
> arguably a feature as it avoids users needing to remember a specific
> PMU name. PMU names are often duplicated and it would be laborious to
> name them all.
>
> > In fact, the "cycles:pp" is a complete red herring. Just doing a simple
> >
> > $ perf record sleep 1
> >
> > with no explicit expression at all, results in that same
> >
> > Error:
> > cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts.
> > Try 'perf stat'
> >
> > because perf *ITSELF* uses the sane format by default.
> >
> > So ABSOLUTELY NO. We're not changing the rules to "You have to know
> > some idiotic per-architecture JSON magic".
> >
> > I don't want any excuses like this. No "You are holding it wrong".
> > This is a BUG. Treat it as such.
> >
> > And yes, "perf record -vv" shows that it apparently picked some insane
> > arm_dsu_0 event. Which just shows that trying to parse the JSON rules
> > instead of just having sane defaults is clearly not working.
> >
> > So here's the deal: I will revert your commit tomorrow unless you show
> > that you are taking it seriously and have a sane fix.
> >
> > Because no, "You are holding it wrong" is not the answer. Never was,
> > never will be. Things used to work, they don't work any more. That
> > shit is not acceptable.
>
> I wasn't trying to say you were holding it wrong, I was trying to give
> you practical advice that would solve your problem whilst we determine
> what the right fix is.
>
> The question is what to do about events when no PMU is specified, here
> are the alternatives I see:
>
> 1) make legacy event names special and only open them on core PMUs,
> 2) make sure PMUs, like arm_dsu_0 in your case, don't advertise events
> matching legacy names like cycles,
> 3) make perf record only scan PMUs that support sampling,
> 4) in the case that we're using cycles:P as a default event, only open
> that on core PMUs (on Intel make cycles:P ->
> cpu_core/cycles/P,cpu_atom/cycles/P).
>
> A revert achieves (1) and I consider it a regression as now the
> behavior of 'cycles' doesn't match 'inst_retired.any' and the user
> somehow needs to know that certain event names are special.
> Changing the driver (2) is a workaround, but it feels like a burden on
> the driver writers.
> In order to determine whether a PMU supports sampling (3) we'd need to
> try probe using perf_event_open. The perf_event_open may fail due to
> things like host/guest permissions, etc. so we need fallbacks. We do
> something similar here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/print-events.c#n242
> (4) is the simplest change and I think it lowers the surprise from the
> behavior change in the patch. It means that were you to do "perf
> record -e cycles ..." you would still need to specify a PMU on Linus'
> special ARM box. We may lack permissions to read sysfs and so then it
> is hard to determine the PMU name, but the code was likely to fail
> then anyway.
>
> I'll write a patch to do (4), hopefully this qualifies as being
> serious, but it would be useful if the interested parties could work
> out what they think the behavior should be.
Patch sent:
https://lore.kernel.org/lkml/[email protected]/
Thanks,
Ian
> Thanks,
> Ian
>
> > Linus
> >
On Sat, 25 May 2024 at 00:38, Namhyung Kim <[email protected]> wrote:
>
> Yep, so I'm curious what makes it fail. IIUC the commit in question
> would convert "cycles" event to "${whatever_pmu}/cycles/" if the pmu
> has "events/cycles" alias in sysfs or in JSON. But it should work after
> that too. :(
Well, having looked at it some more, I have the main CPU PMU:
/sys/bus/event_source/devices/armv8_pmuv3_0/
which has a 'cpu_cycles' event, and the cpumask covers all the 128
cores in this thing.
That's the thing the regular profiling *should* use:
$ ls events/
br_mis_pred l1d_cache_wb l3d_cache_allocate
br_mis_pred_retired l1d_tlb l3d_cache_refill
br_pred l1d_tlb_refill l3d_cache_wb
br_retired l1i_cache mem_access
bus_access l1i_cache_refill memory_error
bus_cycles l1i_tlb sample_collision
cid_write_retired l1i_tlb_refill sample_feed
cpu_cycles l2d_cache sample_filtrate
exc_return l2d_cache_allocate sample_pop
exc_taken l2d_cache_refill stall_backend
inst_retired l2d_cache_wb stall_frontend
inst_spec l2d_tlb ttbr_write_retired
l1d_cache l2d_tlb_refill
l1d_cache_refill l3d_cache
But the thing that seems to have confused the new parsing is that this
machine _also_ has
/sys/bus/event_source/devices/arm_dsu_{0..63}/
which all have a 'cycles' event:
$ ls events/
bus_access cycles l3d_cache_allocate l3d_cache_wb
bus_cycles l3d_cache l3d_cache_refill memory_error
but these things are basically some "each pair of cpus has some bridge
to the L3", which is why this 128-core machine has 64 of them. So each
of those have something like this:
$ cat cpumask
40
$ cat associated_cpus
40-41
I did not look any closer at what the 'cycles' there can actually
count, but clearly they can't be used for recording. And even if they
can, they shouldn't for the default "cycles".
> It seems my system doesn't have the alias (both in x86_64 and arm64)
> at least in sysfs. I think that's why I don't see the failure and maybe
> there's a specific hardware issue - like a M1 macbook issue? IIRC it
> required the exclude_guest bit to be set, but I think we handled it already
> so this must be a different issue.
This is my new Ampere system, which has basically replaced the M2
Macbook Air as my arm64 test platform, since it builds the kernel a
whole lot faster. That's what 128 cores will do...
Ian's patch at
https://lore.kernel.org/lkml/[email protected]/
makes things work for me again, but I get the feeling that the JSON
parsing is too fragile. It should have noticed that the 'cycles' event
it found wasn't useful for profiling, and gone on to the next one,
instead of just giving an incomprehensible error message.
I think the real problem was that the JSON parsing code blindly just
looked for "cycles".
The only thing that seems to make Ian's new
evlist__add_default_events() special is that it uses
"perf_pmus__scan_core()" for that cycles finding, which in turn then
uses only the *core* PMU's.
It feels to me like the "parse the JSON info" is just too fragile,
picks the wrong events, and Ian's "add defaults" just works around the
weakness.
But whatever. With his patch, at least it works for me.
Linus
On Sat, May 25, 2024 at 10:22 AM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, 25 May 2024 at 00:38, Namhyung Kim <[email protected]> wrote:
> >
> > Yep, so I'm curious what makes it fail. IIUC the commit in question
> > would convert "cycles" event to "${whatever_pmu}/cycles/" if the pmu
> > has "events/cycles" alias in sysfs or in JSON. But it should work after
> > that too. :(
>
> Well, having looked at it some more, I have the main CPU PMU:
>
> /sys/bus/event_source/devices/armv8_pmuv3_0/
>
> which has a 'cpu_cycles' event, and the cpumask covers all the 128
> cores in this thing.
>
> That's the thing the regular profiling *should* use:
>
> $ ls events/
> br_mis_pred l1d_cache_wb l3d_cache_allocate
> br_mis_pred_retired l1d_tlb l3d_cache_refill
> br_pred l1d_tlb_refill l3d_cache_wb
> br_retired l1i_cache mem_access
> bus_access l1i_cache_refill memory_error
> bus_cycles l1i_tlb sample_collision
> cid_write_retired l1i_tlb_refill sample_feed
> cpu_cycles l2d_cache sample_filtrate
> exc_return l2d_cache_allocate sample_pop
> exc_taken l2d_cache_refill stall_backend
> inst_retired l2d_cache_wb stall_frontend
> inst_spec l2d_tlb ttbr_write_retired
> l1d_cache l2d_tlb_refill
> l1d_cache_refill l3d_cache
>
> But the thing that seems to have confused the new parsing is that this
> machine _also_ has
>
> /sys/bus/event_source/devices/arm_dsu_{0..63}/
>
> which all have a 'cycles' event:
>
> $ ls events/
> bus_access cycles l3d_cache_allocate l3d_cache_wb
> bus_cycles l3d_cache l3d_cache_refill memory_error
>
> but these things are basically some "each pair of cpus has some bridge
> to the L3", which is why this 128-core machine has 64 of them. So each
> of those have something like this:
>
> $ cat cpumask
> 40
> $ cat associated_cpus
> 40-41
>
> I did not look any closer at what the 'cycles' there can actually
> count, but clearly they can't be used for recording. And even if they
> can, they shouldn't for the default "cycles".
So this is now the behavior with the patch I sent.
> > It seems my system doesn't have the alias (both in x86_64 and arm64)
> > at least in sysfs. I think that's why I don't see the failure and maybe
> > there's a specific hardware issue - like a M1 macbook issue? IIRC it
> > required the exclude_guest bit to be set, but I think we handled it already
> > so this must be a different issue.
>
> This is my new Ampere system, which has basically replaced the M2
> Macbook Air as my arm64 test platform, since it builds the kernel a
> whole lot faster. That's what 128 cores will do...
>
> Ian's patch at
>
> https://lore.kernel.org/lkml/[email protected]/
>
> makes things work for me again, but I get the feeling that the JSON
> parsing is too fragile. It should have noticed that the 'cycles' event
> it found wasn't useful for profiling, and gone on to the next one,
> instead of just giving an incomprehensible error message.
>
> I think the real problem was that the JSON parsing code blindly just
> looked for "cycles".
I don't think JSON really means anything here. We use JSON as a way to
import tables of events. We want to map an event name to a config
value, we could do it in lots of different ways. There is a difference
between:
- legacy events: hard coded event numbers for certain typically CPU
events, although there are L3 cache events in there,
- sysfs: events advertised by the PMU driver in sysfs,
- events built into the tool, imported from json files.
The first problem that happened was BIG.little. ARM did nothing to fix
the perf APIs for BIG.little and Intel fixed things up when they did
hybrid. What does it mean to open a legacy event on a process in a
BIG.little CPU when the process may be scheduled onto either BIG or
little cores? ARM's non-solution was that you'd profile based on which
PMU driver registered first and not profile on the other - you would
be profiling on some fraction of your CPUs but not all.
While Intel added proper hybrid/BIG.little support they did so by
essentially forking every aspect of the perf tool to have a "hybrid"
version. I put a stop to this in the metrics code and by doing so
ended up taking on responsibility for cleaning up the mess Intel had
been creating. This led to the event parser being able to parse "<PMU
name>/<legacy event name>/" for example in "cpu_core/cycles/" and
"cpu_atom/cycles/" and encode the legacy event.
This caused issues on ARM, especially Apple ones - maybe Apple should
be asked to support their PMU? Previously their PMU would advertise
events like cycles and the sysfs encoding would be used as there was
no notion of using the legacy encoding. Now the legacy encoding was
used and for some reason we don't maintain legacy mappings of events
for PMUs on Apple CPUs. After being politely shouted at, and to my
surprise Intel agreeing, we made it so sysfs or events built into the
tool take priority over legacy events if the PMU is specified. This
wasn't a small change as every test expectation needed updating for
this. We still have some broken test expectations.
So we had a case where legacy events weren't the priority if the PMU
was specified but when no PMU was specified they had priority. This
was frankly weird and so the patch you bisected to fixed this and was
also a behavior RISC-V had asked for. There was even some discussion
on this subject at LPC 2023.
Is everything fixed? Nope. It is still the case on ARM that we don't
profile on all CPUs for perf stat. This is because when no event is
specified hand crafted perf_event_attr are passed to perf_event_open
and these perf_event_attr need to open multiple for a
BIG.little/hybrid system:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-stat.c#n1942
I wrote a fix for this:
https://lore.kernel.org/lkml/[email protected]/
Unreviewed/unmerged and frankly I think ARM should be in the driving
seat for reviewing it as it is trying to fix their CPUs.
> The only thing that seems to make Ian's new
> evlist__add_default_events() special is that it uses
> "perf_pmus__scan_core()" for that cycles finding, which in turn then
> uses only the *core* PMU's.
>
> It feels to me like the "parse the JSON info" is just too fragile,
> picks the wrong events, and Ian's "add defaults" just works around the
> weakness.
So I think the parsing is now robust/consistent but that doesn't mean
the behavior matches your expectations.
I say robust as when any event, regardless of whether it happens to
alias the few hundred legacy names, will parse. Previous parsing meant
events advertised by AMD's PMU, like branch-brs, would return parse
errors as "branch" and "brs" looked like components of a legacy event
name.
Consistent in that, regardless of event name, we will try the same
process to find the event.
I don't understand why you think JSON comes into this. Is it because
I've sent patches with JSON encoding events and metrics in? In the
case of AMD's "branch-brs" and the "cycles" event, the event has been
advertised via sysfs and not from data in JSON tables.
> But whatever. With his patch, at least it works for me.
You followed up in:
https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com/
with:
> So no. That still needs to be fixed, or the whole "prefer sysfs/JSON
> by default" needs to be reverted.
So I think we still need to figure out what:
$ perf <command> -e <event> ...
where <event> doesn't specify a PMU means. I'll try to enumerate the options:
1) the 6.9 behavior where <event> will be searched for on every PMU
and each PMU that has that event will have it opened. This can lead to
failures depending on what <command> is going to do. If <event> is a
legacy event name then it will be opened only on core PMUs.
2) the current perf-tools branch behavior where <event> is searched
for on every PMU regardless of it being a legacy or non-legacy event
name.
3) (2) but we change drivers to not advertise events where the name
conflicts with a legacy event name.
4) if the PMU isn't specified with <event> then we only search core
PMUs (e.g. armv8_pmuv3_0, cpu, cpu_core, cpu_atom)
5) we parse all events as with (2) but depending on <command> can
choose to ignore perf_event_open failing.
I'll try to rustle up a patch for (5), I have no way to test it. I
expect it will make somebody somewhere unhappy, apologies in advance.
Thanks,
Ian
> Linus
On Sat, 25 May 2024 at 16:34, Ian Rogers <[email protected]> wrote:
>
> So I think we still need to figure out what:
>
> $ perf <command> -e <event> ...
>
> where <event> doesn't specify a PMU means. I'll try to enumerate the options:
[ snip snip ]
How about make the rule be that if the event doesn't have a specified
PMU, then that just means "legacy rules first".
IOW, if you have a fully qualified event name (maybe define that as
"event name contains a slash), then you use the sysfs lookup.
But a simple event name that doesn't contain a slash shall mean "use
legacy lookup rules".
Maybe in practice that ends up being the same as your option #4 ("if
the PMU isn't specified with <event> then we only search core PMUs")?
I don't know the perf code well enough to be able to say.
But basically, the #1 rule in the kernel is that we do not break user
workflows. I happen to think that that is a really important rule, and
I'm disgusted at how many other open source projects ignore that rule
and think that "in the name of improvement, we will break the world".
And as long as "perf" is maintained in the kernel sources, that kernel
rule will guide perf too. Because the rule is not so much "kernels are
special" as a "Linus wants people to be able to feel confident in
updating".
Linus
On Sat, May 25, 2024 at 10:22 PM Linus Torvalds
<[email protected]> wrote:
>
> On Sat, 25 May 2024 at 16:34, Ian Rogers <[email protected]> wrote:
> >
> > So I think we still need to figure out what:
> >
> > $ perf <command> -e <event> ...
> >
> > where <event> doesn't specify a PMU means. I'll try to enumerate the options:
>
> [ snip snip ]
>
> How about make the rule be that if the event doesn't have a specified
> PMU, then that just means "legacy rules first".
>
> IOW, if you have a fully qualified event name (maybe define that as
> "event name contains a slash), then you use the sysfs lookup.
>
> But a simple event name that doesn't contain a slash shall mean "use
> legacy lookup rules".
What to do with events with no PMU like data_read? On my Intel tigerlake laptop:
```
$ ls /sys/devices/*/events/data_read
/sys/devices/uncore_imc_free_running_0/events/data_read
/sys/devices/uncore_imc_free_running_1/events/data_read
$ perf --version
perf version 6.6.15
$ sudo perf stat --no-merge -e data_read -a sleep 0.1
Performance counter stats for 'system wide':
122.84 MiB data_read [uncore_imc_free_running_0]
123.00 MiB data_read [uncore_imc_free_running_1]
0.101826301 seconds time elapsed
```
The rule for looking up an event with no PMU specified is to try it on
every PMU - the rule is about as old as perf itself. For heterogeneous
systems (BIG.little, hybrid) legacy events (cycles, instructions..)
will be opened on every "core" PMU before the change. The behavior now
is to just be consistent and say when no PMU is specified we always
search all PMUs. This is motivated by ARM, RISC-V, .. wanting legacy
events to be a last resort if sysfs or json encodings can't be found.
> Maybe in practice that ends up being the same as your option #4 ("if
> the PMU isn't specified with <event> then we only search core PMUs")?
> I don't know the perf code well enough to be able to say.
>
> But basically, the #1 rule in the kernel is that we do not break user
> workflows. I happen to think that that is a really important rule, and
> I'm disgusted at how many other open source projects ignore that rule
> and think that "in the name of improvement, we will break the world".
So the problem was that the world was broken. It still is. Find a
BIG.little ARM and try "perf stat -a sleep 1", the counts reported
will be for either the BIG or the little cores. BIG.little is a decade
old now.
The switch to prioritize sysfs/JSON events over legacy events was
specifically so that we wouldn't break Apple M1 and newer CPUs when
making ARM properly support BIG.little when an event is specified. I
think perf was broken over 2 kernel releases on Apple devices because
of that. I had to fix issues with Apple M1 and later while not having
access to such a machine. Frankly life would have been easier if ARM
had just fixed the driver.
> And as long as "perf" is maintained in the kernel sources, that kernel
> rule will guide perf too. Because the rule is not so much "kernels are
> special" as a "Linus wants people to be able to feel confident in
> updating".
Sure, me too. This wouldn't have been an issue if Ampere had chosen a
different event name than cycles. Often the driver writers don't test
or consider perf. For example, perf will treat PMUs with a number at
the end as being something to merge together in counts. On my laptop,
uncore_imc_free_running_0 and uncore_imc_free_running_1 were merged in
the data_read example above. ARM decided to start a new convention of
putting hex addresses as suffixes many Linux releases ago, I've sent
patches to make these encodings work like the numeric case and we may
get it landed in v6.11 (a big problem being S390 already had PMUs like
cpum_cf and cf is a valid hex suffix):
https://lore.kernel.org/lkml/[email protected]/
In doing that work the code was tested by IBM for S390 and by Intel,
but it was fixing an ARM created problem. ARM were the first to have
BIG.little systems but contributed nothing to the perf tool to handle
it, even though each core type has a different PMU. ARM BIG.little
remains broken with the perf tool and when I fix it for them they
don't review or test the code. ARM changed and left unworking uncore
PMU naming conventions. ARM don't fix tests for their platform. ARM
don't help make perf's tests cover their different way of naming PMUs.
No one is trying to break ARM machines, but when ARM fails to do
anything other than review their own changes in the perf tree it is
something of an inevitability.
Fwiw, I am working on making perf record, perf top, etc. skip events
on non-core PMUs when they fail to open. It is a rather large and ugly
change. It is also a holiday weekend and I'm spending a lot of my time
in it addressing latent ARM problems.
Thanks,
Ian
> Linus
>
Hi Ian,
On Sat, May 25, 2024 at 11:22:08PM -0700, Ian Rogers wrote:
[...]
> In doing that work the code was tested by IBM for S390 and by Intel,
> but it was fixing an ARM created problem. ARM were the first to have
> BIG.little systems but contributed nothing to the perf tool to handle
> it, even though each core type has a different PMU. ARM BIG.little
> remains broken with the perf tool and when I fix it for them they
> don't review or test the code. ARM changed and left unworking uncore
> PMU naming conventions. ARM don't fix tests for their platform. ARM
> don't help make perf's tests cover their different way of naming PMUs.
> No one is trying to break ARM machines, but when ARM fails to do
> anything other than review their own changes in the perf tree it is
> something of an inevitability.
>
> Fwiw, I am working on making perf record, perf top, etc. skip events
> on non-core PMUs when they fail to open. It is a rather large and ugly
> change. It is also a holiday weekend and I'm spending a lot of my time
> in it addressing latent ARM problems.
James is in holiday, so I should cover this in time. Sorry that I did
not respond the issue quickly - mainly as I am not familiar with JSON
and perf event part.
I tried to reproduce the issue reported by Linus on my Arm big.LITTLE
system but cannot reproduce it. As I saw Linus mentioned the issue is
related with Arm DSU event, which is absent on my test board.
But your complaint is received well and I am very appreciate your work
(especially in the weekend). I will continue to look into the details
in this thread, and monitor incoming patches and I will verify them.
Thanks,
Leo
On Sun, May 26, 2024 at 05:58:38PM +0800, Leo Yan wrote:
> On Sat, May 25, 2024 at 11:22:08PM -0700, Ian Rogers wrote:
> > In doing that work the code was tested by IBM for S390 and by Intel,
> > but it was fixing an ARM created problem. ARM were the first to have
> > BIG.little systems but contributed nothing to the perf tool to handle
> > it, even though each core type has a different PMU. ARM BIG.little
> > remains broken with the perf tool and when I fix it for them they
> > don't review or test the code. ARM changed and left unworking uncore
> > PMU naming conventions. ARM don't fix tests for their platform. ARM
> > don't help make perf's tests cover their different way of naming PMUs.
> > No one is trying to break ARM machines, but when ARM fails to do
> > anything other than review their own changes in the perf tree it is
> > something of an inevitability.
> > Fwiw, I am working on making perf record, perf top, etc. skip events
> > on non-core PMUs when they fail to open. It is a rather large and ugly
> > change. It is also a holiday weekend and I'm spending a lot of my time
> > in it addressing latent ARM problems.
> James is in holiday, so I should cover this in time. Sorry that I did
> not respond the issue quickly - mainly as I am not familiar with JSON
> and perf event part.
> I tried to reproduce the issue reported by Linus on my Arm big.LITTLE
> system but cannot reproduce it. As I saw Linus mentioned the issue is
> related with Arm DSU event, which is absent on my test board.
Yeah, I couldn't either with ARM boards I have, I think we should revert
the change, as per Linus bisect and test that it makes thing work the
way they were before that patch for him and get this work restarted as
the first thing in the next devel cycle.
It has to be widely tested, Ian tried, Linus tried his patch and it is
still not acceptable, its a holiday, kids are home, we should be with
family.
> But your complaint is received well and I am very appreciate your work
> (especially in the weekend). I will continue to look into the details
> in this thread, and monitor incoming patches and I will verify them.
Indeed, the situation was detailed to a wider audience, hybrid is a
reality and we should properly support it, taming perf's tech debit on
this matter.
Ian's has been working on this for long time, testing all this takes a
lot of time and requires access to lots of hardware not easily available
for all of us, so its bound to break here and there, what we do when
this happens depends on when the problem is detected, and today Linus is
expected to close the merge window, so a revert is the technical, if not
pleasant thing to do now.
- Arnaldo
On Sun, May 26, 2024 at 4:10 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
> Ian's has been working on this for long time, testing all this takes a
> lot of time and requires access to lots of hardware not easily available
> for all of us, so its bound to break here and there, what we do when
> this happens depends on when the problem is detected, and today Linus is
> expected to close the merge window, so a revert is the technical, if not
> pleasant thing to do now.
Agreed. I think the only other option is:
https://lore.kernel.org/lkml/[email protected]/
and then when specifying the CPU cycles event on these Ampere systems
you have to do:
$ perf record -e 'armv8_pmuv3_0/cycles/P' ...
but Linus isn't happy with having to specify a PMU like that. As that
is an ARM/Ampere only problem I'm not sure it is so bad.
Thanks,
Ian
> - Arnaldo
On Sat, 25 May 2024 at 23:22, Ian Rogers <[email protected]> wrote:
>
> What to do with events with no PMU like data_read?
Are they actually ambiguous?
> The rule for looking up an event with no PMU specified is to try it on
> every PMU - the rule is about as old as perf itself.
Well, clearly you then violated that rule with your change. Or you
ended up changing the rule to look them up in the wrong order.
I'm not interested in theoretical breakages. I'm interested in actual
_real_ and _clear_ breakages, and that commit of yours introduced
them.
Maybe the fix is to make sure the events are always looked up in the
right order, with core events always getting priority.
That said, in the *particular* case that I hit, it's even worse than
that. It's not just that the wrong event was looked up - it looked up
an event that DIDN'T EVEN WORK for the workload.
So it's doubly broken.
I don't understand why you then bring up a "what about" issue that is
entirely irrelevant and would presumably never have shown the issue in
the first place because it wasn't probably wasn't ambiguous, and that
actually did work as an event source.
Reality matters. What-about-isms don't.
Linus
On Sun, 26 May 2024 at 09:50, Ian Rogers <[email protected]> wrote:
>
> If you think what you are asking perf with no pmu for a cycles event, is it unreasonable it gives you every cycles event for event PMU?
But that's not what it gave me. At all. All it gave me was NO
PROFILING AT ALL and a useless error message.
You're trying to change the goal-posts here and keep bringing up
irrelevant things that didn't actually happen.
I have a revert in the pull request from Arnaldo, and the revert is happening.
Linus