2023-07-10 03:25:17

by kernel test robot

Subject: [linus:master] [perf parse] 70c90e4a6b: perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail


Hi Ian Rogers,

When we reported
"[linux-next:master] [perf parse] 70c90e4a6b: perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail"
at
https://lore.kernel.org/all/[email protected]/
while this commit was still on linux-next, you mentioned it should be fixed by
https://lore.kernel.org/r/[email protected]
"1981da1fe2499 perf machine: Don't leak module maps"
which we noticed is on mainline now.

Commit 70c90e4a6b is now on mainline as well, and the issue still seems to
exist. When this bisect finished, we also tested the latest linus/master and
linux-next/master, both of which we confirmed include 1981da1fe2499, and the
tests still failed. So we are sending this report again, FYI.


Hello,

kernel test robot noticed "perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail" on:

commit: 70c90e4a6b2fbe775b662eafefae51f64d627790 ("perf parse-events: Avoid scanning PMUs before parsing")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 1c7873e3364570ec89343ff4877e0f27a7b21a61]
[test failed on linux-next/master 123212f53f3e394c1ae69a58c05dfdda56fec8c6]

in testcase: perf-test
version: perf-test-x86_64-git-1_20220520
with the following parameters:

type: lkp
group: group-00

test-description: The internal Perf Test suite.


compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Besides, we also noticed that several other cases fail on this commit but pass
on its parent:

442eeb77044705f2 70c90e4a6b2fbe775b662eafefa
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail
           :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1.fail
           :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_instructions_k_HAS_FIX_NO_NMI_R1.fail
           :6          100%           6:6     perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_instructions_k_NO_FIX_HAS_NMI_R1.fail
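
For readers unfamiliar with the table above, here is a minimal Python sketch of one way to read a fail:runs cell. It assumes that an empty count before the colon (as in ":6") means zero failures. The helper name parse_fail_runs is made up for this illustration and is not part of lkp-tests.

```python
def parse_fail_runs(cell):
    """Split an lkp-style 'fail:runs' cell into (fails, runs).

    Assumption: an empty field before the colon means zero failures.
    """
    fails, runs = cell.strip().split(":")
    return (int(fails) if fails else 0, int(runs))


parent = parse_fail_runs(":6")    # column for parent 442eeb77044705f2
commit = parse_fail_runs("6:6")   # column for commit 70c90e4a6b2f

# Difference in failure rate between the commit and its parent, in percent.
delta = 100 * (commit[0] / commit[1] - parent[0] / parent[1])
print(parent, commit, f"{delta:.0f}%")
```

Under that assumption the parent failed 0 of 6 runs and the commit failed 6 of 6, which is in line with the 100% shown in the %reproduction column.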



28 test cases pass for perf_hw_event_sample_group test. 4 test cases fail for perf_hw_event_sample_group test.
Test Case sampe_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cache-misses_instructions_u_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_instructions_k_NO_FIX_HAS_NMI_R1 FAILED! <----------
Test Case group_sampe_cpu-cycles_cache-misses_and_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 FAILED! <----------
Test Case group_sampe_cpu-cycles_cache-misses_instructions_and_cpu-cycles_cache-misses_instructions_NO_FIX_HAS_NMI_R1 PASS!
Test Case sampe_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_HAS_FIX_NO_NMI_R1 PASS!
Test Case group_sampe_cache-misses_instructions_u_HAS_FIX_NO_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_instructions_k_HAS_FIX_NO_NMI_R1 FAILED! <----------
Test Case group_sampe_cpu-cycles_cache-misses_and_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 FAILED! <----------
Test Case group_sampe_cpu-cycles_cache-misses_instructions_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1 PASS!
Test Case sampe_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_branch-misses_u_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_branch-misses_k_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_and_bus-cycles_bus-cycles_branch-misses_NO_FIX_HAS_NMI_R0 PASS!
Test Case sampe_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_branch-misses_u_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_branch-misses_k_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_and_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
Test Case group_sampe_bus-cycles_bus-cycles_branch-misses_and_bus-cycles_bus-cycles_branch-misses_HAS_FIX_NO_NMI_R0 PASS!
perf hardware cache event sample group test
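
The pass/fail summary above can be tallied mechanically. The following is a small Python sketch, not part of the lkp tooling, that extracts the test name and status from "Test Case ... PASS!/FAILED!" lines. Only four lines copied from the log are embedded here to keep it short; the variable names are illustrative.

```python
import re
from collections import Counter

# A few lines copied from the log above; the full log has 32 entries.
log = """\
Test Case group_sampe_cache-misses_instructions_u_NO_FIX_HAS_NMI_R1 PASS!
Test Case group_sampe_cpu-cycles_instructions_k_NO_FIX_HAS_NMI_R1 FAILED! <----------
Test Case group_sampe_cpu-cycles_instructions_k_HAS_FIX_NO_NMI_R1 FAILED! <----------
Test Case group_sampe_bus-cycles_branch-misses_k_HAS_FIX_NO_NMI_R0 PASS!
"""

# Capture the test name and its status; the trailing arrow marker is ignored.
pattern = re.compile(r"^Test Case (\S+) (PASS|FAILED)!", re.MULTILINE)
results = pattern.findall(log)

counts = Counter(status for _, status in results)
failed = [name for name, status in results if status == "FAILED"]

print(counts["PASS"], "passed,", counts["FAILED"], "failed")
for name in failed:
    print("FAILED:", name)
```

Feeding it the complete log should reproduce the "28 pass, 4 fail" summary line, with all four failures in the R1 variants.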



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachments:
(No filename) (6.42 kB)
config-6.4.0-rc1-00063-g70c90e4a6b2f (166.22 kB)
job-script (5.54 kB)
dmesg.xz (70.23 kB)
perf-test (85.99 kB)
job.yaml (4.51 kB)
reproduce (417.00 B)

2023-07-10 19:44:45

by Ian Rogers

Subject: Re: [linus:master] [perf parse] 70c90e4a6b: perf-test.perf_hw_event_sample_group.group_sampe_cpu-cycles_cache-misses_and_cpu-cycles_cache-misses_instructions_HAS_FIX_NO_NMI_R1.fail

Hi, and thanks for the report. I'm confused by the output, specifically:

Direct leak of 17544 byte(s) in 51 object(s) allocated from:
    #0 0x7f49ee50c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x556656895a6b in map__new2 util/map.c:226
    #2 0x55665687a6ac in machine__addnew_module_map util/machine.c:1039
    #3 0x556656880bfa in machine__process_kernel_mmap_event util/machine.c:1809
    #4 0x556656882eb7 in machine__process_mmap_event util/machine.c:1996
    #5 0x5566567426bd in perf_event__process_mmap util/event.c:370
    #6 0x5566568b3536 in machines__deliver_event util/session.c:1565
    #7 0x5566568b4e16 in perf_session__deliver_event util/session.c:1645
    #8 0x5566568b7ea1 in perf_session__process_event util/session.c:1881
    #9 0x5566568bed4d in process_simple util/session.c:2442
    #10 0x5566568bdd9d in reader__read_event util/session.c:2371
    #11 0x5566568be6dd in reader__process_events util/session.c:2420
    #12 0x5566568bf506 in __perf_session__process_events util/session.c:2467
    #13 0x5566568c243e in perf_session__process_events util/session.c:2633
    #14 0x5566563ff7d9 in __cmd_report /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/builtin-report.c:989
    #15 0x55665640be73 in cmd_report /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/builtin-report.c:1709
    #16 0x5566566e0d7f in run_builtin /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:323
    #17 0x5566566e1601 in handle_internal_command /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:377
    #18 0x5566566e1b33 in run_argv /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:421
    #19 0x5566566e225f in main /usr/src/perf_selftests-x86_64-rhel-8.3-bpf-70c90e4a6b2fbe775b662eafefae51f64d627790/tools/perf/perf.c:537
    #20 0x7f49ed6b3d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

It shows a map being leaked but without the reference count checker
being enabled, which shouldn't happen given:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/lib/perf/include/internal/rc_check.h#n12

Trying to look further, the blamed line is the closing curly brace of a function:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/tools/perf/util/machine.c#n1039

As such I'm not sure there is anything actionable here and I suspect
the underlying issues were fixed with the numerous reference count
checker fixes to the perf tool.
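
As a side note on the leak report quoted above, its header line can be checked with simple arithmetic. This is a minimal Python sketch, added for illustration only and not taken from the perf or sanitizer sources, that parses the header and computes the size per leaked object.

```python
import re

header = "Direct leak of 17544 byte(s) in 51 object(s) allocated from:"

# Pull the total byte count and the object count out of the header line.
m = re.match(r"Direct leak of (\d+) byte\(s\) in (\d+) object\(s\)", header)
total_bytes, objects = int(m.group(1)), int(m.group(2))

# If every object has the same size, the division leaves no remainder.
per_object, remainder = divmod(total_bytes, objects)
print(per_object, remainder)
```

The total divides evenly into 344 bytes per object, which is consistent with all 51 leaked objects coming from the same fixed-size calloc in map__new2. It does not by itself say anything about why they were not freed.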

Thanks,
Ian
