2021-08-02 02:33:54

by kernel test robot

Subject: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression



Greetings,

FYI, we noticed a -12.6% regression of hackbench.throughput due to commit:


commit: 3a34b13a88caeb2800ab44a4918f230041b37dd9 ("pipe: make pipe writes always wake up readers")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: hackbench
on test machine: 96 threads 2 sockets Ice Lake with 256G memory
with following parameters:

nr_threads: 100%
iterations: 4
mode: process
ipc: pipe
cpufreq_governor: performance
ucode: 0xb000280

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-9/performance/pipe/4/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp1/hackbench/0xb000280

commit:
4669e13cd6 ("Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block")
3a34b13a88 ("pipe: make pipe writes always wake up readers")

4669e13cd67f8532 3a34b13a88caeb2800ab44a4918
---------------- ---------------------------
%stddev %change %stddev
\ | \
544258 -12.6% 475829 hackbench.throughput
84.88 +14.4% 97.08 hackbench.time.elapsed_time
84.88 +14.4% 97.08 hackbench.time.elapsed_time.max
5.035e+08 ± 2% +23.6% 6.222e+08 hackbench.time.involuntary_context_switches
6892 +16.1% 8002 hackbench.time.system_time
953.30 +6.8% 1018 hackbench.time.user_time
1.003e+09 +16.4% 1.168e+09 hackbench.time.voluntary_context_switches
100612 +10.4% 111038 meminfo.AnonHugePages
3.83 ± 11% -0.6 3.19 ± 5% mpstat.cpu.all.idle%
0.05 ± 2% -0.0 0.04 ± 2% mpstat.cpu.all.soft%
0.72 ± 13% -0.2 0.57 turbostat.C1%
66799255 ± 5% +55.4% 1.038e+08 turbostat.IRQ
17392735 +4.0% 18083175 vmstat.system.cs
759236 ± 4% +36.5% 1036422 vmstat.system.in
35655124 +6.3% 37906260 proc-vmstat.numa_hit
35587231 +6.3% 37837737 proc-vmstat.numa_local
35647053 +6.3% 37897853 proc-vmstat.pgalloc_normal
35563777 +6.3% 37815534 proc-vmstat.pgfree
8.10 ±103% -4.6 3.49 ±154% perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
8.10 ±103% -4.6 3.49 ±154% perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.85 ±106% -4.4 3.49 ±154% perf-profile.calltrace.cycles-pp.proc_pid_status.proc_single_show.seq_read_iter.seq_read.vfs_read
7.85 ±106% -4.4 3.49 ±154% perf-profile.calltrace.cycles-pp.proc_single_show.seq_read_iter.seq_read.vfs_read.ksys_read
3.50 ±103% +0.7 4.23 ±143% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.do_sys_open
7.11 ±127% -7.1 0.00 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
6.68 ±133% -6.7 0.00 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
8.10 ±103% -4.6 3.49 ±154% perf-profile.children.cycles-pp.seq_read
7.85 ±106% -4.4 3.49 ±154% perf-profile.children.cycles-pp.proc_pid_status
7.85 ±106% -4.4 3.49 ±154% perf-profile.children.cycles-pp.proc_single_show
7.66 ±109% -3.4 4.23 ±143% perf-profile.children.cycles-pp.link_path_walk
9.41 +9.5% 10.30 perf-stat.i.MPKI
4.361e+10 -4.4% 4.169e+10 perf-stat.i.branch-instructions
0.74 +0.1 0.80 ± 2% perf-stat.i.branch-miss-rate%
3.163e+08 +3.0% 3.258e+08 ± 2% perf-stat.i.branch-misses
6.03 ± 3% -0.6 5.46 ± 2% perf-stat.i.cache-miss-rate%
1.148e+08 ± 2% -4.6% 1.096e+08 ± 2% perf-stat.i.cache-misses
2.019e+09 +4.8% 2.116e+09 perf-stat.i.cache-references
17901191 +3.7% 18564184 perf-stat.i.context-switches
1.27 +5.8% 1.35 perf-stat.i.cpi
703234 ± 6% +32.2% 929682 ± 3% perf-stat.i.cpu-migrations
2562 ± 2% +6.0% 2717 ± 2% perf-stat.i.cycles-between-cache-misses
0.07 ± 2% +0.0 0.08 ± 2% perf-stat.i.dTLB-load-miss-rate%
44303865 ± 2% +9.8% 48637001 perf-stat.i.dTLB-load-misses
6.517e+10 -5.2% 6.178e+10 perf-stat.i.dTLB-loads
3.893e+10 -3.4% 3.763e+10 perf-stat.i.dTLB-stores
2.182e+11 -4.3% 2.088e+11 perf-stat.i.instructions
0.80 -5.2% 0.76 perf-stat.i.ipc
10.90 ± 11% -28.7% 7.77 ± 11% perf-stat.i.major-faults
1560 -4.4% 1492 perf-stat.i.metric.M/sec
9811 -9.2% 8904 perf-stat.i.minor-faults
8014412 ± 6% -18.5% 6533868 ± 5% perf-stat.i.node-loads
9821 -9.3% 8912 perf-stat.i.page-faults
9.24 +9.6% 10.12 perf-stat.overall.MPKI
0.72 +0.1 0.78 ± 2% perf-stat.overall.branch-miss-rate%
5.69 ± 2% -0.5 5.18 ± 2% perf-stat.overall.cache-miss-rate%
1.25 +5.5% 1.32 perf-stat.overall.cpi
2374 ± 2% +5.7% 2510 ± 2% perf-stat.overall.cycles-between-cache-misses
0.07 +0.0 0.08 perf-stat.overall.dTLB-load-miss-rate%
0.80 -5.2% 0.76 perf-stat.overall.ipc
4.313e+10 -4.3% 4.129e+10 perf-stat.ps.branch-instructions
3.124e+08 +3.2% 3.223e+08 ± 2% perf-stat.ps.branch-misses
1.134e+08 ± 2% -4.4% 1.085e+08 ± 2% perf-stat.ps.cache-misses
1.994e+09 +5.0% 2.093e+09 perf-stat.ps.cache-references
17684557 +3.9% 18367344 perf-stat.ps.context-switches
2.692e+11 +1.1% 2.722e+11 perf-stat.ps.cpu-cycles
686423 ± 6% +33.0% 913152 ± 3% perf-stat.ps.cpu-migrations
43736179 +10.0% 48097818 perf-stat.ps.dTLB-load-misses
6.446e+10 -5.1% 6.118e+10 perf-stat.ps.dTLB-loads
3.851e+10 -3.2% 3.727e+10 perf-stat.ps.dTLB-stores
2.158e+11 -4.2% 2.068e+11 perf-stat.ps.instructions
10.28 ± 11% -28.3% 7.37 ± 11% perf-stat.ps.major-faults
9390 -8.8% 8566 ± 2% perf-stat.ps.minor-faults
7938983 ± 6% -18.3% 6483156 ± 5% perf-stat.ps.node-loads
9400 -8.8% 8573 ± 2% perf-stat.ps.page-faults
1.845e+13 +9.4% 2.02e+13 perf-stat.total.instructions
12447 ± 5% +17.9% 14680 ± 5% softirqs.CPU0.RCU
11964 ± 6% +17.1% 14015 ± 3% softirqs.CPU1.RCU
11909 ± 5% +21.5% 14470 ± 5% softirqs.CPU10.RCU
11897 ± 6% +16.5% 13855 ± 4% softirqs.CPU12.RCU
12237 ± 4% +14.3% 13992 ± 5% softirqs.CPU13.RCU
12760 ± 7% +22.5% 15630 ± 6% softirqs.CPU17.RCU
12875 ± 7% +23.6% 15909 ± 5% softirqs.CPU19.RCU
11568 ± 5% +20.3% 13918 ± 3% softirqs.CPU2.RCU
13069 ± 7% +21.1% 15823 ± 5% softirqs.CPU21.RCU
13003 ± 6% +32.0% 17163 ± 13% softirqs.CPU22.RCU
13094 ± 9% +21.9% 15967 ± 3% softirqs.CPU23.RCU
12931 ± 5% +19.4% 15434 ± 8% softirqs.CPU24.RCU
12766 ± 5% +19.0% 15187 ± 7% softirqs.CPU25.RCU
12810 ± 4% +18.1% 15128 ± 7% softirqs.CPU26.RCU
12890 ± 5% +18.7% 15301 ± 7% softirqs.CPU27.RCU
12863 ± 6% +17.9% 15167 ± 8% softirqs.CPU28.RCU
12016 ± 7% +17.1% 14066 ± 2% softirqs.CPU3.RCU
12652 ± 6% +20.1% 15190 ± 10% softirqs.CPU30.RCU
12779 ± 5% +19.2% 15226 ± 7% softirqs.CPU31.RCU
12056 ± 8% +23.4% 14878 ± 5% softirqs.CPU32.RCU
12246 ± 9% +20.9% 14808 ± 6% softirqs.CPU34.RCU
12136 ± 5% +19.9% 14549 ± 6% softirqs.CPU35.RCU
12079 ± 3% +17.7% 14217 ± 7% softirqs.CPU36.RCU
11889 ± 4% +19.6% 14218 ± 6% softirqs.CPU37.RCU
12056 ± 4% +19.0% 14341 ± 6% softirqs.CPU38.RCU
11974 ± 5% +19.8% 14349 ± 6% softirqs.CPU39.RCU
11750 ± 6% +18.1% 13878 ± 6% softirqs.CPU4.RCU
12369 ± 7% +15.8% 14318 ± 6% softirqs.CPU40.RCU
12176 ± 5% +18.2% 14393 ± 6% softirqs.CPU41.RCU
11903 ± 4% +18.9% 14154 ± 5% softirqs.CPU42.RCU
11876 ± 4% +20.9% 14361 ± 4% softirqs.CPU43.RCU
11959 ± 5% +20.8% 14445 ± 6% softirqs.CPU44.RCU
12030 ± 7% +19.4% 14365 ± 6% softirqs.CPU45.RCU
12393 ± 4% +19.3% 14786 ± 7% softirqs.CPU46.RCU
12197 ± 5% +18.6% 14470 ± 6% softirqs.CPU47.RCU
11729 ± 7% +17.5% 13777 ± 3% softirqs.CPU48.RCU
11433 ± 5% +22.9% 14049 ± 4% softirqs.CPU49.RCU
11723 ± 3% +16.2% 13621 ± 3% softirqs.CPU5.RCU
11595 ± 6% +18.9% 13783 ± 3% softirqs.CPU50.RCU
11633 ± 5% +21.3% 14108 ± 4% softirqs.CPU52.RCU
11694 ± 5% +18.0% 13796 ± 4% softirqs.CPU53.RCU
11832 ± 5% +16.9% 13829 ± 5% softirqs.CPU54.RCU
11784 ± 6% +20.8% 14232 ± 4% softirqs.CPU55.RCU
11807 ± 6% +18.8% 14029 ± 4% softirqs.CPU56.RCU
11570 ± 7% +20.4% 13925 ± 5% softirqs.CPU57.RCU
11999 ± 7% +20.2% 14427 ± 5% softirqs.CPU58.RCU
11330 ± 7% +20.8% 13692 ± 6% softirqs.CPU59.RCU
11520 ± 5% +17.9% 13580 ± 3% softirqs.CPU6.RCU
11753 ± 6% +15.4% 13558 ± 2% softirqs.CPU60.RCU
11508 ± 6% +17.7% 13550 ± 3% softirqs.CPU61.RCU
11126 ± 10% +21.2% 13488 ± 3% softirqs.CPU62.RCU
12442 ± 8% +27.8% 15901 ± 5% softirqs.CPU64.RCU
12552 ± 7% +21.9% 15303 ± 5% softirqs.CPU65.RCU
12423 ± 6% +20.4% 14952 ± 5% softirqs.CPU66.RCU
12716 ± 7% +19.7% 15218 ± 5% softirqs.CPU67.RCU
12329 ± 7% +25.3% 15449 ± 6% softirqs.CPU68.RCU
12529 ± 7% +23.0% 15416 ± 5% softirqs.CPU69.RCU
11695 ± 5% +18.9% 13907 ± 4% softirqs.CPU7.RCU
12421 ± 6% +25.5% 15582 ± 6% softirqs.CPU70.RCU
12663 ± 7% +21.9% 15436 ± 5% softirqs.CPU71.RCU
12636 ± 7% +17.8% 14890 ± 8% softirqs.CPU72.RCU
12716 ± 7% +17.9% 14988 ± 8% softirqs.CPU73.RCU
12546 ± 5% +16.2% 14577 ± 8% softirqs.CPU74.RCU
12677 ± 6% +18.3% 15000 ± 7% softirqs.CPU75.RCU
12639 ± 6% +17.8% 14890 ± 7% softirqs.CPU76.RCU
12322 ± 5% +19.0% 14664 ± 8% softirqs.CPU77.RCU
12338 ± 5% +20.7% 14887 ± 8% softirqs.CPU78.RCU
12671 ± 5% +18.0% 14948 ± 8% softirqs.CPU79.RCU
11863 ± 5% +16.6% 13831 ± 3% softirqs.CPU8.RCU
11686 ± 5% +20.6% 14097 ± 5% softirqs.CPU80.RCU
11703 ± 6% +21.9% 14269 ± 4% softirqs.CPU82.RCU
11679 ± 4% +19.3% 13938 ± 5% softirqs.CPU83.RCU
11771 ± 4% +16.9% 13759 ± 10% softirqs.CPU84.RCU
11465 ± 4% +18.7% 13611 ± 5% softirqs.CPU85.RCU
11584 ± 4% +19.4% 13832 ± 6% softirqs.CPU86.RCU
11564 ± 4% +18.4% 13689 ± 4% softirqs.CPU87.RCU
11395 ± 4% +19.3% 13593 ± 5% softirqs.CPU88.RCU
11618 ± 4% +18.2% 13733 ± 5% softirqs.CPU89.RCU
11535 ± 5% +18.3% 13643 ± 5% softirqs.CPU9.RCU
11525 ± 3% +17.6% 13548 ± 5% softirqs.CPU90.RCU
11367 ± 3% +21.2% 13779 ± 5% softirqs.CPU91.RCU
11519 ± 6% +17.7% 13555 ± 6% softirqs.CPU92.RCU
11286 ± 6% +23.9% 13979 ± 5% softirqs.CPU93.RCU
11748 ± 4% +17.8% 13836 ± 6% softirqs.CPU95.RCU
7592 ± 11% +58.2% 12015 ± 57% softirqs.NET_RX
1165396 ± 3% +19.2% 1389425 ± 4% softirqs.RCU
172050 +14.4% 196874 interrupts.CPU0.LOC:Local_timer_interrupts
480047 ± 14% +77.7% 853153 ± 8% interrupts.CPU0.RES:Rescheduling_interrupts
172187 +14.4% 196935 interrupts.CPU1.LOC:Local_timer_interrupts
451155 ± 11% +91.2% 862509 ± 9% interrupts.CPU1.RES:Rescheduling_interrupts
172084 +14.4% 196921 interrupts.CPU10.LOC:Local_timer_interrupts
460876 ± 12% +83.3% 844723 ± 5% interrupts.CPU10.RES:Rescheduling_interrupts
172080 +14.4% 196910 interrupts.CPU11.LOC:Local_timer_interrupts
478819 ± 18% +90.1% 910319 ± 9% interrupts.CPU11.RES:Rescheduling_interrupts
172115 +14.4% 196840 interrupts.CPU12.LOC:Local_timer_interrupts
468516 ± 19% +79.9% 843060 ± 11% interrupts.CPU12.RES:Rescheduling_interrupts
172011 +14.4% 196859 interrupts.CPU13.LOC:Local_timer_interrupts
456764 ± 17% +85.1% 845446 ± 7% interrupts.CPU13.RES:Rescheduling_interrupts
172111 +14.4% 196877 interrupts.CPU14.LOC:Local_timer_interrupts
432031 ± 14% +99.3% 861102 ± 7% interrupts.CPU14.RES:Rescheduling_interrupts
172147 +14.4% 196870 interrupts.CPU15.LOC:Local_timer_interrupts
455488 ± 17% +90.7% 868551 ± 7% interrupts.CPU15.RES:Rescheduling_interrupts
172003 +14.5% 196913 interrupts.CPU16.LOC:Local_timer_interrupts
472295 ± 17% +75.4% 828368 ± 10% interrupts.CPU16.RES:Rescheduling_interrupts
171985 +14.5% 196916 interrupts.CPU17.LOC:Local_timer_interrupts
480713 ± 15% +78.2% 856566 ± 7% interrupts.CPU17.RES:Rescheduling_interrupts
171980 +14.5% 196865 interrupts.CPU18.LOC:Local_timer_interrupts
478651 ± 13% +79.1% 857356 ± 8% interrupts.CPU18.RES:Rescheduling_interrupts
172037 +14.4% 196875 interrupts.CPU19.LOC:Local_timer_interrupts
459007 ± 15% +85.8% 852930 ± 7% interrupts.CPU19.RES:Rescheduling_interrupts
172210 +14.4% 196928 interrupts.CPU2.LOC:Local_timer_interrupts
467677 ± 17% +79.9% 841505 ± 11% interrupts.CPU2.RES:Rescheduling_interrupts
172041 +14.4% 196856 interrupts.CPU20.LOC:Local_timer_interrupts
483538 ± 14% +82.0% 880187 ± 7% interrupts.CPU20.RES:Rescheduling_interrupts
172119 +14.4% 196941 interrupts.CPU21.LOC:Local_timer_interrupts
462455 ± 15% +78.8% 826649 ± 6% interrupts.CPU21.RES:Rescheduling_interrupts
171983 +14.4% 196830 interrupts.CPU22.LOC:Local_timer_interrupts
472547 ± 18% +78.4% 843091 ± 10% interrupts.CPU22.RES:Rescheduling_interrupts
171943 +14.5% 196860 interrupts.CPU23.LOC:Local_timer_interrupts
457901 ± 12% +79.1% 819895 ± 9% interrupts.CPU23.RES:Rescheduling_interrupts
171684 +14.6% 196801 interrupts.CPU24.LOC:Local_timer_interrupts
514148 ± 18% +62.5% 835721 ± 9% interrupts.CPU24.RES:Rescheduling_interrupts
171650 +14.6% 196776 interrupts.CPU25.LOC:Local_timer_interrupts
536969 ± 22% +61.9% 869226 ± 8% interrupts.CPU25.RES:Rescheduling_interrupts
171701 +14.6% 196809 interrupts.CPU26.LOC:Local_timer_interrupts
524497 ± 21% +69.0% 886296 ± 10% interrupts.CPU26.RES:Rescheduling_interrupts
171665 +14.7% 196818 interrupts.CPU27.LOC:Local_timer_interrupts
538990 ± 24% +59.3% 858808 ± 10% interrupts.CPU27.RES:Rescheduling_interrupts
171585 +14.7% 196782 interrupts.CPU28.LOC:Local_timer_interrupts
528650 ± 22% +68.2% 889093 ± 12% interrupts.CPU28.RES:Rescheduling_interrupts
171610 +14.7% 196780 interrupts.CPU29.LOC:Local_timer_interrupts
525768 ± 25% +67.5% 880456 ± 8% interrupts.CPU29.RES:Rescheduling_interrupts
172081 +14.4% 196925 interrupts.CPU3.LOC:Local_timer_interrupts
464521 ± 17% +77.0% 822073 ± 9% interrupts.CPU3.RES:Rescheduling_interrupts
171737 +14.6% 196758 interrupts.CPU30.LOC:Local_timer_interrupts
518840 ± 23% +73.1% 898314 ± 8% interrupts.CPU30.RES:Rescheduling_interrupts
171581 +14.7% 196801 interrupts.CPU31.LOC:Local_timer_interrupts
516607 ± 24% +67.8% 866720 ± 10% interrupts.CPU31.RES:Rescheduling_interrupts
171710 +14.6% 196793 interrupts.CPU32.LOC:Local_timer_interrupts
515658 ± 25% +60.0% 824828 ± 11% interrupts.CPU32.RES:Rescheduling_interrupts
171727 +14.6% 196808 interrupts.CPU33.LOC:Local_timer_interrupts
516588 ± 21% +65.9% 856986 ± 10% interrupts.CPU33.RES:Rescheduling_interrupts
171691 +14.6% 196814 interrupts.CPU34.LOC:Local_timer_interrupts
515117 ± 24% +61.1% 829679 ± 11% interrupts.CPU34.RES:Rescheduling_interrupts
171667 +14.6% 196777 interrupts.CPU35.LOC:Local_timer_interrupts
531011 ± 20% +56.1% 829120 ± 11% interrupts.CPU35.RES:Rescheduling_interrupts
171731 +14.6% 196809 interrupts.CPU36.LOC:Local_timer_interrupts
536363 ± 20% +58.8% 851958 ± 8% interrupts.CPU36.RES:Rescheduling_interrupts
171493 +14.8% 196879 interrupts.CPU37.LOC:Local_timer_interrupts
534322 ± 24% +55.3% 830066 ± 13% interrupts.CPU37.RES:Rescheduling_interrupts
171661 +14.6% 196807 interrupts.CPU38.LOC:Local_timer_interrupts
519552 ± 24% +63.9% 851431 ± 14% interrupts.CPU38.RES:Rescheduling_interrupts
171732 +14.6% 196842 interrupts.CPU39.LOC:Local_timer_interrupts
514974 ± 19% +73.5% 893368 ± 10% interrupts.CPU39.RES:Rescheduling_interrupts
172053 +14.5% 196937 interrupts.CPU4.LOC:Local_timer_interrupts
476008 ± 20% +76.4% 839533 ± 10% interrupts.CPU4.RES:Rescheduling_interrupts
171748 +14.6% 196838 interrupts.CPU40.LOC:Local_timer_interrupts
502383 ± 24% +75.5% 881705 ± 13% interrupts.CPU40.RES:Rescheduling_interrupts
171767 +14.6% 196788 interrupts.CPU41.LOC:Local_timer_interrupts
515739 ± 26% +66.7% 859913 ± 13% interrupts.CPU41.RES:Rescheduling_interrupts
171782 +14.6% 196785 interrupts.CPU42.LOC:Local_timer_interrupts
530399 ± 24% +70.0% 901663 ± 9% interrupts.CPU42.RES:Rescheduling_interrupts
171750 +14.6% 196831 interrupts.CPU43.LOC:Local_timer_interrupts
557949 ± 23% +58.6% 884833 ± 10% interrupts.CPU43.RES:Rescheduling_interrupts
171756 +14.6% 196802 interrupts.CPU44.LOC:Local_timer_interrupts
522061 ± 21% +68.7% 880527 ± 14% interrupts.CPU44.RES:Rescheduling_interrupts
171786 +14.6% 196790 interrupts.CPU45.LOC:Local_timer_interrupts
540577 ± 23% +65.1% 892540 ± 13% interrupts.CPU45.RES:Rescheduling_interrupts
171586 +14.7% 196800 interrupts.CPU46.LOC:Local_timer_interrupts
542628 ± 25% +53.6% 833219 ± 8% interrupts.CPU46.RES:Rescheduling_interrupts
171507 +14.8% 196849 interrupts.CPU47.LOC:Local_timer_interrupts
540778 ± 21% +57.4% 850965 ± 6% interrupts.CPU47.RES:Rescheduling_interrupts
172079 +14.4% 196832 interrupts.CPU48.LOC:Local_timer_interrupts
489128 ± 13% +71.4% 838135 ± 10% interrupts.CPU48.RES:Rescheduling_interrupts
172106 +14.4% 196899 interrupts.CPU49.LOC:Local_timer_interrupts
453944 ± 14% +96.8% 893583 ± 9% interrupts.CPU49.RES:Rescheduling_interrupts
171987 +14.5% 196856 interrupts.CPU5.LOC:Local_timer_interrupts
484175 ± 20% +71.6% 830828 ± 11% interrupts.CPU5.RES:Rescheduling_interrupts
172181 +14.3% 196828 interrupts.CPU50.LOC:Local_timer_interrupts
464812 ± 14% +81.2% 842077 ± 8% interrupts.CPU50.RES:Rescheduling_interrupts
172050 +14.5% 196930 interrupts.CPU51.LOC:Local_timer_interrupts
475875 ± 17% +71.0% 813946 ± 8% interrupts.CPU51.RES:Rescheduling_interrupts
172069 +14.4% 196915 interrupts.CPU52.LOC:Local_timer_interrupts
463346 ± 17% +83.9% 852084 ± 9% interrupts.CPU52.RES:Rescheduling_interrupts
172038 +14.4% 196897 interrupts.CPU53.LOC:Local_timer_interrupts
474533 ± 20% +74.9% 829996 ± 6% interrupts.CPU53.RES:Rescheduling_interrupts
172062 +14.4% 196870 interrupts.CPU54.LOC:Local_timer_interrupts
484893 ± 21% +75.5% 851218 ± 8% interrupts.CPU54.RES:Rescheduling_interrupts
171996 +14.4% 196808 interrupts.CPU55.LOC:Local_timer_interrupts
469735 ± 14% +72.4% 809917 ± 6% interrupts.CPU55.RES:Rescheduling_interrupts
172108 +14.4% 196906 interrupts.CPU56.LOC:Local_timer_interrupts
458936 ± 16% +83.5% 842109 ± 7% interrupts.CPU56.RES:Rescheduling_interrupts
172095 +14.4% 196859 interrupts.CPU57.LOC:Local_timer_interrupts
464846 ± 17% +84.6% 857895 ± 7% interrupts.CPU57.RES:Rescheduling_interrupts
172134 +14.4% 196906 interrupts.CPU58.LOC:Local_timer_interrupts
466419 ± 16% +78.6% 832811 ± 11% interrupts.CPU58.RES:Rescheduling_interrupts
172116 +14.4% 196912 interrupts.CPU59.LOC:Local_timer_interrupts
476724 ± 17% +88.1% 896558 ± 9% interrupts.CPU59.RES:Rescheduling_interrupts
172150 +14.4% 196869 interrupts.CPU6.LOC:Local_timer_interrupts
476706 ± 23% +76.7% 842133 ± 6% interrupts.CPU6.RES:Rescheduling_interrupts
172110 +14.4% 196819 interrupts.CPU60.LOC:Local_timer_interrupts
463662 ± 17% +81.6% 842003 ± 13% interrupts.CPU60.RES:Rescheduling_interrupts
172042 +14.4% 196903 interrupts.CPU61.LOC:Local_timer_interrupts
449164 ± 13% +89.3% 850111 ± 5% interrupts.CPU61.RES:Rescheduling_interrupts
172190 +14.4% 196954 interrupts.CPU62.LOC:Local_timer_interrupts
457682 ± 20% +83.1% 837803 ± 5% interrupts.CPU62.RES:Rescheduling_interrupts
172142 +14.4% 196935 interrupts.CPU63.LOC:Local_timer_interrupts
466861 ± 16% +85.3% 865097 ± 9% interrupts.CPU63.RES:Rescheduling_interrupts
172091 +14.4% 196921 interrupts.CPU64.LOC:Local_timer_interrupts
460827 ± 18% +83.2% 844369 ± 10% interrupts.CPU64.RES:Rescheduling_interrupts
172084 +14.4% 196919 interrupts.CPU65.LOC:Local_timer_interrupts
465536 ± 20% +81.8% 846151 ± 3% interrupts.CPU65.RES:Rescheduling_interrupts
172076 +14.4% 196923 interrupts.CPU66.LOC:Local_timer_interrupts
473969 ± 16% +84.7% 875513 ± 8% interrupts.CPU66.RES:Rescheduling_interrupts
172104 +14.4% 196943 interrupts.CPU67.LOC:Local_timer_interrupts
454104 ± 16% +88.0% 853533 ± 5% interrupts.CPU67.RES:Rescheduling_interrupts
172095 +14.4% 196922 interrupts.CPU68.LOC:Local_timer_interrupts
474337 ± 13% +88.4% 893576 ± 9% interrupts.CPU68.RES:Rescheduling_interrupts
172210 +14.3% 196912 interrupts.CPU69.LOC:Local_timer_interrupts
474798 ± 11% +77.9% 844752 ± 5% interrupts.CPU69.RES:Rescheduling_interrupts
172054 +14.4% 196906 interrupts.CPU7.LOC:Local_timer_interrupts
476046 ± 12% +74.0% 828127 ± 8% interrupts.CPU7.RES:Rescheduling_interrupts
172097 +14.4% 196869 interrupts.CPU70.LOC:Local_timer_interrupts
488889 ± 19% +73.5% 848149 ± 15% interrupts.CPU70.RES:Rescheduling_interrupts
172049 +14.5% 196978 interrupts.CPU71.LOC:Local_timer_interrupts
453513 ± 15% +85.4% 840788 ± 9% interrupts.CPU71.RES:Rescheduling_interrupts
171738 +14.6% 196825 interrupts.CPU72.LOC:Local_timer_interrupts
516711 ± 21% +63.7% 845713 ± 8% interrupts.CPU72.RES:Rescheduling_interrupts
171619 +14.7% 196825 interrupts.CPU73.LOC:Local_timer_interrupts
546790 ± 22% +61.4% 882525 ± 8% interrupts.CPU73.RES:Rescheduling_interrupts
171744 +14.6% 196806 interrupts.CPU74.LOC:Local_timer_interrupts
522024 ± 23% +65.2% 862192 ± 9% interrupts.CPU74.RES:Rescheduling_interrupts
171716 +14.6% 196854 interrupts.CPU75.LOC:Local_timer_interrupts
531488 ± 25% +66.1% 882690 ± 10% interrupts.CPU75.RES:Rescheduling_interrupts
171708 +14.6% 196845 interrupts.CPU76.LOC:Local_timer_interrupts
522789 ± 24% +70.4% 890642 ± 11% interrupts.CPU76.RES:Rescheduling_interrupts
171657 +14.7% 196846 interrupts.CPU77.LOC:Local_timer_interrupts
532873 ± 25% +67.1% 890242 ± 11% interrupts.CPU77.RES:Rescheduling_interrupts
171793 +14.6% 196911 interrupts.CPU78.LOC:Local_timer_interrupts
529585 ± 22% +69.5% 897564 ± 10% interrupts.CPU78.RES:Rescheduling_interrupts
171649 +14.7% 196878 interrupts.CPU79.LOC:Local_timer_interrupts
507941 ± 18% +70.7% 866940 ± 12% interrupts.CPU79.RES:Rescheduling_interrupts
172110 +14.4% 196909 interrupts.CPU8.LOC:Local_timer_interrupts
462124 ± 16% +86.6% 862371 ± 4% interrupts.CPU8.RES:Rescheduling_interrupts
171701 +14.7% 196908 interrupts.CPU80.LOC:Local_timer_interrupts
540658 ± 22% +56.5% 846154 ± 8% interrupts.CPU80.RES:Rescheduling_interrupts
171845 +14.5% 196837 interrupts.CPU81.LOC:Local_timer_interrupts
530859 ± 22% +58.3% 840275 ± 7% interrupts.CPU81.RES:Rescheduling_interrupts
171748 +14.6% 196840 interrupts.CPU82.LOC:Local_timer_interrupts
552798 ± 24% +54.5% 854162 ± 10% interrupts.CPU82.RES:Rescheduling_interrupts
171813 +14.6% 196815 interrupts.CPU83.LOC:Local_timer_interrupts
531018 ± 21% +55.8% 827459 ± 12% interrupts.CPU83.RES:Rescheduling_interrupts
171803 +14.6% 196872 interrupts.CPU84.LOC:Local_timer_interrupts
539149 ± 22% +58.8% 856221 ± 8% interrupts.CPU84.RES:Rescheduling_interrupts
171569 +14.8% 196880 interrupts.CPU85.LOC:Local_timer_interrupts
526010 ± 22% +66.2% 874052 ± 9% interrupts.CPU85.RES:Rescheduling_interrupts
171692 +14.7% 196906 interrupts.CPU86.LOC:Local_timer_interrupts
534575 ± 22% +66.5% 890142 ± 14% interrupts.CPU86.RES:Rescheduling_interrupts
171716 +14.6% 196813 interrupts.CPU87.LOC:Local_timer_interrupts
511131 ± 21% +75.4% 896459 ± 8% interrupts.CPU87.RES:Rescheduling_interrupts
171805 +14.6% 196896 interrupts.CPU88.LOC:Local_timer_interrupts
501808 ± 22% +78.8% 897272 ± 12% interrupts.CPU88.RES:Rescheduling_interrupts
171807 +14.6% 196844 interrupts.CPU89.LOC:Local_timer_interrupts
533732 ± 27% +65.9% 885715 ± 9% interrupts.CPU89.RES:Rescheduling_interrupts
172160 +14.3% 196842 interrupts.CPU9.LOC:Local_timer_interrupts
462507 ± 15% +82.6% 844650 ± 7% interrupts.CPU9.RES:Rescheduling_interrupts
171869 +14.6% 196894 interrupts.CPU90.LOC:Local_timer_interrupts
534561 ± 23% +62.3% 867424 ± 8% interrupts.CPU90.RES:Rescheduling_interrupts
171852 +14.6% 196874 interrupts.CPU91.LOC:Local_timer_interrupts
554759 ± 21% +61.1% 893865 ± 10% interrupts.CPU91.RES:Rescheduling_interrupts
171819 +14.6% 196887 interrupts.CPU92.LOC:Local_timer_interrupts
525086 ± 26% +65.3% 867955 ± 11% interrupts.CPU92.RES:Rescheduling_interrupts
171811 +14.6% 196888 interrupts.CPU93.LOC:Local_timer_interrupts
524309 ± 22% +67.6% 878816 ± 10% interrupts.CPU93.RES:Rescheduling_interrupts
171673 +14.7% 196886 interrupts.CPU94.LOC:Local_timer_interrupts
535674 ± 24% +57.9% 845858 ± 6% interrupts.CPU94.RES:Rescheduling_interrupts
171591 +14.7% 196883 interrupts.CPU95.LOC:Local_timer_interrupts
531867 ± 24% +61.5% 858734 ± 8% interrupts.CPU95.RES:Rescheduling_interrupts
16502093 +14.5% 18898965 interrupts.LOC:Local_timer_interrupts
47805899 ± 7% +72.4% 82403841 ± 2% interrupts.RES:Rescheduling_interrupts



hackbench.throughput

560000 +------------------------------------------------------------------+
550000 |-+ + +.|
|.+.+.+.+ +. .+.+.+.+.+.+.+.+..+.+.+.+.+.+. .+.+.+.+..+.+.+. + |
540000 |-+ + + + |
530000 |-+ |
520000 |-+ |
510000 |-+ |
| |
500000 |-+ |
490000 |-+ |
480000 |-+ O O O O O O O O O O O |
470000 |-+ O O O O O O O O O |
| O O O O O O O O O O |
460000 |-O |
450000 +------------------------------------------------------------------+


hackbench.time.user_time

1040 +--------------------------------------------------------------------+
| O O O O |
1020 |-+ O O O O O O O O O O O O O O |
| O O O O O O O O O O O |
| O |
1000 |-+ |
| |
980 |-+ + |
| : : |
960 |-+ : : |
|.+.+. .+. .+..+. .+.+.. .+. .+ +.. .+ : +. |
| +..+.+.+ + +.+.+ +.+ + + + +.+.+.+ : + +.|
940 |-+ + + |
| |
920 +--------------------------------------------------------------------+


hackbench.time.system_time

8400 +--------------------------------------------------------------------+
| O |
8200 |-+ O O O O |
8000 |-+ O O O O O O O O O O O O |
| O O O O O O O O O O O O |
7800 |-+ |
| |
7600 |-+ |
| |
7400 |-+ |
7200 |-+ |
| |
7000 |-+ .+.+.+. + |
|.+.+.+..+. .+.+.+.+..+.+.+.+.+.+..+.+. .+.+.. .+ +.. + + |
6800 +--------------------------------------------------------------------+


hackbench.time.elapsed_time

102 +---------------------------------------------------------------------+
100 |-+ O O |
| O O O |
98 |-+ O O O O O O O O O O O O O O O |
96 |-+ O O O O O O O O O O |
| |
94 |-+ |
92 |-+ |
90 |-+ |
| |
88 |-+ |
86 |-+ .+.+.+. .+ |
|.+.+..+.+.+.+.+..+.+.+.+.+.+..+.+.+.+.+..+. .+.+. .+ +.+. + |
84 |-+ + +. +.|
82 +---------------------------------------------------------------------+


hackbench.time.elapsed_time.max

102 +---------------------------------------------------------------------+
100 |-+ O O |
| O O O |
98 |-+ O O O O O O O O O O O O O O O |
96 |-+ O O O O O O O O O O |
| |
94 |-+ |
92 |-+ |
90 |-+ |
| |
88 |-+ |
86 |-+ .+.+.+. .+ |
|.+.+..+.+.+.+.+..+.+.+.+.+.+..+.+.+.+.+..+. .+.+. .+ +.+. + |
84 |-+ + +. +.|
82 +---------------------------------------------------------------------+


hackbench.time.voluntary_context_switches

1.25e+09 +----------------------------------------------------------------+
| |
1.2e+09 |-O O O O O O O |
| O O O O O O O O O O O O O O O |
| O O O O O |
1.15e+09 |-+ O O O O |
| |
1.1e+09 |-+ |
| |
1.05e+09 |-+ |
| .+ |
|.+.+.+.+. .+ + .+.+.+.+. .+.+..+.+.+. .+.+. .+.+.+ +. .|
1e+09 |-+ + + + +.+.+ +.+ + + + |
| + |
9.5e+08 +----------------------------------------------------------------+


hackbench.time.involuntary_context_switches

6.6e+08 +-----------------------------------------------------------------+
| O O O O |
6.4e+08 |-+ O O O O O O O O O O O O O O |
6.2e+08 |-+ O O O O O |
| O O O O |
6e+08 |-+ |
5.8e+08 |-+ |
| |
5.6e+08 |-+ |
5.4e+08 |-+ |
| |
5.2e+08 |-+ .+.. +. |
5e+08 |.+.+.+.+.+.+ +.+.+.+.+.+.+.+.+.+.+. .+ +.+..+.+.+.+.+ : +.|
| + + + + : |
4.8e+08 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (42.21 kB)
config-5.14.0-rc3-00142-g3a34b13a88ca (178.04 kB)
job-script (8.13 kB)
job.yaml (5.41 kB)
reproduce (575.00 B)

2021-08-02 17:08:42

by Linus Torvalds

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On Sun, Aug 1, 2021 at 7:31 PM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a -12.6% regression of hackbench.throughput due to commit:

I had already forgotten how sensitive hackbench is to pipe wakeups.

I think it's all good for stable, and we can live with this -
particularly since I'm not sure how much hackbench really matters.

But it might be one of those things where it is a good idea to make
the crazy epoll case explicitly special.

Sandeep, does something like the attached patch (written to be on top
of the existing one) work for you?

It's not a great patch - I'd like to catch _just_ the broken EPOLLET
case, but this patch triggers on any select/poll usage - but it might
be a good idea to do it this way simply because it now separates out
the "ok, now we need to do stupid things" logic, so that we *could*
make that "stupid things" test tighter some day.

And I think it's actually better to make sure that the unnecessary
extra wakeup be the _last_ one a write() system call does, not the
first one. So this may be the way to go for that reason too.

This all probably doesn't matter one whit, but hey, I love how the
kernel test robot gives us heads-up about performance anomalies, so I
try to take them seriously when they aren't totally strange (which
happens sometimes: some of the benchmarks end up having subtle cache
placement effects)

Linus
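
The distinction Linus draws between ordinary pollers and the broken EPOLLET case can be illustrated from userspace. This is not the kernel code under discussion, just a small sketch of the level-triggered semantics that make the extra wakeup unnecessary for well-behaved readers (Unix-only, uses os.pipe and select):

```python
import os
import select


def pipe_readiness_demo() -> tuple[bool, bool]:
    """Return (readable after first write, readable after second write)."""
    r, w = os.pipe()
    try:
        os.write(w, b"first")         # empty -> non-empty: the reader must be woken
        first, _, _ = select.select([r], [], [], 0)

        os.write(w, b"second")        # pipe already non-empty: no new readiness edge
        second, _, _ = select.select([r], [], [], 0)

        return (bool(first), bool(second))
    finally:
        os.close(r)
        os.close(w)
```

A level-triggered select()/poll() reports the pipe readable as long as any data is buffered, so the second write needs no wakeup for correctness; only an edge-triggered (EPOLLET) reader that failed to drain the pipe depends on one.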

2021-08-02 17:16:25

by Linus Torvalds

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On Mon, Aug 2, 2021 at 10:06 AM Linus Torvalds
<[email protected]> wrote:
>
> Sandeep, does something like the attached patch (written to be on top
> of the existing one) work for you?

.. and by "attached patch" I obviously meant the patch that wasn't attached.

It's here.

Linus


Attachments:
patch.diff (2.54 kB)

2021-08-10 18:27:38

by Sandeep Patil

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On 8/2/21 5:14 PM, Linus Torvalds wrote:
> On Mon, Aug 2, 2021 at 10:06 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> Sandeep, does something like the attached patch (written to be on top
>> of the existing one) work for you?
>
> .. and by "attached patch" I obviously meant the patch that wasn't attached.
>
> It's here.
>

Thanks, and apologies for the delay. I'll get back to you by my EOD on whether this still works.

Sandeep

2021-08-11 15:48:15

by Sandeep Patil

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On 8/2/21 5:06 PM, Linus Torvalds wrote:
> On Sun, Aug 1, 2021 at 7:31 PM kernel test robot <[email protected]> wrote:
>>
>> FYI, we noticed a -12.6% regression of hackbench.throughput due to commit:
>
> I had already forgotten how sensitive hackbench is to pipe wakeups.
>
> I think it's all good for stable, and we can live with this -
> particularly since I'm not sure how much hackbench really matters.
>
> But it might be one of those things where it is a good idea to make
> the crazy epoll case explicitly special.
>
> Sandeep, does something like the attached patch (written to be on top
> of the existing one) work for you?
>
> It's not a great patch - I'd like to catch _just_ the broken EPOLLET
> case, but this patch triggers on any select/poll usage - but it might
> be a good idea to do it this way simply because it now separates out
> the "ok, now we need to do stupid things" logic, so that we *could*
> make that "stupid things" test tighter some day.
>
> And I think it's actually better to make sure that the unnecessary
> extra wakeup be the _last_ one a write() system call does, not the
> first one. So this may be the way to go for that reason too.

So the patch works for the Android apps I could test, just as the
last one did.

Also, given the way that library was using pipes, I think first-vs-last
write doesn't matter, since the kernel will only ever see one small
write afaict. So, if this change helps with performance, it LGTM.

I can make sure it's picked up for Android if you decide to merge it in
your tree, and I'll report back if something breaks again (I don't
expect anything to).

>
> This all probably doesn't matter one whit, but hey, I love how the
> kernel test robot gives us heads-up about performance anomalies, so I
> try to take them seriously when they aren't totally strange (which
> happens sometimes: some of the benchmarks end up having subtle cache
> placement effects)
>

Thanks for sending this and sorry for the delay again.

- ssp


2021-08-18 15:33:29

by Mel Gorman

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On Mon, Aug 02, 2021 at 10:14:14AM -0700, Linus Torvalds wrote:
> On Mon, Aug 2, 2021 at 10:06 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > Sandeep, does something like the attached patch (written to be on top
> > of the existing one) work for you?
>
> .. and by "attached patch" I obviously meant the patch that wasn't attached.
>
> It's here.
>

Hi Linus,

Our own test systems also detected the regression and automatically
bisected it to 3a34b13a88, but it is not universal. For example, it's not
visible on a single-socket 8-CPU Skylake machine, but it gets progressively
worse the more CPUs there are and potentially the more LLCs there are. I
say LLCs because the number of interrupts generated was significantly
higher, which *might* mean more IPIs being sent to tasks running on CPUs
that do not share an LLC with the waker.

Your "attached" patch alleviates the problem. The worst results were on
a Zen3 machine (256 CPUs, lots of last-level caches).

hackbench-process-pipes
5.14.0-rc3 5.14.0-rc3 5.14.0-rc3
vanilla alwayswake-v1r1 specialcase-v1r1
Amean 1 0.3800 ( 0.00%) 0.4520 * -18.95%* 0.3580 ( 5.79%)
Amean 4 0.5107 ( 0.00%) 0.6660 * -30.42%* 0.5080 ( 0.52%)
Amean 7 0.5717 ( 0.00%) 0.6387 * -11.72%* 0.5310 * 7.11%*
Amean 12 0.5853 ( 0.00%) 0.8563 * -46.30%* 0.5457 ( 6.78%)
Amean 21 0.7693 ( 0.00%) 1.3150 * -70.93%* 0.7297 ( 5.16%)
Amean 30 0.9667 ( 0.00%) 1.8530 * -91.69%* 0.8547 ( 11.59%)
Amean 48 1.3100 ( 0.00%) 2.5027 * -91.04%* 1.3143 ( -0.33%)
Amean 79 2.0633 ( 0.00%) 3.4983 * -69.55%* 2.1180 ( -2.65%)
Amean 110 2.8080 ( 0.00%) 3.9927 * -42.19%* 2.7850 ( 0.82%)
Amean 141 3.6237 ( 0.00%) 4.6810 * -29.18%* 3.6663 ( -1.18%)
Amean 172 4.6480 ( 0.00%) 5.6707 * -22.00%* 4.8003 ( -3.28%)
Amean 203 5.2630 ( 0.00%) 6.5000 * -23.50%* 5.6100 ( -6.59%)
Amean 234 6.3290 ( 0.00%) 7.1473 * -12.93%* 6.1570 ( 2.72%)
Amean 265 7.0907 ( 0.00%) 7.9200 * -11.70%* 7.1417 ( -0.72%)
Amean 296 7.7550 ( 0.00%) 8.2590 * -6.50%* 7.8167 ( -0.80%)

alwayswake-v1r1 is commit 3a34b13a88 cherry-picked on top of 5.14-rc3.
specialcase-v1r1 is your suggested patch on top.

The * marks results that are likely statistically significant, so with
your suggested patch most of the results are in the noise.

While this is Zen3, the same problem exists on at least 2-socket
cascadelake, 2-socket skylake and 2-socket broadwell systems although
not to the same degree (different CPU counts and topology).

I know hackbench is not particularly interesting but it's used often
enough when comparing kernels and patches that commit 3a34b13a88 will be
a regression magnet.

--
Mel Gorman
SUSE Labs
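
For context on why this benchmark is so wakeup-sensitive: hackbench's process/pipe mode is essentially many sender/receiver groups exchanging small messages over pipes, so every write is a potential reader wakeup. A minimal single-pair sketch of that pattern (a deliberate simplification for illustration, not the real benchmark; Unix-only, uses os.fork):

```python
import os
import time


def pipe_pair_throughput(n_msgs: int = 10_000, msg: bytes = b"x" * 100) -> float:
    """One sender/receiver pair over a pipe; returns messages per second."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: receiver drains the pipe
        os.close(w)
        remaining = n_msgs * len(msg)
        while remaining:
            remaining -= len(os.read(r, 65536))
        os._exit(0)
    os.close(r)
    start = time.perf_counter()
    for _ in range(n_msgs):
        os.write(w, msg)                  # each write may have to wake the receiver
    os.close(w)
    os.waitpid(pid, 0)
    return n_msgs / (time.perf_counter() - start)
```

With 3a34b13a88 applied, every one of those writes issues a wakeup even when the pipe already has queued data, which is the per-write cost Mel's numbers above are measuring at scale.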

2021-08-18 18:20:57

by Linus Torvalds

Subject: Re: [pipe] 3a34b13a88: hackbench.throughput -12.6% regression

On Wed, Aug 18, 2021 at 8:31 AM Mel Gorman <[email protected]> wrote:
>
> I know hackbench is not particularly interesting but it's used often
> enough when comparing kernels and patches that commit 3a34b13a88 will be
> a regression magnet.

Fair enough.

And I like that patch anyway because of how it makes that EPOLLET
behaviour much more explicit, so I just committed it.

Linus
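
The net effect of the committed special case, as described in this thread, can be modeled as a pure decision function. The field names below are illustrative only; the real logic lives in fs/pipe.c and the merged patch may differ in detail:

```python
from dataclasses import dataclass


@dataclass
class PipeState:
    was_empty: bool    # pipe held no data before this write
    poll_usage: bool   # a reader has used poll/select/epoll on this pipe


def wake_readers_always(state: PipeState) -> bool:
    """Behaviour after 3a34b13a88: every successful write wakes readers."""
    return True


def wake_readers_special_case(state: PipeState) -> bool:
    """The follow-up special case as described in the thread: wake only when
    the pipe transitioned from empty, or when a poller may be relying on the
    extra wakeup (the broken-EPOLLET escape hatch)."""
    return state.was_empty or state.poll_usage
```

For hackbench-style back-to-back writes with no poller, the special case skips the redundant wakeup, which is why Mel's specialcase-v1r1 numbers return to roughly the vanilla baseline.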