2019-11-27 00:56:25

by Chen, Rong A

Subject: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression

Greetings,

FYI, we noticed a -23.7% regression of phoronix-test-suite.glmark2.0.score due to commit:


commit: 8d04a5f97a5fa9d7afdf46eda3a5ceaa973a1bcc ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: phoronix-test-suite
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with the following parameters:

need_x: true
test: glmark2-1.1.0
cpufreq_governor: performance
ucode: 0x500002b

test-description: The Phoronix Test Suite is the most comprehensive testing and benchmarking platform available that provides an extensible framework for which new tests can be easily added.
test-url: http://www.phoronix-test-suite.com/

In addition to that, the commit also has a significant impact on the following test:

+------------------+-----------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.gtkperf.0.seconds 26.8% regression |
| test machine | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory |
| test parameters | cpufreq_governor=performance |
| | need_x=true |
| | test=gtkperf-1.2.1 |
| | ucode=0x500002b |
+------------------+-----------------------------------------------------------------------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
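
To sanity-check the benchmark itself outside the LKP harness, a rough
equivalent (only a sketch, assuming the stock Phoronix Test Suite client
is installed and an X session is available, since the job sets
need_x: true) would be:

# manual cross-check only, not part of the LKP job above
phoronix-test-suite benchmark glmark2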

=========================================================================================
compiler/cpufreq_governor/kconfig/need_x/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/true/debian-x86_64-phoronix/lkp-csl-2sp8/glmark2-1.1.0/phoronix-test-suite/0x500002b

commit:
9f4813b531 ("Linux 5.4-rc8")
8d04a5f97a ("x86/mm/pat: Convert the PAT tree to a generic interval tree")

9f4813b531a0b8cc 8d04a5f97a5fa9d7afdf46eda3a
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
2:4 -50% :4 dmesg.WARNING:stack_recursion
%stddev %change %stddev
\ | \
290.50 -23.7% 221.75 ± 4% phoronix-test-suite.glmark2.0.score
210.25 -24.1% 159.50 ± 4% phoronix-test-suite.glmark2.1.score
35084 -19.8% 28122 ± 3% vmstat.system.cs
16157 +1.6% 16415 proc-vmstat.nr_kernel_stack
17551 +2.0% 17909 proc-vmstat.nr_slab_reclaimable
46846 +2.4% 47962 proc-vmstat.nr_slab_unreclaimable
156401 ± 15% -19.5% 125917 ± 11% proc-vmstat.numa_hint_faults_local
2091859 -2.4% 2041250 ± 2% proc-vmstat.pgfault
72.57 ± 2% -10.0 62.61 ± 19% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
1.25 ± 8% -0.5 0.79 ± 58% perf-profile.calltrace.cycles-pp.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
1.04 ± 10% -0.4 0.66 ± 58% perf-profile.calltrace.cycles-pp.tick_irq_enter.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
72.70 ± 2% -9.9 62.81 ± 19% perf-profile.children.cycles-pp.intel_idle
1.28 ± 8% -0.3 0.94 ± 30% perf-profile.children.cycles-pp.irq_enter
1.08 ± 9% -0.3 0.78 ± 28% perf-profile.children.cycles-pp.tick_irq_enter
0.47 ± 14% -0.2 0.32 ± 39% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.48 ± 4% -0.1 0.37 ± 32% perf-profile.children.cycles-pp.futex_wait_queue_me
0.16 ± 11% -0.1 0.06 ± 63% perf-profile.children.cycles-pp.perf_event_task_tick
0.20 ± 5% -0.1 0.10 ± 62% perf-profile.children.cycles-pp.rb_next
0.11 ± 15% -0.1 0.03 ±100% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.18 ± 7% -0.1 0.10 ± 61% perf-profile.children.cycles-pp.update_ts_time_stats
72.59 ± 2% -9.8 62.77 ± 19% perf-profile.self.cycles-pp.intel_idle
0.16 ± 14% -0.1 0.06 ± 63% perf-profile.self.cycles-pp.perf_event_task_tick
0.18 ± 6% -0.1 0.10 ± 65% perf-profile.self.cycles-pp.rb_next
26427524 -30.2% 18439861 ± 11% perf-stat.i.branch-misses
1.264e+08 -32.7% 85112641 ± 15% perf-stat.i.cache-references
35249 -19.8% 28261 ± 3% perf-stat.i.context-switches
1663983 ± 10% -47.0% 881904 ± 41% perf-stat.i.dTLB-load-misses
2.55e+09 ± 3% -29.5% 1.799e+09 ± 13% perf-stat.i.dTLB-loads
1.123e+09 ± 2% -6.9% 1.046e+09 ± 3% perf-stat.i.dTLB-stores
1.112e+10 -15.7% 9.376e+09 ± 4% perf-stat.i.instructions
6700 -21.1% 5289 ± 7% perf-stat.i.instructions-per-iTLB-miss
92755 ± 4% -5.2% 87901 perf-stat.i.node-loads
6929 -20.3% 5525 ± 7% perf-stat.overall.instructions-per-iTLB-miss
26388462 -30.2% 18414499 ± 11% perf-stat.ps.branch-misses
1.262e+08 -32.7% 84986397 ± 15% perf-stat.ps.cache-references
35191 -19.8% 28218 ± 3% perf-stat.ps.context-switches
1661390 ± 10% -47.0% 880613 ± 41% perf-stat.ps.dTLB-load-misses
2.546e+09 ± 3% -29.4% 1.796e+09 ± 13% perf-stat.ps.dTLB-loads
1.122e+09 ± 2% -6.8% 1.045e+09 ± 3% perf-stat.ps.dTLB-stores
1.111e+10 -15.7% 9.362e+09 ± 4% perf-stat.ps.instructions
92587 ± 4% -5.2% 87778 perf-stat.ps.node-loads
7.805e+12 -16.3% 6.535e+12 ± 4% perf-stat.total.instructions
421893 ± 13% -33.5% 280495 ± 17% numa-meminfo.node0.Active
301962 ± 18% -24.7% 227271 ± 3% numa-meminfo.node0.Active(anon)
119930 -55.6% 53224 ± 75% numa-meminfo.node0.Active(file)
289371 ± 19% -23.6% 220986 ± 3% numa-meminfo.node0.AnonPages
1599232 -46.1% 861252 ± 63% numa-meminfo.node0.FilePages
1467294 -45.5% 800320 ± 63% numa-meminfo.node0.Inactive
406871 -60.9% 159255 ± 92% numa-meminfo.node0.Inactive(file)
10042 ± 2% -14.2% 8615 ± 12% numa-meminfo.node0.KernelStack
109027 -67.1% 35838 ± 78% numa-meminfo.node0.Mapped
2488468 ± 2% -36.4% 1582090 ± 41% numa-meminfo.node0.MemUsed
4081 ± 3% -53.1% 1913 ± 56% numa-meminfo.node0.PageTables
169752 ± 3% -15.8% 143010 ± 19% numa-meminfo.node0.Slab
183294 ± 31% +78.6% 327336 ± 16% numa-meminfo.node1.Active
176459 ± 32% +44.1% 254351 ± 4% numa-meminfo.node1.Active(anon)
6834 ± 11% +967.9% 72984 ± 57% numa-meminfo.node1.Active(file)
171600 ± 33% +42.0% 243637 ± 2% numa-meminfo.node1.AnonPages
18539 ± 5% +3986.1% 757532 ± 72% numa-meminfo.node1.FilePages
6284 ± 14% +10623.2% 673848 ± 75% numa-meminfo.node1.Inactive
3812 ± 13% +6513.3% 252099 ± 58% numa-meminfo.node1.Inactive(file)
4526 ± 18% +1622.5% 77962 ± 35% numa-meminfo.node1.Mapped
477729 ± 11% +190.4% 1387370 ± 48% numa-meminfo.node1.MemUsed
704.00 ± 20% +307.6% 2869 ± 37% numa-meminfo.node1.PageTables
75486 ± 18% -24.7% 56818 ± 3% numa-vmstat.node0.nr_active_anon
29985 -55.6% 13304 ± 75% numa-vmstat.node0.nr_active_file
72338 ± 19% -23.6% 55246 ± 3% numa-vmstat.node0.nr_anon_pages
399807 -46.1% 215312 ± 63% numa-vmstat.node0.nr_file_pages
101715 -60.9% 39814 ± 92% numa-vmstat.node0.nr_inactive_file
10043 ± 2% -14.2% 8614 ± 12% numa-vmstat.node0.nr_kernel_stack
27254 -67.1% 8959 ± 78% numa-vmstat.node0.nr_mapped
1019 ± 3% -53.2% 477.50 ± 56% numa-vmstat.node0.nr_page_table_pages
75486 ± 18% -24.7% 56818 ± 3% numa-vmstat.node0.nr_zone_active_anon
29985 -55.6% 13304 ± 75% numa-vmstat.node0.nr_zone_active_file
101715 -60.9% 39814 ± 92% numa-vmstat.node0.nr_zone_inactive_file
1483935 ± 6% -18.2% 1214502 ± 21% numa-vmstat.node0.numa_local
44115 ± 32% +44.2% 63599 ± 4% numa-vmstat.node1.nr_active_anon
1708 ± 11% +968.1% 18243 ± 57% numa-vmstat.node1.nr_active_file
42900 ± 33% +42.0% 60922 ± 2% numa-vmstat.node1.nr_anon_pages
4635 ± 5% +3985.9% 189382 ± 72% numa-vmstat.node1.nr_file_pages
952.75 ± 13% +6515.2% 63026 ± 58% numa-vmstat.node1.nr_inactive_file
1131 ± 18% +1623.5% 19492 ± 35% numa-vmstat.node1.nr_mapped
176.00 ± 20% +307.4% 717.00 ± 37% numa-vmstat.node1.nr_page_table_pages
44115 ± 32% +44.2% 63599 ± 4% numa-vmstat.node1.nr_zone_active_anon
1708 ± 11% +968.1% 18243 ± 57% numa-vmstat.node1.nr_zone_active_file
952.75 ± 13% +6515.2% 63026 ± 58% numa-vmstat.node1.nr_zone_inactive_file
1546 ± 3% +34.6% 2081 ± 6% slabinfo.UNIX.active_objs
1546 ± 3% +34.6% 2081 ± 6% slabinfo.UNIX.num_objs
2089 ± 4% +11.9% 2337 ± 7% slabinfo.dmaengine-unmap-16.active_objs
2089 ± 4% +11.9% 2337 ± 7% slabinfo.dmaengine-unmap-16.num_objs
21286 ± 4% +10.5% 23516 ± 5% slabinfo.filp.active_objs
21594 ± 4% +10.8% 23932 ± 4% slabinfo.filp.num_objs
3632 ± 7% +15.4% 4190 ± 7% slabinfo.mnt_cache.active_objs
3632 ± 7% +15.4% 4190 ± 7% slabinfo.mnt_cache.num_objs
3343 ± 4% +23.1% 4116 ± 7% slabinfo.numa_policy.active_objs
3343 ± 4% +23.1% 4116 ± 7% slabinfo.numa_policy.num_objs
5711 ± 3% +10.0% 6282 ± 4% slabinfo.ovl_inode.active_objs
5711 ± 3% +10.0% 6282 ± 4% slabinfo.ovl_inode.num_objs
2143 ± 4% +15.8% 2481 ± 7% slabinfo.pool_workqueue.active_objs
2143 ± 4% +15.8% 2481 ± 7% slabinfo.pool_workqueue.num_objs
2309 ± 8% +14.7% 2648 ± 6% slabinfo.skbuff_ext_cache.active_objs
2309 ± 8% +14.7% 2648 ± 6% slabinfo.skbuff_ext_cache.num_objs
2785 ± 3% +21.8% 3390 ± 4% slabinfo.sock_inode_cache.active_objs
2785 ± 3% +21.8% 3390 ± 4% slabinfo.sock_inode_cache.num_objs
1300 ± 3% +15.3% 1500 ± 8% slabinfo.task_group.active_objs
1300 ± 3% +15.3% 1500 ± 8% slabinfo.task_group.num_objs
471.00 ± 12% -33.1% 315.25 ± 31% slabinfo.xfrm_state.active_objs
471.00 ± 12% -33.1% 315.25 ± 31% slabinfo.xfrm_state.num_objs
15189 ± 4% -16.9% 12626 ± 4% sched_debug.cfs_rq:/.exec_clock.avg
54802 ± 5% -20.8% 43426 ± 7% sched_debug.cfs_rq:/.load.avg
59.91 ± 51% -38.0% 37.16 ± 5% sched_debug.cfs_rq:/.load_avg.avg
2493 ±112% -68.5% 784.19 ± 3% sched_debug.cfs_rq:/.load_avg.max
320.29 ± 88% -55.3% 143.18 ± 2% sched_debug.cfs_rq:/.load_avg.stddev
68001 ± 4% -12.7% 59359 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
67912 -18.0% 55681 ± 8% sched_debug.cfs_rq:/.min_vruntime.stddev
0.11 ± 5% -26.6% 0.08 ± 7% sched_debug.cfs_rq:/.nr_running.avg
0.30 ± 2% -11.0% 0.27 ± 3% sched_debug.cfs_rq:/.nr_running.stddev
28.47 ± 6% -16.1% 23.89 ± 6% sched_debug.cfs_rq:/.runnable_load_avg.avg
754.35 ± 4% -8.9% 687.54 ± 6% sched_debug.cfs_rq:/.runnable_load_avg.max
127.97 ± 4% -10.0% 115.24 ± 6% sched_debug.cfs_rq:/.runnable_load_avg.stddev
54698 ± 5% -20.6% 43419 ± 7% sched_debug.cfs_rq:/.runnable_weight.avg
44849 ± 15% -52.5% 21314 ± 60% sched_debug.cfs_rq:/.spread0.avg
67914 -18.0% 55682 ± 8% sched_debug.cfs_rq:/.spread0.stddev
110.05 ± 2% -20.3% 87.72 ± 2% sched_debug.cfs_rq:/.util_avg.avg
22.41 ± 9% -27.7% 16.21 ± 10% sched_debug.cfs_rq:/.util_est_enqueued.avg
0.05 ± 4% -10.7% 0.05 ± 4% sched_debug.cpu.nr_running.avg
134130 -20.1% 107139 ± 3% sched_debug.cpu.nr_switches.avg
764365 ± 4% -25.1% 572854 ± 5% sched_debug.cpu.nr_switches.max
212569 ± 2% -22.1% 165622 ± 8% sched_debug.cpu.nr_switches.stddev
131734 -20.5% 104682 ± 3% sched_debug.cpu.sched_count.avg
760935 ± 5% -25.2% 569239 ± 5% sched_debug.cpu.sched_count.max
212271 ± 2% -22.0% 165483 ± 8% sched_debug.cpu.sched_count.stddev
65857 -20.5% 52334 ± 3% sched_debug.cpu.sched_goidle.avg
380419 ± 5% -25.2% 284602 ± 5% sched_debug.cpu.sched_goidle.max
106128 ± 2% -22.0% 82735 ± 8% sched_debug.cpu.sched_goidle.stddev
65779 -20.6% 52253 ± 3% sched_debug.cpu.ttwu_count.avg
164575 ± 4% -15.1% 139787 ± 12% sched_debug.cpu.ttwu_count.stddev
925.77 +14.7% 1061 ± 4% sched_debug.cpu.ttwu_local.avg
95.00 ± 81% -96.3% 3.50 ± 91% interrupts.48:PCI-MSI.31981581-edge.i40e-eth0-TxRx-12
81.00 ±150% -100.0% 0.00 interrupts.59:PCI-MSI.31981592-edge.i40e-eth0-TxRx-23
0.00 +4.9e+105% 4915 ±172% interrupts.76:PCI-MSI.31981609-edge.i40e-eth0-TxRx-40
71.75 ±167% -99.7% 0.25 ±173% interrupts.82:PCI-MSI.31981615-edge.i40e-eth0-TxRx-46
1278 ± 29% +1192.4% 16520 ±127% interrupts.CPU1.RES:Rescheduling_interrupts
410.50 ± 26% -94.5% 22.50 ±173% interrupts.CPU10.TLB:TLB_shootdowns
95.00 ± 81% -97.4% 2.50 ±128% interrupts.CPU12.48:PCI-MSI.31981581-edge.i40e-eth0-TxRx-12
218.50 ± 82% -91.5% 18.50 ±173% interrupts.CPU12.TLB:TLB_shootdowns
1142 -64.1% 409.75 ± 92% interrupts.CPU13.NMI:Non-maskable_interrupts
1142 -64.1% 409.75 ± 92% interrupts.CPU13.PMI:Performance_monitoring_interrupts
11287 ±167% -98.6% 154.00 ±163% interrupts.CPU13.RES:Rescheduling_interrupts
1141 ± 2% -49.4% 578.00 ± 72% interrupts.CPU15.NMI:Non-maskable_interrupts
1141 ± 2% -49.4% 578.00 ± 72% interrupts.CPU15.PMI:Performance_monitoring_interrupts
1124 ± 2% -67.4% 366.75 ±115% interrupts.CPU16.NMI:Non-maskable_interrupts
1124 ± 2% -67.4% 366.75 ±115% interrupts.CPU16.PMI:Performance_monitoring_interrupts
315.50 ± 62% -69.3% 96.75 ±173% interrupts.CPU17.TLB:TLB_shootdowns
137.25 ± 63% -96.5% 4.75 ±161% interrupts.CPU18.TLB:TLB_shootdowns
160.00 ±170% -99.5% 0.75 ±110% interrupts.CPU2.38:PCI-MSI.31981571-edge.i40e-eth0-TxRx-2
506.25 ± 30% -77.4% 114.50 ± 84% interrupts.CPU2.TLB:TLB_shootdowns
241.25 ± 78% -99.1% 2.25 ±173% interrupts.CPU21.TLB:TLB_shootdowns
295.00 ± 56% -79.1% 61.75 ±169% interrupts.CPU22.TLB:TLB_shootdowns
723.25 ± 63% -84.0% 116.00 ±167% interrupts.CPU23.RES:Rescheduling_interrupts
24258 ± 60% -96.5% 848.00 ± 5% interrupts.CPU24.RES:Rescheduling_interrupts
254.75 ± 27% +109.6% 534.00 ± 54% interrupts.CPU24.TLB:TLB_shootdowns
124.25 ± 75% +178.5% 346.00 ± 32% interrupts.CPU25.TLB:TLB_shootdowns
70.00 ±173% +362.9% 324.00 ± 89% interrupts.CPU27.TLB:TLB_shootdowns
41.75 ± 84% +1.9e+05% 77527 ±171% interrupts.CPU28.RES:Rescheduling_interrupts
1088 ± 22% +12337.9% 135324 ± 82% interrupts.CPU3.RES:Rescheduling_interrupts
546.00 ± 39% -87.0% 71.25 ± 68% interrupts.CPU4.TLB:TLB_shootdowns
26.25 ± 65% +2721.0% 740.50 ± 82% interrupts.CPU43.RES:Rescheduling_interrupts
4.50 ±104% +8927.8% 406.25 ± 92% interrupts.CPU47.TLB:TLB_shootdowns
1154 ±172% -99.9% 1.50 ±137% interrupts.CPU48.84:PCI-MSI.31981617-edge.i40e-eth0-TxRx-48
435.50 ± 70% -98.5% 6.75 ± 92% interrupts.CPU48.RES:Rescheduling_interrupts
504.50 ± 62% -99.8% 1.25 ±131% interrupts.CPU49.RES:Rescheduling_interrupts
180.25 ± 72% -100.0% 0.00 interrupts.CPU49.TLB:TLB_shootdowns
1605 ± 69% -79.6% 327.75 ± 70% interrupts.CPU5.NMI:Non-maskable_interrupts
1605 ± 69% -79.6% 327.75 ± 70% interrupts.CPU5.PMI:Performance_monitoring_interrupts
527.00 ± 8% -96.6% 17.75 ±173% interrupts.CPU5.TLB:TLB_shootdowns
1248 ± 44% -77.9% 276.25 ± 84% interrupts.CPU53.NMI:Non-maskable_interrupts
1248 ± 44% -77.9% 276.25 ± 84% interrupts.CPU53.PMI:Performance_monitoring_interrupts
613.00 ±100% -99.5% 3.25 ± 88% interrupts.CPU54.RES:Rescheduling_interrupts
165.50 ± 98% -96.5% 5.75 ±173% interrupts.CPU54.TLB:TLB_shootdowns
577.75 ±111% -99.7% 1.75 ±116% interrupts.CPU55.RES:Rescheduling_interrupts
668.25 ± 76% -46.1% 360.50 ±123% interrupts.CPU56.NMI:Non-maskable_interrupts
668.25 ± 76% -46.1% 360.50 ±123% interrupts.CPU56.PMI:Performance_monitoring_interrupts
390.50 ±111% -87.8% 47.75 ±168% interrupts.CPU56.RES:Rescheduling_interrupts
134.50 ± 64% -74.2% 34.75 ±173% interrupts.CPU56.TLB:TLB_shootdowns
356.50 ± 96% -94.5% 19.50 ± 73% interrupts.CPU59.RES:Rescheduling_interrupts
77.25 ± 57% -99.0% 0.75 ±173% interrupts.CPU59.TLB:TLB_shootdowns
1004 ± 24% -62.6% 376.25 ±106% interrupts.CPU61.NMI:Non-maskable_interrupts
1004 ± 24% -62.6% 376.25 ±106% interrupts.CPU61.PMI:Performance_monitoring_interrupts
1181 ± 34% -96.8% 38.25 ±143% interrupts.CPU61.RES:Rescheduling_interrupts
379.00 ± 65% -97.8% 8.50 ±173% interrupts.CPU61.TLB:TLB_shootdowns
2301 ±140% -99.7% 6.50 ± 86% interrupts.CPU62.RES:Rescheduling_interrupts
95.75 ± 59% -92.4% 7.25 ±173% interrupts.CPU62.TLB:TLB_shootdowns
433.75 ± 83% -98.4% 6.75 ±139% interrupts.CPU63.RES:Rescheduling_interrupts
103.00 ± 56% -92.5% 7.75 ±173% interrupts.CPU63.TLB:TLB_shootdowns
994.50 ± 23% -62.2% 376.25 ±113% interrupts.CPU64.NMI:Non-maskable_interrupts
994.50 ± 23% -62.2% 376.25 ±113% interrupts.CPU64.PMI:Performance_monitoring_interrupts
512.75 ± 65% -98.2% 9.25 ±137% interrupts.CPU64.RES:Rescheduling_interrupts
151.00 ± 61% -99.7% 0.50 ±173% interrupts.CPU64.TLB:TLB_shootdowns
1008 ± 44% -94.9% 51.50 ±100% interrupts.CPU67.RES:Rescheduling_interrupts
242.25 ± 79% -96.0% 9.75 ±155% interrupts.CPU67.TLB:TLB_shootdowns
779.50 ± 75% -98.8% 9.50 ±126% interrupts.CPU68.RES:Rescheduling_interrupts
1059 ± 43% -93.1% 73.25 ±109% interrupts.CPU69.RES:Rescheduling_interrupts
246.00 ±104% -93.1% 17.00 ±173% interrupts.CPU69.TLB:TLB_shootdowns
1054 ±152% -98.1% 20.00 ±161% interrupts.CPU7.RES:Rescheduling_interrupts
66.00 ± 66% -90.9% 6.00 ±173% interrupts.CPU7.TLB:TLB_shootdowns
1056 ± 24% -94.3% 59.75 ±125% interrupts.CPU70.RES:Rescheduling_interrupts
224.75 ± 69% -89.5% 23.50 ±163% interrupts.CPU70.TLB:TLB_shootdowns
0.00 +1.1e+104% 107.00 ± 93% interrupts.CPU72.TLB:TLB_shootdowns
9.00 ±137% +3747.2% 346.25 ± 96% interrupts.CPU73.RES:Rescheduling_interrupts
0.00 +2.3e+104% 230.00 ±133% interrupts.CPU74.TLB:TLB_shootdowns
0.00 +1.6e+104% 165.00 ± 51% interrupts.CPU75.TLB:TLB_shootdowns
537.75 ± 26% -90.2% 52.50 ±113% interrupts.CPU8.TLB:TLB_shootdowns
0.00 +9.8e+103% 97.50 ± 67% interrupts.CPU83.TLB:TLB_shootdowns
0.00 +2e+104% 198.00 ± 49% interrupts.CPU85.TLB:TLB_shootdowns
13.25 ± 59% +809.4% 120.50 ± 75% interrupts.CPU88.RES:Rescheduling_interrupts
5593 ± 51% -93.2% 380.50 ±160% interrupts.CPU9.RES:Rescheduling_interrupts
0.75 ± 57% +31033.3% 233.50 ± 89% interrupts.CPU93.RES:Rescheduling_interrupts
0.50 ±100% +47100.0% 236.00 ±115% interrupts.CPU94.RES:Rescheduling_interrupts
16.25 ±169% +3118.5% 523.00 ± 78% interrupts.CPU95.RES:Rescheduling_interrupts
2.00 ±173% +8037.5% 162.75 ± 67% interrupts.CPU95.TLB:TLB_shootdowns
49869 ± 7% +94.1% 96790 ± 11% softirqs.CPU0.RCU
51718 ± 6% +67.2% 86459 ± 11% softirqs.CPU1.RCU
51281 ± 5% +92.7% 98810 ± 7% softirqs.CPU10.RCU
49686 ± 12% +95.1% 96913 ± 11% softirqs.CPU11.RCU
51222 ± 6% +95.4% 100111 ± 11% softirqs.CPU12.RCU
50980 ± 6% +99.1% 101505 ± 6% softirqs.CPU13.RCU
51046 ± 7% +98.3% 101207 ± 6% softirqs.CPU14.RCU
45578 ± 21% +116.5% 98669 ± 8% softirqs.CPU15.RCU
44168 ± 7% +99.2% 87963 ± 11% softirqs.CPU16.RCU
41610 ± 9% +119.4% 91301 ± 13% softirqs.CPU17.RCU
43759 ± 8% +110.7% 92217 ± 11% softirqs.CPU18.RCU
42902 ± 8% +109.1% 89698 ± 10% softirqs.CPU19.RCU
52217 ± 5% +94.3% 101454 ± 5% softirqs.CPU2.RCU
42704 ± 5% +114.8% 91744 ± 10% softirqs.CPU20.RCU
44910 ± 7% +102.1% 90783 ± 10% softirqs.CPU22.RCU
44991 ± 7% +104.7% 92096 ± 11% softirqs.CPU23.RCU
47098 ± 12% +92.3% 90567 ± 9% softirqs.CPU24.RCU
106586 ± 6% -11.9% 93907 ± 3% softirqs.CPU24.SCHED
42456 ± 12% +99.8% 84830 ± 8% softirqs.CPU25.RCU
47709 ± 6% +82.9% 87250 ± 10% softirqs.CPU26.RCU
50970 ± 13% +73.4% 88408 ± 12% softirqs.CPU27.RCU
51241 ± 13% +67.2% 85676 ± 15% softirqs.CPU28.RCU
50293 ± 13% +80.0% 90528 ± 10% softirqs.CPU29.RCU
53868 ± 4% +84.6% 99454 ± 10% softirqs.CPU3.RCU
44947 ± 14% +98.9% 89421 ± 8% softirqs.CPU31.RCU
54210 ± 19% +86.6% 101147 ± 5% softirqs.CPU32.RCU
58728 ± 15% +70.0% 99864 ± 9% softirqs.CPU33.RCU
55716 ± 18% +87.2% 104278 ± 5% softirqs.CPU34.RCU
58583 ± 15% +77.6% 104065 ± 5% softirqs.CPU35.RCU
58080 ± 14% +82.0% 105718 ± 5% softirqs.CPU36.RCU
50595 ± 12% +95.7% 99025 ± 4% softirqs.CPU37.RCU
53939 ± 13% +82.0% 98153 ± 4% softirqs.CPU38.RCU
53424 ± 14% +90.6% 101817 ± 6% softirqs.CPU39.RCU
52327 ± 5% +96.8% 102971 ± 4% softirqs.CPU4.RCU
54501 ± 14% +87.8% 102330 ± 7% softirqs.CPU40.RCU
53036 ± 35% +92.4% 102050 ± 4% softirqs.CPU41.RCU
60492 ± 14% +69.9% 102761 ± 2% softirqs.CPU42.RCU
57222 ± 15% +69.7% 97101 ± 5% softirqs.CPU43.RCU
58535 ± 14% +76.2% 103127 ± 5% softirqs.CPU44.RCU
57025 ± 14% +76.3% 100515 ± 6% softirqs.CPU45.RCU
60244 ± 17% +63.4% 98447 ± 7% softirqs.CPU46.RCU
57985 ± 14% +68.9% 97910 ± 12% softirqs.CPU47.RCU
49462 ± 6% +95.9% 96879 ± 11% softirqs.CPU48.RCU
49436 ± 4% +92.7% 95249 ± 12% softirqs.CPU49.RCU
52413 ± 5% +88.8% 98965 ± 11% softirqs.CPU5.RCU
51032 ± 6% +104.5% 104382 ± 7% softirqs.CPU50.RCU
45684 ± 30% +117.7% 99464 ± 15% softirqs.CPU51.RCU
51919 ± 5% +103.2% 105507 ± 6% softirqs.CPU52.RCU
52420 ± 4% +95.6% 102530 ± 12% softirqs.CPU53.RCU
49538 ± 9% +112.3% 105146 ± 7% softirqs.CPU54.RCU
49883 ± 7% +103.9% 101712 ± 9% softirqs.CPU55.RCU
49604 ± 6% +95.9% 97166 ± 9% softirqs.CPU56.RCU
51798 ± 4% +98.5% 102805 ± 6% softirqs.CPU57.RCU
49777 ± 8% +99.7% 99405 ± 7% softirqs.CPU58.RCU
49338 ± 14% +105.1% 101169 ± 11% softirqs.CPU59.RCU
49653 ± 10% +106.2% 102399 ± 6% softirqs.CPU6.RCU
51134 ± 6% +98.8% 101668 ± 11% softirqs.CPU60.RCU
50488 ± 7% +102.4% 102180 ± 7% softirqs.CPU61.RCU
50078 ± 6% +102.7% 101522 ± 6% softirqs.CPU62.RCU
51026 ± 5% +99.6% 101832 ± 8% softirqs.CPU63.RCU
42895 ± 7% +107.0% 88783 ± 13% softirqs.CPU64.RCU
42387 ± 11% +113.3% 90430 ± 11% softirqs.CPU65.RCU
43746 ± 8% +108.8% 91341 ± 12% softirqs.CPU66.RCU
40324 ± 12% +121.3% 89240 ± 11% softirqs.CPU67.RCU
41900 ± 6% +114.6% 89917 ± 11% softirqs.CPU68.RCU
44157 ± 7% +104.3% 90228 ± 11% softirqs.CPU69.RCU
50551 ± 9% +97.0% 99596 ± 8% softirqs.CPU7.RCU
44094 ± 7% +107.9% 91683 ± 13% softirqs.CPU70.RCU
44086 ± 7% +106.8% 91179 ± 11% softirqs.CPU71.RCU
46488 ± 15% +90.2% 88422 ± 8% softirqs.CPU72.RCU
42653 ± 12% +99.2% 84953 ± 9% softirqs.CPU73.RCU
48559 ± 12% +76.5% 85713 ± 12% softirqs.CPU74.RCU
51770 ± 16% +73.0% 89572 ± 9% softirqs.CPU75.RCU
50182 ± 14% +72.4% 86513 ± 14% softirqs.CPU76.RCU
50446 ± 14% +77.7% 89651 ± 10% softirqs.CPU77.RCU
49119 ± 13% +84.9% 90818 ± 8% softirqs.CPU78.RCU
45781 ± 15% +93.5% 88595 ± 8% softirqs.CPU79.RCU
50229 ± 6% +96.4% 98632 ± 7% softirqs.CPU8.RCU
45155 ± 36% +128.7% 103272 ± 5% softirqs.CPU80.RCU
58487 ± 14% +77.0% 103508 ± 6% softirqs.CPU81.RCU
55666 ± 18% +87.0% 104094 ± 5% softirqs.CPU82.RCU
58506 ± 14% +77.7% 103963 ± 5% softirqs.CPU83.RCU
58416 ± 14% +80.3% 105297 ± 6% softirqs.CPU84.RCU
51291 ± 12% +95.8% 100447 ± 4% softirqs.CPU85.RCU
54557 ± 13% +82.1% 99356 ± 4% softirqs.CPU86.RCU
54147 ± 13% +89.8% 102754 ± 6% softirqs.CPU87.RCU
55853 ± 14% +82.1% 101688 ± 6% softirqs.CPU88.RCU
59176 ± 16% +70.8% 101077 ± 5% softirqs.CPU89.RCU
52291 ± 6% +93.8% 101344 ± 5% softirqs.CPU9.RCU
60854 ± 15% +69.0% 102864 ± 2% softirqs.CPU90.RCU
57902 ± 12% +70.1% 98501 ± 4% softirqs.CPU91.RCU
59683 ± 15% +73.3% 103419 ± 5% softirqs.CPU92.RCU
57459 ± 14% +78.5% 102584 ± 6% softirqs.CPU93.RCU
59398 ± 13% +63.2% 96927 ± 8% softirqs.CPU94.RCU
56576 ± 15% +76.2% 99664 ± 11% softirqs.CPU95.RCU
5123 ± 16% +168.1% 13739 ± 7% softirqs.NET_RX
4882876 ± 10% +90.1% 9282598 ± 6% softirqs.RCU



phoronix-test-suite.glmark2.0.score

300 +-+-------------------------------------------------------------------+
| : +. +..+ +. +..+. +. + |
250 +-+: |
O : O O O O O O O O O O O O O O
| :O O O O O O O O O O |
200 +-+ |
| : |
150 +-+ |
|: |
100 +-+ |
|: |
|: |
50 +-+ |
| |
0 +-+-------------------------------------------------------------------+


phoronix-test-suite.glmark2.1.score

250 +-+-------------------------------------------------------------------+
| |
| +..+..+..+..+..+.+..+..+.. .+..+..+..+..+..+..+..+.+..+ |
200 +-+: +. |
| : O O O
O :O O O O O O O O O O O O O O O O O O O O O |
150 +-+ |
| : |
100 +-+ |
|: |
|: |
50 +-+ |
|: |
| |
0 +-+-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-csl-2sp8: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/need_x/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/true/debian-x86_64-phoronix/lkp-csl-2sp8/gtkperf-1.2.1/phoronix-test-suite/0x500002b

commit:
9f4813b531 ("Linux 5.4-rc8")
8d04a5f97a ("x86/mm/pat: Convert the PAT tree to a generic interval tree")

9f4813b531a0b8cc 8d04a5f97a5fa9d7afdf46eda3a
---------------- ---------------------------
%stddev %change %stddev
\ | \
10.34 +26.8% 13.11 ± 5% phoronix-test-suite.gtkperf.0.seconds
62.29 +13.3% 70.56 ± 2% phoronix-test-suite.time.elapsed_time
62.29 +13.3% 70.56 ± 2% phoronix-test-suite.time.elapsed_time.max
46875 -12.5% 41002 ± 3% meminfo.max_used_kB
29.85 +39.6% 41.68 ± 22% boot-time.boot
2433 +48.7% 3617 ± 25% boot-time.idle
1628 -11.4% 1442 ± 4% vmstat.io.bi
9046 -11.6% 7993 ± 2% vmstat.system.cs
1.954e+09 ± 69% +226.7% 6.384e+09 ± 6% cpuidle.C1E.time
6278082 ± 33% +94.4% 12207082 ± 16% cpuidle.C1E.usage
3.958e+09 ± 32% -93.8% 2.444e+08 ± 10% cpuidle.C6.time
5684464 -95.4% 262692 ± 13% cpuidle.C6.usage
8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.28 ±100% -3.2 2.08 ±173% perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.get_signal.do_signal
5.28 ±100% -3.2 2.08 ±173% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.get_signal
8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.do_sys_open
8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.do_filp_open
8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.path_openat
5.28 ±100% -3.2 2.08 ±173% perf-profile.children.cycles-pp.mmput
5.28 ±100% -3.2 2.08 ±173% perf-profile.children.cycles-pp.exit_mmap
17.00 ± 13% +230.9% 56.25 ± 29% sched_debug.cfs_rq:/.runnable_load_avg.max
3.87 ± 17% +73.5% 6.71 ± 22% sched_debug.cfs_rq:/.runnable_load_avg.stddev
25.25 ± 12% -22.4% 19.58 ± 20% sched_debug.cfs_rq:/.util_est_enqueued.avg
34397 +44.4% 49658 ± 29% sched_debug.cpu.clock.avg
34402 +44.4% 49663 ± 29% sched_debug.cpu.clock.max
34391 +44.4% 49654 ± 29% sched_debug.cpu.clock.min
34397 +44.4% 49658 ± 29% sched_debug.cpu.clock_task.avg
34402 +44.4% 49663 ± 29% sched_debug.cpu.clock_task.max
34391 +44.4% 49654 ± 29% sched_debug.cpu.clock_task.min
9.74 ± 5% -11.5% 8.62 ± 4% sched_debug.cpu.nr_uninterruptible.stddev
34393 +44.4% 49654 ± 29% sched_debug.cpu_clk
31537 +48.4% 46795 ± 31% sched_debug.ktime
34725 +44.0% 49988 ± 29% sched_debug.sched_clk
96262 +2.1% 98322 proc-vmstat.nr_active_anon
29673 +1.4% 30100 proc-vmstat.nr_active_file
94407 +2.3% 96570 proc-vmstat.nr_anon_pages
1082 +1.5% 1099 proc-vmstat.nr_page_table_pages
96262 +2.1% 98322 proc-vmstat.nr_zone_active_anon
29673 +1.4% 30100 proc-vmstat.nr_zone_active_file
7664 ± 27% +172.9% 20912 ± 38% proc-vmstat.numa_hint_faults
7507 ± 27% +124.1% 16824 ± 38% proc-vmstat.numa_hint_faults_local
328182 +5.9% 347387 ± 4% proc-vmstat.numa_hit
309363 +6.2% 328630 ± 4% proc-vmstat.numa_local
39400 ± 35% +90.3% 74971 ± 42% proc-vmstat.numa_pte_updates
370297 +7.5% 398082 ± 4% proc-vmstat.pgalloc_normal
278665 +12.0% 312075 ± 4% proc-vmstat.pgfault
269968 ± 2% +10.9% 299343 ± 5% proc-vmstat.pgfree
16.26 ± 53% -51.0% 7.96 ± 2% perf-stat.i.MPKI
8.359e+08 ± 2% -12.0% 7.355e+08 ± 5% perf-stat.i.branch-instructions
3.78 ± 21% -1.0 2.76 ± 9% perf-stat.i.branch-miss-rate%
30272546 ± 11% -25.3% 22600376 ± 15% perf-stat.i.branch-misses
1870084 ± 6% -59.7% 752958 ± 15% perf-stat.i.cache-misses
47650854 ± 37% -46.3% 25593693 ± 2% perf-stat.i.cache-references
9360 -12.3% 8208 ± 2% perf-stat.i.context-switches
6.312e+09 ± 6% -15.0% 5.364e+09 ± 2% perf-stat.i.cpu-cycles
3783 ± 9% +176.2% 10448 ± 18% perf-stat.i.cycles-between-cache-misses
0.16 ± 40% -0.1 0.01 ± 16% perf-stat.i.dTLB-load-miss-rate%
921731 ± 27% -87.9% 111653 ± 10% perf-stat.i.dTLB-load-misses
0.05 ± 32% -0.0 0.01 ± 21% perf-stat.i.dTLB-store-miss-rate%
157895 ± 26% -82.3% 28003 ± 22% perf-stat.i.dTLB-store-misses
41.55 ± 4% -3.5 38.06 perf-stat.i.iTLB-load-miss-rate%
3.826e+09 ± 2% -11.9% 3.372e+09 ± 6% perf-stat.i.instructions
9.33 -13.3% 8.09 ± 3% perf-stat.i.major-faults
12.54 ± 38% -39.3% 7.61 ± 5% perf-stat.overall.MPKI
3.63 ± 13% -0.6 3.06 ± 10% perf-stat.overall.branch-miss-rate%
4.51 ± 35% -1.6 2.94 ± 14% perf-stat.overall.cache-miss-rate%
3391 ± 10% +113.9% 7255 ± 12% perf-stat.overall.cycles-between-cache-misses
0.10 ± 33% -0.1 0.01 ± 14% perf-stat.overall.dTLB-load-miss-rate%
0.04 ± 26% -0.0 0.01 ± 23% perf-stat.overall.dTLB-store-miss-rate%
8.23e+08 ± 2% -11.9% 7.253e+08 ± 5% perf-stat.ps.branch-instructions
29792047 ± 11% -25.2% 22290398 ± 15% perf-stat.ps.branch-misses
1841339 ± 6% -59.6% 743171 ± 15% perf-stat.ps.cache-misses
46876977 ± 37% -46.2% 25232671 ± 2% perf-stat.ps.cache-references
9205 -12.1% 8091 ± 2% perf-stat.ps.context-switches
6.211e+09 ± 6% -14.8% 5.289e+09 ± 2% perf-stat.ps.cpu-cycles
907015 ± 27% -87.9% 110123 ± 10% perf-stat.ps.dTLB-load-misses
155379 ± 26% -82.2% 27623 ± 22% perf-stat.ps.dTLB-store-misses
3.766e+09 ± 2% -11.7% 3.325e+09 ± 6% perf-stat.ps.instructions
9.21 -13.4% 7.98 ± 3% perf-stat.ps.major-faults
16348 ± 5% +33.4% 21802 ± 5% softirqs.CPU0.SCHED
11186 ± 15% +37.7% 15406 ± 16% softirqs.CPU1.SCHED
9914 ± 8% +48.9% 14762 ± 30% softirqs.CPU10.SCHED
9645 ± 2% +21.1% 11681 ± 4% softirqs.CPU11.SCHED
9779 ± 3% +17.1% 11450 ± 7% softirqs.CPU13.SCHED
9805 ± 3% +12.6% 11040 ± 6% softirqs.CPU14.SCHED
9658 ± 2% +25.3% 12104 ± 10% softirqs.CPU15.SCHED
31797 ± 7% +23.0% 39108 ± 14% softirqs.CPU16.TIMER
9330 ± 6% +29.9% 12116 ± 4% softirqs.CPU17.SCHED
31554 ± 10% +16.1% 36648 ± 6% softirqs.CPU17.TIMER
9493 ± 4% +21.9% 11574 ± 6% softirqs.CPU18.SCHED
9700 ± 2% +61.1% 15631 ± 47% softirqs.CPU19.SCHED
10097 ± 4% +20.7% 12186 softirqs.CPU2.SCHED
9731 ± 2% +18.3% 11516 ± 3% softirqs.CPU20.SCHED
9664 ± 3% +18.1% 11413 ± 4% softirqs.CPU21.SCHED
9810 ± 3% +12.3% 11013 ± 6% softirqs.CPU22.SCHED
9509 ± 3% +20.0% 11407 ± 4% softirqs.CPU23.SCHED
9355 ± 9% +23.8% 11581 ± 8% softirqs.CPU26.SCHED
34990 +24.1% 43413 ± 6% softirqs.CPU26.TIMER
9577 ± 8% +20.8% 11569 ± 4% softirqs.CPU27.SCHED
9920 ± 2% +15.5% 11462 ± 6% softirqs.CPU28.SCHED
10084 +13.1% 11409 ± 4% softirqs.CPU29.SCHED
30741 ± 10% +27.7% 39262 ± 8% softirqs.CPU3.TIMER
9843 +18.7% 11688 ± 6% softirqs.CPU30.SCHED
8609 ± 21% +33.6% 11503 ± 4% softirqs.CPU32.SCHED
9714 ± 2% +22.8% 11927 ± 7% softirqs.CPU33.SCHED
9695 +18.1% 11445 ± 5% softirqs.CPU34.SCHED
8242 ± 30% +39.4% 11488 ± 6% softirqs.CPU35.SCHED
8035 ± 30% +40.4% 11280 ± 5% softirqs.CPU37.SCHED
9880 +17.7% 11627 ± 6% softirqs.CPU46.SCHED
8892 ± 9% +26.4% 11243 ± 8% softirqs.CPU47.SCHED
9244 ± 7% +16.8% 10799 ± 3% softirqs.CPU49.SCHED
9617 +18.4% 11389 ± 3% softirqs.CPU50.SCHED
31719 ± 7% +12.5% 35691 ± 5% softirqs.CPU50.TIMER
9366 ± 5% +16.5% 10911 ± 3% softirqs.CPU51.SCHED
9929 ± 3% +22.9% 12207 ± 7% softirqs.CPU53.SCHED
8616 ± 13% +41.5% 12194 ± 10% softirqs.CPU54.SCHED
9500 ± 3% +15.4% 10962 softirqs.CPU55.SCHED
32014 ± 9% +20.3% 38529 ± 15% softirqs.CPU56.TIMER
9732 ± 6% +16.2% 11310 ± 4% softirqs.CPU59.SCHED
9628 ± 2% +21.5% 11699 ± 10% softirqs.CPU60.SCHED
10026 ± 5% +10.9% 11124 ± 6% softirqs.CPU62.SCHED
9585 +14.5% 10972 ± 8% softirqs.CPU63.SCHED
9541 ± 3% +21.4% 11586 ± 16% softirqs.CPU64.SCHED
9035 ± 10% +17.1% 10577 ± 7% softirqs.CPU65.SCHED
9483 ± 2% +20.2% 11401 ± 5% softirqs.CPU66.SCHED
9026 ± 12% +23.4% 11134 ± 4% softirqs.CPU68.SCHED
9482 ± 2% +13.8% 10787 ± 5% softirqs.CPU69.SCHED
9658 ± 2% +20.4% 11624 ± 4% softirqs.CPU7.SCHED
9764 ± 3% +11.0% 10836 ± 5% softirqs.CPU70.SCHED
9316 ± 11% +18.8% 11072 ± 12% softirqs.CPU74.SCHED
9963 +12.3% 11189 ± 6% softirqs.CPU75.SCHED
10017 +14.1% 11425 ± 7% softirqs.CPU76.SCHED
9788 +18.4% 11588 ± 5% softirqs.CPU77.SCHED
9968 +16.3% 11591 ± 7% softirqs.CPU78.SCHED
9911 ± 2% +16.6% 11555 ± 5% softirqs.CPU79.SCHED
9408 ± 2% +19.7% 11258 ± 7% softirqs.CPU80.SCHED
10027 ± 4% +11.3% 11164 ± 5% softirqs.CPU81.SCHED
9638 ± 3% +14.3% 11019 ± 7% softirqs.CPU82.SCHED
9551 ± 2% +15.0% 10981 ± 8% softirqs.CPU83.SCHED
9583 ± 4% +18.8% 11389 ± 6% softirqs.CPU84.SCHED
8173 ± 29% +34.4% 10989 ± 7% softirqs.CPU86.SCHED
9936 ± 4% +14.8% 11410 ± 6% softirqs.CPU9.SCHED
8179 ± 30% +37.6% 11257 ± 10% softirqs.CPU91.SCHED
8077 ± 30% +39.1% 11233 ± 8% softirqs.CPU95.SCHED
919351 ± 7% +19.9% 1102697 ± 4% softirqs.SCHED
3193255 ± 2% +11.6% 3563915 ± 6% softirqs.TIMER
33.00 ±147% -100.0% 0.00 interrupts.52:PCI-MSI.31981585-edge.i40e-eth0-TxRx-16
127.00 +15.4% 146.50 ± 3% interrupts.9:IO-APIC.9-fasteoi.acpi
124159 ± 3% +16.5% 144587 ± 3% interrupts.CPU0.LOC:Local_timer_interrupts
127.00 +15.4% 146.50 ± 3% interrupts.CPU1.9:IO-APIC.9-fasteoi.acpi
123924 ± 3% +16.7% 144656 ± 2% interrupts.CPU1.LOC:Local_timer_interrupts
123637 ± 3% +16.2% 143635 ± 4% interrupts.CPU10.LOC:Local_timer_interrupts
123785 ± 3% +15.3% 142755 ± 3% interrupts.CPU11.LOC:Local_timer_interrupts
123691 ± 3% +16.3% 143829 ± 3% interrupts.CPU12.LOC:Local_timer_interrupts
123545 ± 3% +15.7% 142970 ± 3% interrupts.CPU13.LOC:Local_timer_interrupts
123726 ± 3% +16.2% 143718 ± 3% interrupts.CPU14.LOC:Local_timer_interrupts
123727 ± 3% +15.2% 142493 ± 2% interrupts.CPU15.LOC:Local_timer_interrupts
123761 ± 3% +16.7% 144390 ± 3% interrupts.CPU16.LOC:Local_timer_interrupts
123749 ± 3% +16.9% 144632 ± 3% interrupts.CPU17.LOC:Local_timer_interrupts
123660 ± 3% +16.9% 144533 ± 3% interrupts.CPU18.LOC:Local_timer_interrupts
123976 ± 3% +17.0% 145106 ± 3% interrupts.CPU19.LOC:Local_timer_interrupts
123983 ± 3% +16.8% 144764 ± 3% interrupts.CPU2.LOC:Local_timer_interrupts
123783 ± 3% +16.7% 144400 ± 3% interrupts.CPU20.LOC:Local_timer_interrupts
123924 ± 3% +16.5% 144405 ± 3% interrupts.CPU21.LOC:Local_timer_interrupts
124054 ± 3% +15.9% 143768 ± 3% interrupts.CPU22.LOC:Local_timer_interrupts
124085 ± 3% +15.7% 143543 ± 3% interrupts.CPU23.LOC:Local_timer_interrupts
110004 ± 22% +30.5% 143553 ± 4% interrupts.CPU24.LOC:Local_timer_interrupts
110068 ± 22% +29.7% 142727 ± 3% interrupts.CPU25.LOC:Local_timer_interrupts
110942 ± 22% +29.6% 143774 ± 2% interrupts.CPU26.LOC:Local_timer_interrupts
111894 ± 22% +28.3% 143555 ± 4% interrupts.CPU27.LOC:Local_timer_interrupts
111050 ± 22% +29.3% 143625 ± 4% interrupts.CPU28.LOC:Local_timer_interrupts
111805 ± 22% +27.5% 142570 ± 3% interrupts.CPU29.LOC:Local_timer_interrupts
122901 ± 3% +17.6% 144537 ± 3% interrupts.CPU3.LOC:Local_timer_interrupts
111025 ± 22% +29.5% 143752 ± 4% interrupts.CPU30.LOC:Local_timer_interrupts
110372 ± 21% +29.3% 142764 ± 3% interrupts.CPU31.LOC:Local_timer_interrupts
110041 ± 22% +30.2% 143294 ± 3% interrupts.CPU32.LOC:Local_timer_interrupts
110289 ± 21% +29.4% 142737 ± 3% interrupts.CPU33.LOC:Local_timer_interrupts
0.75 ±110% +4200.0% 32.25 ±143% interrupts.CPU33.RES:Rescheduling_interrupts
111042 ± 22% +28.7% 142872 ± 4% interrupts.CPU34.LOC:Local_timer_interrupts
110820 ± 22% +29.6% 143676 ± 4% interrupts.CPU35.LOC:Local_timer_interrupts
110849 ± 22% +29.9% 143999 ± 4% interrupts.CPU36.LOC:Local_timer_interrupts
110376 ± 22% +30.4% 143915 ± 3% interrupts.CPU37.LOC:Local_timer_interrupts
111593 ± 22% +29.0% 143915 ± 4% interrupts.CPU38.LOC:Local_timer_interrupts
111949 ± 22% +27.1% 142241 ± 4% interrupts.CPU39.LOC:Local_timer_interrupts
124261 ± 3% +15.7% 143828 ± 3% interrupts.CPU4.LOC:Local_timer_interrupts
111554 ± 22% +27.5% 142286 ± 4% interrupts.CPU40.LOC:Local_timer_interrupts
110802 ± 22% +29.1% 143047 ± 4% interrupts.CPU41.LOC:Local_timer_interrupts
111642 ± 22% +29.4% 144476 ± 4% interrupts.CPU42.LOC:Local_timer_interrupts
110980 ± 22% +29.3% 143450 ± 4% interrupts.CPU43.LOC:Local_timer_interrupts
111595 ± 22% +27.4% 142158 ± 4% interrupts.CPU44.LOC:Local_timer_interrupts
110926 ± 22% +28.8% 142914 ± 4% interrupts.CPU45.LOC:Local_timer_interrupts
111788 ± 22% +28.5% 143694 ± 4% interrupts.CPU46.LOC:Local_timer_interrupts
110789 ± 22% +29.2% 143182 ± 4% interrupts.CPU47.LOC:Local_timer_interrupts
123853 ± 3% +16.0% 143633 ± 3% interrupts.CPU48.LOC:Local_timer_interrupts
123954 ± 3% +16.1% 143915 ± 3% interrupts.CPU49.LOC:Local_timer_interrupts
125940 ± 5% +14.2% 143844 ± 2% interrupts.CPU5.LOC:Local_timer_interrupts
122996 ± 3% +16.9% 143771 ± 3% interrupts.CPU50.LOC:Local_timer_interrupts
123469 ± 3% +16.3% 143600 ± 3% interrupts.CPU51.LOC:Local_timer_interrupts
122963 ± 2% +16.8% 143584 ± 3% interrupts.CPU52.LOC:Local_timer_interrupts
123920 ± 3% +16.7% 144558 ± 3% interrupts.CPU53.LOC:Local_timer_interrupts
123791 ± 3% +16.7% 144460 ± 3% interrupts.CPU54.LOC:Local_timer_interrupts
123877 ± 3% +16.6% 144392 ± 3% interrupts.CPU55.LOC:Local_timer_interrupts
123809 ± 3% +16.4% 144159 ± 3% interrupts.CPU56.LOC:Local_timer_interrupts
123506 ± 3% +16.6% 144033 ± 3% interrupts.CPU57.LOC:Local_timer_interrupts
123672 ± 3% +17.7% 145596 ± 4% interrupts.CPU58.LOC:Local_timer_interrupts
123454 ± 3% +16.8% 144178 ± 3% interrupts.CPU59.LOC:Local_timer_interrupts
123108 ± 4% +17.5% 144649 ± 3% interrupts.CPU6.LOC:Local_timer_interrupts
123692 ± 3% +16.1% 143634 ± 3% interrupts.CPU60.LOC:Local_timer_interrupts
123855 ± 3% +15.5% 143073 ± 3% interrupts.CPU61.LOC:Local_timer_interrupts
123714 ± 3% +16.2% 143797 ± 3% interrupts.CPU62.LOC:Local_timer_interrupts
123863 ± 3% +16.0% 143707 ± 3% interrupts.CPU63.LOC:Local_timer_interrupts
123875 ± 3% +15.9% 143543 ± 3% interrupts.CPU64.LOC:Local_timer_interrupts
124440 ± 2% +16.8% 145383 ± 2% interrupts.CPU65.LOC:Local_timer_interrupts
124284 ± 2% +15.9% 144070 ± 3% interrupts.CPU66.LOC:Local_timer_interrupts
123799 ± 3% +15.8% 143331 ± 3% interrupts.CPU67.LOC:Local_timer_interrupts
123400 ± 3% +17.0% 144332 ± 3% interrupts.CPU68.LOC:Local_timer_interrupts
125156 ± 4% +14.5% 143289 ± 3% interrupts.CPU69.LOC:Local_timer_interrupts
124195 ± 3% +16.3% 144402 ± 3% interrupts.CPU7.LOC:Local_timer_interrupts
123791 ± 3% +16.0% 143551 ± 3% interrupts.CPU70.LOC:Local_timer_interrupts
123826 ± 3% +15.7% 143277 ± 3% interrupts.CPU71.LOC:Local_timer_interrupts
110101 ± 22% +29.0% 141983 ± 4% interrupts.CPU72.LOC:Local_timer_interrupts
110774 ± 22% +28.3% 142100 ± 4% interrupts.CPU73.LOC:Local_timer_interrupts
111785 ± 22% +27.6% 142623 ± 3% interrupts.CPU74.LOC:Local_timer_interrupts
111709 ± 22% +29.2% 144354 ± 3% interrupts.CPU75.LOC:Local_timer_interrupts
111739 ± 22% +28.9% 143992 ± 3% interrupts.CPU76.LOC:Local_timer_interrupts
111017 ± 22% +29.7% 143954 ± 3% interrupts.CPU77.LOC:Local_timer_interrupts
111625 ± 22% +28.7% 143651 ± 3% interrupts.CPU78.LOC:Local_timer_interrupts
110970 ± 22% +29.5% 143695 ± 4% interrupts.CPU79.LOC:Local_timer_interrupts
122666 ± 4% +18.0% 144790 ± 3% interrupts.CPU8.LOC:Local_timer_interrupts
109397 ± 21% +32.0% 144381 ± 3% interrupts.CPU80.LOC:Local_timer_interrupts
111101 ± 22% +29.2% 143593 ± 3% interrupts.CPU81.LOC:Local_timer_interrupts
110996 ± 22% +28.0% 142033 ± 4% interrupts.CPU82.LOC:Local_timer_interrupts
110733 ± 22% +28.4% 142147 ± 4% interrupts.CPU83.LOC:Local_timer_interrupts
110217 ± 22% +30.5% 143787 ± 4% interrupts.CPU84.LOC:Local_timer_interrupts
109986 ± 22% +31.0% 144082 ± 3% interrupts.CPU85.LOC:Local_timer_interrupts
110129 ± 22% +30.2% 143365 ± 3% interrupts.CPU86.LOC:Local_timer_interrupts
110749 ± 22% +28.8% 142696 ± 4% interrupts.CPU87.LOC:Local_timer_interrupts
110778 ± 22% +29.8% 143804 ± 4% interrupts.CPU88.LOC:Local_timer_interrupts
111348 ± 22% +28.5% 143096 ± 4% interrupts.CPU89.LOC:Local_timer_interrupts
123749 ± 3% +16.1% 143677 ± 3% interrupts.CPU9.LOC:Local_timer_interrupts
110728 ± 22% +29.0% 142841 ± 4% interrupts.CPU90.LOC:Local_timer_interrupts
110097 ± 22% +30.1% 143207 ± 4% interrupts.CPU91.LOC:Local_timer_interrupts
111533 ± 22% +28.3% 143083 ± 4% interrupts.CPU92.LOC:Local_timer_interrupts
110777 ± 22% +29.0% 142893 ± 4% interrupts.CPU93.LOC:Local_timer_interrupts
109394 ± 21% +31.1% 143427 ± 4% interrupts.CPU94.LOC:Local_timer_interrupts
110000 ± 22% +30.0% 143027 ± 4% interrupts.CPU95.LOC:Local_timer_interrupts
11264859 ± 12% +22.4% 13788804 ± 3% interrupts.LOC:Local_timer_interrupts





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.4.0-rc8-00006-g8d04a5f97a5fa9 (203.98 kB)
job-script (7.09 kB)
job.yaml (4.73 kB)
reproduce (263.00 B)

2019-11-30 20:27:16

by Mariusz Ceier

Subject: Re: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression

I can also confirm this - I just bisected a framebuffer rendering
performance regression on amdgpu, and
8d04a5f97a5fa9d7afdf46eda3a5ceaa973a1bcc is the first bad commit
(leading to a drop from around 260-300 fps to about 60 fps in CS:GO
on a Fury X).
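
For reference, the rough shape of such a bisect run (only a sketch; the
good endpoint below is an assumption, picked to match the 9f4813b531
"Linux 5.4-rc8" baseline used in the report above):

git bisect start
git bisect bad HEAD          # tree showing the ~60fps regression
git bisect good v5.4-rc8     # assumed last known-good baseline
# build, boot and re-test each step, then mark it good or bad:
#   git bisect good    (or)    git bisect bad
git bisect reset             # after "first bad commit" is reported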

PS. Sorry for the duplicate email (sending plain text only now, hopefully).

On Wed, 27 Nov 2019 at 00:56, kernel test robot <[email protected]> wrote:
> [...]
> 58535 ± 14% +76.2% 103127 ± 5% softirqs.CPU44.RCU
> 57025 ± 14% +76.3% 100515 ± 6% softirqs.CPU45.RCU
> 60244 ± 17% +63.4% 98447 ± 7% softirqs.CPU46.RCU
> 57985 ± 14% +68.9% 97910 ± 12% softirqs.CPU47.RCU
> 49462 ± 6% +95.9% 96879 ± 11% softirqs.CPU48.RCU
> 49436 ± 4% +92.7% 95249 ± 12% softirqs.CPU49.RCU
> 52413 ± 5% +88.8% 98965 ± 11% softirqs.CPU5.RCU
> 51032 ± 6% +104.5% 104382 ± 7% softirqs.CPU50.RCU
> 45684 ± 30% +117.7% 99464 ± 15% softirqs.CPU51.RCU
> 51919 ± 5% +103.2% 105507 ± 6% softirqs.CPU52.RCU
> 52420 ± 4% +95.6% 102530 ± 12% softirqs.CPU53.RCU
> 49538 ± 9% +112.3% 105146 ± 7% softirqs.CPU54.RCU
> 49883 ± 7% +103.9% 101712 ± 9% softirqs.CPU55.RCU
> 49604 ± 6% +95.9% 97166 ± 9% softirqs.CPU56.RCU
> 51798 ± 4% +98.5% 102805 ± 6% softirqs.CPU57.RCU
> 49777 ± 8% +99.7% 99405 ± 7% softirqs.CPU58.RCU
> 49338 ± 14% +105.1% 101169 ± 11% softirqs.CPU59.RCU
> 49653 ± 10% +106.2% 102399 ± 6% softirqs.CPU6.RCU
> 51134 ± 6% +98.8% 101668 ± 11% softirqs.CPU60.RCU
> 50488 ± 7% +102.4% 102180 ± 7% softirqs.CPU61.RCU
> 50078 ± 6% +102.7% 101522 ± 6% softirqs.CPU62.RCU
> 51026 ± 5% +99.6% 101832 ± 8% softirqs.CPU63.RCU
> 42895 ± 7% +107.0% 88783 ± 13% softirqs.CPU64.RCU
> 42387 ± 11% +113.3% 90430 ± 11% softirqs.CPU65.RCU
> 43746 ± 8% +108.8% 91341 ± 12% softirqs.CPU66.RCU
> 40324 ± 12% +121.3% 89240 ± 11% softirqs.CPU67.RCU
> 41900 ± 6% +114.6% 89917 ± 11% softirqs.CPU68.RCU
> 44157 ± 7% +104.3% 90228 ± 11% softirqs.CPU69.RCU
> 50551 ± 9% +97.0% 99596 ± 8% softirqs.CPU7.RCU
> 44094 ± 7% +107.9% 91683 ± 13% softirqs.CPU70.RCU
> 44086 ± 7% +106.8% 91179 ± 11% softirqs.CPU71.RCU
> 46488 ± 15% +90.2% 88422 ± 8% softirqs.CPU72.RCU
> 42653 ± 12% +99.2% 84953 ± 9% softirqs.CPU73.RCU
> 48559 ± 12% +76.5% 85713 ± 12% softirqs.CPU74.RCU
> 51770 ± 16% +73.0% 89572 ± 9% softirqs.CPU75.RCU
> 50182 ± 14% +72.4% 86513 ± 14% softirqs.CPU76.RCU
> 50446 ± 14% +77.7% 89651 ± 10% softirqs.CPU77.RCU
> 49119 ± 13% +84.9% 90818 ± 8% softirqs.CPU78.RCU
> 45781 ± 15% +93.5% 88595 ± 8% softirqs.CPU79.RCU
> 50229 ± 6% +96.4% 98632 ± 7% softirqs.CPU8.RCU
> 45155 ± 36% +128.7% 103272 ± 5% softirqs.CPU80.RCU
> 58487 ± 14% +77.0% 103508 ± 6% softirqs.CPU81.RCU
> 55666 ± 18% +87.0% 104094 ± 5% softirqs.CPU82.RCU
> 58506 ± 14% +77.7% 103963 ± 5% softirqs.CPU83.RCU
> 58416 ± 14% +80.3% 105297 ± 6% softirqs.CPU84.RCU
> 51291 ± 12% +95.8% 100447 ± 4% softirqs.CPU85.RCU
> 54557 ± 13% +82.1% 99356 ± 4% softirqs.CPU86.RCU
> 54147 ± 13% +89.8% 102754 ± 6% softirqs.CPU87.RCU
> 55853 ± 14% +82.1% 101688 ± 6% softirqs.CPU88.RCU
> 59176 ± 16% +70.8% 101077 ± 5% softirqs.CPU89.RCU
> 52291 ± 6% +93.8% 101344 ± 5% softirqs.CPU9.RCU
> 60854 ± 15% +69.0% 102864 ± 2% softirqs.CPU90.RCU
> 57902 ± 12% +70.1% 98501 ± 4% softirqs.CPU91.RCU
> 59683 ± 15% +73.3% 103419 ± 5% softirqs.CPU92.RCU
> 57459 ± 14% +78.5% 102584 ± 6% softirqs.CPU93.RCU
> 59398 ± 13% +63.2% 96927 ± 8% softirqs.CPU94.RCU
> 56576 ± 15% +76.2% 99664 ± 11% softirqs.CPU95.RCU
> 5123 ± 16% +168.1% 13739 ± 7% softirqs.NET_RX
> 4882876 ± 10% +90.1% 9282598 ± 6% softirqs.RCU
>
>
>
> phoronix-test-suite.glmark2.0.score
>
> 300 +-+-------------------------------------------------------------------+
> | : +. +..+ +. +..+. +. + |
> 250 +-+: |
> O : O O O O O O O O O O O O O O
> | :O O O O O O O O O O |
> 200 +-+ |
> | : |
> 150 +-+ |
> |: |
> 100 +-+ |
> |: |
> |: |
> 50 +-+ |
> | |
> 0 +-+-------------------------------------------------------------------+
>
>
> phoronix-test-suite.glmark2.1.score
>
> 250 +-+-------------------------------------------------------------------+
> | |
> | +..+..+..+..+..+.+..+..+.. .+..+..+..+..+..+..+..+.+..+ |
> 200 +-+: +. |
> | : O O O
> O :O O O O O O O O O O O O O O O O O O O O O |
> 150 +-+ |
> | : |
> 100 +-+ |
> |: |
> |: |
> 50 +-+ |
> |: |
> | |
> 0 +-+-------------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> ***************************************************************************************************
> lkp-csl-2sp8: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/need_x/rootfs/tbox_group/test/testcase/ucode:
> gcc-7/performance/x86_64-rhel-7.6/true/debian-x86_64-phoronix/lkp-csl-2sp8/gtkperf-1.2.1/phoronix-test-suite/0x500002b
>
> commit:
> 9f4813b531 (" Linux 5.4-rc8")
> 8d04a5f97a ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
>
> 9f4813b531a0b8cc 8d04a5f97a5fa9d7afdf46eda3a
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 10.34 +26.8% 13.11 ± 5% phoronix-test-suite.gtkperf.0.seconds
> 62.29 +13.3% 70.56 ± 2% phoronix-test-suite.time.elapsed_time
> 62.29 +13.3% 70.56 ± 2% phoronix-test-suite.time.elapsed_time.max
> 46875 -12.5% 41002 ± 3% meminfo.max_used_kB
> 29.85 +39.6% 41.68 ± 22% boot-time.boot
> 2433 +48.7% 3617 ± 25% boot-time.idle
> 1628 -11.4% 1442 ± 4% vmstat.io.bi
> 9046 -11.6% 7993 ± 2% vmstat.system.cs
> 1.954e+09 ± 69% +226.7% 6.384e+09 ± 6% cpuidle.C1E.time
> 6278082 ± 33% +94.4% 12207082 ± 16% cpuidle.C1E.usage
> 3.958e+09 ± 32% -93.8% 2.444e+08 ± 10% cpuidle.C6.time
> 5684464 -95.4% 262692 ± 13% cpuidle.C6.usage
> 8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.34 ±100% -8.3 0.00 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 5.28 ±100% -3.2 2.08 ±173% perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.get_signal.do_signal
> 5.28 ±100% -3.2 2.08 ±173% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.get_signal
> 8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.do_sys_open
> 8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.do_filp_open
> 8.34 ±100% -8.3 0.00 perf-profile.children.cycles-pp.path_openat
> 5.28 ±100% -3.2 2.08 ±173% perf-profile.children.cycles-pp.mmput
> 5.28 ±100% -3.2 2.08 ±173% perf-profile.children.cycles-pp.exit_mmap
> 17.00 ± 13% +230.9% 56.25 ± 29% sched_debug.cfs_rq:/.runnable_load_avg.max
> 3.87 ± 17% +73.5% 6.71 ± 22% sched_debug.cfs_rq:/.runnable_load_avg.stddev
> 25.25 ± 12% -22.4% 19.58 ± 20% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 34397 +44.4% 49658 ± 29% sched_debug.cpu.clock.avg
> 34402 +44.4% 49663 ± 29% sched_debug.cpu.clock.max
> 34391 +44.4% 49654 ± 29% sched_debug.cpu.clock.min
> 34397 +44.4% 49658 ± 29% sched_debug.cpu.clock_task.avg
> 34402 +44.4% 49663 ± 29% sched_debug.cpu.clock_task.max
> 34391 +44.4% 49654 ± 29% sched_debug.cpu.clock_task.min
> 9.74 ± 5% -11.5% 8.62 ± 4% sched_debug.cpu.nr_uninterruptible.stddev
> 34393 +44.4% 49654 ± 29% sched_debug.cpu_clk
> 31537 +48.4% 46795 ± 31% sched_debug.ktime
> 34725 +44.0% 49988 ± 29% sched_debug.sched_clk
> 96262 +2.1% 98322 proc-vmstat.nr_active_anon
> 29673 +1.4% 30100 proc-vmstat.nr_active_file
> 94407 +2.3% 96570 proc-vmstat.nr_anon_pages
> 1082 +1.5% 1099 proc-vmstat.nr_page_table_pages
> 96262 +2.1% 98322 proc-vmstat.nr_zone_active_anon
> 29673 +1.4% 30100 proc-vmstat.nr_zone_active_file
> 7664 ± 27% +172.9% 20912 ± 38% proc-vmstat.numa_hint_faults
> 7507 ± 27% +124.1% 16824 ± 38% proc-vmstat.numa_hint_faults_local
> 328182 +5.9% 347387 ± 4% proc-vmstat.numa_hit
> 309363 +6.2% 328630 ± 4% proc-vmstat.numa_local
> 39400 ± 35% +90.3% 74971 ± 42% proc-vmstat.numa_pte_updates
> 370297 +7.5% 398082 ± 4% proc-vmstat.pgalloc_normal
> 278665 +12.0% 312075 ± 4% proc-vmstat.pgfault
> 269968 ± 2% +10.9% 299343 ± 5% proc-vmstat.pgfree
> 16.26 ± 53% -51.0% 7.96 ± 2% perf-stat.i.MPKI
> 8.359e+08 ± 2% -12.0% 7.355e+08 ± 5% perf-stat.i.branch-instructions
> 3.78 ± 21% -1.0 2.76 ± 9% perf-stat.i.branch-miss-rate%
> 30272546 ± 11% -25.3% 22600376 ± 15% perf-stat.i.branch-misses
> 1870084 ± 6% -59.7% 752958 ± 15% perf-stat.i.cache-misses
> 47650854 ± 37% -46.3% 25593693 ± 2% perf-stat.i.cache-references
> 9360 -12.3% 8208 ± 2% perf-stat.i.context-switches
> 6.312e+09 ± 6% -15.0% 5.364e+09 ± 2% perf-stat.i.cpu-cycles
> 3783 ± 9% +176.2% 10448 ± 18% perf-stat.i.cycles-between-cache-misses
> 0.16 ± 40% -0.1 0.01 ± 16% perf-stat.i.dTLB-load-miss-rate%
> 921731 ± 27% -87.9% 111653 ± 10% perf-stat.i.dTLB-load-misses
> 0.05 ± 32% -0.0 0.01 ± 21% perf-stat.i.dTLB-store-miss-rate%
> 157895 ± 26% -82.3% 28003 ± 22% perf-stat.i.dTLB-store-misses
> 41.55 ± 4% -3.5 38.06 perf-stat.i.iTLB-load-miss-rate%
> 3.826e+09 ± 2% -11.9% 3.372e+09 ± 6% perf-stat.i.instructions
> 9.33 -13.3% 8.09 ± 3% perf-stat.i.major-faults
> 12.54 ± 38% -39.3% 7.61 ± 5% perf-stat.overall.MPKI
> 3.63 ± 13% -0.6 3.06 ± 10% perf-stat.overall.branch-miss-rate%
> 4.51 ± 35% -1.6 2.94 ± 14% perf-stat.overall.cache-miss-rate%
> 3391 ± 10% +113.9% 7255 ± 12% perf-stat.overall.cycles-between-cache-misses
> 0.10 ± 33% -0.1 0.01 ± 14% perf-stat.overall.dTLB-load-miss-rate%
> 0.04 ± 26% -0.0 0.01 ± 23% perf-stat.overall.dTLB-store-miss-rate%
> 8.23e+08 ± 2% -11.9% 7.253e+08 ± 5% perf-stat.ps.branch-instructions
> 29792047 ± 11% -25.2% 22290398 ± 15% perf-stat.ps.branch-misses
> 1841339 ± 6% -59.6% 743171 ± 15% perf-stat.ps.cache-misses
> 46876977 ± 37% -46.2% 25232671 ± 2% perf-stat.ps.cache-references
> 9205 -12.1% 8091 ± 2% perf-stat.ps.context-switches
> 6.211e+09 ± 6% -14.8% 5.289e+09 ± 2% perf-stat.ps.cpu-cycles
> 907015 ± 27% -87.9% 110123 ± 10% perf-stat.ps.dTLB-load-misses
> 155379 ± 26% -82.2% 27623 ± 22% perf-stat.ps.dTLB-store-misses
> 3.766e+09 ± 2% -11.7% 3.325e+09 ± 6% perf-stat.ps.instructions
> 9.21 -13.4% 7.98 ± 3% perf-stat.ps.major-faults
> 16348 ± 5% +33.4% 21802 ± 5% softirqs.CPU0.SCHED
> 11186 ± 15% +37.7% 15406 ± 16% softirqs.CPU1.SCHED
> 9914 ± 8% +48.9% 14762 ± 30% softirqs.CPU10.SCHED
> 9645 ± 2% +21.1% 11681 ± 4% softirqs.CPU11.SCHED
> 9779 ± 3% +17.1% 11450 ± 7% softirqs.CPU13.SCHED
> 9805 ± 3% +12.6% 11040 ± 6% softirqs.CPU14.SCHED
> 9658 ± 2% +25.3% 12104 ± 10% softirqs.CPU15.SCHED
> 31797 ± 7% +23.0% 39108 ± 14% softirqs.CPU16.TIMER
> 9330 ± 6% +29.9% 12116 ± 4% softirqs.CPU17.SCHED
> 31554 ± 10% +16.1% 36648 ± 6% softirqs.CPU17.TIMER
> 9493 ± 4% +21.9% 11574 ± 6% softirqs.CPU18.SCHED
> 9700 ± 2% +61.1% 15631 ± 47% softirqs.CPU19.SCHED
> 10097 ± 4% +20.7% 12186 softirqs.CPU2.SCHED
> 9731 ± 2% +18.3% 11516 ± 3% softirqs.CPU20.SCHED
> 9664 ± 3% +18.1% 11413 ± 4% softirqs.CPU21.SCHED
> 9810 ± 3% +12.3% 11013 ± 6% softirqs.CPU22.SCHED
> 9509 ± 3% +20.0% 11407 ± 4% softirqs.CPU23.SCHED
> 9355 ± 9% +23.8% 11581 ± 8% softirqs.CPU26.SCHED
> 34990 +24.1% 43413 ± 6% softirqs.CPU26.TIMER
> 9577 ± 8% +20.8% 11569 ± 4% softirqs.CPU27.SCHED
> 9920 ± 2% +15.5% 11462 ± 6% softirqs.CPU28.SCHED
> 10084 +13.1% 11409 ± 4% softirqs.CPU29.SCHED
> 30741 ± 10% +27.7% 39262 ± 8% softirqs.CPU3.TIMER
> 9843 +18.7% 11688 ± 6% softirqs.CPU30.SCHED
> 8609 ± 21% +33.6% 11503 ± 4% softirqs.CPU32.SCHED
> 9714 ± 2% +22.8% 11927 ± 7% softirqs.CPU33.SCHED
> 9695 +18.1% 11445 ± 5% softirqs.CPU34.SCHED
> 8242 ± 30% +39.4% 11488 ± 6% softirqs.CPU35.SCHED
> 8035 ± 30% +40.4% 11280 ± 5% softirqs.CPU37.SCHED
> 9880 +17.7% 11627 ± 6% softirqs.CPU46.SCHED
> 8892 ± 9% +26.4% 11243 ± 8% softirqs.CPU47.SCHED
> 9244 ± 7% +16.8% 10799 ± 3% softirqs.CPU49.SCHED
> 9617 +18.4% 11389 ± 3% softirqs.CPU50.SCHED
> 31719 ± 7% +12.5% 35691 ± 5% softirqs.CPU50.TIMER
> 9366 ± 5% +16.5% 10911 ± 3% softirqs.CPU51.SCHED
> 9929 ± 3% +22.9% 12207 ± 7% softirqs.CPU53.SCHED
> 8616 ± 13% +41.5% 12194 ± 10% softirqs.CPU54.SCHED
> 9500 ± 3% +15.4% 10962 softirqs.CPU55.SCHED
> 32014 ± 9% +20.3% 38529 ± 15% softirqs.CPU56.TIMER
> 9732 ± 6% +16.2% 11310 ± 4% softirqs.CPU59.SCHED
> 9628 ± 2% +21.5% 11699 ± 10% softirqs.CPU60.SCHED
> 10026 ± 5% +10.9% 11124 ± 6% softirqs.CPU62.SCHED
> 9585 +14.5% 10972 ± 8% softirqs.CPU63.SCHED
> 9541 ± 3% +21.4% 11586 ± 16% softirqs.CPU64.SCHED
> 9035 ± 10% +17.1% 10577 ± 7% softirqs.CPU65.SCHED
> 9483 ± 2% +20.2% 11401 ± 5% softirqs.CPU66.SCHED
> 9026 ± 12% +23.4% 11134 ± 4% softirqs.CPU68.SCHED
> 9482 ± 2% +13.8% 10787 ± 5% softirqs.CPU69.SCHED
> 9658 ± 2% +20.4% 11624 ± 4% softirqs.CPU7.SCHED
> 9764 ± 3% +11.0% 10836 ± 5% softirqs.CPU70.SCHED
> 9316 ± 11% +18.8% 11072 ± 12% softirqs.CPU74.SCHED
> 9963 +12.3% 11189 ± 6% softirqs.CPU75.SCHED
> 10017 +14.1% 11425 ± 7% softirqs.CPU76.SCHED
> 9788 +18.4% 11588 ± 5% softirqs.CPU77.SCHED
> 9968 +16.3% 11591 ± 7% softirqs.CPU78.SCHED
> 9911 ± 2% +16.6% 11555 ± 5% softirqs.CPU79.SCHED
> 9408 ± 2% +19.7% 11258 ± 7% softirqs.CPU80.SCHED
> 10027 ± 4% +11.3% 11164 ± 5% softirqs.CPU81.SCHED
> 9638 ± 3% +14.3% 11019 ± 7% softirqs.CPU82.SCHED
> 9551 ± 2% +15.0% 10981 ± 8% softirqs.CPU83.SCHED
> 9583 ± 4% +18.8% 11389 ± 6% softirqs.CPU84.SCHED
> 8173 ± 29% +34.4% 10989 ± 7% softirqs.CPU86.SCHED
> 9936 ± 4% +14.8% 11410 ± 6% softirqs.CPU9.SCHED
> 8179 ± 30% +37.6% 11257 ± 10% softirqs.CPU91.SCHED
> 8077 ± 30% +39.1% 11233 ± 8% softirqs.CPU95.SCHED
> 919351 ± 7% +19.9% 1102697 ± 4% softirqs.SCHED
> 3193255 ± 2% +11.6% 3563915 ± 6% softirqs.TIMER
> 33.00 ±147% -100.0% 0.00 interrupts.52:PCI-MSI.31981585-edge.i40e-eth0-TxRx-16
> 127.00 +15.4% 146.50 ± 3% interrupts.9:IO-APIC.9-fasteoi.acpi
> 124159 ± 3% +16.5% 144587 ± 3% interrupts.CPU0.LOC:Local_timer_interrupts
> 127.00 +15.4% 146.50 ± 3% interrupts.CPU1.9:IO-APIC.9-fasteoi.acpi
> 123924 ± 3% +16.7% 144656 ± 2% interrupts.CPU1.LOC:Local_timer_interrupts
> 123637 ± 3% +16.2% 143635 ± 4% interrupts.CPU10.LOC:Local_timer_interrupts
> 123785 ± 3% +15.3% 142755 ± 3% interrupts.CPU11.LOC:Local_timer_interrupts
> 123691 ± 3% +16.3% 143829 ± 3% interrupts.CPU12.LOC:Local_timer_interrupts
> 123545 ± 3% +15.7% 142970 ± 3% interrupts.CPU13.LOC:Local_timer_interrupts
> 123726 ± 3% +16.2% 143718 ± 3% interrupts.CPU14.LOC:Local_timer_interrupts
> 123727 ± 3% +15.2% 142493 ± 2% interrupts.CPU15.LOC:Local_timer_interrupts
> 123761 ± 3% +16.7% 144390 ± 3% interrupts.CPU16.LOC:Local_timer_interrupts
> 123749 ± 3% +16.9% 144632 ± 3% interrupts.CPU17.LOC:Local_timer_interrupts
> 123660 ± 3% +16.9% 144533 ± 3% interrupts.CPU18.LOC:Local_timer_interrupts
> 123976 ± 3% +17.0% 145106 ± 3% interrupts.CPU19.LOC:Local_timer_interrupts
> 123983 ± 3% +16.8% 144764 ± 3% interrupts.CPU2.LOC:Local_timer_interrupts
> 123783 ± 3% +16.7% 144400 ± 3% interrupts.CPU20.LOC:Local_timer_interrupts
> 123924 ± 3% +16.5% 144405 ± 3% interrupts.CPU21.LOC:Local_timer_interrupts
> 124054 ± 3% +15.9% 143768 ± 3% interrupts.CPU22.LOC:Local_timer_interrupts
> 124085 ± 3% +15.7% 143543 ± 3% interrupts.CPU23.LOC:Local_timer_interrupts
> 110004 ± 22% +30.5% 143553 ± 4% interrupts.CPU24.LOC:Local_timer_interrupts
> 110068 ± 22% +29.7% 142727 ± 3% interrupts.CPU25.LOC:Local_timer_interrupts
> 110942 ± 22% +29.6% 143774 ± 2% interrupts.CPU26.LOC:Local_timer_interrupts
> 111894 ± 22% +28.3% 143555 ± 4% interrupts.CPU27.LOC:Local_timer_interrupts
> 111050 ± 22% +29.3% 143625 ± 4% interrupts.CPU28.LOC:Local_timer_interrupts
> 111805 ± 22% +27.5% 142570 ± 3% interrupts.CPU29.LOC:Local_timer_interrupts
> 122901 ± 3% +17.6% 144537 ± 3% interrupts.CPU3.LOC:Local_timer_interrupts
> 111025 ± 22% +29.5% 143752 ± 4% interrupts.CPU30.LOC:Local_timer_interrupts
> 110372 ± 21% +29.3% 142764 ± 3% interrupts.CPU31.LOC:Local_timer_interrupts
> 110041 ± 22% +30.2% 143294 ± 3% interrupts.CPU32.LOC:Local_timer_interrupts
> 110289 ± 21% +29.4% 142737 ± 3% interrupts.CPU33.LOC:Local_timer_interrupts
> 0.75 ±110% +4200.0% 32.25 ±143% interrupts.CPU33.RES:Rescheduling_interrupts
> 111042 ± 22% +28.7% 142872 ± 4% interrupts.CPU34.LOC:Local_timer_interrupts
> 110820 ± 22% +29.6% 143676 ± 4% interrupts.CPU35.LOC:Local_timer_interrupts
> 110849 ± 22% +29.9% 143999 ± 4% interrupts.CPU36.LOC:Local_timer_interrupts
> 110376 ± 22% +30.4% 143915 ± 3% interrupts.CPU37.LOC:Local_timer_interrupts
> 111593 ± 22% +29.0% 143915 ± 4% interrupts.CPU38.LOC:Local_timer_interrupts
> 111949 ± 22% +27.1% 142241 ± 4% interrupts.CPU39.LOC:Local_timer_interrupts
> 124261 ± 3% +15.7% 143828 ± 3% interrupts.CPU4.LOC:Local_timer_interrupts
> 111554 ± 22% +27.5% 142286 ± 4% interrupts.CPU40.LOC:Local_timer_interrupts
> 110802 ± 22% +29.1% 143047 ± 4% interrupts.CPU41.LOC:Local_timer_interrupts
> 111642 ± 22% +29.4% 144476 ± 4% interrupts.CPU42.LOC:Local_timer_interrupts
> 110980 ± 22% +29.3% 143450 ± 4% interrupts.CPU43.LOC:Local_timer_interrupts
> 111595 ± 22% +27.4% 142158 ± 4% interrupts.CPU44.LOC:Local_timer_interrupts
> 110926 ± 22% +28.8% 142914 ± 4% interrupts.CPU45.LOC:Local_timer_interrupts
> 111788 ± 22% +28.5% 143694 ± 4% interrupts.CPU46.LOC:Local_timer_interrupts
> 110789 ± 22% +29.2% 143182 ± 4% interrupts.CPU47.LOC:Local_timer_interrupts
> 123853 ± 3% +16.0% 143633 ± 3% interrupts.CPU48.LOC:Local_timer_interrupts
> 123954 ± 3% +16.1% 143915 ± 3% interrupts.CPU49.LOC:Local_timer_interrupts
> 125940 ± 5% +14.2% 143844 ± 2% interrupts.CPU5.LOC:Local_timer_interrupts
> 122996 ± 3% +16.9% 143771 ± 3% interrupts.CPU50.LOC:Local_timer_interrupts
> 123469 ± 3% +16.3% 143600 ± 3% interrupts.CPU51.LOC:Local_timer_interrupts
> 122963 ± 2% +16.8% 143584 ± 3% interrupts.CPU52.LOC:Local_timer_interrupts
> 123920 ± 3% +16.7% 144558 ± 3% interrupts.CPU53.LOC:Local_timer_interrupts
> 123791 ± 3% +16.7% 144460 ± 3% interrupts.CPU54.LOC:Local_timer_interrupts
> 123877 ± 3% +16.6% 144392 ± 3% interrupts.CPU55.LOC:Local_timer_interrupts
> 123809 ± 3% +16.4% 144159 ± 3% interrupts.CPU56.LOC:Local_timer_interrupts
> 123506 ± 3% +16.6% 144033 ± 3% interrupts.CPU57.LOC:Local_timer_interrupts
> 123672 ± 3% +17.7% 145596 ± 4% interrupts.CPU58.LOC:Local_timer_interrupts
> 123454 ± 3% +16.8% 144178 ± 3% interrupts.CPU59.LOC:Local_timer_interrupts
> 123108 ± 4% +17.5% 144649 ± 3% interrupts.CPU6.LOC:Local_timer_interrupts
> 123692 ± 3% +16.1% 143634 ± 3% interrupts.CPU60.LOC:Local_timer_interrupts
> 123855 ± 3% +15.5% 143073 ± 3% interrupts.CPU61.LOC:Local_timer_interrupts
> 123714 ± 3% +16.2% 143797 ± 3% interrupts.CPU62.LOC:Local_timer_interrupts
> 123863 ± 3% +16.0% 143707 ± 3% interrupts.CPU63.LOC:Local_timer_interrupts
> 123875 ± 3% +15.9% 143543 ± 3% interrupts.CPU64.LOC:Local_timer_interrupts
> 124440 ± 2% +16.8% 145383 ± 2% interrupts.CPU65.LOC:Local_timer_interrupts
> 124284 ± 2% +15.9% 144070 ± 3% interrupts.CPU66.LOC:Local_timer_interrupts
> 123799 ± 3% +15.8% 143331 ± 3% interrupts.CPU67.LOC:Local_timer_interrupts
> 123400 ± 3% +17.0% 144332 ± 3% interrupts.CPU68.LOC:Local_timer_interrupts
> 125156 ± 4% +14.5% 143289 ± 3% interrupts.CPU69.LOC:Local_timer_interrupts
> 124195 ± 3% +16.3% 144402 ± 3% interrupts.CPU7.LOC:Local_timer_interrupts
> 123791 ± 3% +16.0% 143551 ± 3% interrupts.CPU70.LOC:Local_timer_interrupts
> 123826 ± 3% +15.7% 143277 ± 3% interrupts.CPU71.LOC:Local_timer_interrupts
> 110101 ± 22% +29.0% 141983 ± 4% interrupts.CPU72.LOC:Local_timer_interrupts
> 110774 ± 22% +28.3% 142100 ± 4% interrupts.CPU73.LOC:Local_timer_interrupts
> 111785 ± 22% +27.6% 142623 ± 3% interrupts.CPU74.LOC:Local_timer_interrupts
> 111709 ± 22% +29.2% 144354 ± 3% interrupts.CPU75.LOC:Local_timer_interrupts
> 111739 ± 22% +28.9% 143992 ± 3% interrupts.CPU76.LOC:Local_timer_interrupts
> 111017 ± 22% +29.7% 143954 ± 3% interrupts.CPU77.LOC:Local_timer_interrupts
> 111625 ± 22% +28.7% 143651 ± 3% interrupts.CPU78.LOC:Local_timer_interrupts
> 110970 ± 22% +29.5% 143695 ± 4% interrupts.CPU79.LOC:Local_timer_interrupts
> 122666 ± 4% +18.0% 144790 ± 3% interrupts.CPU8.LOC:Local_timer_interrupts
> 109397 ± 21% +32.0% 144381 ± 3% interrupts.CPU80.LOC:Local_timer_interrupts
> 111101 ± 22% +29.2% 143593 ± 3% interrupts.CPU81.LOC:Local_timer_interrupts
> 110996 ± 22% +28.0% 142033 ± 4% interrupts.CPU82.LOC:Local_timer_interrupts
> 110733 ± 22% +28.4% 142147 ± 4% interrupts.CPU83.LOC:Local_timer_interrupts
> 110217 ± 22% +30.5% 143787 ± 4% interrupts.CPU84.LOC:Local_timer_interrupts
> 109986 ± 22% +31.0% 144082 ± 3% interrupts.CPU85.LOC:Local_timer_interrupts
> 110129 ± 22% +30.2% 143365 ± 3% interrupts.CPU86.LOC:Local_timer_interrupts
> 110749 ± 22% +28.8% 142696 ± 4% interrupts.CPU87.LOC:Local_timer_interrupts
> 110778 ± 22% +29.8% 143804 ± 4% interrupts.CPU88.LOC:Local_timer_interrupts
> 111348 ± 22% +28.5% 143096 ± 4% interrupts.CPU89.LOC:Local_timer_interrupts
> 123749 ± 3% +16.1% 143677 ± 3% interrupts.CPU9.LOC:Local_timer_interrupts
> 110728 ± 22% +29.0% 142841 ± 4% interrupts.CPU90.LOC:Local_timer_interrupts
> 110097 ± 22% +30.1% 143207 ± 4% interrupts.CPU91.LOC:Local_timer_interrupts
> 111533 ± 22% +28.3% 143083 ± 4% interrupts.CPU92.LOC:Local_timer_interrupts
> 110777 ± 22% +29.0% 142893 ± 4% interrupts.CPU93.LOC:Local_timer_interrupts
> 109394 ± 21% +31.1% 143427 ± 4% interrupts.CPU94.LOC:Local_timer_interrupts
> 110000 ± 22% +30.0% 143027 ± 4% interrupts.CPU95.LOC:Local_timer_interrupts
> 11264859 ± 12% +22.4% 13788804 ± 3% interrupts.LOC:Local_timer_interrupts
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen
>

2019-11-30 21:33:32

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression

On Sat, 30 Nov 2019, Mariusz Ceier wrote:

>I can also confirm this - just bisected framebuffer rendering
>performance regression on amdgpu and
>8d04a5f97a5fa9d7afdf46eda3a5ceaa973a1bcc is the first bad commit
>(leading to drop from around 260-300fps to about 60fps in CS:GO on
>Fury X).

This is a third report now. Could you please provide the contents
of the following file, before and after the offending commit.

/sys/kernel/debug/x86/pat_memtype_list

This will show any attribute differences in the tree, which is likely
the culprit.

Thanks,
Davidlohr

2019-11-30 22:12:36

by Mariusz Ceier

[permalink] [raw]
Subject: Re: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression

Contents of /sys/kernel/debug/x86/pat_memtype_list on master
(32ef9553635ab1236c33951a8bd9b5af1c3b1646) where performance is
degraded:

PAT memtype list:
write-back @ 0x55ba4000-0x55ba5000
write-back @ 0x5e88c000-0x5e8b5000
write-back @ 0x5e8b4000-0x5e8b5000
write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b7000-0x5e8bb000
write-back @ 0x5e8ba000-0x5e8bc000
write-back @ 0x5e8bb000-0x5e8be000
write-back @ 0x5e8bd000-0x5e8bf000
write-back @ 0x5e8be000-0x5e8c2000
write-back @ 0x5ef3c000-0x5ef3d000
write-back @ 0x5ef6c000-0x5ef6d000
write-back @ 0x5ef6f000-0x5ef70000
write-back @ 0x5ef72000-0x5ef73000
write-back @ 0x5f5b3000-0x5f5b5000
uncached-minus @ 0xe3f00000-0xe3f10000
uncached-minus @ 0xec000000-0xec040000
uncached-minus @ 0xec002000-0xec003000
uncached-minus @ 0xec110000-0xec111000
uncached-minus @ 0xec200000-0xec240000
uncached-minus @ 0xec260000-0xec264000
uncached-minus @ 0xec300000-0xec320000
uncached-minus @ 0xec326000-0xec327000
uncached-minus @ 0xf0000000-0xf8000000
uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed90000-0xfed91000
uncached-minus @ 0x2000000000-0x2100000000
uncached-minus @ 0x2000000000-0x2100000000
uncached-minus @ 0x2100000000-0x2100001000
uncached-minus @ 0x2100001000-0x2100002000
uncached-minus @ 0x2ffff10000-0x2ffff20000
uncached-minus @ 0x2ffff20000-0x2ffff24000

Contents of /sys/kernel/debug/x86/pat_memtype_list after reverting
7f264dab5b60, 511aaca834fe, 6a9930b1c50d, 8d04a5f97a5f (in that order)
which restores the performance:

PAT memtype list:
write-back @ 0x55ba4000-0x55ba5000
write-back @ 0x5e88c000-0x5e8b5000
write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b4000-0x5e8b5000
write-back @ 0x5e8b7000-0x5e8bb000
write-back @ 0x5e8ba000-0x5e8bc000
write-back @ 0x5e8bb000-0x5e8be000
write-back @ 0x5e8bd000-0x5e8bf000
write-back @ 0x5e8be000-0x5e8c2000
write-back @ 0x5ef3c000-0x5ef3d000
write-back @ 0x5ef6c000-0x5ef6d000
write-back @ 0x5ef6f000-0x5ef70000
write-back @ 0x5ef72000-0x5ef73000
write-back @ 0x5f5b3000-0x5f5b5000
uncached-minus @ 0xe3f00000-0xe3f10000
uncached-minus @ 0xec000000-0xec040000
uncached-minus @ 0xec002000-0xec003000
uncached-minus @ 0xec110000-0xec111000
uncached-minus @ 0xec200000-0xec240000
uncached-minus @ 0xec260000-0xec264000
uncached-minus @ 0xec300000-0xec320000
uncached-minus @ 0xec326000-0xec327000
uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xf0000000-0xf8000000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed90000-0xfed91000
write-combining @ 0x2000000000-0x2100000000
write-combining @ 0x2000000000-0x2100000000
uncached-minus @ 0x2100000000-0x2100001000
uncached-minus @ 0x2100001000-0x2100002000
uncached-minus @ 0x2ffff10000-0x2ffff20000
uncached-minus @ 0x2ffff20000-0x2ffff24000

Hope this helps (if you need the contents from exactly the commits
8d04a5f97a5f and 8d04a5f97a5f~1, I can provide them, but it will take
more time).

Best regards,
Mariusz Ceier

On Sat, 30 Nov 2019 at 21:31, Davidlohr Bueso <[email protected]> wrote:
>
> On Sat, 30 Nov 2019, Mariusz Ceier wrote:
>
> >I can also confirm this - just bisected framebuffer rendering
> >performance regression on amdgpu and
> >8d04a5f97a5fa9d7afdf46eda3a5ceaa973a1bcc is the first bad commit
> >(leading to drop from around 260-300fps to about 60fps in CS:GO on
> >Fury X).
>
> This is a third report now. Could you please provide the contents
> of the following file, before and after the offending commit.
>
> /sys/kernel/debug/x86/pat_memtype_list
>
> This will show any attribute differences in the tree, which is likely
> the culprit.
>
> Thanks,
> Davidlohr

2019-11-30 22:36:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression

On Sat, Nov 30, 2019 at 2:09 PM Mariusz Ceier <[email protected]> wrote:
>
> Contents of /sys/kernel/debug/x86/pat_memtype_list on master
> (32ef9553635ab1236c33951a8bd9b5af1c3b1646) where performance is
> degraded:

Diff between good and bad case:

@@ -1,8 +1,8 @@
PAT memtype list:
write-back @ 0x55ba4000-0x55ba5000
write-back @ 0x5e88c000-0x5e8b5000
-write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b4000-0x5e8b5000
+write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b7000-0x5e8bb000
write-back @ 0x5e8ba000-0x5e8bc000
write-back @ 0x5e8bb000-0x5e8be000
@@ -21,15 +21,15 @@
uncached-minus @ 0xec260000-0xec264000
uncached-minus @ 0xec300000-0xec320000
uncached-minus @ 0xec326000-0xec327000
-uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xf0000000-0xf8000000
+uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed90000-0xfed91000
-write-combining @ 0x2000000000-0x2100000000
-write-combining @ 0x2000000000-0x2100000000
+uncached-minus @ 0x2000000000-0x2100000000
+uncached-minus @ 0x2000000000-0x2100000000
uncached-minus @ 0x2100000000-0x2100001000
uncached-minus @ 0x2100001000-0x2100002000
uncached-minus @ 0x2ffff10000-0x2ffff20000

the first two differences are just trivial ordering differences for
overlapping ranges (starting at 0x5e8b4000 and 0xf0000000)
respectively.

But the final difference is a real difference where it used to be WC,
and is now UC-:

-write-combining @ 0x2000000000-0x2100000000
-write-combining @ 0x2000000000-0x2100000000
+uncached-minus @ 0x2000000000-0x2100000000
+uncached-minus @ 0x2000000000-0x2100000000

which certainly could easily explain the huge performance degradation.

Linus

2019-12-01 11:05:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression


* Linus Torvalds <[email protected]> wrote:

> On Sat, Nov 30, 2019 at 2:09 PM Mariusz Ceier <[email protected]> wrote:
> >
> > Contents of /sys/kernel/debug/x86/pat_memtype_list on master
> > (32ef9553635ab1236c33951a8bd9b5af1c3b1646) where performance is
> > degraded:
>
> Diff between good and bad case:
>
> @@ -1,8 +1,8 @@
> PAT memtype list:
> write-back @ 0x55ba4000-0x55ba5000
> write-back @ 0x5e88c000-0x5e8b5000
> -write-back @ 0x5e8b4000-0x5e8b8000
> write-back @ 0x5e8b4000-0x5e8b5000
> +write-back @ 0x5e8b4000-0x5e8b8000
> write-back @ 0x5e8b7000-0x5e8bb000
> write-back @ 0x5e8ba000-0x5e8bc000
> write-back @ 0x5e8bb000-0x5e8be000
> @@ -21,15 +21,15 @@
> uncached-minus @ 0xec260000-0xec264000
> uncached-minus @ 0xec300000-0xec320000
> uncached-minus @ 0xec326000-0xec327000
> -uncached-minus @ 0xf0000000-0xf0001000
> uncached-minus @ 0xf0000000-0xf8000000
> +uncached-minus @ 0xf0000000-0xf0001000
> uncached-minus @ 0xfdc43000-0xfdc44000
> uncached-minus @ 0xfe000000-0xfe001000
> uncached-minus @ 0xfed00000-0xfed01000
> uncached-minus @ 0xfed10000-0xfed16000
> uncached-minus @ 0xfed90000-0xfed91000
> -write-combining @ 0x2000000000-0x2100000000
> -write-combining @ 0x2000000000-0x2100000000
> +uncached-minus @ 0x2000000000-0x2100000000
> +uncached-minus @ 0x2000000000-0x2100000000
> uncached-minus @ 0x2100000000-0x2100001000
> uncached-minus @ 0x2100001000-0x2100002000
> uncached-minus @ 0x2ffff10000-0x2ffff20000
>
> the first two differences are just trivial ordering differences for
> overlapping ranges (starting at 0x5e8b4000 and 0xf0000000)
> respectively.
>
> But the final difference is a real difference where it used to be WC,
> and is now UC-:
>
> -write-combining @ 0x2000000000-0x2100000000
> -write-combining @ 0x2000000000-0x2100000000
> +uncached-minus @ 0x2000000000-0x2100000000
> +uncached-minus @ 0x2000000000-0x2100000000
>
> which certainly could easily explain the huge performance degradation.

Indeed, as I speculated two days ago to Kenneth R. Crudup, who reported a
similar slowdown on i915:

> * Ingo Molnar <[email protected]> wrote:
> > > * Kenneth R. Crudup <[email protected]> wrote:
> > >
> > > > As soon as the i915 driver module is loaded, it takes over the
> > > > EFI framebuffer on my machine (HP Spectre X360 with Intel UHD620
> > > > Graphics) and the subsequent text (as well as any VTs) is
> > > > rendered much more slowly. I don't know if the i915/DRM guys need
> > > > to do anything to their code to take advantage of this change to
> > > > the PATs, but reverting this change (after the associated
> > > > subseqent commits) has fixed that issue for me.
> > > >
> > > > Let me know if you need any further info.
> > >
> > > This is almost certainly the PAT bits being wrong in the
> > > pagetables, i.e. an x86 bug, not a GPU driver bug.
> > >
> > >
> > > Davidlohr, any idea what's going on? The interval tree conversion went
> > > bad. The slowdown symptoms are consistent with perhaps the framebuffer
> > > not getting WC mapped, but uncacheable mapped:
> > >
> > > ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
> > > vma->node.start,
> > > vma->node.size);
> > >
> > > Which is a wrapper around ioremap_wc().
> > >
> > > To debug this it would be useful to do a before/after comparison of the
> > > kernel pagetables:
> > >
> > > - before: git checkout 8d04a5f97a^1
> > > - after: git checkout 8d04a5f97a

And yesterday:

> [...]
>
> There's another similar bugreport of a -20% GL performance drop, from
> the ktest automated benchmark suite:
>
> https://lkml.kernel.org/r/20191127005312.GD20422@shao2-debian
>
> My shot-in-the-dark hypothesis is that perhaps we somehow fail to find
> a newly mapped memtype and leave a key ioremap_wc() area uncached,
> instead of write-combining?
>
> The order of magnitude of the slowdown would be roughly consistent with
> that, in GPU limited workloads - it would be more marked in 3D scenes
> with a lot of vertices or perhaps a lot of texture changes.
>
> But this is really just a random guess.

It's not an unconditional regression, as both Boris and I tried to
reproduce it on different systems that do ioremap_wc() as well and didn't
measure a slowdown, but something about the memory layout probably
triggers the tree management bug.

Thanks,

Ingo

2019-12-01 14:52:13

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


* Ingo Molnar <[email protected]> wrote:

> * Linus Torvalds <[email protected]> wrote:

> > But the final difference is a real difference where it used to be WC,
> > and is now UC-:
> >
> > -write-combining @ 0x2000000000-0x2100000000
> > -write-combining @ 0x2000000000-0x2100000000
> > +uncached-minus @ 0x2000000000-0x2100000000
> > +uncached-minus @ 0x2000000000-0x2100000000
> >
> > which certainly could easily explain the huge performance degradation.

> It's not an unconditional regression, as both Boris and I tried to
> reproduce it on different systems that do ioremap_wc() as well and didn't
> measure a slowdown, but something about the memory layout probably
> triggers the tree management bug.

Ok, I think I found at least one bug in the new PAT code: the conversion
of memtype_check_conflict() looks buggy:

8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")

dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
found_type = match->type;

- node = rb_next(&match->rb);
- while (node) {
- match = rb_entry(node, struct memtype, rb);
-
- if (match->start >= end) /* Checked all possible matches */
- goto success;
-
- if (is_node_overlap(match, start, end) &&
- match->type != found_type) {
+ match = memtype_interval_iter_next(match, start, end);
+ while (match) {
+ if (match->type != found_type)
goto failure;
- }

- node = rb_next(&match->rb);
+ match = memtype_interval_iter_next(match, start, end);
}


Note how the '>= end' condition to end the interval check got converted
into:

+ match = memtype_interval_iter_next(match, start, end);

This is subtly off by one, because the interval tree interfaces require
closed interval parameters:

include/linux/interval_tree_generic.h

/* \
* Iterate over intervals intersecting [start;last] \
* \
* Note that a node's interval intersects [start;last] iff: \
* Cond1: ITSTART(node) <= last \
* and \
* Cond2: start <= ITLAST(node) \
*/ \

...

if (ITSTART(node) <= last) { /* Cond1 */ \
if (start <= ITLAST(node)) /* Cond2 */ \
return node; /* node is leftmost match */ \

[start;last] is a closed interval (note the '<= last' check) - while the
PAT 'end' parameter is 1 byte beyond the end of the range, because
ioremap() and the other mapping APIs usually use the [start,end)
half-open interval, derived from 'size'.

This is what ioremap() does for example:

/*
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
phys_addr &= PHYSICAL_PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;

retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);


phys_addr+size will be on a page boundary, after the last byte of the
mapped interval.

So the correct parameter to use in the interval tree searches is not
'end' but 'end-1'.

This could have relevance if conflicting PAT ranges are exactly adjacent,
for example a future WC region is followed immediately by an already
mapped UC- region - in this case memtype_check_conflict() would
incorrectly deny the WC memtype region and downgrade the memtype to UC-.
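
To make the adjacency case concrete, here is a tiny user-space sketch
(not kernel code - just the closed-interval intersection test from
interval_tree_generic.h applied to the WC/UC- pair from Mariusz's dump).
Querying with 'end' reports a bogus overlap, querying with 'end-1' does
not:

/*
 * Minimal sketch of the off-by-one: PAT tracks half-open [start, end)
 * ranges, the generic interval tree takes a closed [start, last] query.
 * Passing 'end' instead of 'end - 1' as 'last' makes two exactly
 * adjacent ranges look like they overlap.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* closed-interval intersection, as in interval_tree_generic.h:
 * [nstart, nlast] intersects [qstart, qlast] iff
 * nstart <= qlast && qstart <= nlast */
static bool closed_overlap(uint64_t nstart, uint64_t nlast,
                           uint64_t qstart, uint64_t qlast)
{
        return nstart <= qlast && qstart <= nlast;
}

int main(void)
{
        /* already reserved UC- region: [0x2100000000, 0x2100001000) */
        uint64_t node_start = 0x2100000000ULL;
        uint64_t node_last  = 0x2100001000ULL - 1;    /* last byte covered */

        /* new WC request: [0x2000000000, 0x2100000000), exactly adjacent */
        uint64_t req_start = 0x2000000000ULL;
        uint64_t req_end   = 0x2100000000ULL;         /* one past the end */

        printf("query with 'end':   overlap=%d (bogus conflict, WC downgraded)\n",
               closed_overlap(node_start, node_last, req_start, req_end));
        printf("query with 'end-1': overlap=%d (no conflict, WC kept)\n",
               closed_overlap(node_start, node_last, req_start, req_end - 1));

        return 0;
}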

BTW., rather annoyingly this downgrading is done silently in
memtype_check_insert():

int memtype_check_insert(struct memtype *new,
enum page_cache_mode *ret_type)
{
int err = 0;

err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
if (err)
return err;

if (ret_type)
new->type = *ret_type;

memtype_interval_insert(new, &memtype_rbroot);
return 0;
}


So on such a conflict we'd just silently get UC- in *ret_type, and write
it into the new region, never the wiser ...

So, assuming that the patch below fixes the primary bug, the diagnostics
side of ioremap() cache attribute downgrades would be another thing to
fix.
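
Just to illustrate what I mean (this is only a sketch and not part of
the patch below - the exact place and wording of such a message would
need more thought), the silent downgrade could be made visible along
these lines:

/* Sketch only - not part of the patch below: log when the resolved
 * type differs from the requested one instead of downgrading silently. */
int memtype_check_insert(struct memtype *new,
                         enum page_cache_mode *ret_type)
{
        int err;

        err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
        if (err)
                return err;

        if (ret_type && *ret_type != new->type) {
                pr_info("x86/PAT: memtype 0x%llx-0x%llx downgraded from %d to %d\n",
                        new->start, new->end, new->type, *ret_type);
                new->type = *ret_type;
        }

        memtype_interval_insert(new, &memtype_rbroot);
        return 0;
}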

Anyway, I checked all the interval-tree iterations, and most of them are
off by one - but I think the one related to memtype_check_conflict() is
the one causing this particular performance regression.

The only correct interval-tree searches were these two:

arch/x86/mm/pat_interval.c: match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
arch/x86/mm/pat_interval.c: match = memtype_interval_iter_next(match, 0, ULONG_MAX);

The ULONG_MAX was hiding the off-by-one in plain sight. :-)

So it would be nice if everyone who is seeing this bug could test the
patch below against Linus's latest tree - does it fix the regression?

If not then please send the before/after dump of
/sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
the dumps so we can double check it all.

Note that the bug was benign in the sense of implementing a too strict
cache attribute conflict policy and downgrading cache attributes - so
AFAICS the worst outcome of this bug would be a performance regression.

Patch is only lightly tested, so take care. (Patch is emphatically not
signed off yet, because I spent most of the day on this and I don't yet
trust my fix - all of the affected sites need to be reviewed more
carefully.)

Thanks,

Ingo


====================>
From: Ingo Molnar <[email protected]>
Date: Sun, 1 Dec 2019 15:25:50 +0100
Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search

NOT-Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/pat_interval.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
index 47a1bf30748f..6855362eaf21 100644
--- a/arch/x86/mm/pat_interval.c
+++ b/arch/x86/mm/pat_interval.c
@@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
{
struct memtype *match;

- match = memtype_interval_iter_first(&memtype_rbroot, start, end);
+ match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
while (match != NULL && match->start < end) {
if ((match_type == MEMTYPE_EXACT_MATCH) &&
(match->start == start) && (match->end == end))
@@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
(match->start < start) && (match->end == end))
return match;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
}

return NULL; /* Returns NULL if there is no match */
@@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
struct memtype *match;
enum page_cache_mode found_type = reqtype;

- match = memtype_interval_iter_first(&memtype_rbroot, start, end);
+ match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
if (match == NULL)
goto success;

@@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
found_type = match->type;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
while (match) {
if (match->type != found_type)
goto failure;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
}
success:
if (newtype)
@@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
struct memtype *memtype_lookup(u64 addr)
{
return memtype_interval_iter_first(&memtype_rbroot, addr,
- addr + PAGE_SIZE);
+ addr + PAGE_SIZE-1);
}

#if defined(CONFIG_DEBUG_FS)

2019-12-01 16:11:40

by Mariusz Ceier

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search

Your patch fixes performance issue on my system and afterwards
/sys/kernel/debug/x86/pat_memtype_list contents are:

PAT memtype list:
write-back @ 0x55ba4000-0x55ba5000
write-back @ 0x5e88c000-0x5e8b5000
write-back @ 0x5e8b4000-0x5e8b5000
write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b7000-0x5e8bb000
write-back @ 0x5e8ba000-0x5e8bc000
write-back @ 0x5e8bb000-0x5e8be000
write-back @ 0x5e8bd000-0x5e8bf000
write-back @ 0x5e8be000-0x5e8c2000
write-back @ 0x5ef3c000-0x5ef3d000
write-back @ 0x5ef6c000-0x5ef6d000
write-back @ 0x5ef6f000-0x5ef70000
write-back @ 0x5ef72000-0x5ef73000
write-back @ 0x5f5b3000-0x5f5b5000
uncached-minus @ 0xe3f00000-0xe3f10000
uncached-minus @ 0xec000000-0xec040000
uncached-minus @ 0xec002000-0xec003000
uncached-minus @ 0xec110000-0xec111000
uncached-minus @ 0xec200000-0xec240000
uncached-minus @ 0xec260000-0xec264000
uncached-minus @ 0xec300000-0xec320000
uncached-minus @ 0xec326000-0xec327000
uncached-minus @ 0xf0000000-0xf8000000
uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed90000-0xfed91000
write-combining @ 0x2000000000-0x2100000000
write-combining @ 0x2000000000-0x2100000000
uncached-minus @ 0x2100000000-0x2100001000
uncached-minus @ 0x2100001000-0x2100002000
uncached-minus @ 0x2ffff10000-0x2ffff20000
uncached-minus @ 0x2ffff20000-0x2ffff24000

It's very similar to pat_memtype_list contents after reverting 4
x86/mm/pat patches affecting performance:

@@ -1,8 +1,8 @@
PAT memtype list:
write-back @ 0x55ba4000-0x55ba5000
write-back @ 0x5e88c000-0x5e8b5000
-write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b4000-0x5e8b5000
+write-back @ 0x5e8b4000-0x5e8b8000
write-back @ 0x5e8b7000-0x5e8bb000
write-back @ 0x5e8ba000-0x5e8bc000
write-back @ 0x5e8bb000-0x5e8be000
@@ -21,8 +21,8 @@
uncached-minus @ 0xec260000-0xec264000
uncached-minus @ 0xec300000-0xec320000
uncached-minus @ 0xec326000-0xec327000
-uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xf0000000-0xf8000000
+uncached-minus @ 0xf0000000-0xf0001000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000

Best regards,
Mariusz Ceier

On Sun, 1 Dec 2019 at 14:49, Ingo Molnar <[email protected]> wrote:
>
>
> * Ingo Molnar <[email protected]> wrote:
>
> > * Linus Torvalds <[email protected]> wrote:
>
> > > But the final difference is a real difference where it used to be WC,
> > > and is now UC-:
> > >
> > > -write-combining @ 0x2000000000-0x2100000000
> > > -write-combining @ 0x2000000000-0x2100000000
> > > +uncached-minus @ 0x2000000000-0x2100000000
> > > +uncached-minus @ 0x2000000000-0x2100000000
> > >
> > > which certainly could easily explain the huge performance degradation.
>
> > It's not an unconditional regression, as both Boris and I tried to
> > reproduce it on different systems that do ioremap_wc() as well and didn't
> > measure a slowdown, but something about the memory layout probably
> > triggers the tree management bug.
>
> Ok, I think I found at least one bug in the new PAT code: the conversion
> of memtype_check_conflict() looks buggy:
>
> 8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
>
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - node = rb_next(&match->rb);
> - while (node) {
> - match = rb_entry(node, struct memtype, rb);
> -
> - if (match->start >= end) /* Checked all possible matches */
> - goto success;
> -
> - if (is_node_overlap(match, start, end) &&
> - match->type != found_type) {
> + match = memtype_interval_iter_next(match, start, end);
> + while (match) {
> + if (match->type != found_type)
> goto failure;
> - }
>
> - node = rb_next(&match->rb);
> + match = memtype_interval_iter_next(match, start, end);
> }
>
>
> Note how the '>= end' condition to end the interval check got converted
> into:
>
> + match = memtype_interval_iter_next(match, start, end);
>
> This is subtly off by one, because the interval tree interfaces require
> closed interval parameters:
>
> include/linux/interval_tree_generic.h
>
> /* \
> * Iterate over intervals intersecting [start;last] \
> * \
> * Note that a node's interval intersects [start;last] iff: \
> * Cond1: ITSTART(node) <= last \
> * and \
> * Cond2: start <= ITLAST(node) \
> */ \
>
> ...
>
> if (ITSTART(node) <= last) { /* Cond1 */ \
> if (start <= ITLAST(node)) /* Cond2 */ \
> return node; /* node is leftmost match */ \
>
> [start;last] is a closed interval (note the '<= last' check) - while the
> PAT 'end' parameter is 1 byte beyond the end of the range, because
> ioremap() and the other mapping APIs usually use the [start,end)
> half-open interval, derived from 'size'.
>
> This is what ioremap() does for example:
>
> /*
> * Mappings have to be page-aligned
> */
> offset = phys_addr & ~PAGE_MASK;
> phys_addr &= PHYSICAL_PAGE_MASK;
> size = PAGE_ALIGN(last_addr+1) - phys_addr;
>
> retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
> pcm, &new_pcm);
>
>
> phys_addr+size will be on a page boundary, after the last byte of the
> mapped interval.
>
> So the correct parameter to use in the interval tree searches is not
> 'end' but 'end-1'.
>
> This could have relevance if conflicting PAT ranges are exactly adjacent,
> for example a future WC region is followed immediately by an already
> mapped UC- region - in this case memtype_check_conflict() would
> incorrectly deny the WC memtype region and downgrade the memtype to UC-.
>
> BTW., rather annoyingly this downgrading is done silently in
> memtype_check_insert():
>
> int memtype_check_insert(struct memtype *new,
> enum page_cache_mode *ret_type)
> {
> int err = 0;
>
> err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
> if (err)
> return err;
>
> if (ret_type)
> new->type = *ret_type;
>
> memtype_interval_insert(new, &memtype_rbroot);
> return 0;
> }
>
>
> So on such a conflict we'd just silently get UC- in *ret_type, and write
> it into the new region, never the wiser ...
>
> So, assuming that the patch below fixes the primary bug, the diagnostics
> side of ioremap() cache attribute downgrades would be another thing to
> fix.
>
> Anyway, I checked all the interval-tree iterations, and most of them are
> off by one - but I think the one related to memtype_check_conflict() is
> the one causing this particular performance regression.
>
> The only correct interval-tree searches were these two:
>
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_next(match, 0, ULONG_MAX);
>
> The ULONG_MAX was hiding the off-by-one in plain sight. :-)
>
> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?
>
> If not then please send the before/after dump of
> /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> the dumps so we can double check it all.
>
> Note that the bug was benign in the sense of implementing a too strict
> cache attribute conflict policy and downgrading cache attributes - so
> AFAICS the worst outcome of this bug would be a performance regression.
>
> Patch is only lightly tested, so take care. (Patch is emphatically not
> signed off yet, because I spent most of the day on this and I don't yet
> trust my fix - all of the affected sites need to be reviewed more
> carefully.)
>
> Thanks,
>
> Ingo
>
>
> ====================>
> From: Ingo Molnar <[email protected]>
> Date: Sun, 1 Dec 2019 15:25:50 +0100
> Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
>
> NOT-Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/mm/pat_interval.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
> index 47a1bf30748f..6855362eaf21 100644
> --- a/arch/x86/mm/pat_interval.c
> +++ b/arch/x86/mm/pat_interval.c
> @@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> {
> struct memtype *match;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> while (match != NULL && match->start < end) {
> if ((match_type == MEMTYPE_EXACT_MATCH) &&
> (match->start == start) && (match->end == end))
> @@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> (match->start < start) && (match->end == end))
> return match;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
>
> return NULL; /* Returns NULL if there is no match */
> @@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
> struct memtype *match;
> enum page_cache_mode found_type = reqtype;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> if (match == NULL)
> goto success;
>
> @@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> while (match) {
> if (match->type != found_type)
> goto failure;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
> success:
> if (newtype)
> @@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
> struct memtype *memtype_lookup(u64 addr)
> {
> return memtype_interval_iter_first(&memtype_rbroot, addr,
> - addr + PAGE_SIZE);
> + addr + PAGE_SIZE-1);
> }
>
> #if defined(CONFIG_DEBUG_FS)

2019-12-01 16:47:10

by Kenneth Crudup

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


On Sun, 1 Dec 2019, Ingo Molnar wrote:

> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?

I'll be sure to test it later today.

-Kenny

--
Kenneth R. Crudup Sr. SW Engineer, Scott County Consulting, Silicon Valley

2019-12-01 17:12:06

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search

On Sun, 01 Dec 2019, Ingo Molnar wrote:
>So the correct parameter to use in the interval tree searches is not
>'end' but 'end-1'.

Yes absolutely, I overlooked this in the final conversion. Going through some
older conversions, I had this end-1 at one point. Lookups need half-open intervals,
consistent with what memtype_interval_end() does.

[...]

>Patch is only lightly tested, so take care. (Patch is emphatically not
>signed off yet, because I spent most of the day on this and I don't yet
>trust my fix - all of the affected sites need to be reviewed more
>carefully.)

As a general note, this is rather consistent with how all interval-tree
users that need [a,b) nodes use the api.

Thanks,
Davidlohr

2019-12-01 17:12:44

by Kenneth Crudup

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


On Sun, 1 Dec 2019, Ingo Molnar wrote:

> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?

The patch fixes the issue for me.

> If not then please send the before/after dump of
> /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> the dumps so we can double check it all.

I don't have the "before patch" (but could if it is absolutely needed) but
here's the "after patch":

----
PAT memtype list:
write-back @ 0x4c314000-0x4c35f000
write-back @ 0x4c35e000-0x4c35f000
write-back @ 0x4c35e000-0x4c364000
write-back @ 0x4c363000-0x4c366000
write-back @ 0x4c365000-0x4c369000
write-back @ 0x4c368000-0x4c36b000
write-back @ 0x4c36a000-0x4c36e000
write-back @ 0x4c36d000-0x4c36f000
write-back @ 0x4c36e000-0x4c370000
write-back @ 0x4c36f000-0x4c371000
write-back @ 0x4c370000-0x4c372000
write-back @ 0x4c7eb000-0x4c7ec000
write-back @ 0x4c7ec000-0x4c7ef000
write-back @ 0x4c7ec000-0x4c7ed000
write-back @ 0x4c7ef000-0x4c7f0000
write-back @ 0x4c7f0000-0x4c7f1000
write-back @ 0x4c867000-0x4c868000
write-back @ 0x4c868000-0x4c869000
write-back @ 0x4fa86000-0x4fa87000
write-back @ 0x4fefc000-0x4fefd000
uncached-minus @ 0x77f00000-0x77f10000
uncached-minus @ 0x8e000000-0x8e040000
uncached-minus @ 0x8e040000-0x8e041000
uncached-minus @ 0x8e200000-0x8e202000
uncached-minus @ 0x8e203000-0x8e204000
uncached-minus @ 0x8e300000-0x8e301000
uncached-minus @ 0xe0000000-0xf0000000
uncached-minus @ 0xfd6a0000-0xfd6a1000
uncached-minus @ 0xfd6a0000-0xfd6b0000
uncached-minus @ 0xfd6d0000-0xfd6e0000
uncached-minus @ 0xfd6e0000-0xfd6e1000
uncached-minus @ 0xfd6e0000-0xfd6f0000
uncached-minus @ 0xfe000000-0xfe002000
uncached-minus @ 0xfe001000-0xfe002000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed10000-0xfed16000
uncached-minus @ 0xfed15000-0xfed16000
uncached-minus @ 0xfed40000-0xfed45000
uncached-minus @ 0xfed90000-0xfed91000
uncached-minus @ 0xfed91000-0xfed92000
uncached-minus @ 0xff340000-0xff341000
write-combining @ 0x4000000000-0x4010000000
uncached-minus @ 0x4010000000-0x4010001000
uncached-minus @ 0x4010000000-0x4010001000
uncached-minus @ 0x4010000000-0x4010001000
uncached-minus @ 0x4010001000-0x4010002000
uncached-minus @ 0x4010001000-0x4010002000
uncached-minus @ 0x4010001000-0x4010002000
uncached-minus @ 0x604a000000-0x604a200000
write-combining @ 0x604a800000-0x604b000000
uncached-minus @ 0x604b100000-0x604b110000
uncached-minus @ 0x604b110000-0x604b118000
uncached-minus @ 0x604b118000-0x604b11c000
uncached-minus @ 0x604b11c000-0x604b120000
uncached-minus @ 0x604b11e000-0x604b11f000
uncached-minus @ 0x604b122000-0x604b124000
uncached-minus @ 0x604b125000-0x604b126000
uncached-minus @ 0x604b129000-0x604b12a000
----

-Kenny

--
Kenneth R. Crudup Sr. SW Engineer, Scott County Consulting, Silicon Valley

2019-12-01 20:00:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


* Mariusz Ceier <[email protected]> wrote:

> Your patch fixes performance issue on my system and afterwards
> /sys/kernel/debug/x86/pat_memtype_list contents are:

Great, thanks for testing it!

> PAT memtype list:

> uncached-minus @ 0xfed90000-0xfed91000
> write-combining @ 0x2000000000-0x2100000000
> write-combining @ 0x2000000000-0x2100000000
> uncached-minus @ 0x2100000000-0x2100001000

Note how the UC- region starts right after the WC region, which triggered
the bug on your system.

> It's very similar to pat_memtype_list contents after reverting 4
> x86/mm/pat patches affecting performance:
>
> @@ -1,8 +1,8 @@
> PAT memtype list:
> write-back @ 0x55ba4000-0x55ba5000
> write-back @ 0x5e88c000-0x5e8b5000
> -write-back @ 0x5e8b4000-0x5e8b8000
> write-back @ 0x5e8b4000-0x5e8b5000
> +write-back @ 0x5e8b4000-0x5e8b8000
> write-back @ 0x5e8b7000-0x5e8bb000
> write-back @ 0x5e8ba000-0x5e8bc000
> write-back @ 0x5e8bb000-0x5e8be000
> @@ -21,8 +21,8 @@
> uncached-minus @ 0xec260000-0xec264000
> uncached-minus @ 0xec300000-0xec320000
> uncached-minus @ 0xec326000-0xec327000
> -uncached-minus @ 0xf0000000-0xf0001000
> uncached-minus @ 0xf0000000-0xf8000000
> +uncached-minus @ 0xf0000000-0xf0001000

Yes, the ordering of same-start regions is different. I believe the
difference comes from how the old rbtree logic inserted new nodes:


- while (*node) {
- struct memtype *data = rb_entry(*node, struct memtype, rb);
-
- parent = *node;
- if (data->subtree_max_end < newdata->end)
- data->subtree_max_end = newdata->end;
- if (newdata->start <= data->start)
- node = &((*node)->rb_left);
- else if (newdata->start > data->start)
- node = &((*node)->rb_right);
- }
-
- newdata->subtree_max_end = newdata->end;
- rb_link_node(&newdata->rb, parent, node);
- rb_insert_augmented(&newdata->rb, root, &memtype_rb_augment_cb);

In the new interval-tree logic this is:

while (*link) { \
rb_parent = *link; \
parent = rb_entry(rb_parent, ITSTRUCT, ITRB); \
if (parent->ITSUBTREE < last) \
parent->ITSUBTREE = last; \
if (start < ITSTART(parent)) \
link = &parent->ITRB.rb_left; \
else { \
link = &parent->ITRB.rb_right; \
leftmost = false; \
} \
} \
\
node->ITSUBTREE = last; \
rb_link_node(&node->ITRB, rb_parent, link); \
rb_insert_augmented_cached(&node->ITRB, root, \
leftmost, &ITPREFIX ## _augment); \

The old logic was a bit convoluted, but it can be written as:

if (newdata->start <= data->start)
node = &parent->rb_left;
else
node = &parent->rb_right;

The new logic is, in effect:

if (start < data->start)
link = &parent->rb_left;
else
link = &parent->rb_right;

Note the '<=' vs. '<' difference - I believe this changes how nodes with the
same start address are ordered within the tree. That is still fine as long as
it is applied consistently, but it does change the internal ordering of the
nodes, and hence the dump order of same-start entries.
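
As a rough illustration - a plain unbalanced BST with a made-up insertion
order, not the kernel's augmented rbtree - the tie-break difference flips the
in-order position of two entries that share a start address:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

struct node {
    uint64_t start, end;
    struct node *left, *right;
};

/* '<=' sends an equal start to the left child (old rbtree code),
 * '<' sends it to the right child (generic interval tree). */
static void insert(struct node **root, struct node *new, bool le_goes_left)
{
    while (*root) {
        bool go_left = le_goes_left ? new->start <= (*root)->start
                                    : new->start <  (*root)->start;
        root = go_left ? &(*root)->left : &(*root)->right;
    }
    *root = new;
}

static void inorder(const struct node *n)
{
    if (!n)
        return;
    inorder(n->left);
    printf("  0x%llx-0x%llx\n",
           (unsigned long long)n->start, (unsigned long long)n->end);
    inorder(n->right);
}

static struct node *mk(uint64_t start, uint64_t end)
{
    struct node *n = calloc(1, sizeof(*n));
    n->start = start;
    n->end = end;
    return n;
}

int main(void)
{
    /* Two same-start regions, inserted in the same order both times. */
    for (int le = 1; le >= 0; le--) {
        struct node *root = NULL;

        insert(&root, mk(0x5e8b4000, 0x5e8b8000), le);
        insert(&root, mk(0x5e8b4000, 0x5e8b5000), le);
        printf("%s:\n", le ? "'<=' comparison (old rbtree)"
                           : "'<' comparison (interval tree)");
        inorder(root);
    }
    return 0;
}

The debugfs dump walks the tree in order, so the two same-start lines come out
swapped - harmless, but it explains the +/- churn in the diff above.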

Thanks,

Ingo

2019-12-01 20:00:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


* Kenneth R. Crudup <[email protected]> wrote:

>
> On Sun, 1 Dec 2019, Ingo Molnar wrote:
>
> > So it would be nice if everyone who is seeing this bug could test the
> > patch below against Linus's latest tree - does it fix the regression?
>
> The patch fixes the issue for me.

Great, thanks!

> > If not then please send the before/after dump of
> > /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> > the dumps so we can double check it all.
>
> I don't have the "before patch" (but could if it is absolutely needed) but
> here's the "after patch":

> uncached-minus @ 0xfed91000-0xfed92000
> uncached-minus @ 0xff340000-0xff341000
> write-combining @ 0x4000000000-0x4010000000
> uncached-minus @ 0x4010000000-0x4010001000

I believe this is the region that caused the problem: the 0x4010000000
'end' address of the WC region is the same as the 0x4010000000 'start'
address of the UC- region that follows it.
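
A minimal standalone sketch of that scenario, using the two adjacent regions
above and assuming the UC- page was already in the tree when the WC range was
requested (the overlap test just mirrors the closed-interval conditions quoted
from interval_tree_generic.h):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Closed intervals: [n_start, n_last] intersects [q_start, q_last] iff
 * n_start <= q_last and q_start <= n_last. */
static bool overlaps(uint64_t n_start, uint64_t n_last,
                     uint64_t q_start, uint64_t q_last)
{
    return n_start <= q_last && q_start <= n_last;
}

int main(void)
{
    /* Already-reserved UC- page right after the WC region: */
    uint64_t uc_start = 0x4010000000ULL, uc_last = 0x4010001000ULL - 1;
    /* New WC reservation, half-open [start, end): */
    uint64_t wc_start = 0x4000000000ULL, wc_end = 0x4010000000ULL;

    /* Passing the half-open 'end' as the closed 'last' falsely reports a
     * conflict with the adjacent UC- page, so the WC request would get
     * downgraded: */
    printf("query [start, end]:   overlap=%d\n",
           overlaps(uc_start, uc_last, wc_start, wc_end));

    /* Passing end - 1 (the last byte actually covered) does not: */
    printf("query [start, end-1]: overlap=%d\n",
           overlaps(uc_start, uc_last, wc_start, wc_end - 1));
    return 0;
}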

> write-combining @ 0x604a800000-0x604b000000
> uncached-minus @ 0x604b100000-0x604b110000

This WC region was probably unaffected by the bug.

Thanks,

Ingo

2019-12-01 20:06:02

by tip-bot2 for Ingo Molnar

[permalink] [raw]
Subject: [tip: x86/urgent] x86/mm/pat: Fix off-by-one bugs in interval tree search

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 91298f1a302dad0f0f630413c812818636faa8a0
Gitweb: https://git.kernel.org/tip/91298f1a302dad0f0f630413c812818636faa8a0
Author: Ingo Molnar <[email protected]>
AuthorDate: Sun, 01 Dec 2019 15:49:47 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sun, 01 Dec 2019 20:57:25 +01:00

x86/mm/pat: Fix off-by-one bugs in interval tree search

There's a bug in the new PAT code: the conversion of memtype_check_conflict()
is incorrect:

8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")

dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
found_type = match->type;

- node = rb_next(&match->rb);
- while (node) {
- match = rb_entry(node, struct memtype, rb);
-
- if (match->start >= end) /* Checked all possible matches */
- goto success;
-
- if (is_node_overlap(match, start, end) &&
- match->type != found_type) {
+ match = memtype_interval_iter_next(match, start, end);
+ while (match) {
+ if (match->type != found_type)
goto failure;
- }

- node = rb_next(&match->rb);
+ match = memtype_interval_iter_next(match, start, end);
}

Note how the '>= end' condition that terminated the interval check got
converted into:

+ match = memtype_interval_iter_next(match, start, end);

This is subtly off by one, because the interval-tree interfaces require
closed-interval parameters:

include/linux/interval_tree_generic.h

/* \
* Iterate over intervals intersecting [start;last] \
* \
* Note that a node's interval intersects [start;last] iff: \
* Cond1: ITSTART(node) <= last \
* and \
* Cond2: start <= ITLAST(node) \
*/ \

...

if (ITSTART(node) <= last) { /* Cond1 */ \
if (start <= ITLAST(node)) /* Cond2 */ \
return node; /* node is leftmost match */ \

[start;last] is a closed interval (note the '<= last' check) - while the
PAT 'end' parameter is 1 byte beyond the end of the range, because
ioremap() and the other mapping APIs usually use the [start,end)
half-open interval, derived from 'size'.

This is what ioremap() does for example:

/*
* Mappings have to be page-aligned
*/
offset = phys_addr & ~PAGE_MASK;
phys_addr &= PHYSICAL_PAGE_MASK;
size = PAGE_ALIGN(last_addr+1) - phys_addr;

retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
pcm, &new_pcm);

phys_addr+size will be on a page boundary, after the last byte of the
mapped interval.

So the correct parameter to use in the interval tree searches is not
'end' but 'end-1'.

This could have relevance if conflicting PAT ranges are exactly adjacent,
for example a future WC region is followed immediately by an already
mapped UC- region - in this case memtype_check_conflict() would
incorrectly deny the WC memtype region and downgrade the memtype to UC-.

BTW., rather annoyingly this downgrading is done silently in
memtype_check_insert():

int memtype_check_insert(struct memtype *new,
enum page_cache_mode *ret_type)
{
int err = 0;

err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
if (err)
return err;

if (ret_type)
new->type = *ret_type;

memtype_interval_insert(new, &memtype_rbroot);
return 0;
}

So on such a conflict we'd just silently get UC- in *ret_type, and write
it into the new region, never the wiser ...

So, assuming that the patch below fixes the primary bug, the diagnostics
side of ioremap() cache attribute downgrades would be another thing to
fix.

Anyway, I checked all the interval-tree iterations, and most of them are
off by one - but I think the one related to memtype_check_conflict() is
the one causing this particular performance regression.

The only correct interval-tree searches were these two:

arch/x86/mm/pat_interval.c: match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
arch/x86/mm/pat_interval.c: match = memtype_interval_iter_next(match, 0, ULONG_MAX);

The ULONG_MAX was hiding the off-by-one in plain sight. :-)

Note that the bug was probably benign, in the sense that it implemented a
too-strict cache attribute conflict policy and downgraded cache attributes,
so AFAICS the worst outcome of this bug would be a performance regression,
not any instability.

Reported-by: kernel test robot <[email protected]>
Reported-by: Kenneth R. Crudup <[email protected]>
Reported-by: Mariusz Ceier <[email protected]>
Tested-by: Mariusz Ceier <[email protected]>
Tested-by: Kenneth R. Crudup <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/mm/pat_interval.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
index 47a1bf3..6855362 100644
--- a/arch/x86/mm/pat_interval.c
+++ b/arch/x86/mm/pat_interval.c
@@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
{
struct memtype *match;

- match = memtype_interval_iter_first(&memtype_rbroot, start, end);
+ match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
while (match != NULL && match->start < end) {
if ((match_type == MEMTYPE_EXACT_MATCH) &&
(match->start == start) && (match->end == end))
@@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
(match->start < start) && (match->end == end))
return match;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
}

return NULL; /* Returns NULL if there is no match */
@@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
struct memtype *match;
enum page_cache_mode found_type = reqtype;

- match = memtype_interval_iter_first(&memtype_rbroot, start, end);
+ match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
if (match == NULL)
goto success;

@@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
found_type = match->type;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
while (match) {
if (match->type != found_type)
goto failure;

- match = memtype_interval_iter_next(match, start, end);
+ match = memtype_interval_iter_next(match, start, end-1);
}
success:
if (newtype)
@@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
struct memtype *memtype_lookup(u64 addr)
{
return memtype_interval_iter_first(&memtype_rbroot, addr,
- addr + PAGE_SIZE);
+ addr + PAGE_SIZE-1);
}

#if defined(CONFIG_DEBUG_FS)

2019-12-01 20:11:39

by Kenneth Crudup

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


> > uncached-minus @ 0xfed91000-0xfed92000
> > uncached-minus @ 0xff340000-0xff341000
> > write-combining @ 0x4000000000-0x4010000000
> > uncached-minus @ 0x4010000000-0x4010001000

On Sun, 1 Dec 2019, Ingo Molnar wrote:

> I believe this is the region that caused the problem, the 0x4010000000
> 'end' address of the WC region is the same as the 0x4010000000 'start'
> address of the UC- region that follows it.

> > write-combining @ 0x604a800000-0x604b000000
> > uncached-minus @ 0x604b100000-0x604b110000

> This WC region was probably unaffected by the bug.

For my education, and for completeness' sake, is there a /proc or /sys entry
that would tell me which device/module has reserved which PAT region?

-Kenny

--
Kenneth R. Crudup Sr. SW Engineer, Scott County Consulting, Silicon Valley

2019-12-01 20:35:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search


* Kenneth R. Crudup <[email protected]> wrote:

> > > write-combining @ 0x604a800000-0x604b000000
> > > uncached-minus @ 0x604b100000-0x604b110000
>
> > This WC region was probably unaffected by the bug.
>
> For my education, and for completeness' sake, is there a proc/sys entry
> that would tell me which device/module has reserved which PAT region?

Not that I know of :-/

I suspect you could boot a kernel with the patch below applied and run:

dmesg | grep -i 'x86/pat'

to see all the ioremap() activity, with a symbolic name of the caller printed.

I'm quite sure 0x604a800000 will be among them, pointing to somewhere
like i915_vma_pin_iomap(), ggtt_init_hw() or ggtt_probe_common() in the
i915 GPU driver?

Another possibility is that this is the framebuffer, mapped by
efifb_probe() or so?

Patch is untested though. :-)

Thanks,

Ingo

=================>

arch/x86/mm/ioremap.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 1ff9c2030b4f..2f0a4f99471a 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -166,6 +166,8 @@ __ioremap_caller(resource_size_t phys_addr, unsigned long size,
int retval;
void __iomem *ret_addr;

+ printk("# x86/pat: ioremap(%016Lx, %08lx, pcm: %d), caller: %pS\n", phys_addr, size, pcm, caller);
+
/* Don't allow wraparound or zero size */
last_addr = phys_addr + size - 1;
if (!size || last_addr < phys_addr)

2019-12-02 08:36:18

by Chen, Rong A

[permalink] [raw]
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search

Hi Ingo,

The patch fixes the regression reported by 0day-CI:

"[LKP] [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression":
https://lkml.kernel.org/r/20191127005312.GD20422@shao2-debian

Best Regards,
Rong Chen

On 12/1/19 10:49 PM, Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
>> * Linus Torvalds <[email protected]> wrote:
>>> But the final difference is a real difference where it used to be WC,
>>> and is now UC-:
>>>
>>> -write-combining @ 0x2000000000-0x2100000000
>>> -write-combining @ 0x2000000000-0x2100000000
>>> +uncached-minus @ 0x2000000000-0x2100000000
>>> +uncached-minus @ 0x2000000000-0x2100000000
>>>
>>> which certainly could easily explain the huge performance degradation.
>> It's not an unconditional regression, as both Boris and I tried to
>> reproduce it on different systems that do ioremap_wc() as well and didn't
>> measure a slowdown, but something about the memory layout probably
>> triggers the tree management bug.
> Ok, I think I found at least one bug in the new PAT code - the conversion
> of memtype_check_conflict() is buggy:
>
> 8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
>
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - node = rb_next(&match->rb);
> - while (node) {
> - match = rb_entry(node, struct memtype, rb);
> -
> - if (match->start >= end) /* Checked all possible matches */
> - goto success;
> -
> - if (is_node_overlap(match, start, end) &&
> - match->type != found_type) {
> + match = memtype_interval_iter_next(match, start, end);
> + while (match) {
> + if (match->type != found_type)
> goto failure;
> - }
>
> - node = rb_next(&match->rb);
> + match = memtype_interval_iter_next(match, start, end);
> }
>
>
> Note how the '>= end' condition that terminated the interval check got
> converted into:
>
> + match = memtype_interval_iter_next(match, start, end);
>
> This is subtly off by one, because the interval-tree interfaces require
> closed-interval parameters:
>
> include/linux/interval_tree_generic.h
>
> /* \
> * Iterate over intervals intersecting [start;last] \
> * \
> * Note that a node's interval intersects [start;last] iff: \
> * Cond1: ITSTART(node) <= last \
> * and \
> * Cond2: start <= ITLAST(node) \
> */ \
>
> ...
>
> if (ITSTART(node) <= last) { /* Cond1 */ \
> if (start <= ITLAST(node)) /* Cond2 */ \
> return node; /* node is leftmost match */ \
>
> [start;last] is a closed interval (note the '<= last' check) - while the
> PAT 'end' parameter is 1 byte beyond the end of the range, because
> ioremap() and the other mapping APIs usually use the [start,end)
> half-open interval, derived from 'size'.
>
> This is what ioremap() does for example:
>
> /*
> * Mappings have to be page-aligned
> */
> offset = phys_addr & ~PAGE_MASK;
> phys_addr &= PHYSICAL_PAGE_MASK;
> size = PAGE_ALIGN(last_addr+1) - phys_addr;
>
> retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
> pcm, &new_pcm);
>
>
> phys_addr+size will be on a page boundary, after the last byte of the
> mapped interval.
>
> So the correct parameter to use in the interval tree searches is not
> 'end' but 'end-1'.
>
> This could have relevance if conflicting PAT ranges are exactly adjacent,
> for example a future WC region is followed immediately by an already
> mapped UC- region - in this case memtype_check_conflict() would
> incorrectly deny the WC memtype region and downgrade the memtype to UC-.
>
> BTW., rather annoyingly this downgrading is done silently in
> memtype_check_insert():
>
> int memtype_check_insert(struct memtype *new,
> enum page_cache_mode *ret_type)
> {
> int err = 0;
>
> err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
> if (err)
> return err;
>
> if (ret_type)
> new->type = *ret_type;
>
> memtype_interval_insert(new, &memtype_rbroot);
> return 0;
> }
>
>
> So on such a conflict we'd just silently get UC- in *ret_type, and write
> it into the new region, never the wiser ...
>
> So, assuming that the patch below fixes the primary bug, the diagnostics
> side of ioremap() cache attribute downgrades would be another thing to
> fix.
>
> Anyway, I checked all the interval-tree iterations, and most of them are
> off by one - but I think the one related to memtype_check_conflict() is
> the one causing this particular performance regression.
>
> The only correct interval-tree searches were these two:
>
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_next(match, 0, ULONG_MAX);
>
> The ULONG_MAX was hiding the off-by-one in plain sight. :-)
>
> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?
>
> If not then please send the before/after dump of
> /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> the dumps so we can double check it all.
>
> Note that the bug was benign in the sense of implementing a too strict
> cache attribute conflict policy and downgrading cache attributes - so
> AFAICS the worst outcome of this bug would be a performance regression.
>
> Patch is only lightly tested, so take care. (Patch is emphatically not
> signed off yet, because I spent most of the day on this and I don't yet
> trust my fix - all of the affected sites need to be reviewed more
> carefully.)
>
> Thanks,
>
> Ingo
>
>
> ====================>
> From: Ingo Molnar <[email protected]>
> Date: Sun, 1 Dec 2019 15:25:50 +0100
> Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
>
> NOT-Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/mm/pat_interval.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
> index 47a1bf30748f..6855362eaf21 100644
> --- a/arch/x86/mm/pat_interval.c
> +++ b/arch/x86/mm/pat_interval.c
> @@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> {
> struct memtype *match;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> while (match != NULL && match->start < end) {
> if ((match_type == MEMTYPE_EXACT_MATCH) &&
> (match->start == start) && (match->end == end))
> @@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> (match->start < start) && (match->end == end))
> return match;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
>
> return NULL; /* Returns NULL if there is no match */
> @@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
> struct memtype *match;
> enum page_cache_mode found_type = reqtype;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> if (match == NULL)
> goto success;
>
> @@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> while (match) {
> if (match->type != found_type)
> goto failure;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
> success:
> if (newtype)
> @@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
> struct memtype *memtype_lookup(u64 addr)
> {
> return memtype_interval_iter_first(&memtype_rbroot, addr,
> - addr + PAGE_SIZE);
> + addr + PAGE_SIZE-1);
> }
>
> #if defined(CONFIG_DEBUG_FS)