2021-05-19 19:59:23

by kernel test robot

Subject: [smp] a32a4d8a81: netperf.Throughput_tps -2.1% regression



Greetings,

FYI, we noticed a -2.1% regression of netperf.Throughput_tps due to commit:


commit: a32a4d8a815c4eb6dc64b8962dc13a9dfae70868 ("smp: Run functions concurrently in smp_call_function_many_cond()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: netperf
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

ip: ipv4
runtime: 300s
nr_threads: 1
cluster: cs-localhost
test: UDP_RR
cpufreq_governor: performance
ucode: 0x5003006

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/1/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2ap3/UDP_RR/netperf/0x5003006

commit:
v5.12-rc2
a32a4d8a81 ("smp: Run functions concurrently in smp_call_function_many_cond()")

v5.12-rc2 a32a4d8a815c4eb6dc64b8962dc
---------------- ---------------------------
%stddev %change %stddev
\ | \
116903 -2.1% 114404 netperf.Throughput_total_tps
116903 -2.1% 114404 netperf.Throughput_tps
35066769 -2.1% 34317990 netperf.time.voluntary_context_switches
35071059 -2.1% 34321258 netperf.workload
67295 +1.5% 68333 proc-vmstat.nr_anon_pages
463520 -2.1% 453603 vmstat.system.cs
535.28 ± 6% -8.3% 490.97 ± 10% sched_debug.cfs_rq:/.util_est_enqueued.max
0.02 ± 8% -10.8% 0.02 ± 4% sched_debug.cpu.nr_running.avg
76309820 ± 4% +320.0% 3.205e+08 ±158% cpuidle.C1.time
23409116 ± 3% +31.0% 30676822 ± 20% cpuidle.C1.usage
46720133 ± 2% -12.9% 40709940 ± 2% cpuidle.POLL.usage
5282 ±110% +317.0% 22029 ± 58% numa-vmstat.node3.nr_anon_pages
11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_inactive_anon
11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_zone_inactive_anon
8397 ±136% +588.7% 57827 ± 75% numa-meminfo.node3.AnonHugePages
21162 ±110% +316.7% 88189 ± 58% numa-meminfo.node3.AnonPages
48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive
48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive(anon)
467040 -2.1% 457094 perf-stat.i.context-switches
0.01 ±138% +0.0 0.03 ± 73% perf-stat.i.dTLB-store-miss-rate%
9.415e+08 -2.4% 9.188e+08 ± 2% perf-stat.i.dTLB-stores
0.01 ±137% +0.0 0.03 ± 73% perf-stat.overall.dTLB-store-miss-rate%
465472 -2.1% 455557 perf-stat.ps.context-switches
9.385e+08 -2.4% 9.158e+08 ± 2% perf-stat.ps.dTLB-stores
1.21 ± 14% +0.2 1.41 ± 5% perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
2.05 ± 10% +0.3 2.33 ± 4% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.06 ± 7% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.__calc_delta
0.08 ± 19% +0.0 0.10 ± 9% perf-profile.children.cycles-pp._copy_to_user
0.09 ± 22% +0.0 0.12 ± 8% perf-profile.children.cycles-pp._copy_from_user
0.12 ± 20% +0.0 0.17 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.14 ± 11% +0.1 0.19 ± 9% perf-profile.children.cycles-pp.skb_release_data
1.21 ± 14% +0.2 1.41 ± 5% perf-profile.children.cycles-pp.__ip_append_data
2.07 ± 11% +0.3 2.33 ± 4% perf-profile.children.cycles-pp.schedule_idle
0.06 ± 7% +0.0 0.08 ± 11% perf-profile.self.cycles-pp.__calc_delta
0.19 ± 8% +0.0 0.24 ± 6% perf-profile.self.cycles-pp.__softirqentry_text_start
0.24 ± 8% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.__skb_recv_udp
0.14 ± 11% +0.1 0.19 ± 9% perf-profile.self.cycles-pp.skb_release_data
0.02 ±142% +0.1 0.08 ± 17% perf-profile.self.cycles-pp.sock_alloc_send_pskb
0.11 ± 17% +0.1 0.19 ± 13% perf-profile.self.cycles-pp.__ip_append_data
0.12 ± 34% +0.1 0.26 ± 22% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.87 ± 13% +0.2 1.05 ± 6% perf-profile.self.cycles-pp._raw_spin_lock
1287 ± 42% +75.3% 2256 ± 14% interrupts.CPU111.CAL:Function_call_interrupts
1326 ± 43% +71.0% 2267 ± 13% interrupts.CPU119.CAL:Function_call_interrupts
1300 ± 45% +75.9% 2287 ± 37% interrupts.CPU120.CAL:Function_call_interrupts
1299 ± 45% +60.1% 2081 ± 28% interrupts.CPU128.CAL:Function_call_interrupts
1305 ± 45% +61.7% 2110 ± 29% interrupts.CPU131.CAL:Function_call_interrupts
1299 ± 45% +61.8% 2102 ± 28% interrupts.CPU139.CAL:Function_call_interrupts
66.67 ±133% -97.2% 1.83 ±155% interrupts.CPU14.TLB:TLB_shootdowns
1299 ± 45% +107.8% 2700 ± 33% interrupts.CPU142.CAL:Function_call_interrupts
301.83 ±128% -95.6% 13.17 ±140% interrupts.CPU149.RES:Rescheduling_interrupts
389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.NMI:Non-maskable_interrupts
389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.PMI:Performance_monitoring_interrupts
1299 ± 45% +60.2% 2081 ± 28% interrupts.CPU35.CAL:Function_call_interrupts
1244 ± 50% +66.8% 2076 ± 27% interrupts.CPU45.CAL:Function_call_interrupts
1300 ± 44% +59.5% 2075 ± 28% interrupts.CPU46.CAL:Function_call_interrupts
1.50 ± 63% +1422.2% 22.83 ±167% interrupts.CPU47.RES:Rescheduling_interrupts
467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.NMI:Non-maskable_interrupts
467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.PMI:Performance_monitoring_interrupts
306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.NMI:Non-maskable_interrupts
306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.PMI:Performance_monitoring_interrupts
1131 ± 27% +61.2% 1822 ± 35% interrupts.CPU85.CAL:Function_call_interrupts
1180 ± 31% +79.6% 2119 ± 24% interrupts.CPU86.CAL:Function_call_interrupts



netperf.Throughput_tps

121000 +------------------------------------------------------------------+
120000 |-+ :+ |
| : + |
119000 |-+ : + + |
118000 |-+ : : :+ + + + + |
|.+ : : : + + + :: + + : |
117000 |-++ +.: : +.+ + + +. : :.+ + : |
116000 |-+ + + :+ +.+ + + + |
115000 |-+ O + O O |
| O O O O O O O O O O O O |
114000 |-+ O O O O O O O |
113000 |-+ O O O O O |
| O O O O O |
112000 |-+ O O |
111000 +------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure, Open Source Technology Center, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

Thanks,
Oliver Sang


Attachments:
(No filename) (9.36 kB)
config-5.12.0-rc2-00001-ga32a4d8a815c (175.54 kB)
job-script (8.04 kB)
job.yaml (5.47 kB)
reproduce (337.00 B)

2021-05-19 20:20:48

by Nadav Amit

Subject: Re: [smp] a32a4d8a81: netperf.Throughput_tps -2.1% regression

[ +PeterZ for reference ]


> On May 19, 2021, at 7:27 AM, kernel test robot <[email protected]> wrote:
>
>
>
> Greeting,
>
> FYI, we noticed a -2.1% regression of netperf.Throughput_tps due to commit:
>
>
> commit: a32a4d8a815c4eb6dc64b8962dc13a9dfae70868 ("smp: Run functions concurrently in smp_call_function_many_cond()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: netperf
> on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> ip: ipv4
> runtime: 300s
> nr_threads: 1
> cluster: cs-localhost
> test: UDP_RR
> cpufreq_governor: performance
> ucode: 0x5003006
>
>

[snip]

> commit:
> v5.12-rc2
> a32a4d8a81 ("smp: Run functions concurrently in smp_call_function_many_cond()")
>
> v5.12-rc2 a32a4d8a815c4eb6dc64b8962dc
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 116903 -2.1% 114404 netperf.Throughput_total_tps
> 116903 -2.1% 114404 netperf.Throughput_tps
> 35066769 -2.1% 34317990 netperf.time.voluntary_context_switches
> 35071059 -2.1% 34321258 netperf.workload
> 67295 +1.5% 68333 proc-vmstat.nr_anon_pages
> 463520 -2.1% 453603 vmstat.system.cs
> 535.28 ± 6% -8.3% 490.97 ± 10% sched_debug.cfs_rq:/.util_est_enqueued.max
> 0.02 ± 8% -10.8% 0.02 ± 4% sched_debug.cpu.nr_running.avg
> 76309820 ± 4% +320.0% 3.205e+08 ±158% cpuidle.C1.time
> 23409116 ± 3% +31.0% 30676822 ± 20% cpuidle.C1.usage
> 46720133 ± 2% -12.9% 40709940 ± 2% cpuidle.POLL.usage
> 5282 ±110% +317.0% 22029 ± 58% numa-vmstat.node3.nr_anon_pages
> 11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_inactive_anon
> 11998 ± 55% +138.7% 28637 ± 45% numa-vmstat.node3.nr_zone_inactive_anon
> 8397 ±136% +588.7% 57827 ± 75% numa-meminfo.node3.AnonHugePages
> 21162 ±110% +316.7% 88189 ± 58% numa-meminfo.node3.AnonPages
> 48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive
> 48780 ± 54% +136.8% 115533 ± 45% numa-meminfo.node3.Inactive(anon)
> 467040 -2.1% 457094 perf-stat.i.context-switches
> 0.01 ±138% +0.0 0.03 ± 73% perf-stat.i.dTLB-store-miss-rate%
> 9.415e+08 -2.4% 9.188e+08 ± 2% perf-stat.i.dTLB-stores
> 0.01 ±137% +0.0 0.03 ± 73% perf-stat.overall.dTLB-store-miss-rate%
> 465472 -2.1% 455557 perf-stat.ps.context-switches
> 9.385e+08 -2.4% 9.158e+08 ± 2% perf-stat.ps.dTLB-stores
> 1.21 ± 14% +0.2 1.41 ± 5% perf-profile.calltrace.cycles-pp.__ip_append_data.ip_make_skb.udp_sendmsg.sock_sendmsg.__sys_sendto
> 2.05 ± 10% +0.3 2.33 ± 4% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 0.06 ± 7% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.__calc_delta
> 0.08 ± 19% +0.0 0.10 ± 9% perf-profile.children.cycles-pp._copy_to_user
> 0.09 ± 22% +0.0 0.12 ± 8% perf-profile.children.cycles-pp._copy_from_user
> 0.12 ± 20% +0.0 0.17 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> 0.14 ± 11% +0.1 0.19 ± 9% perf-profile.children.cycles-pp.skb_release_data
> 1.21 ± 14% +0.2 1.41 ± 5% perf-profile.children.cycles-pp.__ip_append_data
> 2.07 ± 11% +0.3 2.33 ± 4% perf-profile.children.cycles-pp.schedule_idle
> 0.06 ± 7% +0.0 0.08 ± 11% perf-profile.self.cycles-pp.__calc_delta
> 0.19 ± 8% +0.0 0.24 ± 6% perf-profile.self.cycles-pp.__softirqentry_text_start
> 0.24 ± 8% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.__skb_recv_udp
> 0.14 ± 11% +0.1 0.19 ± 9% perf-profile.self.cycles-pp.skb_release_data
> 0.02 ±142% +0.1 0.08 ± 17% perf-profile.self.cycles-pp.sock_alloc_send_pskb
> 0.11 ± 17% +0.1 0.19 ± 13% perf-profile.self.cycles-pp.__ip_append_data
> 0.12 ± 34% +0.1 0.26 ± 22% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
> 0.87 ± 13% +0.2 1.05 ± 6% perf-profile.self.cycles-pp._raw_spin_lock
> 1287 ± 42% +75.3% 2256 ± 14% interrupts.CPU111.CAL:Function_call_interrupts
> 1326 ± 43% +71.0% 2267 ± 13% interrupts.CPU119.CAL:Function_call_interrupts
> 1300 ± 45% +75.9% 2287 ± 37% interrupts.CPU120.CAL:Function_call_interrupts
> 1299 ± 45% +60.1% 2081 ± 28% interrupts.CPU128.CAL:Function_call_interrupts
> 1305 ± 45% +61.7% 2110 ± 29% interrupts.CPU131.CAL:Function_call_interrupts
> 1299 ± 45% +61.8% 2102 ± 28% interrupts.CPU139.CAL:Function_call_interrupts
> 66.67 ±133% -97.2% 1.83 ±155% interrupts.CPU14.TLB:TLB_shootdowns
> 1299 ± 45% +107.8% 2700 ± 33% interrupts.CPU142.CAL:Function_call_interrupts
> 301.83 ±128% -95.6% 13.17 ±140% interrupts.CPU149.RES:Rescheduling_interrupts
> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.NMI:Non-maskable_interrupts
> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.PMI:Performance_monitoring_interrupts
> 1299 ± 45% +60.2% 2081 ± 28% interrupts.CPU35.CAL:Function_call_interrupts
> 1244 ± 50% +66.8% 2076 ± 27% interrupts.CPU45.CAL:Function_call_interrupts
> 1300 ± 44% +59.5% 2075 ± 28% interrupts.CPU46.CAL:Function_call_interrupts
> 1.50 ± 63% +1422.2% 22.83 ±167% interrupts.CPU47.RES:Rescheduling_interrupts
> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.NMI:Non-maskable_interrupts
> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.PMI:Performance_monitoring_interrupts
> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.NMI:Non-maskable_interrupts
> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.PMI:Performance_monitoring_interrupts
> 1131 ± 27% +61.2% 1822 ± 35% interrupts.CPU85.CAL:Function_call_interrupts
> 1180 ± 31% +79.6% 2119 ± 24% interrupts.CPU86.CAL:Function_call_interrupts
>

Could it be a result of a regression that was resolved by commit
641acbf6fd6 ("smp: Micro-optimize smp_call_function_many_cond()")
or does this report mean that the performance regression also
happened on the -rc?


Attachments:
signature.asc (849.00 B)
Message signed with OpenPGP

2021-05-19 20:21:22

by Peter Zijlstra

Subject: Re: [smp] a32a4d8a81: netperf.Throughput_tps -2.1% regression

On Wed, May 19, 2021 at 06:17:35PM +0000, Nadav Amit wrote:
> > 1287 ± 42% +75.3% 2256 ± 14% interrupts.CPU111.CAL:Function_call_interrupts
> > 1326 ± 43% +71.0% 2267 ± 13% interrupts.CPU119.CAL:Function_call_interrupts
> > 1300 ± 45% +75.9% 2287 ± 37% interrupts.CPU120.CAL:Function_call_interrupts
> > 1299 ± 45% +60.1% 2081 ± 28% interrupts.CPU128.CAL:Function_call_interrupts
> > 1305 ± 45% +61.7% 2110 ± 29% interrupts.CPU131.CAL:Function_call_interrupts
> > 1299 ± 45% +61.8% 2102 ± 28% interrupts.CPU139.CAL:Function_call_interrupts
> > 66.67 ±133% -97.2% 1.83 ±155% interrupts.CPU14.TLB:TLB_shootdowns
> > 1299 ± 45% +107.8% 2700 ± 33% interrupts.CPU142.CAL:Function_call_interrupts
> > 301.83 ±128% -95.6% 13.17 ±140% interrupts.CPU149.RES:Rescheduling_interrupts
> > 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.NMI:Non-maskable_interrupts
> > 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.PMI:Performance_monitoring_interrupts
> > 1299 ± 45% +60.2% 2081 ± 28% interrupts.CPU35.CAL:Function_call_interrupts
> > 1244 ± 50% +66.8% 2076 ± 27% interrupts.CPU45.CAL:Function_call_interrupts
> > 1300 ± 44% +59.5% 2075 ± 28% interrupts.CPU46.CAL:Function_call_interrupts
> > 1.50 ± 63% +1422.2% 22.83 ±167% interrupts.CPU47.RES:Rescheduling_interrupts
> > 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.NMI:Non-maskable_interrupts
> > 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.PMI:Performance_monitoring_interrupts
> > 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.NMI:Non-maskable_interrupts
> > 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.PMI:Performance_monitoring_interrupts
> > 1131 ± 27% +61.2% 1822 ± 35% interrupts.CPU85.CAL:Function_call_interrupts
> > 1180 ± 31% +79.6% 2119 ± 24% interrupts.CPU86.CAL:Function_call_interrupts
> >

It looks to be sending *waay* more call IPIs, did we mess up the mask or
lose an optimization somewhere?

I'll go read the commit again...

2021-05-19 20:24:45

by Nadav Amit

Subject: Re: [smp] a32a4d8a81: netperf.Throughput_tps -2.1% regression



> On May 19, 2021, at 11:38 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 06:17:35PM +0000, Nadav Amit wrote:
>>> 1287 ± 42% +75.3% 2256 ± 14% interrupts.CPU111.CAL:Function_call_interrupts
>>> 1326 ± 43% +71.0% 2267 ± 13% interrupts.CPU119.CAL:Function_call_interrupts
>>> 1300 ± 45% +75.9% 2287 ± 37% interrupts.CPU120.CAL:Function_call_interrupts
>>> 1299 ± 45% +60.1% 2081 ± 28% interrupts.CPU128.CAL:Function_call_interrupts
>>> 1305 ± 45% +61.7% 2110 ± 29% interrupts.CPU131.CAL:Function_call_interrupts
>>> 1299 ± 45% +61.8% 2102 ± 28% interrupts.CPU139.CAL:Function_call_interrupts
>>> 66.67 ±133% -97.2% 1.83 ±155% interrupts.CPU14.TLB:TLB_shootdowns
>>> 1299 ± 45% +107.8% 2700 ± 33% interrupts.CPU142.CAL:Function_call_interrupts
>>> 301.83 ±128% -95.6% 13.17 ±140% interrupts.CPU149.RES:Rescheduling_interrupts
>>> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.NMI:Non-maskable_interrupts
>>> 389.17 ± 89% -73.5% 103.17 ± 35% interrupts.CPU164.PMI:Performance_monitoring_interrupts
>>> 1299 ± 45% +60.2% 2081 ± 28% interrupts.CPU35.CAL:Function_call_interrupts
>>> 1244 ± 50% +66.8% 2076 ± 27% interrupts.CPU45.CAL:Function_call_interrupts
>>> 1300 ± 44% +59.5% 2075 ± 28% interrupts.CPU46.CAL:Function_call_interrupts
>>> 1.50 ± 63% +1422.2% 22.83 ±167% interrupts.CPU47.RES:Rescheduling_interrupts
>>> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.NMI:Non-maskable_interrupts
>>> 467.33 ± 85% -64.6% 165.67 ± 74% interrupts.CPU58.PMI:Performance_monitoring_interrupts
>>> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.NMI:Non-maskable_interrupts
>>> 306.67 ± 75% -59.9% 122.83 ± 16% interrupts.CPU68.PMI:Performance_monitoring_interrupts
>>> 1131 ± 27% +61.2% 1822 ± 35% interrupts.CPU85.CAL:Function_call_interrupts
>>> 1180 ± 31% +79.6% 2119 ± 24% interrupts.CPU86.CAL:Function_call_interrupts
>>>
>
> It looks to be sending *waay* more call IPIs, did we mess up the mask or
> loose an optimization somewhere?
>
> I'll go read the commit again…

As you know, I did mess up by calling arch_send_call_function_single_ipi()
instead of smp_call_function_single(), which could explain the extra IPIs.
But that was resolved by your subsequent patch.

For me, what stands out is the time in C1 spent after the patch.

I will try to reproduce the issue to figure it out, since so far I could
not find an error in the code.


Attachments:
signature.asc (849.00 B)
Message signed with OpenPGP