2021-04-27 08:46:36

by kernel test robot

Subject: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression


Greeting,

FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to commit:


commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

nr_task: 100%
mode: thread
test: getppid1
cpufreq_governor: performance
ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
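
For reference, the getppid1 testcase body boils down to a tight getppid()
loop per task; a minimal standalone approximation is sketched below
(simplified illustration only, see the test-url above for the real harness):

/* Rough single-task approximation of will-it-scale's getppid1 loop:
 * call getppid() repeatedly and count completed operations. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        unsigned long long iterations = 0;

        for (int i = 0; i < 10 * 1000 * 1000; i++) {
                getppid();      /* the only kernel work per operation */
                iterations++;
        }
        printf("getppid() calls completed: %llu\n", iterations);
        return 0;
}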



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/getppid1/will-it-scale/0x5003006

commit:
v5.12-rc2
cbe16f35be ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")

v5.12-rc2 cbe16f35bee6880becca6f20d2e
---------------- ---------------------------
%stddev %change %stddev
\ | \
7.408e+08 -5.2% 7.021e+08 will-it-scale.88.threads
8417726 -5.2% 7978644 will-it-scale.per_thread_ops
7.408e+08 -5.2% 7.021e+08 will-it-scale.workload
3.851e+10 -5.2% 3.65e+10 perf-stat.i.branch-instructions
1.839e+08 -4.2% 1.763e+08 perf-stat.i.branch-misses
1.39 +5.3% 1.46 perf-stat.i.cpi
5.988e+10 -5.2% 5.674e+10 perf-stat.i.dTLB-loads
4.139e+10 -5.2% 3.922e+10 perf-stat.i.dTLB-stores
2.239e+08 ± 3% -14.6% 1.913e+08 ± 4% perf-stat.i.iTLB-load-misses
1.741e+11 -5.2% 1.65e+11 perf-stat.i.instructions
794.20 ± 2% +10.7% 879.14 ± 4% perf-stat.i.instructions-per-iTLB-miss
0.72 -5.0% 0.68 perf-stat.i.ipc
1588 -5.2% 1505 perf-stat.i.metric.M/sec
1.39 +5.4% 1.47 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
778.31 ± 3% +11.1% 864.35 ± 4% perf-stat.overall.instructions-per-iTLB-miss
0.72 -5.1% 0.68 perf-stat.overall.ipc
3.838e+10 -5.2% 3.638e+10 perf-stat.ps.branch-instructions
1.833e+08 -4.2% 1.757e+08 perf-stat.ps.branch-misses
5.968e+10 -5.2% 5.655e+10 perf-stat.ps.dTLB-loads
4.125e+10 -5.2% 3.909e+10 perf-stat.ps.dTLB-stores
2.231e+08 ± 3% -14.6% 1.907e+08 ± 4% perf-stat.ps.iTLB-load-misses
1.735e+11 -5.2% 1.645e+11 perf-stat.ps.instructions
5.243e+13 -5.2% 4.971e+13 perf-stat.total.instructions
43.20 -2.3 40.87 perf-profile.calltrace.cycles-pp.__entry_text_start.getppid
9.09 -0.4 8.65 ± 2% perf-profile.calltrace.cycles-pp.testcase
8.01 -0.3 7.66 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.getppid
2.93 -0.2 2.73 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid
3.02 -0.2 2.83 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.getppid
2.52 ± 3% +0.7 3.23 ± 9% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.getppid
18.24 +1.1 19.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid
13.33 +1.1 14.46 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_getppid.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid
1.94 ± 2% +1.7 3.62 ± 3% perf-profile.calltrace.cycles-pp.rcu_nocb_flush_deferred_wakeup.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.getppid
7.47 ± 2% +1.8 9.25 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.getppid
11.51 ± 2% +2.3 13.81 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.getppid
39.05 +3.1 42.17 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getppid
27.84 -1.5 26.36 perf-profile.children.cycles-pp.__entry_text_start
23.40 -1.2 22.22 perf-profile.children.cycles-pp.syscall_return_via_sysret
2.97 -0.2 2.76 perf-profile.children.cycles-pp.syscall_enter_from_user_mode
3.23 -0.2 3.03 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.41 ± 2% -0.0 0.37 ± 2% perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
2.56 ± 3% +0.7 3.25 ± 7% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
13.88 +1.2 15.05 ± 2% perf-profile.children.cycles-pp.__x64_sys_getppid
2.02 ± 2% +1.6 3.67 ± 3% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
7.86 +1.7 9.60 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
12.71 +2.3 15.04 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
39.65 +3.1 42.77 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
23.19 -1.2 22.01 perf-profile.self.cycles-pp.syscall_return_via_sysret
19.57 -1.1 18.50 perf-profile.self.cycles-pp.getppid
12.47 -0.6 11.83 perf-profile.self.cycles-pp.__entry_text_start
8.60 -0.2 8.36 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.22 -0.2 3.02 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
2.52 -0.2 2.35 perf-profile.self.cycles-pp.syscall_enter_from_user_mode
3.58 +0.4 3.95 perf-profile.self.cycles-pp.__x64_sys_getppid
2.12 ± 3% +0.6 2.74 ± 8% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
1.72 ± 2% +1.7 3.41 ± 3% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup



will-it-scale.per_thread_ops

8.8e+06 +-----------------------------------------------------------------+
8.7e+06 |-+ .+. |
| .+ + |
8.6e+06 |.+.+.+.+ + .+. .+.+. .+.+. .+.+.+.+. |
8.5e+06 |-+ ++ + + + +. |
| +.+.++.+.+.+.+.+.+.+.|
8.4e+06 |-+ |
8.3e+06 |-O O O O O O OO |
8.2e+06 |-+ O O |
| |
8.1e+06 |-+ |
8e+06 |-+ O O O O O O O O O O O O O O |
| O O O O |
7.9e+06 |-+ O O O |
7.8e+06 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.12.0-rc2-00001-gcbe16f35bee6 (175.55 kB)
job-script (7.75 kB)
job.yaml (5.37 kB)
reproduce (349.00 B)
2021-04-27 09:20

by Song Bao Hua (Barry Song)

Subject: RE: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression



> -----Original Message-----
> From: kernel test robot [mailto:[email protected]]
> Sent: Tuesday, April 27, 2021 9:00 PM
> To: Song Bao Hua (Barry Song) <[email protected]>
> Cc: Ingo Molnar <[email protected]>; Thomas Gleixner <[email protected]>; LKML
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression
>
>
> Greeting,
>
> FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to
> commit:
>
>
> commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN
> for request_irq/nmi()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>

Might be relevant. Can't figure out the relation between getppid and
request_irq().

Thanks
Barry


2021-04-27 11:39:46

by Thomas Gleixner

Subject: RE: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

On Tue, Apr 27 2021 at 09:20, Song Bao Hua wrote:
>> FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to
>> commit:
>>
>>
>> commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN
>> for request_irq/nmi()")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>
> Might be relevant. Can't figure out the relation between getppid and
> request_irq().

Me neither ...

2021-04-27 11:43:07

by Thomas Gleixner

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

Folks,

On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:

> Greeting,
>
> FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to commit:
>
> commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

this is the second report in the last week which makes not a lot of sense.
And this one makes absolutely no sense at all.

This commit affects request_irq() and the related variants and has
exactly ZERO influence on anything related to that test case simply
because.
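
All the flag does is let a driver request an interrupt line that stays
disabled until the driver enables it explicitly, instead of the old racy
request_irq() + disable_irq() sequence. A minimal sketch of that driver-side
pattern (the IRQ number, device name and handler are illustrative only, not
taken from any driver in this thread):

#include <linux/interrupt.h>

static irqreturn_t demo_handler(int irq, void *dev_id)
{
        return IRQ_HANDLED;
}

static int demo_setup_irq(unsigned int irq, void *dev)
{
        int ret;

        /* With IRQF_NO_AUTOEN the line is NOT enabled by request_irq(). */
        ret = request_irq(irq, demo_handler, IRQF_NO_AUTOEN, "demo-dev", dev);
        if (ret)
                return ret;

        /* ... finish device initialisation ... */

        enable_irq(irq);        /* enable explicitly once ready */
        return 0;
}

None of this is anywhere near the getppid() syscall path.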

I seriously have to ask the question whether this test infrastructure is
actually measuring what it claims to measure.

As this commit clearly _cannot_ have the 'measured' side effect, this
points to some serious issue in the tests or the test infrastructure
itself.

Thanks,

tglx

2021-04-27 19:38:37

by Thomas Gleixner

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

On Tue, Apr 27 2021 at 13:42, Thomas Gleixner wrote:
> On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:
>> FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to commit:
>>
>> commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> this is the second report in the last week which makes not a lot of sense.
> And this one makes absolutely no sense at all.
>
> This commit affects request_irq() and the related variants and has
> exactly ZERO influence on anything related to that test case simply
> because.
>
> I seriously have to ask the question whether this test infrastructure is
> actually measuring what it claims to measure.
>
> As this commit clearly _cannot_ have the 'measured' side effect, this
> points to some serious issue in the tests or the test infrastructure
> itself.

Just to illustrate the issue:

I ran the will-it-scale getppid1 test manually against plain v5.12 and
against v5.12 + cherrypicked cbe16f35be, i.e. the "offending" commit.

The result for a full run is just in the noise:

average: < 0.1%
minimum: -0.22%
maximum: 0.29%

IOW very far away from -5.2%.

That's an order of magnitude off.

And no, I'm not going to run that lkp-test muck simply because it's
unusable and the test result of will-it-scale itself is clear enough.

Thanks,

tglx

2021-04-28 05:13:57

by Feng Tang

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

Hi Thomas,

On Tue, Apr 27, 2021 at 01:42:12PM +0200, Thomas Gleixner wrote:
> Folks,
>
> On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:
>
> > Greeting,
> >
> > FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to commit:
> >
> > commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> this is the second report in the last week which makes not a lot of sense.
> And this one makes absolutely no sense at all.
>
> This commit affects request_irq() and the related variants and has
> exactly ZERO influence on anything related to that test case simply
> because.
>
> I seriously have to ask the question whether this test infrastructure is
> actually measuring what it claims to measure.
>
> As this commit clearly _cannot_ have the 'measured' side effect, this
> points to some serious issue in the tests or the test infrastructure
> itself.

0day has reported about 20 similar cases where the bisected commit has
nothing to do with the benchmark, and we were very confused by them too
back then. Our debugging showed that many of those commits changed the
code alignment of kernel data or text in other modules that are relevant
to the benchmark, though some cases are still not well explained. Below
are links to some of the explained cases.

https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

And to debug code-alignment cases, a debug patch that forces all function
addresses to be aligned to 32 bytes was merged in v5.9:

09c60546f04f ./Makefile: add debug option to enable function aligned on 32 bytes


For this particular case, the commit changes the code size of
request_threaded_irq(), so the alignment of many functions placed after
it changes as well.

So I extended the debug patch to force 64-byte alignment, and with that
this commit causes _no_ performance change for the same test case on the
same platform.

diff --git a/Makefile b/Makefile

ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
-KBUILD_CFLAGS += -falign-functions=32
+KBUILD_CFLAGS += -falign-functions=64
endif

Though I don't know in detail how exactly this code alignment affects
the case.
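
To make the alignment effect easier to see outside the kernel, here is a
minimal userspace sketch (an illustration only, not kernel code): built with
and without -falign-functions=64, it shows whether function start addresses
land on 64-byte boundaries, which is the same knob the debug patch turns for
the whole kernel.

/*
 * Build twice and compare the printed offsets:
 *   gcc -O2                      align_demo.c -o demo && ./demo
 *   gcc -O2 -falign-functions=64 align_demo.c -o demo && ./demo
 * Without the flag the compiler uses its default (smaller) alignment, so
 * the offsets vary; with it, every function starts on a 64-byte boundary.
 */
#include <stdio.h>
#include <stdint.h>

int changed_function(int x)   { return x * 3 + 1; } /* stand-in for the function whose size changed */
int following_function(int x) { return x - 7; }     /* stand-in for whatever is placed after it */

static void report(const char *name, void *addr)
{
        printf("%-20s %p  (offset in 64-byte line: %2lu)\n",
               name, addr, (unsigned long)((uintptr_t)addr % 64));
}

int main(void)
{
        report("changed_function", (void *)changed_function);
        report("following_function", (void *)following_function);
        return 0;
}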

Thanks,
Feng

> Thanks,
>
> tglx

2021-04-28 07:01:35

by Song Bao Hua (Barry Song)

Subject: RE: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression



> -----Original Message-----
> From: Feng Tang [mailto:[email protected]]
> Sent: Wednesday, April 28, 2021 5:08 PM
> To: Thomas Gleixner <[email protected]>
> Cc: kernel test robot <[email protected]>; Song Bao Hua (Barry Song)
> <[email protected]>; Ingo Molnar <[email protected]>; LKML
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2%
> regression
>
> Hi Thomas,
>
> On Tue, Apr 27, 2021 at 01:42:12PM +0200, Thomas Gleixner wrote:
> > Folks,
> >
> > On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:
> >
> > > Greeting,
> > >
> > > FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to
> commit:
> > >
> > > commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add
> > > IRQF_NO_AUTOEN for request_irq/nmi()")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> > > master
> >
> > this is the second report in the last week which makes not a lot of sense.
> > And this one makes absolutely no sense at all.
> >
> > This commit affects request_irq() and the related variants and has
> > exactly ZERO influence on anything related to that test case simply
> > because.
> >
> > I seriously have to ask the question whether this test infrastructure
> > is actually measuring what it claims to measure.
> >
> > As this commit clearly _cannot_ have the 'measured' side effect, this
> > points to some serious issue in the tests or the test infrastructure
> > itself.
>
> 0day has reported about 20 similar cases that the bisected commit has nothing
> to do with the benchmark case, and we were very confused too back then. And
> our debug showed many of them changed the code alignment of kernel data or text
> of other modules which is relevant with the benchmark, though some cases are
> not well explained yet. Following are links of some explained cases.
>
> https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
> https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
> https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
>
> And to debug code alignment case, one debug patch to force all function address
> aligned to 32 bytes was merged in v5.9
>
> 09c60546f04f ./Makefile: add debug option to enable function aligned on 32 bytes
>
>
> For this particular case, the commit changes the code size of
> request_threaded_irq(), and many following functions' alignment are changed.
>

If so, the performance impact of the code change would be random.

> So I extended the debug patch to force 64 bytes aligned, then this commit will
> cause _no_ performance change for the same test case on same platform.
>
> diff --git a/Makefile b/Makefile
>
> ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
> -KBUILD_CFLAGS += -falign-functions=32
> +KBUILD_CFLAGS += -falign-functions=64
> endif
>
> Though I don't know the detail of how exactly this code alignment affects the
> case.

Guess it is related to the icache.
But it is still an irrelevant problem.

>
> Thanks,
> Feng
>
> > Thanks,
> >
> > tglx

Thanks
Barry

2021-04-28 08:09:13

by Feng Tang

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

Hi Barry,

On Wed, Apr 28, 2021 at 07:01:35AM +0000, Song Bao Hua (Barry Song) wrote:
>
>
> > -----Original Message-----
> > From: Feng Tang [mailto:[email protected]]
> > Sent: Wednesday, April 28, 2021 5:08 PM
> > To: Thomas Gleixner <[email protected]>
> > Cc: kernel test robot <[email protected]>; Song Bao Hua (Barry Song)
> > <[email protected]>; Ingo Molnar <[email protected]>; LKML
> > <[email protected]>; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2%
> > regression
> >
> > Hi Thomas,
> >
> > On Tue, Apr 27, 2021 at 01:42:12PM +0200, Thomas Gleixner wrote:
> > > Folks,
> > >
> > > On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:
> > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to
> > commit:
> > > >
> > > > commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add
> > > > IRQF_NO_AUTOEN for request_irq/nmi()")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> > > > master
> > >
> > > this is the second report in the last week which makes not a lot of sense.
> > > And this one makes absolutely no sense at all.
> > >
> > > This commit affects request_irq() and the related variants and has
> > > exactly ZERO influence on anything related to that test case simply
> > > because.
> > >
> > > I seriously have to ask the question whether this test infrastructure
> > > is actually measuring what it claims to measure.
> > >
> > > As this commit clearly _cannot_ have the 'measured' side effect, this
> > > points to some serious issue in the tests or the test infrastructure
> > > itself.
> >
> > 0day has reported about 20 similar cases that the bisected commit has nothing
> > to do with the benchmark case, and we were very confused too back then. And
> > our debug showed many of them changed the code alignment of kernel data or text
> > of other modules which is relevant with the benchmark, though some cases are
> > not well explained yet. Following are links of some explained cases.
> >
> > https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/
> > https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
> > https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
> >
> > And to debug code alignment case, one debug patch to force all function address
> > aligned to 32 bytes was merged in v5.9
> >
> > 09c60546f04f ./Makefile: add debug option to enable function aligned on 32 bytes
> >
> >
> > For this particular case, the commit changes the code size of
> > request_threaded_irq(), and many following functions' alignment are changed.
> >
>
> If so, the performance impact of code change would be random.

Right, I heard the 0day team has enabled force_func_align_32B for some
kernel builds to filter out such cases.

> > So I extended the debug patch to force 64 bytes aligned, then this commit will
> > cause _no_ performance change for the same test case on same platform.
> >
> > diff --git a/Makefile b/Makefile
> >
> > ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
> > -KBUILD_CFLAGS += -falign-functions=32
> > +KBUILD_CFLAGS += -falign-functions=64
> > endif
> >
> > Though I don't know the detail of how exactly this code alignment affects the
> > case.
>
> Guess it is related with icache.

Possibly, and sometimes the iTLB as well.

> But it is still an irrelevant problem.
Yes, the commit itself has no problem. And my personal thought
is no further action is needed.

Thanks,
Feng

> >
> > Thanks,
> > Feng
> >
> > > Thanks,
> > >
> > > tglx
>
> Thanks
> Barry

2021-04-28 08:39:10

by Feng Tang

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

On Tue, Apr 27, 2021 at 09:37:11PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 27 2021 at 13:42, Thomas Gleixner wrote:
> > On Tue, Apr 27 2021 at 17:00, kernel test robot wrote:
> >> FYI, we noticed a -5.2% regression of will-it-scale.per_thread_ops due to commit:
> >>
> >> commit: cbe16f35bee6880becca6f20d2ebf6b457148552 ("genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()")
> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > this is the second report in the last week which makes not a lot of sense.
> > And this one makes absolutely no sense at all.
> >
> > This commit affects request_irq() and the related variants and has
> > exactly ZERO influence on anything related to that test case simply
> > because.
> >
> > I seriously have to ask the question whether this test infrastructure is
> > actually measuring what it claims to measure.
> >
> > As this commit clearly _cannot_ have the 'measured' side effect, this
> > points to some serious issue in the tests or the test infrastructure
> > itself.
>
> Just to illustrate the issue:
>
> I ran the will-it-scale getppid1 test manually against plain v5.12 and
> against v5.12 + cherrypicked cbe16f35be, i.e. the "offending" commit.
>
> The result for a full run is just in the noise:
>
> average: < 0.1%
> minimum: -0.22%
> maximum: 0.29%
>
> IOW very far away from -5.2%.
>
> That's an order of magnitude off.

The test in the original report was done on a 2S/44C/88T Cascade Lake box.
I just ran the same case on a Skylake server and a Coffee Lake desktop,
and the commit caused no performance change, just like your result.

And in our experience, it is common that a kernel performance change can
only be reproduced on one or a few types of platforms.

Thanks,
Feng

> And no, I'm not going to run that lkp-test muck simply because it's
> unusable and the test result of will-it-scale itself is clear enough.
>
> Thanks,
>
> tglx

2021-04-28 08:57:41

by Thomas Gleixner

Subject: Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

On Wed, Apr 28 2021 at 16:08, Feng Tang wrote:
> On Wed, Apr 28, 2021 at 07:01:35AM +0000, Song Bao Hua (Barry Song) wrote:
>
>> But it is still an irrelevant problem.
> Yes, the commit itself has no problem. And my personal thought
> is no further action is needed.

The commit does not need any further action, but this testing stuff
really needs further action because we can't differentiate between real
regressions and these bogus reports anymore.

Thanks,

tglx

2021-04-28 17:37:17

by Philip Li

Subject: Re: [LKP] Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

On Wed, Apr 28, 2021 at 10:56:16AM +0200, Thomas Gleixner wrote:
> On Wed, Apr 28 2021 at 16:08, Feng Tang wrote:
> > On Wed, Apr 28, 2021 at 07:01:35AM +0000, Song Bao Hua (Barry Song) wrote:
> >
> >> But it is still an irrelevant problem.
> > Yes, the commit itself has no problem. And my personal thought
> > is no further action is needed.
>
> The commit does not need any further action, but this testing stuff

Sorry Thomas for the confusion and trouble caused by this. We will take
this seriously and refine the reporting process for this category of
performance change (alignment or cache behavior) to avoid meaningless
reports.

Thanks

> really needs further action because we can't differentiate between real
> regressions and these bogus reports anymore.
>
> Thanks,
>
> tglx
> _______________________________________________
> LKP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

2021-04-28 20:51:13

by Thomas Gleixner

Subject: Re: [LKP] Re: [genirq] cbe16f35be: will-it-scale.per_thread_ops -5.2% regression

Philip,

On Wed, Apr 28 2021 at 23:23, Philip Li wrote:
> On Wed, Apr 28, 2021 at 10:56:16AM +0200, Thomas Gleixner wrote:
>> On Wed, Apr 28 2021 at 16:08, Feng Tang wrote:
>> > On Wed, Apr 28, 2021 at 07:01:35AM +0000, Song Bao Hua (Barry Song) wrote:
>> >
>> >> But it is still an irrelevant problem.
>> > Yes, the commit itself has no problem. And my personal thought
>> > is no further action is needed.
>>
>> The commit does not need any further action, but this testing stuff
> Sorry Thomas for confusion and trouble caused by this. We will take it
> seriously to refine the report process for this category (alignment or
> cache behavior) of performance change, to avoid meaningless ones.

Things go wrong every now and then. As long as we figure it out and
stuff gets fixed, no problem.

Thanks,

tglx