LinuxLists.cc - Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

2023-06-08 23:11:09

Subject: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi all,

I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:

Commit Data:
commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
subject : sched/fair: Sanitize vruntime of entity being migrated
author : [email protected]
author date : 2023-03-17 16:08:10

We have observed this on our v5.4 and v4.14 kernels. We have not yet tested v5.15, but I expect the same result.

ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%
ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%
ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%

Link to unixbench:
github.com/kdlucas/byte-unixbench

Info about benchmark:
"The shells scripts test measures the number of times per minute a
process can start and reap a set of one, two, four and eight concurrent
copies of a shell scripts where the shell script applies a series of
transformation to a data file"

I have also evaluated performance before and after both of these commits (one fixes the other), and I still observe the same regression (C1 is still the source of the regression).
C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed

Thank you very much,
Saeed

2023-06-09 17:25:54

by Vincent Guittot

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Saeed,

On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
<[email protected]> wrote:
>
> Hi all,
>
> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>
> Commit Data:
> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
> subject : sched/fair: Sanitize vruntime of entity being migrated
> author : [email protected]
> author date : 2023-03-17 16:08:10
>
>
> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.

It would be good to confirm that the regression is present on v6.3,
where the patch was originally merged. There may be a hidden
dependency on other patches introduced since v5.4.

>
> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
>
> Link to unixbench:
> github.com/kdlucas/byte-unixbench

I tried to reproduce the problem with v6.3 on my system, but I don't
see any difference with or without the patch.

Do you have more details on your setup? Number of CPUs and topology?

>
> Info about benchmark:
> "The shells scripts test measures the number of times per minute a
> process can start and reap a set of one, two, four and eight concurrent
> copies of a shell scripts where the shell script applies a series of
> transformation to a data file”
>
> I have also evaluated performance before and after both of these two commits (one if fixing the other) but I still observe the same regression (C1 is still the source of regression).
> C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
> C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed

C2 introduced some regressions because newly migrated tasks were not
handled correctly, and C1 fixes that problem. Apart from that, both
commits affect systems that run for days with low-priority tasks.

Thanks,
Vincent

>
> Thank you very much,
> Saeed
>

2023-06-13 20:11:54

by Saeed Mirzamohammadi

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Vincent,

> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
>
> Hi Saeed,
>
> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>
>> Commit Data:
>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
>> subject : sched/fair: Sanitize vruntime of entity being migrated
>> author : [email protected]
>> author date : 2023-03-17 16:08:10
>>
>>
>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
>
> It would be good to confirm that the regression is present on v6.3
> where the patch has been merged originally. It can be that there is
> hidden dependency with other patches introduced since v5.4

Regression is present on v6.3 as well, examples:
ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>
>
>>
>> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
>> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
>> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
>> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
>> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
>> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
>> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
>> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
>> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
>> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
>> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
>> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
>>
>> Link to unixbench:
>> github.com/kdlucas/byte-unixbench
>
> I tried to reproduce the problem with v6.3 on my system but I don't
> see any difference with or without the patch
>
> Do you have more details on your setup ? number of cpu and topology ?
>
model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz

Topology (NUMA node distances):
node   0   1
  0:  10  21
  1:  21  10

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
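
As a quick cross-check of the figures above, the logical CPU count is the product of sockets, cores per socket and threads per core. The following is only a small sketch: the struct and function names are hypothetical helpers introduced for illustration and are not part of UnixBench or the kernel.

```c
#include <assert.h>

/* Hypothetical helper type: the three factors reported by lscpu above. */
struct cpu_topology {
	unsigned int sockets;
	unsigned int cores_per_socket;
	unsigned int threads_per_core;
};

/* Logical CPUs = sockets x cores per socket x threads per core. */
static unsigned int logical_cpus(const struct cpu_topology *t)
{
	return t->sockets * t->cores_per_socket * t->threads_per_core;
}

/* Number of benchmark copies per logical CPU (integer part). */
static unsigned int copies_per_cpu(unsigned int copies, unsigned int cpus)
{
	assert(cpus != 0);
	return copies / cpus;
}
```

With 2 sockets, 14 cores per socket and 2 threads per core this gives 56 logical CPUs, so the 224-copy and 448-copy runs correspond to 4 and 8 copies per CPU.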

Thanks,

>>
>> Info about benchmark:
>> "The shells scripts test measures the number of times per minute a
>> process can start and reap a set of one, two, four and eight concurrent
>> copies of a shell scripts where the shell script applies a series of
>> transformation to a data file”
>>
>> I have also evaluated performance before and after both of these two commits (one if fixing the other) but I still observe the same regression (C1 is still the source of regression).
>> C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
>> C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
>
> C2 has introduced some regressions because of the case of newly
> migrated tasks that were not correctly managed and C1 fixes this
> problem. Then, both have an impact on system that runs for days with
> low prio task
>
> Thanks,
> Vincent
>
>
>>
>> Thank you very much,
>> Saeed

2023-06-14 06:40:50

by Chen Yu

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> Hi Vincent,
>
> > On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
> >
> > Hi Saeed,
> >
> > On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> > <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >>
> >> Commit Data:
> >> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >> subject : sched/fair: Sanitize vruntime of entity being migrated
> >> author : [email protected]
> >> author date : 2023-03-17 16:08:10
> >>
> >>
> >> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
> >
> > It would be good to confirm that the regression is present on v6.3
> > where the patch has been merged originally. It can be that there is
> > hidden dependency with other patches introduced since v5.4
>
> Regression is present on v6.3 as well, examples:
> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> >
> >
> >>
> >> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
> >> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
> >> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
> >> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
> >> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
> >> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
> >> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
> >> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
> >> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
> >> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
> >> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
> >> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
> >>
> >> Link to unixbench:
> >> github.com/kdlucas/byte-unixbench
> >
> > I tried to reproduce the problem with v6.3 on my system but I don't
> > see any difference with or without the patch
> >
> > Do you have more details on your setup ? number of cpu and topology ?
> >
> model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>
> Topology:
> node 0 1
> 0: 10 21
> 1: 21 10
>
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> CPU(s): 56
> On-line CPU(s) list: 0-55
> Thread(s) per core: 2
> Core(s) per socket: 14
> Socket(s): 2
> NUMA node(s): 2
>
I tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
24 cores/48 CPUs in total, but I could not reproduce the issue.
Since the regression was reported mainly for the 224 and 448 copies cases
on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.

a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
---------------- ---------------------------
   %stddev     %change     %stddev
         \           |           \
     21304       +0.5%       21420  unixbench.score
    632.43       +0.0%      632.44  unixbench.time.elapsed_time
    632.43       +0.0%      632.44  unixbench.time.elapsed_time.max
  11837046       -4.7%    11277727  unixbench.time.involuntary_context_switches
    864713       +0.1%      865914  unixbench.time.major_page_faults
      9600       +4.0%        9984  unixbench.time.maximum_resident_set_size
 8.433e+08       +0.6%    8.48e+08  unixbench.time.minor_page_faults
      4096       +0.0%        4096  unixbench.time.page_size
      3741       +1.1%        3783  unixbench.time.percent_of_cpu_this_job_got
     18341       +1.3%       18572  unixbench.time.system_time
      5323       +0.6%        5353  unixbench.time.user_time
  78197044       -3.1%    75791701  unixbench.time.voluntary_context_switches
  57178573       +0.4%    57399061  unixbench.workload

There is not much difference with or without a53ce18cacb477dd applied.

a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
---------------- ---------------------------
   %stddev     %change     %stddev
         \           |           \
     19985       +8.6%       21697  unixbench.score
    632.64       -0.0%      632.53  unixbench.time.elapsed_time
    632.64       -0.0%      632.53  unixbench.time.elapsed_time.max
  11453985       +3.7%    11880259  unixbench.time.involuntary_context_switches
    818996       +3.1%      844681  unixbench.time.major_page_faults
      9600       +0.0%        9600  unixbench.time.maximum_resident_set_size
 7.911e+08       +8.4%   8.575e+08  unixbench.time.minor_page_faults
      4096       +0.0%        4096  unixbench.time.page_size
      3767       -0.4%        3752  unixbench.time.percent_of_cpu_this_job_got
     18873       -2.4%       18423  unixbench.time.system_time
      4960       +7.1%        5313  unixbench.time.user_time
  75436000      +10.8%    83581483  unixbench.time.voluntary_context_switches
  53553404       +8.7%    58235303  unixbench.workload

Introducing 829c1651e9c4a6f previously gave an 8.6% improvement, and that improvement
remains with a53ce18cacb477dd applied.
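
The %change column in the two comparisons can be reproduced from the score columns with the usual relative-change formula. The sketch below is illustrative only; pct_change is a hypothetical helper name, not something taken from the benchmark tooling.

```c
#include <assert.h>

/*
 * Relative change of `patched` versus `base`, in percent.
 * A positive result means the patched kernel scored higher.
 */
static double pct_change(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}
```

For the unixbench.score rows, pct_change(21304, 21420) is roughly 0.5 and pct_change(19985, 21697) is roughly 8.6, matching the reported +0.5% and +8.6%.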

Can you send the full test script so I can try it locally?

thanks,
Chenyu

2023-06-21 17:16:15

by Saeed Mirzamohammadi

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Chen, Vincent,

> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
>
> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>> Hi Vincent,
>>
>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
>>>
>>> Hi Saeed,
>>>
>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>> <[email protected]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>
>>>> Commit Data:
>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
>>>> author : [email protected]
>>>> author date : 2023-03-17 16:08:10
>>>>
>>>>
>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
>>>
>>> It would be good to confirm that the regression is present on v6.3
>>> where the patch has been merged originally. It can be that there is
>>> hidden dependency with other patches introduced since v5.4
>>
>> Regression is present on v6.3 as well, examples:
>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%

Apologies for the confusion; I need to correct the v6.3 upstream result above. v6.3 does not show any regression.
v6.3.y -> no regression
v5.15.y -> no regression
v5.4.y -> 5-8% regression.

>>>
>>>
>>>>
>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
>>>>
>>>> Link to unixbench:
>>>> github.com/kdlucas/byte-unixbench
>>>
>>> I tried to reproduce the problem with v6.3 on my system but I don't
>>> see any difference with or without the patch
>>>
>>> Do you have more details on your setup ? number of cpu and topology ?
>>>
>> model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>
>> Topology:
>> node 0 1
>> 0: 10 21
>> 1: 21 10
>>
>> Architecture: x86_64
>> CPU op-mode(s): 32-bit, 64-bit
>> CPU(s): 56
>> On-line CPU(s) list: 0-55
>> Thread(s) per core: 2
>> Core(s) per socket: 14
>> Socket(s): 2
>> NUMA node(s): 2
>>
> Tested on a similar platform E5-2697 v2 @ 2.70GHz which has 2 nodes,
> 24 cores/48 CPUs in total, however I could not reproduce the issue.
> Since the regression was reported mainly against 224 and 448 copies case
> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
>
>
> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 21304 +0.5% 21420 unixbench.score
> 632.43 +0.0% 632.44 unixbench.time.elapsed_time
> 632.43 +0.0% 632.44 unixbench.time.elapsed_time.max
> 11837046 -4.7% 11277727 unixbench.time.involuntary_context_switches
> 864713 +0.1% 865914 unixbench.time.major_page_faults
> 9600 +4.0% 9984 unixbench.time.maximum_resident_set_size
> 8.433e+08 +0.6% 8.48e+08 unixbench.time.minor_page_faults
> 4096 +0.0% 4096 unixbench.time.page_size
> 3741 +1.1% 3783 unixbench.time.percent_of_cpu_this_job_got
> 18341 +1.3% 18572 unixbench.time.system_time
> 5323 +0.6% 5353 unixbench.time.user_time
> 78197044 -3.1% 75791701 unixbench.time.voluntary_context_switches
> 57178573 +0.4% 57399061 unixbench.workload
>
> There is no much difference with a53ce18cacb477dd applied or not.
>
>
>
>
>
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 19985 +8.6% 21697 unixbench.score
> 632.64 -0.0% 632.53 unixbench.time.elapsed_time
> 632.64 -0.0% 632.53 unixbench.time.elapsed_time.max
> 11453985 +3.7% 11880259 unixbench.time.involuntary_context_switches
> 818996 +3.1% 844681 unixbench.time.major_page_faults
> 9600 +0.0% 9600 unixbench.time.maximum_resident_set_size
> 7.911e+08 +8.4% 8.575e+08 unixbench.time.minor_page_faults
> 4096 +0.0% 4096 unixbench.time.page_size
> 3767 -0.4% 3752 unixbench.time.percent_of_cpu_this_job_got
> 18873 -2.4% 18423 unixbench.time.system_time
> 4960 +7.1% 5313 unixbench.time.user_time
> 75436000 +10.8% 83581483 unixbench.time.voluntary_context_switches
> 53553404 +8.7% 58235303 unixbench.workload
>
> Previously with 829c1651e9c4a6f introduced, there is 8.6% improvement. And this improvement
> remains with a53ce18cacb477dd applied.
>
> Can you send the full test script so I can have a try locally?

Thanks for testing this. On the v5.4.y kernel (but not on v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent
That is 'shell8' with '-c 448' (448 copies) passed as an argument.

Thanks,
Saeed

>
> thanks,
> Chenyu

2023-06-29 22:28:39

by Saeed Mirzamohammadi

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <[email protected]> wrote:
>
> Hi Chen, Vincent,
>
>> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
>>
>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>> Hi Vincent,
>>>
>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
>>>>
>>>> Hi Saeed,
>>>>
>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>>
>>>>> Commit Data:
>>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
>>>>> author : [email protected]
>>>>> author date : 2023-03-17 16:08:10
>>>>>
>>>>>
>>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
>>>>
>>>> It would be good to confirm that the regression is present on v6.3
>>>> where the patch has been merged originally. It can be that there is
>>>> hidden dependency with other patches introduced since v5.4
>>>
>>> Regression is present on v6.3 as well, examples:
>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>
> Apologize for the confusion, I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> v6.3.y -> no regression
> v5.15.y -> no regression
> v5.4.y -> 5-8% regression.

A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!

>
>
>>>>
>>>>
>>>>>
>>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
>>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
>>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
>>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
>>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
>>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
>>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
>>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
>>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
>>>>>
>>>>> Link to unixbench:
>>>>> github.com/kdlucas/byte-unixbench
>>>>
>>>> I tried to reproduce the problem with v6.3 on my system but I don't
>>>> see any difference with or without the patch
>>>>
>>>> Do you have more details on your setup ? number of cpu and topology ?
>>>>
>>> model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>>
>>> Topology:
>>> node 0 1
>>> 0: 10 21
>>> 1: 21 10
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> CPU(s): 56
>>> On-line CPU(s) list: 0-55
>>> Thread(s) per core: 2
>>> Core(s) per socket: 14
>>> Socket(s): 2
>>> NUMA node(s): 2
>>>
>> Tested on a similar platform E5-2697 v2 @ 2.70GHz which has 2 nodes,
>> 24 cores/48 CPUs in total, however I could not reproduce the issue.
>> Since the regression was reported mainly against 224 and 448 copies case
>> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
>>
>>
>> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 21304 +0.5% 21420 unixbench.score
>> 632.43 +0.0% 632.44 unixbench.time.elapsed_time
>> 632.43 +0.0% 632.44 unixbench.time.elapsed_time.max
>> 11837046 -4.7% 11277727 unixbench.time.involuntary_context_switches
>> 864713 +0.1% 865914 unixbench.time.major_page_faults
>> 9600 +4.0% 9984 unixbench.time.maximum_resident_set_size
>> 8.433e+08 +0.6% 8.48e+08 unixbench.time.minor_page_faults
>> 4096 +0.0% 4096 unixbench.time.page_size
>> 3741 +1.1% 3783 unixbench.time.percent_of_cpu_this_job_got
>> 18341 +1.3% 18572 unixbench.time.system_time
>> 5323 +0.6% 5353 unixbench.time.user_time
>> 78197044 -3.1% 75791701 unixbench.time.voluntary_context_switches
>> 57178573 +0.4% 57399061 unixbench.workload
>>
>> There is no much difference with a53ce18cacb477dd applied or not.
>>
>>
>>
>>
>>
>> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 19985 +8.6% 21697 unixbench.score
>> 632.64 -0.0% 632.53 unixbench.time.elapsed_time
>> 632.64 -0.0% 632.53 unixbench.time.elapsed_time.max
>> 11453985 +3.7% 11880259 unixbench.time.involuntary_context_switches
>> 818996 +3.1% 844681 unixbench.time.major_page_faults
>> 9600 +0.0% 9600 unixbench.time.maximum_resident_set_size
>> 7.911e+08 +8.4% 8.575e+08 unixbench.time.minor_page_faults
>> 4096 +0.0% 4096 unixbench.time.page_size
>> 3767 -0.4% 3752 unixbench.time.percent_of_cpu_this_job_got
>> 18873 -2.4% 18423 unixbench.time.system_time
>> 4960 +7.1% 5313 unixbench.time.user_time
>> 75436000 +10.8% 83581483 unixbench.time.voluntary_context_switches
>> 53553404 +8.7% 58235303 unixbench.workload
>>
>> Previously with 829c1651e9c4a6f introduced, there is 8.6% improvement. And this improvement
>> remains with a53ce18cacb477dd applied.
>>
>> Can you send the full test script so I can have a try locally?
>
> Thanks for testing this. For v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent
> And that’s ’shell8’ with ‘-c 448’ copies passed as argument.
>
> Thanks,
> Saeed
>
>>
>> thanks,
>> Chenyu

2023-06-30 08:33:49

by Vincent Guittot

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
<[email protected]> wrote:
>
>
>
> > On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <[email protected]> wrote:
> >
> > Hi Chen, Vincent,
> >
> >> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
> >>
> >> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> >>> Hi Vincent,
> >>>
> >>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
> >>>>
> >>>> Hi Saeed,
> >>>>
> >>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >>>>>
> >>>>> Commit Data:
> >>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
> >>>>> author : [email protected]
> >>>>> author date : 2023-03-17 16:08:10
> >>>>>
> >>>>>
> >>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
> >>>>
> >>>> It would be good to confirm that the regression is present on v6.3
> >>>> where the patch has been merged originally. It can be that there is
> >>>> hidden dependency with other patches introduced since v5.4
> >>>
> >>> Regression is present on v6.3 as well, examples:
> >>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> >>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> >>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> >
> > Apologize for the confusion, I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> > v6.3.y -> no regression
> > v5.15.y -> no regression
> > v5.4.y -> 5-8% regression.
>
> A gentle reminder if there is any recommendation for v5.4.y and v4.14.y regression. Thanks!

I tried to find out why the regression happens only on v5.4.y (or lower)
and not on v5.15.y (or above), but I haven't been able to find any
possible reason in the code.

Regarding the 2 commits below, they must go together, so we can't
simply revert one and not the other.
commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated

entity_is_long_sleeper() should never return true in your case. Could
you check that this is the case for you?
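
For readers unfamiliar with the helper, the check can be approximated in user space as below. This is a simplified sketch, not the kernel code: the model_ names are hypothetical, times are assumed to be in nanoseconds, and the value 1024 for the scaled-down NICE_0_LOAD is an assumption. The early return for exec_start == 0 reflects the `se->exec_start = 0` assignment on migration that appears in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of scale_load_down(NICE_0_LOAD); not taken from this thread. */
#define MODEL_NICE_0_LOAD 1024ULL

/* Sleep longer than this many nanoseconds counts as a "long sleep" (2^53 ns). */
#define MODEL_LONG_SLEEP_NS ((1ULL << 63) / MODEL_NICE_0_LOAD)

/*
 * Simplified model: an entity is a long sleeper only if it has a valid
 * exec_start, the clock has moved past it, and the gap exceeds the limit.
 */
static bool model_is_long_sleeper(uint64_t exec_start, uint64_t now)
{
	if (exec_start == 0)		/* reset on migration: never a long sleeper */
		return false;
	if (now <= exec_start)		/* clocks diverged across CPUs */
		return false;
	return (now - exec_start) > MODEL_LONG_SLEEP_NS;
}

/* Boundary checks for the model above. */
static void model_self_check(void)
{
	assert(!model_is_long_sleeper(0, 1ULL << 60));
	assert(!model_is_long_sleeper(1, 1 + MODEL_LONG_SLEEP_NS));
	assert(model_is_long_sleeper(1, 2 + MODEL_LONG_SLEEP_NS));
}
```

Under these assumptions the limit is 2^53 ns, about 104 days, so a benchmark that runs for a few minutes would not be expected to cross it.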

>
> >
> >
> >>>>
> >>>>
> >>>>>
> >>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent : -0.01%
> >>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent : -0.1%
> >>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent : -0.12%%
> >>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent : -2.29%%
> >>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent : -4.22%
> >>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent : -4.23%
> >>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent : -5.54%
> >>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent : -8%
> >>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent : -7.05%
> >>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent : -6.4%
> >>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent : -8.35%
> >>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent : -7.09%
> >>>>>
> >>>>> Link to unixbench:
> >>>>> github.com/kdlucas/byte-unixbench
> >>>>
> >>>> I tried to reproduce the problem with v6.3 on my system but I don't
> >>>> see any difference with or without the patch
> >>>>
> >>>> Do you have more details on your setup ? number of cpu and topology ?
> >>>>
> >>> model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> >>>
> >>> Topology:
> >>> node 0 1
> >>> 0: 10 21
> >>> 1: 21 10
> >>>
> >>> Architecture: x86_64
> >>> CPU op-mode(s): 32-bit, 64-bit
> >>> CPU(s): 56
> >>> On-line CPU(s) list: 0-55
> >>> Thread(s) per core: 2
> >>> Core(s) per socket: 14
> >>> Socket(s): 2
> >>> NUMA node(s): 2
> >>>
> >> Tested on a similar platform E5-2697 v2 @ 2.70GHz which has 2 nodes,
> >> 24 cores/48 CPUs in total, however I could not reproduce the issue.
> >> Since the regression was reported mainly against 224 and 448 copies case
> >> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
> >>
> >>
> >> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
> >> ---------------- ---------------------------
> >> %stddev %change %stddev
> >> \ | \
> >> 21304 +0.5% 21420 unixbench.score
> >> 632.43 +0.0% 632.44 unixbench.time.elapsed_time
> >> 632.43 +0.0% 632.44 unixbench.time.elapsed_time.max
> >> 11837046 -4.7% 11277727 unixbench.time.involuntary_context_switches
> >> 864713 +0.1% 865914 unixbench.time.major_page_faults
> >> 9600 +4.0% 9984 unixbench.time.maximum_resident_set_size
> >> 8.433e+08 +0.6% 8.48e+08 unixbench.time.minor_page_faults
> >> 4096 +0.0% 4096 unixbench.time.page_size
> >> 3741 +1.1% 3783 unixbench.time.percent_of_cpu_this_job_got
> >> 18341 +1.3% 18572 unixbench.time.system_time
> >> 5323 +0.6% 5353 unixbench.time.user_time
> >> 78197044 -3.1% 75791701 unixbench.time.voluntary_context_switches
> >> 57178573 +0.4% 57399061 unixbench.workload
> >>
> >> There is no much difference with a53ce18cacb477dd applied or not.
> >>
> >>
> >>
> >>
> >>
> >> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> >> ---------------- ---------------------------
> >> %stddev %change %stddev
> >> \ | \
> >> 19985 +8.6% 21697 unixbench.score
> >> 632.64 -0.0% 632.53 unixbench.time.elapsed_time
> >> 632.64 -0.0% 632.53 unixbench.time.elapsed_time.max
> >> 11453985 +3.7% 11880259 unixbench.time.involuntary_context_switches
> >> 818996 +3.1% 844681 unixbench.time.major_page_faults
> >> 9600 +0.0% 9600 unixbench.time.maximum_resident_set_size
> >> 7.911e+08 +8.4% 8.575e+08 unixbench.time.minor_page_faults
> >> 4096 +0.0% 4096 unixbench.time.page_size
> >> 3767 -0.4% 3752 unixbench.time.percent_of_cpu_this_job_got
> >> 18873 -2.4% 18423 unixbench.time.system_time
> >> 4960 +7.1% 5313 unixbench.time.user_time
> >> 75436000 +10.8% 83581483 unixbench.time.voluntary_context_switches
> >> 53553404 +8.7% 58235303 unixbench.workload
> >>
> >> Previously with 829c1651e9c4a6f introduced, there is 8.6% improvement. And this improvement
> >> remains with a53ce18cacb477dd applied.
> >>
> >> Can you send the full test script so I can have a try locally?
> >
> > Thanks for testing this. For v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent
> > And that’s ’shell8’ with ‘-c 448’ copies passed as argument.
> >
> > Thanks,
> > Saeed
> >
> >>
> >> thanks,
> >> Chenyu
>

2023-07-20 23:33:51

by Saeed Mirzamohammadi

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Vincent,

> On Jun 30, 2023, at 1:28 AM, Vincent Guittot <[email protected]> wrote:
>
> On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
> <[email protected]> wrote:
>>
>>
>>
>>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <[email protected]> wrote:
>>>
>>> Hi Chen, Vincent,
>>>
>>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
>>>>
>>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>>>> Hi Vincent,
>>>>>
>>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
>>>>>>
>>>>>> Hi Saeed,
>>>>>>
>>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>>>>
>>>>>>> Commit Data:
>>>>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
>>>>>>> author : [email protected]
>>>>>>> author date : 2023-03-17 16:08:10
>>>>>>>
>>>>>>>
>>>>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
>>>>>>
>>>>>> It would be good to confirm that the regression is present on v6.3
>>>>>> where the patch has been merged originally. It can be that there is
>>>>>> hidden dependency with other patches introduced since v5.4
>>>>>
>>>>> Regression is present on v6.3 as well, examples:
>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>>>
>>> Apologize for the confusion, I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
>>> v6.3.y -> no regression
>>> v5.15.y -> no regression
>>> v5.4.y -> 5-8% regression.
>>
>> A gentle reminder if there is any recommendation for v5.4.y and v4.14.y regression. Thanks!
>
> I tried to find why the regression happens only for v5.4.y (or lower)
> and not for v5.15.y (or above) but I haven't been able to find any
> possible reason in the code.
>
> Regarding the 2 commits below, they must come together so we can't
> simply revert 1 and not the other.
> commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
> commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
>
Tests were done before and after these 2 commits.

> entity_is_long_sleeper() should never return true in your case. Could
> you try to check that it's the case for you ?
>
I tested this, and entity_is_long_sleeper() never returns true.
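
For readers following along, the check in question can be modeled in userspace roughly as below. This is only a sketch modeled on the upstream helper, not the v5.4.y backport itself, so treat the details as assumptions and verify them against your tree. The names model_entity_is_long_sleeper and MODEL_NICE_0_LOAD are hypothetical, and plain integer arguments stand in for rq_clock_task() and se->exec_start.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of scale_load_down(NICE_0_LOAD); hypothetical name. */
#define MODEL_NICE_0_LOAD 1024ULL

/*
 * Userspace model of the long-sleeper test.
 * now_ns stands in for rq_clock_task() of the entity's runqueue,
 * exec_start_ns for se->exec_start.
 */
bool model_entity_is_long_sleeper(uint64_t now_ns, uint64_t exec_start_ns)
{
	uint64_t sleep_ns;

	/* exec_start == 0 marks a migrated (or never run) entity. */
	if (exec_start_ns == 0)
		return false;

	/* Runqueue clocks can diverge, so "now" may lag exec_start. */
	if (now_ns <= exec_start_ns)
		return false;

	sleep_ns = now_ns - exec_start_ns;

	/* Threshold is 2^63 / 1024 ns = 2^53 ns, roughly 104 days. */
	return sleep_ns > ((1ULL << 63) / MODEL_NICE_0_LOAD);
}
```

With a threshold of about 104 days, a benchmark run lasting minutes should not trip this check unless exec_start holds a stale or bogus value, which matches the observation that it never returns true here.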

I then removed the related part and retested; the regression is gone with the following change (partial revert):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3ebd2054996bc..0d70dd6e14844 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)

 void activate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	if (task_on_rq_migrating(p))
-		flags |= ENQUEUE_MIGRATED;
-
 	if (task_contributes_to_load(p))
 		rq->nr_uninterruptible--;

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83a7cf62c0f53..ef9aca05c7bdf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3779,9 +3779,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)

 	if (flags & ENQUEUE_WAKEUP)
 		place_entity(cfs_rq, se, 0);
-	/* Entity has migrated, no longer consider this task hot */
-	if (flags & ENQUEUE_MIGRATED)
-		se->exec_start = 0;

 	check_schedstat_required();
 	update_stats_enqueue(cfs_rq, se, flags);
@@ -6182,6 +6179,9 @@ static void migrate_task_rq_fair(struct task_struct *p)

 	/* Tell new CPU we are migrated */
 	p->se.avg.last_update_time = 0;
+
+	/* We have migrated, no longer consider this task hot */
+	p->se.exec_start = 0;
 }

 static void task_dead_fair(struct task_struct *p)


2023-07-21 14:26:35

by Vincent Guittot

[permalink] [raw]

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Saeed,

On Fri, 21 Jul 2023 at 01:04, Saeed Mirzamohammadi
<[email protected]> wrote:
>
> Hi Vincent,
>
> > On Jun 30, 2023, at 1:28 AM, Vincent Guittot <[email protected]> wrote:
> >
> > On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
> > <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <[email protected]> wrote:
> >>>
> >>> Hi Chen, Vincent,
> >>>
> >>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
> >>>>
> >>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> >>>>> Hi Vincent,
> >>>>>
> >>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
> >>>>>>
> >>>>>> Hi Saeed,
> >>>>>>
> >>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> >>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >>>>>>>
> >>>>>>> Commit Data:
> >>>>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >>>>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
> >>>>>>> author : [email protected]
> >>>>>>> author date : 2023-03-17 16:08:10
> >>>>>>>
> >>>>>>>
> >>>>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
> >>>>>>
> >>>>>> It would be good to confirm that the regression is present on v6.3
> >>>>>> where the patch has been merged originally. It can be that there is
> >>>>>> hidden dependency with other patches introduced since v5.4
> >>>>>
> >>>>> Regression is present on v6.3 as well, examples:
> >>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> >>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> >>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> >>>
> >>> Apologize for the confusion, I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> >>> v6.3.y -> no regression
> >>> v5.15.y -> no regression
> >>> v5.4.y -> 5-8% regression.
> >>
> >> A gentle reminder if there is any recommendation for v5.4.y and v4.14.y regression. Thanks!
> >
> > I tried to find why the regression happens only for v5.4.y (or lower)
> > and not for v5.15.y (or above) but I haven't been able to find any
> > possible reason in the code.
> >
> > Regarding the 2 commits below, they must come together so we can't
> > simply revert 1 and not the other.
> > commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
> > commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
> >
> Tests were done before and after these 2 commits.
>
> > entity_is_long_sleeper() should never return true in your case. Could
> > you try to check that it's the case for you ?
> >
> I tested this, and entity_is_long_sleeper() never returns true.
>
> I then removed the related part and retested; the regression is gone with the following change (partial revert):
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3ebd2054996bc..0d70dd6e14844 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>
>  void activate_task(struct rq *rq, struct task_struct *p, int flags)
>  {
> -	if (task_on_rq_migrating(p))
> -		flags |= ENQUEUE_MIGRATED;
> -
>  	if (task_contributes_to_load(p))
>  		rq->nr_uninterruptible--;
>

Is the regression still there if you apply only the partial revert
below (the fair.c part) and not the core.c part above?
I have rechecked the code but can't see any obvious reason why there
is a regression on v5.4 and not on v5.15.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 83a7cf62c0f53..ef9aca05c7bdf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3779,9 +3779,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
>  	if (flags & ENQUEUE_WAKEUP)
>  		place_entity(cfs_rq, se, 0);
> -	/* Entity has migrated, no longer consider this task hot */
> -	if (flags & ENQUEUE_MIGRATED)
> -		se->exec_start = 0;
>
>  	check_schedstat_required();
>  	update_stats_enqueue(cfs_rq, se, flags);
> @@ -6182,6 +6179,9 @@ static void migrate_task_rq_fair(struct task_struct *p)
>
>  	/* Tell new CPU we are migrated */
>  	p->se.avg.last_update_time = 0;
> +
> +	/* We have migrated, no longer consider this task hot */
> +	p->se.exec_start = 0;
>  }
>
>  static void task_dead_fair(struct task_struct *p)

2023-07-26 01:06:11

by Saeed Mirzamohammadi

[permalink] [raw]

Subject: Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4

Hi Vincent,

> On Jul 21, 2023, at 7:01 AM, Vincent Guittot <[email protected]> wrote:
>
> Hi Saeed,
>
> On Fri, 21 Jul 2023 at 01:04, Saeed Mirzamohammadi
> <[email protected]> wrote:
>>
>> Hi Vincent,
>>
>>> On Jun 30, 2023, at 1:28 AM, Vincent Guittot <[email protected]> wrote:
>>>
>>> On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
>>> <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <[email protected]> wrote:
>>>>>
>>>>> Hi Chen, Vincent,
>>>>>
>>>>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <[email protected]> wrote:
>>>>>>
>>>>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>>>>>> Hi Vincent,
>>>>>>>
>>>>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Saeed,
>>>>>>>>
>>>>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>>>>>>
>>>>>>>>> Commit Data:
>>>>>>>>> commit-id : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>>>>>> subject : sched/fair: Sanitize vruntime of entity being migrated
>>>>>>>>> author : [email protected]
>>>>>>>>> author date : 2023-03-17 16:08:10
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We have observed this on our v5.4 and v4.14 kernel and not yet tested 5.15 but I expect the same.
>>>>>>>>
>>>>>>>> It would be good to confirm that the regression is present on v6.3
>>>>>>>> where the patch has been merged originally. It can be that there is
>>>>>>>> hidden dependency with other patches introduced since v5.4
>>>>>>>
>>>>>>> Regression is present on v6.3 as well, examples:
>>>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>>>>>
>>>>> Apologize for the confusion, I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
>>>>> v6.3.y -> no regression
>>>>> v5.15.y -> no regression
>>>>> v5.4.y -> 5-8% regression.
>>>>
>>>> A gentle reminder if there is any recommendation for v5.4.y and v4.14.y regression. Thanks!
>>>
>>> I tried to find why the regression happens only for v5.4.y (or lower)
>>> and not for v5.15.y (or above) but I haven't been able to find any
>>> possible reason in the code.
>>>
>>> Regarding the 2 commits below, they must come together so we can't
>>> simply revert 1 and not the other.
>>> commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
>>> commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
>>>
>> Tests were done before and after these 2 commits.
>>
>>> entity_is_long_sleeper() should never return true in your case. Could
>>> you try to check that it's the case for you ?
>>>
>> I tested this, and entity_is_long_sleeper() never returns true.
>>
>> I then removed the related part and retested; the regression is gone with the following change (partial revert):
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 3ebd2054996bc..0d70dd6e14844 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>>
>>  void activate_task(struct rq *rq, struct task_struct *p, int flags)
>>  {
>> -	if (task_on_rq_migrating(p))
>> -		flags |= ENQUEUE_MIGRATED;
>> -
>>  	if (task_contributes_to_load(p))
>>  		rq->nr_uninterruptible--;
>>
>
> Is the regression still there if you apply only the partial revert
> below (the fair.c part) and not the core.c part above?
The regression is still gone after I added back the following part of the partial revert:

+	if (task_on_rq_migrating(p))
+		flags |= ENQUEUE_MIGRATED;
+

So the partial revert below, which touches only kernel/sched/fair.c, is what fixes the regression:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e19fe88914574..ccc0acd477a09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3777,9 +3777,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)

 	if (flags & ENQUEUE_WAKEUP)
 		place_entity(cfs_rq, se, 0);
-	/* Entity has migrated, no longer consider this task hot */
-	if (flags & ENQUEUE_MIGRATED)
-		se->exec_start = 0;

 	check_schedstat_required();
 	update_stats_enqueue(cfs_rq, se, flags);
@@ -6180,6 +6177,9 @@ static void migrate_task_rq_fair(struct task_struct *p)

 	/* Tell new CPU we are migrated */
 	p->se.avg.last_update_time = 0;
+
+	/* We have migrated, no longer consider this task hot */
+	p->se.exec_start = 0;
 }

 static void task_dead_fair(struct task_struct *p)
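
To make the role of the moved assignment concrete, here is a minimal userspace sketch of how exec_start feeds the time-based part of task_hot(). It is an illustration under assumptions, not kernel code: struct model_entity, model_task_hot(), model_reset_on_migrate() and MODEL_MIGRATION_COST_NS are hypothetical names, and the 500000 ns value assumes the default sysctl_sched_migration_cost. It does not by itself explain why only v5.4.y regresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default of sysctl_sched_migration_cost, in ns (0.5 ms). */
#define MODEL_MIGRATION_COST_NS 500000LL

/* Hypothetical stand-in for the one sched_entity field that matters here. */
struct model_entity {
	uint64_t exec_start_ns;
};

/* Models the time-based part of task_hot(): ran recently means cache hot. */
bool model_task_hot(const struct model_entity *se, uint64_t src_rq_clock_ns)
{
	int64_t delta = (int64_t)(src_rq_clock_ns - se->exec_start_ns);

	return delta < MODEL_MIGRATION_COST_NS;
}

/* Models the line the partial revert restores in migrate_task_rq_fair(). */
void model_reset_on_migrate(struct model_entity *se)
{
	se->exec_start_ns = 0;
}
```

In this model, an entity that last ran 100 us before the source runqueue clock reading is reported hot. After model_reset_on_migrate(), the same entity is reported not hot, because the delta is then measured from zero. The two variants of the patch differ in when that reset becomes visible: at migration time, or only at the next enqueue with ENQUEUE_MIGRATED set.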
