2023-03-06 14:21:13

by Peter Zijlstra

Subject: [PATCH 00/10] sched: EEVDF using latency-nice

Hi!

Ever since looking at the latency-nice patches, I've wondered if EEVDF would
not make more sense, and I did point Vincent at some older patches I had for
that (which is where his augmented rbtree thing comes from).

Also, since I really dislike the dual tree, I also figured we could dynamically
switch between an augmented tree and not (and while I have code for that,
that's not included in this posting because with the current results I don't
think we actually need this).

Anyway, since I'm somewhat under the weather, I spent last week desperately
trying to connect a small cluster of neurons in defiance of the snot overlord
and bring back the EEVDF patches from the dark crypts where they'd been
gathering cobwebs for the past 13 odd years.

By Friday they worked well enough, and this morning (because obviously I forgot
the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
tbench and sysbench -- there's a bunch of wins and losses, but nothing that
indicates a total fail.

( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
more consistently than CFS and has a bunch of latency wins )

( hackbench also doesn't show the augmented tree and generally more expensive
pick to be a loss, in fact it shows a slight win here )


hackbench load + cyclictest --policy other results:


                             EEVDF     CFS

            # Min Latencies: 00053
 LNICE(19)  # Avg Latencies: 04350
            # Max Latencies: 76019

            # Min Latencies: 00052     00053
 LNICE(0)   # Avg Latencies: 00690     00687
            # Max Latencies: 14145     13913

            # Min Latencies: 00019
 LNICE(-19) # Avg Latencies: 00261
            # Max Latencies: 05642
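
( for reference, the LNICE levels above are per-task latency nice values; with
  the sched_setattr extension from the latency-nice patches they can be set on
  a task roughly like the sketch below -- the field and flag names are taken
  from that series, everything else is purely illustrative )

/* Illustrative only: set latency nice on the current task through the
 * sched_attr extension from the latency-nice patches. The field/flag names
 * (sched_latency_nice, SCHED_FLAG_LATENCY_NICE) and the flag value follow
 * that series as far as I know; on a kernel without the extension this is
 * expected to fail with E2BIG. There is no glibc wrapper for sched_setattr,
 * hence the raw syscall.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime, sched_deadline, sched_period;
	uint32_t sched_util_min, sched_util_max;
	int32_t  sched_latency_nice;		/* appended by the series */
};

#define SCHED_FLAG_LATENCY_NICE	0x80		/* value used by the series (assumption) */

int main(void)
{
	struct sched_attr attr = {
		.size			= sizeof(attr),
		.sched_policy		= 0,	/* SCHED_OTHER */
		.sched_flags		= SCHED_FLAG_LATENCY_NICE,
		.sched_latency_nice	= -19,	/* cf. LNICE(-19) above */
	};

	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");
	return 0;
}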


The nice -19 numbers aren't as pretty as Vincent's, but at the end I was going
cross-eyed from staring at tree prints and I just couldn't figure out where it
was going side-ways.

There's definitely more benchmarking/tweaking to be done (0-day already
reported a stress-ng loss), but if we can pull this off we can delete a whole
bunch of icky heuristics code. EEVDF is a much better defined policy than what
we currently have.




2023-03-07 10:28:58

by Vincent Guittot

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice

On Mon, 6 Mar 2023 at 15:17, Peter Zijlstra <[email protected]> wrote:
>
> Hi!
>
> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> not make more sense, and I did point Vincent at some older patches I had for
> that (which is where his augmented rbtree thing comes from).
>
> Also, since I really dislike the dual tree, I also figured we could dynamically
> switch between an augmented tree and not (and while I have code for that,
> that's not included in this posting because with the current results I don't
> think we actually need this).
>
> Anyway, since I'm somewhat under the weather, I spent last week desperately
> trying to connect a small cluster of neurons in defiance of the snot overlord
> and bring back the EEVDF patches from the dark crypts where they'd been
> gathering cobwebs for the past 13 odd years.

I haven't studied your patchset in detail yet but at a 1st glance this
seems to be a major rework on the cfs task placement and the latency
is just an add-on on top of moving to the EEVDF scheduling.

>
> By Friday they worked well enough, and this morning (because obviously I forgot
> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
> indicates a total fail.
>
> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
> more consistently than CFS and has a bunch of latency wins )
>
> ( hackbench also doesn't show the augmented tree and generally more expensive
> pick to be a loss, in fact it shows a slight win here )
>
>
> hackbench load + cyclictest --policy other results:
>
>
> EEVDF CFS
>
> # Min Latencies: 00053
> LNICE(19) # Avg Latencies: 04350
> # Max Latencies: 76019
>
> # Min Latencies: 00052 00053
> LNICE(0) # Avg Latencies: 00690 00687
> # Max Latencies: 14145 13913
>
> # Min Latencies: 00019
> LNICE(-19) # Avg Latencies: 00261
> # Max Latencies: 05642
>
>
> The nice -19 numbers aren't as pretty as Vincent's, but at the end I was going
> cross-eyed from staring at tree prints and I just couldn't figure out where it
> was going side-ways.
>
> There's definitely more benchmarking/tweaking to be done (0-day already
> reported a stress-ng loss), but if we can pull this off we can delete a whole
> bunch of icky heuristics code. EEVDF is a much better defined policy than what
> we currently have.
>
>

2023-03-07 13:47:43

by Peter Zijlstra

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice

On Tue, Mar 07, 2023 at 11:27:37AM +0100, Vincent Guittot wrote:
> On Mon, 6 Mar 2023 at 15:17, Peter Zijlstra <[email protected]> wrote:
> >
> > Hi!
> >
> > Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> > not make more sense, and I did point Vincent at some older patches I had for
> > that (which is where his augmented rbtree thing comes from).
> >
> > Also, since I really dislike the dual tree, I also figured we could dynamically
> > switch between an augmented tree and not (and while I have code for that,
> > that's not included in this posting because with the current results I don't
> > think we actually need this).
> >
> > Anyway, since I'm somewhat under the weather, I spent last week desperately
> > trying to connect a small cluster of neurons in defiance of the snot overlord
> > and bring back the EEVDF patches from the dark crypts where they'd been
> > gathering cobwebs for the past 13 odd years.
>
> I haven't studied your patchset in detail yet but at a 1st glance this
> seems to be a major rework on the cfs task placement and the latency
> is just an add-on on top of moving to the EEVDF scheduling.

It completely reworks the base scheduler, placement, preemption, picking
-- everything. The only thing they have in common is that they're both a
virtual time based scheduler.

The big advantage I see is that EEVDF is fairly well known and studied,
and a much better defined scheduler than WFQ. Specifically, WFQ is
only well defined in how much time is given to any task (bandwidth), but
says nothing about how that time is distributed. That is, there is no
native preemption condition/constraint etc. -- all the code we have for
that is mostly random heuristics.

The WF2Q/EEVDF class of schedulers otoh *do* define all that. There is a
lot less wiggle room as a result. The avg_vruntime / placement stuff I
did is fundamental to how it controls bandwidth distribution and
guarantees the WFQ subset. Specifically, by limiting the pick to that
subset of tasks that has positive lag (owed time), it guarantees this
fairness -- but that means we need a working measure of lag.

Similarly, since the whole 'when' thing is well defined in order to
provide the additional latency goals of these schedulers, placement is
crucial. Things like the sleeper bonus are fundamentally incompatible with
latency guarantees -- both affect the 'when'.
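
( to make that concrete, here is a toy userspace rendering of the pick rule --
  not the kernel code; the entity state is simplified and the names are only
  for illustration. Eligibility is non-negative lag relative to the weighted
  average vruntime, which is what the avg_vruntime() bits approximate, and
  among the eligible entities the earliest virtual deadline wins. )

/* Toy EEVDF pick; illustrative only, not the kernel implementation. */
#include <stddef.h>

struct entity {
	double	vruntime;	/* service received, in virtual time */
	double	deadline;	/* virtual deadline: eligible time + request/weight */
	double	weight;
};

/* V: weight-averaged vruntime of the runnable entities. */
static double avg_vruntime(const struct entity *e, size_t n)
{
	double sum = 0.0, w = 0.0;

	for (size_t i = 0; i < n; i++) {
		sum += e[i].vruntime * e[i].weight;
		w   += e[i].weight;
	}
	return w ? sum / w : 0.0;
}

/* Non-negative lag == owed service == eligible to run. */
static int eligible(const struct entity *e, double V)
{
	return e->vruntime <= V;	/* lag = (V - vruntime) * weight >= 0 */
}

/* Earliest Eligible Virtual Deadline First. */
static const struct entity *pick_eevdf(const struct entity *e, size_t n)
{
	double V = avg_vruntime(e, n);
	const struct entity *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!eligible(&e[i], V))
			continue;
		if (!best || e[i].deadline < best->deadline)
			best = &e[i];
	}
	return best;	/* the real thing uses an augmented rbtree, not a scan */
}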


The initial EEVDF paper is here:

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564

It contains a few 'mistakes' and oversights, but those should not
matter.
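
( the tl;dr of the paper's scheduling rule, roughly stated: virtual time
  advances as dV/dt = 1/sum(w_i) over the active clients; a request of length
  r_i issued at eligible time ve_i gets the virtual deadline
  vd_i = ve_i + r_i/w_i; a client is eligible iff ve_i <= V(t), equivalently
  iff its lag S_i(t) - s_i(t) -- ideal minus received service -- is
  non-negative; and the scheduler always runs the eligible client with the
  earliest vd_i )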

Anyway, I'm still struggling to make complete sense of what you did --
will continue to stare at that.



2023-03-08 15:14:42

by Shrikanth Hegde

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice




> Hi!
>
> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> not make more sense, and I did point Vincent at some older patches I had for
> that (which is where his augmented rbtree thing comes from).
>
> Also, since I really dislike the dual tree, I also figured we could dynamically
> switch between an augmented tree and not (and while I have code for that,
> that's not included in this posting because with the current results I don't
> think we actually need this).
>
> Anyway, since I'm somewhat under the weather, I spent last week desperately
> trying to connect a small cluster of neurons in defiance of the snot overlord
> and bring back the EEVDF patches from the dark crypts where they'd been
> gathering cobwebs for the past 13 odd years.
>
> By Friday they worked well enough, and this morning (because obviously I forgot
> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
> indicates a total fail.
>
> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
> more consistently than CFS and has a bunch of latency wins )
>
> ( hackbench also doesn't show the augmented tree and generally more expensive
> pick to be a loss, in fact it shows a slight win here )
>
>
> hackbench load + cyclictest --policy other results:
>
>
> EEVDF CFS
>
> # Min Latencies: 00053
> LNICE(19) # Avg Latencies: 04350
> # Max Latencies: 76019
>
> # Min Latencies: 00052 00053
> LNICE(0) # Avg Latencies: 00690 00687
> # Max Latencies: 14145 13913
>
> # Min Latencies: 00019
> LNICE(-19) # Avg Latencies: 00261
> # Max Latencies: 05642
>
>
> The nice -19 numbers aren't as pretty as Vincent's, but at the end I was going
> cross-eyed from staring at tree prints and I just couldn't figure out where it
> was going side-ways.
>
> There's definitely more benchmarking/tweaking to be done (0-day already
> reported a stress-ng loss), but if we can pull this off we can delete a whole
> bunch of icky heuristics code. EEVDF is a much better defined policy than what
> we currently have.
>
Tested the patch series on powerpc systems. This test is done in the same way
that was done for Vincent's V12 series.

Two cgroups are created. cgroup2 runs stress-ng -l 50 --cpu=<total_cpu> and
cgroup1 runs micro-benchmarks. Different latency values are assigned to
cgroup1.

Tested on two different systems. One system has 480 CPUs and the other one has
96 CPUs.
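
For reference, a rough sketch of how such a cgroup setup can be driven
programmatically is below; it assumes cgroup v2 mounted at /sys/fs/cgroup with
the cpu controller enabled for the child groups, and the cpu.latency.nice file
added by the sched group latency patch in this series, so treat the paths and
file names as assumptions.

/* Sketch of the cgroup setup described above; error handling trimmed. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return;
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	char self[16];

	mkdir("/sys/fs/cgroup/cgroup1", 0755);	/* micro-benchmarks      */
	mkdir("/sys/fs/cgroup/cgroup2", 0755);	/* stress-ng -l 50 --cpu */

	/* latency value for the micro-benchmark group: e.g. -20, 0 or 19 */
	write_str("/sys/fs/cgroup/cgroup1/cpu.latency.nice", "-20");

	/* move this launcher into cgroup1; the stress-ng side is started
	 * from a shell placed into cgroup2 the same way */
	snprintf(self, sizeof(self), "%d", getpid());
	write_str("/sys/fs/cgroup/cgroup1/cgroup.procs", self);
	return 0;
}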

++++++++
Summary:
++++++++
For hackbench, the 480 CPU system shows good improvement.
The 96 CPU system shows the same numbers as 6.2. The smaller system was showing
regressions, as discussed in Vincent's V12 series. With this patch, there is no regression.

Schbench shows good improvement compared to v6.2 at LN=0 and LN=-20, whereas
at LN=19 it shows a regression.

Please suggest if any variation of these benchmarks, or a different benchmark, should be run.


++++++++++++++++++
480 CPU system
++++++++++++++++++

==========
schbench
==========

v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
1 Thread
50.0th: 14.00 | 12.00 | 14.50 | 15.00
75.0th: 16.50 | 14.50 | 17.00 | 18.00
90.0th: 18.50 | 17.00 | 19.50 | 20.00
95.0th: 20.50 | 18.50 | 22.00 | 23.50
99.0th: 27.50 | 24.50 | 31.50 | 155.00
99.5th: 36.00 | 30.00 | 44.50 | 2991.00
99.9th: 81.50 | 171.50 | 153.00 | 4621.00
2 Threads
50.0th: 14.00 | 15.50 | 17.00 | 16.00
75.0th: 17.00 | 18.00 | 19.00 | 19.00
90.0th: 20.00 | 21.00 | 22.00 | 22.50
95.0th: 23.00 | 23.00 | 25.00 | 25.50
99.0th: 71.00 | 30.50 | 35.50 | 990.50
99.5th: 1170.00 | 53.00 | 71.00 | 3719.00
99.9th: 5088.00 | 245.50 | 138.00 | 6644.00
4 Threads
50.0th: 20.50 | 20.00 | 20.00 | 19.50
75.0th: 24.50 | 23.00 | 23.00 | 23.50
90.0th: 31.00 | 27.00 | 26.50 | 27.50
95.0th: 260.50 | 29.50 | 29.00 | 35.00
99.0th: 3644.00 | 106.00 | 37.50 | 2884.00
99.5th: 5152.00 | 227.00 | 92.00 | 5496.00
99.9th: 8076.00 | 3662.50 | 517.00 | 8640.00
8 Threads
50.0th: 26.00 | 23.50 | 22.50 | 25.00
75.0th: 32.50 | 29.50 | 27.50 | 31.00
90.0th: 41.50 | 34.50 | 31.50 | 39.00
95.0th: 794.00 | 37.00 | 34.50 | 579.50
99.0th: 5992.00 | 48.50 | 52.00 | 5872.00
99.5th: 7208.00 | 100.50 | 97.50 | 7280.00
99.9th: 9392.00 | 4098.00 | 1226.00 | 9328.00
16 Threads
50.0th: 37.50 | 33.00 | 34.00 | 37.00
75.0th: 49.50 | 43.50 | 44.00 | 49.00
90.0th: 70.00 | 52.00 | 53.00 | 66.00
95.0th: 1284.00 | 57.50 | 59.00 | 1162.50
99.0th: 5600.00 | 79.50 | 111.50 | 5912.00
99.5th: 7216.00 | 282.00 | 194.50 | 7392.00
99.9th: 9328.00 | 4026.00 | 2009.00 | 9440.00
32 Threads
50.0th: 59.00 | 56.00 | 57.00 | 59.00
75.0th: 83.00 | 77.50 | 79.00 | 83.00
90.0th: 118.50 | 94.00 | 95.00 | 120.50
95.0th: 1921.00 | 104.50 | 104.00 | 1800.00
99.0th: 6672.00 | 425.00 | 255.00 | 6384.00
99.5th: 8252.00 | 2800.00 | 1252.00 | 7696.00
99.9th: 10448.00 | 7264.00 | 5888.00 | 9504.00

=========
hackbench
=========

Type groups v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
Process 10 0.19 | 0.18 | 0.17 | 0.18
Process 20 0.34 | 0.32 | 0.33 | 0.31
Process 30 0.45 | 0.42 | 0.43 | 0.43
Process 40 0.58 | 0.53 | 0.53 | 0.53
Process 50 0.70 | 0.64 | 0.64 | 0.65
Process 60 0.82 | 0.74 | 0.75 | 0.76
thread 10 0.20 | 0.19 | 0.19 | 0.19
thread 20 0.36 | 0.34 | 0.34 | 0.34
Process(Pipe) 10 0.24 | 0.15 | 0.15 | 0.15
Process(Pipe) 20 0.46 | 0.22 | 0.22 | 0.21
Process(Pipe) 30 0.65 | 0.30 | 0.29 | 0.29
Process(Pipe) 40 0.90 | 0.35 | 0.36 | 0.34
Process(Pipe) 50 1.04 | 0.38 | 0.39 | 0.38
Process(Pipe) 60 1.16 | 0.42 | 0.42 | 0.43
thread(Pipe) 10 0.19 | 0.13 | 0.13 | 0.13
thread(Pipe) 20 0.46 | 0.21 | 0.21 | 0.21


++++++++++++++++++
96 CPU system
++++++++++++++++++

===========
schbench
===========
v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
1 Thread
50.0th: 10.50 | 10.00 | 10.00 | 11.00
75.0th: 12.50 | 11.50 | 11.50 | 12.50
90.0th: 15.00 | 13.00 | 13.50 | 16.50
95.0th: 47.50 | 15.00 | 15.00 | 274.50
99.0th: 4744.00 | 17.50 | 18.00 | 5032.00
99.5th: 7640.00 | 18.50 | 525.00 | 6636.00
99.9th: 8916.00 | 538.00 | 6704.00 | 9264.00
2 Threads
50.0th: 11.00 | 10.00 | 11.00 | 11.00
75.0th: 13.50 | 12.00 | 12.50 | 13.50
90.0th: 17.00 | 14.00 | 14.00 | 17.00
95.0th: 451.50 | 16.00 | 15.50 | 839.00
99.0th: 5488.00 | 20.50 | 18.00 | 6312.00
99.5th: 6712.00 | 986.00 | 19.00 | 7664.00
99.9th: 9856.00 | 4913.00 | 1154.00 | 8736.00
4 Threads
50.0th: 13.00 | 12.00 | 12.00 | 13.00
75.0th: 15.00 | 14.00 | 14.00 | 15.00
90.0th: 23.50 | 16.00 | 16.00 | 20.00
95.0th: 2508.00 | 17.50 | 17.50 | 1818.00
99.0th: 7232.00 | 777.00 | 38.50 | 5952.00
99.5th: 8720.00 | 3548.00 | 1926.00 | 7788.00
99.9th: 10352.00 | 6320.00 | 7160.00 | 10000.00
8 Threads
50.0th: 16.00 | 15.00 | 15.00 | 16.00
75.0th: 20.00 | 18.00 | 18.00 | 19.50
90.0th: 371.50 | 20.00 | 21.00 | 245.50
95.0th: 2992.00 | 22.00 | 23.00 | 2608.00
99.0th: 7784.00 | 1084.50 | 563.50 | 7136.00
99.5th: 9488.00 | 2612.00 | 2696.00 | 8720.00
99.9th: 15568.00 | 6656.00 | 7496.00 | 10000.00
16 Threads
50.0th: 23.00 | 21.00 | 20.00 | 22.50
75.0th: 31.00 | 27.50 | 26.00 | 29.50
90.0th: 1981.00 | 32.50 | 30.50 | 1500.50
95.0th: 4856.00 | 304.50 | 34.00 | 4046.00
99.0th: 10112.00 | 5720.00 | 4590.00 | 8220.00
99.5th: 13104.00 | 7828.00 | 7008.00 | 9312.00
99.9th: 18624.00 | 9856.00 | 9504.00 | 11984.00
32 Threads
50.0th: 36.50 | 34.50 | 33.50 | 35.50
75.0th: 56.50 | 48.00 | 46.00 | 52.50
90.0th: 4728.00 | 1470.50 | 376.00 | 3624.00
95.0th: 7808.00 | 4130.00 | 3850.00 | 6488.00
99.0th: 15776.00 | 8972.00 | 9060.00 | 9872.00
99.5th: 19072.00 | 11328.00 | 12224.00 | 11520.00
99.9th: 28864.00 | 18016.00 | 18368.00 | 18848.00


=========
Hackbench
=========

Type groups v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
Process 10 0.33 | 0.33 | 0.33 | 0.33
Process 20 0.61 | 0.56 | 0.58 | 0.57
Process 30 0.87 | 0.82 | 0.81 | 0.81
Process 40 1.10 | 1.05 | 1.06 | 1.05
Process 50 1.34 | 1.28 | 1.29 | 1.29
Process 60 1.58 | 1.53 | 1.52 | 1.51
thread 10 0.36 | 0.35 | 0.35 | 0.35
thread 20 0.64 | 0.63 | 0.62 | 0.62
Process(Pipe) 10 0.18 | 0.18 | 0.18 | 0.17
Process(Pipe) 20 0.32 | 0.31 | 0.31 | 0.31
Process(Pipe) 30 0.42 | 0.41 | 0.41 | 0.42
Process(Pipe) 40 0.56 | 0.53 | 0.55 | 0.53
Process(Pipe) 50 0.68 | 0.66 | 0.66 | 0.66
Process(Pipe) 60 0.80 | 0.78 | 0.78 | 0.78
thread(Pipe) 10 0.20 | 0.18 | 0.19 | 0.18
thread(Pipe) 20 0.34 | 0.34 | 0.33 | 0.33

Tested-by: Shrikanth Hegde <[email protected]>



2023-03-22 06:51:44

by K Prateek Nayak

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice

Hello Peter,

Leaving some results from my testing on a dual socket Zen3 machine
(2 x 64C/128T) below.

tl;dr

o I've not tested workloads with nice and latency nice yet focusing more
on the out of the box performance. No changes to sched_feat were made
for the same reason.

o Except for hackbench (m:n communication relationship), I do not see any
regression for other standard benchmarks (mostly 1:1 or 1:n relations)
when the system is below fully loaded.

o In the fully loaded scenario, schbench seems to be unhappy. Looking at the
data from /proc/<pid>/sched for the tasks with schedstats enabled,
there is an increase in the number of context switches and the total wait
sum. When the system is overloaded, things flip and the schbench tail
latency improves drastically. I suspect the involuntary
context-switches help workers make progress much sooner after wakeup
compared to tip, thus leading to lower tail latency.

o For the same reason as above, tbench throughput takes a hit, with the
number of involuntary context-switches increasing drastically for the
tbench server. There is also a noticeable increase in wait sum.

o A couple of real world workloads were also tested. DeathStarBench
throughput tanks much more with the updated version in your tree
compared to this series as is.
SpecJBB Max-jOPS sees large improvements, but this comes at the cost of
a drop in Critical-jOPS, signifying an increase in either wait time
or involuntary context-switches, which can lead to
transactions taking longer to complete.

o Apart from DeathStarBench, all the trends reported remain the same when
comparing the version in your tree and this series, as is, applied
on the same base kernel.

I'll leave the detailed results below and some limited analysis.

On 3/6/2023 6:55 PM, Peter Zijlstra wrote:
> Hi!
>
> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> not make more sense, and I did point Vincent at some older patches I had for
> that (which is where his augmented rbtree thing comes from).
>
> Also, since I really dislike the dual tree, I also figured we could dynamically
> switch between an augmented tree and not (and while I have code for that,
> that's not included in this posting because with the current results I don't
> think we actually need this).
>
> Anyway, since I'm somewhat under the weather, I spent last week desperately
> trying to connect a small cluster of neurons in defiance of the snot overlord
> and bring back the EEVDF patches from the dark crypts where they'd been
> gathering cobwebs for the past 13 odd years.
>
> By Friday they worked well enough, and this morning (because obviously I forgot
> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
> indicates a total fail.
>
> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
> more consistently than CFS and has a bunch of latency wins )
>
> ( hackbench also doesn't show the augmented tree and generally more expensive
> pick to be a loss, in fact it shows a slight win here )
>
>
> hackbench load + cyclictest --policy other results:
>
>
> EEVDF CFS
>
> # Min Latencies: 00053
> LNICE(19) # Avg Latencies: 04350
> # Max Latencies: 76019
>
> # Min Latencies: 00052 00053
> LNICE(0) # Avg Latencies: 00690 00687
> # Max Latencies: 14145 13913
>
> # Min Latencies: 00019
> LNICE(-19) # Avg Latencies: 00261
> # Max Latencies: 05642
>

Following are the results from testing the series on a dual socket
Zen3 machine (2 x 64C/128T):

NPS modes are used to logically divide a single socket into
multiple NUMA regions.
Following is the NUMA configuration for each NPS mode on the system:

NPS1: Each socket is a NUMA node.
Total 2 NUMA nodes in the dual socket machine.

Node 0: 0-63, 128-191
Node 1: 64-127, 192-255

NPS2: Each socket is further logically divided into 2 NUMA regions.
Total 4 NUMA nodes exist over 2 sockets.

Node 0: 0-31, 128-159
Node 1: 32-63, 160-191
Node 2: 64-95, 192-223
Node 3: 96-127, 224-255

NPS4: Each socket is logically divided into 4 NUMA regions.
Total 8 NUMA nodes exist over 2 sockets.

Node 0: 0-15, 128-143
Node 1: 16-31, 144-159
Node 2: 32-47, 160-175
Node 3: 48-63, 176-191
Node 4: 64-79, 192-207
Node 5: 80-95, 208-223
Node 6: 96-111, 224-239
Node 7: 112-127, 240-255

Kernel versions:
- tip: 6.2.0-rc6 tip sched/core
- eevdf: 6.2.0-rc6 tip sched/core
+ eevdf commits from your tree
(https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/eevdf)

- eevdf prev: 6.2.0-rc6 tip sched/core + this series as is

When the testing started, the tip was at:
commit 7c4a5b89a0b5 "sched/rt: pick_next_rt_entity(): check list_entry"

Benchmark Results:

~~~~~~~~~~~~~
~ hackbench ~
~~~~~~~~~~~~~

o NPS1

Test: tip eevdf
1-groups: 4.63 (0.00 pct) 4.52 (2.37 pct)
2-groups: 4.42 (0.00 pct) 5.41 (-22.39 pct) *
4-groups: 4.21 (0.00 pct) 5.26 (-24.94 pct) *
8-groups: 4.95 (0.00 pct) 5.01 (-1.21 pct)
16-groups: 5.43 (0.00 pct) 6.24 (-14.91 pct) *

o NPS2

Test: tip eevdf
1-groups: 4.68 (0.00 pct) 4.56 (2.56 pct)
2-groups: 4.45 (0.00 pct) 5.19 (-16.62 pct) *
4-groups: 4.19 (0.00 pct) 4.53 (-8.11 pct) *
8-groups: 4.80 (0.00 pct) 4.81 (-0.20 pct)
16-groups: 5.60 (0.00 pct) 6.22 (-11.07 pct) *

o NPS4

Test: tip eevdf
1-groups: 4.68 (0.00 pct) 4.57 (2.35 pct)
2-groups: 4.56 (0.00 pct) 5.19 (-13.81 pct) *
4-groups: 4.50 (0.00 pct) 4.96 (-10.22 pct) *
8-groups: 5.76 (0.00 pct) 5.49 (4.68 pct)
16-groups: 5.60 (0.00 pct) 6.53 (-16.60 pct) *

~~~~~~~~~~~~
~ schbench ~
~~~~~~~~~~~~

o NPS1

#workers: tip eevdf
1: 36.00 (0.00 pct) 36.00 (0.00 pct)
2: 37.00 (0.00 pct) 37.00 (0.00 pct)
4: 38.00 (0.00 pct) 39.00 (-2.63 pct)
8: 52.00 (0.00 pct) 50.00 (3.84 pct)
16: 66.00 (0.00 pct) 68.00 (-3.03 pct)
32: 111.00 (0.00 pct) 109.00 (1.80 pct)
64: 213.00 (0.00 pct) 212.00 (0.46 pct)
128: 502.00 (0.00 pct) 637.00 (-26.89 pct) *
256: 45632.00 (0.00 pct) 24992.00 (45.23 pct) ^
512: 78720.00 (0.00 pct) 44096.00 (43.98 pct) ^

o NPS2

#workers: tip eevdf
1: 31.00 (0.00 pct) 23.00 (25.80 pct)
2: 32.00 (0.00 pct) 33.00 (-3.12 pct)
4: 39.00 (0.00 pct) 37.00 (5.12 pct)
8: 52.00 (0.00 pct) 49.00 (5.76 pct)
16: 67.00 (0.00 pct) 68.00 (-1.49 pct)
32: 113.00 (0.00 pct) 112.00 (0.88 pct)
64: 213.00 (0.00 pct) 214.00 (-0.46 pct)
128: 508.00 (0.00 pct) 491.00 (3.34 pct)
256: 46912.00 (0.00 pct) 22304.00 (52.45 pct) ^
512: 76672.00 (0.00 pct) 42944.00 (43.98 pct) ^

o NPS4

#workers: tip eevdf
1: 33.00 (0.00 pct) 30.00 (9.09 pct)
2: 40.00 (0.00 pct) 36.00 (10.00 pct)
4: 44.00 (0.00 pct) 41.00 (6.81 pct)
8: 73.00 (0.00 pct) 73.00 (0.00 pct)
16: 71.00 (0.00 pct) 71.00 (0.00 pct)
32: 111.00 (0.00 pct) 115.00 (-3.60 pct)
64: 217.00 (0.00 pct) 211.00 (2.76 pct)
128: 509.00 (0.00 pct) 553.00 (-8.64 pct) *
256: 44352.00 (0.00 pct) 26848.00 (39.46 pct) ^
512: 75392.00 (0.00 pct) 44352.00 (41.17 pct) ^


~~~~~~~~~~
~ tbench ~
~~~~~~~~~~

o NPS1

Clients: tip eevdf
1 483.10 (0.00 pct) 476.46 (-1.37 pct)
2 956.03 (0.00 pct) 943.12 (-1.35 pct)
4 1786.36 (0.00 pct) 1760.64 (-1.43 pct)
8 3304.47 (0.00 pct) 3105.19 (-6.03 pct)
16 5440.44 (0.00 pct) 5609.24 (3.10 pct)
32 10462.02 (0.00 pct) 10416.02 (-0.43 pct)
64 18995.99 (0.00 pct) 19317.34 (1.69 pct)
128 27896.44 (0.00 pct) 28459.38 (2.01 pct)
256 49742.89 (0.00 pct) 46371.44 (-6.77 pct) *
512 49583.01 (0.00 pct) 45717.22 (-7.79 pct) *
1024 48467.75 (0.00 pct) 43475.31 (-10.30 pct) *

o NPS2

Clients: tip eevdf
1 472.57 (0.00 pct) 475.35 (0.58 pct)
2 938.27 (0.00 pct) 942.19 (0.41 pct)
4 1764.34 (0.00 pct) 1783.50 (1.08 pct)
8 3043.57 (0.00 pct) 3205.85 (5.33 pct)
16 5103.53 (0.00 pct) 5154.94 (1.00 pct)
32 9767.22 (0.00 pct) 9793.81 (0.27 pct)
64 18712.65 (0.00 pct) 18601.10 (-0.59 pct)
128 27691.95 (0.00 pct) 27542.57 (-0.53 pct)
256 47939.24 (0.00 pct) 43401.62 (-9.46 pct) *
512 47843.70 (0.00 pct) 43971.16 (-8.09 pct) *
1024 48412.05 (0.00 pct) 42808.58 (-11.57 pct) *

o NPS4

Clients: tip eevdf
1 486.74 (0.00 pct) 484.88 (-0.38 pct)
2 950.50 (0.00 pct) 950.04 (-0.04 pct)
4 1778.58 (0.00 pct) 1796.03 (0.98 pct)
8 3106.36 (0.00 pct) 3180.09 (2.37 pct)
16 5139.81 (0.00 pct) 5139.50 (0.00 pct)
32 9911.04 (0.00 pct) 10086.37 (1.76 pct)
64 18201.46 (0.00 pct) 18289.40 (0.48 pct)
128 27284.67 (0.00 pct) 26947.19 (-1.23 pct)
256 46793.72 (0.00 pct) 43971.87 (-6.03 pct) *
512 48841.96 (0.00 pct) 44255.01 (-9.39 pct) *
1024 48811.99 (0.00 pct) 43118.99 (-11.66 pct) *

~~~~~~~~~~
~ stream ~
~~~~~~~~~~

o NPS1

- 10 Runs:

Test: tip eevdf
Copy: 321229.54 (0.00 pct) 332975.45 (3.65 pct)
Scale: 207471.32 (0.00 pct) 212534.83 (2.44 pct)
Add: 234962.15 (0.00 pct) 243011.39 (3.42 pct)
Triad: 246256.00 (0.00 pct) 256453.73 (4.14 pct)

- 100 Runs:

Test: tip eevdf
Copy: 332714.94 (0.00 pct) 333183.42 (0.14 pct)
Scale: 216140.84 (0.00 pct) 212160.53 (-1.84 pct)
Add: 239605.00 (0.00 pct) 233168.69 (-2.68 pct)
Triad: 258580.84 (0.00 pct) 256972.33 (-0.62 pct)

o NPS2

- 10 Runs:

Test: tip eevdf
Copy: 324423.92 (0.00 pct) 340685.20 (5.01 pct)
Scale: 215993.56 (0.00 pct) 217895.31 (0.88 pct)
Add: 250590.28 (0.00 pct) 257495.12 (2.75 pct)
Triad: 261284.44 (0.00 pct) 261373.49 (0.03 pct)

- 100 Runs:

Test: tip eevdf
Copy: 325993.72 (0.00 pct) 341244.18 (4.67 pct)
Scale: 227201.27 (0.00 pct) 227255.98 (0.02 pct)
Add: 256601.84 (0.00 pct) 258026.75 (0.55 pct)
Triad: 260222.19 (0.00 pct) 269878.75 (3.71 pct)

o NPS4

- 10 Runs:

Test: tip eevdf
Copy: 356850.80 (0.00 pct) 371230.27 (4.02 pct)
Scale: 247219.39 (0.00 pct) 237846.20 (-3.79 pct)
Add: 268588.78 (0.00 pct) 261088.54 (-2.79 pct)
Triad: 272932.59 (0.00 pct) 284068.07 (4.07 pct)

- 100 Runs:

Test: tip eevdf
Copy: 365965.18 (0.00 pct) 371186.97 (1.42 pct)
Scale: 246068.58 (0.00 pct) 245991.10 (-0.03 pct)
Add: 263677.73 (0.00 pct) 269021.14 (2.02 pct)
Triad: 273701.36 (0.00 pct) 280566.44 (2.50 pct)

~~~~~~~~~~~~~
~ Unixbench ~
~~~~~~~~~~~~~

o NPS1

Test Metric Parallelism tip eevdf
unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49077561.21 ( 0.00%) 49144835.64 ( 0.14%)
unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6285373890.61 ( 0.00%) 6270537933.92 ( -0.24%)
unixbench-syscall Amean unixbench-syscall-1 2664815.40 ( 0.00%) 2679289.17 * -0.54%*
unixbench-syscall Amean unixbench-syscall-512 7848462.70 ( 0.00%) 7456802.37 * 4.99%*
unixbench-pipe Hmean unixbench-pipe-1 2531131.89 ( 0.00%) 2475863.05 * -2.18%*
unixbench-pipe Hmean unixbench-pipe-512 305171024.40 ( 0.00%) 301182156.60 ( -1.31%)
unixbench-spawn Hmean unixbench-spawn-1 4058.05 ( 0.00%) 4284.38 * 5.58%*
unixbench-spawn Hmean unixbench-spawn-512 79893.24 ( 0.00%) 78234.45 * -2.08%*
unixbench-execl Hmean unixbench-execl-1 4148.64 ( 0.00%) 4086.73 * -1.49%*
unixbench-execl Hmean unixbench-execl-512 11077.20 ( 0.00%) 11137.79 ( 0.55%)

o NPS2

Test Metric Parallelism tip eevdf
unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49394822.56 ( 0.00%) 49175574.26 ( -0.44%)
unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6267817215.36 ( 0.00%) 6282838979.08 * 0.24%*
unixbench-syscall Amean unixbench-syscall-1 2663675.03 ( 0.00%) 2677018.53 * -0.50%*
unixbench-syscall Amean unixbench-syscall-512 7342392.90 ( 0.00%) 7443264.00 * -1.37%*
unixbench-pipe Hmean unixbench-pipe-1 2533194.04 ( 0.00%) 2475969.01 * -2.26%*
unixbench-pipe Hmean unixbench-pipe-512 303588239.03 ( 0.00%) 302217597.98 * -0.45%*
unixbench-spawn Hmean unixbench-spawn-1 5141.40 ( 0.00%) 4862.78 ( -5.42%) *
unixbench-spawn Hmean unixbench-spawn-512 82993.79 ( 0.00%) 79139.42 * -4.64%* *
unixbench-execl Hmean unixbench-execl-1 4140.15 ( 0.00%) 4084.20 * -1.35%*
unixbench-execl Hmean unixbench-execl-512 12229.25 ( 0.00%) 11445.22 ( -6.41%) *

o NPS4

Test Metric Parallelism tip eevdf
unixbench-dhry2reg Hmean unixbench-dhry2reg-1 48970677.27 ( 0.00%) 49070289.56 ( 0.20%)
unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6297506696.81 ( 0.00%) 6311038905.07 ( 0.21%)
unixbench-syscall Amean unixbench-syscall-1 2664715.13 ( 0.00%) 2677752.20 * -0.49%*
unixbench-syscall Amean unixbench-syscall-512 7938670.70 ( 0.00%) 7972291.60 ( -0.42%)
unixbench-pipe Hmean unixbench-pipe-1 2527605.54 ( 0.00%) 2476140.77 * -2.04%*
unixbench-pipe Hmean unixbench-pipe-512 305068507.23 ( 0.00%) 304114548.50 ( -0.31%)
unixbench-spawn Hmean unixbench-spawn-1 5207.34 ( 0.00%) 4964.39 ( -4.67%) *
unixbench-spawn Hmean unixbench-spawn-512 81352.38 ( 0.00%) 74467.00 * -8.46%* *
unixbench-execl Hmean unixbench-execl-1 4131.37 ( 0.00%) 4044.09 * -2.11%*
unixbench-execl Hmean unixbench-execl-512 13025.56 ( 0.00%) 11124.77 * -14.59%* *

~~~~~~~~~~~
~ netperf ~
~~~~~~~~~~~

o NPS1

tip eevdf
1-clients: 107932.22 (0.00 pct) 106167.39 (-1.63 pct)
2-clients: 106887.99 (0.00 pct) 105304.25 (-1.48 pct)
4-clients: 106676.11 (0.00 pct) 104328.10 (-2.20 pct)
8-clients: 98645.45 (0.00 pct) 94076.26 (-4.63 pct)
16-clients: 88881.23 (0.00 pct) 86831.85 (-2.30 pct)
32-clients: 86654.28 (0.00 pct) 86313.80 (-0.39 pct)
64-clients: 81431.90 (0.00 pct) 74885.75 (-8.03 pct)
128-clients: 55993.77 (0.00 pct) 55378.10 (-1.09 pct)
256-clients: 43865.59 (0.00 pct) 44326.30 (1.05 pct)

o NPS2

tip eevdf
1-clients: 106711.81 (0.00 pct) 108576.27 (1.74 pct)
2-clients: 106987.79 (0.00 pct) 108348.24 (1.27 pct)
4-clients: 105275.37 (0.00 pct) 105702.12 (0.40 pct)
8-clients: 103028.31 (0.00 pct) 96250.20 (-6.57 pct)
16-clients: 87382.43 (0.00 pct) 87683.29 (0.34 pct)
32-clients: 86578.14 (0.00 pct) 86968.29 (0.45 pct)
64-clients: 81470.63 (0.00 pct) 75906.15 (-6.83 pct)
128-clients: 54803.35 (0.00 pct) 55051.90 (0.45 pct)
256-clients: 42910.29 (0.00 pct) 44062.33 (2.68 pct)

~~~~~~~~~~~
~ SpecJBB ~
~~~~~~~~~~~

o NPS1

tip eevdf
Max-jOPS 100% 115.71% (+15.71%) ^
Critical-jOPS 100% 93.59% (-6.41%) *

~~~~~~~~~~~~~~~~~~
~ DeathStarBench ~
~~~~~~~~~~~~~~~~~~

o NPS1

#CCX                               1 CCX     2 CCX     3 CCX     4 CCX
o eevdf compared to tip           -10.93    -14.35     -9.74     -6.07
o eevdf prev (this series as is)
  compared to tip                  -1.99     -6.64     -4.99     -3.87

Note: #CCX is the number of LLCs the services are pinned to.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Some Preliminary Analysis ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tl;dr

- There seems to be an increase in the number of involuntary context switches
when the system is overloaded. This probably allows newly waking tasks to
make progress, benefiting latency sensitive workloads like schbench in the
overloaded scenario compared to tip, but hurts tbench performance.
When the system is fully loaded, the larger average wait time seems to hurt
the schbench performance.
More analysis is needed to get to the bottom of the problem.

- For the hackbench 2-groups scenario, the wait time seems to go up
drastically.

Scheduler statistics of interest are listed in detail below.

Note: Units of all metrics denoting time are ms. They are processed from
the per-task schedstats in /proc/<pid>/sched.
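
For anyone who wants to reproduce the aggregation, a rough sketch of pulling a
single field out of the per-task files is below; it assumes the usual
"name : value" layout of /proc/<pid>/sched, and the exact fields available
depend on the kernel version and CONFIG_SCHEDSTATS.

/* Sum one /proc/<pid>/sched field over a set of PIDs; illustrative only. */
#include <stdio.h>
#include <string.h>

static double field_from_pid(const char *pid, const char *field)
{
	char path[64], line[256], name[128];
	double val = 0.0, v;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/sched", pid);
	f = fopen(path, "r");
	if (!f)
		return 0.0;
	while (fgets(line, sizeof(line), f)) {
		/* lines look like "nr_switches    :    12345" */
		if (sscanf(line, "%127[^ :] : %lf", name, &v) == 2 &&
		    !strcmp(name, field))
			val = v;
	}
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	double sum = 0.0;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <field> <pid>...\n", argv[0]);
		return 1;
	}
	for (int i = 2; i < argc; i++)
		sum += field_from_pid(argv[i], argv[1]);
	printf("sum of %s over %d tasks: %f\n", argv[1], argc - 2, sum);
	return 0;
}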

o Hackbench (2 Groups) (NPS1)

tip eevdf %diff
Comm sched-messaging sched-messaging N/A
Sum of avg_atom 282.0024818 19.04355233 -93.24702669
Average of avg_atom 3.481512121 0.235105584 -93.24702669
Sum of avg_per_cpu 1761.949461 61.52537145 -96.50810805
Average of avg_per_cpu 21.75246248 0.759572487 -96.50810805
Average of avg_wait_time 0.007239228 0.012899105 78.18343632
Sum of nr_switches 4897740 4728784 -3.449672706
Sum of nr_voluntary_switches 4742512 4621606 -2.549408415
Sum of nr_involuntary_switches 155228 107178 -30.95446698
Sum of nr_wakeups 4742648 4623175 -2.51912012
Sum of nr_migrations 1263925 930600 -26.37221354
Sum of sum_exec_runtime 288481.15 262255.2574 -9.091024712
Sum of sum_idle_runtime 2576164.568 2851759.68 10.69788457
Sum of sum_sleep_runtime 76890.14753 78632.31679 2.265789982
Sum of wait_count 4897894 4728939 -3.449543824
Sum of wait_sum 3041.78227 24167.4694 694.5167422

o schbench (2 messengers, 128 workers - fully loaded) (NPS1)

tip eevdf %diff
Comm schbench schbench N/A
Sum of avg_atom 7538.162897 7289.565705 -3.297848503
Average of avg_atom 29.10487605 28.14504133 -3.297848503
Sum of avg_per_cpu 630248.6079 471215.3671 -25.23341406
Average of avg_per_cpu 2433.392309 1819.364352 -25.23341406
Average of avg_wait_time 0.054147456 25.34304285 46703.75524
Sum of nr_switches 85210 88176 3.480812111
Sum of nr_voluntary_switches 83165 83457 0.351109241
Sum of nr_involuntary_switches 2045 4719 130.7579462
Sum of nr_wakeups 83168 83459 0.34989419
Sum of nr_migrations 3265 3025 -7.350689127
Sum of sum_exec_runtime 2476504.52 2469058.164 -0.300680129
Sum of sum_idle_runtime 110294825.8 132520924.2 20.15153321
Sum of sum_sleep_runtime 5293337.741 5297778.714 0.083897408
Sum of sum_block_runtime 56.043253 15.12936 -73.00413664
Sum of wait_count 85615 88606 3.493546692
Sum of wait_sum 4653.340163 9605.221964 106.4156418

o schbench (2 messengers, 256 workers - overloaded) (NPS1)

tip eevdf %diff
Comm schbench schbench N/A
Sum of avg_atom 11676.77306 4803.485728 -58.8629007
Average of avg_atom 22.67334574 9.327156753 -58.8629007
Sum of avg_per_cpu 55235.68013 38286.47722 -30.68524343
Average of avg_per_cpu 107.2537478 74.34267421 -30.68524343
Average of avg_wait_time 2.23189096 2.58191945 15.68304621
Sum of nr_switches 202862 425258 109.6292061
Sum of nr_voluntary_switches 163079 165058 1.213522281
Sum of nr_involuntary_switches 39783 260200 554.0482115
Sum of nr_wakeups 163082 165058 1.211660392
Sum of nr_migrations 44199 54894 24.19738003
Sum of sum_exec_runtime 4586675.667 3963846.024 -13.57910801
Sum of sum_idle_runtime 201050644.2 195126863.7 -2.946412087
Sum of sum_sleep_runtime 10418117.66 10402686.4 -0.148119407
Sum of sum_block_runtime 1548.979156 516.115078 -66.68030838
Sum of wait_count 203377 425792 109.3609405
Sum of wait_sum 455609.3122 1100885.201 141.6292142

o tbench (256 clients - overloaded) (NPS1)

- tbench client
tip eevdf % diff
comm tbench tbench N/A
Sum of avg_atom 3.594587941 5.112101854 42.21663064
Average of avg_atom 0.013986724 0.019891447 42.21663064
Sum of avg_per_cpu 392838.0975 142065.4206 -63.83613975
Average of avg_per_cpu 1528.552909 552.7837377 -63.83613975
Average of avg_wait_time 0.010512441 0.006861579 -34.72895916
Sum of nr_switches 692845080 511780111 -26.1335433
Sum of nr_voluntary_switches 178151085 371234907 108.3820635
Sum of nr_involuntary_switches 514693995 140545204 -72.69344399
Sum of nr_wakeups 178151085 371234909 108.3820646
Sum of nr_migrations 45279 71177 57.19649286
Sum of sum_exec_runtime 9192343.465 9624025.792 4.69610746
Sum of sum_idle_runtime 7125370.721 16145736.39 126.5950365
Sum of sum_sleep_runtime 2222469.726 5792868.629 160.650058
Sum of sum_block_runtime 68.60879 446.080476 550.1797743
Sum of wait_count 692845479 511780543 -26.13352349
Sum of wait_sum 7287852.246 3297894.139 -54.7480653

- tbench server

tip eevdf % diff
Comm tbench_srv tbench_srv N/A
Sum of avg_atom 5.077837807 5.447267364 7.275331971
Average of avg_atom2 0.019758124 0.021195593 7.275331971
Sum of avg_per_cpu 538586.1634 87925.51225 -83.67475471
Average of avg_per_cpu2 2095.666006 342.1226158 -83.67475471
Average of avg_wait_time 0.000827346 0.006505748 686.3392261
Sum of nr_switches 692980666 511838912 -26.13951051
Sum of nr_voluntary_switches 690367607 390304935 -43.46418762
Sum of nr_involuntary_switches 2613059 121533977 4551.023073
Sum of nr_wakeups 690367607 390304935 -43.46418762
Sum of nr_migrations 39486 84474 113.9340526
Sum of sum_exec_runtime 9176708.278 8734423.401 -4.819646259
Sum of sum_idle_runtime 413900.3645 447180.3879 8.040588086
Sum of sum_sleep_runtime 8966201.976 6690818.107 -25.37734345
Sum of sum_block_runtime 1.776413 1.617435 -8.949382829
Sum of wait_count 692980942 511839229 -26.13949418
Sum of wait_sum 565739.6984 3295519.077 482.5150836

>
> The nice -19 numbers aren't as pretty as Vincent's, but at the end I was going
> cross-eyed from staring at tree prints and I just couldn't figure out where it
> was going side-ways.
>
> There's definitely more benchmarking/tweaking to be done (0-day already
> reported a stress-ng loss), but if we can pull this off we can delete a whole
> bunch of icky heuristics code. EEVDF is a much better defined policy than what
> we currently have.
>

DeathStarBench and SpecJBB are slightly more complex to analyze. I'll
get the schedstat data for both soon. I'll rerun some of the above
workloads with NO_PRESERVE_LAG to see if that makes any difference.
In the meantime, if you need more data from the test system for any
particular workload, please do let me know. I will collect the per-task
and system-wide schedstat data for the workload as it is rather
inexpensive to collect and gives good insights but if you need any
other data, I'll be more than happy to get those too for analysis.
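
(For completeness, toggling the feature for such a rerun is just a write to
the sched features debugfs file; a small sketch is below. The path is an
assumption -- recent kernels expose it as /sys/kernel/debug/sched/features,
older ones as /sys/kernel/debug/sched_features.)

/* Sketch: turn PRESERVE_LAG off before a benchmark run; illustrative only. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/sched/features", "w");

	if (!f) {
		perror("sched features");
		return 1;
	}
	fputs("NO_PRESERVE_LAG", f);	/* writing "PRESERVE_LAG" turns it back on */
	fclose(f);
	return 0;
}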

--
Thanks and Regards,
Prateek

2023-03-22 09:47:40

by K Prateek Nayak

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice

Hello Peter,

One important detail I forgot to mention: When I picked eevdf commits
from your tree
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/core),
they were based on v6.3-rc1 with the sched/eevdf HEAD at:

commit: 0dddbc0b54ad ("sched/fair: Implement an EEVDF like policy")

On 3/22/2023 12:19 PM, K Prateek Nayak wrote:
> Hello Peter,
>
> Leaving some results from my testing on a dual socket Zen3 machine
> (2 x 64C/128T) below.
>
> tl;dr
>
> o I've not tested workloads with nice and latency nice yet focusing more
> on the out of the box performance. No changes to sched_feat were made
> for the same reason.
>
> o Except for hackbench (m:n communication relationship), I do not see any
> regression for other standard benchmarks (mostly 1:1 or 1:n relations)
> when the system is below fully loaded.
>
> o In the fully loaded scenario, schbench seems to be unhappy. Looking at the
> data from /proc/<pid>/sched for the tasks with schedstats enabled,
> there is an increase in the number of context switches and the total wait
> sum. When the system is overloaded, things flip and the schbench tail
> latency improves drastically. I suspect the involuntary
> context-switches help workers make progress much sooner after wakeup
> compared to tip, thus leading to lower tail latency.
>
> o For the same reason as above, tbench throughput takes a hit, with the
> number of involuntary context-switches increasing drastically for the
> tbench server. There is also a noticeable increase in wait sum.
>
> o A couple of real world workloads were also tested. DeathStarBench
> throughput tanks much more with the updated version in your tree
> compared to this series as is.
> SpecJBB Max-jOPS sees large improvements, but this comes at the cost of
> a drop in Critical-jOPS, signifying an increase in either wait time
> or involuntary context-switches, which can lead to
> transactions taking longer to complete.
>
> o Apart from DeathStarBench, all the trends reported remain the same when
> comparing the version in your tree and this series, as is, applied
> on the same base kernel.
>
> I'll leave the detailed results below and some limited analysis.
>
> On 3/6/2023 6:55 PM, Peter Zijlstra wrote:
>> Hi!
>>
>> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
>> not make more sense, and I did point Vincent at some older patches I had for
>> that (which is where his augmented rbtree thing comes from).
>>
>> Also, since I really dislike the dual tree, I also figured we could dynamically
>> switch between an augmented tree and not (and while I have code for that,
>> that's not included in this posting because with the current results I don't
>> think we actually need this).
>>
>> Anyway, since I'm somewhat under the weather, I spent last week desperately
>> trying to connect a small cluster of neurons in defiance of the snot overlord
>> and bring back the EEVDF patches from the dark crypts where they'd been
>> gathering cobwebs for the past 13 odd years.
>>
>> By Friday they worked well enough, and this morning (because obviously I forgot
>> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
>> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
>> indicates a total fail.
>>
>> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
>> more consistently than CFS and has a bunch of latency wins )
>>
>> ( hackbench also doesn't show the augmented tree and generally more expensive
>> pick to be a loss, in fact it shows a slight win here )
>>
>>
>> hackbench load + cyclictest --policy other results:
>>
>>
>> EEVDF CFS
>>
>> # Min Latencies: 00053
>> LNICE(19) # Avg Latencies: 04350
>> # Max Latencies: 76019
>>
>> # Min Latencies: 00052 00053
>> LNICE(0) # Avg Latencies: 00690 00687
>> # Max Latencies: 14145 13913
>>
>> # Min Latencies: 00019
>> LNICE(-19) # Avg Latencies: 00261
>> # Max Latencies: 05642
>>
>
> Following are the results from testing the series on a dual socket
> Zen3 machine (2 x 64C/128T):
>
> NPS modes are used to logically divide a single socket into
> multiple NUMA regions.
> Following is the NUMA configuration for each NPS mode on the system:
>
> NPS1: Each socket is a NUMA node.
> Total 2 NUMA nodes in the dual socket machine.
>
> Node 0: 0-63, 128-191
> Node 1: 64-127, 192-255
>
> NPS2: Each socket is further logically divided into 2 NUMA regions.
> Total 4 NUMA nodes exist over 2 sockets.
>
> Node 0: 0-31, 128-159
> Node 1: 32-63, 160-191
> Node 2: 64-95, 192-223
> Node 3: 96-127, 224-255
>
> NPS4: Each socket is logically divided into 4 NUMA regions.
> Total 8 NUMA nodes exist over 2 sockets.
>
> Node 0: 0-15, 128-143
> Node 1: 16-31, 144-159
> Node 2: 32-47, 160-175
> Node 3: 48-63, 176-191
> Node 4: 64-79, 192-207
> Node 5: 80-95, 208-223
> Node 6: 96-111, 224-239
> Node 7: 112-127, 240-255
>
> Kernel versions:
> - tip: 6.2.0-rc6 tip sched/core
> - eevdf: 6.2.0-rc6 tip sched/core
> + eevdf commits from your tree
> (https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/eevdf)

I had cherry picked the following commits for eevdf:

commit: b84a8f6b6fa3 ("sched: Introduce latency-nice as a per-task attribute")
commit: eea7fc6f13b4 ("sched/core: Propagate parent task's latency requirements to the child task")
commit: a143d2bcef65 ("sched: Allow sched_{get,set}attr to change latency_nice of the task")
commit: d9790468df14 ("sched/fair: Add latency_offset")
commit: 3d4d37acaba4 ("sched/fair: Add sched group latency support")
commit: 707840ffc8fa ("sched/fair: Add avg_vruntime")
commit: 394af9db316b ("sched/fair: Remove START_DEBIT")
commit: 89b2a2ee0e9d ("sched/fair: Add lag based placement")
commit: e3db9631d8ca ("rbtree: Add rb_add_augmented_cached() helper")
commit: 0dddbc0b54ad ("sched/fair: Implement an EEVDF like policy")

from the sched/eevdf branch in your tree onto the tip branch back when
I started testing. I notice some more changes have been added since then.
I'm queuing testing of the latest changes on the updated tip:sched/core based
on v6.3-rc3. I was able to cherry-pick the latest commits from
sched/eevdf cleanly.

>
> - eevdf prev: 6.2.0-rc6 tip sched/core + this series as is
>
> When the testing started, the tip was at:
> commit 7c4a5b89a0b5 "sched/rt: pick_next_rt_entity(): check list_entry"
> [..snip..]
>
--
Thanks and Regards,
Prateek

2023-03-23 11:57:43

by Pavel Machek

Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice

Hi!

> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> not make more sense, and I did point Vincent at some older patches I had for
> that (which is where his augmented rbtree thing comes from).

Link for context: https://lwn.net/Articles/925371/ . "EEVDF" (Earliest
Eligible Virtual Deadline First) is not a commonly known acronym :-).

BR, Pavel


--
People of Russia, stop Putin before his war on Ukraine escalates.

