by Vincent Guittot

Subject: Re: [PATCH 5/9] sched/fair: Take into account latency priority at wakeup

On Wed, 30 Nov 2022 at 04:10, Joel Fernandes <[email protected]> wrote:
>
> Hi Vincent,
>
> On Tue, Nov 29, 2022 at 5:21 PM Vincent Guittot
> <[email protected]> wrote:
> [...]
> > > >>> }
> > > >>>
> > > >>> /*
> > > >>> @@ -7544,7 +7558,7 @@ static int __sched_setscheduler(struct task_struct *p,
> > > >>> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
> > > >>> goto change;
> > > >>> if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
> > > >>> - attr->sched_latency_nice != p->latency_nice)
> > > >>> + attr->sched_latency_nice != LATENCY_TO_NICE(p->latency_prio))
> > > >>> goto change;
> > > >>>
> > > >>> p->sched_reset_on_fork = reset_on_fork;
> > > >>> @@ -8085,7 +8099,7 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
> > > >>> get_params(p, &kattr);
> > > >>> kattr.sched_flags &= SCHED_FLAG_ALL;
> > > >>>
> > > >>> - kattr.sched_latency_nice = p->latency_nice;
> > > >>> + kattr.sched_latency_nice = LATENCY_TO_NICE(p->latency_prio);
> > > >>>
> > > >>> #ifdef CONFIG_UCLAMP_TASK
> > > >>> /*
> > > >>> @@ -11294,6 +11308,20 @@ const u32 sched_prio_to_wmult[40] = {
> > > >>> /* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
> > > >>> };
> > > >>>
> > > >>> +/*
> > > >>> + * latency weight for wakeup preemption
> > > >>> + */
> > > >>> +const int sched_latency_to_weight[40] = {
> > > >>> + /* -20 */ -1024, -973, -922, -870, -819,
> > > >>> + /* -15 */ -768, -717, -666, -614, -563,
> > > >>> + /* -10 */ -512, -461, -410, -358, -307,
> > > >>> + /* -5 */ -256, -205, -154, -102, -51,
> > > >>> + /* 0 */ 0, 51, 102, 154, 205,
> > > >>> + /* 5 */ 256, 307, 358, 410, 461,
> > > >>> + /* 10 */ 512, 563, 614, 666, 717,
> > > >>> + /* 15 */ 768, 819, 870, 922, 973,
> > > >>> +};
> > > >>> +
> > > >>
> > > >> The table is linear. You could approximate this as: weight = nice * 51
> > > >> since it is a linear scale and do the conversion in place.
> > > >>
> > > >> Or, since the only place you are using the latency_to_weight is in
> > > >> set_latency_offset(), can we drop the sched_latency_to_weight array
> > > >> and simplify as follows?
> > > >
> > > > > It's also used in the cgroup patch and keeps coherency between
> > > > > nice/weight and latency_nice/offset, so I prefer
> > >
> > > > I don’t think it’s a valid comparison, as the nice/weight conversion is non-linear and a table makes sense there: weight = 1024 / 1.25 ^ nice
> > >
> > > > keeping current
> > > > implementation
> > >
> > > > I could be missing something, but since it's a linear scale, why does cgroup need weight at all? Just store nice directly. Why would that not work?
> > > >
> > > > In the end the TG and SE have the latency offset in the struct, and that is all you care about. All the conversion back and forth is unnecessary, as it is a linear scale; it just increases LOC and takes more memory to store linear arrays.
> > >
> > > Again I could be missing something and I will try to play with your series and see if I can show you what I mean (or convince myself it’s needed).
> >
> > I get what you mean but I think that having an array gives latitude to
> > adjust this internal offset mapping at a minimum cost of a const array
>
> Ok that makes sense. If you feel like there might be updates in the
> future to this mapping array (like changing the constants as you
> mentioned), then I am Ok with us keeping it.
>
> Reviewed-by: Joel Fernandes (Google) <[email protected]>
>
> I am excited about your series, the CFS latency issues have been
> thorny. This feels like a step forward in the right direction. Cheers,

Thanks
Vincent

>
> - Joel
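
For readers following the linearity argument above, here is a minimal C sketch (not part of the patch) that checks the quoted table against two closed forms: `nice * 1024 / 20` rounded to the nearest integer, and the `nice * 51` approximation suggested in the review. The helper names `latency_nice_to_weight_exact()`, `table_matches_exact_formula()` and `max_error_of_nice_times_51()` are hypothetical and exist only for this illustration; only the table values come from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Table values copied from the quoted patch (latency nice -20..19). */
static const int sched_latency_to_weight[40] = {
	/* -20 */ -1024, -973, -922, -870, -819,
	/* -15 */  -768, -717, -666, -614, -563,
	/* -10 */  -512, -461, -410, -358, -307,
	/*  -5 */  -256, -205, -154, -102,  -51,
	/*   0 */     0,   51,  102,  154,  205,
	/*   5 */   256,  307,  358,  410,  461,
	/*  10 */   512,  563,  614,  666,  717,
	/*  15 */   768,  819,  870,  922,  973,
};

/*
 * Hypothetical helper, not in the patch: nice * 1024 / 20 rounded to the
 * nearest integer, using integer arithmetic only.
 */
static int latency_nice_to_weight_exact(int nice)
{
	int scaled;

	assert(nice >= -20 && nice <= 19);
	scaled = nice * 1024;
	/* Add half the divisor (with the sign of scaled) before truncating. */
	return (scaled >= 0 ? scaled + 10 : scaled - 10) / 20;
}

/* Returns 1 if every table entry equals the rounded linear formula. */
static int table_matches_exact_formula(void)
{
	int nice;

	for (nice = -20; nice <= 19; nice++)
		if (sched_latency_to_weight[nice + 20] !=
		    latency_nice_to_weight_exact(nice))
			return 0;
	return 1;
}

/* Largest absolute difference between the table and the nice * 51 shortcut. */
static int max_error_of_nice_times_51(void)
{
	int nice, err, max_err = 0;

	for (nice = -20; nice <= 19; nice++) {
		err = abs(sched_latency_to_weight[nice + 20] - nice * 51);
		if (err > max_err)
			max_err = err;
	}
	return max_err;
}
```

Under these assumptions, `table_matches_exact_formula()` returns 1 for the table as quoted, and `max_error_of_nice_times_51()` returns 4 (for instance 51 * -20 = -1020 versus -1024). So the `nice * 51` shortcut is close but not identical, while the rounded `nice * 1024 / 20` form reproduces the table exactly. Neither form offers the freedom to retune individual entries, which is the reason given above for keeping the array.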

2022-12-07 16:58:14

by K Prateek Nayak

Subject: Re: [PATCH v9 0/9] Add latency priority for CFS class

Hello Vincent,

Thank you for taking a look at the report.

On 11/28/2022 10:49 PM, Vincent Guittot wrote:
> Hi Prateek,
>
> On Mon, 28 Nov 2022 at 12:52, K Prateek Nayak <[email protected]> wrote:
>>
>> Hello Vincent,
>>
>> Following are the test results on a dual socket Zen3 machine (2 x 64C/128T)
>>
>> tl;dr
>>
>> o All benchmarks with DEFAULT_LATENCY_NICE value are comparable to tip.
>> There is, however, a noticeable dip for unixbench-spawn test case.
>>
>> o With the 2 rbtree approach, I do not see much difference in the
>> hackbench results with varying latency nice value. Tests on v5 did
>> yield noticeable improvements for hackbench.
>> (https://lore.kernel.org/lkml/[email protected]/)
>
> The 2 rbtree approach is the one that was already used in v5. I just
> reran the hackbench tests with latest tip and v6.2-rc7 and I can see a large
> performance improvement for pipe tests on my system (8 cores system).
> Could you try with a larger number of groups, like 64, 128 and 256
> groups?

Ah! My bad. I've rerun hackbench with a larger number of groups and I see a
clear win for pipes with latency nice 19. Hackbench with sockets sees a
small win too.

o pipes

$ perf bench sched messaging -p -l 50000 -g <groups>

latency_nice:   0                   19                  -20
32-groups:      9.43 (0.00 pct)     6.42 (31.91 pct)    9.75 (-3.39 pct)
64-groups:      21.55 (0.00 pct)    12.97 (39.81 pct)   21.48 (0.32 pct)
128-groups:     41.15 (0.00 pct)    24.18 (41.23 pct)   46.69 (-13.46 pct)
256-groups:     78.87 (0.00 pct)    43.65 (44.65 pct)   78.84 (0.03 pct)
512-groups:     125.48 (0.00 pct)   78.91 (37.11 pct)   136.21 (-8.55 pct)
1024-groups:    292.81 (0.00 pct)   151.36 (48.30 pct)  323.57 (-10.50 pct)

o sockets

$ perf bench sched messaging -l 100000 -g <groups>

latency_nice:   0                   19                  -20
32-groups:      27.23 (0.00 pct)    27.00 (0.84 pct)    26.92 (1.13 pct)
64-groups:      45.71 (0.00 pct)    44.58 (2.47 pct)    45.86 (-0.32 pct)
128-groups:     79.55 (0.00 pct)    78.22 (1.67 pct)    80.01 (-0.57 pct)
256-groups:     161.41 (0.00 pct)   164.04 (-1.62 pct)  169.57 (-5.05 pct)
512-groups:     326.41 (0.00 pct)   310.00 (5.02 pct)   342.17 (-4.82 pct)
1024-groups:    634.36 (0.00 pct)   633.59 (0.12 pct)   640.05 (-0.89 pct)

Note: All tests were done in NPS1 mode.
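
The percentages in parentheses appear to be the relative change in runtime against the latency_nice 0 column, with positive meaning faster. That convention is inferred from the numbers and is not stated in the report. A minimal C sketch under that assumption (the helper names `pct_improvement()` and `print_cell()` are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumed convention (inferred, not stated in the report): values are
 * runtimes, lower is better, and the percentage is the reduction relative
 * to the baseline column. Positive means faster than the baseline.
 */
static double pct_improvement(double baseline, double value)
{
	assert(baseline > 0.0);
	return (baseline - value) / baseline * 100.0;
}

/* Print one result cell in the same "value (pct)" layout as the tables. */
static void print_cell(double baseline, double value)
{
	printf("%.2f (%.2f pct)\n", value, pct_improvement(baseline, value));
}
```

For the 64-group pipe run, `pct_improvement(21.55, 12.97)` comes out at roughly 39.8, in line with the table. The last digit can differ from the posted figures, presumably because those were computed from unrounded runtimes.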

>
>>
>> [..snip..]
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ~ Unixbench - DEFAULT_LATENCY_NICE ~
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> o NPS1
>>
>> Test Metric Parallelism tip latency_nice
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 48929419.48 ( 0.00%) 49137039.06 ( 0.42%)
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6275526953.25 ( 0.00%) 6265580479.15 ( -0.16%)
>> unixbench-syscall Amean unixbench-syscall-1 2994319.73 ( 0.00%) 3008596.83 * -0.48%*
>> unixbench-syscall Amean unixbench-syscall-512 7349715.87 ( 0.00%) 7420994.50 * -0.97%*
>> unixbench-pipe Hmean unixbench-pipe-1 2830206.03 ( 0.00%) 2854405.99 * 0.86%*
>> unixbench-pipe Hmean unixbench-pipe-512 326207828.01 ( 0.00%) 328997804.52 * 0.86%*
>> unixbench-spawn Hmean unixbench-spawn-1 6394.21 ( 0.00%) 6367.75 ( -0.41%)
>> unixbench-spawn Hmean unixbench-spawn-512 72700.64 ( 0.00%) 71454.19 * -1.71%*
>> unixbench-execl Hmean unixbench-execl-1 4723.61 ( 0.00%) 4750.59 ( 0.57%)
>> unixbench-execl Hmean unixbench-execl-512 11212.05 ( 0.00%) 11262.13 ( 0.45%)
>>
>> o NPS2
>>
>> Test Metric Parallelism tip latency_nice
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49271512.85 ( 0.00%) 49245260.43 ( -0.05%)
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6267992483.03 ( 0.00%) 6264951100.67 ( -0.05%)
>> unixbench-syscall Amean unixbench-syscall-1 2995885.93 ( 0.00%) 3005975.10 * -0.34%*
>> unixbench-syscall Amean unixbench-syscall-512 7388865.77 ( 0.00%) 7276275.63 * 1.52%*
>> unixbench-pipe Hmean unixbench-pipe-1 2828971.95 ( 0.00%) 2856578.72 * 0.98%*
>> unixbench-pipe Hmean unixbench-pipe-512 326225385.37 ( 0.00%) 328941270.81 * 0.83%*
>> unixbench-spawn Hmean unixbench-spawn-1 6958.71 ( 0.00%) 6954.21 ( -0.06%)
>> unixbench-spawn Hmean unixbench-spawn-512 85443.56 ( 0.00%) 70536.42 * -17.45%* (0.67% vs 0.93% - CoEff var)
>
> I don't expect any perf improvement or regression when the latency
> nice is not changed

This regression can be ignored. Although the results from back to
back runs are very stable, I see the results vary when I rebuild
the unixbench binaries on my test setup.

                        tip         latency_nice
unixbench-spawn-512     73489.0     78260.4         (kexec)
unixbench-spawn-512     73332.7     77821.2         (reboot)
unixbench-spawn-512     86207.4     82281.2         (rebuilt + reboot)

I'll go back and look more into the spawn test because there is
something else at play there but other Unixbench results seem to
be stable looking at the rerun.

>
>> unixbench-execl Hmean unixbench-execl-1 4767.99 ( 0.00%) 4752.63 * -0.32%*
>> unixbench-execl Hmean unixbench-execl-512 11250.72 ( 0.00%) 11320.97 ( 0.62%)
>>
>> o NPS4
>>
>> Test Metric Parallelism tip latency_nice
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-1 49041932.68 ( 0.00%) 49156671.05 ( 0.23%)
>> unixbench-dhry2reg Hmean unixbench-dhry2reg-512 6286981589.85 ( 0.00%) 6285248711.40 ( -0.03%)
>> unixbench-syscall Amean unixbench-syscall-1 2992405.60 ( 0.00%) 3008933.03 * -0.55%*
>> unixbench-syscall Amean unixbench-syscall-512 7971789.70 ( 0.00%) 7814622.23 * 1.97%*
>> unixbench-pipe Hmean unixbench-pipe-1 2822892.54 ( 0.00%) 2852615.11 * 1.05%*
>> unixbench-pipe Hmean unixbench-pipe-512 326408309.83 ( 0.00%) 329617202.56 * 0.98%*
>> unixbench-spawn Hmean unixbench-spawn-1 7685.31 ( 0.00%) 7243.54 ( -5.75%)
>> unixbench-spawn Hmean unixbench-spawn-512 72245.56 ( 0.00%) 77000.81 * 6.58%*
>> unixbench-execl Hmean unixbench-execl-1 4761.42 ( 0.00%) 4733.12 * -0.59%*
>> unixbench-execl Hmean unixbench-execl-512 11533.53 ( 0.00%) 11660.17 ( 1.10%)
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ~ Hackbench - Various Latency Nice Values ~
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> o 100000 loops
>>
>> - pipe (process)
>>
>> Test: LN: 0 LN: 19 LN: -20
>> 1-groups: 3.91 (0.00 pct) 3.91 (0.00 pct) 3.81 (2.55 pct)
>> 2-groups: 4.48 (0.00 pct) 4.52 (-0.89 pct) 4.53 (-1.11 pct)
>> 4-groups: 4.83 (0.00 pct) 4.83 (0.00 pct) 4.87 (-0.82 pct)
>> 8-groups: 5.09 (0.00 pct) 5.00 (1.76 pct) 5.07 (0.39 pct)
>> 16-groups: 6.92 (0.00 pct) 6.79 (1.87 pct) 6.96 (-0.57 pct)
>>
>> - pipe (thread)
>>
>> 1-groups: 4.13 (0.00 pct) 4.08 (1.21 pct) 4.11 (0.48 pct)
>> 2-groups: 4.78 (0.00 pct) 4.90 (-2.51 pct) 4.79 (-0.20 pct)
>> 4-groups: 5.12 (0.00 pct) 5.08 (0.78 pct) 5.16 (-0.78 pct)
>> 8-groups: 5.31 (0.00 pct) 5.28 (0.56 pct) 5.33 (-0.37 pct)
>> 16-groups: 7.34 (0.00 pct) 7.27 (0.95 pct) 7.33 (0.13 pct)
>>
>> - socket (process)
>>
>> Test: LN: 0 LN: 19 LN: -20
>> 1-groups: 6.61 (0.00 pct) 6.38 (3.47 pct) 6.54 (1.05 pct)
>> 2-groups: 6.59 (0.00 pct) 6.67 (-1.21 pct) 6.11 (7.28 pct)
>> 4-groups: 6.77 (0.00 pct) 6.78 (-0.14 pct) 6.79 (-0.29 pct)
>> 8-groups: 8.29 (0.00 pct) 8.39 (-1.20 pct) 8.36 (-0.84 pct)
>> 16-groups: 12.21 (0.00 pct) 12.03 (1.47 pct) 12.35 (-1.14 pct)
>>
>> - socket (thread)
>>
>> Test: LN: 0 LN: 19 LN: -20
>> 1-groups: 6.50 (0.00 pct) 5.99 (7.84 pct) 6.02 (7.38 pct) ^
>> 2-groups: 6.07 (0.00 pct) 6.20 (-2.14 pct) 6.23 (-2.63 pct)
>> 4-groups: 6.61 (0.00 pct) 6.64 (-0.45 pct) 6.63 (-0.30 pct)
>> 8-groups: 8.87 (0.00 pct) 8.67 (2.25 pct) 8.78 (1.01 pct)
>> 16-groups: 12.63 (0.00 pct) 12.54 (0.71 pct) 12.59 (0.31 pct)
>>
>>> [..snip..]
>>>
>>
>> Apart from a couple of anomalies, latency nice reduces wait time, especially
>> when the system is heavily loaded. If there is any data, or any specific
>> workload you would like me to run on the test system, please do let me know.
>> Meanwhile, I'll try to get some numbers for larger workloads like SpecJBB
>> that did see improvements with latency nice on v5.

Following are results for SpecJBB in NPS1 mode:

+----------------------------------------------+
|                |   Latency Nice    |         |
|     Metric     |-------------------|   tip   |
|                |    0    |   19    |         |
|----------------|-------------------|---------|
| Max jOPS       | 100.00% | 102.19% | 101.02% |
| Critical jOPS  | 100.00% | 122.41% | 100.41% |
+----------------------------------------------+

SpecJBB throughput for Max-jOPS is similar across the board
but Critical-jOPS throughput sees a good uplift again with
latency nice 19.

>
> [..snip..]
>

If there is any specific workload you would like me to test,
please do let me know. I'll try to test more workloads I come
across with different latency nice values and update you
with the results on this thread.

Tested-by: K Prateek Nayak <[email protected]>
--
Thanks and Regards,
Prateek