2023-09-12 09:54:32

by Mike Galbraith

Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()

On Mon, 2023-09-11 at 18:19 +0800, Chen Yu wrote:
>
> > Speaking of cache-hot idle CPU, is netperf actually more happy with
> > piling on current CPU?
>
> Yes. Per my previous test, netperf of TCP_RR/UDP_RR really likes to
> put the waker and wakee together.

Hm, seems there's at least one shared L2 case where that's untrue by
more than a tiny margin, which surprised me rather a lot.

For grins, I tested netperf on my dinky rpi4b, and while its RR numbers
seem kinda odd, they're also seemingly repeatable (ergo showing them).
I measured a very modest cross-core win on a shared L2 Intel CPU some
years ago (when Q6600 was shiny/new) but nothing close to these deltas.

Makes me wonder what (a tad beefier) Bulldog RR numbers look like.

root@rpi4:~# ONLY=TCP_RR netperf.sh
TCP_RR-1 unbound Avg: 29611 Sum: 29611
TCP_RR-1 stacked Avg: 22540 Sum: 22540
TCP_RR-1 cross-core Avg: 30181 Sum: 30181

root@rpi4:~# netperf.sh
TCP_SENDFILE-1 unbound Avg: 15572 Sum: 15572
TCP_SENDFILE-1 stacked Avg: 11533 Sum: 11533
TCP_SENDFILE-1 cross-core Avg: 15751 Sum: 15751

TCP_STREAM-1 unbound Avg: 6331 Sum: 6331
TCP_STREAM-1 stacked Avg: 6031 Sum: 6031
TCP_STREAM-1 cross-core Avg: 6211 Sum: 6211

TCP_MAERTS-1 unbound Avg: 6306 Sum: 6306
TCP_MAERTS-1 stacked Avg: 6094 Sum: 6094
TCP_MAERTS-1 cross-core Avg: 9393 Sum: 9393

UDP_STREAM-1 unbound Avg: 22277 Sum: 22277
UDP_STREAM-1 stacked Avg: 18844 Sum: 18844
UDP_STREAM-1 cross-core Avg: 24749 Sum: 24749

TCP_RR-1 unbound Avg: 29674 Sum: 29674
TCP_RR-1 stacked Avg: 22267 Sum: 22267
TCP_RR-1 cross-core Avg: 30237 Sum: 30237

UDP_RR-1 unbound Avg: 36189 Sum: 36189
UDP_RR-1 stacked Avg: 27129 Sum: 27129
UDP_RR-1 cross-core Avg: 37033 Sum: 37033
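To put the deltas in perspective, here is a quick sketch that computes each placement relative to unbound, and cross-core relative to stacked, from the Avg column of the full netperf.sh run above (the second TCP_RR set). It assumes that "stacked" means client and server sharing one CPU and that "cross-core" means they run on different cores. netperf.sh itself isn't shown, so that reading is an assumption. The dictionary and helper names are illustrative only.

```python
# Avg values copied from the rpi4 netperf.sh run above (single instance,
# so Avg == Sum). Units are whatever netperf reports for each test.
# Assumption: "stacked" = both tasks on one CPU, "cross-core" = different cores.
results = {
    "TCP_SENDFILE-1": {"unbound": 15572, "stacked": 11533, "cross-core": 15751},
    "TCP_STREAM-1":   {"unbound": 6331,  "stacked": 6031,  "cross-core": 6211},
    "TCP_MAERTS-1":   {"unbound": 6306,  "stacked": 6094,  "cross-core": 9393},
    "UDP_STREAM-1":   {"unbound": 22277, "stacked": 18844, "cross-core": 24749},
    "TCP_RR-1":       {"unbound": 29674, "stacked": 22267, "cross-core": 30237},
    "UDP_RR-1":       {"unbound": 36189, "stacked": 27129, "cross-core": 37033},
}


def delta_pct(value: float, baseline: float) -> float:
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def cross_vs_stacked(test: str) -> float:
    """Percent gain of cross-core placement over stacking both tasks."""
    r = results[test]
    return delta_pct(r["cross-core"], r["stacked"])


if __name__ == "__main__":
    for test, r in results.items():
        print(f"{test:15} stacked {delta_pct(r['stacked'], r['unbound']):+6.1f}%  "
              f"cross-core {delta_pct(r['cross-core'], r['unbound']):+6.1f}%  "
              f"cross vs stacked {cross_vs_stacked(test):+6.1f}%")
```

For the RR tests this works out to cross-core beating stacked by roughly 36% (TCP_RR and UDP_RR alike), and stacking costs roughly 25% against unbound.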


2023-09-12 22:55:41

by Chen Yu

Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()

Hi Mike,

Thanks for taking a look.

On 2023-09-12 at 11:39:55 +0200, Mike Galbraith wrote:
> On Mon, 2023-09-11 at 18:19 +0800, Chen Yu wrote:
> >
> > > Speaking of cache-hot idle CPU, is netperf actually more happy with
> > > piling on current CPU?
> >
> > Yes. Per my previous test, netperf of TCP_RR/UDP_RR really likes to
> > put the waker and wakee together.
>
> Hm, seems there's at least one shared L2 case where that's untrue by
> more than a tiny margin, which surprised me rather a lot.
>

Yes, task stacking in theory works against the scheduler's work conservation.
Whether it pays off depends on how much resource locality (L1/L2 cache, DSB)
there is, which is workload- and hardware-specific.

> For grins, I tested netperf on my dinky rpi4b, and while its RR numbers
> seem kinda odd, they're also seemingly repeatable (ergo showing them).
> I measured a very modest cross-core win on a shared L2 Intel CPU some
> years ago (when Q6600 was shiny/new) but nothing close to these deltas.
>

This is interesting. I have a Jacobsville, which also has a shared L2. I'll
run some tests to compare task stacking against task spreading on that
platform. But I guess that is another topic, because the current patch
avoids stacking tasks.

thanks,
Chenyu
