2023-08-01 20:23:25

by Rafael J. Wysocki

Subject: [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time

Hi Folks,

This is on top of the fixes series posted previously:

https://lore.kernel.org/linux-pm/4515817.LvFx2qVVIh@kreacher/

(I'll put it all into one git branch tomorrow).

I started to play with the idea described here

https://lore.kernel.org/linux-pm/CAJZ5v0hQh2Pg_uXxj8KBRw3oLS1WdsU+rUafBAAq7dRdbRwYSA@mail.gmail.com/

and this is the result.

Note that this is completely experimental, even though it hasn't killed any of
the test boxes I've run it on.

Patch [1/2] moves the tick_nohz_get_sleep_length() call in teo_select() after
a preliminary idle state selection based on statistics, and patch [2/2] adds
checks to avoid it completely if the idle state selected so far is shallow
enough.
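
Roughly, the reordered flow can be pictured with the stand-alone user-space
sketch below (for illustration only: the state table, the shallow-state
cutoff, and the query_sleep_length_ns() stand-in are invented here and are
not taken from the actual teo code):

#include <stdint.h>
#include <stdio.h>

/* Invented state table for this sketch, not the actual driver data. */
struct sketch_state { uint64_t target_residency_ns; };

static const struct sketch_state states[] = {
	{ 1000 },	/* state 0: very shallow */
	{ 100000 },	/* state 1: shallow */
	{ 2000000 },	/* state 2: deep */
};

#define NR_STATES		3
#define SHALLOW_CUTOFF_NS	1000000	/* invented "shallow enough" threshold */

/*
 * Stand-in for tick_nohz_get_sleep_length(); in the kernel this
 * inspects the timer queues and is comparatively expensive.
 */
static uint64_t query_sleep_length_ns(void)
{
	return 5000000;	/* pretend the next timer fires in 5 ms */
}

static int select_state(uint64_t predicted_idle_ns)
{
	int candidate = 0;
	int i;

	/* Patch [1/2] idea: pick a candidate from statistics alone. */
	for (i = 1; i < NR_STATES; i++)
		if (states[i].target_residency_ns <= predicted_idle_ns)
			candidate = i;

	/*
	 * Patch [2/2] idea: if the candidate is shallow, skip the
	 * timer query entirely; its answer could not change much.
	 */
	if (states[candidate].target_residency_ns < SHALLOW_CUTOFF_NS)
		return candidate;

	/* Deep candidate: check that no timer will cut the sleep short. */
	uint64_t sleep_length = query_sleep_length_ns();

	while (candidate > 0 &&
	       states[candidate].target_residency_ns > sleep_length)
		candidate--;

	return candidate;
}

int main(void)
{
	printf("predicted 50 us idle -> state %d (no timer query)\n",
	       select_state(50000));
	printf("predicted 10 ms idle -> state %d (timer query done)\n",
	       select_state(10000000));
	return 0;
}

With the timer query deferred like this, a shallow preliminary choice
returns before any timer-related work is done, which is the point of
patch [2/2].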

I would appreciate it if you could check whether this actually makes any difference.

Thanks!

2023-08-03 13:34:24

by Kajetan Puchalski

Subject: Re: [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time

Hi Rafael,

On Tue, Aug 01, 2023 at 09:35:15PM +0200, Rafael J. Wysocki wrote:
> Hi Folks,
>
> This is on top of the fixes series posted previously:
>
> https://lore.kernel.org/linux-pm/4515817.LvFx2qVVIh@kreacher/
>
> (I'll put it all into one git branch tomorrow).
>
> I started to play with the idea described here
>
> https://lore.kernel.org/linux-pm/CAJZ5v0hQh2Pg_uXxj8KBRw3oLS1WdsU+rUafBAAq7dRdbRwYSA@mail.gmail.com/
>
> and this is the result.
>
> Note that this is completely experimental, even though it hasn't killed any of
> the test boxes I've run it on.
>
> Patch [1/2] moves the tick_nohz_get_sleep_length() call in teo_select() after
> a preliminary idle state selection based on statistics, and patch [2/2] adds
> checks to avoid it completely if the idle state selected so far is shallow
> enough.
>
> I would appreciate it if you could check whether this actually makes any difference.
>
> Thanks!

As mentioned in the other thread, I did some testing with these two
patches on top as well; here are the results:

1. Geekbench 6

+---------------------------+----------------+------------------+-------------------+
| metric                    | teo            | teo_tick         | teo_tick_rfc      |
+---------------------------+----------------+------------------+-------------------+
| multicore_score           | 3320.9 (0.0%)  | 3303.3 (-0.53%)  | 3293.6 (-0.82%)   |
| score                     | 1415.7 (0.0%)  | 1417.7 (0.14%)   | 1423.4 (0.54%)    |
| CPU_total_power           | 2421.3 (0.0%)  | 2429.3 (0.33%)   | 2442.2 (0.86%)    |
| latency (AsyncTask #1)    | 49.41μs (0.0%) | 51.07μs (3.36%)  | 50.1μs (1.4%)     |
| latency (labs.geekbench6) | 65.63μs (0.0%) | 77.47μs (18.03%) | 55.82μs (-14.95%) |
| latency (surfaceflinger)  | 39.46μs (0.0%) | 36.94μs (-6.39%) | 35.79μs (-9.28%)  |
+---------------------------+----------------+------------------+-------------------+

I.e. the big picture is all right: the latency either improves with these
patches, or the spike in the previous patchset was an anomaly; either way
it seems fine. I'm not sure where the change in the score is coming from,
but for the record the line plots of the 3 iterations for both tick
variants look the same, while both are slightly distinct from the pure
'teo' variant. It's still a below-1% gap, so not the end of the world if
there are benefits elsewhere.

+--------------+---------+------------+----------+
| kernel       | cluster | idle_state | time (s) |
+--------------+---------+------------+----------+
| teo          | little  | 0          | 146.75   |
| teo_tick     | little  | 0          | 63.5     |
| teo_tick_rfc | little  | 0          | 62.48    |
| teo          | little  | 1          | 53.75    |
| teo_tick     | little  | 1          | 146.78   |
| teo_tick_rfc | little  | 1          | 147.14   |
+--------------+---------+------------+----------+

The idle numbers look pretty much the same as in the previous variant,
which confirms that the change in the little-cluster residency is caused
by the previous changes, and also that these two patches don't affect it.

2. JetNews

+-----------------+---------------+----------------+-------------------+
| metric | teo | teo_tick | teo_tick_rfc |
+-----------------+---------------+----------------+-------------------+
| fps | 86.2 (0.0%) | 86.4 (0.16%) | 86.0 (-0.28%) |
| janks_pc | 0.8 (0.0%) | 0.8 (-0.66%) | 0.8 (-1.37%) |
| CPU_total_power | 185.2 (0.0%) | 178.2 (-3.76%) | 182.2 (-1.6%) |
+-----------------+---------------+----------------+-------------------+

Pretty much no change here; the power is still better than in base teo.

+--------------+---------+------------+----------+
| kernel       | cluster | idle_state | time (s) |
+--------------+---------+------------+----------+
| teo          | mid     | -1         | 21.63    |
| teo_tick     | mid     | -1         | 21.57    |
| teo_tick_rfc | mid     | -1         | 17.66    |
| teo          | big     | -1         | 8.81     |
| teo_tick     | big     | -1         | 8.55     |
| teo_tick_rfc | big     | -1         | 12.04    |
+--------------+---------+------------+----------+

This part stands out slightly, so it could be worth noting. For some reason
the trace registers a few seconds less running time (-1 means 'not
idle') on the mid cores but a few seconds more on the big cores. This
wasn't the case for the 'teo_tick' variant before, so it looks like it's
caused by these two patches. It doesn't seem to be an issue, though, just
interesting.

TLDR:
Does not blow up, looks okay :)

2023-08-03 14:10:31

by Rafael J. Wysocki

Subject: Re: [RFC/RFT][PATCH v1 0/2] cpuidle: teo: Do not check timers unconditionally every time

Hi Kajetan,

On Thu, Aug 3, 2023 at 3:18 PM Kajetan Puchalski
<[email protected]> wrote:
>
> Hi Rafael,
>
> [...]
>
> TLDR:
> Does not blow up, looks okay :)

Thank you for the feedback, much appreciated!

I'll likely send a new version of this series later today, including
one more patch, and I'll set up a git branch with it after that.

Thanks!