2023-12-08 00:24:27

by Qais Yousef

Subject: [PATCH v2 0/8] sched: cpufreq: Remove magic hardcoded numbers from margins

And replace it with more dynamic logic based on hardware/software limitations.

Margins referred to are the ones in:

* 80% in fits_capacity()
* 125% in map_util_perf()

The current values seem to have worked well in the past, but on modern hardware
they pose the following problems:

* The fits_capacity() headroom is not big enough for little cores
* The fits_capacity() headroom is too big for mid cores
* It leaves open the question of whether the big core can trigger overutilized
prematurely on powerful systems, where the top 20% is still a lot of perf
headroom to be consumed

The 1st causes tasks to get stuck underperforming on little cores for too long.
The 2nd prevents us from better utilizing the mid cores when the workload is at
a steady state above the 80%. Ideally we need to spread to mids and bigs in
this case, but we end up pushing to bigs only. The 3rd I didn't get a chance to
quantify in practice yet, but I do think our current definition of overutilized
being tied to misfit has gotten stale and needs rethinking.

* The 125% in map_util_perf() ends up causing power/thermal issues on powerful
systems by forcing an unnecessarily high headroom of idle time.

We could run slower with no perf impact in many cases.

To address these issues we define the limitation of each as follows:

* fits_capacity() should return true as long as the util will not rise
above the capacity of the CPU before the next load balance.

Load balance is the point of correction for a misplaced task, and we can
afford to keep the task running on a CPU when

task->util_avg < capacity_of(cpu)

as long as task->util_avg won't grow beyond capacity_of(cpu) shortly after,
which would leave the task stuck underperforming until a load balancing event
comes to save the day.
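
As a minimal sketch of the intended logic (this is what patch 5 below ends up
implementing, using the approximate_util_avg() helper added in patch 3):

/* util fits if, grown for one more tick worth of runtime, it still
 * stays below the CPU's capacity */
static inline bool fits_capacity(unsigned long util, unsigned long capacity)
{
	return approximate_util_avg(util, TICK_USEC) < capacity;
}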

* map_util_perf() should provide extra headroom to cater for the fact
we have to wait for N us before requesting another update.

So we need to give cpu->util_avg enough headroom to grow comfortably given
that there will be an N us delay before we can update DVFS again due to the
hardware limitation. For faster DVFS h/w we need only a small headroom, knowing
that we can request another update shortly after if we don't go back to idle.
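
A minimal sketch of that headroom (this is what patches 3 and 6 below
implement; dvfs_update_delay is a per-cpu copy of the governor's
rate_limit_us):

/* grow util by the delay we must wait before the next DVFS request */
static inline unsigned long apply_dvfs_headroom(unsigned long util, int cpu)
{
	return approximate_util_avg(util, per_cpu(dvfs_update_delay, cpu));
}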

To cater for the need to tweak these values, we introduce a new schedutil
knob, response_time_ms, to allow userspace to control how fast they want DVFS
to ramp up for a given policy. It can also be slowed down if power/thermal are
a larger concern.
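
Roughly, the knob scales util before the DVFS headroom is applied (this is the
mechanism in patch 7; freq_response_time_ms is the calculated default response
time for the policy):

/* >1024/1024 when the requested response time is shorter than the default */
mult = freq_response_time_ms * SCHED_CAPACITY_SCALE / response_time_ms;
util = (util * mult) >> SCHED_CAPACITY_SHIFT;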

I opted to remove the 1.25 default speed up as it somehow constituted a policy
that is good for some, but not necessarily best for all systems out there. With
the availability of the knob it is better for userspace to learn to tune for
the best perf/power/thermal trade-off.

With uclamp being available, this system tuning should not be necessary if
userspace is smart and conveys task perf requirements directly.

At the end I opted to keep the patch to control PELT HALFLIFE at boot time.
I know this wasn't popular before and I don't want to conjure the wrath of the
titans; but speaking with Vincent about per-task util_est_faster, he seemed to
think that a boot time control might be better. So the matter still seemed
debatable. Generally, like above, with a smarter userspace that uses uclamp
this won't be necessary. But the need I see for it is that we have a constant
model for all systems in the scheduler, and this gives an easy way to help
underpowered ones be more reactive. I am happy to drop this and explore other
alternatives (whatever they might be), but felt it is necessary for a complete
story on how to allow tweaking the ability to migrate faster on underpowered
HMP systems. Remember, today's high end is tomorrow's low end :). For SMP, the
dvfs_response_time is equivalent to a great extent and I don't see that this
knob should offer any additional benefit for them. So maybe make it conditional
on HMP.
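
For reference, the control added in patch 8 is a boot time parameter; per the
patch below it accepts:

	kernel.sched_pelt_multiplier=1		(default, 32ms PELT HALFLIFE)
	kernel.sched_pelt_multiplier=2		(16ms PELT HALFLIFE)
	kernel.sched_pelt_multiplier=4		(8ms PELT HALFLIFE)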

Testing on Pixel 6 running a mainline(ish) kernel, I see the following default
response times in schedutil:

# grep . /sys/devices/system/cpu/cpufreq/policy*/schedutil/*
/sys/devices/system/cpu/cpufreq/policy0/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy0/schedutil/response_time_ms:8
/sys/devices/system/cpu/cpufreq/policy4/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy4/schedutil/response_time_ms:29
/sys/devices/system/cpu/cpufreq/policy6/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy6/schedutil/response_time_ms:155

Changing response time to replicate the 1.25 dvfs headroom (multiply by 0.8)

# grep . /sys/devices/system/cpu/cpufreq/policy*/schedutil/*
/sys/devices/system/cpu/cpufreq/policy0/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy0/schedutil/response_time_ms:6
/sys/devices/system/cpu/cpufreq/policy4/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy4/schedutil/response_time_ms:23
/sys/devices/system/cpu/cpufreq/policy6/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy6/schedutil/response_time_ms:124

When I set PELT HALFLIFE to 16ms I get:

# grep . /sys/devices/system/cpu/cpufreq/policy*/schedutil/*
/sys/devices/system/cpu/cpufreq/policy0/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy0/schedutil/response_time_ms:4
/sys/devices/system/cpu/cpufreq/policy4/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy4/schedutil/response_time_ms:15
/sys/devices/system/cpu/cpufreq/policy6/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy6/schedutil/response_time_ms:78

Changing response time to replicate the 1.25 dvfs headroom

# grep . /sys/devices/system/cpu/cpufreq/policy*/schedutil/*
/sys/devices/system/cpu/cpufreq/policy0/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy0/schedutil/response_time_ms:3
/sys/devices/system/cpu/cpufreq/policy4/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy4/schedutil/response_time_ms:12
/sys/devices/system/cpu/cpufreq/policy6/schedutil/rate_limit_us:2000
/sys/devices/system/cpu/cpufreq/policy6/schedutil/response_time_ms:62

I didn't try 8ms. As you can see from the values for the littles, 16ms PELT
HALFLIFE is not suitable with TICK being 4ms. With the new definition of
fits_capacity(), we'd skip the littles and only use them in the overutilized
state.

Note that I changed the way I calculate the response time. Reaching 1024 takes
324ms (IIRC). But the new calculation takes into account that max_freq will be
reached as soon as util crosses the util for cap@2nd_highest_freq, which is
around ~988 (IIRC) for this system. Hence the 155ms default for the biggest
cluster.
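
For reference, the default for the biggest cluster (capacity 1024) is derived
roughly as follows (numbers as reported above; approximate_runtime() is added
in patch 4):

	cap@2nd_highest_freq = 2nd_highest_freq * 1024 / max_freq  ~= 988
	response_time_ms     = approximate_runtime(988 + 1)        ~= 155ms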

===

Running the Speedometer browser benchmark, I get the following as an average of
10 runs. Power numbers are not super accurate here due to some limitations in
the test setup:
| baseline | patch | 1.25 headroom | 16ms PELT | 16ms + 1.25 headroom
-------+----------+-----------+---------------+-----------+---------------------
score | 135.14 | 108.03 | 135.72 | 137.48 | 143.96
-------+----------+-----------+---------------+-----------+---------------------
power | 1455.49 | 1204.75 | 1451.79 | 1690.38 | 1644.69

Removing the hardcoded values from the margins loses a lot of perf but with a
large power saving. Re-applying the 1.25 headroom policy regains the same perf
and power.

Speeding up PELT has a high power cost on this system. With the 1.25 DVFS
headroom on top there's a decent boost in perf.

===

For UiBench, the numbers are an average of 3 runs (each iteration already
repeats the subtests several times). Power numbers are more accurate here,
though the benchmark sometimes has higher than desired variance from run to
run:

| baseline | patch | 1.25 headroom | 16ms PELT | 16ms + 1.25 headroom
-------+----------+-----------+---------------+-----------+---------------------
jank | 110.3 | 68.0 | 56.0 | 58.6 | 50.3
-------+----------+-----------+---------------+-----------+---------------------
power | 147.76 | 146.54 | 164.49 | 174.97 | 209.92

Removing the hardcoded values from the margins produces a great improvement in
score without any power loss. I haven't done *any* detailed analysis, but
I think it's because fits_capacity() will return false early on littles, so
we're less likely to end up with UI tasks getting stuck there waiting for
a load balance to save the day and migrate the misfit task.

Re-applying the 1.25 DVFS headroom policy gains more perf at a power cost that
is justifiable compared to 'patch'. It's a massive win against the baseline,
which has the 1.25 headroom.

The 16ms PELT HALFLIFE numbers look better compared to 'patch', but at a higher
power cost.

---

Take away: there are perf and power trade-offs associated with these margins
that are hard to abstract completely. Give sysadmins the power to find their
sweet spot meaningfully, and make scheduler behavior constrained by actual h/w
and software limitations.

---

Changes since v1:

* Rebase on top of tip/sched/core with Vincent's rework patch for schedutil
* Fix bugs in approximate_util_avg()/approximate_runtime() (Dietmar)
* Remove usage of aligned per cpu variable (Peter)
* Calculate the response_time_mult once (Peter)
* Tweaked response_time_ms calculation to cater for the fact that max freq
will be reached well before we hit 1024.

v1 discussion link: https://lore.kernel.org/lkml/[email protected]/

Patch 1 changes the default rate_limit_us when the cpufreq driver doesn't
provide a transition delay. The default 10ms is too high for modern hardware.
The patch can be picked up readily.

Patch 2 renames map_util_perf() to apply_dvfs_headroom(), which I believe
better reflects the actual functionality it performs. It also gives it a doc
comment and moves it back to sched.h. The patch can be picked up readily too.

Patches 3 and 4 add helper functions to help convert between util_avg and time.

Patches 5 and 6 use these new functions to implement the logic to make
fits_capacity() and apply_dvfs_headroom() better approximate the limitations of
the system based on TICK and rate_limit_us, as explained earlier.

Patch 7 adds the new response_time_ms tunable to schedutil. The slow-down
functionality has a limitation that is documented.

Patch 8 adds the ability to modify PELT HALFLIFE via a boot time parameter.
Happy to drop this one if it is still hated and the need to cater for low end
underpowered systems doesn't make sense.

Thanks!

--
Qais Yousef

Qais Yousef (7):
cpufreq: Change default transition delay to 2ms
sched: cpufreq: Rename map_util_perf to apply_dvfs_headroom
sched/pelt: Add a new function to approximate the future util_avg
value
sched/pelt: Add a new function to approximate runtime to reach given
util
sched/fair: Remove magic hardcoded margin in fits_capacity()
sched: cpufreq: Remove magic 1.25 headroom from apply_dvfs_headroom()
sched/schedutil: Add a new tunable to dictate response time

Vincent Donnefort (1):
sched/pelt: Introduce PELT multiplier

Documentation/admin-guide/pm/cpufreq.rst | 17 ++-
drivers/cpufreq/cpufreq.c | 8 +-
include/linux/cpufreq.h | 3 +
include/linux/sched/cpufreq.h | 5 -
kernel/sched/core.c | 3 +-
kernel/sched/cpufreq_schedutil.c | 128 +++++++++++++++++++++--
kernel/sched/fair.c | 21 +++-
kernel/sched/pelt.c | 89 ++++++++++++++++
kernel/sched/pelt.h | 42 +++++++-
kernel/sched/sched.h | 31 ++++++
10 files changed, 323 insertions(+), 24 deletions(-)

--
2.34.1


2023-12-08 00:24:41

by Qais Yousef

Subject: [PATCH v2 2/8] sched: cpufreq: Rename map_util_perf to apply_dvfs_headroom

We are providing headroom for the utilization to grow until the next
decision point to pick the next frequency. Give the function a better
name and give it some documentation. It is not really mapping anything.

Also move it to sched.h. This function relies on updating the util signal
appropriately to give it headroom to grow. This is more of a scheduler
functionality than a cpufreq one. Move it to sched.h where all the other util
handling code belongs.

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
include/linux/sched/cpufreq.h | 5 -----
kernel/sched/cpufreq_schedutil.c | 2 +-
kernel/sched/sched.h | 17 +++++++++++++++++
3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index bdd31ab93bc5..d01755d3142f 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -28,11 +28,6 @@ static inline unsigned long map_util_freq(unsigned long util,
{
return freq * util / cap;
}
-
-static inline unsigned long map_util_perf(unsigned long util)
-{
- return util + (util >> 2);
-}
#endif /* CONFIG_CPU_FREQ */

#endif /* _LINUX_SCHED_CPUFREQ_H */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 4ee8ad70be99..79c3b96dc02c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -157,7 +157,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long max)
{
/* Add dvfs headroom to actual utilization */
- actual = map_util_perf(actual);
+ actual = apply_dvfs_headroom(actual);
/* Actually we don't need to target the max performance */
if (actual < max)
max = actual;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e58a54bda77d..0da3425200b1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3002,6 +3002,23 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long min,
unsigned long max);

+/*
+ * DVFS decision are made at discrete points. If CPU stays busy, the util will
+ * continue to grow, which means it could need to run at a higher frequency
+ * before the next decision point was reached. IOW, we can't follow the util as
+ * it grows immediately, but there's a delay before we issue a request to go to
+ * higher frequency. The headroom caters for this delay so the system continues
+ * to run at adequate performance point.
+ *
+ * This function provides enough headroom to provide adequate performance
+ * assuming the CPU continues to be busy.
+ *
+ * At the moment it is a constant multiplication with 1.25.
+ */
+static inline unsigned long apply_dvfs_headroom(unsigned long util)
+{
+ return util + (util >> 2);
+}

/*
* Verify the fitness of task @p to run on @cpu taking into account the
--
2.34.1

2023-12-08 00:24:42

by Qais Yousef

Subject: [PATCH v2 3/8] sched/pelt: Add a new function to approximate the future util_avg value

Given a util_avg value, the new function will return the future one
given a runtime delta.

This will be useful in later patches to help replace some magic margins
with more deterministic behavior.

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
kernel/sched/pelt.c | 22 +++++++++++++++++++++-
kernel/sched/sched.h | 2 ++
2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 63b6cf898220..81555a8288be 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -466,4 +466,24 @@ int update_irq_load_avg(struct rq *rq, u64 running)

return ret;
}
-#endif
+#endif /* CONFIG_HAVE_SCHED_AVG_IRQ */
+
+/*
+ * Approximate the new util_avg value assuming an entity has continued to run
+ * for @delta us.
+ */
+unsigned long approximate_util_avg(unsigned long util, u64 delta)
+{
+ struct sched_avg sa = {
+ .util_sum = util * PELT_MIN_DIVIDER,
+ .util_avg = util,
+ };
+
+ if (unlikely(!delta))
+ return util;
+
+ accumulate_sum(delta, &sa, 1, 0, 1);
+ ___update_load_avg(&sa, 0);
+
+ return sa.util_avg;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0da3425200b1..7e5a86a376f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3002,6 +3002,8 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long min,
unsigned long max);

+unsigned long approximate_util_avg(unsigned long util, u64 delta);
+
/*
* DVFS decision are made at discrete points. If CPU stays busy, the util will
* continue to grow, which means it could need to run at a higher frequency
--
2.34.1

2023-12-08 00:24:47

by Qais Yousef

Subject: [PATCH v2 4/8] sched/pelt: Add a new function to approximate runtime to reach given util

It is basically the ramp-up time from 0 to a given value. It will be used
later to implement a new tunable to control the response time of schedutil.

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
kernel/sched/pelt.c | 21 +++++++++++++++++++++
kernel/sched/sched.h | 1 +
2 files changed, 22 insertions(+)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 81555a8288be..00a1b9c1bf16 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -487,3 +487,24 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)

return sa.util_avg;
}
+
+/*
+ * Approximate the required amount of runtime in ms required to reach @util.
+ */
+u64 approximate_runtime(unsigned long util)
+{
+ struct sched_avg sa = {};
+ u64 delta = 1024; // period = 1024 = ~1ms
+ u64 runtime = 0;
+
+ if (unlikely(!util))
+ return runtime;
+
+ while (sa.util_avg < util) {
+ accumulate_sum(delta, &sa, 1, 0, 1);
+ ___update_load_avg(&sa, 0);
+ runtime++;
+ }
+
+ return runtime;
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7e5a86a376f8..2de64f59853c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3003,6 +3003,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long max);

unsigned long approximate_util_avg(unsigned long util, u64 delta);
+u64 approximate_runtime(unsigned long util);

/*
* DVFS decision are made at discrete points. If CPU stays busy, the util will
--
2.34.1

2023-12-08 00:24:47

by Qais Yousef

Subject: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

The new tunable, response_time_ms, allows us to speed up or slow down
the response time of the policy to meet the perf, power and thermal
characteristics desired by the user/sysadmin. There's no single universal
trade-off that we can apply for all systems even if they use the same
SoC. The form factor of the system, the dominant use case, and in case
of battery powered systems, the size of the battery and presence or
absence of active cooling can play a big role on what would be best to
use.

The new tunable provides sensible defaults, but yet gives the power to
control the response time to the user/sysadmin, if they wish to.

This tunable is applied before we apply the DVFS headroom.

The default behavior of applying 1.25 headroom can be re-instated easily
now. But we continue to keep the min required headroom to overcome
hardware limitation in its speed to change DVFS. And any additional
headroom to speed things up must be applied by userspace to match their
expectation for best perf/watt as it dictates a type of policy that will
be better for some systems, but worse for others.

There's a whitespace clean up included in sugov_start().

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
Documentation/admin-guide/pm/cpufreq.rst | 17 +++-
drivers/cpufreq/cpufreq.c | 4 +-
include/linux/cpufreq.h | 3 +
kernel/sched/cpufreq_schedutil.c | 115 ++++++++++++++++++++++-
4 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 6adb7988e0eb..fa0d602a920e 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
to go up to the allowed maximum immediately and then draw back to the value
returned by the above formula over time.

-This governor exposes only one tunable:
+This governor exposes two tunables:

``rate_limit_us``
Minimum time (in microseconds) that has to pass between two consecutive
@@ -427,6 +427,21 @@ This governor exposes only one tunable:
The purpose of this tunable is to reduce the scheduler context overhead
of the governor which might be excessive without it.

+``response_time_ms``
+ Amount of time (in milliseconds) required to ramp the policy from
+ lowest to highest frequency. Can be decreased to speed up the
+ responsiveness of the system, or increased to slow the system down in
+ hope to save power. The best perf/watt will depend on the system
+ characteristics and the dominant workload you expect to run. For
+ userspace that has smart context on the type of workload running (like
+ in Android), one can tune this to suit the demand of that workload.
+
+ Note that when slowing the response down, you can end up effectively
+ chopping off the top frequencies for that policy as the util is capped
+ to 1024. On HMP systems this chopping effect will only occur on the
+ biggest core whose capacity is 1024. Don't rely on this behavior as
+ this is a limitation that can hopefully be improved in the future.
+
This governor generally is regarded as a replacement for the older `ondemand`_
and `conservative`_ governors (described below), as it is simpler and more
tightly integrated with the CPU scheduler, its overhead in terms of CPU context
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9875284ca6e4..15c397ce3252 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -533,8 +533,8 @@ void cpufreq_disable_fast_switch(struct cpufreq_policy *policy)
}
EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);

-static unsigned int __resolve_freq(struct cpufreq_policy *policy,
- unsigned int target_freq, unsigned int relation)
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+ unsigned int target_freq, unsigned int relation)
{
unsigned int idx;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 1c5ca92a0555..29c3723653a3 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -613,6 +613,9 @@ int cpufreq_driver_target(struct cpufreq_policy *policy,
int __cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
+unsigned int __resolve_freq(struct cpufreq_policy *policy,
+ unsigned int target_freq,
+ unsigned int relation);
unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
unsigned int target_freq);
unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1d4d6025c15f..788208becc13 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -8,9 +8,12 @@

#define IOWAIT_BOOST_MIN (SCHED_CAPACITY_SCALE / 8)

+DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
+
struct sugov_tunables {
struct gov_attr_set attr_set;
unsigned int rate_limit_us;
+ unsigned int response_time_ms;
};

struct sugov_policy {
@@ -22,6 +25,7 @@ struct sugov_policy {
raw_spinlock_t update_lock;
u64 last_freq_update_time;
s64 freq_update_delay_ns;
+ unsigned int freq_response_time_ms;
unsigned int next_freq;
unsigned int cached_raw_freq;

@@ -59,6 +63,70 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

/************************ Governor internals ***********************/

+static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
+{
+ int cpu = cpumask_first(sg_policy->policy->cpus);
+ unsigned long cap = arch_scale_cpu_capacity(cpu);
+ unsigned int max_freq, sec_max_freq;
+
+ max_freq = sg_policy->policy->cpuinfo.max_freq;
+ sec_max_freq = __resolve_freq(sg_policy->policy,
+ max_freq - 1,
+ CPUFREQ_RELATION_H);
+
+ /*
+ * We will request max_freq as soon as util crosses the capacity at
+ * second highest frequency. So effectively our response time is the
+ * util at which we cross the cap@2nd_highest_freq.
+ */
+ cap = sec_max_freq * cap / max_freq;
+
+ return approximate_runtime(cap + 1);
+}
+
+static inline void sugov_update_response_time_mult(struct sugov_policy *sg_policy)
+{
+ unsigned long mult;
+ int cpu;
+
+ if (unlikely(!sg_policy->freq_response_time_ms))
+ sg_policy->freq_response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+
+ mult = sg_policy->freq_response_time_ms * SCHED_CAPACITY_SCALE;
+ mult /= sg_policy->tunables->response_time_ms;
+
+ if (SCHED_WARN_ON(!mult))
+ mult = SCHED_CAPACITY_SCALE;
+
+ for_each_cpu(cpu, sg_policy->policy->cpus)
+ per_cpu(response_time_mult, cpu) = mult;
+}
+
+/*
+ * Shrink or expand how long it takes to reach the maximum performance of the
+ * policy.
+ *
+ * sg_policy->freq_response_time_ms is a constant value defined by PELT
+ * HALFLIFE and the capacity of the policy (assuming HMP systems).
+ *
+ * sg_policy->tunables->response_time_ms is a user defined response time. By
+ * setting it lower than sg_policy->freq_response_time_ms, the system will
+ * respond faster to changes in util, which will result in reaching maximum
+ * performance point quicker. By setting it higher, it'll slow down the amount
+ * of time required to reach the maximum OPP.
+ *
+ * This should be applied when selecting the frequency.
+ */
+static inline unsigned long
+sugov_apply_response_time(unsigned long util, int cpu)
+{
+ unsigned long mult;
+
+ mult = per_cpu(response_time_mult, cpu) * util;
+
+ return mult >> SCHED_CAPACITY_SHIFT;
+}
+
static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
{
s64 delta_ns;
@@ -156,7 +224,10 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long min,
unsigned long max)
{
- /* Add dvfs headroom to actual utilization */
+ /*
+ * Speed up/slow down response time first then apply DVFS headroom.
+ */
+ actual = sugov_apply_response_time(actual, cpu);
actual = apply_dvfs_headroom(actual, cpu);
/* Actually we don't need to target the max performance */
if (actual < max)
@@ -555,8 +626,42 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count

static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);

+static ssize_t response_time_ms_show(struct gov_attr_set *attr_set, char *buf)
+{
+ struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+
+ return sprintf(buf, "%u\n", tunables->response_time_ms);
+}
+
+static ssize_t
+response_time_ms_store(struct gov_attr_set *attr_set, const char *buf, size_t count)
+{
+ struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+ struct sugov_policy *sg_policy;
+ unsigned int response_time_ms;
+
+ if (kstrtouint(buf, 10, &response_time_ms))
+ return -EINVAL;
+
+ /* XXX need special handling for high values? */
+
+ tunables->response_time_ms = response_time_ms;
+
+ list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+ if (sg_policy->tunables == tunables) {
+ sugov_update_response_time_mult(sg_policy);
+ break;
+ }
+ }
+
+ return count;
+}
+
+static struct governor_attr response_time_ms = __ATTR_RW(response_time_ms);
+
static struct attribute *sugov_attrs[] = {
&rate_limit_us.attr,
+ &response_time_ms.attr,
NULL
};
ATTRIBUTE_GROUPS(sugov);
@@ -744,11 +849,13 @@ static int sugov_init(struct cpufreq_policy *policy)
goto stop_kthread;
}

- tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
-
policy->governor_data = sg_policy;
sg_policy->tunables = tunables;

+ tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
+ tunables->response_time_ms = sugov_calc_freq_response_ms(sg_policy);
+ sugov_update_response_time_mult(sg_policy);
+
ret = kobject_init_and_add(&tunables->attr_set.kobj, &sugov_tunables_ktype,
get_governor_parent_kobj(policy), "%s",
schedutil_gov.name);
@@ -808,7 +915,7 @@ static int sugov_start(struct cpufreq_policy *policy)
void (*uu)(struct update_util_data *data, u64 time, unsigned int flags);
unsigned int cpu;

- sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
+ sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
sg_policy->last_freq_update_time = 0;
sg_policy->next_freq = 0;
sg_policy->work_in_progress = false;
--
2.34.1

2023-12-08 00:24:57

by Qais Yousef

Subject: [PATCH v2 6/8] sched: cpufreq: Remove magic 1.25 headroom from apply_dvfs_headroom()

Replace 1.25 headroom in apply_dvfs_headroom() with better dynamic
logic.

Instead of the magical 1.25 headroom, use the new approximate_util_avg()
to provide headroom based on the dvfs_update_delay, which is the period
at which the cpufreq governor will send DVFS updates to the hardware.

Add a new percpu dvfs_update_delay that can be cheaply accessed whenever
apply_dvfs_headroom() is called. We expect cpufreq governors that rely
on util to drive their DVFS logic/algorithm to populate these percpu
variables. schedutil is the only such governor at the moment.

The behavior of schedutil will change, as the headroom will be less than
1.25 for most systems since rate_limit_us is usually short.

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
kernel/sched/core.c | 1 +
kernel/sched/cpufreq_schedutil.c | 13 +++++++++++--
kernel/sched/sched.h | 18 ++++++++++++++----
3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db4be4921e7f..b4a1c8ea9e12 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -116,6 +116,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);

DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);

#ifdef CONFIG_SCHED_DEBUG
/*
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 79c3b96dc02c..1d4d6025c15f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -157,7 +157,7 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long max)
{
/* Add dvfs headroom to actual utilization */
- actual = apply_dvfs_headroom(actual);
+ actual = apply_dvfs_headroom(actual, cpu);
/* Actually we don't need to target the max performance */
if (actual < max)
max = actual;
@@ -535,15 +535,21 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count
struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
struct sugov_policy *sg_policy;
unsigned int rate_limit_us;
+ int cpu;

if (kstrtouint(buf, 10, &rate_limit_us))
return -EINVAL;

tunables->rate_limit_us = rate_limit_us;

- list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook)
+ list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+
sg_policy->freq_update_delay_ns = rate_limit_us * NSEC_PER_USEC;

+ for_each_cpu(cpu, sg_policy->policy->cpus)
+ per_cpu(dvfs_update_delay, cpu) = rate_limit_us;
+ }
+
return count;
}

@@ -824,6 +830,9 @@ static int sugov_start(struct cpufreq_policy *policy)
memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->cpu = cpu;
sg_cpu->sg_policy = sg_policy;
+
+ per_cpu(dvfs_update_delay, cpu) = sg_policy->tunables->rate_limit_us;
+
cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util, uu);
}
return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2de64f59853c..bbece0eb053a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3005,6 +3005,15 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
unsigned long approximate_util_avg(unsigned long util, u64 delta);
u64 approximate_runtime(unsigned long util);

+/*
+ * Any governor that relies on util signal to drive DVFS, must populate these
+ * percpu dvfs_update_delay variables.
+ *
+ * It should describe the rate/delay at which the governor sends DVFS freq
+ * update to the hardware in us.
+ */
+DECLARE_PER_CPU_READ_MOSTLY(u64, dvfs_update_delay);
+
/*
* DVFS decision are made at discrete points. If CPU stays busy, the util will
* continue to grow, which means it could need to run at a higher frequency
@@ -3014,13 +3023,14 @@ u64 approximate_runtime(unsigned long util);
* to run at adequate performance point.
*
* This function provides enough headroom to provide adequate performance
- * assuming the CPU continues to be busy.
+ * assuming the CPU continues to be busy. This headroom is based on the
+ * dvfs_update_delay of the cpufreq governor.
*
- * At the moment it is a constant multiplication with 1.25.
+ * XXX: Should we provide headroom when the util is decaying?
*/
-static inline unsigned long apply_dvfs_headroom(unsigned long util)
+static inline unsigned long apply_dvfs_headroom(unsigned long util, int cpu)
{
- return util + (util >> 2);
+ return approximate_util_avg(util, per_cpu(dvfs_update_delay, cpu));
}

/*
--
2.34.1

2023-12-08 00:24:57

by Qais Yousef

Subject: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

From: Vincent Donnefort <[email protected]>

The new sched_pelt_multiplier boot param allows a user to set a clock
multiplier to x2 or x4 (x1 being the default). This clock multiplier
artificially speeds up PELT ramp up/down similarly to use a faster
half-life than the default 32ms.

- x1: 32ms half-life
- x2: 16ms half-life
- x4: 8ms half-life

Internally, a new clock is created: rq->clock_task_mult. It sits in the
clock hierarchy between rq->clock_task and rq->clock_pelt.

The param is set as read only and can only be changed at boot time via

kernel.sched_pelt_multiplier=[1, 2, 4]

PELT has a big impact on the overall system response and reactiveness to
change. A smaller PELT HF means it'll require less time to reach the
maximum performance point of the system when the system becomes fully
busy; and equally a shorter time to go back to the lowest performance point
when the system goes back to idle.

This faster reaction impacts both the DVFS response and the migration time
between clusters in HMP systems.

Smaller PELT values are expected to give better performance at the cost
of more power. Underpowered systems can particularly benefit from
smaller values. Powerful systems can still benefit from smaller values
if they want to be tuned towards perf more and power is not the major
concern for them.

This combined with response_time_ms from schedutil should give the user
and sysadmin a deterministic way to control the triangular power, perf
and thermal trade-off for their system. The default response_time_ms will
halve as the PELT HF halves.

Update approximate_{util_avg, runtime}() to take into account the PELT
HALFLIFE multiplier.

Signed-off-by: Vincent Donnefort <[email protected]>
Signed-off-by: Dietmar Eggemann <[email protected]>
[Converted from sysctl to boot param and updated commit message]
Signed-off-by: Qais Yousef (Google) <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/sched/pelt.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
kernel/sched/pelt.h | 42 +++++++++++++++++++++++++++++++----
kernel/sched/sched.h | 1 +
4 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b4a1c8ea9e12..9c8626b4ddff 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -745,7 +745,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
update_irq_load_avg(rq, irq_delta + steal);
#endif
- update_rq_clock_pelt(rq, delta);
+ update_rq_clock_task_mult(rq, delta);
}

void update_rq_clock(struct rq *rq)
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 00a1b9c1bf16..0a10e56f76c7 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -468,6 +468,54 @@ int update_irq_load_avg(struct rq *rq, u64 running)
}
#endif /* CONFIG_HAVE_SCHED_AVG_IRQ */

+__read_mostly unsigned int sched_pelt_lshift;
+static unsigned int sched_pelt_multiplier = 1;
+
+static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
+{
+ int ret;
+
+ ret = param_set_int(val, kp);
+ if (ret)
+ goto error;
+
+ switch (sched_pelt_multiplier) {
+ case 1:
+ fallthrough;
+ case 2:
+ fallthrough;
+ case 4:
+ WRITE_ONCE(sched_pelt_lshift,
+ sched_pelt_multiplier >> 1);
+ break;
+ default:
+ ret = -EINVAL;
+ goto error;
+ }
+
+ return 0;
+
+error:
+ sched_pelt_multiplier = 1;
+ return ret;
+}
+
+static const struct kernel_param_ops sched_pelt_multiplier_ops = {
+ .set = set_sched_pelt_multiplier,
+ .get = param_get_int,
+};
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+/* XXX: should we use sched as prefix? */
+#define MODULE_PARAM_PREFIX "kernel."
+module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
+MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
+MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALIFE - roughly 200ms to go from 0 to max performance point (default).");
+MODULE_PARM_DESC(sched_pelt_multiplier, " 2 16ms PELT HALIFE - roughly 100ms to go from 0 to max performance point.");
+MODULE_PARM_DESC(sched_pelt_multiplier, " 4 8ms PELT HALIFE - roughly 50ms to go from 0 to max performance point.");
+
/*
* Approximate the new util_avg value assuming an entity has continued to run
* for @delta us.
@@ -482,7 +530,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
if (unlikely(!delta))
return util;

- accumulate_sum(delta, &sa, 1, 0, 1);
+ accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
___update_load_avg(&sa, 0);

return sa.util_avg;
@@ -494,7 +542,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
u64 approximate_runtime(unsigned long util)
{
struct sched_avg sa = {};
- u64 delta = 1024; // period = 1024 = ~1ms
+ u64 delta = 1024 << sched_pelt_lshift; // period = 1024 = ~1ms
u64 runtime = 0;

if (unlikely(!util))
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 3a0e0dc28721..9b35b5072bae 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -61,6 +61,14 @@ static inline void cfs_se_util_change(struct sched_avg *avg)
WRITE_ONCE(avg->util_est.enqueued, enqueued);
}

+static inline u64 rq_clock_task_mult(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ assert_clock_updated(rq);
+
+ return rq->clock_task_mult;
+}
+
static inline u64 rq_clock_pelt(struct rq *rq)
{
lockdep_assert_rq_held(rq);
@@ -72,7 +80,7 @@ static inline u64 rq_clock_pelt(struct rq *rq)
/* The rq is idle, we can sync to clock_task */
static inline void _update_idle_rq_clock_pelt(struct rq *rq)
{
- rq->clock_pelt = rq_clock_task(rq);
+ rq->clock_pelt = rq_clock_task_mult(rq);

u64_u32_store(rq->clock_idle, rq_clock(rq));
/* Paired with smp_rmb in migrate_se_pelt_lag() */
@@ -121,6 +129,27 @@ static inline void update_rq_clock_pelt(struct rq *rq, s64 delta)
rq->clock_pelt += delta;
}

+extern unsigned int sched_pelt_lshift;
+
+/*
+ * absolute time |1 |2 |3 |4 |5 |6 |
+ * @ mult = 1 --------****************--------****************-
+ * @ mult = 2 --------********----------------********---------
+ * @ mult = 4 --------****--------------------****-------------
+ * clock task mult
+ * @ mult = 2 | | |2 |3 | | | | |5 |6 | | |
+ * @ mult = 4 | | | | |2|3| | | | | | | | | | |5|6| | | | | | |
+ *
+ */
+static inline void update_rq_clock_task_mult(struct rq *rq, s64 delta)
+{
+ delta <<= READ_ONCE(sched_pelt_lshift);
+
+ rq->clock_task_mult += delta;
+
+ update_rq_clock_pelt(rq, delta);
+}
+
/*
* When rq becomes idle, we have to check if it has lost idle time
* because it was fully busy. A rq is fully used when the /Sum util_sum
@@ -147,7 +176,7 @@ static inline void update_idle_rq_clock_pelt(struct rq *rq)
* rq's clock_task.
*/
if (util_sum >= divider)
- rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
+ rq->lost_idle_time += rq_clock_task_mult(rq) - rq->clock_pelt;

_update_idle_rq_clock_pelt(rq);
}
@@ -218,13 +247,18 @@ update_irq_load_avg(struct rq *rq, u64 running)
return 0;
}

-static inline u64 rq_clock_pelt(struct rq *rq)
+static inline u64 rq_clock_task_mult(struct rq *rq)
{
return rq_clock_task(rq);
}

+static inline u64 rq_clock_pelt(struct rq *rq)
+{
+ return rq_clock_task_mult(rq);
+}
+
static inline void
-update_rq_clock_pelt(struct rq *rq, s64 delta) { }
+update_rq_clock_task_mult(struct rq *rq, s64 delta) { }

static inline void
update_idle_rq_clock_pelt(struct rq *rq) { }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bbece0eb053a..a7c89c623250 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1029,6 +1029,7 @@ struct rq {
u64 clock;
/* Ensure that all clocks are in the same cache line */
u64 clock_task ____cacheline_aligned;
+ u64 clock_task_mult;
u64 clock_pelt;
unsigned long lost_idle_time;
u64 clock_pelt_idle;
--
2.34.1

2023-12-08 00:24:57

by Qais Yousef

Subject: [PATCH v2 5/8] sched/fair: Remove magic hardcoded margin in fits_capacity()

Replace hardcoded margin value in fits_capacity() with better dynamic
logic.

The 80% margin is a magic value that has served its purpose for now, but it
no longer fits the variety of systems that exist today. If a system is
overpowered specifically, this 80% means we leave a lot of capacity unused
before we decide to upmigrate on an HMP system.

On some systems the little cores are underpowered and the ability to migrate
away from them faster is desired.

The upmigration behavior should rely on the fact that a bad placement decision
will need load balance to kick in to perform a misfit migration. And I think
this is an adequate definition for what to consider as enough headroom when
deciding whether a util fits a capacity or not.

Use the new approximate_util_avg() function to predict the util if the
task continues to run for TICK_USEC. If the value is not strictly less
than the capacity, then the task must not be placed there, i.e. it is
considered misfit.

Signed-off-by: Qais Yousef (Google) <[email protected]>
---
kernel/sched/fair.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bcea3d55d95d..b83448be3f79 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -101,16 +101,31 @@ int __weak arch_asym_cpu_priority(int cpu)
}

/*
- * The margin used when comparing utilization with CPU capacity.
+ * The util will fit the capacity if it has enough headroom to grow within the
+ * next tick - which is when any load balancing activity happens to do the
+ * correction.
*
- * (default: ~20%)
+ * If util stays within the capacity before tick has elapsed, then it should be
+ * fine. If not, then a correction action must happen shortly after it starts
+ * running, hence we treat it as !fit.
+ *
+ * TODO: TICK is not actually accurate enough. balance_interval is the correct
+ * one to use as the next load balance doesn't happen religiously at tick.
+ * Accessing balance_interval might be tricky and will require some refactoring
+ * first.
*/
-#define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)
+static inline bool fits_capacity(unsigned long util, unsigned long capacity)
+{
+ return approximate_util_avg(util, TICK_USEC) < capacity;
+}

/*
* The margin used when comparing CPU capacities.
* is 'cap1' noticeably greater than 'cap2'
*
+ * TODO: use approximate_util_avg() to give something more quantifiable based
+ * on time? Like 1ms?
+ *
* (default: ~5%)
*/
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)
--
2.34.1

2023-12-08 18:07:08

by Rafael J. Wysocki

Subject: Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

On Fri, Dec 8, 2023 at 1:24 AM Qais Yousef <[email protected]> wrote:
>
> The new tunable, response_time_ms, allow us to speed up or slow down
> the response time of the policy to meet the perf, power and thermal
> characteristic desired by the user/sysadmin. There's no single universal
> trade-off that we can apply for all systems even if they use the same
> SoC. The form factor of the system, the dominant use case, and in case
> of battery powered systems, the size of the battery and presence or
> absence of active cooling can play a big role on what would be best to
> use.
>
> The new tunable provides sensible defaults, but yet gives the power to
> control the response time to the user/sysadmin, if they wish to.
>
> This tunable is applied before we apply the DVFS headroom.
>
> The default behavior of applying 1.25 headroom can be re-instated easily
> now. But we continue to keep the min required headroom to overcome
> hardware limitation in its speed to change DVFS. And any additional
> headroom to speed things up must be applied by userspace to match their
> expectation for best perf/watt as it dictates a type of policy that will
> be better for some systems, but worse for others.
>
> There's a whitespace clean up included in sugov_start().
>
> Signed-off-by: Qais Yousef (Google) <[email protected]>

I thought that there was an agreement to avoid adding any new tunables
to schedutil.

> ---
> Documentation/admin-guide/pm/cpufreq.rst | 17 +++-
> drivers/cpufreq/cpufreq.c | 4 +-
> include/linux/cpufreq.h | 3 +
> kernel/sched/cpufreq_schedutil.c | 115 ++++++++++++++++++++++-
> 4 files changed, 132 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index 6adb7988e0eb..fa0d602a920e 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
> to go up to the allowed maximum immediately and then draw back to the value
> returned by the above formula over time.
>
> -This governor exposes only one tunable:
> +This governor exposes two tunables:
>
> ``rate_limit_us``
> Minimum time (in microseconds) that has to pass between two consecutive
> @@ -427,6 +427,21 @@ This governor exposes only one tunable:
> The purpose of this tunable is to reduce the scheduler context overhead
> of the governor which might be excessive without it.
>
> +``respone_time_ms``
> + Amount of time (in milliseconds) required to ramp the policy from
> + lowest to highest frequency. Can be decreased to speed up the
> + responsiveness of the system, or increased to slow the system down in
> + hope to save power. The best perf/watt will depend on the system
> + characteristics and the dominant workload you expect to run. For
> + userspace that has smart context on the type of workload running (like
> + in Android), one can tune this to suite the demand of that workload.
> +
> + Note that when slowing the response down, you can end up effectively
> + chopping off the top frequencies for that policy as the util is capped
> + to 1024. On HMP systems this chopping effect will only occur on the
> + biggest core whose capacity is 1024. Don't rely on this behavior as
> + this is a limitation that can hopefully be improved in the future.
> +
> This governor generally is regarded as a replacement for the older `ondemand`_
> and `conservative`_ governors (described below), as it is simpler and more
> tightly integrated with the CPU scheduler, its overhead in terms of CPU context
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 9875284ca6e4..15c397ce3252 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -533,8 +533,8 @@ void cpufreq_disable_fast_switch(struct cpufreq_policy *policy)
> }
> EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
>
> -static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> - unsigned int target_freq, unsigned int relation)
> +unsigned int __resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq, unsigned int relation)
> {
> unsigned int idx;
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 1c5ca92a0555..29c3723653a3 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -613,6 +613,9 @@ int cpufreq_driver_target(struct cpufreq_policy *policy,
> int __cpufreq_driver_target(struct cpufreq_policy *policy,
> unsigned int target_freq,
> unsigned int relation);
> +unsigned int __resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq,
> + unsigned int relation);
> unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> unsigned int target_freq);
> unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 1d4d6025c15f..788208becc13 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -8,9 +8,12 @@
>
> #define IOWAIT_BOOST_MIN (SCHED_CAPACITY_SCALE / 8)
>
> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, response_time_mult);
> +
> struct sugov_tunables {
> struct gov_attr_set attr_set;
> unsigned int rate_limit_us;
> + unsigned int response_time_ms;
> };
>
> struct sugov_policy {
> @@ -22,6 +25,7 @@ struct sugov_policy {
> raw_spinlock_t update_lock;
> u64 last_freq_update_time;
> s64 freq_update_delay_ns;
> + unsigned int freq_response_time_ms;
> unsigned int next_freq;
> unsigned int cached_raw_freq;
>
> @@ -59,6 +63,70 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
>
> /************************ Governor internals ***********************/
>
> +static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
> +{
> + int cpu = cpumask_first(sg_policy->policy->cpus);
> + unsigned long cap = arch_scale_cpu_capacity(cpu);
> + unsigned int max_freq, sec_max_freq;
> +
> + max_freq = sg_policy->policy->cpuinfo.max_freq;
> + sec_max_freq = __resolve_freq(sg_policy->policy,
> + max_freq - 1,
> + CPUFREQ_RELATION_H);
> +
> + /*
> + * We will request max_freq as soon as util crosses the capacity at
> + * second highest frequency. So effectively our response time is the
> + * util at which we cross the cap@2nd_highest_freq.
> + */
> + cap = sec_max_freq * cap / max_freq;
> +
> + return approximate_runtime(cap + 1);
> +}
> +
> +static inline void sugov_update_response_time_mult(struct sugov_policy *sg_policy)
> +{
> + unsigned long mult;
> + int cpu;
> +
> + if (unlikely(!sg_policy->freq_response_time_ms))
> + sg_policy->freq_response_time_ms = sugov_calc_freq_response_ms(sg_policy);
> +
> + mult = sg_policy->freq_response_time_ms * SCHED_CAPACITY_SCALE;
> + mult /= sg_policy->tunables->response_time_ms;
> +
> + if (SCHED_WARN_ON(!mult))
> + mult = SCHED_CAPACITY_SCALE;
> +
> + for_each_cpu(cpu, sg_policy->policy->cpus)
> + per_cpu(response_time_mult, cpu) = mult;
> +}
> +
> +/*
> + * Shrink or expand how long it takes to reach the maximum performance of the
> + * policy.
> + *
> + * sg_policy->freq_response_time_ms is a constant value defined by PELT
> + * HALFLIFE and the capacity of the policy (assuming HMP systems).
> + *
> + * sg_policy->tunables->response_time_ms is a user defined response time. By
> + * setting it lower than sg_policy->freq_response_time_ms, the system will
> + * respond faster to changes in util, which will result in reaching maximum
> + * performance point quicker. By setting it higher, it'll slow down the amount
> + * of time required to reach the maximum OPP.
> + *
> + * This should be applied when selecting the frequency.
> + */
> +static inline unsigned long
> +sugov_apply_response_time(unsigned long util, int cpu)
> +{
> + unsigned long mult;
> +
> + mult = per_cpu(response_time_mult, cpu) * util;
> +
> + return mult >> SCHED_CAPACITY_SHIFT;
> +}
> +
> static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> {
> s64 delta_ns;
> @@ -156,7 +224,10 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,
> unsigned long min,
> unsigned long max)
> {
> - /* Add dvfs headroom to actual utilization */
> + /*
> + * Speed up/slow down response timee first then apply DVFS headroom.
> + */
> + actual = sugov_apply_response_time(actual, cpu);
> actual = apply_dvfs_headroom(actual, cpu);
> /* Actually we don't need to target the max performance */
> if (actual < max)
> @@ -555,8 +626,42 @@ rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf, size_t count
>
> static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
>
> +static ssize_t response_time_ms_show(struct gov_attr_set *attr_set, char *buf)
> +{
> + struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
> +
> + return sprintf(buf, "%u\n", tunables->response_time_ms);
> +}
> +
> +static ssize_t
> +response_time_ms_store(struct gov_attr_set *attr_set, const char *buf, size_t count)
> +{
> + struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
> + struct sugov_policy *sg_policy;
> + unsigned int response_time_ms;
> +
> + if (kstrtouint(buf, 10, &response_time_ms))
> + return -EINVAL;
> +
> + /* XXX need special handling for high values? */
> +
> + tunables->response_time_ms = response_time_ms;
> +
> + list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
> + if (sg_policy->tunables == tunables) {
> + sugov_update_response_time_mult(sg_policy);
> + break;
> + }
> + }
> +
> + return count;
> +}
> +
> +static struct governor_attr response_time_ms = __ATTR_RW(response_time_ms);
> +
> static struct attribute *sugov_attrs[] = {
> &rate_limit_us.attr,
> + &response_time_ms.attr,
> NULL
> };
> ATTRIBUTE_GROUPS(sugov);
> @@ -744,11 +849,13 @@ static int sugov_init(struct cpufreq_policy *policy)
> goto stop_kthread;
> }
>
> - tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
> -
> policy->governor_data = sg_policy;
> sg_policy->tunables = tunables;
>
> + tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
> + tunables->response_time_ms = sugov_calc_freq_response_ms(sg_policy);
> + sugov_update_response_time_mult(sg_policy);
> +
> ret = kobject_init_and_add(&tunables->attr_set.kobj, &sugov_tunables_ktype,
> get_governor_parent_kobj(policy), "%s",
> schedutil_gov.name);
> @@ -808,7 +915,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> void (*uu)(struct update_util_data *data, u64 time, unsigned int flags);
> unsigned int cpu;
>
> - sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
> + sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
> sg_policy->last_freq_update_time = 0;
> sg_policy->next_freq = 0;
> sg_policy->work_in_progress = false;
> --
> 2.34.1
>

2023-12-10 20:41:06

by Qais Yousef

Subject: Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

On 12/08/23 19:06, Rafael J. Wysocki wrote:
> On Fri, Dec 8, 2023 at 1:24 AM Qais Yousef <[email protected]> wrote:
> >
> > The new tunable, response_time_ms, allow us to speed up or slow down
> > the response time of the policy to meet the perf, power and thermal
> > characteristic desired by the user/sysadmin. There's no single universal
> > trade-off that we can apply for all systems even if they use the same
> > SoC. The form factor of the system, the dominant use case, and in case
> > of battery powered systems, the size of the battery and presence or
> > absence of active cooling can play a big role on what would be best to
> > use.
> >
> > The new tunable provides sensible defaults, but yet gives the power to
> > control the response time to the user/sysadmin, if they wish to.
> >
> > This tunable is applied before we apply the DVFS headroom.
> >
> > The default behavior of applying 1.25 headroom can be re-instated easily
> > now. But we continue to keep the min required headroom to overcome
> > hardware limitation in its speed to change DVFS. And any additional
> > headroom to speed things up must be applied by userspace to match their
> > expectation for best perf/watt as it dictates a type of policy that will
> > be better for some systems, but worse for others.
> >
> > There's a whitespace clean up included in sugov_start().
> >
> > Signed-off-by: Qais Yousef (Google) <[email protected]>
>
> I thought that there was an agreement to avoid adding any new tunables
> to schedutil.

Oh. I didn't know that.

What alternatives do we have? I couldn't see how we can universally make the
response work for every possible system (not just SoC, but different platforms
with the same SoC even) and workloads. We see big power savings with no or
little perf impact on many workloads when not applying the current 125%. Others
want to push it faster under gaming scenarios etc to get more stable FPS.

Hopefully uclamp will make the need for this tuning obsolete over time. But
until userspace gains critical mass, I can't see how we can know the best
trade-offs for all the myriads of use cases/systems.

Some are happy to gain more perf and lose power. Others prefer to save power
over perf. DVFS response time plays a critical role in this trade-off and I'm
not sure how we can crystal ball it without delegating.


Thanks!

--
Qais Yousef

2023-12-11 20:21:58

by Rafael J. Wysocki

Subject: Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

On Sun, Dec 10, 2023 at 9:40 PM Qais Yousef <[email protected]> wrote:
>
> On 12/08/23 19:06, Rafael J. Wysocki wrote:
> > On Fri, Dec 8, 2023 at 1:24 AM Qais Yousef <[email protected]> wrote:
> > >
> > > The new tunable, response_time_ms, allow us to speed up or slow down
> > > the response time of the policy to meet the perf, power and thermal
> > > characteristic desired by the user/sysadmin. There's no single universal
> > > trade-off that we can apply for all systems even if they use the same
> > > SoC. The form factor of the system, the dominant use case, and in case
> > > of battery powered systems, the size of the battery and presence or
> > > absence of active cooling can play a big role on what would be best to
> > > use.
> > >
> > > The new tunable provides sensible defaults, but yet gives the power to
> > > control the response time to the user/sysadmin, if they wish to.
> > >
> > > This tunable is applied before we apply the DVFS headroom.
> > >
> > > The default behavior of applying 1.25 headroom can be re-instated easily
> > > now. But we continue to keep the min required headroom to overcome
> > > hardware limitation in its speed to change DVFS. And any additional
> > > headroom to speed things up must be applied by userspace to match their
> > > expectation for best perf/watt as it dictates a type of policy that will
> > > be better for some systems, but worse for others.
> > >
> > > There's a whitespace clean up included in sugov_start().
> > >
> > > Signed-off-by: Qais Yousef (Google) <[email protected]>
> >
> > I thought that there was an agreement to avoid adding any new tunables
> > to schedutil.
>
> Oh. I didn't know that.
>
> What alternatives do we have? I couldn't see how can we universally make the
> response work for every possible system (not just SoC, but different platforms
> with same SoC even) and workloads. We see big power saving with no or little
> perf impact on many workloads when not applying the current 125%. Others want
> to push it faster under gaming scenarios etc to get more stable FPS.
>
> Hopefully uclamp will make the need for this tuning obsolete over time. But
> until userspace gains critical mass; I can't see how we can know best
> trade-offs for all myriads of use cases/systems.
>
> Some are happy to gain more perf and lose power. Others prefer to save power
> over perf. DVFS response time plays a critical role in this trade-off and I'm
> not sure how we can crystal ball it without delegating.

I understand the motivation, but the counter-arguments are based on
experience with the cpufreq governors predating schedutil, especially
ondemand. Namely, at one point people focused on adjusting all of the
governor tunables to their needs without contributing any code or even
insights back, so when schedutil was introduced, a decision was made
to reduce the tunability to a minimum (preferably no tunables at all,
but it turned out to be hard to avoid the one tunable existing today).
Peter was involved in those discussions and I think that the point
made then is still valid.

The headroom formula was based on the observation that it would be a
good idea to have some headroom in the majority of cases and on the
balance between the simplicity of computation and general suitability.

Of course, it is hard to devise a single value that will work for
everyone, but tunables complicate things from the maintenance
perspective. For example, the more tunables there are, the harder it
is to make changes without altering the behavior in ways that will
break someone's setup.
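
For reference, the fixed headroom being discussed amounts to roughly the
following (a minimal sketch of the 1.25x computation, not a quote of the
exact in-tree helper):

static inline unsigned long map_util_perf(unsigned long util)
{
	/* Request ~25% more than the measured utilization. */
	return util + (util >> 2);
}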

2023-12-12 13:16:46

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

On 12/11/23 21:20, Rafael J. Wysocki wrote:

> I understand the motivation, but counter-arguments are based on the
> experience with the cpufreq governors predating schedutil, especially
> ondemand. Namely, at one point people focused on adjusting all of the
> governor tunables to their needs without contributing any code or even
> insights back, so when schedutil was introduced, a decision was made
> to reduce the tunability to a minimum (preferably no tunables at all,
> but it turned out to be hard to avoid the one tunable existing today).
> Peter was involved in those discussions and I think that the point
> made then is still valid.
>
> The headroom formula was based on the observation that it would be a
> good idea to have some headroom in the majority of cases and on the
> balance between the simplicity of computation and general suitability.
>
> Of course, it is hard to devise a single value that will work for
> everyone, but tunables complicate things from the maintenance
> perspective. For example, the more tunables there are, the harder it
> is to make changes without altering the behavior in ways that will
> break someone's setup.

Okay, thanks for the insights, Rafael! I hope the matter is open for debate
at least. I do agree and share the sentiment, and if there's another way to
avoid the tunable I'm all for trying it out. I just personally failed to see
how we can do this without delegating. And the current choice of 25% headroom
is too aggressive for modern hardware IMHO. I'm not sure we can pick a value
that will truly work for most use cases. In the mobile world, it is really
hard to cover all use cases. Different OEMs tend to focus on different use
cases and design their systems to work optimally for those. And those use
cases don't necessarily hit the same bottlenecks on different systems.

If we consider all the possible systems that Linux gets incorporated in, it is
even harder to tell what's a sensible default.

And generally, if there's indeed a default that works for most users, what
should we do if we fall into the minority for whom this default is not
suitable? I think we still need to handle this, so we need a way somehow even
if this proposal doesn't hit the mark. Although again, I hope the matter is
open for debate.

The only ultimate solution I see is userspace becoming fully uclamp aware and
telling us its perf requirements. Then this value will be a NOP, as we'd have
direct info from the use cases to give them the performance they need when
they need it. And if their usage ends up with bad perf or power, we can at
least shift the blame onto their bad usage :-) /me runs

But this is years away from hitting critical mass. We need to get to a point
where we can enable the uclamp config by default, as not all distros enable
it yet.

Anyway, looking forward to learning more on how we can do better.


Thanks!

--
Qais Yousef

2024-01-20 07:53:47

by Ashay Jaiswal

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

Hello Qais Yousef,

We ran a few benchmarks with the PELT multiplier patch on a Snapdragon 8Gen2
based internal Android device, and we are observing significant improvements
with the PELT8 configuration compared to PELT32.

Following are some of the benchmark results with the PELT32 and PELT8
configurations:

+-----------------+---------------+----------------+----------------+
| Test case       | Metric        | PELT32         | PELT8          |
+-----------------+---------------+----------------+----------------+
| | Overall | 711543 | 971275 |
| +---------------+----------------+----------------+
| | CPU | 193704 | 224378 |
| +---------------+----------------+----------------+
|ANTUTU V9.3.9 | GPU | 284650 | 424774 |
| +---------------+----------------+----------------+
| | MEM | 125207 | 160548 |
| +---------------+----------------+----------------+
| | UX | 107982 | 161575 |
+-----------------+---------------+----------------+----------------+
| | Single core | 1170 | 1268 |
|GeekBench V5.4.4 +---------------+----------------+----------------+
| | Multi core | 2530 | 3797 |
+-----------------+---------------+----------------+----------------+
| | Twitter | >50 Janks | 0 |
| SCROLL +---------------+----------------+----------------+
| | Contacts | >30 Janks | 0 |
+-----------------+---------------+----------------+----------------+

Please let us know if you need any support with running any further
workloads for the PELT32/PELT8 experiments; we can help with running them.

Thank you,
Ashay Jaiswal

On 12/8/2023 5:53 AM, Qais Yousef wrote:
> From: Vincent Donnefort <[email protected]>
>
> The new sched_pelt_multiplier boot param allows a user to set a clock
> multiplier to x2 or x4 (x1 being the default). This clock multiplier
> artificially speeds up PELT ramp up/down similarly to use a faster
> half-life than the default 32ms.
>
> - x1: 32ms half-life
> - x2: 16ms half-life
> - x4: 8ms half-life
>
> Internally, a new clock is created: rq->clock_task_mult. It sits in the
> clock hierarchy between rq->clock_task and rq->clock_pelt.
>
> The param is set as read only and can only be changed at boot time via
>
> kernel.sched_pelt_multiplier=[1, 2, 4]
>
> PELT has a big impact on the overall system response and reactiveness to
> change. Smaller PELT HF means it'll require less time to reach the
> maximum performance point of the system when the system become fully
> busy; and equally shorter time to go back to lowest performance point
> when the system goes back to idle.
>
> This faster reaction impacts both dvfs response and migration time
> between clusters in HMP system.
>
> Smaller PELT values are expected to give better performance at the cost
> of more power. Under powered systems can particularly benefit from
> smaller values. Powerful systems can still benefit from smaller values
> if they want to be tuned towards perf more and power is not the major
> concern for them.
>
> This combined with respone_time_ms from schedutil should give the user
> and sysadmin a deterministic way to control the triangular power, perf
> and thermals for their system. The default response_time_ms will half
> as PELT HF halves.
>
> Update approximate_{util_avg, runtime}() to take into account the PELT
> HALFLIFE multiplier.
>
> Signed-off-by: Vincent Donnefort <[email protected]>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> [Converted from sysctl to boot param and updated commit message]
> Signed-off-by: Qais Yousef (Google) <[email protected]>
> ---
> kernel/sched/core.c | 2 +-
> kernel/sched/pelt.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/sched/pelt.h | 42 +++++++++++++++++++++++++++++++----
> kernel/sched/sched.h | 1 +
> 4 files changed, 90 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b4a1c8ea9e12..9c8626b4ddff 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -745,7 +745,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> update_irq_load_avg(rq, irq_delta + steal);
> #endif
> - update_rq_clock_pelt(rq, delta);
> + update_rq_clock_task_mult(rq, delta);
> }
>
> void update_rq_clock(struct rq *rq)
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index 00a1b9c1bf16..0a10e56f76c7 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -468,6 +468,54 @@ int update_irq_load_avg(struct rq *rq, u64 running)
> }
> #endif /* CONFIG_HAVE_SCHED_AVG_IRQ */
>
> +__read_mostly unsigned int sched_pelt_lshift;
> +static unsigned int sched_pelt_multiplier = 1;
> +
> +static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
> +{
> + int ret;
> +
> + ret = param_set_int(val, kp);
> + if (ret)
> + goto error;
> +
> + switch (sched_pelt_multiplier) {
> + case 1:
> + fallthrough;
> + case 2:
> + fallthrough;
> + case 4:
> + WRITE_ONCE(sched_pelt_lshift,
> + sched_pelt_multiplier >> 1);
> + break;
> + default:
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + return 0;
> +
> +error:
> + sched_pelt_multiplier = 1;
> + return ret;
> +}
> +
> +static const struct kernel_param_ops sched_pelt_multiplier_ops = {
> + .set = set_sched_pelt_multiplier,
> + .get = param_get_int,
> +};
> +
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +/* XXX: should we use sched as prefix? */
> +#define MODULE_PARAM_PREFIX "kernel."
> +module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
> +MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
> +MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALIFE - roughly 200ms to go from 0 to max performance point (default).");
> +MODULE_PARM_DESC(sched_pelt_multiplier, " 2 16ms PELT HALIFE - roughly 100ms to go from 0 to max performance point.");
> +MODULE_PARM_DESC(sched_pelt_multiplier, " 4 8ms PELT HALIFE - roughly 50ms to go from 0 to max performance point.");
> +
> /*
> * Approximate the new util_avg value assuming an entity has continued to run
> * for @delta us.
> @@ -482,7 +530,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
> if (unlikely(!delta))
> return util;
>
> - accumulate_sum(delta, &sa, 1, 0, 1);
> + accumulate_sum(delta << sched_pelt_lshift, &sa, 1, 0, 1);
> ___update_load_avg(&sa, 0);
>
> return sa.util_avg;
> @@ -494,7 +542,7 @@ unsigned long approximate_util_avg(unsigned long util, u64 delta)
> u64 approximate_runtime(unsigned long util)
> {
> struct sched_avg sa = {};
> - u64 delta = 1024; // period = 1024 = ~1ms
> + u64 delta = 1024 << sched_pelt_lshift; // period = 1024 = ~1ms
> u64 runtime = 0;
>
> if (unlikely(!util))
> diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
> index 3a0e0dc28721..9b35b5072bae 100644
> --- a/kernel/sched/pelt.h
> +++ b/kernel/sched/pelt.h
> @@ -61,6 +61,14 @@ static inline void cfs_se_util_change(struct sched_avg *avg)
> WRITE_ONCE(avg->util_est.enqueued, enqueued);
> }
>
> +static inline u64 rq_clock_task_mult(struct rq *rq)
> +{
> + lockdep_assert_rq_held(rq);
> + assert_clock_updated(rq);
> +
> + return rq->clock_task_mult;
> +}
> +
> static inline u64 rq_clock_pelt(struct rq *rq)
> {
> lockdep_assert_rq_held(rq);
> @@ -72,7 +80,7 @@ static inline u64 rq_clock_pelt(struct rq *rq)
> /* The rq is idle, we can sync to clock_task */
> static inline void _update_idle_rq_clock_pelt(struct rq *rq)
> {
> - rq->clock_pelt = rq_clock_task(rq);
> + rq->clock_pelt = rq_clock_task_mult(rq);
>
> u64_u32_store(rq->clock_idle, rq_clock(rq));
> /* Paired with smp_rmb in migrate_se_pelt_lag() */
> @@ -121,6 +129,27 @@ static inline void update_rq_clock_pelt(struct rq *rq, s64 delta)
> rq->clock_pelt += delta;
> }
>
> +extern unsigned int sched_pelt_lshift;
> +
> +/*
> + * absolute time |1 |2 |3 |4 |5 |6 |
> + * @ mult = 1 --------****************--------****************-
> + * @ mult = 2 --------********----------------********---------
> + * @ mult = 4 --------****--------------------****-------------
> + * clock task mult
> + * @ mult = 2 | | |2 |3 | | | | |5 |6 | | |
> + * @ mult = 4 | | | | |2|3| | | | | | | | | | |5|6| | | | | | |
> + *
> + */
> +static inline void update_rq_clock_task_mult(struct rq *rq, s64 delta)
> +{
> + delta <<= READ_ONCE(sched_pelt_lshift);
> +
> + rq->clock_task_mult += delta;
> +
> + update_rq_clock_pelt(rq, delta);
> +}
> +
> /*
> * When rq becomes idle, we have to check if it has lost idle time
> * because it was fully busy. A rq is fully used when the /Sum util_sum
> @@ -147,7 +176,7 @@ static inline void update_idle_rq_clock_pelt(struct rq *rq)
> * rq's clock_task.
> */
> if (util_sum >= divider)
> - rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
> + rq->lost_idle_time += rq_clock_task_mult(rq) - rq->clock_pelt;
>
> _update_idle_rq_clock_pelt(rq);
> }
> @@ -218,13 +247,18 @@ update_irq_load_avg(struct rq *rq, u64 running)
> return 0;
> }
>
> -static inline u64 rq_clock_pelt(struct rq *rq)
> +static inline u64 rq_clock_task_mult(struct rq *rq)
> {
> return rq_clock_task(rq);
> }
>
> +static inline u64 rq_clock_pelt(struct rq *rq)
> +{
> + return rq_clock_task_mult(rq);
> +}
> +
> static inline void
> -update_rq_clock_pelt(struct rq *rq, s64 delta) { }
> +update_rq_clock_task_mult(struct rq *rq, s64 delta) { }
>
> static inline void
> update_idle_rq_clock_pelt(struct rq *rq) { }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bbece0eb053a..a7c89c623250 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1029,6 +1029,7 @@ struct rq {
> u64 clock;
> /* Ensure that all clocks are in the same cache line */
> u64 clock_task ____cacheline_aligned;
> + u64 clock_task_mult;
> u64 clock_pelt;
> unsigned long lost_idle_time;
> u64 clock_pelt_idle;

2024-01-21 00:05:00

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

Hi Ashay

On 01/20/24 13:22, Ashay Jaiswal wrote:
> Hello Qais Yousef,
>
> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
> based internal Android device and we are observing significant
> improvements with PELT8 configuration compared to PELT32.
>
> Following are some of the benchmark results with PELT32 and PELT8
> configuration:
>
> +-----------------+---------------+----------------+----------------+
> | Test case | PELT32 | PELT8 |
> +-----------------+---------------+----------------+----------------+
> | | Overall | 711543 | 971275 |
> | +---------------+----------------+----------------+
> | | CPU | 193704 | 224378 |
> | +---------------+----------------+----------------+
> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
> | +---------------+----------------+----------------+
> | | MEM | 125207 | 160548 |
> | +---------------+----------------+----------------+
> | | UX | 107982 | 161575 |
> +-----------------+---------------+----------------+----------------+
> | | Single core | 1170 | 1268 |
> |GeekBench V5.4.4 +---------------+----------------+----------------+
> | | Multi core | 2530 | 3797 |
> +-----------------+---------------+----------------+----------------+
> | | Twitter | >50 Janks | 0 |
> | SCROLL +---------------+----------------+----------------+
> | | Contacts | >30 Janks | 0 |
> +-----------------+---------------+----------------+----------------+
>
> Please let us know if you need any support with running any further
> workloads for PELT32/PELT8 experiments, we can help with running the
> experiments.

Thanks a lot for the test results. Was this tried with this patch alone or
with the whole series applied?

Have you tried to tweak each policy's response_time_ms introduced in patch
7 instead? With the series applied, boot with PELT8 and record the response
time values for each policy, then boot back again with PELT32 and write those
values back. Does this produce similar results?
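
A rough userspace sketch of that experiment (the sysfs path below is an
assumption based on where the per-policy schedutil attributes such as
rate_limit_us live; adjust it for the target system):

#include <stdio.h>

#define RESP_PATH \
	"/sys/devices/system/cpu/cpufreq/policy%s/schedutil/response_time_ms"

int main(int argc, char **argv)
{
	const char *policy = argc > 1 ? argv[1] : "0";
	char path[128];
	unsigned int val;
	FILE *f;

	snprintf(path, sizeof(path), RESP_PATH, policy);

	/* Read the current value, e.g. recorded while booted with PELT8. */
	f = fopen(path, "r");
	if (!f || fscanf(f, "%u", &val) != 1) {
		perror(path);
		return 1;
	}
	fclose(f);
	printf("policy%s: response_time_ms=%u\n", policy, val);

	/* Optionally write a value back, e.g. replayed after booting PELT32. */
	if (argc > 2) {
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			return 1;
		}
		fprintf(f, "%s\n", argv[2]);
		fclose(f);
	}
	return 0;
}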

You didn't share power numbers, so I assume the perf gains are more
important than the power cost for you.


Thanks!

--
Qais Yousef

2024-01-28 16:51:45

by Ashay Jaiswal

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

Hello Qais Yousef,

Thank you for your response.

On 1/21/2024 5:34 AM, Qais Yousef wrote:
> Hi Ashay
>
> On 01/20/24 13:22, Ashay Jaiswal wrote:
>> Hello Qais Yousef,
>>
>> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
>> based internal Android device and we are observing significant
>> improvements with PELT8 configuration compared to PELT32.
>>
>> Following are some of the benchmark results with PELT32 and PELT8
>> configuration:
>>
>> +-----------------+---------------+----------------+----------------+
>> | Test case | PELT32 | PELT8 |
>> +-----------------+---------------+----------------+----------------+
>> | | Overall | 711543 | 971275 |
>> | +---------------+----------------+----------------+
>> | | CPU | 193704 | 224378 |
>> | +---------------+----------------+----------------+
>> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
>> | +---------------+----------------+----------------+
>> | | MEM | 125207 | 160548 |
>> | +---------------+----------------+----------------+
>> | | UX | 107982 | 161575 |
>> +-----------------+---------------+----------------+----------------+
>> | | Single core | 1170 | 1268 |
>> |GeekBench V5.4.4 +---------------+----------------+----------------+
>> | | Multi core | 2530 | 3797 |
>> +-----------------+---------------+----------------+----------------+
>> | | Twitter | >50 Janks | 0 |
>> | SCROLL +---------------+----------------+----------------+
>> | | Contacts | >30 Janks | 0 |
>> +-----------------+---------------+----------------+----------------+
>>
>> Please let us know if you need any support with running any further
>> workloads for PELT32/PELT8 experiments, we can help with running the
>> experiments.
>
> Thanks a lot for the test results. Was this tried with this patch alone or
> the whole series applied?
>
I have only applied patch 8 (sched/pelt: Introduce PELT multiplier) for the tests.

> Have you tried to tweak each policy response_time_ms introduced in patch
> 7 instead? With the series applied, boot with PELT8, record the response time
> values for each policy, then boot back again to PELT32 and use those values.
> Does this produce similar results?
>
As the device is based on the 5.15 kernel, I will try to pull all 8 patches
along with the dependency patches onto 5.15 and try out the experiments as
suggested.

> You didn't share power numbers which I assume the perf gains are more important
> than the power cost for you.
>
If possible, I will try to collect the power numbers for future tests and
share the details.

>
> Thanks!
>
> --
> Qais Yousef

2024-01-30 17:29:04

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

On Sun, 28 Jan 2024 at 17:22, Ashay Jaiswal <[email protected]> wrote:
>
> Hello Qais Yousef,
>
> Thank you for your response.
>
> On 1/21/2024 5:34 AM, Qais Yousef wrote:
> > Hi Ashay
> >
> > On 01/20/24 13:22, Ashay Jaiswal wrote:
> >> Hello Qais Yousef,
> >>
> >> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
> >> based internal Android device and we are observing significant
> >> improvements with PELT8 configuration compared to PELT32.
> >>
> >> Following are some of the benchmark results with PELT32 and PELT8
> >> configuration:
> >>
> >> +-----------------+---------------+----------------+----------------+
> >> | Test case | PELT32 | PELT8 |
> >> +-----------------+---------------+----------------+----------------+
> >> | | Overall | 711543 | 971275 |
> >> | +---------------+----------------+----------------+
> >> | | CPU | 193704 | 224378 |
> >> | +---------------+----------------+----------------+
> >> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
> >> | +---------------+----------------+----------------+
> >> | | MEM | 125207 | 160548 |
> >> | +---------------+----------------+----------------+
> >> | | UX | 107982 | 161575 |
> >> +-----------------+---------------+----------------+----------------+
> >> | | Single core | 1170 | 1268 |
> >> |GeekBench V5.4.4 +---------------+----------------+----------------+
> >> | | Multi core | 2530 | 3797 |
> >> +-----------------+---------------+----------------+----------------+
> >> | | Twitter | >50 Janks | 0 |
> >> | SCROLL +---------------+----------------+----------------+
> >> | | Contacts | >30 Janks | 0 |
> >> +-----------------+---------------+----------------+----------------+
> >>
> >> Please let us know if you need any support with running any further
> >> workloads for PELT32/PELT8 experiments, we can help with running the
> >> experiments.
> >
> > Thanks a lot for the test results. Was this tried with this patch alone or
> > the whole series applied?
> >
> I have only applied patch8(sched/pelt: Introduce PELT multiplier) for the tests.
>
> > Have you tried to tweak each policy response_time_ms introduced in patch
> > 7 instead? With the series applied, boot with PELT8, record the response time
> > values for each policy, then boot back again to PELT32 and use those values.
> > Does this produce similar results?
> >
> As the device is based on 5.15 kernel, I will try to pull all the 8 patches
> along with the dependency patches on 5.15 and try out the experiments as
> suggested.

Generally speaking, it would be better to compare with the latest kernel, or
at least one close to it that includes the new features added since v5.15
(which is more than 2 years old now). I understand that this is not always
easy or doable, but you could be surprised by the benefit of some features
merged since v5.15, like [0].

[0] https://lore.kernel.org/lkml/[email protected]/

>
> > You didn't share power numbers which I assume the perf gains are more important
> > than the power cost for you.
> >
> If possible I will try to collect the power number for future test and share the
> details.
>
> >
> > Thanks!
> >
> > --
> > Qais Yousef

2024-01-30 17:38:38

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

On Friday 08 Dec 2023 at 00:23:42 (+0000), Qais Yousef wrote:
> From: Vincent Donnefort <[email protected]>
>
> The new sched_pelt_multiplier boot param allows a user to set a clock
> multiplier to x2 or x4 (x1 being the default). This clock multiplier
> artificially speeds up PELT ramp up/down similarly to use a faster
> half-life than the default 32ms.
>
> - x1: 32ms half-life
> - x2: 16ms half-life
> - x4: 8ms half-life
>
> Internally, a new clock is created: rq->clock_task_mult. It sits in the
> clock hierarchy between rq->clock_task and rq->clock_pelt.
>
> The param is set as read only and can only be changed at boot time via
>
> kernel.sched_pelt_multiplier=[1, 2, 4]
>
> PELT has a big impact on the overall system response and reactiveness to
> change. Smaller PELT HF means it'll require less time to reach the
> maximum performance point of the system when the system become fully
> busy; and equally shorter time to go back to lowest performance point
> when the system goes back to idle.
>
> This faster reaction impacts both dvfs response and migration time
> between clusters in HMP system.
>
> Smaller PELT values are expected to give better performance at the cost
> of more power. Under powered systems can particularly benefit from
> smaller values. Powerful systems can still benefit from smaller values
> if they want to be tuned towards perf more and power is not the major
> concern for them.
>
> This combined with respone_time_ms from schedutil should give the user
> and sysadmin a deterministic way to control the triangular power, perf
> and thermals for their system. The default response_time_ms will half
> as PELT HF halves.
>
> Update approximate_{util_avg, runtime}() to take into account the PELT
> HALFLIFE multiplier.
>
> Signed-off-by: Vincent Donnefort <[email protected]>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> [Converted from sysctl to boot param and updated commit message]
> Signed-off-by: Qais Yousef (Google) <[email protected]>
> ---
> kernel/sched/core.c | 2 +-
> kernel/sched/pelt.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/sched/pelt.h | 42 +++++++++++++++++++++++++++++++----
> kernel/sched/sched.h | 1 +
> 4 files changed, 90 insertions(+), 7 deletions(-)
>

..

> +__read_mostly unsigned int sched_pelt_lshift;
> +static unsigned int sched_pelt_multiplier = 1;
> +
> +static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
> +{
> + int ret;
> +
> + ret = param_set_int(val, kp);
> + if (ret)
> + goto error;
> +
> + switch (sched_pelt_multiplier) {
> + case 1:
> + fallthrough;
> + case 2:
> + fallthrough;
> + case 4:
> + WRITE_ONCE(sched_pelt_lshift,
> + sched_pelt_multiplier >> 1);
> + break;
> + default:
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + return 0;
> +
> +error:
> + sched_pelt_multiplier = 1;
> + return ret;
> +}
> +
> +static const struct kernel_param_ops sched_pelt_multiplier_ops = {
> + .set = set_sched_pelt_multiplier,
> + .get = param_get_int,
> +};
> +
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +/* XXX: should we use sched as prefix? */
> +#define MODULE_PARAM_PREFIX "kernel."
> +module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
> +MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
> +MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALIFE - roughly 200ms to go from 0 to max performance point (default).");
> +MODULE_PARM_DESC(sched_pelt_multiplier, " 2 16ms PELT HALIFE - roughly 100ms to go from 0 to max performance point.");
> +MODULE_PARM_DESC(sched_pelt_multiplier, " 4 8ms PELT HALIFE - roughly 50ms to go from 0 to max performance point.");
> +
> /*
> * Approximate the new util_avg value assuming an entity has continued to run
> * for @delta us.

..

> +
> static inline void
> -update_rq_clock_pelt(struct rq *rq, s64 delta) { }
> +update_rq_clock_task_mult(struct rq *rq, s64 delta) { }
>
> static inline void
> update_idle_rq_clock_pelt(struct rq *rq) { }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bbece0eb053a..a7c89c623250 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1029,6 +1029,7 @@ struct rq {
> u64 clock;
> /* Ensure that all clocks are in the same cache line */
> u64 clock_task ____cacheline_aligned;
> + u64 clock_task_mult;

I'm not sure that we want yet another clock, and this doesn't apply to irq_avg.

What about the below? It is simpler and I think it covers all cases:

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index f951c44f1d52..5cdd147b7abe 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -180,6 +180,7 @@ static __always_inline int
___update_load_sum(u64 now, struct sched_avg *sa,
unsigned long load, unsigned long runnable, int running)
{
+ int time_shift;
u64 delta;

delta = now - sa->last_update_time;
@@ -195,12 +196,17 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
/*
* Use 1024ns as the unit of measurement since it's a reasonable
* approximation of 1us and fast to compute.
+ * On top of this, we can change the half-time period from the default
+ * 32ms to a shorter value. This is equivalent to left shifting the
+ * time.
+ * Merge both right and left shifts in one single right shift
*/
- delta >>= 10;
+ time_shift = 10 - sched_pelt_lshift;
+ delta >>= time_shift;
if (!delta)
return 0;

- sa->last_update_time += delta << 10;
+ sa->last_update_time += delta << time_shift;

/*
* running is a subset of runnable (weight) so running can't be set if



> u64 clock_pelt;
> unsigned long lost_idle_time;
> u64 clock_pelt_idle;
> --
> 2.34.1
>
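
To illustrate the effect of the merged shift above (a sketch, not the in-tree
code): with sched_pelt_lshift = 2, i.e. the x4 multiplier, the divisor drops
from 1024ns to 256ns per PELT segment, so the same wall-clock delta advances
the PELT sums four times as fast, giving roughly an 8ms half-life instead of
32ms.

static inline unsigned long long pelt_segments(unsigned long long delta_ns,
					       unsigned int lshift)
{
	unsigned int time_shift = 10 - lshift;	/* 1024ns == 1 << 10 */

	/* Same result as (delta_ns << lshift) >> 10, in a single shift. */
	return delta_ns >> time_shift;
}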

2024-02-01 22:24:40

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

On 01/30/24 18:38, Vincent Guittot wrote:
> On Friday 08 Dec 2023 at 00:23:42 (+0000), Qais Yousef wrote:
> > From: Vincent Donnefort <[email protected]>
> >
> > The new sched_pelt_multiplier boot param allows a user to set a clock
> > multiplier to x2 or x4 (x1 being the default). This clock multiplier
> > artificially speeds up PELT ramp up/down similarly to use a faster
> > half-life than the default 32ms.
> >
> > - x1: 32ms half-life
> > - x2: 16ms half-life
> > - x4: 8ms half-life
> >
> > Internally, a new clock is created: rq->clock_task_mult. It sits in the
> > clock hierarchy between rq->clock_task and rq->clock_pelt.
> >
> > The param is set as read only and can only be changed at boot time via
> >
> > kernel.sched_pelt_multiplier=[1, 2, 4]
> >
> > PELT has a big impact on the overall system response and reactiveness to
> > change. Smaller PELT HF means it'll require less time to reach the
> > maximum performance point of the system when the system become fully
> > busy; and equally shorter time to go back to lowest performance point
> > when the system goes back to idle.
> >
> > This faster reaction impacts both dvfs response and migration time
> > between clusters in HMP system.
> >
> > Smaller PELT values are expected to give better performance at the cost
> > of more power. Under powered systems can particularly benefit from
> > smaller values. Powerful systems can still benefit from smaller values
> > if they want to be tuned towards perf more and power is not the major
> > concern for them.
> >
> > This combined with respone_time_ms from schedutil should give the user
> > and sysadmin a deterministic way to control the triangular power, perf
> > and thermals for their system. The default response_time_ms will half
> > as PELT HF halves.
> >
> > Update approximate_{util_avg, runtime}() to take into account the PELT
> > HALFLIFE multiplier.
> >
> > Signed-off-by: Vincent Donnefort <[email protected]>
> > Signed-off-by: Dietmar Eggemann <[email protected]>
> > [Converted from sysctl to boot param and updated commit message]
> > Signed-off-by: Qais Yousef (Google) <[email protected]>
> > ---
> > kernel/sched/core.c | 2 +-
> > kernel/sched/pelt.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
> > kernel/sched/pelt.h | 42 +++++++++++++++++++++++++++++++----
> > kernel/sched/sched.h | 1 +
> > 4 files changed, 90 insertions(+), 7 deletions(-)
> >
>
> ...
>
> > +__read_mostly unsigned int sched_pelt_lshift;
> > +static unsigned int sched_pelt_multiplier = 1;
> > +
> > +static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
> > +{
> > + int ret;
> > +
> > + ret = param_set_int(val, kp);
> > + if (ret)
> > + goto error;
> > +
> > + switch (sched_pelt_multiplier) {
> > + case 1:
> > + fallthrough;
> > + case 2:
> > + fallthrough;
> > + case 4:
> > + WRITE_ONCE(sched_pelt_lshift,
> > + sched_pelt_multiplier >> 1);
> > + break;
> > + default:
> > + ret = -EINVAL;
> > + goto error;
> > + }
> > +
> > + return 0;
> > +
> > +error:
> > + sched_pelt_multiplier = 1;
> > + return ret;
> > +}
> > +
> > +static const struct kernel_param_ops sched_pelt_multiplier_ops = {
> > + .set = set_sched_pelt_multiplier,
> > + .get = param_get_int,
> > +};
> > +
> > +#ifdef MODULE_PARAM_PREFIX
> > +#undef MODULE_PARAM_PREFIX
> > +#endif
> > +/* XXX: should we use sched as prefix? */
> > +#define MODULE_PARAM_PREFIX "kernel."
> > +module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
> > +MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
> > +MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALIFE - roughly 200ms to go from 0 to max performance point (default).");
> > +MODULE_PARM_DESC(sched_pelt_multiplier, " 2 16ms PELT HALIFE - roughly 100ms to go from 0 to max performance point.");
> > +MODULE_PARM_DESC(sched_pelt_multiplier, " 4 8ms PELT HALIFE - roughly 50ms to go from 0 to max performance point.");
> > +
> > /*
> > * Approximate the new util_avg value assuming an entity has continued to run
> > * for @delta us.
>
> ...
>
> > +
> > static inline void
> > -update_rq_clock_pelt(struct rq *rq, s64 delta) { }
> > +update_rq_clock_task_mult(struct rq *rq, s64 delta) { }
> >
> > static inline void
> > update_idle_rq_clock_pelt(struct rq *rq) { }
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index bbece0eb053a..a7c89c623250 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1029,6 +1029,7 @@ struct rq {
> > u64 clock;
> > /* Ensure that all clocks are in the same cache line */
> > u64 clock_task ____cacheline_aligned;
> > + u64 clock_task_mult;
>
> I'm not sure that we want yet another clock and this doesn't apply for irq_avg.
>
> What about the below is simpler and I think cover all cases ?

Looks better, yes. I'll change to this, and if no issues come up I'll add
your Signed-off-by for the next version, if that's okay with you.


Thanks!

--
Qais Yousef

>
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index f951c44f1d52..5cdd147b7abe 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -180,6 +180,7 @@ static __always_inline int
> ___update_load_sum(u64 now, struct sched_avg *sa,
> unsigned long load, unsigned long runnable, int running)
> {
> + int time_shift;
> u64 delta;
>
> delta = now - sa->last_update_time;
> @@ -195,12 +196,17 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
> /*
> * Use 1024ns as the unit of measurement since it's a reasonable
> * approximation of 1us and fast to compute.
> + * On top of this, we can change the half-time period from the default
> + * 32ms to a shorter value. This is equivalent to left shifting the
> + * time.
> + * Merge both right and left shifts in one single right shift
> */
> - delta >>= 10;
> + time_shift = 10 - sched_pelt_lshift;
> + delta >>= time_shift;
> if (!delta)
> return 0;
>
> - sa->last_update_time += delta << 10;
> + sa->last_update_time += delta << time_shift;
>
> /*
> * running is a subset of runnable (weight) so running can't be set if
>
>
>
> > u64 clock_pelt;
> > unsigned long lost_idle_time;
> > u64 clock_pelt_idle;
> > --
> > 2.34.1
> >

2024-02-01 22:44:13

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time

On 12/08/23 00:23, Qais Yousef wrote:

> +static inline u64 sugov_calc_freq_response_ms(struct sugov_policy *sg_policy)
> +{
> + int cpu = cpumask_first(sg_policy->policy->cpus);
> + unsigned long cap = arch_scale_cpu_capacity(cpu);
> + unsigned int max_freq, sec_max_freq;
> +
> + max_freq = sg_policy->policy->cpuinfo.max_freq;
> + sec_max_freq = __resolve_freq(sg_policy->policy,
> + max_freq - 1,
> + CPUFREQ_RELATION_H);
> +
> + /*
> + * We will request max_freq as soon as util crosses the capacity at
> + * second highest frequency. So effectively our response time is the
> + * util at which we cross the cap@2nd_highest_freq.
> + */
> + cap = sec_max_freq * cap / max_freq;
> +
> + return approximate_runtime(cap + 1);

After Linus' problem (and more testing) I realized this is not correct. This
value is correct for the biggest core only; for smaller cores I must stretch
the time with capacity. I have similar invariance issues that I need to
address in this series and in the uclamp max aggregation series.
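
For illustration only, one possible shape of that stretch (the helper name
and the exact scaling below are assumptions, not something posted in the
series): scale the runtime computed for the biggest core by the inverse of
the CPU's capacity, so smaller cores get a proportionally longer response
time.

static inline u64 sugov_calc_freq_response_ms_capinv(struct sugov_policy *sg_policy)
{
	int cpu = cpumask_first(sg_policy->policy->cpus);
	unsigned long cap = arch_scale_cpu_capacity(cpu);
	u64 response_ms = sugov_calc_freq_response_ms(sg_policy);

	/*
	 * A CPU with capacity cap ramps its util roughly 1024/cap times
	 * slower than the biggest CPU, so stretch the response accordingly.
	 */
	return div64_ul(response_ms * SCHED_CAPACITY_SCALE, cap);
}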

2024-02-04 11:32:59

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

On Thu, 1 Feb 2024 at 23:24, Qais Yousef <[email protected]> wrote:
>
> On 01/30/24 18:38, Vincent Guittot wrote:
> > On Friday 08 Dec 2023 at 00:23:42 (+0000), Qais Yousef wrote:
> > > From: Vincent Donnefort <[email protected]>
> > >
> > > The new sched_pelt_multiplier boot param allows a user to set a clock
> > > multiplier to x2 or x4 (x1 being the default). This clock multiplier
> > > artificially speeds up PELT ramp up/down similarly to use a faster
> > > half-life than the default 32ms.
> > >
> > > - x1: 32ms half-life
> > > - x2: 16ms half-life
> > > - x4: 8ms half-life
> > >
> > > Internally, a new clock is created: rq->clock_task_mult. It sits in the
> > > clock hierarchy between rq->clock_task and rq->clock_pelt.
> > >
> > > The param is set as read only and can only be changed at boot time via
> > >
> > > kernel.sched_pelt_multiplier=[1, 2, 4]
> > >
> > > PELT has a big impact on the overall system response and reactiveness to
> > > change. Smaller PELT HF means it'll require less time to reach the
> > > maximum performance point of the system when the system become fully
> > > busy; and equally shorter time to go back to lowest performance point
> > > when the system goes back to idle.
> > >
> > > This faster reaction impacts both dvfs response and migration time
> > > between clusters in HMP system.
> > >
> > > Smaller PELT values are expected to give better performance at the cost
> > > of more power. Under powered systems can particularly benefit from
> > > smaller values. Powerful systems can still benefit from smaller values
> > > if they want to be tuned towards perf more and power is not the major
> > > concern for them.
> > >
> > > This combined with respone_time_ms from schedutil should give the user
> > > and sysadmin a deterministic way to control the triangular power, perf
> > > and thermals for their system. The default response_time_ms will half
> > > as PELT HF halves.
> > >
> > > Update approximate_{util_avg, runtime}() to take into account the PELT
> > > HALFLIFE multiplier.
> > >
> > > Signed-off-by: Vincent Donnefort <[email protected]>
> > > Signed-off-by: Dietmar Eggemann <[email protected]>
> > > [Converted from sysctl to boot param and updated commit message]
> > > Signed-off-by: Qais Yousef (Google) <[email protected]>
> > > ---
> > > kernel/sched/core.c | 2 +-
> > > kernel/sched/pelt.c | 52 ++++++++++++++++++++++++++++++++++++++++++--
> > > kernel/sched/pelt.h | 42 +++++++++++++++++++++++++++++++----
> > > kernel/sched/sched.h | 1 +
> > > 4 files changed, 90 insertions(+), 7 deletions(-)
> > >
> >
> > ...
> >
> > > +__read_mostly unsigned int sched_pelt_lshift;
> > > +static unsigned int sched_pelt_multiplier = 1;
> > > +
> > > +static int set_sched_pelt_multiplier(const char *val, const struct kernel_param *kp)
> > > +{
> > > + int ret;
> > > +
> > > + ret = param_set_int(val, kp);
> > > + if (ret)
> > > + goto error;
> > > +
> > > + switch (sched_pelt_multiplier) {
> > > + case 1:
> > > + fallthrough;
> > > + case 2:
> > > + fallthrough;
> > > + case 4:
> > > + WRITE_ONCE(sched_pelt_lshift,
> > > + sched_pelt_multiplier >> 1);
> > > + break;
> > > + default:
> > > + ret = -EINVAL;
> > > + goto error;
> > > + }
> > > +
> > > + return 0;
> > > +
> > > +error:
> > > + sched_pelt_multiplier = 1;
> > > + return ret;
> > > +}
> > > +
> > > +static const struct kernel_param_ops sched_pelt_multiplier_ops = {
> > > + .set = set_sched_pelt_multiplier,
> > > + .get = param_get_int,
> > > +};
> > > +
> > > +#ifdef MODULE_PARAM_PREFIX
> > > +#undef MODULE_PARAM_PREFIX
> > > +#endif
> > > +/* XXX: should we use sched as prefix? */
> > > +#define MODULE_PARAM_PREFIX "kernel."
> > > +module_param_cb(sched_pelt_multiplier, &sched_pelt_multiplier_ops, &sched_pelt_multiplier, 0444);
> > > +MODULE_PARM_DESC(sched_pelt_multiplier, "PELT HALFLIFE helps control the responsiveness of the system.");
> > > +MODULE_PARM_DESC(sched_pelt_multiplier, "Accepted value: 1 32ms PELT HALIFE - roughly 200ms to go from 0 to max performance point (default).");
> > > +MODULE_PARM_DESC(sched_pelt_multiplier, " 2 16ms PELT HALIFE - roughly 100ms to go from 0 to max performance point.");
> > > +MODULE_PARM_DESC(sched_pelt_multiplier, " 4 8ms PELT HALIFE - roughly 50ms to go from 0 to max performance point.");
> > > +
> > > /*
> > > * Approximate the new util_avg value assuming an entity has continued to run
> > > * for @delta us.
> >
> > ...
> >
> > > +
> > > static inline void
> > > -update_rq_clock_pelt(struct rq *rq, s64 delta) { }
> > > +update_rq_clock_task_mult(struct rq *rq, s64 delta) { }
> > >
> > > static inline void
> > > update_idle_rq_clock_pelt(struct rq *rq) { }
> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index bbece0eb053a..a7c89c623250 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -1029,6 +1029,7 @@ struct rq {
> > > u64 clock;
> > > /* Ensure that all clocks are in the same cache line */
> > > u64 clock_task ____cacheline_aligned;
> > > + u64 clock_task_mult;
> >
> > I'm not sure that we want yet another clock and this doesn't apply for irq_avg.
> >
> > What about the below is simpler and I think cover all cases ?
>
> Looks better, yes. I'll change to this and if no issues come up I'll add your
> signed-off-by if that's okay with you for the next version.

Yes, that's okay

>
>
> Thanks!
>
> --
> Qais Yousef
>
> >
> > diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> > index f951c44f1d52..5cdd147b7abe 100644
> > --- a/kernel/sched/pelt.c
> > +++ b/kernel/sched/pelt.c
> > @@ -180,6 +180,7 @@ static __always_inline int
> > ___update_load_sum(u64 now, struct sched_avg *sa,
> > unsigned long load, unsigned long runnable, int running)
> > {
> > + int time_shift;
> > u64 delta;
> >
> > delta = now - sa->last_update_time;
> > @@ -195,12 +196,17 @@ ___update_load_sum(u64 now, struct sched_avg *sa,
> > /*
> > * Use 1024ns as the unit of measurement since it's a reasonable
> > * approximation of 1us and fast to compute.
> > + * On top of this, we can change the half-time period from the default
> > + * 32ms to a shorter value. This is equivalent to left shifting the
> > + * time.
> > + * Merge both right and left shifts in one single right shift
> > */
> > - delta >>= 10;
> > + time_shift = 10 - sched_pelt_lshift;
> > + delta >>= time_shift;
> > if (!delta)
> > return 0;
> >
> > - sa->last_update_time += delta << 10;
> > + sa->last_update_time += delta << time_shift;
> >
> > /*
> > * running is a subset of runnable (weight) so running can't be set if
> >
> >
> >
> > > u64 clock_pelt;
> > > unsigned long lost_idle_time;
> > > u64 clock_pelt_idle;
> > > --
> > > 2.34.1
> > >

2024-02-06 17:08:36

by Ashay Jaiswal

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier



On 1/30/2024 10:58 PM, Vincent Guittot wrote:
> On Sun, 28 Jan 2024 at 17:22, Ashay Jaiswal <[email protected]> wrote:
>>
>> Hello Qais Yousef,
>>
>> Thank you for your response.
>>
>> On 1/21/2024 5:34 AM, Qais Yousef wrote:
>>> Hi Ashay
>>>
>>> On 01/20/24 13:22, Ashay Jaiswal wrote:
>>>> Hello Qais Yousef,
>>>>
>>>> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
>>>> based internal Android device and we are observing significant
>>>> improvements with PELT8 configuration compared to PELT32.
>>>>
>>>> Following are some of the benchmark results with PELT32 and PELT8
>>>> configuration:
>>>>
>>>> +-----------------+---------------+----------------+----------------+
>>>> | Test case | PELT32 | PELT8 |
>>>> +-----------------+---------------+----------------+----------------+
>>>> | | Overall | 711543 | 971275 |
>>>> | +---------------+----------------+----------------+
>>>> | | CPU | 193704 | 224378 |
>>>> | +---------------+----------------+----------------+
>>>> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
>>>> | +---------------+----------------+----------------+
>>>> | | MEM | 125207 | 160548 |
>>>> | +---------------+----------------+----------------+
>>>> | | UX | 107982 | 161575 |
>>>> +-----------------+---------------+----------------+----------------+
>>>> | | Single core | 1170 | 1268 |
>>>> |GeekBench V5.4.4 +---------------+----------------+----------------+
>>>> | | Multi core | 2530 | 3797 |
>>>> +-----------------+---------------+----------------+----------------+
>>>> | | Twitter | >50 Janks | 0 |
>>>> | SCROLL +---------------+----------------+----------------+
>>>> | | Contacts | >30 Janks | 0 |
>>>> +-----------------+---------------+----------------+----------------+
>>>>
>>>> Please let us know if you need any support with running any further
>>>> workloads for PELT32/PELT8 experiments, we can help with running the
>>>> experiments.
>>>
>>> Thanks a lot for the test results. Was this tried with this patch alone or
>>> the whole series applied?
>>>
>> I have only applied patch8(sched/pelt: Introduce PELT multiplier) for the tests.
>>
>>> Have you tried to tweak each policy response_time_ms introduced in patch
>>> 7 instead? With the series applied, boot with PELT8, record the response time
>>> values for each policy, then boot back again to PELT32 and use those values.
>>> Does this produce similar results?
>>>
>> As the device is based on 5.15 kernel, I will try to pull all the 8 patches
>> along with the dependency patches on 5.15 and try out the experiments as
>> suggested.
>
> Generally speaking, it would be better to compare with the latest
> kernel or at least close and which includes new features added since
> v5.15 (which is more than 2 years old now). I understand that this is
> not always easy or doable but you could be surprised by the benefit of
> some features like [0] merged since v5.15
>
> [0] https://lore.kernel.org/lkml/[email protected]/
>
Thank you, Vincent, for the suggestion. I will try to get the results on a device
running the most recent kernel and update.

Thanks,
Ashay Jaiswal
>>
>>> You didn't share power numbers which I assume the perf gains are more important
>>> than the power cost for you.
>>>
>> If possible I will try to collect the power number for future test and share the
>> details.
>>
>>>
>>> Thanks!
>>>
>>> --
>>> Qais Yousef

2024-04-12 10:08:03

by Ashay Jaiswal

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

On 2/6/2024 10:37 PM, Ashay Jaiswal wrote:
>
>
> On 1/30/2024 10:58 PM, Vincent Guittot wrote:
>> On Sun, 28 Jan 2024 at 17:22, Ashay Jaiswal <[email protected]> wrote:
>>>
>>> Hello Qais Yousef,
>>>
>>> Thank you for your response.
>>>
>>> On 1/21/2024 5:34 AM, Qais Yousef wrote:
>>>> Hi Ashay
>>>>
>>>> On 01/20/24 13:22, Ashay Jaiswal wrote:
>>>>> Hello Qais Yousef,
>>>>>
>>>>> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
>>>>> based internal Android device and we are observing significant
>>>>> improvements with PELT8 configuration compared to PELT32.
>>>>>
>>>>> Following are some of the benchmark results with PELT32 and PELT8
>>>>> configuration:
>>>>>
>>>>> +-----------------+---------------+----------------+----------------+
>>>>> | Test case | PELT32 | PELT8 |
>>>>> +-----------------+---------------+----------------+----------------+
>>>>> | | Overall | 711543 | 971275 |
>>>>> | +---------------+----------------+----------------+
>>>>> | | CPU | 193704 | 224378 |
>>>>> | +---------------+----------------+----------------+
>>>>> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
>>>>> | +---------------+----------------+----------------+
>>>>> | | MEM | 125207 | 160548 |
>>>>> | +---------------+----------------+----------------+
>>>>> | | UX | 107982 | 161575 |
>>>>> +-----------------+---------------+----------------+----------------+
>>>>> | | Single core | 1170 | 1268 |
>>>>> |GeekBench V5.4.4 +---------------+----------------+----------------+
>>>>> | | Multi core | 2530 | 3797 |
>>>>> +-----------------+---------------+----------------+----------------+
>>>>> | | Twitter | >50 Janks | 0 |
>>>>> | SCROLL +---------------+----------------+----------------+
>>>>> | | Contacts | >30 Janks | 0 |
>>>>> +-----------------+---------------+----------------+----------------+
>>>>>
>>>>> Please let us know if you need any support with running any further
>>>>> workloads for PELT32/PELT8 experiments, we can help with running the
>>>>> experiments.
>>>>
>>>> Thanks a lot for the test results. Was this tried with this patch alone or
>>>> the whole series applied?
>>>>
>>> I have only applied patch8(sched/pelt: Introduce PELT multiplier) for the tests.
>>>
>>>> Have you tried to tweak each policy response_time_ms introduced in patch
>>>> 7 instead? With the series applied, boot with PELT8, record the response time
>>>> values for each policy, then boot back again to PELT32 and use those values.
>>>> Does this produce similar results?
>>>>
>>> As the device is based on 5.15 kernel, I will try to pull all the 8 patches
>>> along with the dependency patches on 5.15 and try out the experiments as
>>> suggested.
>>
>> Generally speaking, it would be better to compare with the latest
>> kernel or at least close and which includes new features added since
>> v5.15 (which is more than 2 years old now). I understand that this is
>> not always easy or doable but you could be surprised by the benefit of
>> some features like [0] merged since v5.15
>>
>> [0] https://lore.kernel.org/lkml/[email protected]/
>>
> Thank you Vincent for the suggestion, I will try to get the results on device running
> with most recent kernel and update.
>
> Thanks,
> Ashay Jaiswal

Hello Qais Yousef and Vincent,

Sorry for the delay; setting up the internal device on the latest kernel is taking more
time than anticipated. We are trying to bring up the latest kernel on the device and will
complete the testing with the latest cpufreq patches as you suggested.

Regarding the PELT multiplier patch [1], are we planning to merge it separately, or will
it be merged along with the cpufreq patches?

[1]: https://lore.kernel.org/all/[email protected]/

Thanks and Regards,
Ashay Jaiswal

>>>
>>>> You didn't share power numbers which I assume the perf gains are more important
>>>> than the power cost for you.
>>>>
>>> If possible I will try to collect the power number for future test and share the
>>> details.
>>>
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Qais Yousef

2024-04-19 13:20:56

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] sched/pelt: Introduce PELT multiplier

Hi Ashay

On 04/12/24 15:36, Ashay Jaiswal wrote:
> On 2/6/2024 10:37 PM, Ashay Jaiswal wrote:
> >
> >
> > On 1/30/2024 10:58 PM, Vincent Guittot wrote:
> >> On Sun, 28 Jan 2024 at 17:22, Ashay Jaiswal <[email protected]> wrote:
> >>>
> >>> Hello Qais Yousef,
> >>>
> >>> Thank you for your response.
> >>>
> >>> On 1/21/2024 5:34 AM, Qais Yousef wrote:
> >>>> Hi Ashay
> >>>>
> >>>> On 01/20/24 13:22, Ashay Jaiswal wrote:
> >>>>> Hello Qais Yousef,
> >>>>>
> >>>>> We ran few benchmarks with PELT multiplier patch on a Snapdragon 8Gen2
> >>>>> based internal Android device and we are observing significant
> >>>>> improvements with PELT8 configuration compared to PELT32.
> >>>>>
> >>>>> Following are some of the benchmark results with PELT32 and PELT8
> >>>>> configuration:
> >>>>>
> >>>>> +-----------------+---------------+----------------+----------------+
> >>>>> | Test case | PELT32 | PELT8 |
> >>>>> +-----------------+---------------+----------------+----------------+
> >>>>> | | Overall | 711543 | 971275 |
> >>>>> | +---------------+----------------+----------------+
> >>>>> | | CPU | 193704 | 224378 |
> >>>>> | +---------------+----------------+----------------+
> >>>>> |ANTUTU V9.3.9 | GPU | 284650 | 424774 |
> >>>>> | +---------------+----------------+----------------+
> >>>>> | | MEM | 125207 | 160548 |
> >>>>> | +---------------+----------------+----------------+
> >>>>> | | UX | 107982 | 161575 |
> >>>>> +-----------------+---------------+----------------+----------------+
> >>>>> | | Single core | 1170 | 1268 |
> >>>>> |GeekBench V5.4.4 +---------------+----------------+----------------+
> >>>>> | | Multi core | 2530 | 3797 |
> >>>>> +-----------------+---------------+----------------+----------------+
> >>>>> | | Twitter | >50 Janks | 0 |
> >>>>> | SCROLL +---------------+----------------+----------------+
> >>>>> | | Contacts | >30 Janks | 0 |
> >>>>> +-----------------+---------------+----------------+----------------+
> >>>>>
> >>>>> Please let us know if you need any support with running any further
> >>>>> workloads for PELT32/PELT8 experiments, we can help with running the
> >>>>> experiments.
> >>>>
> >>>> Thanks a lot for the test results. Was this tried with this patch alone or
> >>>> the whole series applied?
> >>>>
> >>> I have only applied patch8(sched/pelt: Introduce PELT multiplier) for the tests.
> >>>
> >>>> Have you tried to tweak each policy response_time_ms introduced in patch
> >>>> 7 instead? With the series applied, boot with PELT8, record the response time
> >>>> values for each policy, then boot back again to PELT32 and use those values.
> >>>> Does this produce similar results?
> >>>>
> >>> As the device is based on 5.15 kernel, I will try to pull all the 8 patches
> >>> along with the dependency patches on 5.15 and try out the experiments as
> >>> suggested.
> >>
> >> Generally speaking, it would be better to compare with the latest
> >> kernel or at least close and which includes new features added since
> >> v5.15 (which is more than 2 years old now). I understand that this is
> >> not always easy or doable but you could be surprised by the benefit of
> >> some features like [0] merged since v5.15
> >>
> >> [0] https://lore.kernel.org/lkml/[email protected]/
> >>
> > Thank you Vincent for the suggestion, I will try to get the results on device running
> > with most recent kernel and update.
> >
> > Thanks,
> > Ashay Jaiswal
>
> Hello Qais Yousef and Vincent,
>
> Sorry for the delay, setting up internal device on latest kernel is taking more time than anticipated.
> We are trying to bring-up latest kernel on the device and will complete the testing with the latest
> cpufreq patches as you suggested.
>
> Regarding PELT multiplier patch [1], are we planning to merge it separately or will it be merged
> altogether with the cpufreq patches?
>
> [1]: https://lore.kernel.org/all/[email protected]/

I am working on an updated version. I've been analysing the problem more since
the last posting and found more issues to fix to enhance the response time of
the system in terms of migration and DVFS.

For the PELT multiplier, I have an updated patch now based on Vincent's
suggestion of a different implementation without a new clock. I will include
that, but haven't tested this part so far.

I hope to send the new version soon. I will CC you so you can try it and see
if these improvements help. In my view the boot-time PELT multiplier is only
necessary to help low-end types of systems where the max performance is
relatively low, but the scheduler has a constant model (and response time),
which means they need a different default behavior so workloads reach this
max performance point faster.

I can potentially see a powerful system needing that, but IMHO the trade-off
with power will be very costly. If everything goes to plan, I hope we can
introduce a per-task util_est_faster like Peter suggested in an earlier
discussion, which should help workloads that need the best ST perf to reach
it faster without changing the system's default behavior.

The biggest challenge is handling those bursty tasks, and I hope the proposal
I am working on will put us in the right direction in terms of better default
behavior for those tasks.

If you have any analysis on why you think faster PELT helps, that'd be great to
share.


Cheers

--
Qais Yousef