2016-10-05 05:18:00

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes

Sorry for being late Andreas :(

On 14-09-16, 16:56, Andreas Herrmann wrote:

First of all, thanks for your hardwork in getting these numbers out.
Really appreciate it.

> Below is some trace data. I hope it is of some help.
>
> (A) - sampling 10s period when system is idle
> (B) - sampling 10s period when system partially loaded (kernel
> compilation using 2 jobs)
>
> (1) 4.8-rc5
> (2) 4.8-rc5 with my patch (reintro of deadband effect within
> pcc-cpufreq)
> (3) 4.8-rc5 with reversal of 6393d6a102 (cpufreq: ondemand: Eliminate
> the deadband effect)
>
> Let me know whether you are looking for other trace data wrt this
> issue.
>
>
> Thanks,
>
> Andreas
>
> ---
>
> (A)-(1)
>
> # Total Lost Samples: 0
> # Samples: 41 of event 'power:cpu_frequency'
> # Event count (approx.): 41
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 39.02% kworker/14:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 29.27% kworker/0:0 [kernel.vmlinux] [k] cpufreq_notify_transition
> 19.51% kworker/10:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 7.32% kworker/5:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 2.44% kworker/23:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 2.44% kworker/40:1 [kernel.vmlinux] [k] cpufreq_notify_transition
>
> (A)-(2)
>
> # Total Lost Samples: 0
> # Samples: 6 of event 'power:cpu_frequency'
> # Event count (approx.): 6
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 33.33% kworker/1:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 16.67% kworker/16:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 16.67% kworker/22:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 16.67% kworker/26:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 16.67% kworker/33:1 [kernel.vmlinux] [k] cpufreq_notify_transition
>
> (A)-(3)
>
> # Total Lost Samples: 0
> # Samples: 7 of event 'power:cpu_frequency'
> # Event count (approx.): 7
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 28.57% kworker/58:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 14.29% kworker/19:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 14.29% kworker/20:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 14.29% kworker/22:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 14.29% kworker/23:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 14.29% kworker/35:1 [kernel.vmlinux] [k] cpufreq_notify_transition
>
> ---
>
> (B)-(1)
>
> # Total Lost Samples: 0
> # Samples: 2K of event 'power:cpu_frequency'
> # Event count (approx.): 2382
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 5.75% kworker/0:0 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.16% kworker/12:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 3.11% kworker/17:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 2.94% kworker/2:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 2.73% kworker/19:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> ...
>
> (B)-(2)
>
> # Total Lost Samples: 0
> # Samples: 320 of event 'power:cpu_frequency'
> # Event count (approx.): 320
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 4.69% kworker/56:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.06% kworker/12:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.06% kworker/28:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.06% kworker/6:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> 3.75% kworker/32:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> ...
>
> (B)-(3)
>
> # Total Lost Samples: 0
> # Samples: 333 of event 'power:cpu_frequency'
> # Event count (approx.): 333
> # Overhead Command Shared Object Symbol
> # ........ ............ ................ .............................
> 4.80% kworker/51:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.50% kworker/39:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 4.20% kworker/47:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 3.90% kworker/59:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> 3.90% kworker/7:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> ...

I am worried by all of these.. So, its not the DVFS transition which
took time, but cpufreq_notify_transition(). And probably one of the
drivers in your setup is screwing up here, which has registered with
cpufreq_register_notifier().

Can you please take a look at that ? Just check which all routines are
getting called as part of srcu_notifier_call_chain().

Also a simple (hacky) solution to fix the problem you have got is to
divide the range of frequency in steps (which you have already done
AFAICT in one of your patches) and not mention the frequencies that
were part of the deadband earlier. That will keep the CPU at either
low freq or some higher freqs.

But I am quite sure that we have an abuser here and we better find it
now.

--
viresh


2016-10-11 06:29:52

by Andreas Herrmann

[permalink] [raw]
Subject: Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes

On Wed, Oct 05, 2016 at 10:47:53AM +0530, Viresh Kumar wrote:
> Sorry for being late Andreas :(
>
> On 14-09-16, 16:56, Andreas Herrmann wrote:
>
> First of all, thanks for your hardwork in getting these numbers out.
> Really appreciate it.
>
> > Below is some trace data. I hope it is of some help.
> >
> > (A) - sampling 10s period when system is idle
> > (B) - sampling 10s period when system partially loaded (kernel
> > compilation using 2 jobs)
> >
> > (1) 4.8-rc5
> > (2) 4.8-rc5 with my patch (reintro of deadband effect within
> > pcc-cpufreq)
> > (3) 4.8-rc5 with reversal of 6393d6a102 (cpufreq: ondemand: Eliminate
> > the deadband effect)
> >
> > Let me know whether you are looking for other trace data wrt this
> > issue.
> >
> >
> > Thanks,
> >
> > Andreas
> >
> > ---
> >
> > (A)-(1)
> >
> > # Total Lost Samples: 0
> > # Samples: 41 of event 'power:cpu_frequency'
> > # Event count (approx.): 41
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 39.02% kworker/14:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 29.27% kworker/0:0 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 19.51% kworker/10:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 7.32% kworker/5:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 2.44% kworker/23:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 2.44% kworker/40:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> >
> > (A)-(2)
> >
> > # Total Lost Samples: 0
> > # Samples: 6 of event 'power:cpu_frequency'
> > # Event count (approx.): 6
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 33.33% kworker/1:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 16.67% kworker/16:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 16.67% kworker/22:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 16.67% kworker/26:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 16.67% kworker/33:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> >
> > (A)-(3)
> >
> > # Total Lost Samples: 0
> > # Samples: 7 of event 'power:cpu_frequency'
> > # Event count (approx.): 7
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 28.57% kworker/58:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 14.29% kworker/19:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 14.29% kworker/20:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 14.29% kworker/22:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 14.29% kworker/23:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 14.29% kworker/35:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> >
> > ---
> >
> > (B)-(1)
> >
> > # Total Lost Samples: 0
> > # Samples: 2K of event 'power:cpu_frequency'
> > # Event count (approx.): 2382
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 5.75% kworker/0:0 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.16% kworker/12:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 3.11% kworker/17:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 2.94% kworker/2:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 2.73% kworker/19:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > ...
> >
> > (B)-(2)
> >
> > # Total Lost Samples: 0
> > # Samples: 320 of event 'power:cpu_frequency'
> > # Event count (approx.): 320
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 4.69% kworker/56:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.06% kworker/12:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.06% kworker/28:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.06% kworker/6:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 3.75% kworker/32:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > ...
> >
> > (B)-(3)
> >
> > # Total Lost Samples: 0
> > # Samples: 333 of event 'power:cpu_frequency'
> > # Event count (approx.): 333
> > # Overhead Command Shared Object Symbol
> > # ........ ............ ................ .............................
> > 4.80% kworker/51:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.50% kworker/39:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 4.20% kworker/47:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 3.90% kworker/59:1 [kernel.vmlinux] [k] cpufreq_notify_transition
> > 3.90% kworker/7:2 [kernel.vmlinux] [k] cpufreq_notify_transition
> > ...
>
> I am worried by all of these.. So, its not the DVFS transition which
> took time, but cpufreq_notify_transition(). And probably one of the
> drivers in your setup is screwing up here, which has registered with
> cpufreq_register_notifier().
>
> Can you please take a look at that ? Just check which all routines are
> getting called as part of srcu_notifier_call_chain().

I'll take a look into this.

> Also a simple (hacky) solution to fix the problem you have got is to
> divide the range of frequency in steps (which you have already done
> AFAICT in one of your patches) and not mention the frequencies that
> were part of the deadband earlier. That will keep the CPU at either
> low freq or some higher freqs.

Yes, seems worth to try this.

> But I am quite sure that we have an abuser here and we better find it
> now.


Thanks,

Andreas