LinuxLists.cc - Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

2014-06-03 08:18:49

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 27 May 2014 02:23, Srivatsa S. Bhat <[email protected]> wrote:

Looks fine, some nits..

> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> -void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> +void dbs_check_cpu(struct dbs_data *dbs_data, int cpu,
> + unsigned int sampling_rate)

We don't need to pass a new argument, we can get all the information from
dbs_data alone. Its already done for multiple routines. Let me know if you
find it difficult to figure out..

2014-06-03 09:34:03

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 06/03/2014 01:48 PM, Viresh Kumar wrote:
> On 27 May 2014 02:23, Srivatsa S. Bhat <[email protected]> wrote:
>
> Looks fine, some nits..
>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> -void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>> +void dbs_check_cpu(struct dbs_data *dbs_data, int cpu,
>> + unsigned int sampling_rate)
>
> We don't need to pass a new argument, we can get all the information from
> dbs_data alone. Its already done for multiple routines. Let me know if you
> find it difficult to figure out..
>

Sure, that would be a good improvement. Does something like the patch below
look good? I have only compile-tested it. I'll send out the patch with changelog
once I finish testing it.

Thank you!

Regards,
Srivatsa S. Bhat

----------------------------------------------------------------------------

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e1c6433..3e8588f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
struct od_dbs_tuners *od_tuners = dbs_data->tuners;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
struct cpufreq_policy *policy;
+ unsigned int sampling_rate;
unsigned int max_load = 0;
unsigned int ignore_nice;
unsigned int j;

- if (dbs_data->cdata->governor == GOV_ONDEMAND)
+ if (dbs_data->cdata->governor == GOV_ONDEMAND) {
+ struct od_cpu_dbs_info_s *od_dbs_info;
+
+ /*
+ * Sometimes, the ondemand governor uses an additional
+ * multiplier to give long delays. So apply this multiplier to
+ * the 'sampling_rate', so as to keep the wake-up-from-idle
+ * detection logic a bit conservative.
+ */
+ sampling_rate = od_tuners->sampling_rate;
+ od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+ sampling_rate *= od_dbs_info->rate_mult;
+
ignore_nice = od_tuners->ignore_nice_load;
- else
+ } else {
+ sampling_rate = cs_tuners->sampling_rate;
ignore_nice = cs_tuners->ignore_nice_load;
+ }

policy = cdbs->cur_policy;

@@ -96,7 +111,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
if (unlikely(!wall_time || wall_time < idle_time))
continue;

- load = 100 * (wall_time - idle_time) / wall_time;
+ /*
+ * If the CPU had gone completely idle, and a task just woke up
+ * on this CPU now, it would be unfair to calculate 'load' the
+ * usual way for this elapsed time-window, because it will show
+ * near-zero load, irrespective of how CPU intensive the new
+ * task is. This is undesirable for latency-sensitive bursty
+ * workloads.
+ *
+ * To avoid this, we reuse the 'load' from the previous
+ * time-window and give this task a chance to start with a
+ * reasonably high CPU frequency.
+ *
+ * Detecting this situation is easy: the governor's deferrable
+ * timer would not have fired during CPU-idle periods. Hence
+ * an unusually large 'wall_time' (as compared to the sampling
+ * rate) indicates this scenario.
+ */
+ if (unlikely(wall_time > (2 * sampling_rate))) {
+ load = j_cdbs->prev_load;
+ } else {
+ load = 100 * (wall_time - idle_time) / wall_time;
+ j_cdbs->prev_load = load;
+ }

if (load > max_load)
max_load = load;
@@ -323,6 +360,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
j_cdbs->cur_policy = policy;
j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
&j_cdbs->prev_cpu_wall, io_busy);
+ j_cdbs->prev_load = 100 * (j_cdbs->prev_cpu_wall -
+ j_cdbs->prev_cpu_idle) /
+ j_cdbs->prev_cpu_wall;
+
if (ignore_nice)
j_cdbs->prev_cpu_nice =
kcpustat_cpu(j).cpustat[CPUTIME_NICE];
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index bfb9ae1..b56552b 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,6 +134,7 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*

2014-06-03 09:39:21

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 3 June 2014 15:02, Srivatsa S. Bhat <[email protected]> wrote:
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index e1c6433..3e8588f 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> struct od_dbs_tuners *od_tuners = dbs_data->tuners;
> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
> struct cpufreq_policy *policy;
> + unsigned int sampling_rate;
> unsigned int max_load = 0;
> unsigned int ignore_nice;
> unsigned int j;
>
> - if (dbs_data->cdata->governor == GOV_ONDEMAND)
> + if (dbs_data->cdata->governor == GOV_ONDEMAND) {
> + struct od_cpu_dbs_info_s *od_dbs_info;
> +
> + /*
> + * Sometimes, the ondemand governor uses an additional
> + * multiplier to give long delays. So apply this multiplier to
> + * the 'sampling_rate', so as to keep the wake-up-from-idle
> + * detection logic a bit conservative.
> + */
> + sampling_rate = od_tuners->sampling_rate;
> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);

Probably do both above right after definition of od_dbs_info or merge
above with it. and just keep below after the comment ?

> + sampling_rate *= od_dbs_info->rate_mult;

2014-06-03 10:06:21

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 06/03/2014 03:09 PM, Viresh Kumar wrote:
> On 3 June 2014 15:02, Srivatsa S. Bhat <[email protected]> wrote:
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index e1c6433..3e8588f 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>> struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
>> struct cpufreq_policy *policy;
>> + unsigned int sampling_rate;
>> unsigned int max_load = 0;
>> unsigned int ignore_nice;
>> unsigned int j;
>>
>> - if (dbs_data->cdata->governor == GOV_ONDEMAND)
>> + if (dbs_data->cdata->governor == GOV_ONDEMAND) {
>> + struct od_cpu_dbs_info_s *od_dbs_info;
>> +
>> + /*
>> + * Sometimes, the ondemand governor uses an additional
>> + * multiplier to give long delays. So apply this multiplier to
>> + * the 'sampling_rate', so as to keep the wake-up-from-idle
>> + * detection logic a bit conservative.
>> + */
>> + sampling_rate = od_tuners->sampling_rate;
>> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
>
> Probably do both above right after definition of od_dbs_info or merge
> above with it. and just keep below after the comment ?
>
>> + sampling_rate *= od_dbs_info->rate_mult;
>

Well, the method I used keeps the organization such that the code following
the comment does precisely what the comment says (i.e, get the sampling_rate,
fetch the multiplier, and then multiply). So I feel it makes it easier to
understand.

Regards,
Srivatsa S. Bhat

2014-06-03 10:08:20

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 3 June 2014 15:34, Srivatsa S. Bhat <[email protected]> wrote:
> Well, the method I used keeps the organization such that the code following
> the comment does precisely what the comment says (i.e, get the sampling_rate,
> fetch the multiplier, and then multiply). So I feel it makes it easier to
> understand.

It looked like the comment is there only for this special statement:

>>> + sampling_rate *= od_dbs_info->rate_mult;

And so suggested that :)

Anyway move this up as it doesn't belong to comment for sure.
>> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);

2014-06-03 10:15:11

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 06/03/2014 03:38 PM, Viresh Kumar wrote:
> On 3 June 2014 15:34, Srivatsa S. Bhat <[email protected]> wrote:
>> Well, the method I used keeps the organization such that the code following
>> the comment does precisely what the comment says (i.e, get the sampling_rate,
>> fetch the multiplier, and then multiply). So I feel it makes it easier to
>> understand.
>
> It looked like the comment is there only for this special statement:
>
>>>> + sampling_rate *= od_dbs_info->rate_mult;
>
> And so suggested that :)
>
> Anyway move this up as it doesn't belong to comment for sure.
>>> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
>

Fair enough :) Here it is:

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e1c6433..2597bbe 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
struct od_dbs_tuners *od_tuners = dbs_data->tuners;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
struct cpufreq_policy *policy;
+ unsigned int sampling_rate;
unsigned int max_load = 0;
unsigned int ignore_nice;
unsigned int j;

- if (dbs_data->cdata->governor == GOV_ONDEMAND)
+ if (dbs_data->cdata->governor == GOV_ONDEMAND) {
+ struct od_cpu_dbs_info_s *od_dbs_info =
+ dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+
+ /*
+ * Sometimes, the ondemand governor uses an additional
+ * multiplier to give long delays. So apply this multiplier to
+ * the 'sampling_rate', so as to keep the wake-up-from-idle
+ * detection logic a bit conservative.
+ */
+ sampling_rate = od_tuners->sampling_rate;
+ sampling_rate *= od_dbs_info->rate_mult;
+
ignore_nice = od_tuners->ignore_nice_load;
- else
+ } else {
+ sampling_rate = cs_tuners->sampling_rate;
ignore_nice = cs_tuners->ignore_nice_load;
+ }

policy = cdbs->cur_policy;

@@ -96,7 +111,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
if (unlikely(!wall_time || wall_time < idle_time))
continue;

- load = 100 * (wall_time - idle_time) / wall_time;
+ /*
+ * If the CPU had gone completely idle, and a task just woke up
+ * on this CPU now, it would be unfair to calculate 'load' the
+ * usual way for this elapsed time-window, because it will show
+ * near-zero load, irrespective of how CPU intensive the new
+ * task is. This is undesirable for latency-sensitive bursty
+ * workloads.
+ *
+ * To avoid this, we reuse the 'load' from the previous
+ * time-window and give this task a chance to start with a
+ * reasonably high CPU frequency.
+ *
+ * Detecting this situation is easy: the governor's deferrable
+ * timer would not have fired during CPU-idle periods. Hence
+ * an unusually large 'wall_time' (as compared to the sampling
+ * rate) indicates this scenario.
+ */
+ if (unlikely(wall_time > (2 * sampling_rate))) {
+ load = j_cdbs->prev_load;
+ } else {
+ load = 100 * (wall_time - idle_time) / wall_time;
+ j_cdbs->prev_load = load;
+ }

if (load > max_load)
max_load = load;
@@ -323,6 +360,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
j_cdbs->cur_policy = policy;
j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
&j_cdbs->prev_cpu_wall, io_busy);
+ j_cdbs->prev_load = 100 * (j_cdbs->prev_cpu_wall -
+ j_cdbs->prev_cpu_idle) /
+ j_cdbs->prev_cpu_wall;
+
if (ignore_nice)
j_cdbs->prev_cpu_nice =
kcpustat_cpu(j).cpustat[CPUTIME_NICE];
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index bfb9ae1..b56552b 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -134,6 +134,7 @@ struct cpu_dbs_common_info {
u64 prev_cpu_idle;
u64 prev_cpu_wall;
u64 prev_cpu_nice;
+ unsigned int prev_load;
struct cpufreq_policy *cur_policy;
struct delayed_work work;
/*

2014-06-03 10:16:16

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 3 June 2014 15:43, Srivatsa S. Bhat <[email protected]> wrote:
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index e1c6433..2597bbe 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> struct od_dbs_tuners *od_tuners = dbs_data->tuners;
> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
> struct cpufreq_policy *policy;
> + unsigned int sampling_rate;
> unsigned int max_load = 0;
> unsigned int ignore_nice;
> unsigned int j;
>
> - if (dbs_data->cdata->governor == GOV_ONDEMAND)
> + if (dbs_data->cdata->governor == GOV_ONDEMAND) {
> + struct od_cpu_dbs_info_s *od_dbs_info =
> + dbs_data->cdata->get_cpu_dbs_info_s(cpu);
> +
> + /*
> + * Sometimes, the ondemand governor uses an additional
> + * multiplier to give long delays. So apply this multiplier to
> + * the 'sampling_rate', so as to keep the wake-up-from-idle
> + * detection logic a bit conservative.
> + */
> + sampling_rate = od_tuners->sampling_rate;
> + sampling_rate *= od_dbs_info->rate_mult;
> +
> ignore_nice = od_tuners->ignore_nice_load;
> - else
> + } else {
> + sampling_rate = cs_tuners->sampling_rate;
> ignore_nice = cs_tuners->ignore_nice_load;
> + }
>
> policy = cdbs->cur_policy;
>
> @@ -96,7 +111,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> if (unlikely(!wall_time || wall_time < idle_time))
> continue;
>
> - load = 100 * (wall_time - idle_time) / wall_time;
> + /*
> + * If the CPU had gone completely idle, and a task just woke up
> + * on this CPU now, it would be unfair to calculate 'load' the
> + * usual way for this elapsed time-window, because it will show
> + * near-zero load, irrespective of how CPU intensive the new
> + * task is. This is undesirable for latency-sensitive bursty
> + * workloads.
> + *
> + * To avoid this, we reuse the 'load' from the previous
> + * time-window and give this task a chance to start with a
> + * reasonably high CPU frequency.
> + *
> + * Detecting this situation is easy: the governor's deferrable
> + * timer would not have fired during CPU-idle periods. Hence
> + * an unusually large 'wall_time' (as compared to the sampling
> + * rate) indicates this scenario.
> + */
> + if (unlikely(wall_time > (2 * sampling_rate))) {
> + load = j_cdbs->prev_load;
> + } else {
> + load = 100 * (wall_time - idle_time) / wall_time;
> + j_cdbs->prev_load = load;
> + }
>
> if (load > max_load)
> max_load = load;
> @@ -323,6 +360,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> j_cdbs->cur_policy = policy;
> j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
> &j_cdbs->prev_cpu_wall, io_busy);
> + j_cdbs->prev_load = 100 * (j_cdbs->prev_cpu_wall -
> + j_cdbs->prev_cpu_idle) /
> + j_cdbs->prev_cpu_wall;
> +
> if (ignore_nice)
> j_cdbs->prev_cpu_nice =
> kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> index bfb9ae1..b56552b 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -134,6 +134,7 @@ struct cpu_dbs_common_info {
> u64 prev_cpu_idle;
> u64 prev_cpu_wall;
> u64 prev_cpu_nice;
> + unsigned int prev_load;
> struct cpufreq_policy *cur_policy;
> struct delayed_work work;
> /*

Acked-by: Viresh Kumar <[email protected]>

2014-06-03 10:17:40

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

On 06/03/2014 03:46 PM, Viresh Kumar wrote:
> On 3 June 2014 15:43, Srivatsa S. Bhat <[email protected]> wrote:
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index e1c6433..2597bbe 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>> struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
>> struct cpufreq_policy *policy;
>> + unsigned int sampling_rate;
>> unsigned int max_load = 0;
>> unsigned int ignore_nice;
>> unsigned int j;
[...]
>> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
>> index bfb9ae1..b56552b 100644
>> --- a/drivers/cpufreq/cpufreq_governor.h
>> +++ b/drivers/cpufreq/cpufreq_governor.h
>> @@ -134,6 +134,7 @@ struct cpu_dbs_common_info {
>> u64 prev_cpu_idle;
>> u64 prev_cpu_wall;
>> u64 prev_cpu_nice;
>> + unsigned int prev_load;
>> struct cpufreq_policy *cur_policy;
>> struct delayed_work work;
>> /*
>
> Acked-by: Viresh Kumar <[email protected]>
>

Thanks a lot!

Regards,
Srivatsa S. Bhat