LinuxLists.cc - [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks

2020-05-07 13:40:04

Subject: [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks

The "nr_lat_sensitive" per_cpu variable provides hints on the possible
number of latency-sensitive tasks occupying the CPU. This hints further
helps in inhibiting the CPUIDLE governor from calling deeper IDLE states
(next patches includes this).

Signed-off-by: Parth Shah <[email protected]>
---
kernel/sched/core.c | 2 ++
kernel/sched/sched.h | 2 ++
2 files changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2576fd8cacf9..2d8b76f41d61 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6606,6 +6606,7 @@ static struct kmem_cache *task_group_cache __read_mostly;

DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
DECLARE_PER_CPU(cpumask_var_t, select_idle_mask);
+DEFINE_PER_CPU(int, nr_lat_sensitive);

void __init sched_init(void)
{
@@ -6737,6 +6738,7 @@ void __init sched_init(void)
#endif /* CONFIG_SMP */
hrtick_rq_init(rq);
atomic_set(&rq->nr_iowait, 0);
+ per_cpu(nr_lat_sensitive, i) = 0;
}

set_load_weight(&init_task, false);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b2c86dfe913e..5c41020c530e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1439,6 +1439,8 @@ DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
+DECLARE_PER_CPU(int, nr_lat_sensitive);
+
extern struct static_key_false sched_asym_cpucapacity;

struct sched_group_capacity {
--
2.17.2

2020-05-08 08:44:24

by Pavankumar Kondeti

[permalink] [raw]

Subject: Re: [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks

On Thu, May 07, 2020 at 07:07:20PM +0530, Parth Shah wrote:
> The "nr_lat_sensitive" per_cpu variable provides hints on the possible
> number of latency-sensitive tasks occupying the CPU. This hints further
> helps in inhibiting the CPUIDLE governor from calling deeper IDLE states
> (next patches includes this).
>

Can you please explain the intended use case here? Once a latency sensitive
task is created, it prevents c-state on a CPU whether the task runs again
or not in the near future.

I assume, either these latency sensitive tasks won't be around for long time
or applications set/reset latency sensitive nice value dynamically.

Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-05-08 11:35:17

by Parth Shah

[permalink] [raw]

Subject: Re: [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks

On 5/8/20 2:10 PM, Pavan Kondeti wrote:
> On Thu, May 07, 2020 at 07:07:20PM +0530, Parth Shah wrote:
>> The "nr_lat_sensitive" per_cpu variable provides hints on the possible
>> number of latency-sensitive tasks occupying the CPU. This hints further
>> helps in inhibiting the CPUIDLE governor from calling deeper IDLE states
>> (next patches includes this).
>>
>
> Can you please explain the intended use case here? Once a latency sensitive
> task is created, it prevents c-state on a CPU whether the task runs again
> or not in the near future.
>
> I assume, either these latency sensitive tasks won't be around for long time
> or applications set/reset latency sensitive nice value dynamically.
>

Intended use-cases is to get rid of IDLE states exit_latency for
wakeup-sleep-wakeup pattern workload. This types of tasks (like GPU
workloads, few DB benchmarks) makes CPU go IDLE due to its low runtime on
rq, resulting in higher wakeups due to IDLE states exit_latency.

And this kind of workloads may last for long time as well.

In current scenario, Sysadmins do disable all IDLE states or use PM_QoS to
not have latency penalty on workload. This model was good when core counts
were less. But now higher core count and Turbo frequencies have led to save
power in-order to get higher performance and hence this patch-set tries to
do PM_QoS like thing but at per-task granularity.

If idea seems good to go, then this can potentially be extended to do IDLE
gating upto certain level where latency_nice value hints on which IDLE
states can't be chosen, just like PM_QoS have cpu_dma_latency constraints.

Thanks,
Parth

> Thanks,
> Pavan
>

2020-05-09 02:17:13

by Pavankumar Kondeti

[permalink] [raw]

Subject: Re: [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks

On Fri, May 08, 2020 at 05:00:44PM +0530, Parth Shah wrote:
>
>
> On 5/8/20 2:10 PM, Pavan Kondeti wrote:
> > On Thu, May 07, 2020 at 07:07:20PM +0530, Parth Shah wrote:
> >> The "nr_lat_sensitive" per_cpu variable provides hints on the possible
> >> number of latency-sensitive tasks occupying the CPU. This hints further
> >> helps in inhibiting the CPUIDLE governor from calling deeper IDLE states
> >> (next patches includes this).
> >>
> >
> > Can you please explain the intended use case here? Once a latency sensitive
> > task is created, it prevents c-state on a CPU whether the task runs again
> > or not in the near future.
> >
> > I assume, either these latency sensitive tasks won't be around for long time
> > or applications set/reset latency sensitive nice value dynamically.
> >
>
> Intended use-cases is to get rid of IDLE states exit_latency for
> wakeup-sleep-wakeup pattern workload. This types of tasks (like GPU
> workloads, few DB benchmarks) makes CPU go IDLE due to its low runtime on
> rq, resulting in higher wakeups due to IDLE states exit_latency.
>
> And this kind of workloads may last for long time as well.
>
> In current scenario, Sysadmins do disable all IDLE states or use PM_QoS to
> not have latency penalty on workload. This model was good when core counts
> were less. But now higher core count and Turbo frequencies have led to save
> power in-order to get higher performance and hence this patch-set tries to
> do PM_QoS like thing but at per-task granularity.
>
Thanks for the details. Instead of disabling C-states for all CPUs, we disable
it for CPUS that host latency sensitive tasks. Since this is hooked with the
scheduler, the task migrations are accounted for. Got it.

Thanks,
Pavan

> If idea seems good to go, then this can potentially be extended to do IDLE
> gating upto certain level where latency_nice value hints on which IDLE
> states can't be chosen, just like PM_QoS have cpu_dma_latency constraints.
>
>
> Thanks,
> Parth
>
>
> > Thanks,
> > Pavan
> >

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.