2023-03-22 14:06:42

by Dietmar Eggemann

Subject: [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails

I followed Longman's idea to add a `deadline task transfer count` into
cpuset and only update the `dl task count` in cpuset_attach().

Moreover, I switched from a per-task DL BW request to a per-cpuset one.
This way we don't have to free the per-task DL BW in case a later
xxx_can_attach() fails.

The DL BW freeing is handled in cpuset_cancel_attach() for the case
`multiple controllers and one of the non-cpuset can_attach() fails`.

Only lightly tested on cgroup v1 with exclusive cpusets so far.
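The reserve-then-rollback flow described above can be sketched with a
small userspace model. dl_bw_alloc()/dl_bw_free() mirror the interface
added in patch 1; the cpuset bookkeeping (the attach_dl_bw field), the
capacity values and the error convention (-1 standing in for -EBUSY)
are illustrative assumptions, not the actual patch 2 code:

```c
/*
 * Hypothetical model of per-cpuset DL bandwidth reservation.
 * All names and values are illustrative, not the kernel's.
 */
#include <assert.h>
#include <stdint.h>

static uint64_t rd_cap = 100;	/* stand-in for dl_bw_capacity(cpu) */
static uint64_t rd_total;	/* stand-in for dl_b->total_bw */

static int dl_bw_alloc(uint64_t dl_bw)	/* -1 models -EBUSY */
{
	if (rd_total + dl_bw > rd_cap)
		return -1;
	rd_total += dl_bw;
	return 0;
}

static void dl_bw_free(uint64_t dl_bw)
{
	rd_total -= dl_bw;
}

struct cpuset_model {
	uint64_t attach_dl_bw;	/* BW reserved for this attach cycle */
};

/*
 * can_attach: sum the DL BW of all tasks being moved and reserve it
 * once, per cpuset rather than per task.
 */
static int cpuset_can_attach(struct cpuset_model *cs,
			     const uint64_t *task_bw, int nr)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < nr; i++)
		sum += task_bw[i];
	if (dl_bw_alloc(sum))
		return -1;
	cs->attach_dl_bw = sum;
	return 0;
}

/*
 * cancel_attach: another controller's can_attach() failed, so give
 * the whole per-cpuset reservation back in one go.
 */
static void cpuset_cancel_attach(struct cpuset_model *cs)
{
	dl_bw_free(cs->attach_dl_bw);
	cs->attach_dl_bw = 0;
}
```

Since only the summed reservation is tracked, a failure in any other
controller's can_attach() needs just one dl_bw_free() call instead of a
per-task walk.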

Dietmar Eggemann (2):
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails

include/linux/sched.h | 4 ++-
kernel/cgroup/cpuset.c | 55 +++++++++++++++++++++++++++++++++++++----
kernel/sched/core.c | 19 +++-----------
kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++----------
kernel/sched/sched.h | 2 +-
5 files changed, 97 insertions(+), 36 deletions(-)

--
2.25.1


2023-03-22 14:06:43

by Dietmar Eggemann

Subject: [RFC PATCH 1/2] sched/deadline: Create DL BW alloc, free & check overflow interface

Rework the existing dl_cpu_busy() interface which offered DL BW
overflow checking and per-task DL BW allocation.

Add dl_bw_free() as an interface to free DL BW. It will be used to
free the DL BW reserved during cpuset_can_attach() in case multiple
controllers are attached to the cgroup next to cpuset and one of the
non-cpuset can_attach() calls fails.

Signed-off-by: Dietmar Eggemann <[email protected]>
---
include/linux/sched.h | 2 ++
kernel/sched/core.c | 4 ++--
kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++++----------
kernel/sched/sched.h | 2 +-
4 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4df2b3e76b30..658e997ba057 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1847,6 +1847,8 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)

extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
#ifdef CONFIG_SMP
extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d586a8440348..2f07aecb7434 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9226,7 +9226,7 @@ int task_can_attach(struct task_struct *p,

if (unlikely(cpu >= nr_cpu_ids))
return -EINVAL;
- ret = dl_cpu_busy(cpu, p);
+ ret = dl_bw_alloc(cpu, p->dl.dl_bw);
}

out:
@@ -9511,7 +9511,7 @@ static void cpuset_cpu_active(void)
static int cpuset_cpu_inactive(unsigned int cpu)
{
if (!cpuhp_tasks_frozen) {
- int ret = dl_cpu_busy(cpu, NULL);
+ int ret = dl_bw_check_overflow(cpu);

if (ret)
return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..41ed6c6d2628 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3033,26 +3033,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
return ret;
}

-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+ dl_bw_req_check_overflow = 0,
+ dl_bw_req_alloc,
+ dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
{
- unsigned long flags, cap;
+ unsigned long flags;
struct dl_bw *dl_b;
- bool overflow;
+ bool overflow = false;

rcu_read_lock_sched();
dl_b = dl_bw_of(cpu);
raw_spin_lock_irqsave(&dl_b->lock, flags);
- cap = dl_bw_capacity(cpu);
- overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);

- if (!overflow && p) {
- /*
- * We reserve space for this task in the destination
- * root_domain, as we can't fail after this point.
- * We will free resources in the source root_domain
- * later on (see set_cpus_allowed_dl()).
- */
- __dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+ if (req == dl_bw_req_free) {
+ __dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+ } else {
+ unsigned long cap = dl_bw_capacity(cpu);
+
+ overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+ if (req == dl_bw_req_alloc && !overflow) {
+ /*
+ * We reserve space in the destination
+ * root_domain, as we can't fail after this point.
+ * We will free resources in the source root_domain
+ * later on (see set_cpus_allowed_dl()).
+ */
+ __dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+ }
}

raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3060,6 +3072,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)

return overflow ? -EBUSY : 0;
}
+
+int dl_bw_check_overflow(int cpu)
+{
+ return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+ return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+ dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
#endif

#ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..6cb4cf878fe2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
extern bool __checkparam_dl(const struct sched_attr *attr);
extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int dl_cpu_busy(int cpu, struct task_struct *p);
+extern int dl_bw_check_overflow(int cpu);

#ifdef CONFIG_CGROUP_SCHED

--
2.25.1
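The dl_bw_manage() dispatch introduced by the patch above can be
modeled in a few lines of plain C. This is a sketch under simplified
assumptions: a single scalar capacity and total instead of the
root-domain bookkeeping, no locking or RCU, and -1 standing in for
-EBUSY:

```c
/* Simplified userspace model of the dl_bw_manage() dispatch. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum dl_bw_req {
	DL_BW_REQ_CHECK_OVERFLOW,
	DL_BW_REQ_ALLOC,
	DL_BW_REQ_FREE
};

static uint64_t cap = 100;	/* stand-in for dl_bw_capacity(cpu) */
static uint64_t total;		/* stand-in for dl_b->total_bw */

static int dl_bw_manage(enum dl_bw_req req, uint64_t dl_bw)
{
	bool overflow = false;

	if (req == DL_BW_REQ_FREE) {
		total -= dl_bw;			/* models __dl_sub() */
	} else {
		/* models __dl_overflow() */
		overflow = total + dl_bw > cap;
		if (req == DL_BW_REQ_ALLOC && !overflow)
			total += dl_bw;		/* models __dl_add() */
	}
	return overflow ? -1 : 0;	/* -EBUSY in the kernel */
}
```

A pure overflow check passes dl_bw == 0 and never mutates state, an
alloc reserves only when it fits, and a free unconditionally gives the
bandwidth back, matching the three thin wrappers dl_bw_check_overflow(),
dl_bw_alloc() and dl_bw_free().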

2023-03-23 09:40:17

by Juri Lelli

Subject: Re: [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails

Hi,

On 22/03/23 14:59, Dietmar Eggemann wrote:
> I followed Longman's idea to add a `deadline task transfer count` into
> cpuset and only update the `dl task count` in cpuset_attach().
>
> Moreover, I switched from per-task DL BW request to a per-cpuset one.
> This way we don't have to free per-task in case xxx_can_attach() fails.
>
> The DL BW freeing is handled in cpuset_cancel_attach() for the case
> `multiple controllers and one of the non-cpuset can_attach() fails`.
>
> Only lightly tested on cgroup v1 with exclusive cpusets so far.

This makes sense to me. Thanks for working on it!

Guess I might incorporate these in my (RFC) series and re-post the whole
lot?

Best,
Juri

2023-03-24 14:57:32

by Dietmar Eggemann

Subject: Re: [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails

On 23/03/2023 10:33, Juri Lelli wrote:
> Hi,
>
> On 22/03/23 14:59, Dietmar Eggemann wrote:

[...]

> This makes sense to me. Thanks for working on it!
>
> Guess I might incorporate these in my (RFC) series and re-post the whole
> lot?

Yes, please do so.