Message-ID: <2c1a8c86-7c99-4edf-d18d-cc3227a02150@arm.com>
Date: Thu, 4 May 2023 19:21:49 +0200
Subject: Re: [PATCH v2 5/6] sched/deadline: Create DL BW alloc, free & check overflow interface
To: Juri Lelli, Peter Zijlstra
Cc: Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it,
 claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla
References: <20230503072228.115707-1-juri.lelli@redhat.com> <20230503072228.115707-6-juri.lelli@redhat.com> <20230504062359.GE1734100@hirez.programming.kicks-ass.net>
From: Dietmar Eggemann
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/05/2023 10:15, Juri Lelli wrote:
> On 04/05/23 08:23, Peter Zijlstra wrote:
>> On Wed, May 03, 2023 at 09:22:27AM +0200, Juri Lelli wrote:
>>> From: Dietmar Eggemann
>>>
>>> Rework the existing dl_cpu_busy() interface which offers DL BW overflow
>>> checking and per-task DL BW allocation.
>>>
>>> Add dl_bw_free() as an interface to be able to free DL BW.
>>> It will be used to allow freeing of the DL BW request done during
>>> cpuset_can_attach() in case multiple controllers are attached to the
>>> cgroup next to the cpuset controller and one of the non-cpuset
>>> can_attach() fails.
>>>
>>> dl_bw_alloc() (and dl_bw_free()) now take a `u64 dl_bw` parameter
>>> instead of `struct task_struct *p` used in dl_cpu_busy(). This allows
>>> to allocate DL BW for a set of tasks too rather than only for a single
>>> task.
>>>
>>
>> Changelog fails the 'why' test.
>>
>
> Dietmar, if you could please add (or rework) the 'why' as a reply to
> this email, I can fold in v3.
>
> Thanks!
>

-->8--

From 4ce3556723728a606b527bef8813b0fce3ad6e22 Mon Sep 17 00:00:00 2001
From: Dietmar Eggemann
Date: Wed, 3 May 2023 09:22:27 +0200
Subject: [PATCH] sched/deadline: Create DL BW alloc, free & check overflow interface

While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.

This approach has the issue of not freeing already allocated DL BW in
the following error cases:

(1) The set of tasks includes multiple DL tasks and DL BW overflow
    checking fails for one of the subsequent DL tasks.

(2) Another controller next to the cpuset controller which is attached
    to the same cgroup fails in its can_attach().

To address this problem, rework dl_cpu_busy():

(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
    dedicated dl_bw_free().

(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
    the `struct task_struct *p` used in dl_cpu_busy(). This allows
    allocating DL BW for a set of tasks rather than only for a single
    task.

Signed-off-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
---
 include/linux/sched.h   |  2 ++
 kernel/sched/core.c     |  4 ++--
 kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |  2 +-
 4 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..0bee06542450 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1853,6 +1853,8 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)

 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d826bec1c522..df659892d7d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9319,7 +9319,7 @@ int task_can_attach(struct task_struct *p,
 		if (unlikely(cpu >= nr_cpu_ids))
 			return -EINVAL;

-		ret = dl_cpu_busy(cpu, p);
+		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
 	}

 out:
@@ -9604,7 +9604,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		int ret = dl_cpu_busy(cpu, NULL);
+		int ret = dl_bw_check_overflow(cpu);

 		if (ret)
 			return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e11de074a6fd..166c3e6eae61 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3058,26 +3058,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }

-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+	dl_bw_req_check_overflow = 0,
+	dl_bw_req_alloc,
+	dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 {
-	unsigned long flags, cap;
+	unsigned long flags;
 	struct dl_bw *dl_b;
-	bool overflow;
+	bool overflow = 0;

 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cap = dl_bw_capacity(cpu);
-	overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);

-	if (!overflow && p) {
-		/*
-		 * We reserve space for this task in the destination
-		 * root_domain, as we can't fail after this point.
-		 * We will free resources in the source root_domain
-		 * later on (see set_cpus_allowed_dl()).
-		 */
-		__dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+	if (req == dl_bw_req_free) {
+		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+	} else {
+		unsigned long cap = dl_bw_capacity(cpu);
+
+		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+		if (req == dl_bw_req_alloc && !overflow) {
+			/*
+			 * We reserve space in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+		}
 	}

 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3085,6 +3097,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)

 	return overflow ? -EBUSY : 0;
 }
+
+int dl_bw_check_overflow(int cpu)
+{
+	return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+	return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+	dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
 #endif

 #ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..0ad712811e35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int dl_cpu_busy(int cpu, struct task_struct *p);
+extern int dl_bw_check_overflow(int cpu);

 #ifdef CONFIG_CGROUP_SCHED

--
2.25.1
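
For illustration only, here is a user-space sketch of the pattern the changelog describes: sum the DL BW of a set of tasks, reserve it with a single alloc call, and hand it back with free if a later stage fails. It is not kernel code. `struct bw_pool`, `bw_manage()`, `attach_tasks()` and the `other_controller_ok` flag are hypothetical names, the overflow test is a plain comparison rather than the capacity-scaled __dl_overflow(), and the locking and RCU done in dl_bw_manage() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DL bandwidth of one root_domain. */
struct bw_pool {
	uint64_t limit;	/* maximum admissible bandwidth */
	uint64_t total;	/* bandwidth currently reserved */
};

enum bw_request {
	BW_REQ_CHECK_OVERFLOW = 0,
	BW_REQ_ALLOC,
	BW_REQ_FREE
};

/* One helper serves all three requests, like dl_bw_manage() in the patch. */
int bw_manage(struct bw_pool *pool, enum bw_request req, uint64_t bw)
{
	bool overflow = false;

	if (req == BW_REQ_FREE) {
		/* Freeing more than was reserved would be a caller bug. */
		assert(pool->total >= bw);
		pool->total -= bw;
	} else {
		/* Simplified admission test (assumption: no capacity scaling). */
		overflow = pool->total > pool->limit ||
			   bw > pool->limit - pool->total;

		/* Only an alloc request changes the reserved total. */
		if (req == BW_REQ_ALLOC && !overflow)
			pool->total += bw;
	}

	return overflow ? -EBUSY : 0;
}

int bw_check_overflow(struct bw_pool *pool)
{
	return bw_manage(pool, BW_REQ_CHECK_OVERFLOW, 0);
}

int bw_alloc(struct bw_pool *pool, uint64_t bw)
{
	return bw_manage(pool, BW_REQ_ALLOC, bw);
}

void bw_free(struct bw_pool *pool, uint64_t bw)
{
	bw_manage(pool, BW_REQ_FREE, bw);
}

/*
 * Reserve bandwidth for a whole set of tasks in one step. The flag
 * other_controller_ok stands in for another controller's can_attach()
 * result; on failure the reservation is rolled back.
 */
int attach_tasks(struct bw_pool *dst, const uint64_t *task_bw, size_t n,
		 bool other_controller_ok)
{
	uint64_t sum = 0;
	int ret;

	for (size_t i = 0; i < n; i++)
		sum += task_bw[i];

	ret = bw_alloc(dst, sum);
	if (ret)
		return ret;

	if (!other_controller_ok) {
		bw_free(dst, sum);
		return -EINVAL;
	}

	return 0;
}
```

Because the request is a plain bandwidth value rather than a task pointer, the caller can make one reservation for the whole set and undo exactly that amount, which covers both error cases listed in the changelog. How the real callers in the cpuset code use the interface is up to the later patches in the series, not this sketch.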