From: Huaixin Chang
To: linux-kernel@vger.kernel.org
Cc: changhuaixin@linux.alibaba.com, bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com, khlebnikov@yandex-team.ru, mgorman@suse.de, mingo@redhat.com, pauld@redhead.com, peterz@infradead.org, pjt@google.com, rostedt@goodmis.org, shanpeic@linux.alibaba.com, vincent.guittot@linaro.org, xiyou.wangcong@gmail.com
Subject: [PATCH 1/4] sched/fair: Introduce primitives for CFS bandwidth burst
Date: Tue, 2 Feb 2021 19:40:35 +0800
Message-Id: <20210202114038.64870-2-changhuaixin@linux.alibaba.com>
In-Reply-To: <20210202114038.64870-1-changhuaixin@linux.alibaba.com>
References:
 <20210202114038.64870-1-changhuaixin@linux.alibaba.com>

In this patch, we introduce the notion of CFS bandwidth burst. Unused
"quota" from previous "periods" may be accumulated and used in the
following "periods". The maximum amount of accumulated bandwidth is
bounded by "burst", and the maximum amount of CPU a group can consume in
a given period is "buffer", which equals "quota" + "burst" once the
group has accumulated enough unused quota.

Signed-off-by: Huaixin Chang
Signed-off-by: Shanpei Chen
---
 kernel/sched/core.c  | 91 ++++++++++++++++++++++++++++++++++++++++++++--------
 kernel/sched/fair.c  |  2 ++
 kernel/sched/sched.h |  2 ++
 3 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ff74fca39ed2..28e3165c685b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8590,10 +8590,12 @@ static const u64 max_cfs_runtime = MAX_BW * NSEC_PER_USEC;
 
 static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
 
-static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
+static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
+				u64 burst)
 {
 	int i, ret = 0, runtime_enabled, runtime_was_enabled;
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
+	u64 buffer;
 
 	if (tg == &root_task_group)
 		return -EINVAL;
@@ -8620,6 +8622,16 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 	if (quota != RUNTIME_INF && quota > max_cfs_runtime)
 		return -EINVAL;
 
+	/*
+	 * Bound burst to defend burst against overflow during bandwidth shift.
+	 */
+	if (burst > max_cfs_runtime)
+		return -EINVAL;
+
+	if (quota == RUNTIME_INF)
+		buffer = RUNTIME_INF;
+	else
+		buffer = min(max_cfs_runtime, quota + burst);
 	/*
 	 * Prevent race between setting of cfs_rq->runtime_enabled and
 	 * unthrottle_offline_cfs_rqs().
@@ -8641,6 +8653,8 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 	raw_spin_lock_irq(&cfs_b->lock);
 	cfs_b->period = ns_to_ktime(period);
 	cfs_b->quota = quota;
+	cfs_b->burst = burst;
+	cfs_b->buffer = buffer;
 
 	__refill_cfs_bandwidth_runtime(cfs_b);
 
@@ -8674,9 +8688,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 
 static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
 {
-	u64 quota, period;
+	u64 quota, period, burst;
 
 	period = ktime_to_ns(tg->cfs_bandwidth.period);
+	burst = tg->cfs_bandwidth.burst;
 	if (cfs_quota_us < 0)
 		quota = RUNTIME_INF;
 	else if ((u64)cfs_quota_us <= U64_MAX / NSEC_PER_USEC)
@@ -8684,7 +8699,7 @@ static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
 	else
 		return -EINVAL;
 
-	return tg_set_cfs_bandwidth(tg, period, quota);
+	return tg_set_cfs_bandwidth(tg, period, quota, burst);
 }
 
 static long tg_get_cfs_quota(struct task_group *tg)
@@ -8702,15 +8717,16 @@ static long tg_get_cfs_quota(struct task_group *tg)
 
 static int tg_set_cfs_period(struct task_group *tg, long cfs_period_us)
 {
-	u64 quota, period;
+	u64 quota, period, burst;
 
 	if ((u64)cfs_period_us > U64_MAX / NSEC_PER_USEC)
 		return -EINVAL;
 
 	period = (u64)cfs_period_us * NSEC_PER_USEC;
 	quota = tg->cfs_bandwidth.quota;
+	burst = tg->cfs_bandwidth.burst;
 
-	return tg_set_cfs_bandwidth(tg, period, quota);
+	return tg_set_cfs_bandwidth(tg, period, quota, burst);
 }
 
 static long tg_get_cfs_period(struct task_group *tg)
@@ -8723,6 +8739,35 @@ static long tg_get_cfs_period(struct task_group *tg)
 	return cfs_period_us;
 }
 
+static int tg_set_cfs_burst(struct task_group *tg, long cfs_burst_us)
+{
+	u64 quota, period, burst;
+
+	period = ktime_to_ns(tg->cfs_bandwidth.period);
+	quota = tg->cfs_bandwidth.quota;
+	if (cfs_burst_us < 0)
+		burst = RUNTIME_INF;
+	else if ((u64)cfs_burst_us <= U64_MAX / NSEC_PER_USEC)
+		burst = (u64)cfs_burst_us * NSEC_PER_USEC;
+	else
+		return -EINVAL;
+
+	return tg_set_cfs_bandwidth(tg, period, quota, burst);
+}
+
+static long tg_get_cfs_burst(struct task_group *tg)
+{
+	u64 burst_us;
+
+	if (tg->cfs_bandwidth.burst == RUNTIME_INF)
+		return -1;
+
+	burst_us = tg->cfs_bandwidth.burst;
+	do_div(burst_us, NSEC_PER_USEC);
+
+	return burst_us;
+}
+
 static s64 cpu_cfs_quota_read_s64(struct cgroup_subsys_state *css,
 				  struct cftype *cft)
 {
@@ -8747,6 +8792,18 @@ static int cpu_cfs_period_write_u64(struct cgroup_subsys_state *css,
 	return tg_set_cfs_period(css_tg(css), cfs_period_us);
 }
 
+static s64 cpu_cfs_burst_read_s64(struct cgroup_subsys_state *css,
+				  struct cftype *cft)
+{
+	return tg_get_cfs_burst(css_tg(css));
+}
+
+static int cpu_cfs_burst_write_s64(struct cgroup_subsys_state *css,
+				   struct cftype *cftype, s64 cfs_burst_us)
+{
+	return tg_set_cfs_burst(css_tg(css), cfs_burst_us);
+}
+
 struct cfs_schedulable_data {
 	struct task_group *tg;
 	u64 period, quota;
@@ -8899,6 +8956,11 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_cfs_period_read_u64,
 		.write_u64 = cpu_cfs_period_write_u64,
 	},
+	{
+		.name = "cfs_burst_us",
+		.read_s64 = cpu_cfs_burst_read_s64,
+		.write_s64 = cpu_cfs_burst_write_s64,
+	},
 	{
 		.name = "stat",
 		.seq_show = cpu_cfs_stat_show,
@@ -9019,26 +9081,27 @@ static int cpu_weight_nice_write_s64(struct cgroup_subsys_state *css,
 #endif
 
 static void __maybe_unused cpu_period_quota_print(struct seq_file *sf,
-						  long period, long quota)
+						  long period, long quota, long burst)
 {
 	if (quota < 0)
 		seq_puts(sf, "max");
 	else
 		seq_printf(sf, "%ld", quota);
 
-	seq_printf(sf, " %ld\n", period);
+	seq_printf(sf, " %ld %ld\n", period, burst);
 }
 
-/* caller should put the current value in *@periodp before calling */
+/* caller should put the current value in *@periodp and *@burstp before calling */
 static int __maybe_unused cpu_period_quota_parse(char *buf,
-						 u64 *periodp, u64 *quotap)
+						 u64 *periodp, u64 *quotap, u64 *burstp)
 {
 	char tok[21];	/* U64_MAX */
 
-	if (sscanf(buf, "%20s %llu", tok, periodp) < 1)
+	if (sscanf(buf, "%20s %llu %llu", tok, periodp, burstp) < 1)
 		return -EINVAL;
 
 	*periodp *= NSEC_PER_USEC;
+	*burstp *= NSEC_PER_USEC;
 
 	if (sscanf(tok, "%llu", quotap))
 		*quotap *= NSEC_PER_USEC;
@@ -9055,7 +9118,8 @@ static int cpu_max_show(struct seq_file *sf, void *v)
 {
 	struct task_group *tg = css_tg(seq_css(sf));
 
-	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg));
+	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg),
+			       tg_get_cfs_burst(tg));
 	return 0;
 }
 
@@ -9064,12 +9128,13 @@ static ssize_t cpu_max_write(struct kernfs_open_file *of,
 {
 	struct task_group *tg = css_tg(of_css(of));
 	u64 period = tg_get_cfs_period(tg);
+	u64 burst = tg_get_cfs_burst(tg);
 	u64 quota;
 	int ret;
 
-	ret = cpu_period_quota_parse(buf, &period, &quota);
+	ret = cpu_period_quota_parse(buf, &period, &quota, &burst);
 	if (!ret)
-		ret = tg_set_cfs_bandwidth(tg, period, quota);
+		ret = tg_set_cfs_bandwidth(tg, period, quota, burst);
 	return ret ?: nbytes;
 }
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce20da67..46945349f209 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5262,6 +5262,8 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	cfs_b->runtime = 0;
 	cfs_b->quota = RUNTIME_INF;
 	cfs_b->period = ns_to_ktime(default_cfs_period());
+	cfs_b->burst = 0;
+	cfs_b->buffer = RUNTIME_INF;
 
 	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
 	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bb09988451a0..2c0d8469c0fb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -357,6 +357,8 @@ struct cfs_bandwidth {
 	ktime_t			period;
 	u64			quota;
 	u64			runtime;
+	u64			burst;
+	u64			buffer;
 	s64			hierarchical_quota;
 
 	u8			idle;
-- 
2.14.4.44.g2045bb6
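The commit message describes the burst accounting only in words. What follows is a rough user-space C sketch under stated assumptions. compute_buffer() mirrors the validation and "buffer" computation that this patch adds to tg_set_cfs_bandwidth(). refill() is a hypothetical per-period rule inferred from the commit message, because the kernel's actual refill path is not part of this patch. Both function names are illustrative, and the constants assume BW_SHIFT is 20, as in mainline kernels of this period.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-ins for kernel definitions (assumption: BW_SHIFT == 20,
 * which makes max_cfs_runtime a little over 203 days in nanoseconds).
 */
#define NSEC_PER_USEC	1000ULL
#define RUNTIME_INF	(~0ULL)
#define BW_SHIFT	20
#define MAX_BW		((1ULL << (64 - BW_SHIFT)) - 1)

static const uint64_t max_cfs_runtime = MAX_BW * NSEC_PER_USEC;

/*
 * Mirrors the checks and the "buffer" computation added to
 * tg_set_cfs_bandwidth(). Returns 0 on success, or -1 where the kernel
 * would return -EINVAL.
 */
static int compute_buffer(uint64_t quota, uint64_t burst, uint64_t *buffer)
{
	if (quota != RUNTIME_INF && quota > max_cfs_runtime)
		return -1;
	/* Also rejects burst == RUNTIME_INF, i.e. a negative cfs_burst_us. */
	if (burst > max_cfs_runtime)
		return -1;

	if (quota == RUNTIME_INF)
		*buffer = RUNTIME_INF;
	else if (quota + burst < max_cfs_runtime)	/* no overflow: both <= max */
		*buffer = quota + burst;
	else
		*buffer = max_cfs_runtime;
	return 0;
}

/*
 * Hypothetical refill rule taken from the commit message, not kernel code:
 * runtime left unused at the end of a period carries over, but a group
 * never starts a period with more than buffer (= quota + burst).
 * Only meaningful for a finite quota.
 */
static uint64_t refill(uint64_t leftover, uint64_t quota, uint64_t buffer)
{
	assert(quota != RUNTIME_INF && leftover <= buffer);
	uint64_t next = leftover + quota;

	return next < buffer ? next : buffer;
}
```

For example, with quota = 50 ms and burst = 20 ms, buffer is 70 ms. Under this model, a group that leaves 30 ms unused starts the next period with 70 ms rather than 80 ms, while one that leaves 10 ms unused starts with 60 ms.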
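With this patch, the cgroup v2 cpu.max file carries a third field for burst, written and read in microseconds as "$QUOTA $PERIOD $BURST". The sketch below is a user-space mirror of the extended cpu_period_quota_parse() and cpu_period_quota_print(), meant to show how partial writes behave. parse_cpu_max() and format_cpu_max() are hypothetical names, and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NSEC_PER_USEC	1000ULL
#define RUNTIME_INF	(~0ULL)

/*
 * Mirror of the extended cpu_period_quota_parse(). The caller pre-loads
 * *periodp and *burstp with the current values in microseconds, and fields
 * missing from buf keep those values. On success all three outputs are in
 * nanoseconds. Returns 0, or -1 where the kernel returns -EINVAL.
 */
static int parse_cpu_max(const char *buf, uint64_t *periodp,
			 uint64_t *quotap, uint64_t *burstp)
{
	char tok[21];	/* wide enough for U64_MAX */
	unsigned long long period, quota, burst;

	assert(buf && periodp && quotap && burstp);
	period = *periodp;
	burst = *burstp;

	if (sscanf(buf, "%20s %llu %llu", tok, &period, &burst) < 1)
		return -1;

	*periodp = period * NSEC_PER_USEC;
	*burstp = burst * NSEC_PER_USEC;

	if (sscanf(tok, "%llu", &quota) == 1)
		*quotap = quota * NSEC_PER_USEC;
	else if (strcmp(tok, "max") == 0)
		*quotap = RUNTIME_INF;
	else
		return -1;
	return 0;
}

/* Mirror of the extended cpu_period_quota_print(): "$QUOTA $PERIOD $BURST". */
static int format_cpu_max(char *out, size_t n, long period_us,
			  long quota_us, long burst_us)
{
	if (quota_us < 0)
		return snprintf(out, n, "max %ld %ld\n", period_us, burst_us);
	return snprintf(out, n, "%ld %ld %ld\n", quota_us, period_us, burst_us);
}
```

Writing "50000" alone sets a 50 ms quota and keeps the current period and burst, because sscanf() stops after the first conversion. Note also that the printed line gains a third field, so the cpu.max output format changes for existing readers.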