From: Dave Chiluk
Date: Wed, 13 Nov 2019 01:35:42 -0600
Subject: Re: [PATCH RFC] sched/fair: add burst to cgroup cpu bandwidth controller
To: Konstantin Khlebnikov
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ben Segall, Ingo Molnar,
    Tejun Heo, cgroups@vger.kernel.org, Cong Wang

I was going to write this exact patch, but you beat me to it. I like that
it's sequestered into a separate setting. This would be very useful for
interactive applications that have CFS bandwidth control enabled
(*cough* Kubernetes).

Are you planning on submitting the plumbing for this feature back into
docker/containerd/lxc and the other container orchestrators?

This is also very similar in desired behavior to a patch set submitted by
Cong Wang (https://lore.kernel.org/patchwork/patch/907450/), although I like
this implementation better because of its simplicity. I guess a lot of the
simplification comes from not having to deal with the expiration logic.
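On the orchestrator question: the userspace plumbing should be tiny.
Something like the following is all a runtime would need against the
proposed cgroup v1 interface (an untested sketch on my part; the cgroup
path is made up, and the values are the ones from your changelog example):

#include <stdio.h>
#include <stdlib.h>

/* Illustrative cgroup path; a real runtime resolves this per container. */
#define CG "/sys/fs/cgroup/cpu/example"

static int cg_write(const char *attr, const char *val)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), CG "/%s", attr);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fputs(val, f);
        return fclose(f);
}

int main(void)
{
        /* The changelog example: 2 cpus on average, 300ms burst bank. */
        if (cg_write("cpu.cfs_period_us", "100000") ||
            cg_write("cpu.cfs_quota_us", "200000") ||
            cg_write("cpu.cfs_burst_us", "300000")) {
                perror("cgroup write");
                return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
}

On cgroup2 it's even less work, since burst is just the third field of
cpu.max; anything that already writes "$MAX $PERIOD" only has to append
$BURST.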
Additional comments are inline below: a couple of documentation edits, plus
my preference for granting a full burst bank when the limit is set.
Otherwise this looks good to me.

+ Reviewed-by: Dave Chiluk

I also tested this with fibtest, the simple test application I use for
validating runtime accuracy (https://github.com/indeedeng/fibtest), and
everything looks reasonable. I'll try to do a bit more testing, but so far
I'm happy. Hopefully Ben will give this a +1 as well.

Thanks!

On Thu, Nov 7, 2019 at 6:12 AM Konstantin Khlebnikov wrote:
>
> Currently the CFS bandwidth controller assigns cpu.cfs_quota_us of runtime
> to the global pool every cpu.cfs_period_us. All unused runtime is expired.
>
> Since commit de53fd7aedb1 ("sched/fair: Fix low cpu usage with high
> throttling by removing expiration of cpu-local slices") the slice assigned
> to a cpu does not expire. This allows serving tiny bursts (up to 1ms),
> but this runtime pool is cpu-bound and is not transferred between cpus.
>
> A setup for an interactive workload with irregular cpu consumption has to
> set quota according to relatively short spikes of cpu usage. This
> eliminates any possibility of controlling average cpu usage. Increasing
> period and quota proportionally to get bigger runtime chunks is not an
> option, because if an even bigger spike depletes the global pool then
> execution will be stuck until the end of the period and the next refill.
>
> This patch adds limited accumulation of unused runtime from past periods.
> Accumulated runtime does not expire. It stays in the global pool and can
> be used by any cpu. Average cpu usage stays limited to quota / period,
> but a spiky workload can use more cpu power for a short period of time.
>
> The size of the pool for burst runtime is set in the attribute
> cpu.cfs_burst_us. The default is 0, which reflects current behavior.
>
> Statistics for used burst runtime are shown in cpu.stat as "burst_time".
>
> Example setup:
>   cpu.cfs_period_us = 100ms
>   cpu.cfs_quota_us = 200ms
>   cpu.cfs_burst_us = 300ms
>
> Average cpu usage stays limited to 2 cpus (quota / period), but the cgroup
> can accumulate runtime (burst) and for 100ms utilize up to 5 cpus
> (quota / period + burst / 100ms), or 3 cpus for 300ms, and so on.
>
> The implementation is simple. All logic is in
> __refill_cfs_bandwidth_runtime(). The remaining changes are the interface
> for cgroup and cgroup2.
>
> For cgroup2, burst is set as the third number in attribute cpu.max:
>   cpu.max = $QUOTA $PERIOD $BURST
>
> Signed-off-by: Konstantin Khlebnikov
> ---
>  Documentation/admin-guide/cgroup-v2.rst |   15 +++--
>  Documentation/scheduler/sched-bwc.rst   |    8 ++-
>  kernel/sched/core.c                     |   88 +++++++++++++++++++++++++------
>  kernel/sched/fair.c                     |   34 ++++++++++--
>  kernel/sched/sched.h                    |    4 +
>  5 files changed, 119 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 5361ebec3361..8c3cc3d882ba 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -981,11 +981,12 @@ All time durations are in microseconds.
>    - user_usec
>    - system_usec
>
> -  and the following three when the controller is enabled:
> +  and the following four when the controller is enabled:
>
>    - nr_periods
>    - nr_throttled
>    - throttled_usec
> +  - burst_usec
>
>  cpu.weight
>    A read-write single value file which exists on non-root
> @@ -1006,16 +1007,18 @@ All time durations are in microseconds.
>    the closest approximation of the current weight.
>
>  cpu.max
> -  A read-write two value file which exists on non-root cgroups.
> -  The default is "max 100000".
> +  A read-write 1..3 values file which exists on non-root cgroups.
> +  The default is "max 100000 0".
>
>    The maximum bandwidth limit. It's in the following format::
>
> -    $MAX $PERIOD
> +    $MAX $PERIOD $BURST
>
>    which indicates that the group may consume upto $MAX in each
> -  $PERIOD duration. "max" for $MAX indicates no limit. If only
> -  one number is written, $MAX is updated.
> +  $PERIOD duration and accumulates upto $BURST time for bursts.
> +
> +  "max" for $MAX indicates no limit.
> +  If only one number is written, $MAX is updated.
>
>  cpu.pressure
>    A read-only nested-key file which exists on non-root cgroups.
> diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
> index 9801d6b284b1..5f1cac7b2d44 100644
> --- a/Documentation/scheduler/sched-bwc.rst
> +++ b/Documentation/scheduler/sched-bwc.rst
> @@ -27,12 +27,14 @@ Quota and period are managed within the cpu subsystem via cgroupfs.
>
>  cpu.cfs_quota_us: the total available run-time within a period (in microseconds)
>  cpu.cfs_period_us: the length of a period (in microseconds)
> +cpu.cfs_burst_us: the maxumum size of burst run-time pool (in microseconds)

maxumum -> maximum

>  cpu.stat: exports throttling statistics [explained further below]
>
>  The default values are::
>
>  	cpu.cfs_period_us=100ms
> -	cpu.cfs_quota=-1
> +	cpu.cfs_quota_us=-1
> +	cpu.cfs_burst_us=0
>
>  A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
>  bandwidth restriction in place, such a group is described as an unconstrained
> @@ -51,6 +53,9 @@ and return the group to an unconstrained state once more.
>
>  Any updates to a group's bandwidth specification will result in it becoming
>  unthrottled if it is in a constrained state.
>
> +Writing positive value into cpu.cfs_burst_us allows to allcumulate unused
> +run-time (upto this value) and use it later in addition to assigned quota.

How about:

  Writing a positive value into cpu.cfs_burst_us allows unused quota to
  accumulate up to this value and be used later in addition to the
  assigned quota.

> +
>  System wide settings
>  --------------------
>  For efficiency run-time is transferred between the global pool and CPU local
> @@ -75,6 +80,7 @@ cpu.stat:
>  - nr_throttled: Number of times the group has been throttled/limited.
>  - throttled_time: The total time duration (in nanoseconds) for which entities
>    of the group have been throttled.
> +- burst_time: The total running time consumed from burst pool.
>
>  This interface is read-only.
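Before getting into the code: as a sanity check on the documentation above
and the numbers in the changelog, here is the burst arithmetic as I read it.
This is my own back-of-the-envelope model, not the kernel code path:

#include <stdio.h>

int main(void)
{
        /* Changelog example, all values in microseconds. */
        const double period = 100000, quota = 200000, burst = 300000;

        /* Long-term average is always bounded by quota / period. */
        printf("average: %.1f cpus\n", quota / period);

        /* A full bank spent within one period: (quota + burst) / period. */
        printf("peak for one period: %.1f cpus\n", (quota + burst) / period);

        /* Or spread over N periods: (quota + burst / N) / period. */
        printf("peak for three periods: %.1f cpus\n",
               (quota + burst / 3) / period);

        return 0;
}

This prints 2.0, 5.0 and 3.0, which matches the "up to 5 cpus for 100ms, or
3 cpus for 300ms" figures from the changelog, so the documented rule and the
example agree.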
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dd05a378631a..010b79c45200 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7354,7 +7354,8 @@ static const u64 min_cfs_quota_period = 1 * NSEC_PER_MSEC; /* 1ms */
>
>  static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime);
>
> -static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> +static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
> +				u64 burst)
>  {
>  	int i, ret = 0, runtime_enabled, runtime_was_enabled;
>  	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> @@ -7399,12 +7400,13 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>  	raw_spin_lock_irq(&cfs_b->lock);
>  	cfs_b->period = ns_to_ktime(period);
>  	cfs_b->quota = quota;
> -
> -	__refill_cfs_bandwidth_runtime(cfs_b);
> +	cfs_b->burst = burst;

+	cfs_b->burst_runtime = burst;

I think we should default to a full burst bank. After all, during startup
applications are very likely to need additional CPU; this is a regular pain
point for our applications. Providing a full burst bank up front would help
mitigate slow start-up times for CPU-limited applications.

>
>  	/* Restart the period timer (if active) to handle new period expiry: */
> -	if (runtime_enabled)
> +	if (runtime_enabled) {
> +		__refill_cfs_bandwidth_runtime(cfs_b);
>  		start_cfs_bandwidth(cfs_b);
> +	}
>
>  	raw_spin_unlock_irq(&cfs_b->lock);
>
> @@ -7432,9 +7434,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>
>  static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
>  {
> -	u64 quota, period;
> +	u64 quota, period, burst;
>
>  	period = ktime_to_ns(tg->cfs_bandwidth.period);
> +	burst = tg->cfs_bandwidth.burst;
>  	if (cfs_quota_us < 0)
>  		quota = RUNTIME_INF;
>  	else if ((u64)cfs_quota_us <= U64_MAX / NSEC_PER_USEC)
> @@ -7442,7 +7445,7 @@ static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
>  	else
>  		return -EINVAL;
>
> -	return tg_set_cfs_bandwidth(tg, period, quota);
> +	return tg_set_cfs_bandwidth(tg, period, quota, burst);
>  }
>
>  static long tg_get_cfs_quota(struct task_group *tg)
> @@ -7460,15 +7463,16 @@ static long tg_get_cfs_quota(struct task_group *tg)
>
>  static int tg_set_cfs_period(struct task_group *tg, long cfs_period_us)
>  {
> -	u64 quota, period;
> +	u64 quota, period, burst;
>
>  	if ((u64)cfs_period_us > U64_MAX / NSEC_PER_USEC)
>  		return -EINVAL;
>
>  	period = (u64)cfs_period_us * NSEC_PER_USEC;
>  	quota = tg->cfs_bandwidth.quota;
> +	burst = tg->cfs_bandwidth.burst;
>
> -	return tg_set_cfs_bandwidth(tg, period, quota);
> +	return tg_set_cfs_bandwidth(tg, period, quota, burst);
>  }
>
>  static long tg_get_cfs_period(struct task_group *tg)
> @@ -7481,6 +7485,28 @@ static long tg_get_cfs_period(struct task_group *tg)
>  	return cfs_period_us;
>  }
>
> +static long tg_get_cfs_burst(struct task_group *tg)
> +{
> +	u64 cfs_burst_us = tg->cfs_bandwidth.burst;
> +
> +	do_div(cfs_burst_us, NSEC_PER_USEC);
> +	return cfs_burst_us;
> +}
> +
> +static int tg_set_cfs_burst(struct task_group *tg, long cfs_burst_us)
> +{
> +	u64 quota, period, burst;
> +
> +	if ((u64)cfs_burst_us > U64_MAX / NSEC_PER_USEC)
> +		return -EINVAL;
> +
> +	period = ktime_to_ns(tg->cfs_bandwidth.period);
> +	quota = tg->cfs_bandwidth.quota;
> +	burst = (u64)cfs_burst_us * NSEC_PER_USEC;
> +
> +	return tg_set_cfs_bandwidth(tg, period, quota, burst);
> +}
> +
>  static s64 cpu_cfs_quota_read_s64(struct cgroup_subsys_state *css,
>  				  struct cftype *cft)
>  {
> @@ -7505,6 +7531,18 @@ static int cpu_cfs_period_write_u64(struct cgroup_subsys_state *css,
>  	return tg_set_cfs_period(css_tg(css), cfs_period_us);
>  }
>
> +static u64 cpu_cfs_burst_read_u64(struct cgroup_subsys_state *css,
> +				  struct cftype *cft)
> +{
> +	return tg_get_cfs_burst(css_tg(css));
> +}
> +
> +static int cpu_cfs_burst_write_u64(struct cgroup_subsys_state *css,
> +				   struct cftype *cftype, u64 cfs_burst_us)
> +{
> +	return tg_set_cfs_burst(css_tg(css), cfs_burst_us);
> +}
> +
>  struct cfs_schedulable_data {
>  	struct task_group *tg;
>  	u64 period, quota;
> @@ -7596,6 +7634,7 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
>  	seq_printf(sf, "nr_periods %d\n", cfs_b->nr_periods);
>  	seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
>  	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
> +	seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time);
>
>  	if (schedstat_enabled() && tg != &root_task_group) {
>  		u64 ws = 0;
> @@ -7657,6 +7696,11 @@ static struct cftype cpu_legacy_files[] = {
>  		.read_u64 = cpu_cfs_period_read_u64,
>  		.write_u64 = cpu_cfs_period_write_u64,
>  	},
> +	{
> +		.name = "cfs_burst_us",
> +		.read_u64 = cpu_cfs_burst_read_u64,
> +		.write_u64 = cpu_cfs_burst_write_u64,
> +	},
>  	{
>  		.name = "stat",
>  		.seq_show = cpu_cfs_stat_show,
> @@ -7699,15 +7743,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
>  		struct task_group *tg = css_tg(css);
>  		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
>  		u64 throttled_usec;
> +		u64 burst_usec;
>
>  		throttled_usec = cfs_b->throttled_time;
>  		do_div(throttled_usec, NSEC_PER_USEC);
>
> +		burst_usec = cfs_b->burst_time;
> +		do_div(burst_usec, NSEC_PER_USEC);
> +
>  		seq_printf(sf, "nr_periods %d\n"
>  			   "nr_throttled %d\n"
> -			   "throttled_usec %llu\n",
> +			   "throttled_usec %llu\n"
> +			   "burst_usec %llu\n",
>  			   cfs_b->nr_periods, cfs_b->nr_throttled,
> -			   throttled_usec);
> +			   throttled_usec, burst_usec);
>  	}
>  #endif
>  	return 0;
> @@ -7777,26 +7826,29 @@ static int cpu_weight_nice_write_s64(struct cgroup_subsys_state *css,
>  #endif
>
>  static void __maybe_unused cpu_period_quota_print(struct seq_file *sf,
> -						  long period, long quota)
> +						  long period, long quota,
> +						  long burst)
>  {
>  	if (quota < 0)
>  		seq_puts(sf, "max");
>  	else
>  		seq_printf(sf, "%ld", quota);
>
> -	seq_printf(sf, " %ld\n", period);
> +	seq_printf(sf, " %ld %ld\n", period, burst);
>  }
>
>  /* caller should put the current value in *@periodp before calling */
>  static int __maybe_unused cpu_period_quota_parse(char *buf,
> -						 u64 *periodp, u64 *quotap)
> +						 u64 *periodp, u64 *quotap,
> +						 s64 *burstp)
>  {
>  	char tok[21];	/* U64_MAX */
>
> -	if (sscanf(buf, "%20s %llu", tok, periodp) < 1)
> +	if (sscanf(buf, "%20s %llu %llu", tok, periodp, burstp) < 1)
>  		return -EINVAL;
>
>  	*periodp *= NSEC_PER_USEC;
> +	*burstp *= NSEC_PER_USEC;
>
>  	if (sscanf(tok, "%llu", quotap))
>  		*quotap *= NSEC_PER_USEC;
> @@ -7813,7 +7865,8 @@ static int cpu_max_show(struct seq_file *sf, void *v)
>  {
>  	struct task_group *tg = css_tg(seq_css(sf));
>
> -	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg));
> +	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg),
> +			       tg_get_cfs_burst(tg));
>  	return 0;
>  }
>
> @@ -7822,12 +7875,13 @@ static ssize_t cpu_max_write(struct kernfs_open_file *of,
>  {
>  	struct task_group *tg = css_tg(of_css(of));
>  	u64 period = tg_get_cfs_period(tg);
> +	s64 burst = tg_get_cfs_burst(tg);
>  	u64 quota;
>  	int ret;
>
> -	ret = cpu_period_quota_parse(buf, &period, &quota);
> +	ret = cpu_period_quota_parse(buf, &period, &quota, &burst);
>  	if (!ret)
> -		ret = tg_set_cfs_bandwidth(tg, period, quota);
> +		ret = tg_set_cfs_bandwidth(tg, period, quota, burst);
>  	return ret ?: nbytes;
>  }
>  #endif
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 682a754ea3e1..26b8dac67c79 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4353,16 +4353,26 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>  }
>
>  /*
> - * Replenish runtime according to assigned quota. We use sched_clock_cpu
> - * directly instead of rq->clock to avoid adding additional synchronization
> - * around rq->lock.
> + * Replenish runtime according to assigned quota.
> + * Called only if quota != RUNTIME_INF.
>   *
>   * requires cfs_b->lock
>   */
>  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  {
> -	if (cfs_b->quota != RUNTIME_INF)
> -		cfs_b->runtime = cfs_b->quota;
> +	u64 runtime = cfs_b->runtime;
> +
> +	/*
> +	 * Preserve past runtime up to burst size. If remaining runtime lower
> +	 * than previous burst runtime then account delta as used burst time.
> +	 */
> +	if (runtime > cfs_b->burst)
> +		runtime = cfs_b->burst;
> +	else if (runtime < cfs_b->burst_runtime)
> +		cfs_b->burst_time += cfs_b->burst_runtime - runtime;
> +
> +	cfs_b->burst_runtime = runtime;
> +	cfs_b->runtime = runtime + cfs_b->quota;
>  }
>
>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
> @@ -4968,6 +4978,9 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	cfs_b->runtime = 0;
>  	cfs_b->quota = RUNTIME_INF;
>  	cfs_b->period = ns_to_ktime(default_cfs_period());
> +	cfs_b->burst = 0;
> +	cfs_b->burst_runtime = 0;
> +	cfs_b->burst_time = 0;
>
>  	INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq);
>  	hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
> @@ -4986,14 +4999,23 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>
>  void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  {
> +	u64 overrun;
> +
>  	lockdep_assert_held(&cfs_b->lock);
>
>  	if (cfs_b->period_active)
>  		return;
>
>  	cfs_b->period_active = 1;
> -	hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
> +	overrun = hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
>  	hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
> +
> +	/*
> +	 * Refill runtime for periods of inactivity and current.
> +	 * __refill_cfs_bandwidth_runtime() will cut excess.
> +	 */
> +	cfs_b->runtime += cfs_b->quota * overrun;
> +	__refill_cfs_bandwidth_runtime(cfs_b);
>  }
>
>  static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 0db2c1b3361e..eaecd7298f80 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -344,10 +344,14 @@ struct cfs_bandwidth {
>  	struct hrtimer		slack_timer;
>  	struct list_head	throttled_cfs_rq;
>
> +	u64			burst;
> +	u64			burst_runtime;
> +
>  	/* Statistics: */
>  	int			nr_periods;
>  	int			nr_throttled;
>  	u64			throttled_time;
> +	u64			burst_time;
>  #endif
>  };
>
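Finally, to convince myself that the bank and the new burst_time statistic
behave as advertised, I transcribed the refill logic into a small userspace
harness. The refill() body mirrors __refill_cfs_bandwidth_runtime() from the
patch; the struct name, the driver loop, and the usage numbers are mine, and
the model ignores per-cpu slices and throttling:

#include <stdio.h>

struct cfs_bw {
        unsigned long long quota, burst;
        unsigned long long runtime, burst_runtime, burst_time;
};

/* Body mirrors __refill_cfs_bandwidth_runtime() from the patch. */
static void refill(struct cfs_bw *b)
{
        unsigned long long runtime = b->runtime;

        /* Preserve past runtime up to burst; charge shortfall as burst use. */
        if (runtime > b->burst)
                runtime = b->burst;
        else if (runtime < b->burst_runtime)
                b->burst_time += b->burst_runtime - runtime;

        b->burst_runtime = runtime;
        b->runtime = runtime + b->quota;
}

int main(void)
{
        /*
         * Changelog example in nanoseconds: 200ms quota, 300ms burst,
         * 100ms periods; the other fields start at 0, matching
         * init_cfs_bandwidth().
         */
        struct cfs_bw b = { .quota = 200000000ULL, .burst = 300000000ULL };
        /* Consumption per period: two idle periods, a 500ms spike, idle. */
        unsigned long long used[] = { 0, 0, 500000000ULL, 0 };
        unsigned int i;

        for (i = 0; i < sizeof(used) / sizeof(used[0]); i++) {
                refill(&b);
                b.runtime -= used[i] < b.runtime ? used[i] : b.runtime;
                printf("period %u: runtime left %llu, burst_time %llu\n",
                       i, b.runtime, b.burst_time);
        }
        return 0;
}

The 500ms spike in period 2 (5 cpus' worth in a single 100ms period) is
served from the 200ms quota plus the 300ms bank, and the next refill charges
those 300ms to burst_time, exactly as the changelog example promises.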