From: Song Liu <songliubraving@fb.com>
Subject: [PATCH 6/7] sched/fair: throttle task runtime based on cpu.headroom
Date: Mon, 8 Apr 2019 14:45:38 -0700
Message-ID: <20190408214539.2705660-7-songliubraving@fb.com>
In-Reply-To: <20190408214539.2705660-1-songliubraving@fb.com>
References: <20190408214539.2705660-1-songliubraving@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch enables task runtime throttling based on the cpu.headroom
setting. The throttling leverages the same mechanism as the cpu.max knob:
task groups with a non-zero target_idle get throttled.

In __refill_cfs_bandwidth_runtime(), the global idleness measured by
cfs_global_idleness_update() is compared against the task group's
target_idle. If the measured idleness is lower than the target, the
runtime of this task group is reduced to min_runtime. A new field,
"prev_runtime", is added to struct cfs_bandwidth so that the new runtime
can be adjusted accordingly.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/sched/fair.c  | 69 +++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h |  4 +++
 2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 49c68daffe7e..3b0535cda7cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4331,6 +4331,16 @@ static inline u64 sched_cfs_bandwidth_slice(void)
 	return (u64)sysctl_sched_cfs_bandwidth_slice * NSEC_PER_USEC;
 }
 
+static inline bool cfs_bandwidth_throttling_on(struct cfs_bandwidth *cfs_b)
+{
+	return cfs_b->quota != RUNTIME_INF || cfs_b->target_idle != 0;
+}
+
+static inline u64 cfs_bandwidth_pct_to_ns(u64 period, unsigned long pct)
+{
+	return div_u64(period * num_online_cpus() * pct, 100) >> FSHIFT;
+}
+
 /*
  * Replenish runtime according to assigned quota and update expiration time.
  * We use sched_clock_cpu directly instead of rq->clock to avoid adding
@@ -4340,9 +4350,12 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	/* runtimes in nanoseconds */
+	u64 idle_time, target_idle_time, max_runtime, min_runtime;
+	unsigned long idle_pct;
 	u64 now;
 
-	if (cfs_b->quota == RUNTIME_INF)
+	if (!cfs_bandwidth_throttling_on(cfs_b))
 		return;
 
 	now = sched_clock_cpu(smp_processor_id());
@@ -4353,7 +4366,49 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 	if (cfs_b->target_idle == 0)
 		return;
 
-	cfs_global_idleness_update(now, cfs_b->period);
+	/*
+	 * max_runtime is the maximal possible runtime for given
+	 * target_idle and quota. In other words:
+	 *   max_runtime = min(quota,
+	 *                     total_time * (100% - target_idle))
+	 */
+	max_runtime = min_t(u64, cfs_b->quota,
+			    cfs_bandwidth_pct_to_ns(cfs_b->period,
+						    (100 << FSHIFT) - cfs_b->target_idle));
+	idle_pct = cfs_global_idleness_update(now, cfs_b->period);
+
+	/*
+	 * Throttle runtime if idle_pct is less than target_idle:
+	 *     idle_pct < cfs_b->target_idle
+	 *
+	 * or if the throttling is on in previous period:
+	 *     max_runtime != cfs_b->prev_runtime
+	 */
+	if (idle_pct < cfs_b->target_idle ||
+	    max_runtime != cfs_b->prev_runtime) {
+		idle_time = cfs_bandwidth_pct_to_ns(cfs_b->period, idle_pct);
+		target_idle_time = cfs_bandwidth_pct_to_ns(cfs_b->period,
+							   cfs_b->target_idle);
+
+		/* minimal runtime to avoid starving */
+		min_runtime = max_t(u64, min_cfs_quota_period,
+				    cfs_bandwidth_pct_to_ns(cfs_b->period,
+							    cfs_b->min_runtime));
+		if (cfs_b->prev_runtime + idle_time < target_idle_time) {
+			cfs_b->runtime = min_runtime;
+		} else {
+			cfs_b->runtime = cfs_b->prev_runtime + idle_time -
+				target_idle_time;
+			if (cfs_b->runtime > max_runtime)
+				cfs_b->runtime = max_runtime;
+			if (cfs_b->runtime < min_runtime)
+				cfs_b->runtime = min_runtime;
+		}
+	} else {
+		/* no need for throttling */
+		cfs_b->runtime = max_runtime;
+	}
+	cfs_b->prev_runtime = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
@@ -4382,7 +4437,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
 
 	raw_spin_lock(&cfs_b->lock);
-	if (cfs_b->quota == RUNTIME_INF)
+	if (!cfs_bandwidth_throttling_on(cfs_b))
 		amount = min_amount;
 	else {
 		start_cfs_bandwidth(cfs_b);
@@ -4690,7 +4745,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun, u
 	int throttled;
 
 	/* no need to continue the timer with no bandwidth constraint */
-	if (cfs_b->quota == RUNTIME_INF)
+	if (!cfs_bandwidth_throttling_on(cfs_b))
 		goto out_deactivate;
 
 	throttled = !list_empty(&cfs_b->throttled_cfs_rq);
@@ -4806,7 +4861,7 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 		return;
 
 	raw_spin_lock(&cfs_b->lock);
-	if (cfs_b->quota != RUNTIME_INF &&
+	if (cfs_bandwidth_throttling_on(cfs_b) &&
 	    cfs_rq->runtime_expires == cfs_b->runtime_expires) {
 		cfs_b->runtime += slack_runtime;
 
@@ -4854,7 +4909,7 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 		return;
 	}
 
-	if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
+	if (cfs_bandwidth_throttling_on(cfs_b) && cfs_b->runtime > slice)
 		runtime = cfs_b->runtime;
 
 	expires = cfs_b->runtime_expires;
@@ -5048,7 +5103,7 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
 		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
 		raw_spin_lock(&cfs_b->lock);
-		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
+		cfs_rq->runtime_enabled = cfs_bandwidth_throttling_on(cfs_b);
 		raw_spin_unlock(&cfs_b->lock);
 	}
 	rcu_read_unlock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9309bf05ff0c..92e8a824c6fe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -338,6 +338,7 @@ extern struct list_head task_groups;
 
 #ifdef CONFIG_CFS_BANDWIDTH
 extern void cfs_bandwidth_has_tasks_changed_work(struct work_struct *work);
+extern const u64 min_cfs_quota_period;
 #endif
 
 struct cfs_bandwidth {
@@ -370,6 +371,9 @@ struct cfs_bandwidth {
 	/* work_struct to adjust settings asynchronously */
 	struct work_struct has_tasks_changed_work;
 
+	/* runtime assigned to previous period */
+	u64 prev_runtime;
+
 	short idle;
 	short period_active;
 	struct hrtimer period_timer;
-- 
2.17.1