Date: Wed, 09 Oct 2019 12:59:20 -0000
From: "tip-bot2 for Xuewei Zhang"
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/urgent] sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
Cc: Phil Auld, Xuewei Zhang, "Peter Zijlstra (Intel)", Anton Blanchard, Ben Segall, Dietmar Eggemann, Juri Lelli, Linus Torvalds, Mel Gorman, Steven Rostedt, Thomas Gleixner, Vincent Guittot, Ingo Molnar, Borislav Petkov, linux-kernel@vger.kernel.org
In-Reply-To: <20191004001243.140897-1-xueweiz@google.com>
References: <20191004001243.140897-1-xueweiz@google.com>
MIME-Version: 1.0
Message-ID: <157062596009.9978.18110832344830515975.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     4929a4e6faa0f13289a67cae98139e727f0d4a97
Gitweb:        https://git.kernel.org/tip/4929a4e6faa0f13289a67cae98139e727f0d4a97
Author:        Xuewei Zhang
AuthorDate:    Thu, 03 Oct 2019 17:12:43 -07:00
Committer:     Ingo Molnar
CommitterDate: Wed, 09 Oct 2019 12:38:02 +02:00

sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision

The quota/period ratio is used to ensure a child task group won't get
more bandwidth than the parent task group, and is calculated as:

	normalized_cfs_quota() = [(quota_us << 20) / period_us]

If the quota/period ratio changes during this scaling due to precision
loss, it causes an inconsistency between parent and child task groups.
Consider the following example: a userspace container manager (kubelet)
does three operations:

 1) Create a parent cgroup, and set quota to 1,000us and period to 10,000us.
 2) Create a few child cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed. However, if the 147/128
scaling happens before step 3, the quota and period of the parent
cgroup will be changed:

	new_quota:  1148437ns,  1148us
	new_period: 11484375ns, 11484us

When step 3 comes in, the ratio of the child cgroup will be 104857,
which is larger than the parent cgroup's ratio (104821), so step 3
will fail.

Scaling them by a factor of 2 will fix the problem.

Tested-by: Phil Auld
Signed-off-by: Xuewei Zhang
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Phil Auld
Cc: Anton Blanchard
Cc: Ben Segall
Cc: Dietmar Eggemann
Cc: Juri Lelli
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Vincent Guittot
Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83ab35e..682a754 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4926,20 +4926,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
 
-			new = (old * 147) / 128; /* ~115% */
-			new = min(new, max_cfs_quota_period);
-
-			cfs_b->period = ns_to_ktime(new);
-
-			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
-			cfs_b->quota *= new;
-			cfs_b->quota = div64_u64(cfs_b->quota, old);
-
-			pr_warn_ratelimited(
-	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
-				smp_processor_id(),
-				div_u64(new, NSEC_PER_USEC),
-				div_u64(cfs_b->quota, NSEC_PER_USEC));
+			/*
+			 * Grow period by a factor of 2 to avoid losing precision.
+			 * Precision loss in the quota/period ratio can cause __cfs_schedulable
+			 * to fail.
+			 */
+			new = old * 2;
+			if (new < max_cfs_quota_period) {
+				cfs_b->period = ns_to_ktime(new);
+				cfs_b->quota *= 2;
+
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(new, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			} else {
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(old, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			}
 
 			/* reset count so we don't come right back in here */
 			count = 0;
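
For anyone who wants to check the numbers in the changelog, the ratio
arithmetic can be reproduced in plain userspace C. The snippet below is
only an illustration and is not part of the patch; normalized_ratio()
is a made-up stand-in for the kernel's normalized_cfs_quota(), applying
the same fixed-point formula, (quota_us << 20) / period_us, to the
values quoted above:

#include <stdio.h>
#include <stdint.h>

/* Same fixed-point ratio as normalized_cfs_quota(): (quota_us << 20) / period_us */
static uint64_t normalized_ratio(uint64_t quota_us, uint64_t period_us)
{
	return (quota_us << 20) / period_us;	/* integer division drops the fraction */
}

int main(void)
{
	/* Parent and child both start at quota 1,000us / period 10,000us. */
	printf("initial ratio:        %llu\n",
	       (unsigned long long)normalized_ratio(1000, 10000));	/* 104857 */

	/* Parent after the old 147/128 scaling: 1,148us / 11,484us. */
	printf("after 147/128 scale:  %llu\n",
	       (unsigned long long)normalized_ratio(1148, 11484));	/* 104821 */

	/* Parent after scaling quota and period by 2: 2,000us / 20,000us. */
	printf("after factor-2 scale: %llu\n",
	       (unsigned long long)normalized_ratio(2000, 20000));	/* 104857 */

	return 0;
}

The 147/128 scaling drops the parent's ratio to 104821, so a child still
configured at 1,000us/10,000us (ratio 104857) exceeds it and
__cfs_schedulable() rejects step 3, while doubling both quota and period
leaves the ratio unchanged.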