Date: Thu, 3 Oct 2019 17:12:43 -0700
Message-Id: <20191004001243.140897-1-xueweiz@google.com>
Subject: [PATCH] sched/fair: scale quota and period without losing quota/period ratio precision
From: Xuewei Zhang
To: Phil Auld, Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman
Cc: Anton Blanchard, Linus Torvalds, Thomas Gleixner, linux-kernel@vger.kernel.org, stable@vger.kernel.org, trivial@kernel.org, Xuewei Zhang

The quota/period ratio is used to ensure that a child task group won't get
more bandwidth than its parent task group. It is calculated as:

  normalized_cfs_quota() = [(quota_us << 20) / period_us]

sched_cfs_period_timer() scales the period and quota up when the period is
too short. If that scaling changes the quota/period ratio through precision
loss, the parent and child task groups become inconsistent. For example:

A userspace container manager (kubelet) does three operations:

 1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
 2) Create a few children cgroups.
 3) Set quota to 1,000us and period to 10,000us on a child cgroup.

These operations are expected to succeed.

However, if the scaling of 147/128 happens before step 3), the quota and
period of the parent cgroup will be changed:

  new_quota: 1148437ns, 1148us
  new_period: 11484375ns, 11484us

When step 3) then comes in, the ratio of the child cgroup will be 104857,
which is larger than the parent cgroup ratio (104821), so the operation
fails.

Scaling both quota and period by a factor of 2 involves no division, so no
precision is lost, and it fixes the problem.

Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
Signed-off-by: Xuewei Zhang
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83ab35e2374f..b3d3d0a231cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4926,20 +4926,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
 
-			new = (old * 147) / 128; /* ~115% */
-			new = min(new, max_cfs_quota_period);
-
-			cfs_b->period = ns_to_ktime(new);
-
-			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
-			cfs_b->quota *= new;
-			cfs_b->quota = div64_u64(cfs_b->quota, old);
-
-			pr_warn_ratelimited(
-	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
-				smp_processor_id(),
-				div_u64(new, NSEC_PER_USEC),
-				div_u64(cfs_b->quota, NSEC_PER_USEC));
+			/*
+			 * Grow period by a factor of 2 to avoid losing precision.
+			 * Precision loss in the quota/period ratio can cause __cfs_schedulable
+			 * to fail.
+			 */
+			new = old * 2;
+			if (new < max_cfs_quota_period) {
+				cfs_b->period = ns_to_ktime(new);
+				cfs_b->quota *= 2;
+
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(new, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			} else {
+				pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, but cannot scale up without losing precision (cfs_period_us = %lld, cfs_quota_us = %lld)\n",
+					smp_processor_id(),
+					div_u64(old, NSEC_PER_USEC),
+					div_u64(cfs_b->quota, NSEC_PER_USEC));
+			}
 
 			/* reset count so we don't come right back in here */
 			count = 0;
--
2.23.0.581.g78d2f28ef7-goog
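
The arithmetic in the commit message can be reproduced outside the kernel. The userspace sketch below is not part of the patch. It assumes plain uint64_t division behaves like the kernel's truncating div64_u64()/div_u64(), it leaves out the max_cfs_quota_period cap, and the helper names ratio_us(), scale_147_128(), scale_x2() and check_commit_message_numbers() are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define RATIO_SHIFT   20	/* the "<< 20" from the commit message */

/* Ratio as described in the commit message: (quota_us << 20) / period_us. */
static uint64_t ratio_us(uint64_t quota_ns, uint64_t period_ns)
{
	uint64_t quota_us = quota_ns / NSEC_PER_USEC;
	uint64_t period_us = period_ns / NSEC_PER_USEC;

	return (quota_us << RATIO_SHIFT) / period_us;
}

/* Old behaviour: grow period by 147/128, then rescale quota by new/old. */
static void scale_147_128(uint64_t *quota_ns, uint64_t *period_ns)
{
	uint64_t old_period = *period_ns;
	uint64_t new_period = (old_period * 147) / 128;

	*period_ns = new_period;
	*quota_ns = (*quota_ns * new_period) / old_period;	/* truncates */
}

/* New behaviour: double both values, so no division is involved. */
static void scale_x2(uint64_t *quota_ns, uint64_t *period_ns)
{
	*period_ns *= 2;
	*quota_ns *= 2;
}

/* Every assert restates a number quoted in the commit message. */
static void check_commit_message_numbers(void)
{
	uint64_t quota = 1000000, period = 10000000;	/* 1,000us / 10,000us */
	uint64_t child = ratio_us(quota, period);

	assert(child == 104857);

	scale_147_128(&quota, &period);
	assert(quota == 1148437 && period == 11484375);
	assert(ratio_us(quota, period) == 104821);	/* parent < child */

	quota = 1000000;
	period = 10000000;
	scale_x2(&quota, &period);
	assert(ratio_us(quota, period) == child);	/* ratio unchanged */
}
```

Calling check_commit_message_numbers() from a main() returns silently when all the quoted numbers hold. The parent ratio drops from 104857 to 104821 under the 147/128 scaling, while doubling leaves it equal to the child's ratio.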