From: Dave Chiluk
To: Ben Segall, Phil Auld, Peter Oskolkov, Peter Zijlstra, Ingo Molnar,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Brendan Gregg,
    Kyle Anderson, Gabriel Munos, John Hammond, Cong Wang
Subject: [PATCH v4 1/1] sched/fair: Return all runtime when cfs_b has very little remaining.
Date: Mon, 24 Jun 2019 10:50:04 -0500
Message-Id: <1561391404-14450-2-git-send-email-chiluk+linux@indeed.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>
References: <1558121424-2914-1-git-send-email-chiluk+linux@indeed.com> <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>
X-Mailing-List: linux-kernel@vger.kernel.org

It has been observed that highly-threaded, user-interactive
applications running under cpu.cfs_quota_us constraints can hit a high
percentage of periods throttled while simultaneously not consuming the
allocated amount of quota. This impacts user-interactive non-cpu-bound
applications, such as those running in Kubernetes or Mesos when run on
multiple cores.

This has been root caused to threads being allocated per-cpu bandwidth
slices, and then not fully using that slice within the period. This
results in min_cfs_rq_runtime remaining on each per-cpu cfs_rq. At the
end of the period this remaining quota goes unused and expires.
This expiration of unused time on per-cpu runqueues results in
applications under-utilizing their quota while simultaneously hitting
throttling.

The solution is to return all spare cfs_rq->runtime_remaining when
cfs_b->runtime nears the sched_cfs_bandwidth_slice. This balances the
desire to prevent cfs_rq from always pulling quota with the desire to
allow applications to fully utilize their quota.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk
---
 kernel/sched/fair.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f..4894eda 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4695,7 +4695,9 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun, u
 	return 1;
 }
 
-/* a cfs_rq won't donate quota below this amount */
+/* a cfs_rq won't donate quota below this amount unless cfs_b has very little
+ * remaining runtime.
+ */
 static const u64 min_cfs_rq_runtime = 1 * NSEC_PER_MSEC;
 /* minimum remaining period time to redistribute slack quota */
 static const u64 min_bandwidth_expiration = 2 * NSEC_PER_MSEC;
@@ -4743,16 +4745,27 @@ static void start_cfs_slack_bandwidth(struct cfs_bandwidth *cfs_b)
 static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	s64 slack_runtime = cfs_rq->runtime_remaining - min_cfs_rq_runtime;
+	s64 slack_runtime = cfs_rq->runtime_remaining;
 
+	/* There is no runtime to return. */
 	if (slack_runtime <= 0)
 		return;
 
 	raw_spin_lock(&cfs_b->lock);
 	if (cfs_b->quota != RUNTIME_INF &&
 	    cfs_rq->runtime_expires == cfs_b->runtime_expires) {
-		cfs_b->runtime += slack_runtime;
+		/* As we near 0 quota remaining on cfs_b start returning all
+		 * remaining runtime. This avoids stranding and then expiring
+		 * runtime on per-cpu cfs_rq.
+		 *
+		 * cfs->b has plenty of runtime leave min_cfs_rq_runtime of
+		 * runtime on this cfs_rq.
+		 */
+		if (cfs_b->runtime >= sched_cfs_bandwidth_slice() * 3 &&
+		    slack_runtime > min_cfs_rq_runtime)
+			slack_runtime -= min_cfs_rq_runtime;
+		cfs_b->runtime += slack_runtime;
 
 		/* we are under rq->lock, defer unthrottling using a timer */
 		if (cfs_b->runtime > sched_cfs_bandwidth_slice() &&
 		    !list_empty(&cfs_b->throttled_cfs_rq))
-- 
1.8.3.1
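
As a rough illustration of the decision this patch introduces in
__return_cfs_rq_runtime(), here is a minimal userspace sketch in C. The
function name slack_to_return(), its parameters, and the 5 ms value used
for the bandwidth slice are assumptions made for this example, not
kernel code. Locking, the runtime_expires check and the slack timer are
left out.

```c
#include <stdint.h>

#define NSEC_PER_MSEC INT64_C(1000000)

/* Mirrors min_cfs_rq_runtime from the patch (1 ms). */
static const int64_t min_cfs_rq_runtime = 1 * NSEC_PER_MSEC;

/* Assumption: stands in for sched_cfs_bandwidth_slice(), taken as 5 ms. */
static const int64_t bandwidth_slice = 5 * NSEC_PER_MSEC;

/*
 * Hypothetical model of the patched logic: given the runtime left on one
 * per-cpu cfs_rq and the runtime left in the global pool (cfs_b->runtime),
 * return how much the cfs_rq hands back to the global pool.
 */
static int64_t slack_to_return(int64_t rq_runtime_remaining,
			       int64_t global_runtime)
{
	int64_t slack = rq_runtime_remaining;

	/* There is no runtime to return. */
	if (slack <= 0)
		return 0;

	/*
	 * While the global pool still holds at least three slices, keep
	 * min_cfs_rq_runtime on the local cfs_rq. Otherwise return it all.
	 */
	if (global_runtime >= bandwidth_slice * 3 &&
	    slack > min_cfs_rq_runtime)
		slack -= min_cfs_rq_runtime;

	return slack;
}
```

Under these assumptions, a cfs_rq holding 4 ms returns 3 ms and keeps
1 ms when 20 ms remain in the global pool, but returns the full 4 ms
when only 10 ms remain globally.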