From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@fb.com, pjt@google.com, dietmar.eggemann@arm.com, peterz@infradead.org, mingo@redhat.com, morten.rasmussen@arm.com, tglx@linutronix.de, mgorman@techsingularity.net, vincent.guittot@linaro.org
Subject: [PATCH RFC v3 0/14] sched,fair: flatten CPU controller runqueues
Date: Mon, 22 Jul 2019 13:33:34 -0400
Message-Id: <20190722173348.9241-1-riel@surriel.com>

The current implementation of the CPU controller uses hierarchical runqueues: on wakeup, a task is enqueued on its group's runqueue, the group is enqueued on the runqueue of the group above it, and so on up the hierarchy.
This adds a fairly large amount of overhead for workloads that do many wakeups per second, especially given that the default systemd hierarchy is 2 or 3 levels deep.

This patch series is an attempt at reducing that overhead by placing all the tasks on the same runqueue and scaling each task's priority by the priority of its group, which is calculated periodically.

My main TODO items for the next while are likely going to be testing, testing, and testing. I hope to flush out any corner cases I can find, make sure performance does not regress with any workloads, and hopefully improve performance for some.

Other TODO items:
- More code cleanups.
- Remove some more now-unused code.
- Reimplement CONFIG_CFS_BANDWIDTH.

Plan for the CONFIG_CFS_BANDWIDTH reimplementation:
- When a cgroup gets throttled, mark the cgroup and its children as throttled.
- When pick_next_entity finds a task that is on a throttled cgroup, stash it on the cgroup runqueue (which is no longer used for runnable tasks). Leave the task's vruntime unchanged, and adjust that runqueue's vruntime to be that of the left-most task.
- When a cgroup gets unthrottled and has tasks on it, place it on a vruntime-ordered heap separate from the main runqueue.
- Have pick_next_task_fair grab one task off that heap every time it is called, if the min vruntime of that heap is lower than the vruntime of the CPU's cfs_rq (or the CPU has no other runnable tasks).
- Place that selected task on the CPU's cfs_rq, renormalizing its vruntime with the GENTLE_FAIR_SLEEPERS logic. That should help interleave the already runnable tasks with the recently unthrottled group, and prevent thundering herd issues.
- If the group gets throttled again before all of its tasks had a chance to run, vruntime sorting ensures all the tasks in the throttled cgroup get a chance to run over time.
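The unthrottle steps in the plan above can be modeled as a small userspace sketch. It assumes the heap holds the unthrottled tasks themselves, treats "the vruntime of the CPU's cfs_rq" as its min_vruntime, and uses a hypothetical fixed 6 ms in place of sysctl_sched_latency. struct cfs_rq_model, maybe_requeue_one() and the other names are hypothetical and are not part of the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 64
/* Hypothetical stand-in for sysctl_sched_latency, in nanoseconds. */
#define SCHED_LATENCY_NS 6000000ULL

struct task {
	int pid;
	uint64_t vruntime;
};

/* Binary min-heap of unthrottled tasks, ordered by vruntime. */
struct unthrottled_heap {
	struct task *t[MAX_TASKS];
	int n;
};

/* Toy model of the CPU's cfs_rq: only the fields this sketch reads. */
struct cfs_rq_model {
	uint64_t min_vruntime;
	int nr_running;
};

static void swap_tasks(struct task **a, struct task **b)
{
	struct task *tmp = *a;
	*a = *b;
	*b = tmp;
}

static void heap_push(struct unthrottled_heap *h, struct task *p)
{
	assert(h->n < MAX_TASKS);
	int i = h->n++;
	h->t[i] = p;
	/* Sift up until the parent's vruntime is not larger. */
	while (i > 0) {
		int parent = (i - 1) / 2;
		if (h->t[parent]->vruntime <= h->t[i]->vruntime)
			break;
		swap_tasks(&h->t[parent], &h->t[i]);
		i = parent;
	}
}

static struct task *heap_pop(struct unthrottled_heap *h)
{
	assert(h->n > 0);
	struct task *top = h->t[0];
	h->t[0] = h->t[--h->n];
	/* Sift down to restore the min-heap property. */
	int i = 0;
	for (;;) {
		int l = 2 * i + 1, r = l + 1, m = i;
		if (l < h->n && h->t[l]->vruntime < h->t[m]->vruntime)
			m = l;
		if (r < h->n && h->t[r]->vruntime < h->t[m]->vruntime)
			m = r;
		if (m == i)
			break;
		swap_tasks(&h->t[i], &h->t[m]);
		i = m;
	}
	return top;
}

/*
 * GENTLE_FAIR_SLEEPERS-style placement: the task may start at most half a
 * latency period behind min_vruntime, bounding how far it can run ahead of
 * tasks already queued. The kernel compares vruntimes with signed deltas to
 * survive wraparound; this sketch uses plain unsigned comparisons.
 */
static void place_unthrottled(struct cfs_rq_model *rq, struct task *p)
{
	uint64_t credit = SCHED_LATENCY_NS / 2;
	uint64_t min_allowed = rq->min_vruntime > credit ? rq->min_vruntime - credit : 0;

	if (p->vruntime < min_allowed)
		p->vruntime = min_allowed;
	rq->nr_running++;
}

/* One call per pick: move at most one task from the heap to the cfs_rq. */
static struct task *maybe_requeue_one(struct unthrottled_heap *h, struct cfs_rq_model *rq)
{
	if (h->n == 0)
		return NULL;
	if (rq->nr_running > 0 && h->t[0]->vruntime >= rq->min_vruntime)
		return NULL;
	struct task *p = heap_pop(h);
	place_unthrottled(rq, p);
	return p;
}
```

Pulling a single task per pick, rather than draining the whole heap at once, is what lets the already runnable tasks interleave with the unthrottled group. In this model, a task whose stashed vruntime is ahead of min_vruntime stays on the heap until the CPU's queue catches up or empties.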
Changes from v2:
- fixed the web server performance regression, in a way vaguely similar to what Josef Bacik suggested (blame me for the implementation)
- removed some code duplication so the diffstat is redder than before
- propagate sum_exec_runtime up the tree, in preparation for CFS_BANDWIDTH
- small cleanups left and right

Changes from v1:
- use task_se_h_weight instead of task_se_h_load in calc_delta_fair and sched_slice; this seems to improve performance a little, but I still have some remaining regression to chase with our web server workload
- implement a number of the changes suggested by Dietmar Eggemann (still holding out for a better name for group_cfs_rq_of_parent)

This series applies on top of 5.2.

 include/linux/sched.h |   7
 kernel/sched/core.c   |   3
 kernel/sched/debug.c  |  17 -
 kernel/sched/fair.c   | 780 ++++++++++++++++++++------------------------------
 kernel/sched/pelt.c   |  69 ++--
 kernel/sched/pelt.h   |   3
 kernel/sched/sched.h  |  11
 7 files changed, 372 insertions(+), 518 deletions(-)
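The core idea of the series, scaling a task's priority by its group's priority so that every task can sit on one runqueue, together with the note about using a hierarchical weight in calc_delta_fair, can be illustrated with a simplified model. This is an assumption-laden sketch, not the patch's task_se_h_weight: it uses static weights instead of the PELT load averages the kernel tracks, and struct group_level, flat_task_weight() and vruntime_delta() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of a nice-0 task (sched_prio_to_weight[] gives 1024 for nice 0). */
#define NICE0_WEIGHT 1024ULL

/* One cgroup level between a task and the root, listed from the task upward. */
struct group_level {
	uint64_t se_weight; /* weight of this group's entity in its parent */
	uint64_t rq_weight; /* total weight of entities queued inside this group */
};

/*
 * Flattened weight: the task's own weight scaled by se_weight / rq_weight
 * at every level, so (up to integer rounding) its share of the single
 * runqueue matches the share it would get through the nested runqueues.
 */
static uint64_t flat_task_weight(uint64_t task_weight,
				 const struct group_level *lv, int depth)
{
	uint64_t w = task_weight;

	for (int i = 0; i < depth; i++) {
		assert(lv[i].rq_weight > 0);
		w = w * lv[i].se_weight / lv[i].rq_weight;
	}
	return w ? w : 1; /* keep it nonzero: vruntime_delta() divides by it */
}

/* In the spirit of calc_delta_fair(): lighter entities accrue vruntime faster. */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
	return delta_exec_ns * NICE0_WEIGHT / weight;
}
```

For example, a nice-0 task sharing group A with one other nice-0 task, where A shares its parent group B with one sibling group of equal weight, gets a flattened weight of 1024 × 1024/2048 × 1024/2048 = 256, so 1 ms of runtime advances its vruntime by 4 ms.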