From: Mathieu Poirier
To: mingo@redhat.com, peterz@infradead.org
Cc: tj@kernel.org, vbabka@suse.cz, lizefan@huawei.com,
    akpm@linux-foundation.org, weiyongjun1@huawei.com, juri.lelli@arm.com,
    rostedt@goodmis.org, claudio@evidence.eu.com,
    luca.abeni@santannapisa.it, bristot@redhat.com,
    linux-kernel@vger.kernel.org, mathieu.poirier@linaro.org
Subject: [PATCH 0/7] sched/deadline: fix cpusets bandwidth accounting
Date: Wed, 16 Aug 2017 15:20:36 -0600
Message-Id: <1502918443-30169-1-git-send-email-mathieu.poirier@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

This is a renewed attempt at fixing a problem reported by Steve Rostedt [1]
where DL bandwidth accounting is not recomputed after CPUset and CPUhotplug
operations.  When CPUhotplug and some CPUset manipulations take place, root
domains are destroyed and new ones created, losing at the same time the DL
accounting pertaining to utilisation.

An earlier attempt by Juri [2] used the scheduling classes' rq_online() and
rq_offline() methods, something that highlighted a problem with sleeping
DL tasks.  The email thread that followed envisioned creating a list of
sleeping tasks to cycle through when recomputing DL accounting.

In this set the problem is addressed by relying on the existing list of
tasks (sleeping or not) already maintained by CPUsets.  When CPUset or
CPUhotplug operations have completed we cycle through the list of tasks
maintained by each CPUset looking for DL tasks.
When a DL task is found, its utilisation is added to the root domain it
pertains to by way of its runqueue.

The advantage of proceeding this way is that recomputing of DL accounting
is done the same way for both active and inactive tasks, along with
guaranteeing that DL accounting for tasks ends up in the correct root
domain regardless of the CPUset topology.  The disadvantage is that
cycling through all the tasks in a CPUset can be time consuming.  The
counterargument is that both CPUset and CPUhotplug operations are time
consuming in the first place.

OPEN ISSUE:

Regardless of how we proceed (using the existing CPUset lists or new ones)
we need to deal with DL tasks that span more than one root domain,
something that will typically happen after a CPUset operation.  For
example, if we split the number of available CPUs on a system in two
CPUsets and then turn off the 'sched_load_balance' flag on the parent
CPUset, DL tasks in the parent CPUset will end up spanning two root
domains.

One way to deal with this is to prevent CPUset operations from happening
when such a condition is detected, as enacted in this set.  Although
simple, this approach feels brittle and akin to a "whack-a-mole" game.  A
better and more reliable approach would be to teach the DL scheduler to
deal with tasks that span multiple root domains, a serious and substantial
undertaking.

I am sending this as a starting point for discussion.  I would be grateful
if you could take the time to comment on the approach and, most
importantly, provide input on how to deal with the open issue outlined
above.

Many thanks,
Mathieu

[1]. https://lkml.org/lkml/2016/2/3/966
[2]. https://marc.info/?l=linux-kernel&m=145493552607388&w=2

Mathieu Poirier (7):
  sched/topology: Adding function partition_sched_domains_locked()
  cpuset: Rebuild root domain deadline accounting information
  sched/deadline: Keep new DL task within root domain's boundary
  cgroup: Constrain 'sched_load_balance' flag when DL tasks are present
  cgroup: Concentrate DL related validation code in one place
  cgroup: Constrain the addition of CPUs to a new CPUset
  sched/core: Don't change the affinity of DL tasks

 include/linux/sched.h          |   3 +
 include/linux/sched/deadline.h |   8 ++
 include/linux/sched/topology.h |   9 ++
 kernel/cgroup/cpuset.c         | 186 ++++++++++++++++++++++++++++++++++++++---
 kernel/sched/core.c            |  10 +--
 kernel/sched/deadline.c        |  47 ++++++++++-
 kernel/sched/sched.h           |   3 -
 kernel/sched/topology.c        |  31 +++++--
 8 files changed, 272 insertions(+), 25 deletions(-)

-- 
2.7.4
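
For readers who want the idea in concrete form, here is a minimal
user-space sketch of the rebuild pass and of the "spans more than one root
domain" check described in this cover letter.  It is not code from the
series and not kernel code: every model_* name, the 64-bit CPU masks and
the fixed-point shift are assumptions made for illustration only.  A
task's "cpu" field stands in for the runqueue it sits on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BW_SHIFT 20   /* fixed-point scale, chosen for this sketch */
#define MODEL_NR_CPUS  64   /* one bit per CPU in a uint64_t mask */

/* Hypothetical stand-in for a root domain. */
struct model_root_domain {
	uint64_t span;      /* bitmask of the CPUs this domain covers */
	uint64_t total_bw;  /* sum of runtime/period of its DL tasks */
};

/* Hypothetical stand-in for a task tracked by a CPUset. */
struct model_task {
	int      is_dl;         /* non-zero for a deadline task */
	int      cpu;           /* CPU whose runqueue holds the task */
	uint64_t cpus_allowed;  /* affinity bitmask */
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Utilisation as a fixed-point ratio runtime/period. */
static uint64_t model_to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	return period_ns ? (runtime_ns << MODEL_BW_SHIFT) / period_ns : 0;
}

/* Return the root domain whose span contains cpu, or NULL if none does. */
static struct model_root_domain *
model_rd_of_cpu(struct model_root_domain *rds, size_t nr_rds, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (size_t i = 0; i < nr_rds; i++)
		if (rds[i].span & (UINT64_C(1) << cpu))
			return &rds[i];
	return NULL;
}

/*
 * Rebuild pass: clear the accounting of every root domain, then walk all
 * tasks (sleeping or not) and add each DL task's utilisation to the root
 * domain reached through the task's CPU.
 */
static void model_rebuild_dl_accounting(struct model_root_domain *rds,
					size_t nr_rds,
					const struct model_task *tasks,
					size_t nr_tasks)
{
	for (size_t i = 0; i < nr_rds; i++)
		rds[i].total_bw = 0;

	for (size_t i = 0; i < nr_tasks; i++) {
		struct model_root_domain *rd;

		if (!tasks[i].is_dl)
			continue;
		rd = model_rd_of_cpu(rds, nr_rds, tasks[i].cpu);
		if (rd)
			rd->total_bw += model_to_ratio(tasks[i].period_ns,
						       tasks[i].runtime_ns);
	}
}

/*
 * Number of root domains a task's affinity mask intersects.  A result
 * greater than one is the condition raised in the open issue.
 */
static size_t model_rd_spanned(const struct model_root_domain *rds,
			       size_t nr_rds, const struct model_task *t)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_rds; i++)
		if (rds[i].span & t->cpus_allowed)
			n++;
	return n;
}
```

In this model the rebuild treats running and sleeping tasks identically,
which is the property argued for above, while model_rd_spanned() only
detects the multi-domain case; it does not say what to do about it.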