From: Mathieu Poirier
Date: Fri, 25 Aug 2017 13:53:39 -0600
Subject: Re: [PATCH 0/7] sched/deadline: fix cpusets bandwidth accounting
To: Luca Abeni
Cc: Ingo Molnar, Peter Zijlstra, tj@kernel.org, vbabka@suse.cz,
    Li Zefan, akpm@linux-foundation.org, weiyongjun1@huawei.com,
    Juri Lelli, Steven Rostedt, Claudio Scordino,
    Daniel Bristot de Oliveira, linux-kernel@vger.kernel.org,
    Tommaso Cucinotta

On 25 August 2017 at 03:52, Luca Abeni wrote:
> On Fri, 25 Aug 2017 08:02:43 +0200
> luca abeni wrote:
> [...]
>> > The above demonstrates that even if we have two CPUsets, new tasks
>> > belong to the "default" CPUset and as such can use all the
>> > available CPUs.
>>
>> I still have a doubt (probably showing all my ignorance about
>> CPUsets :)... In this situation we have 3 CPUsets: "default", set1,
>> and set2... Is every one of these CPUsets associated with a root
>> domain (so we have 3 root domains)? Or are only set1 and set2
>> associated with a root domain?
>
> Ok, after reading (and hopefully understanding better :) the code, I
> think this question was kind of silly... There are only 2 root
> domains, corresponding to set1 and set2 (right?).

Correct - although there is a default CPUset, there isn't a default
root domain.

>
> [...]
>
>> > So above we'd run the acceptance test on root domain A and B
>> > before promoting the task. Of course we'd also have to add the
>> > utilisation of that task to both root domains. Although simple, it
>> > goes to the core of the DL scheduler and touches pretty much every
>> > aspect of it, something I'm reluctant to embark on.
>>
>> I see... So the "default" CPUset does not have any root domain
>> associated with it? If it did, we could just subtract the maximum
>> utilizations of set1 and set2 from it when creating the root domains
>> of set1 and set2.
> ...
> So, this idea of mine made no sense.
>
> I think the correct solution is what you implemented in your patchset
> (if I understand it correctly).
>
> If we want to have tasks spanning multiple root domains, many more
> changes in the code are needed... I am wondering if it would make
> more sense to track utilizations per runqueue (instead of per root
> domain):
> - when a task tries to become SCHED_DEADLINE, we count how many CPUs
>   are in its affinity mask. Let's call this number "n"
> - then we add u / n (where "u" is the task's utilization) to the
>   utilization of every runqueue in its affinity mask, and we check
>   that all the resulting sums stay below the schedulability bound
>
> For tasks spanning a single root domain, this should be equivalent to
> the current admission test. Moreover, this check should ensure that
> no root domain can ever be overloaded (even if tasks span multiple
> domains).

This is an idea worth exploring.
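To make the proposal concrete, here is a minimal userspace model of the
two-pass test described above. The fixed-point bandwidth representation
(BW_SHIFT/BW_UNIT) mirrors how SCHED_DEADLINE stores runtime/period
ratios, but everything else - dl_admit(), struct fake_rq, the flat
boolean mask - is a hypothetical name invented for illustration, not
the kernel's actual API:

/*
 * Minimal model of the proposed per-runqueue admission test.
 * Hypothetical names throughout; only the fixed-point bandwidth
 * representation (BW_SHIFT) follows what SCHED_DEADLINE uses.
 */
#include <stdbool.h>
#include <stdio.h>

#define BW_SHIFT	20
#define BW_UNIT		(1ULL << BW_SHIFT)	/* 100% of one CPU */
#define NR_CPUS		4

struct fake_rq {
	unsigned long long dl_total_bw;		/* admitted u/n shares */
};

static struct fake_rq rq[NR_CPUS];

/*
 * Split the task's utilization 'u' evenly across the 'n' CPUs of its
 * affinity mask; admit only if every runqueue in the mask stays at or
 * below the schedulability bound (here simply 100% of a CPU).
 */
static bool dl_admit(unsigned long long u, const bool mask[NR_CPUS])
{
	unsigned long long share;
	int i, n = 0;

	for (i = 0; i < NR_CPUS; i++)
		n += mask[i];
	if (!n)
		return false;

	share = u / n;

	/* Pass 1: check every runqueue in the mask. */
	for (i = 0; i < NR_CPUS; i++)
		if (mask[i] && rq[i].dl_total_bw + share > BW_UNIT)
			return false;

	/* Pass 2: commit the per-runqueue shares. */
	for (i = 0; i < NR_CPUS; i++)
		if (mask[i])
			rq[i].dl_total_bw += share;

	return true;
}

int main(void)
{
	/* 50% task pinned to CPU0+CPU1: each runqueue absorbs 25%. */
	bool mask[NR_CPUS] = { true, true, false, false };

	printf("admitted: %s\n",
	       dl_admit(BW_UNIT / 2, mask) ? "yes" : "no");
	return 0;
}

Note that the commit pass mutates every runqueue in the mask, so in the
kernel the check and the update would have to be atomic with respect to
concurrent admissions - which is exactly the locking question you raise
below.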
> But I do not know the locking implications for this idea... I
> suspect it will not scale :(

Right, scaling could be a problem - we'd have to prototype it and see
how bad things get. We _may_ be able to figure something out with RCU
trickery.

As I mentioned in a previous email, I toyed with the idea of extending
the DL code to support more than one root domain. Maybe it is time to
go back to it, finish the admission test, and publish just that
part... At least we would have code to comment on.

Regardless of the avenue we choose, I think we could use my current
solution as a stepping stone while we figure out what we really want
to do. At least it would be an improvement on the current situation.

>
> Luca