Date: Sun, 16 Oct 2016 21:34:20 +0200
From: Luca Abeni
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Tommaso Cucinotta, Juri Lelli,
    Thomas Gleixner, Andrea Parri, giuseppe lipari, Claudio Scordino
Subject: Re: About group scheduling for SCHED_DEADLINE
Message-ID: <20161016213420.64b772da@utopia>
In-Reply-To: <20161010101558.GL3568@worktop.programming.kicks-ass.net>
References: <20161009213938.3cec05ea@utopia>
    <20161010101558.GL3568@worktop.programming.kicks-ass.net>

Hi Peter,

first of all, sorry for the delay in my answer.

On Mon, 10 Oct 2016 12:15:58 +0200
Peter Zijlstra wrote:

> On Sun, Oct 09, 2016 at 09:39:38PM +0200, Luca Abeni wrote:
>
> > So, I started to think about this, and here are some ideas to start
> > a discussion:
> > 1) First of all, we need to decide the software interface. If I
> >    understand correctly (please correct me if I am wrong), cgroups
> >    let you specify a runtime and a period, and this means that the
> >    cgroup is reserved the specified runtime every period on all the
> >    cgroup's CPUs... In other words, it is not possible to reserve
> >    different runtimes/periods on different CPUs. Is this correct?
>
> That is the current state for RR/FIFO, but given that that is a
> complete trainwreck, I think we can deprecate that and change the
> interface.

Ok

> My primary concern is getting something that actually works and makes
> theoretical sense, and then we can worry about the interface.

Well, my question was mainly about the kind of functionalities we want
(there are various hierarchical scheduling models that make theoretical
sense and can work in practice...)
I guess the main decision is about having identical CBS parameters for
every (v)cpu of the cgroup, or allowing different parameters in the
various vcpus.

> > Is this what we
> > want for hierarchical SCHED_DEADLINE? Or do we want to allow the
> > possibility to schedule a cgroup with multiple "deadline servers"
> > having different runtime/period parameters? (the first solution
> > is easier to implement, the second one offers more degrees of
> > freedom that might be used to improve the real-time schedulability)
>
> Right, I'm not sure what makes most sense, nor am I entirely sure on
> what you mean with multiple deadline servers, is that different
> variables per CPU?

Sorry, my description was not clear. I assumed we can model a cgroup
with multiple (v)cpus as multiple "deadline servers" (one dl scheduling
entity per (v)cpu, serving an RT runqueue). And the question was about
the parameters (runtime and period/deadline) of these scheduling
entities (basically, the same question as above: should all these
entities in the cgroup have the same parameters?)

[...]
> > 4) From a more theoretical point of view, it would be good to define
> >    the scheduling model that needs to be implemented (based on
> >    something previously described on some paper, or defining a new
> >    model from scratch).
> >
> > Well, I hope this can be a good starting point for a discussion :)
>
> Right, so the problem we have is unspecified SCHED_FIFO on SMP and
> historical behaviour.
>
> As you know we've extended FIFO to SMP by G-FIFO (run the m highest
> prio tasks on m CPUs). But along with that, we allow arbitrary
> affinity masks for RR/FIFO tasks.

Ok; I think I've seen a paper about multi-processor scheduling of fixed
priority tasks with arbitrary affinities, but I do not remember the
details... You mean that we should perform a similar analysis for
(g-)EDF too, right?

> Now, the proposed model has identical CBS parameters for every (v)cpu
> of the cgroup. This means that a cgroup must be overprovisioned in the
> general case where nr_tasks < nr_cpus (and worse, the parameters must
> match the max task).
>
> This leads to vast amounts of wasted resources.

Right... Simplicity (in the interface and in the implementation) comes
at a cost (it forces resource over-provisioning).

> The alternative is different but fixed parameters per cpu, but that is
> somewhat unwieldy in that it increases the configuration burden. But
> you can indeed minimize the wasted resources and deal with the
> affinity problem (AFAICT).

Yes, I think so... But I suspect this solution introduces a small
drawback: the fixed priority scheduler running below SCHED_DEADLINE (at
the lower level of the hierarchy) must know what the SCHED_DEADLINE
scheduler running above it is doing (so, the RR/FIFO scheduler must
make a distinction between the various (v)cpus on which it can schedule
tasks).

> However, I think there's a third alternative. I have memories of a
> paper from UNC (I'd have to dig through the site to see if I can
> still find it) where they argue that for a hierarchical (G-)FIFO you
> should use minimal concurrency, that is run the minimal number of
> (v)cpu servers.

Ok, I need to find and read that paper.

> This would mean we give a single CBS parameter and carve out the
> minimal number (of max CBS) (v)cpu that fit in that.
>
> I'm just not sure how the random affinity crap works out for that, if
> we have the (v)cpu servers migratable in the G-EDF and migrate to
> whatever is demanded by the task at runtime it might work, but who
> knows.. Analysis would be needed I think.

I think Giuseppe Lipari (added in cc) can help with this analysis.


			Luca

> Any other opinions / options?
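
The over-provisioning argument and the "minimal concurrency" idea quoted
in this thread can be illustrated with some simple bandwidth arithmetic.
What follows is only a rough sketch, not kernel code and not a
schedulability analysis. It assumes that each task is described just by
its utilization (runtime/period), that with identical CBS parameters
every (v)cpu server is sized for the largest task, and that "minimal
concurrency" means splitting one cgroup-wide bandwidth into as few
servers as possible, each capped at a maximum bandwidth. The function
names and the `max_bw` parameter are hypothetical, chosen for this
example only.

```python
from fractions import Fraction


def identical_servers(task_utils, nr_cpus):
    """Bandwidth reserved when every (v)cpu gets the same CBS parameters.

    Assumption: each per-CPU server is sized to fit the largest task.
    Returns (total reserved bandwidth, bandwidth not used by the tasks).
    """
    per_cpu = max(task_utils)
    reserved = per_cpu * nr_cpus
    return reserved, reserved - sum(task_utils)


def minimal_concurrency(total_bw, max_bw=Fraction(1)):
    """Split one cgroup-wide bandwidth into the fewest possible servers.

    Assumption: a single server cannot exceed max_bw, so we carve out as
    many full-size servers as fit, plus one server for the remainder.
    """
    full, rest = divmod(total_bw, max_bw)
    servers = [max_bw] * int(full)
    if rest:
        servers.append(rest)
    return servers


if __name__ == "__main__":
    # Two tasks (utilizations 1/2 and 1/10) in a cgroup spanning 4 CPUs.
    tasks = [Fraction(1, 2), Fraction(1, 10)]
    reserved, wasted = identical_servers(tasks, nr_cpus=4)
    print("identical parameters: reserved", reserved, "wasted", wasted)
    print("minimal concurrency:", minimal_concurrency(sum(tasks)))
```

With nr_tasks < nr_cpus the first function reserves far more than the
tasks can consume, which is the waste discussed above. The second one
only shows how the servers would be carved out; whether the resulting
servers actually guarantee the tasks' timing (especially with arbitrary
affinity masks) is exactly the analysis the thread says is still needed.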