Date: Fri, 25 Nov 2016 17:04:56 +0100
From: Peter Zijlstra
To: "Michael Kerrisk (man-pages)"
Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner
Subject: Re: RFC: documentation of the autogroup feature [v2]
Message-ID: <20161125160456.GP3092@twins.programming.kicks-ass.net>

On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote:
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME                                                │
> >> ├─────────────────────────────────────────────────────┤
> >> │How do the nice value of a process and the nice      │
> >> │value of an autogroup interact? Which has priority?  │
> >> │                                                     │
> >> │It *appears* that the autogroup nice value is used   │
> >> │for CPU distribution between task groups, and that   │
> >> │the process nice value has no effect there. (I.e.,   │
> >> │suppose two autogroups each contain a CPU-bound      │
> >> │process, with one process having nice==0 and the     │
> >> │other having nice==19. It appears that they each     │
> >> │get 50% of the CPU.) It appears that the process     │
> >> │nice value has effect only with respect to schedul‐  │
> >> │ing relative to other processes in the *same* auto‐  │
> >> │group. Is this correct?                              │
> >> └─────────────────────────────────────────────────────┘
> >
> > Yup, entity nice level affects distribution among peer entities.
>
> Huh! I only just learned about this via my experiments while
> investigating autogroups.
>
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

Ever since cfs-cgroup; this is a fundamental design point of cgroups,
and has therefore always been the case for autogroups (as that is
nothing more than an application of the cgroup code).

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

It's really rather fundamental to how the whole hierarchical thing
works.

CFS is a weighted fair queueing scheduler; this means each entity
receives:

               w_i
  dt_i = dt --------
            \Sum w_j


            CPU
      ______/ \______
     /    |     |    \
    A     B     C     D


So if each entity {A,B,C,D} has equal weight, then they will receive
equal time. Explicitly, for C you get:

                      w_C
  dt_C = dt -----------------------
            (w_A + w_B + w_C + w_D)


Extending this to a hierarchy, we get:


            CPU
      ______/ \______
     /    |     |    \
    A     B     C     D
               / \
              E   F

Where C becomes a 'server' for entities {E,F}. The weight of C does not
depend on its child entities. This way the time of {E,F} becomes a
straight product of their ratio with C.
That is, the whole thing becomes, where l denotes the level in the
hierarchy and i an entity on that level:

                l      w_g,i
  dt_l,i = dt \Prod ----------
               g=0  \Sum w_g,j


Or more concretely, for E:

                     w_E
  dt_1,E = dt_0,C -----------
                  (w_E + w_F)

                        w_C               w_E
         = dt ----------------------- -----------
              (w_A + w_B + w_C + w_D) (w_E + w_F)


And this 'trivially' extends to SMP, with the tricky bit being that the
sums over all entities end up being machine-wide, instead of per CPU,
which is a real and royal pain for performance.

Note that this property, where the weight of the server entity is
independent of its child entities, is a desired feature. Without that
it would be impossible to control the relative weights of groups, and
that is the sole parameter of the WFQ model.

It is also why Linus so likes autogroups: each session competes equally
with the others.
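
The product formula can be checked numerically with a small sketch. This
is only an illustration of the arithmetic, not kernel code: the function
name share(), the dictionary layout, and the group and process names are
all made up for the example. The weights 1024 and 15 are assumed to be
the values the kernel's nice-to-weight table gives for nice 0 and
nice 19.

```python
from fractions import Fraction

def share(path, tree):
    """Fraction of total CPU time received by the entity at the end of `path`.

    `tree` maps a parent name to a dict {child name: weight}; the root is
    named "CPU". `path` lists entity names from the top level down, for
    example ["C", "E"]. (Hypothetical layout, chosen for illustration.)
    """
    parent = "CPU"
    result = Fraction(1)
    for name in path:
        siblings = tree[parent]
        # One factor of the product: w_g,i / \Sum w_g,j at this level.
        result *= Fraction(siblings[name], sum(siblings.values()))
        parent = name
    return result

# The hierarchy from the diagram, with all weights equal.
tree = {
    "CPU": {"A": 1024, "B": 1024, "C": 1024, "D": 1024},
    "C": {"E": 1024, "F": 1024},
}
share_C = share(["C"], tree)
share_E = share(["C", "E"], tree)

# Two autogroups of equal weight, each holding one CPU-bound process,
# one at nice 0 (assumed weight 1024) and one at nice 19 (assumed weight 15).
two_groups = {
    "CPU": {"G1": 1024, "G2": 1024},
    "G1": {"p_nice0": 1024},
    "G2": {"p_nice19": 15},
}
across_nice0 = share(["G1", "p_nice0"], two_groups)
across_nice19 = share(["G2", "p_nice19"], two_groups)

# The same two processes placed in a single group.
one_group = {
    "CPU": {"G": 1024},
    "G": {"p_nice0": 1024, "p_nice19": 15},
}
within_nice19 = share(["G", "p_nice19"], one_group)
```

With equal weights, C gets 1/4 of the CPU and E gets 1/8. In the
two-group case each process gets 1/2 regardless of its own nice value,
because it has no peers inside its group; in the one-group case the
nice 19 process gets 15/1039, since there the two processes are peers.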