Date: Mon, 3 Jun 2019 13:27:25 +0100
From: Patrick Bellasi
To: Tejun Heo
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-api@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
    "Rafael J. Wysocki", Vincent Guittot, Viresh Kumar, Paul Turner,
    Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
    Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
Subject: Re: [PATCH v9 12/16] sched/core: uclamp: Extend CPU's cgroup controller
Message-ID: <20190603122725.GB19426@darkstar>
References: <20190515094459.10317-1-patrick.bellasi@arm.com>
    <20190515094459.10317-13-patrick.bellasi@arm.com>
    <20190531153545.GE374014@devbig004.ftw2.facebook.com>
In-Reply-To: <20190531153545.GE374014@devbig004.ftw2.facebook.com>

On 31-May 08:35, Tejun Heo wrote:

[...]

> > These attributes:
> >
> > a) are available only for non-root nodes, both on default and legacy
> >    hierarchies, while system wide clamps are defined by a generic
> >    interface which does not depend on cgroups. This system wide
> >    interface enforces constraints on tasks in the root node.
>
> I'd much prefer if they weren't entangled this way.  The system wide
> limits should work the same regardless of cgroup's existence.  cgroup
> can put further restriction on top but mere creation of cgroups with
> cpu controller enabled shouldn't take them out of the system-wide
> limits.

That's correct, and what you describe matches, at least in its intent,
the current implementation provided in:

   [PATCH v9 14/16] sched/core: uclamp: Propagate system defaults to root group
   https://lore.kernel.org/lkml/20190515094459.10317-15-patrick.bellasi@arm.com/

System clamps always work the same way, independently of cgroups:
they define the upper bound for both min and max clamps.

When cgroups are not available, task-specific clamps are always
capped by system clamps.
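
As a minimal sketch of the no-cgroup case just described (a user-space
model, not kernel code): each task-specific clamp is capped by the
matching system clamp. The function name and the sample values are
illustrative assumptions.

```python
# Illustrative model only: the name cap_by_system() and the values used
# below are assumptions, not part of the patch series.

def cap_by_system(task_min, task_max, sys_min, sys_max):
    """Cap a task's requested (min, max) clamps by the system-wide clamps."""
    return min(task_min, sys_min), min(task_max, sys_max)


# A task asking for more than the system allows is capped on both clamps.
print(cap_by_system(512, 1024, 400, 600))
# A task asking for less than the system allows keeps its own values.
print(cap_by_system(100, 200, 400, 600))
```

The system clamps act purely as upper bounds here: a request below the
bound passes through unchanged.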
When cgroups are available, the root task group clamps are capped by
the system clamps. This determines the root group's "effective" clamps,
which are then propagated down the hierarchy to each child's
"effective" clamps. That's done in:

   [PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
   https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@arm.com/


Example 1
---------

Here is an example of system and group clamps aggregation:

                         min                 max
system defaults          400                 600

cg_name                  min  min.effective  max  max.effective
 /uclamp                1024            400  500            500
 /uclamp/app             512            400  512            500
 /uclamp/app/pid_smalls  100            100  200            200
 /uclamp/app/pid_bigs    500            400  700            500

The ".effective" clamps are used to define the actual clamp value to
apply to tasks, according to the aggregation rules defined in:

   [PATCH v9 15/16] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
   https://lore.kernel.org/lkml/20190515094459.10317-16-patrick.bellasi@arm.com/

To me, all of the above means that:
 - cgroups are always capped by system clamps
 - cgroups can further restrict system clamps

Does that match your view?

> > b) enforce effective constraints at each level of the hierarchy which
> >    are a restriction of the group requests considering its parent's
> >    effective constraints. Root group effective constraints are defined
> >    by the system wide interface.
> >    This mechanism allows each (non-root) level of the hierarchy to:
> >    - request whatever clamp values it would like to get
> >    - effectively get only up to the maximum amount allowed by its parent
>
> I'll come back to this later.
>
> > c) have higher priority than task-specific clamps, defined via
> >    sched_setattr(), thus allowing to control and restrict task requests
>
> This sounds good.
>
> > Add two new attributes to the cpu controller to collect "requested"
> > clamp values. Allow that at each non-root level of the hierarchy.
> > Validate local consistency by enforcing util.min < util.max.
> > Keep it simple by not caring now about "effective" values computation
> > and propagation along the hierarchy.
>
> So, the followings are what we're doing for hierarchical protection
> and limit propagations.
>
> * Limits (high / max) default to max.  Protections (low / min) 0.  A
>   new cgroup by default doesn't constrain itself further and doesn't
>   have any protection.

Example 2
---------

Let's say we have:

  /tg1:
        util_min=200 (as a protection)
        util_max=800 (as a limit)

The moment we create a subgroup /tg1/tg11, in v9 it is initialized
with the same limits _and protections_ as its parent:

  /tg1/tg11:
        util_min=200 (protection inherited from /tg1)
        util_max=800 (limit inherited from /tg1)

Do you mean that we should have instead:

  /tg1/tg11:
        util_min=0   (no protection by default at creation time)
        util_max=800 (limit inherited from /tg1)

i.e. we need to reset the protection of a newly created subgroup?

> * A limit defines the upper ceiling for the subtree.  If an ancestor
>   has a limit of X, none of its descendants can have more than X.

That's correct; however, we distinguish between "requested" and
"effective" values.

Example 3
---------

We can have:

cg_name                  max  max.effective
 /uclamp/app             400            400
 /uclamp/app/pid_bigs    500            400

This means that a subgroup can "request" a limit (max=500) higher
than its parent's (max=400), while still getting only up to what its
parent allows (max.effective=400).

Example 4
---------

Tracking the actual requested limit (max=500) is useful to enforce
it once the parent's limit is relaxed. For example, we will have:

cg_name                  max  max.effective
 /uclamp/app             600            600
 /uclamp/app/pid_bigs    500            500

where a subgroup gets no more than what it has been configured for.

This is the logic implemented by cpu_util_update_eff() in:

   [PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
   https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@arm.com/

> * A protection defines the upper ceiling of protections for the
>   subtree.
>   If an ancestor has a protection of X, none of its
>   descendants can have more protection than X.

Right, that's the current behavior in v9.

> Note that there's no way for an ancestor to enforce protection its
> descendants.  It can only allow them to claim some.  This is
> intentional as the other end of the spectrum is either descendants
> losing the ability to further distribute protections as they see fit.

OK, that means that in v10 I need to update the initialization of
subgroups' min clamps to be none by default, as discussed in
Example 2 above, right?

[...]

Cheers,
Patrick

-- 
#include

Patrick Bellasi
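
For reference, the "requested" vs "effective" values in Examples 1, 3
and 4 above can be reproduced with a small user-space model. This is
only a sketch: the rule "effective = min(requested, parent's
effective)" is inferred from the tables and is not a copy of
cpu_util_update_eff(); the function name, the dictionary layout, and
the (1024, 1024) "unrestricted" system clamps used for Examples 3 and 4
are assumptions.

```python
# User-space model of "requested" vs "effective" clamps (assumed rule:
# effective = min(requested, parent's effective), inferred from the
# example tables; this is not the kernel implementation).

def effective_clamps(system, requested):
    """system: (min, max) system clamps.
    requested: {cgroup path: (requested min, requested max)}.
    Returns {cgroup path: (min.effective, max.effective)}."""
    effective = {}
    # Visit parents before children (shallower paths first).
    for path in sorted(requested, key=lambda p: p.count("/")):
        parent = path.rsplit("/", 1)[0]
        # Groups whose parent is not listed fall back to the system clamps.
        parent_eff = effective.get(parent, system)
        req_min, req_max = requested[path]
        effective[path] = (min(req_min, parent_eff[0]),
                           min(req_max, parent_eff[1]))
    return effective


# Example 1: system defaults min=400, max=600.
example1 = effective_clamps((400, 600), {
    "/uclamp": (1024, 500),
    "/uclamp/app": (512, 512),
    "/uclamp/app/pid_smalls": (100, 200),
    "/uclamp/app/pid_bigs": (500, 700),
})
print(example1["/uclamp/app/pid_bigs"])   # (400, 500)

# Examples 3 and 4: only max is of interest; min requests are set to 0
# and the system clamps are assumed to be unrestricted.
example3 = effective_clamps((1024, 1024), {
    "/uclamp/app": (0, 400),
    "/uclamp/app/pid_bigs": (0, 500),
})
example4 = effective_clamps((1024, 1024), {
    "/uclamp/app": (0, 600),
    "/uclamp/app/pid_bigs": (0, 500),
})
print(example3["/uclamp/app/pid_bigs"][1])  # 400
print(example4["/uclamp/app/pid_bigs"][1])  # 500
```

Keeping the requested values separate from the effective ones is what
lets Example 4 fall out of the same rule: once the parent's limit is
relaxed to 600, the child's stored request of 500 becomes effective
without the child being reconfigured.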