Subject: Re: [PATCH] sched: cgroup SCHED_IDLE support
To: Josh Don
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
 Paul Turner, David Rientjes, Oleg Rombakh, Viresh Kumar,
 Steve Sistare, Tejun Heo, linux-kernel
References: <20210608231132.32012-1-joshdon@google.com> <7222c20a-5cbb-d443-a2fd-19067652a38e@arm.com>
From: Dietmar Eggemann
Date: Tue, 15 Jun 2021 12:06:57 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/06/2021 01:34,
Josh Don wrote:
> On Fri, Jun 11, 2021 at 9:43 AM Dietmar Eggemann wrote:
>>
>> On 10/06/2021 21:14, Josh Don wrote:
>>> Hey Dietmar,
>>>
>>> On Thu, Jun 10, 2021 at 5:53 AM Dietmar Eggemann wrote:
>>>>
>>>> Any reason why this should only work on cgroup-v2?
>>>
>>> My (perhaps incorrect) assumption that new development should not
>>> extend v1. I'd actually prefer making this work on v1 as well; I'll
>>> add that support.
>>>
>>>> struct cftype cpu_legacy_files[] vs. cpu_files[]
>>>>
>>>> [...]
>>>>
>>>>> @@ -11340,10 +11408,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
>>>>>
>>>>>  static DEFINE_MUTEX(shares_mutex);
>>>>>
>>>>> -int sched_group_set_shares(struct task_group *tg, unsigned long shares)
>>>>> +#define IDLE_WEIGHT sched_prio_to_weight[ARRAY_SIZE(sched_prio_to_weight) - 1]
>>>>
>>>> Why not 3? Like for tasks (WEIGHT_IDLEPRIO)?
>>>>
>>>> [...]
>>>
>>> Went back and forth on this; on second look, I do think it makes sense
>>> to use the IDLEPRIO weight of 3 here. This gets converted to a 0,
>>> rather than a 1, for display of cpu.weight, which is also actually a
>>> nice property.
>>
>> I'm struggling to see the benefit here.
>>
>> For a taskgroup A: Why setting A/cpu.idle=1 to force a minimum A->shares
>> when you can set it directly via A/cpu.weight (to 1 (minimum))?
>>
>>   WEIGHT  cpu.weight  tg->shares
>>
>>        3           0        3072
>>
>>       15           1       15360
>>
>>       10           1       10240
>>
>> `A/cpu.weight` follows cgroup-v2's `weights` `resource distribution
>> model`* but I can only see `A/cpu.idle` as a layer on top of it forcing
>> `A/cpu.weight` to get its minimum value?
>>
>> *Documentation/admin-guide/cgroup-v2.rst
>
> Setting cpu.idle carries additional properties in addition to just the
> weight. Currently, it primarily includes (a) special wakeup preemption
> handling, and (b) contribution to idle_h_nr_running for the purpose of
> marking a cpu as a sched_idle_cpu(). Essentially, the current
> SCHED_IDLE mechanics.
> I've also discussed with Peter a potential
> extension to SCHED_IDLE to manipulate vruntime.

Right, I forgot about (b). But IMHO, (a) could be handled with this
special tg->shares value for SCHED_IDLE.

If there were a way to open up `cpu.weight`, `cpu.weight.nice` (and
`cpu.shares` for v1) to take a special value for SCHED_IDLE, then you
wouldn't need cpu.idle. And you could handle the functionality from
sched_group_set_idle() directly in sched_group_set_shares(). In this
case sched_group_set_shares() wouldn't have to be rejected on an idle
tg. A tg would just become !idle by writing a different cpu.weight
value. Currently, if you !idle a tg, it gets the default NICE_0_LOAD.

I guess cpu.weight [1, 10000] would be easy: 0 could be taken for that
and mapped into weight = WEIGHT_IDLEPRIO (3, 3072) to call
sched_group_set_shares(..., scale_load(weight)). cpu.weight = 1 maps
to (10, 10240).

cpu.weight.nice [-20, 19] would already be more complicated; 20? And
for cpu.shares [2, 2 << 18], 0 could be used. The issue here is that
WEIGHT_IDLEPRIO (3, 3072) is already a valid value for shares.

> We set the cgroup weight here, since by definition SCHED_IDLE entities
> have the least scheduling weight. From the perspective of your
> question, the analogous statement for tasks would be that we set task
> weight to the min when doing setsched(SCHED_IDLE), even though we
> already have a renice mechanism.

I agree. `cpu.idle = 1` is like setting the task policy to SCHED_IDLE.
And there is even `cpu.weight.nice` to support the `task - tg` analogy
on nice values.

I'm just wondering if integrating this into `cpu.weight` and friends
would be better, to make the code behind this easier to grasp.