Message-ID: <908bb624-1778-4f57-a89b-503a4076cb2e@arm.com>
Date: Tue, 24 Oct 2023 16:10:01 +0200
Subject: Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature (v2)
To: Mathieu Desnoyers,
    Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Valentin Schneider,
    Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    Vincent Guittot, Juri Lelli, Swapnil Sapkal, Aaron Lu, Chen Yu,
    Tim Chen, K Prateek Nayak, "Gautham R . Shenoy", x86@kernel.org
References: <20231019160523.1582101-1-mathieu.desnoyers@efficios.com>
    <20231019160523.1582101-2-mathieu.desnoyers@efficios.com>
From: Dietmar Eggemann
X-Mailing-List: linux-kernel@vger.kernel.org

On 23/10/2023 17:04, Mathieu Desnoyers wrote:
> On 2023-10-23 10:11, Dietmar Eggemann wrote:
>> On 19/10/2023 18:05, Mathieu Desnoyers wrote:
>
> [...]
>
>>> +static unsigned long scale_rt_capacity(int cpu);
>>> +
>>> +/*
>>> + * Returns true if adding the task utilization to the estimated
>>> + * utilization of the runnable tasks on @cpu does not exceed the
>>> + * capacity of @cpu.
>>> + *
>>> + * This considers only the utilization of _runnable_ tasks on the @cpu
>>> + * runqueue, excluding blocked and sleeping tasks. This is achieved by
>>> + * using the runqueue util_est.enqueued.
>>> + */
>>> +static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
>>> +                            int cpu)
>>
>> This is almost like the existing task_fits_cpu(p, cpu) (used in Capacity
>> Aware Scheduling (CAS) for Asymmetric CPU capacity systems) except the
>> latter only uses `util = task_util_est(p)` and deals with uclamp as well
>> and only tests whether p could fit on the CPU.
>
> This is indeed a major difference between how asym capacity check works
> and what is introduced here:
>
> asym capacity check only checks whether the given task theoretically
> fits in the cpu if that cpu was completely idle, without considering the
> current cpu utilization.

Yeah, asymmetric CPU capacity systems have to make sure that p fits on
the idle/sched_idle CPU, hence the use of sync_entity_load_avg() and
asym_fits_cpu().

> My approach is to consider the current util_est of the cpu to check
> whether the task fits in the remaining capacity.

True.

> I did not want to use the existing task_fits_cpu() helper because the
> notions of uclamp bounds appear to be heavily tied to the fact that it
> checks whether the task fits in an _idle_ runqueue, whereas the check I
> am introducing here is much more restrictive: it checks that the task
> fits on the runqueue within the remaining capacity.

I see. Essentially what you do is

  util_fits_cpu(util_est(CPU + p), 0, 1024, CPU) in !uclamp_is_used()

The uclamp_is_used() case is task-centric though. (*)

>> Or like find_energy_efficient_cpu() (feec(), used in
>> Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to
>> get:
>>
>>    max(util_avg(CPU + p), util_est(CPU + p))
>
> I've tried using cpu_util(), but unfortunately anything that considers
> blocked/sleeping tasks in its utilization total does not work for my
> use-case.
>
> From cpu_util():
>
>  * CPU utilization is the sum of running time of runnable tasks plus the
>  * recent utilization of currently non-runnable tasks on that CPU.

OK, I see. Occasions in which `util_avg(CPU + p) > util_est(CPU + p)`
would ruin it for your use-case.

>> feec()
>>      ...
>>      for (; pd; pd = pd->next)
>>          ...
>>          util = cpu_util(cpu, p, cpu, 0);
>>          ...
>>          fits = util_fits_cpu(util, util_min, util_max, cpu)
>>                                     ^^^^^^^^^^^^^^^^^^
>>                                    not used when uclamp is not active (1)
>>              ...
>>              capacity = capacity_of(cpu)
>>              fits = fits_capacity(util, capacity)
>>              if (!uclamp_is_used()) (1)
>>                  return fits
>>
>> So not introducing new functions like task_fits_remaining_cpu_capacity()
>> in this area and using existing one would be good.
>
> If the notion of uclamp is not tied to the way asym capacity check is
> done against a theoretically idle runqueue, I'd be OK with using this,
> but so far both appear to be very much tied.

Yeah, uclamp_is_used() scenarios are more complicated (see *).

> When I stumbled on this fundamental difference between asym cpu capacity
> check and the check introduced here, I've started wondering whether the
> asym cpu capacity check would benefit from considering the target cpu
> current utilization as well.

We just adapted select_idle_sibling() for asymmetric CPU capacity
systems by adding the asym_fits_cpu() to the idle/sched_idle check. For
me so far sis() is all about finding an idle CPU and not task packing.

>>> +{
>>> +    unsigned long total_util;
>>> +
>>> +    if (!sched_util_fits_capacity_active())
>>> +        return false;
>>> +    total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
>>> +    return fits_capacity(total_util, scale_rt_capacity(cpu));
>>
>> Why not use:
>>
>> static unsigned long capacity_of(int cpu)
>>      return cpu_rq(cpu)->cpu_capacity;
>>
>> which is maintained in update_cpu_capacity() as scale_rt_capacity(cpu)?
>
> The reason for preferring scale_rt_capacity(cpu) over capacity_of(cpu)
> is that update_cpu_capacity() only runs periodically every
> balance-interval, therefore providing a coarse-grained remaining
> capacity approximation with respect to irq, rt, dl, and thermal
> utilization.
>
> If it turns out that being coarse-grained is good enough, we may be able
> to save some cycles by using capacity_of(), but not without carefully
> considering the impacts of being imprecise.

OK, I see. We normally consider capacity_of(cpu) as accurate enough.

[...]

>>> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
>>> index ee7f23c76bd3..9a84a1401123 100644
>>> --- a/kernel/sched/features.h
>>> +++ b/kernel/sched/features.h
>>> @@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
>>>   SCHED_FEAT(UTIL_EST, true)
>>>   SCHED_FEAT(UTIL_EST_FASTUP, true)
>>
>> IMHO, asymmetric CPU capacity systems would have to disable the sched
>> feature UTIL_FITS_CAPACITY. Otherwise CAS could deliver different
>> results. task_fits_remaining_cpu_capacity() and asym_fits_cpu() work
>> slightly different.
>
> I don't think they should be mutually exclusive. We should look into the
> differences between those two more closely to make them work nicely
> together instead. For instance, why does asym capacity only consider
> whether tasks fit in a theoretically idle runqueue, when it could use
> the current utilization of the runqueue to check that the task fits in
> the remaining capacity ?

We have EAS (feec()) for this on asymmetric CPU capacity systems (as our
per-performance_domain packing strategy), which only works when
!overutilized. When overutilized, we just need asym_fits_cpu()
(select_idle_capacity() -> util_fits_cpu()) to select a fitting
idle/sched_idle CPU in CAS which includes the uclamp handling.

[...]
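
For readers following the thread, here is a minimal user-space sketch of
the three "does it fit" variants discussed above. It is not kernel code
and not the patch itself: struct cpu_model and the three helper names are
hypothetical, uclamp is ignored, and the cpu_util()-style helper simply
adds the task to both signals before taking the maximum. Only the 20%
headroom rule is taken from fits_capacity() in kernel/sched/fair.c.

```c
#include <stdbool.h>

/* Same ~20% headroom rule as the kernel's fits_capacity() macro. */
static inline bool fits_capacity(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/* Hypothetical stand-in for the per-CPU state the thread refers to. */
struct cpu_model {
	unsigned long capacity;          /* models capacity_of(cpu) */
	unsigned long util_est_enqueued; /* runnable tasks only */
	unsigned long util_avg;          /* also carries blocked/sleeping util */
};

/* Idle-CPU view: does the task fit if the CPU had nothing else to run? */
static inline bool fits_idle_cpu(unsigned long task_util,
				 const struct cpu_model *c)
{
	return fits_capacity(task_util, c->capacity);
}

/* Remaining-capacity view: runnable util_est plus the task. */
static inline bool fits_remaining_capacity(unsigned long task_util,
					   const struct cpu_model *c)
{
	return fits_capacity(c->util_est_enqueued + task_util, c->capacity);
}

/* cpu_util()-like view: max(util_avg(CPU + p), util_est(CPU + p)). */
static inline bool fits_with_blocked_util(unsigned long task_util,
					  const struct cpu_model *c)
{
	unsigned long avg = c->util_avg + task_util;
	unsigned long est = c->util_est_enqueued + task_util;

	return fits_capacity(avg > est ? avg : est, c->capacity);
}
```

With capacity 1024, util_est.enqueued 500 and util_avg 800, a task of
utilization 200 fits under the first two checks but not under the third,
which is the `util_avg(CPU + p) > util_est(CPU + p)` situation mentioned
above. A task of utilization 400 fits an idle CPU but not the remaining
capacity.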