Received: by 2002:a05:7412:f589:b0:e2:908c:2ebd with SMTP id eh9csp1019759rdb; Wed, 1 Nov 2023 09:07:15 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGiOy5FrVDa4MXeonqbn/LvpLnkuro9qXMFmi2OzwOZwknyblCa9B5k1Mfui4gmhxi6xZ9P X-Received: by 2002:a9d:6b8d:0:b0:6c6:50d0:1104 with SMTP id b13-20020a9d6b8d000000b006c650d01104mr17555878otq.27.1698854835092; Wed, 01 Nov 2023 09:07:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698854835; cv=none; d=google.com; s=arc-20160816; b=LhM1RpKUF6L65ceCTFN5Q+K23PWGCUXOSWJA//Pph4u+jsOVsO2f6RQFiHWfGw1zGb kufWc71PljTpUvxXYiRycsNwWOT5BWqjhsMnVaqv7VMtu/SRBXgBk1+1RCkhONH0aZDY xpIRErLtL+isux3om0mwV6Z6uea0HUo/68JsvZl+ET3wUxcKKWe18C59swdk6ggSNklW XLVR2tGRarbgt/8Goptf5NIXB7Y6hjrPv2XiUsnsOl3C5NnylUMx8CJTGW2uc2vPraj6 weyf1091MWJC1xGbfp4JQVzDYVHkMJ0ngGhix/ErUAhEDYDXXIR/jFG3cSOw/iTLRrLo 1fBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=Qi2fSsO+CeSmMhJo5MZRZvEsbLnAqEdR8Bw3jI8Te8M=; fh=AfLL/DKK2gQQEoZBzmnH4pb//TwGt2KTQ6T6ssunXrA=; b=y2tb5POO+Duh1EVOtnUfaATevnuoT3l4nEBe1gkF2TJD+ZeT+NqfsODt9iLA04MJyA 9NMgB1VBrFywGlQuhVK6QvBENKDpv1Z0F0su8v6uTC3pGya0ED6TYsRVJ4W8G8/P2bdr P78/WzCMC4nM+zH2rUVvCSDlZ57yfzwa2Zgj7UJtv8ksxF2sSiuKNojF4vUgwWpuXmDv vOsLC1V8roE612Ojs7UB5lo+kwJbFCuGHhdaZEEbj+9NvY1SeqLGXnBFRe4e6Bpv/8jt TPHjBZt/SyQs+MHX4Vj9ezP+X/QTui/B5AoAHOUzHENFx1R4h8YNHSgr9iXWaS+x3FZQ XuXA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id b20-20020a9d4794000000b006c21c0e444bsi592231otf.179.2023.11.01.09.06.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 09:07:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 96A588048C14; Wed, 1 Nov 2023 09:06:53 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232538AbjKAQGl (ORCPT + 99 others); Wed, 1 Nov 2023 12:06:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229498AbjKAQGk (ORCPT ); Wed, 1 Nov 2023 12:06:40 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5ED8DF7 for ; Wed, 1 Nov 2023 09:06:34 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3CEEF2F4; Wed, 1 Nov 2023 09:07:15 -0700 (PDT) Received: from [192.168.178.6] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id ECA4F3F64C; Wed, 1 Nov 2023 09:06:31 -0700 (PDT) Message-ID: Date: Wed, 1 Nov 2023 17:06:30 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH 2/6] sched/uclamp: Simulate PELT decay in util_avg_uclamp Content-Language: en-US To: Hongyan Xia , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Juri Lelli Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org References: From: Dietmar Eggemann In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Wed, 01 Nov 2023 09:06:53 -0700 (PDT) On 04/10/2023 11:04, Hongyan Xia wrote: > From: Hongyan Xia > > Because util_avg_uclamp is not directly managed by PELT, it lacks the > nice property of slowly decaying to a lower value, resulting in > performance degredation due to premature frequency drops. > > Add functions to decay root cfs utilization and tasks that are not on > the rq. This way, we get the benefits of PELT while still maintaining > uclamp. The rules are simple: > > 1. When task is se->on_rq, enforce its util_avg_uclamp within uclamp > range. > 2. When task is !se->on_rq, PELT decay its util_avg_uclamp. > 3. When the root CFS util drops, PELT decay to the target frequency > instead of immediately dropping to a lower target frequency. > > TODO: Can we somehow integrate this uclamp sum aggregation directly into > util_avg, so that we don't need to introduce a new util_avg_uclamp > signal and don't need to simulate PELT decay? That's a good question. I'm wondering why you were not able to integrate the maintenance of the util_avg_uclamp values inside the existing PELT update functionality in fair.c ((__update_load_avg_xxx(), propagate_entity_load_avg() -> update_tg_cfs_util() etc.) Why do you need extra functions like ___decay_util_avg_uclamp_towards() and ___update_util_avg_uclamp() for this? > Signed-off-by: Hongyan Xia > --- > kernel/sched/fair.c | 20 +++++++++ > kernel/sched/pelt.c | 103 ++++++++++++++++++++++++++++++++++++++++--- > kernel/sched/sched.h | 2 + > 3 files changed, 119 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 33e5a6e751c0..420af57d01ee 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4311,17 +4311,22 @@ static inline int > update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) > { > unsigned long removed_load = 0, removed_util = 0, removed_runnable = 0; > + unsigned int removed_root_util = 0; unsigned long removed_load = 0, removed_util = 0, removed_runnable = 0; -unsigned int removed_root_util = 0; +unsigned int __maybe_unused removed_root_util = 0; Otherwise you get `warning: unused variable ‘rq’` w/ !CONFIG_UCLAMP_TASK > struct sched_avg *sa = &cfs_rq->avg; > int decayed = 0; > > if (cfs_rq->removed.nr) { > unsigned long r; > + struct rq *rq = rq_of(cfs_rq); > u32 divider = get_pelt_divider(&cfs_rq->avg); > > raw_spin_lock(&cfs_rq->removed.lock); > swap(cfs_rq->removed.util_avg, removed_util); > swap(cfs_rq->removed.load_avg, removed_load); > swap(cfs_rq->removed.runnable_avg, removed_runnable); > +#ifdef CONFIG_UCLAMP_TASK > + swap(rq->root_cfs_util_uclamp_removed, removed_root_util); > +#endif > cfs_rq->removed.nr = 0; > raw_spin_unlock(&cfs_rq->removed.lock); > [...] > #ifdef CONFIG_UCLAMP_TASK > +static void ___decay_util_avg_uclamp_towards(u64 now, > + u64 last_update_time, > + u32 period_contrib, > + unsigned int *old, > + unsigned int new_val) > +{ > + unsigned int old_val = READ_ONCE(*old); > + u64 delta, periods; > + > + if (old_val <= new_val) { > + WRITE_ONCE(*old, new_val); > + return; > + } Why is the function called `decay`? In case `new >= old` you set old = new and bail out. So it's also more like an `update` function? > + if (!last_update_time) > + return; > + delta = now - last_update_time; > + if ((s64)delta < 0) > + return; > + delta >>= 10; > + if (!delta) > + return; > + > + delta += period_contrib; > + periods = delta / 1024; > + if (periods) { > + u64 diff = old_val - new_val; > + > + /* > + * Let's assume 3 tasks, A, B and C. A is still on rq but B and > + * C have just been dequeued. The cfs.avg.util_avg_uclamp has > + * become A but root_cfs_util_uclamp just starts to decay and is > + * now still A + B + C. > + * > + * After p periods with y being the decay factor, the new > + * root_cfs_util_uclamp should become > + * > + * A + B * y^p + C * y^p == A + (A + B + C - A) * y^p > + * == cfs.avg.util_avg_uclamp + > + * (root_cfs_util_uclamp_at_the_start - cfs.avg.util_avg_uclamp) * y^p > + * == cfs.avg.util_avg_uclamp + diff * y^p > + * > + * So, instead of summing up each individual decayed values, we > + * could just decay the diff and not bother with the summation > + * at all. This is why we decay the diff here. > + */ > + diff = decay_load(diff, periods); > + WRITE_ONCE(*old, new_val + diff); > + } > +} Looks like ___decay_util_avg_uclamp_towards() is used for: (1) tasks with !se->on_rq to decay before enqueue (2) rq->root_cfs_util_uclamp to align with &rq_of(cfs_rq)->cfs->avg.util_avg_uclamp All the cfs_rq's and the taskgroup se's seem to be updated only in ___update_util_avg_uclamp() (which also handles the propagation towards the root taskgroup). [...]