Received: by 2002:ab2:620c:0:b0:1ef:ffd0:ce49 with SMTP id o12csp1669166lqt; Wed, 20 Mar 2024 10:23:23 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCU5IT/A2EMsneMKg+BnC0e9hOPPO38YgK5u6G+2Vh7akUGJPVWskmDyVpPbRGapiHslWYSsZGQLWZK4tejP6RJUefq7IVokmrVTP8i6ng== X-Google-Smtp-Source: AGHT+IF4RwkImrDviqNRbFsJj+9JsjlUIoH+EWS/vlj5WvQYQ1xepYtetqH4kVBg22JA/YZKKjdB X-Received: by 2002:a05:6512:312f:b0:513:572f:88f1 with SMTP id p15-20020a056512312f00b00513572f88f1mr10654990lfd.27.1710955403576; Wed, 20 Mar 2024 10:23:23 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1710955403; cv=pass; d=google.com; s=arc-20160816; b=aeaHbog+tMXWUbyS9XboMGk/GfZt48bxlqx/vE8zasDnsZ3hnD+PJMwWOxyGc5JgDW 5h4xtcQSR7U8OkU1suczaahW6LLxB3fmbl4Urtp7pblWoEHqEpRHkvUgTPkCmw2+R+yq dWnWaYrvURr1kY65ejPFI79htvL4KYj/E3CPPvqYiI4yxCavq6ylQBKnyr6sNdv+DP3s gXDl/DSJXvQL0wil+pqMQi92kjjN0JJAlNR86HVEKluoibVyx4bBdobky4QTdF4o+4m8 PSjDQIARyn0a57zcWwSXYEycS1klXzD/XAgyiJfgpI+xJOUESgvpbowRMI9faQJdMMgl KY7A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id; bh=D8fU8etg1nRp9MzS4XjWFmGE6ca1Brf9TzioZ6cZ8rI=; fh=Q2HMes0FR4suIJbZ2jAh3+/rf5QZh22nARdftuKpDIQ=; b=yIxQxLY8BbT5/e+hRIVcv1BwK/iZ2bysvS/A5oo+EG6I7rQ49rPL8FQFf11GlyeTv8 6dleOchrAwDKkhSev6IVwsaVMX4OMap3GUeOnTnOumC4jQ5q9OPPohgH0VUrqi2FJfJK 2o5bLScCXJV9t1nEy3/EOiyObq291EpEC8T8z/5grEohZbrcafIWQsZgHZajUNJHBTtr 4b6qOWrZYcuWTmvwPHPUERBQdc2VUKzdGgvxp/C8+bXLLPO2ad2ZVLaQDQ4ASPrmQbM2 Un5F1xA0byq6r1MoO5CnfY7rIncrkr0pCKH/zVfobfS1nvcWMG3fFolUqM5NOxrnrXhm iCzA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-109233-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-109233-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id l17-20020a170906079100b00a468c0de70esi5071205ejc.901.2024.03.20.10.23.23 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 Mar 2024 10:23:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-109233-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-109233-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-109233-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 246261F265CB for ; Wed, 20 Mar 2024 17:23:23 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 89A146A8D2; Wed, 20 Mar 2024 17:22:44 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 606506A326 for ; Wed, 20 Mar 2024 17:22:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710955363; cv=none; b=asFw9uPrqs49auml3YrZnHnI1W+DGalqHa+HsEJnaGSH863OhDJR/fHWPJl+8DaNuRLLGMKVh0jefPqz9uWaiHz+08zd4lYmdkel+tnwl1FgaWW89mRbVaTXYMQPW3oMj5Cx1uhc39uk664TSQ2dO2G13w6H9T3M7gH/fQRICsc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710955363; c=relaxed/simple; bh=Dboex1JOPKImMgkbPtAMCrvsxXL5E9rjh9oZLtCGJsw=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=K7qtHWRJQi3z2j9gxAIzkdliZ+VsbUu57ITa5kEBmieFAvY6lvypdP83UZocGwVTJGvkdWZ2DA2DRZneimPLtI+Ftw9GSQ4ij2vbSg7S4HyVx4MlzwZICfEie9ggu5BwH9HCgNjlLR9oqLFZRZ0V1p2TPac3QJdstpUP3hFH/MM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3255D1007; Wed, 20 Mar 2024 10:23:14 -0700 (PDT) Received: from [10.57.51.202] (unknown [10.57.51.202]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9C9EB3F762; Wed, 20 Mar 2024 10:22:36 -0700 (PDT) Message-ID: <15081257-2cf0-4e05-9f2c-3f38059a58b6@arm.com> Date: Wed, 20 Mar 2024 17:22:34 +0000 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2 3/7] sched/uclamp: Introduce root_cfs_util_uclamp for rq To: Dietmar Eggemann , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider Cc: Qais Yousef , Morten Rasmussen , Lukasz Luba , Christian Loehle , linux-kernel@vger.kernel.org, David Dai , Saravana Kannan References: <68fbd0c0bb7e2ef7a80e7359512672a235a963b1.1706792708.git.hongyan.xia2@arm.com> <169ae6a7-7bdd-4c54-8825-b3ad5ca1cf64@arm.com> <757cbe97-ba55-44b7-9b25-ad1581410147@arm.com> Content-Language: en-US From: Hongyan Xia In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 20/03/2024 15:27, Dietmar Eggemann wrote: > On 19/03/2024 12:50, Hongyan Xia wrote: >> On 18/03/2024 18:21, Dietmar Eggemann wrote: >>> On 01/02/2024 14:11, Hongyan Xia wrote: >>> >>> [...] >>> >>>>       /* >>>>        * The code below (indirectly) updates schedutil which looks at >>>> @@ -6769,6 +6770,10 @@ enqueue_task_fair(struct rq *rq, struct >>>> task_struct *p, int flags) >>>>   #ifdef CONFIG_UCLAMP_TASK >>>>       util_uclamp_enqueue(&rq->cfs.avg, p); >>>>       update_util_uclamp(0, 0, 0, &rq->cfs.avg, p); >>>> +    if (migrated) >>> >>> IMHO, you don't need 'bool __maybe_unused migrated'. You can use: >>> >>>    if (flags & ENQUEUE_MIGRATED) >> >> I'm not sure if they are entirely equivalent. Both >> task_on_rq_migrating() and !task_on_rq_migrating() can have >> last_update_time == 0 but ENQUEUE_MIGRATED flag is only for the former. >> Maybe I'm missing something? > > That's true. There are 2: > > (!task_on_rq_migrating() && !p->se.avg.last_update_time) > > cases: > > (1) wakeup migration: ENQUEUE_MIGRATED (0x40) set in ttwu_do_wakeup() > > (2) new task: wake_up_new_task() -> activate_task(), ENQUEUE_MIGRATED is > not set. > > I assume you want to add the task's util_avg_uclamp to > rq->root_cfs_util_uclamp in (2) too? So: > > if (flags & ENQUEUE_MIGRATED || task_new) I see. Maybe we don't need to check last_update_time. I'll take a look. Although if we want to integrate sum aggregation with se rather than making it fully independent (in your comments below) like util_est, we'll probably just do that in update_util_avg() if (!se->avg.last_update_time && (flags & DO_ATTACH)) { ... } and don't bother doing it in the outside loop of enqueue_task_fair() anyway. > [...] > >>>>   /* avg must belong to the queue this se is on. */ >>>> -void update_util_uclamp(struct sched_avg *avg, struct task_struct *p) >>>> +void update_util_uclamp(u64 now, >>>> +            u64 last_update_time, >>>> +            u32 period_contrib, >>>> +            struct sched_avg *avg, >>>> +            struct task_struct *p) >>>>   { >>> >>> I was wondering why you use such a long parameter list for this >>> function. >>> >>> IMHO >>> >>>    update_util_uclamp(u64 now, struct cfs_rq *cfs_rq, struct >>> sched_entity *se) >>> >>> would work as well. You could check whether se represents a task inside >>> update_util_uclamp() as well as get last_update_time and period_contrib. >>> >>> The only reason I see is that you want to use this function for the RT >>> class as well later, where you have to deal with 'struct rt_rq' and >>> 'struct sched_rt_entity'. >>> >>> IMHO, it's always better to keep the implementation to the minimum and >>> only introduce changes which are related to the functionality you >>> present. This would make reviewing so much easier. >> >> Those parameters are necessary because of >> >> if (___update_load_sum()) { >>     ___update_load_avg(); >>     update_util_uclamp(); >> } > > So you need ___update_load_avg() happening for the `normal uclamp path` > and `last_uptade_time` and `period_contrib` for the `decay path` (1) of > update_util_uclamp()? > > This is pretty hard to grasp. Isn't there a cleaner solution for this? You are correct. Not sure if there's a better way. > Why do you need the: > > if (!avg) > return; > > thing in update_util_uclamp()? __update_load_avg_blocked_se() calls > update_util_uclamp(..., avg = NULL, ...) but this should follow (1)? I added it as a guard to rule out edge cases, but I think when you do __update_load_avg_blocked_se(), it has to be !on_rq so it will never enter this path. I think it doesn't hurt but I can remove it. >> We have to cache last_update_time and period_contrib, because after >> ___update_load_sum() is done, both parameters in sched_avg have already >> been updated and overwritten and we lose the timestamp when the >> sched_avg was previously updated. update_util_uclamp() needs to know >> when sched_avg was previously updated. >> >>> >>>>       unsigned int util, uclamp_min, uclamp_max; >>>>       int delta; >>>>   -    if (!p->se.on_rq) >>>> +    if (!p->se.on_rq) { >>>> +        ___update_util_uclamp_towards(now, >>>> +                          last_update_time, >>>> +                          period_contrib, >>>> +                          &p->se.avg.util_avg_uclamp, >>>> +                          0); >>>>           return; >>>> +    } >>> >>> You decay 'p->se.avg.util_avg_uclamp' which is not really related to >>> root_cfs_util_uclamp (patch header). IMHO, this would belong to 2/7. >> >> I would say this still belongs to 3/7, because 2/7 only implements >> utilization for on_rq tasks. This patch implements utilization for both >> on_rq and !on_rq. For rq, we have rq->cfs.avg.util_avg_uclamp (for >> on_rq) and rq->root_cfs_util_uclamp (for on_rq plus !on_rq). > > Looks like you maintain `rq->cfs.avg.util_avg_uclamp` (2) (consider all > runnable tasks) to be able to: > > (a) set rq->root_cfs_util_uclamp (3) to max((3), (2)) > > (b) check that if 'rq->cfs.h_nr_running == 0' that (2) = 0 too. > > uclamp is based on runnable tasks (so enqueue/dequeue) but you uclamp > around util_avg which has contributions from blocked tasks. And that's > why you have to maintain (3). And (3) only decays within > __update_load_avg_cfs_rq(). > Can this be the reason why th implementation is so convoluted? It's > definitely more complicated than util_est with its easy layout: > > enqueue_task_fair() > util_est_enqueue() > > dequeue_task_fair() > util_est_dequeue() > util_est_update() There's only one rule here, which is the root value must be the sum of all task util_avg_uclamp values. If PELT slowly decays each util_avg, then the sum will also decay in a similar fashion. The code here is just doing that. This 'convoluted' property is from PELT and not from sum aggregation. Actually this is what PELT used to do a while back, when the root CFS util was the sum of all tasks instead of being tracked separately, and we do the same decay here as we did back then. You are right that this feels convoluted, but sum aggregation doesn't attempt to change how PELT works so the decaying property is there due to PELT. I can keep exploring new ways to make the logic easier to follow. >> For tasks, we also have two utilization numbers, one is on_rq and the >> other is on_rq plus !on_rq. However, we know they do not have to be >> stored in separate variables and util_avg_uclamp can capture both. > > Here you lost me. Which value does 'p->se.avg.util_avg_uclamp' store? > 'runnable' or 'runnable + blocking'? I would say it's the latter one but > like in PELT we don't update the task signal when it's sleeping. The latter. We don't update the task signal when it's sleeping but we do when we need to use it, and that's enough. This is also the case for all blocked util_avg. >> Moving this to 2/7 might be fine, although then this would be the only >> !on_rq utilization in 2/7. I can add comments to clarify the situation. >> >>> This is the util_avg_uclamp handling for a se (task): >>> >>> enqueue_task_fair() >>> >>>    util_uclamp_enqueue() >>> >>>    update_util_uclamp()                 (1) >>> >>>      if (!p->se.on_rq)                  (x) >>>        ___update_util_uclamp_towards()  (2) >>> >>> dequeue_task_fair() >>> >>>    util_uclamp_dequeue() >>> >>> __update_load_avg_blocked_se() >>> >>>    update_util_uclamp() >>> >>>      (x) >>> >>> __update_load_avg_se() >>> >>>    update_util_uclamp() >>> >>>      (x) >>> >>> Why is it so unbalanced? Why do you need (1) and (2)? >>> >>> Isn't this just an indication that the se util_avg_uclamp >>> is done at the wrong places? >>> >>> Is there an other way to provide a task/rq signal as the base >>> for uclamp sum aggregation? >> >> (2) won't happen, because at that point p->se.on_rq must be 1. > > I see. > >> The sequence during enqueue_task_fair() is: >> >> enqueue_task_fair(p) >>     enqueue_entity(se) >>         update_load_avg(se) >>             update_util_uclamp(p) (decay path) >>         p->se.on_rq = 1; >>     util_uclamp_enqueue(p) >>     update_util_uclamp(p) (update path) >> >> The only reason why we want to issue update_util_uclamp() after seeing >> on_rq == 1 is that now it goes down the normal uclamp path and not the >> decay path. Otherwise, uclamp won't immediately engage during enqueue >> and has to wait a timer tick. > > OK, I see, the task contribution should be visible immediately after the > enqueue. > >> Ideally, we should: >> >> enqueue_task_fair(p) >>     enqueue_entity(se) >>         update_load_avg(se) >>             util_uclamp_enqueue(p) >>             update_util_uclamp(p) (force update path) >>         p->se.on_rq = 1; >> >> This requires us to invent a new flag to update_util_uclamp() to force >> the update path even when p->se.on_rq is 0. > > I guess you have to go this way to achieve a cleaner/easier integration > of 'util_avg_uclamp'. If we don't want to keep util_avg_uclamp separately like util_est but move it closer to se, then we can explore this option. >> At the moment I'm treating util_avg_uclamp in the same way as util_est >> after the comments in v1, which is independent and has its own code >> path. We can go back to the old style, where util_avg_uclamp is closer >> to how we treat se rather than a separate thing like util_est. > > Except that 'util_est' integration is much easier to understand. And > this is because of 'util_est' is clear runnable based only and is not > linked to any blocked part. True. Well, I can go back to the style in RFC v1. One big advantage of v2 is that we can compile out the support of uclamp very easily because it's treated more or less like an independent thing. > [...] >