From: David Dai
Date: Wed, 5 Apr 2023 16:36:04 -0700
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks
To: Saravana Kannan
Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
    Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    kernel-team@android.com, linux-kernel@vger.kernel.org
References: <20230330224348.1006691-1-davidai@google.com>
    <20230330224348.1006691-2-davidai@google.com>
    <0da58608-2958-fc6c-effb-2f73013c609f@arm.com>

On Wed, Apr 5, 2023 at 2:43 PM Saravana Kannan wrote:
>
> On Wed, Apr 5, 2023 at 3:50 AM Dietmar Eggemann wrote:
> >
> > On 04/04/2023 03:11, David Dai wrote:
> > > On Mon, Apr 3, 2023 at 4:40 AM Dietmar Eggemann wrote:
> > >>
> > >> Hi David,
> > > Hi Dietmar, thanks for your comments.
> > >>
> > >> On 31/03/2023 00:43, David Dai wrote:
> >
> > [...]
> >
> > >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> index 6986ea31c984..998649554344 100644
> > >>> --- a/kernel/sched/fair.c
> > >>> +++ b/kernel/sched/fair.c
> > >>> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
> > >>>
> > >>>  static inline unsigned long task_util(struct task_struct *p)
> > >>>  {
> > >>> -	return READ_ONCE(p->se.avg.util_avg);
> > >>> +	return max(READ_ONCE(p->se.avg.util_avg),
> > >>> +		   READ_ONCE(p->se.avg.util_guest));
> > >>>  }
> > >>>
> > >>>  static inline unsigned long _task_util_est(struct task_struct *p)
> > >>>  {
> > >>>  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> > >>>
> > >>> -	return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> > >>> +	return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> > >>> +		     max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> > >>>  }
> > >>
> > >> I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
> > >> used here instead of p->se.avg.util_guest.
> > > Using p->uclamp_req[UCLAMP_MIN].value would result in folding
> > > uclamp values into task_util and task_util_est for all tasks that have
> > > uclamp values set. The intent of these patches isn't to modify
> > > existing uclamp behaviour. Users would also override util values from
> > > the guest when they set uclamp values.
> > >>
> > >> I do understand the issue of inheriting uclamp values at fork but don't
> > >> get the not being `additive` thing. We are at task level here.
> > >
> > > Uclamp values are max-aggregated with other tasks at the runqueue
> > > level when deciding CPU frequency. For example, a vCPU runqueue may
> > > have a util of 512 that results in setting uclamp_min to 512 on the
> > > vCPU task. This is insufficient to drive a frequency response if it
> > > shares the runqueue with another host task running with a util of 512,
> > > as it would result in a clamped util value of 512 at the runqueue
> > > (e.g. if a guest thread had just migrated onto this vCPU).
> >
> > OK, I see your point now. You want an accurate per-task boost for this
> > vCPU task on the host run-queue.
> > And a scenario in which a vCPU can ask for 100% in these moments is not
> > sufficient, I guess? In that case uclamp_min could work.
>
> Right. A vCPU can have whatever utilization, and there can be random host
> threads completely unrelated to the VM. And we need to aggregate both
> of their util when deciding CPU freq.
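
To make the aggregation difference concrete, here is a toy model of the
two schemes (illustrative only -- this is not the kernel's actual code
path; the 512 values come from the example above, and the vCPU's own
util_avg is assumed stale right after a guest thread migrates in):

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long host_util = 512;     /* unrelated host task on the rq */
	unsigned long vcpu_util_avg = 0;   /* PELT hasn't ramped up yet */
	unsigned long util_guest = 512;    /* demand the guest just reported */
	unsigned long uclamp_min = 512;    /* floor derived from the guest */

	/*
	 * uclamp_min scheme: the floor acts as a clamp on the runqueue's
	 * summed util, so the result cannot rise above 512 here.
	 */
	unsigned long clamped = max_ul(min_ul(host_util + vcpu_util_avg,
					      SCHED_CAPACITY_SCALE),
				       uclamp_min);

	/*
	 * util_guest scheme: the guest demand is folded into the vCPU's
	 * task_util(), so it adds on top of the host task's util.
	 */
	unsigned long summed = min_ul(host_util +
				      max_ul(vcpu_util_avg, util_guest),
				      SCHED_CAPACITY_SCALE);

	printf("uclamp_min floor: rq util = %lu\n", clamped); /* 512 */
	printf("util_guest:       rq util = %lu\n", summed);  /* 1024 */
	return 0;
}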
>
> >
> > >> The fact that you have to max util_avg and util_est directly in
> > >> task_util() and _task_util_est() tells me that there are places where
> > >> this helps and uclamp_task_util() is not called there.
> > > Can you clarify on this point a bit more?
> >
> > Sorry, I meant s/util_est/util_guest/.
> >
> > The effect of the change in _task_util_est() you see via:
> >
> > enqueue_task_fair()
> >   util_est_enqueue()
> >     cfs_rq->avg.util_est.enqueued += _task_util_est(p)
> >
> > so that `sugov_get_util() -> cpu_util_cfs() ->
> > cfs_rq->avg.util_est.enqueued` can see the effect of util_guest?

That sequence looks correct to me.

> >
> > Not sure about the change in task_util() yet.

task_util() provides some signaling in addition to task_util_est() via:

find_energy_efficient_cpu()
  cpu_util_next()
    lsub_positive(&util, task_util(p));
    ...
    util += task_util(p); // Can provide a better signal than util_est.

dequeue_task_fair()
  util_est_update()
    ue.enqueued = task_util(p); // Updates ue.ewma

Thanks,
David

> >
> > >> When you say in the cover letter that you tried uclamp_min, how exactly
> > >> did you use it? Did you run the existing mainline or did you use
> > >> uclamp_min as a replacement for util_guest in this patch here?
> > >
> > > I called sched_setattr_nocheck() with .sched_flags =
> > > SCHED_FLAG_UTIL_CLAMP when updating uclamp_min, and uclamp_max is left
> > > at 1024. Uclamp_min was not aggregated with task_util and
> > > task_util_est during my testing. The only caveat there is that I added
> > > a change to only reset uclamp on fork when testing (I realize there is
> > > specifically a SCHED_FLAG_RESET_ON_FORK, but I didn't want to reset
> > > other sched attributes).
> >
> > OK, understood. It's essentially a util_est v2 for vCPU tasks on host.
>
> Yup. We initially looked into just overwriting util_est, but didn't
> think that'll land well with the community :) as it was a bit messier
> because we needed to make sure the current util_est update paths don't
> run for vCPU tasks on host (because those values would be wrong).
>
> > >>>  static inline unsigned long task_util_est(struct task_struct *p)
> > >>> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > >>>  	 */
> > >>>  	util_est_enqueue(&rq->cfs, p);
> > >>>
> > >>> +	/*
> > >>> +	 * The normal code path for host thread enqueue doesn't take into
> > >>> +	 * account guest task migrations when updating cpufreq util.
> > >>> +	 * So, always update the cpufreq when a vCPU thread has a
> > >>> +	 * non-zero util_guest value.
> > >>> +	 */
> > >>> +	if (READ_ONCE(p->se.avg.util_guest))
> > >>> +		cpufreq_update_util(rq, 0);
> > >>
> > >>
> > >> This is because enqueue_entity() -> update_load_avg() ->
> > >> attach_entity_load_avg() -> cfs_rq_util_change() requires the root
> > >> run-queue (&rq->cfs == cfs_rq) to call cpufreq_update_util()?
> > > enqueue_entity() would not call into update_load_avg() due to the
> > > check for !se->avg.last_update_time. se->avg.last_update_time is
> > > non-zero because the vCPU task did not migrate before this enqueue.
> > > This enqueue path is reached when util_guest is updated for the vCPU
> > > task through the sched_setattr_nocheck() call, where we want to ensure
> > > a frequency update occurs.
> >
> > OK, vCPU tasks are pinned so always !WF_MIGRATED wakeup I guess?
>
> Even if, say, little-vCPU threads are allowed to migrate within little
> CPUs, this will still be an issue.
> While a vCPU thread is continuously running on a single CPU, a guest
> thread can migrate into that vCPU and cause a huge increase in
> util_guest. But that won't trigger a cpufreq update on the host side,
> because the host doesn't see a task migration. That's what David is
> trying to address.
>
> -Saravana
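
For reference, the uclamp_min comparison test David describes above
could look roughly like this on the host side. This is a sketch only:
the helper name is hypothetical, and sched_setattr_nocheck() is
kernel-internal, so the call would have to come from kernel code (e.g.
a test patch), not from userspace:

#include <linux/sched.h>
#include <uapi/linux/sched/types.h>

/*
 * Hypothetical helper mirroring the described test: push a
 * guest-driven utilization floor onto a vCPU task via uclamp_min,
 * leaving uclamp_max at SCHED_CAPACITY_SCALE (1024).
 */
static int vcpu_set_util_floor(struct task_struct *vcpu_task,
			       unsigned int floor)
{
	struct sched_attr attr = {
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP,
		.sched_util_min	= floor,	/* guest-reported demand */
		.sched_util_max	= 1024,		/* left at max, as in the test */
	};

	/* Kernel-internal variant that skips the permission checks. */
	return sched_setattr_nocheck(vcpu_task, &attr);
}

As noted in the thread, the test also carried a change so that forked
children don't inherit the floor, without resetting other sched
attributes the way SCHED_FLAG_RESET_ON_FORK would.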