From: Vincent Guittot
Date: Tue, 5 Dec 2023 17:22:37 +0100
Subject: Re: [RFC PATCH 1/6] sched/uclamp: Track uclamped util_avg in sched_avg
To: Hongyan Xia
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, Juri Lelli, Qais Yousef,
 Morten Rasmussen, Lukasz Luba, Christian Loehle, linux-kernel@vger.kernel.org
References: <5564fc23d5e6425d069c36b4cef48edbe77fe64d.1696345700.git.Hongyan.Xia2@arm.com>
 <7cad55e3-3a61-498a-9364-7a2d69a20757@arm.com>
In-Reply-To: <7cad55e3-3a61-498a-9364-7a2d69a20757@arm.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, 5 Dec 2023 at 15:24, Hongyan Xia wrote:
>
> On 04/12/2023 16:07, Vincent Guittot wrote:
> > On Wed, 4 Oct 2023 at 11:05, Hongyan Xia wrote:
> >>
> >> From: Hongyan Xia
> >>
> >> Track a uclamped version of util_avg in sched_avg, which clamps util_avg
> >> within [uclamp[UCLAMP_MIN], uclamp[UCLAMP_MAX]] every time util_avg is
> >> updated. At the CFS rq level, cfs_rq->avg.util_avg_uclamp must always be
> >> the sum of all util_avg_uclamp of entities on this cfs_rq. So, each
> >> time the util_avg_uclamp of an entity gets updated, we also track the
> >> delta and update the cfs_rq.
> >
> > No, please don't do that. Don't start to mix several different signals
> > into one. Typically, uclamp_min doesn't say you want to use at least
> > this amount of CPU capacity.
>
> But I'd say with uclamp, PELT is already broken and is a mixed signal

PELT has nothing to do with uclamp. You could argue that EAS makes a
wrong use or mix of PELT signals and uclamp hints to select a CPU, but
PELT itself is not impacted by uclamp and should stay out of uclamp
policy.

> anyway. When uclamp is used, a util_avg value X doesn't mean X, it means
> X under some rq uclamp, and the worst part is that the rq uclamp may or
> may not have anything to do with this task's uclamp.

I think you are mixing up PELT and how it is (was) used by EAS. Have you
looked at the latest tip/sched/core and the changes in
effective_cpu_util(int cpu, unsigned long util_cfs, unsigned long *min,
unsigned long *max)?
We return 3 values:
- the actual utilization, which is not impacted by uclamp
- a targeted min value, which takes uclamp_min into account
- a targeted max value, which takes uclamp_max into account

https://lore.kernel.org/lkml/20231122133904.446032-1-vincent.guittot@linaro.org/

> Pretending X is a true PELT value now (and not a mixed value) is why we
> have so many problems today. For example, in the frequency spike problem,
> if a task A has no idle time under uclamp_max, its PELT does not reflect
> reality. The moment another task B comes in and uncaps the rq uclamp_max,

You are mixing two things. The PELT signal of the task is correct.

> the current scheduler code thinks the 1024 of A means a real 1024, which
> is wrong and is exactly why we see a spike when B joins. It's also why

This is the true actual utilization: the actual util_avg of A really is
1024. But the user wants to give a hint to lower its needs with uclamp_max.

> we need to special-case 0 spare capacity with recent patches, because rq
> util_avg has lost its original PELT meaning under uclamp.
>
> Because X is not the true PELT, we always have to do something to bring
> it back into reality. What the current max aggregation code does is
> introduce corner cases, like treating 0 spare capacity as a potential
> opportunity to queue more tasks (which creates further problems in my
> tests), and maybe introducing uclamp load-balancing special cases in the
> future, or introducing uclamp filtering to delay the effect of wrong
> PELT values.
>
> What this series does is not much different. We keep the somewhat wrong
> value X, but we also remember under what uclamp values we got that X,
> to bring things back into reality, which means now we have [X,
> uclamp_min when X happens, uclamp_max when X happens]. To save space,
> this becomes [X, clamped X], which is what this series does. The
> original PELT value X is kept, but we use the clamped X in several
> places to improve our decisions.
>
> > With tasks with a util_avg of 10 but a uclamp_min of 1024, what does
> > it mean when you start to sum this value?
>
> Like I replied in another comment, assuming a uclamp_min of 1024 is a
> hint to run the task on the big CPUs, I don't think it's right to

Not especially a hint to run on a big CPU, but to say that the task needs
more performance than what its actual utilization suggests.

> directly use uclamp as a CPU placement indicator. A uclamp value may
> come from ADPF from userspace. An ADPF uclamp_min value of little CPU
> capacity + 1 certainly doesn't mean a game on purpose wants to avoid the
> little core. It simply means it wants at least this much performance,
> and whether this results in placing the game thread on a big CPU is
> purely the job of EAS (or CAS, etc.). We want to use little CPU + 1 as
> uclamp_min because we know the SoC and the little CPU is bad, but uclamp
> should be generic and should not rely on knowing the SoC.
>
> Basically, under sum aggregation one would not use a uclamp_min value of
> 1024 to place a small task on a big core. A uclamp_min of 1024 under sum
> aggregation has the meaning in ADPF, which is a hint to try to run me as
> fast as possible.
>
> [...]
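---

[Editorial note: to make the mechanism under debate concrete, here is a
hypothetical user-space sketch of what the quoted patch description
proposes. This is not the actual kernel code; all struct, field, and
function names are illustrative. It also reproduces Vincent's objection:
a task with util_avg of 10 but uclamp_min of 1024 contributes 1024 to
the summed signal.]

```c
#include <assert.h>

/* Illustrative model (not kernel code): each entity's util_avg is
 * clamped into [uclamp_min, uclamp_max] on every update, and the
 * rq-level util_avg_uclamp tracks the sum by applying the delta. */

struct entity {
	unsigned long util_avg;
	unsigned long uclamp_min;
	unsigned long uclamp_max;
	unsigned long util_avg_uclamp;	/* clamped copy of util_avg */
};

struct cfs_rq_model {
	unsigned long util_avg_uclamp;	/* sum over all queued entities */
};

static unsigned long clamp_util(unsigned long util,
				unsigned long lo, unsigned long hi)
{
	return util < lo ? lo : (util > hi ? hi : util);
}

/* Update one entity's util_avg and propagate the delta to the rq sum. */
static void update_entity_util(struct cfs_rq_model *rq, struct entity *se,
			       unsigned long new_util_avg)
{
	unsigned long new_clamped;

	se->util_avg = new_util_avg;
	new_clamped = clamp_util(new_util_avg, se->uclamp_min, se->uclamp_max);

	/* Unsigned wraparound makes this correct even when the new
	 * clamped value is smaller than the old one. */
	rq->util_avg_uclamp += new_clamped - se->util_avg_uclamp;
	se->util_avg_uclamp = new_clamped;
}
```

With this model, a task whose util_avg is 10 but whose uclamp_min is
1024 contributes a full 1024 to the rq sum, which is exactly the mixing
of a utilization measurement with a performance hint that Vincent
objects to above.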