From: Vincent Guittot
Date: Wed, 19 Jan 2022 14:22:28 +0100
Subject: Re: [PATCH v2 2/7] sched/fair: Decay task PELT values during migration
To: Vincent Donnefort
Cc: peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org,
    dietmar.eggemann@arm.com, Valentin.Schneider@arm.com,
    Morten.Rasmussen@arm.com, Chris.Redpath@arm.com,
    qperret@google.com, Lukasz.Luba@arm.com
References: <20220112161230.836326-1-vincent.donnefort@arm.com>
    <20220112161230.836326-3-vincent.donnefort@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 19 Jan 2022 at 12:59, Vincent Donnefort wrote:
>
> [...]
>
> > > > This has several shortfalls:
> > > > - have a look at cfs_rq_clock_pelt() and rq_clock_pelt(). What you
> > > > name clock_pelt in your commit message and is used to update PELT and
> > > > saved in se->avg.last_update_time is: rq->clock_pelt -
> > > > rq->lost_idle_time - cfs_rq->throttled_clock_task_time
> > >
> > > That's why the PELT "lag" is added onto se->avg.last_update_time (see the
> > > last paragraph of the commit message). The estimator is just a time delta
> > > that is added on top of the entity's last_update_time.
> > > I don't see any problem with the lost_idle_time here.
> >
> > lost_idle_time is updated before entering idle and after your
> > clock_pelt_lag has been updated. This means that the delta you are
> > computing can be wrong.
> >
> > I haven't looked in detail, but a similar problem probably happens for
> > throttled_clock_task_time.
> >
> > > > - you are doing this whatever the state of the cpu: idle or not. But
> > > > the clock cycles are not accounted for in the same way in both cases.
> > >
> > > If the CPU is idle and clock_pelt == clock_task, the component A of the
> > > estimator would be 0 and we would only account for how outdated the
> > > rq's clock is, i.e. component B.
> >
> > And if the cpu is not idle, you can't apply the diff between clock_pelt
> > and clock_task.
> >
> > > > - (B) doesn't seem to be accurate as you skip irq and steal time
> > > > accounting, and you don't apply any scale invariance if the cpu is
> > > > not idle.
> > >
> > > The missing irq and paravirt time is the reason why it is called an
> > > "estimator". But maybe there's a chance of improving this part with a
> > > lockless version of rq->prev_irq_time and rq->prev_steal_time_rq?
> > >
> > > > - IIUC your explanation in the commit message above, the (A) period
> > > > seems to be a problem only when idle, but you apply it unconditionally.
> > >
> > > If the CPU is idle (and clock_pelt == clock_task), only the B part would
> > > be worth something:
> > >
> > >   A + B = [clock_task - clock_pelt] + [sched_clock_cpu() - clock]
> > >                     A                             B
> > >
> > > > If cpu is idle you can assume that clock_pelt should be equal to
> > > > clock_task, but you can't if cpu is not idle, otherwise your sync will
> > > > be inaccurate and defeat the primary goal of this patch. If your
> > > > problem with clock_pelt is that the pending idle time is not accounted
> > > > for when entering idle but only at the next update (update of blocked
> > > > load or wakeup of a thread).
> > > > This patch below should fix this and remove your A.
> > >
> > > That would help slightly the current situation, but this part is already
> > > covered by the estimator.
> >
> > But the estimator, as you name it, is wrong because the A part can't be
> > applied unconditionally.
>
> Hum, it is used only in the !active migration, so we know the task was sleeping
> before that migration. As a consequence, the time we need to account is "sleeping"
> time from the task's point of view, which is clock_pelt == clock_task (for
> __update_load_avg_blocked_se()). Otherwise, we would only decay with the
> "wallclock" idle time instead of the "scaled" one, wouldn't we?

clock_pelt == clock_task only when the cpu is idle and after updating
lost_idle_time, but you have no idea of the state of the cpu when migrating
the task.

> +-------------+--------------
> |   Task A    |   Task B .....
>      ^        ^           ^
>      |        |       migrate A
>      |        |           |
>      |        |           |
>      |        |           |
>      |        |<--------->|
>      |       Wallclock Task A idle time
>      |<------------------>|
>        "Scaled" Task A idle time
>
> [...]