Received: by 2002:a25:f815:0:0:0:0:0 with SMTP id u21csp3114984ybd; Fri, 28 Jun 2019 03:09:11 -0700 (PDT) X-Google-Smtp-Source: APXvYqwEtF6ak9ttGlDZbS9a3xrF5gDs8reoteOevErDD4lJ6LUe+cmy+QkaqUaJuF6EvItzGjyo X-Received: by 2002:a17:902:290b:: with SMTP id g11mr10421379plb.26.1561716551814; Fri, 28 Jun 2019 03:09:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561716551; cv=none; d=google.com; s=arc-20160816; b=LyNkVU5n/Vot7ByvuE0v9uWjVMCVzdo17TpCCIyl2Edsu6YR6V/xVB2xMNYbZEh9Wu s/ruLlJlyrM6pRLH/hcXUoFSoTyNUNex8PCvBvTuILJOz1hDSy163BZug86iBF8T5NzQ EzkyBSOQ/Ihwh/3lex7JruhPdfe8N6s6afKCS2KZga/m2tylNRRKrYP5AGra+0BcdrUx Ozuy3rREUH6fT6wiTbVEhxtGwwCAHRKAchTtozX62crtH46lQ0g6yASuZWhxlQLKu5wt 8RuJ36WPzCaIeflWHdinlY7RGv8Z2l8meno4ntrj+DI8GNhlDwfOnm99uX+UyZRUAbj6 d1Ew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=9STk/WUcFF3ohgWhQX+I85a5rYdHIS9nTrt9+onjqmA=; b=sVaqmsF0/b5tVQHMu1d+uHJWnxz7unRE/T4DKHvLRNYKv74YTJt4awbhwfQxo1smMF qa+zJ61+UQlIq510U2dgQlbEFUeq+34pxmeXYI4P+aq8cOZuswlLZnSObEer6p1FD0vs dps77V3NnmaB9fyD+D1SAhJkfPPihGM2m5QNc1EuhgsCeghDt66jr1YuSqYBCoPmXdlC jO8f1umHkGUFqgqfcWLjLobGqzW6NaC8msKCGHk4TT1+aY+bbVCkHCZodHsw1v6WkpKQ oZz1eGlhB8fYmqgWBV+jdrq7O/nynpk2Eh3u7uAeSme3w08Ix7LZgxh0Bvo4QBdhHIfO NG6g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 92si1789401plc.217.2019.06.28.03.08.55; Fri, 28 Jun 2019 03:09:11 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726578AbfF1KIX (ORCPT + 99 others); Fri, 28 Jun 2019 06:08:23 -0400 Received: from foss.arm.com ([217.140.110.172]:44232 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726484AbfF1KIW (ORCPT ); Fri, 28 Jun 2019 06:08:22 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A58AC28; Fri, 28 Jun 2019 03:08:21 -0700 (PDT) Received: from e110439-lin (e110439-lin.cambridge.arm.com [10.1.194.43]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 34F523F718; Fri, 28 Jun 2019 03:08:20 -0700 (PDT) Date: Fri, 28 Jun 2019 11:08:14 +0100 From: Patrick Bellasi To: Vincent Guittot Cc: linux-kernel , Ingo Molnar , Peter Zijlstra , "Rafael J . Wysocki" , Viresh Kumar , Douglas Raillard , Quentin Perret , Dietmar Eggemann , Morten Rasmussen , Juri Lelli Subject: Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases Message-ID: <20190628100751.lpcwsouacsi2swkm@e110439-lin> References: <20190620150555.15717-1-patrick.bellasi@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 26-Jun 13:40, Vincent Guittot wrote: > Hi Patrick, > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi wrote: > > > > The estimated utilization for a task is currently defined based on: > > - enqueued: the utilization value at the end of the last activation > > - ewma: an exponential moving average which samples are the enqueued values > > > > According to this definition, when a task suddenly change it's bandwidth > > requirements from small to big, the EWMA will need to collect multiple > > samples before converging up to track the new big utilization. > > > > Moreover, after the PELT scale invariance update [1], in the above scenario we > > can see that the utilization of the task has a significant drop from the first > > big activation to the following one. That's implied by the new "time-scaling" > > Could you give us more details about this? I'm not sure to understand > what changes between the 1st big activation and the following one ? We are after a solution for the problem Douglas Raillard discussed at OSPM, specifically the "Task util drop after 1st idle" highlighted in slide 6 of his presentation: http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf which shows what happens with a task switches from 5% to 75% and we get these start/end values for each activation: Act Time __comm __cpu __pid task util_avg -------------------------------------------------------------- 1 2.813559 4 0 step_up 45 2.902624 step_up 4 2574 step_up 665 -------------------------------------------------------------- 2 2.903722 4 0 step_up 289 2.917385 step_up 4 2574 step_up 452 -------------------------------------------------------------- 3 2.919725 4 0 step_up 418 2.953764 step_up 4 2574 step_up 658 -------------------------------------------------------------- 4 2.954248 4 0 step_up 537 2.967955 step_up 4 2574 step_up 645 -------------------------------------------------------------- 5 2.970248 4 0 step_up 597 2.983914 step_up 4 2574 step_up 692 -------------------------------------------------------------- 6 2.986248 4 0 step_up 640 2.999924 step_up 4 2574 step_up 725 -------------------------------------------------------------- 7 3.002248 4 0 step_up 670 3.015872 step_up 4 2574 step_up 749 -------------------------------------------------------------- 8 3.018248 4 0 step_up 694 3.030474 step_up 4 2574 step_up 767 -------------------------------------------------------------- 9 3.034247 4 0 step_up 710 3.046454 step_up 4 2574 step_up 780 -------------------------------------------------------------- Since the first activation is running at lower-than-max OPPs we do "time-scaling" at the end of the activation. Util_avg starts at 45 and ramps up to 665 but then it drops 375 units down to 289 at the beginning of the second activation. The second activation has a chance to run at higher OPPs, but still not at max. Util_avg starts at 289 and ramps up to 452, which is even lower then the previous max value, but then it drops 34 units down to 418. The following activations have a similar pattern but util_avg converges toward the final value, we run almost always at the highest OPP and the drops are defined mainly by the expected PELT decay. > The utilization implied by new "time-scaling" should be the same as > always running at max frequency with previous method Right, the problem we are tacking with this patch however is to make util_est a better signal for the ramp-up phases. Right now util_est "fixes" only the second activation, since: max(util_avg, last_value, ewma) = max(289, 665, <289) = 665 and thus we keep running on the highest OPP we reached at the end of the first activation. While at the start of the third activation: max(util_avg, last_value, ewma) = max(452, 418, <452) = 452 and this time we drop the OPP quite a lot despite the signal still being ramping up. > > mechanisms instead of the previous "delta-scaling" approach. > > That happens because the EWMA takes multiple activations to converge up, which means it's not very helping much: > > Unfortunately, these drops cannot be fully absorbed by the current util_est > > implementation. Indeed, the low-frequency filtering introduced by the "ewma" is > > entirely useless while converging up and it does not help in stabilizing sooner > > the PELT signal. The idea of the patch is to exploit two observations: 1. the default scheduler behavior is to be performance oriented 2. the longher you run a task underprovisioned, the higher the util_avg will be Which turns into: > > To make util_est do better service in the above scenario, do change its > > definition to slow down only utilization decreases. Do that by resetting the > > "ewma" every time the last collected sample increases. > > > > This change makes also the default util_est implementation more aligned with > > the major scheduler behavior, which is to optimize for performance. > > In the future, this implementation can be further refined to consider > > task specific hints. Cheers, Patrick -- #include Patrick Bellasi