Date: Thu, 10 Jan 2019 15:30:42 +0000
From: Patrick Bellasi
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, "Rafael J. Wysocki",
 Dietmar Eggemann, Morten Rasmussen, Paul Turner, Ben Segall,
 Thara Gopinath, pkondeti@codeaurora.org, Quentin Perret,
 Srinivas Pandruvada
Wysocki" , Dietmar Eggemann , Morten Rasmussen , Paul Turner , Ben Segall , Thara Gopinath , pkondeti@codeaurora.org, Quentin Perret , Srinivas Pandruvada Subject: Re: [PATCH v7 2/2] sched/fair: update scale invariance of PELT Message-ID: <20190110153031.4rh64xz2muctkffe@e110439-lin> References: <20181128115336.GB23094@e110439-lin> <20181128144039.GC23094@e110439-lin> <20181128152133.GD23094@e110439-lin> <20181128163545.GE23094@e110439-lin> <20181129150020.GF23094@e110439-lin> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 29-Nov 17:19, Vincent Guittot wrote: > On Thu, 29 Nov 2018 at 16:00, Patrick Bellasi wrote: > > On 29-Nov 11:43, Vincent Guittot wrote: [...] > > Seems we agree that, when there is no idle time: > > - the two 15% tasks will be overestimated > > - their utilization will reach 50% after a while > > > > If I'm not wrong, we will have: > > - 30% CPU util in ~16ms @1024 capacity > > ~64ms @256 capacity > > > > Thus, the tasks will be certainly over-estimated after ~64ms. > > Is that correct ? > > From a pure util_avg pov it's correct > But i'd like to weight that a bit with the example below > > > Now, we can argue that 64ms is a pretty long time and thus it's quite > > unlucky we will have no idle for such a long time. > > > > Still, I'm wondering if we should keep collecting those samples or > > better find a way to detect that and skip the sampling. > > The problem is that you can have util_avg above capacity even with idle time > In the 1st example of this thread, the 39ms/80ms task will reach 709 > which is the value saved by util_est on a big core > But on core with half capacity, there is still idle time so 709 is a > correct value although above 512 Right, I see your point and (in principle) I like the idea of collecting samples for tasks which happen to run at a lower capacity then required and the utilization value makes sense... > In fact, max will be always above the linear ratio because it's based > on geometric series > > And this is true even with 15.6ms/32ms (same ratio as above) task > although the impact is smaller (max value, which should be saved by > util est, becomes 587 in this case). However that's not always the case... as per my example above. Moreover, we should also consider that util_est is mainly meant to be a lower-bound for tasks utilization. That's why task_util_est() already returns the actual util_avg when it's higher than the estimated utilization. With your new signal and without any special check on samples collection, if a task is limited because of thermal capping for example, we could end up overestimating its utilization and thus perhaps generating an unwanted frequency spike when the capping is relaxed... and (even worst) it will take some more activations for the estimated utilization to converge back to the actual utilization. 
Since we cannot easily know if there is idle time in a CPU when a task
completes an activation with a utilization higher than the CPU
capacity, I would prefer to just skip the sampling, with something
like:

---8<---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9332863d122a..485053026533 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3639,6 +3639,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 {
 	long last_ewma_diff;
 	struct util_est ue;
+	int cpu;
 
 	if (!sched_feat(UTIL_EST))
 		return;
@@ -3672,6 +3673,14 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;
 
+	/*
+	 * To avoid overestimation of actual task utilization, skip updates if
+	 * we cannot grant there is idle time in this CPU.
+	 */
+	cpu = cpu_of(rq_of(cfs_rq));
+	if (task_util(p) > capacity_orig_of(cpu))
+		return;
+
 	/*
 	 * Update Task's estimated utilization
 	 *
---8<---

At least this will ensure that util_est always provides an actual,
measured lower bound for a task's utilization.

If you think this makes sense, feel free to add such a patch on top of
your series.

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi