Date: Thu, 24 Jan 2019 14:04:32 +0000
From: Patrick Bellasi
To: Peter Zijlstra
Cc: Vincent Guittot, Ingo Molnar, linux-kernel, "Rafael J. Wysocki",
 Dietmar Eggemann, Morten Rasmussen, Paul Turner, Ben Segall,
 Thara Gopinath, pkondeti@codeaurora.org, Quentin Perret,
 Srinivas Pandruvada
Subject: Re: [PATCH v7 2/2] sched/fair: update scale invariance of PELT
Message-ID: <20190124140432.2kiprlxzw2cpf35f@e110439-lin>
In-Reply-To: <20190124090755.GC13536@hirez.programming.kicks-ass.net>
References: <1542711308-25256-1-git-send-email-vincent.guittot@linaro.org>
 <1542711308-25256-3-git-send-email-vincent.guittot@linaro.org>
 <20181128100241.GA2131@hirez.programming.kicks-ass.net>
 <20181128115336.GB23094@e110439-lin>
 <20181129125348.GL2131@hirez.programming.kicks-ass.net>
 <20181129151316.GG23094@e110439-lin>
 <20190124090755.GC13536@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 24-Jan 10:07, Peter Zijlstra wrote:
> 
> Sorry; trying to get back to this and re-reading the old conversations.
> 
> On Thu, Nov 29, 2018 at 03:13:16PM +0000, Patrick Bellasi wrote:
> > On 29-Nov 13:53, Peter Zijlstra wrote:
> > > On Wed, Nov 28, 2018 at 11:53:36AM +0000, Patrick Bellasi wrote:
> > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index ac855b2f4774..93e0cf5d8a76 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -3661,6 +3661,10 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
> > > >  	if (!task_sleep)
> > > >  		return;
> > > >  
> > > > +	/* Skip samples which do not represent an actual utilization */
> > > > +	if (unlikely(task_util(p) > capacity_of(task_cpu(p))))
> > > > +		return;
> > > > +
> > > >  	/*
> > > >  	 * If the PELT values haven't changed since enqueue time,
> > > >  	 * skip the util_est update.
> > > 
> > > Would you not want something like:
> > > 
> > > 	min(task_util(p), capacity_of(task_cpu(p)))
> > > 
> > > And is this the only place where we need this?
> > 
> > Mmm... even this could be an over-estimation:
> > 
> > I've just posted an example in my last reply to Vincent, end of:
> > 
> >    Message-ID: <20181129150020.GF23094@e110439-lin>
> >    https://lore.kernel.org/lkml/20181129150020.GF23094@e110439-lin/
> 
> In particular this bit:
> 
> | Seems we agree that, when there is no idle time:
> | - the two 15% tasks will be overestimated
> | - their utilization will reach 50% after a while
> 
> Right?
> 
> > > OTOH, if the task is always running, it will be always running
> > > irrespective of where it runs.
> > 
> > That's not what I'm concerned about. I'm concerned about small tasks
> > which are running on limited capacity (e.g. due to thermal capping)
> > without idle time. In this case, the new "utilization" signal could
> > overestimate the real task needs.
> > 
> > > Not storing these samples seems weird though; this is the exact
> > > condition you want to record -- the task is very active, if we skip
> > > these, we'll come back at a low frequency on the next wakeup.
> > 
> > When there is not idle time, we don't know if the reported
> > utilization, above the cpu capacity, is due to the task being bigger...
> > or just the new utilization signal converging towards:
> > 
> >    100% / RUNNABLE_TASKS_COUNT
> 
> So if I'm not mistaken we then have 3 cases:
> 
>  1) runnable == util <= capacity
> 
>     no contention, idle
> 
>  2) runnable == util > capacity
> 
>     no contention, no idle
> 
>  3) runnable > util
> 
>     contention, no idle
> 
> For 1) we can use: 'util'
> For 2) we can use: 'capacity'
> For 3) we can use: 'util * capacity >> 10'
> 
> (note that 2 is a special case of 3 when u=1)
> 
> This should work right?

I think there is a case, similar to 2, in which the new 'util' could
potentially be used.
That's the case, for example, of a 20% (estimated) utilization task
running alone on a 15% capacity CPU, for a single activation. In that
case such a task will complete and be dequeued with:

   runnable == util > capacity

The problem is that we need to be sure there was no contention... and
that seems to be difficult to detect.

> Now, instead of doing complicated things like that, you instead figure
> that when there's no idle there's also no dequeue happening and we can
> simply short-cut by skipping the entire thing, forgetting everything
> about 2,3.
> 
> Did I get that right?

More or less... just saying that case 1 is the only easy-to-detect
scenario in which we are guaranteed that the utilization represents an
actual bandwidth request, and thus the only safe value to sample for
the estimated utilization.

For the other cases, since anyway:

   util_est := max(max(ewma, last_util), util_avg)

util_est will just keep representing a safe and actually measured
lower bound for the expected utilization of a task, without
side effects on the EWMA, which has a "slow" update dynamic.

-- 
#include

Patrick Bellasi
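
To make the two options in this thread easier to compare, here is a
minimal user-space sketch, not kernel code. It assumes utilization and
capacity are expressed on the usual 0..1024 scale (hence the ">> 10");
the helper names sample_three_cases(), sample_is_safe() and
util_est_value() are hypothetical and exist only for this illustration.
The first helper follows the three cases listed in the quoted reply, the
second one the "skip the sample" check from the diff, and the third one
the max() expression above.

```c
#define CAPACITY_SHIFT 10	/* assumption: 1024 == 100% capacity */

/*
 * Hypothetical helper: value to sample according to the three cases
 * proposed in the quoted reply.
 */
static inline unsigned long
sample_three_cases(unsigned long runnable, unsigned long util,
		   unsigned long capacity)
{
	/* 3) runnable > util: contention, no idle */
	if (runnable > util)
		return (util * capacity) >> CAPACITY_SHIFT;

	/* 2) runnable == util > capacity: no contention, no idle */
	if (util > capacity)
		return capacity;

	/* 1) runnable == util <= capacity: no contention, idle */
	return util;
}

/*
 * Hypothetical helper: the check from the diff, i.e. only samples not
 * above the CPU capacity are considered safe to record.
 */
static inline int
sample_is_safe(unsigned long util, unsigned long capacity)
{
	return util <= capacity;
}

static inline unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* util_est := max(max(ewma, last_util), util_avg) */
static inline unsigned long
util_est_value(unsigned long ewma, unsigned long last_util,
	       unsigned long util_avg)
{
	return max_ul(max_ul(ewma, last_util), util_avg);
}
```

With the example above (a 20% task, util ~= 205, alone on a 15% CPU,
capacity ~= 154), the three-case rule would sample 'capacity', while the
check from the diff would skip the sample and leave the EWMA untouched;
util_est_value() still reports util_avg whenever it is the largest of
the three inputs.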