Date: Mon, 4 Jun 2018 10:46:18 -0700
From: Joel Fernandes
To: Patrick Bellasi
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Viresh Kumar,
    Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
    Joel Fernandes, Steve Muckle, Todd Kjos
Subject: Re: [PATCH 2/2] sched/fair: util_est: add running_sum tracking
Message-ID: <20180604174618.GA222053@joelaf.mtv.corp.google.com>
References: <20180604160600.22052-1-patrick.bellasi@arm.com>
 <20180604160600.22052-3-patrick.bellasi@arm.com>
In-Reply-To: <20180604160600.22052-3-patrick.bellasi@arm.com>
User-Agent: Mutt/1.9.2 (2017-12-15)

Hi Patrick,

On Mon, Jun 04, 2018 at 05:06:00PM +0100, Patrick Bellasi wrote:
> The estimated utilization of a task is affected by the task being
> preempted, either by another FAIR task or by a task of a higher
> priority class (i.e. RT or DL). Indeed, when a preemption happens, the
> PELT utilization of the preempted task is going to be decayed a bit.
> That's actually correct for utilization, whose goal is to measure the
> actual CPU bandwidth consumed by a task.
>
> However, the above behavior does not let us know exactly what
> utilization a task "would have used" if it had been running without
> being preempted. This reduces the effectiveness of util_est for a
> task, because it does not always allow us to predict how much CPU a
> task is likely to require.
>
> Let's improve the estimated utilization by adding a new "sort-of" PELT
> signal, tracked only for SEs, with the following behavior:
>  a) at each enqueue time of a task, its value is the (already decayed)
>     util_avg of the task being enqueued
>  b) it is updated at each update_load_avg
>  c) it can only increase, and does so whenever the task is actually
>     RUNNING on a CPU, while it is kept stable while the task is
>     RUNNABLE but not actively consuming CPU bandwidth
>
> Such a signal is exactly equivalent to the util_avg of a task running
> alone on a CPU while, in case the task is preempted, it lets us know
> at dequeue time how much the task's utilization would have been if it
> had been running alone on that CPU.
>
> This new signal is named "running_avg", since it tracks the actual
> RUNNING time of a task, ignoring any form of preemption.
>
> From an implementation standpoint, since the sched_avg should fit into
> a single cache line, we save space by tracking only a new running sum:
>    p->se.avg.running_sum
> while the conversion into a running_avg is done on demand, whenever we
> need it, i.e. at task dequeue time, when a new util_est sample has to
> be collected.
>
> The conversion from "running_sum" to "running_avg" is done by
> performing a single division by LOAD_AVG_MAX, which introduces a small
> error since the division does not consider the
> (sa->period_contrib - 1024) compensation factor used in
> ___update_load_avg().
> However:
>  a) this error is expected to be limited (~2-3%)
>  b) it can be safely ignored, since the estimated utilization is the
>     only consumer, and it is already subject to small estimation
>     errors
>
> The corresponding benefit is that, at run time, we pay the cost of an
> additional add and multiply, while the more expensive division is
> required only at dequeue time.
>
> Signed-off-by: Patrick Bellasi
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Vincent Guittot
> Cc: Juri Lelli
> Cc: Todd Kjos
> Cc: Joel Fernandes
> Cc: Steve Muckle
> Cc: Dietmar Eggemann
> Cc: Morten Rasmussen
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/fair.c   | 16 ++++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9d8732dab264..2bd5f1c68da9 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -399,6 +399,7 @@ struct sched_avg {
>  	u64				load_sum;
>  	u64				runnable_load_sum;
>  	u32				util_sum;
> +	u32				running_sum;
>  	u32				period_contrib;
>  	unsigned long			load_avg;
>  	unsigned long			runnable_load_avg;

Should we update the documentation comments above the struct too?
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f74441be3f44..5d54d6a4c31f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3161,6 +3161,8 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
>  		sa->runnable_load_sum =
>  			decay_load(sa->runnable_load_sum, periods);
>  		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
> +		if (running)
> +			sa->running_sum = decay_load(sa->running_sum, periods);
>
>  	/*
>  	 * Step 2
> @@ -3176,8 +3178,10 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
>  	sa->load_sum += load * contrib;
>  	if (runnable)
>  		sa->runnable_load_sum += runnable * contrib;
> -	if (running)
> +	if (running) {
>  		sa->util_sum += contrib * scale_cpu;
> +		sa->running_sum += contrib * scale_cpu;
> +	}
>
>  	return periods;
>  }
> @@ -3963,6 +3967,12 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
>  	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
>  }

PELT changes look nice and make sense :)

> +static inline void util_est_enqueue_running(struct task_struct *p)
> +{
> +	/* Initialize the (non-preempted) utilization */
> +	p->se.avg.running_sum = p->se.avg.util_sum;
> +}
> +
>  /*
>   * Check if a (signed) value is within a specified (unsigned) margin,
>   * based on the observation that:
> @@ -4018,7 +4028,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
>  	 * Skip update of task's estimated utilization when its EWMA is
>  	 * already ~1% close to its last activation value.
>  	 */
> -	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
> +	ue.enqueued = p->se.avg.running_sum / LOAD_AVG_MAX;

I guess we are doing an extra division here, which adds some cost.
Does performance look OK with the change?

thanks,

- Joel