From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Mon, 4 Jun 2018 08:41:51 +0200
Subject: Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization
To: Joel Fernandes
Cc: Juri Lelli, Patrick Bellasi, Peter Zijlstra, Ingo Molnar,
 linux-kernel, "Rafael J. Wysocki", Dietmar Eggemann, Morten Rasmussen,
 Viresh Kumar, Valentin Schneider, Quentin Perret, Luca Abeni,
 Claudio Scordino, Joel Fernandes, "Cc: Android Kernel", Alessio Balsini
In-Reply-To: <20180601174526.GA105687@joelaf.mtv.corp.google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1 June 2018 at 19:45, Joel Fernandes wrote:
> On Fri, Jun 01, 2018 at 03:53:07PM +0200, Vincent Guittot wrote:
>> > >> >> The example with a RT task described in the cover letter can be
>> > >> >> run with a DL task and will give similar results.
>> > >
>> > > In the cover letter you say:
>> > >
>> > >   A rt-app use case which creates an always-running cfs thread and a
>> > >   rt thread that wakes up periodically, with both threads pinned on
>> > >   the same CPU, shows a lot of frequency switches of the CPU whereas
>> > >   the CPU never goes idle during the test.
>> > >
>> > > I would say that's a quite specific corner case where your always
>> > > running CFS task has never accumulated a util_est sample.
>> > >
>> > > Do we really have these cases in real systems?
>> >
>> > My example is deliberately an extreme one because it makes the problem
>> > easier to highlight.
>> >
>> > >
>> > > Otherwise, it seems to me that we are trying to solve quite specific
>> > > corner cases by adding a non-negligible level of "complexity".
>> >
>> > By complexity, do you mean taking into account the number of cfs
>> > running tasks to choose between rq->dl.running_bw and avg_dl.util_avg?
>> >
>> > I'm preparing a patchset that will provide the cfs waiting time in
>> > addition to the dl/rt util_avg for almost no additional cost. I will
>> > try to send the proposal later today.
>>
>> The code below adds tracking of the waiting level of cfs tasks caused by
>> rt/dl preemption.
>> This waiting time can then be used when selecting an OPP instead of the
>> dl util_avg, which can become higher than the dl bandwidth with a "long"
>> runtime.
>>
>> We need only one new call, for the 1st cfs task that is enqueued, to get
>> these additional metrics. The call to arch_scale_cpu_capacity() can be
>> removed once the latter is taken into account when computing the load
>> (which currently scales only with frequency).
>>
>> For rt tasks, we must keep taking util_avg into account to have an idea
>> of the rt level on the cpu, which for dl is given by the bandwidth.
>>
>> ---
>>  kernel/sched/fair.c  | 27 +++++++++++++++++++++++++++
>>  kernel/sched/pelt.c  |  8 ++++++--
>>  kernel/sched/sched.h |  4 +++-
>>  3 files changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index eac1f9a..1682ea7 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5148,6 +5148,30 @@ static inline void hrtick_update(struct rq *rq)
>>  }
>>  #endif
>>
>> +static inline void update_cfs_wait_util_avg(struct rq *rq)
>> +{
>> +	/*
>> +	 * If cfs is already enqueued, we don't have anything to do because
>> +	 * we already updated the non-waiting time.
>> +	 */
>> +	if (rq->cfs.h_nr_running)
>> +		return;
>> +
>> +	/*
>> +	 * If rt is running, we update the non-waiting time before
>> +	 * increasing cfs.h_nr_running.
>> +	 */
>> +	if (rq->curr->sched_class == &rt_sched_class)
>> +		update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
>> +
>> +	/*
>> +	 * If dl is running, we update the non-waiting time before
>> +	 * increasing cfs.h_nr_running.
>> +	 */
>> +	if (rq->curr->sched_class == &dl_sched_class)
>> +		update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
>> +}
>> +
>
> Please correct me if I'm wrong, but the CFS preemption-decay happens in
> set_next_entity -> update_load_avg when the CFS task is scheduled again
> after the preemption. Then can we not fix this issue by doing our
> UTIL_EST magic from set_next_entity?
> But yeah probably we need to be careful with overhead..

util_est is there to keep track of the last max. I'm not sure that trying
to add some magic to take preemption into account is the right way to go.
Mixing several pieces of information into the same metric just adds more
fuzziness to the meaning of the metric.

>
> IMO I feel it's overkill to account dl_avg when we already have DL's
> running bandwidth we can use. I understand it may be too instantaneous,
> but perhaps we

We keep using the dl bandwidth, which is quite correct for dl's needs but
doesn't reflect how dl has disturbed the other classes.

> can fix CFS's problems within CFS itself and not have to do this kind of
> extra external accounting?
>
> I also feel it's better if we don't have to call
> update_{rt,dl}_rq_load_avg from within the CFS class, as is being done
> above.
>
> thanks,
>
> - Joel
>
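For readers following the thread: the "waiting level" signal and util_avg
are both built on PELT's geometric decay, which is why a briefly preempted
but otherwise always-running CFS task loses utilization. Below is a minimal
user-space sketch of that decay, not the kernel's implementation: the
kernel uses fixed-point arithmetic and precomputed tables, and the names
here (PELT_Y, pelt_decay) are illustrative only. The constant y is chosen
so that y^32 == 0.5, i.e. a contribution halves every ~32 ms of 1024 us
periods.

```c
/*
 * Minimal user-space model of PELT-style geometric decay (illustrative
 * only; the real kernel code uses fixed-point math and LOAD_AVG_MAX
 * tables in kernel/sched/pelt.c).
 *
 * Every ~1ms period the tracked signal is multiplied by y, where y is
 * chosen so that y^32 == 0.5: the signal halves every ~32 periods.
 */
#define PELT_Y 0.9785720621 /* approximately 0.5^(1/32) */

/* Decay a utilization value across 'periods' periods spent preempted. */
double pelt_decay(double util, int periods)
{
	for (int i = 0; i < periods; i++)
		util *= PELT_Y;
	return util;
}
```

So a CFS task preempted for ~32 ms sees its tracked utilization drop by
about half even though it never voluntarily slept, which is the signal
drop the patch above tries to compensate for.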
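For context on why the aggregated utilization matters, the OPP selection
being debated feeds into schedutil's get_next_freq(), which maps
utilization to a frequency with roughly 25% headroom: freq = 1.25 *
max_freq * util / max. A simplified sketch of that mapping (the real code
in kernel/sched/cpufreq_schedutil.c also handles frequency invariance and
policy limits):

```c
/*
 * Simplified sketch of schedutil's utilization-to-frequency mapping:
 * request the max frequency scaled by util/max, plus a 25% margin so
 * the CPU is not driven at 100% utilization before ramping up.
 */
unsigned long next_freq(unsigned long util, unsigned long max,
			unsigned long max_freq)
{
	unsigned long freq = max_freq + (max_freq >> 2); /* max_freq * 1.25 */

	return freq * util / max;
}
```

An underestimated utilization (decayed CFS signal, or dl bandwidth that
ignores preemption effects) therefore translates directly into a lower
requested frequency, which is the source of the frequency switches
described in the cover letter.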