From: Vincent Guittot
Date: Wed, 24 Oct 2018 11:07:24 +0200
Subject: Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT
To: pkondeti@codeaurora.org
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, "Rafael J. Wysocki",
    Dietmar Eggemann, Morten Rasmussen, Patrick Bellasi, Paul Turner,
    Ben Segall, Thara Gopinath
In-Reply-To: <20181024045305.GD27587@codeaurora.org>
References: <1539965871-22410-1-git-send-email-vincent.guittot@linaro.org>
    <1539965871-22410-3-git-send-email-vincent.guittot@linaro.org>
    <20181023055937.GC27587@codeaurora.org>
    <20181024045305.GD27587@codeaurora.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Pavan,

On Wed, 24 Oct 2018 at 06:53, Pavan Kondeti wrote:
>
> Hi Vincent,
>
> Thanks for the detailed explanation.
>
> On Tue, Oct 23, 2018 at 02:15:08PM +0200, Vincent Guittot wrote:
> > Hi Pavan,
> >
> > On Tue, 23 Oct 2018 at 07:59, Pavan Kondeti wrote:
> > >
> > > Hi Vincent,
> > >
> > > On Fri, Oct 19, 2018 at 06:17:51PM +0200, Vincent Guittot wrote:
> > > >
> > > >  /*
> > > > + * The clock_pelt scales the time to reflect the effective amount of
> > > > + * computation done during the running delta time but then syncs back to
> > > > + * clock_task when rq is idle.
> > > > + *
> > > > + *
> > > > + * absolute time   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> > > > + * @ max capacity  ------******---------------******---------------
> > > > + * @ half capacity ------************---------************---------
> > > > + * clock pelt      | 1| 2|    3|    4| 7| 8|    9|   10| 11|14|15|16
> > > > + *
> > > > + */
> > > > +void update_rq_clock_pelt(struct rq *rq, s64 delta)
> > > > +{
> > > > +
> > > > +	if (is_idle_task(rq->curr)) {
> > > > +		u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> > > > +		u32 overload = rq->cfs.avg.util_sum + LOAD_AVG_MAX;
> > > > +		overload += rq->avg_rt.util_sum;
> > > > +		overload += rq->avg_dl.util_sum;
> > > > +
> > > > +		/*
> > > > +		 * Reflecting some stolen time makes sense only if the idle
> > > > +		 * phase would be present at max capacity. As soon as the
> > > > +		 * utilization of a rq has reached the maximum value, it is
> > > > +		 * considered as an always running rq without idle time to
> > > > +		 * steal. This potential idle time is considered as lost in
> > > > +		 * this case. We keep track of this lost idle time compared to
> > > > +		 * rq's clock_task.
> > > > +		 */
> > > > +		if (overload >= divider)
> > > > +			rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
> > > > +
> > > I am trying to understand this better. I believe we run into this scenario when
> > > the frequency is limited due to thermal/userspace constraints. Let's say
> >
> > Yes, these are the most common use cases, but this can also happen after task
> > migrations or with a cpufreq governor that doesn't increase the OPP fast
> > enough for the current utilization.
> >
> > > frequency is limited to Fmax/2. A 50% task at Fmax becomes 100% running at
> > > Fmax/2. The utilization is built up to 100% after several periods.
> > > The clock_pelt runs at half the speed of clock_task. We are losing the idle time
> > > all along. What happens when the CPU enters idle for a short duration and comes
> > > back to run this 100% utilization task?
> >
> > If you are at 100%, we only apply the short idle duration.
> > >
> > > If the above block is not present, i.e. lost_idle_time is not tracked, we
> > > stretch the idle time (since clock_pelt is synced to clock_task) and the
> > > utilization is dropped. Right?
> >
> > Yes, that's what would happen. I give more details below.
> > >
> > > With the above block, we don't stretch the idle time. In fact we don't
> > > consider the idle time at all. Because,
> > >
> > > idle_time = now - last_time;
> > >
> > > idle_time = (rq->clock_pelt - rq->lost_idle_time) - last_time
> > > idle_time = (rq->clock_task - rq_clock_task + rq->clock_pelt_old) - last_time
> > > idle_time = rq->clock_pelt_old - last_time
> > >
> > > The last_time is nothing but the last snapshot of rq->clock_pelt when the
> > > task entered sleep, due to which the CPU entered idle.
> >
> > The condition for dropping this idle time is quite important. This
> > only happens when the utilization reaches the max compute capacity of the
> > CPU.
> > Otherwise, the idle time will be fully applied.
>
> Right.
>
> rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt
>
> This not only tracks the lost idle time due to running slow but also the
> absolute/real sleep time. For example, when the slow-running 100% task
> sleeps for 100 msec, are we not ignoring the 100 msec of sleep there?
>
> For example, a task ran 323 msec at full capacity and sleeps for (1000-323)
> msec. When it wakes up, the utilization is dropped. If the same task runs
> for 626 msec at half capacity and sleeps for (1000-626), should we not
> drop the utilization by taking the (1000-626) sleep time into account? I
> understand why we don't stretch the idle time to (1000-323), but it is not
> clear to me why we completely drop the idle time.

So this should not happen.

I'm going to update the way I track lost idle time: move it out of
update_rq_clock_pelt() and only do the test when entering idle.
This is even better as it simplifies update_rq_clock_pelt() and reduces
the number of tests for lost idle time.

Thanks for spotting this.

I'm preparing a new version with this, some build fixes for !SMP, and the
cache line alignment suggested by Peter.

Vincent
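
To make the behaviour discussed in this thread concrete, here is a minimal,
self-contained user-space sketch, not the kernel implementation: all names
(sim_rq, sim_run, sim_idle, sim_enter_idle, sim_pelt_now) and the time units
are invented for illustration, and the saturation test is reduced to a flag.
It models the idea from the quoted comment: PELT time advances scaled by the
current capacity while the CPU runs, is synced back to clock_task when the rq
goes idle, and the gap is recorded as lost idle time only when utilization has
already reached the maximum.

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024ULL

/* Toy run-queue state: only the fields relevant to the discussion above. */
struct sim_rq {
	unsigned long long clock_task;     /* unscaled task clock */
	unsigned long long clock_pelt;     /* capacity-scaled PELT clock */
	unsigned long long lost_idle_time; /* gap dropped when rq is saturated */
};

/* Account @delta of running time at @capacity (1024 == max capacity). */
static void sim_run(struct sim_rq *rq, unsigned long long delta,
		    unsigned long long capacity)
{
	rq->clock_task += delta;
	/* PELT time advances more slowly when capacity is reduced. */
	rq->clock_pelt += delta * capacity / SCHED_CAPACITY_SCALE;
}

/* Account @delta of real idle time: both clocks advance at full rate. */
static void sim_idle(struct sim_rq *rq, unsigned long long delta)
{
	rq->clock_task += delta;
	rq->clock_pelt += delta;
}

/*
 * On idle entry, sync the PELT clock back to the task clock.  If the rq was
 * already at maximum utilization, the gap could not have shown up as idle
 * time even at max capacity, so it is recorded as lost instead.
 */
static void sim_enter_idle(struct sim_rq *rq, int saturated)
{
	if (saturated)
		rq->lost_idle_time += rq->clock_task - rq->clock_pelt;
	rq->clock_pelt = rq->clock_task;
}

/* What a PELT consumer reads, per the derivation quoted in the thread. */
static unsigned long long sim_pelt_now(const struct sim_rq *rq)
{
	return rq->clock_pelt - rq->lost_idle_time;
}

int main(void)
{
	struct sim_rq fast = { 0, 0, 0 }, slow = { 0, 0, 0 };
	unsigned long long before, after;

	/* A 50% task at max capacity: 8 units running, 8 units really idle. */
	sim_run(&fast, 8, 1024);
	sim_enter_idle(&fast, 0);          /* idle time genuinely available */
	before = sim_pelt_now(&fast);
	sim_idle(&fast, 8);
	after = sim_pelt_now(&fast);
	printf("max capacity : %llu units of idle seen by PELT\n", after - before);

	/* Same work at half capacity: 16 units running, no real idle left. */
	sim_run(&slow, 16, 512);
	sim_enter_idle(&slow, 1);          /* utilization reached the max */
	before = sim_pelt_now(&slow);
	sim_idle(&slow, 0);
	after = sim_pelt_now(&slow);
	printf("half capacity: %llu units of idle seen by PELT, %llu lost\n",
	       after - before, slow.lost_idle_time);

	return 0;
}

Under this toy model the half-capacity run ends with clock_task 8 units ahead
of clock_pelt; because utilization is already at the maximum, that gap is
accounted as lost rather than turned into PELT-visible idle time, which is the
behaviour questioned and clarified in the exchange above.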