From: Vincent Guittot
Date: Fri, 13 Mar 2020 14:15:05 +0100
Subject: Re: [PATCH V2] sched: fair: Use the earliest break even
To: Daniel Lezcano
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, linux-kernel, Qais Yousef, Valentin Schneider

On Fri, 13 Mar 2020 at 13:15, Daniel Lezcano wrote:
>
> On 12/03/2020 13:27, Vincent Guittot wrote:
> > On Thu, 12 Mar 2020 at 11:04, Daniel Lezcano wrote:
> >>
> >> On 12/03/2020 09:36, Vincent Guittot wrote:
> >>> Hi Daniel,
> >>>
> >>> On Wed, 11 Mar 2020 at 21:28, Daniel Lezcano wrote:
> >>>>
> >>>> In the idle CPU selection process occurring in the slow path via the
> >>>> find_idlest_group_cpu() function,
> >>>> we pick in priority an idle CPU with the shallowest idle state;
> >>>> otherwise we fall back to the least loaded CPU.
> >>>
> >>> The idea makes sense but this path is only used by fork and exec, so
> >>> I'm not sure about the real impact.
> >>
> >> I agree the fork / exec path is called much less often than the wake
> >> path, but it makes more sense for the decision.
> >>
> >>>> In order to be more energy efficient but without impacting the
> >>>> performance, let's use another criterion: the break even deadline.
> >>>>
> >>>> At idle time, when we store the idle state the CPU is entering, we
> >>>> compute the next deadline where the CPU could be woken up without
> >>>> having spent more energy to sleep than it saved.
> >>>>
> >>>> At the selection process, we still pick the CPU in the shallowest
> >>>> idle state but, in addition, we choose the one with the minimal break
> >>>> even deadline instead of relying on the idle_timestamp. When the CPU
> >>>> is idle, the timestamp has less meaning because the CPU could have
> >>>> woken up and gone back to sleep several times without exiting the
> >>>> idle loop. In this case the break even deadline is more relevant as
> >>>> it increases the probability of choosing a CPU which has reached its
> >>>> break even.
> >>>>
> >>>> Tested on:
> >>>>  - a synquacer 24 cores, 6 sched domains
> >>>>  - a hikey960 HMP 8 cores, 2 sched domains, with the EAS and energy probe
> >>>>
> >>>> sched/perf and messaging do not show a performance regression. Ran
> >>>> 50 times schbench, adrestia and forkbench.
> >>>>
> >>>> The tools are described at https://lwn.net/Articles/724935/
> >>>>
> >>>> --------------------------------------------------------------
> >>>> | Synquacer           | With break even | Without break even |
> >>>> --------------------------------------------------------------
> >>>> | schbench *99.0th    |     14844.8     |      15017.6       |
> >>>> | adrestia / periodic |     57.95       |      57            |
> >>>> | adrestia / single   |     49.3        |      55.4          |
> >>>> --------------------------------------------------------------
> >>>
> >>> Have you got some figures or cpuidle statistics for the synquacer?
> >>
> >> No, and we just noticed the synquacer has a bug in the firmware and
> >> does not actually go to the idle states.
> >>
> >>
> >>>> | Hikey960            | With break even | Without break even |
> >>>> --------------------------------------------------------------
> >>>> | schbench *99.0th    |     56140.8     |      56256         |
> >>>> | schbench energy     |     153.575     |      152.676       |
> >>>> | adrestia / periodic |     4.98        |      5.2           |
> >>>> | adrestia / single   |     9.02        |      9.12          |
> >>>> | adrestia energy     |     1.18        |      1.233         |
> >>>> | forkbench           |     7.971       |      8.05          |
> >>>> | forkbench energy    |     9.37        |      9.42          |
> >>>> --------------------------------------------------------------
> >>>>
> >>>> Signed-off-by: Daniel Lezcano
> >>>> ---
> >>>>  kernel/sched/fair.c  | 18 ++++++++++++++++--
> >>>>  kernel/sched/idle.c  |  8 +++++++-
> >>>>  kernel/sched/sched.h | 20 ++++++++++++++++++++
> >>>>  3 files changed, 43 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index 4b5d5e5e701e..8bd6ea148db7 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -5793,6 +5793,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
> >>>>  {
> >>>>         unsigned long load, min_load = ULONG_MAX;
> >>>>         unsigned int min_exit_latency = UINT_MAX;
> >>>> +       s64 min_break_even = S64_MAX;
> >>>>         u64 latest_idle_timestamp = 0;
> >>>>         int least_loaded_cpu = this_cpu;
> >>>>         int shallowest_idle_cpu = -1;
> >>>> @@ -5810,6 +5811,8 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
> >>>>                 if (available_idle_cpu(i)) {
> >>>>                         struct rq *rq = cpu_rq(i);
> >>>>                         struct cpuidle_state *idle = idle_get_state(rq);
> >>>> +                       s64 break_even = idle_get_break_even(rq);
> >>>> +
> >>>>                         if (idle && idle->exit_latency < min_exit_latency) {
> >>>>                                 /*
> >>>>                                  * We give priority to a CPU whose idle state
> >>>> @@ -5817,10 +5820,21 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
> >>>>                                  * of any idle timestamp.
> >>>>                                  */
> >>>>                                 min_exit_latency = idle->exit_latency;
> >>>> +                               min_break_even = break_even;
> >>>>                                 latest_idle_timestamp = rq->idle_stamp;
> >>>>                                 shallowest_idle_cpu = i;
> >>>> -                       } else if ((!idle || idle->exit_latency == min_exit_latency) &&
> >>>> -                                  rq->idle_stamp > latest_idle_timestamp) {
> >>>> +                       } else if ((idle && idle->exit_latency == min_exit_latency) &&
> >>>> +                                  break_even < min_break_even) {
> >>>> +                               /*
> >>>> +                                * We give priority to the shallowest
> >>>> +                                * idle states with the minimal break
> >>>> +                                * even deadline to decrease the
> >>>> +                                * probability to choose a CPU which
> >>>> +                                * did not reach its break even yet
> >>>> +                                */
> >>>> +                               min_break_even = break_even;
> >>>> +                               shallowest_idle_cpu = i;
> >>>> +                       } else if (!idle && rq->idle_stamp > latest_idle_timestamp) {
> >>>>                                 /*
> >>>>                                  * If equal or no active idle state, then
> >>>>                                  * the most recently idled CPU might have
> >>>> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> >>>> index b743bf38f08f..3342e7bae072 100644
> >>>> --- a/kernel/sched/idle.c
> >>>> +++ b/kernel/sched/idle.c
> >>>> @@ -19,7 +19,13 @@ extern char __cpuidle_text_start[], __cpuidle_text_end[];
> >>>>   */
> >>>>  void sched_idle_set_state(struct cpuidle_state *idle_state)
> >>>>  {
> >>>> -        idle_set_state(this_rq(), idle_state);
> >>>> +        struct rq *rq = this_rq();
> >>>> +
> >>>> +        idle_set_state(rq, idle_state);
> >>>
> >>> Shouldn't the state be set after setting the break even? Otherwise you
> >>> will have a time window with an idle_state != NULL but the break_even
> >>> still set to the previous value.
> >>
> >> IIUC we are protected in this section. Otherwise the routine above would
> >> also be wrong [if (idle && idle->exit_latency)], no?
> >
> > No, they are not the same, because it uses the idle pointer to read
> > exit_latency, so we are sure to use the exit_latency related to the idle
> > pointer.
> >
> > In your case it checks that idle is not NULL, but then it uses rq to read
> > break_even, which might not have been updated yet.
>
> Ok I will invert the lines.
>
> >>>> +
> >>>> +        if (idle_state)
> >>>> +                idle_set_break_even(rq, ktime_get_ns() +
> >>>
> >>> What worries me a bit is that it adds one ktime_get call each time a
> >>> cpu enters idle.
> >>
> >> Right, we can improve this in the future by folding the local_clock() in
> >> cpuidle when entering idle with this ktime_get.
> >
> > Using local_clock() would be more latency friendly.
>
> Unfortunately we are comparing the deadline across CPUs, so the
> local_clock() can not be used here.
>
> But if we have one ktime_get() instead of a local_clock() + ktime_get(),
> that should be fine, no?

Can't this computation of break_even be done in the cpuidle framework,
while computing the other statistics used for selecting the idle state,
instead? cpuidle already uses ktime_get for the next hrtimer, as an
example. So cpuidle would compute break_even and make it available to the
scheduler, like exit_latency.

And I can imagine that a system wide time value will also be needed when
looking at the next wakeup event of a cluster/group of CPUs.

> >>>> +                                    idle_state->exit_latency_ns);
> >>>>  }
> >>
> >> [ ... ]
> >>
> >> --
> >> Linaro.org │ Open source software for ARM SoCs
> >>
> >> Follow Linaro: Facebook | Twitter | Blog
>
> --
> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro: Facebook | Twitter | Blog
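
A minimal sketch of the reordering agreed above ("Ok I will invert the
lines"): the break even deadline is published before the idle state
pointer, so a CPU that observes a non-NULL idle_state through
idle_get_state() also observes a break_even value belonging to that state.
This is only an illustration based on the hunks quoted in this thread
(idle_set_break_even() and exit_latency_ns come from the patch), not the
final version of the patch:

void sched_idle_set_state(struct cpuidle_state *idle_state)
{
        struct rq *rq = this_rq();

        /* Publish the break even deadline before the idle state pointer. */
        if (idle_state)
                idle_set_break_even(rq, ktime_get_ns() +
                                    idle_state->exit_latency_ns);

        idle_set_state(rq, idle_state);
}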
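
And a rough sketch of the alternative suggested above, with cpuidle
computing the break even at idle entry and exposing it to the scheduler
the same way the idle state pointer is exposed today. The helper names and
the per-CPU storage are hypothetical, made up only to illustrate the idea;
this is not an existing cpuidle API:

static DEFINE_PER_CPU(u64, cpuidle_break_even_ns);

/* Called by cpuidle when a state is entered; one ktime_get_ns() per entry. */
void cpuidle_set_break_even(struct cpuidle_state *state)
{
        __this_cpu_write(cpuidle_break_even_ns,
                         ktime_get_ns() + state->exit_latency_ns);
}

/* Read side for the scheduler, next to idle_get_state()/exit_latency. */
u64 cpuidle_get_break_even(int cpu)
{
        return per_cpu(cpuidle_break_even_ns, cpu);
}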