Date: Tue, 17 Mar 2020 07:56:07 +0000
From: Morten Rasmussen
To: Daniel Lezcano
Cc: Vincent Guittot, Peter Zijlstra, Ingo Molnar, Juri Lelli,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, linux-kernel,
    Qais Yousef, Valentin Schneider
Subject: Re: [PATCH V2] sched: fair: Use the earliest break even
Message-ID: <20200317075607.GE10914@e105550-lin.cambridge.arm.com>
References: <20200311202625.13629-1-daniel.lezcano@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Daniel,

First, I think letting the scheduler know about desired
minimum idle times is an interesting optimization if the overhead can
be kept to a minimum. I do have a few comments about the patch though.

On Thu, Mar 12, 2020 at 11:04:19AM +0100, Daniel Lezcano wrote:
> On 12/03/2020 09:36, Vincent Guittot wrote:
> > Hi Daniel,
> >
> > On Wed, 11 Mar 2020 at 21:28, Daniel Lezcano wrote:
> >>
> >> In the idle CPU selection process occuring in the slow path via the
> >> find_idlest_group_cpu() function, we pick up in priority an idle CPU
> >> with the shallowest idle state otherwise we fall back to the least
> >> loaded CPU.
> >
> > The idea makes sense but this path is only used by fork and exec so
> > I'm not sure about the real impact
>
> I agree the fork / exec path is called much less often than the wake
> path but it makes more sense for the decision.

Looking at the flow in find_idlest_cpu(), AFAICT,
find_idlest_group_cpu() is not actually making the final choice of CPU,
so going through a lot of trouble there looking at idle states is
pointless. Is there something I don't see?

We follow sd->child until groups == CPUs, which means that
find_idlest_group() actually makes the final choice, as the final group
passed to find_idlest_group_cpu() is a single-CPU group. The flow has
been like that for years, even before you added the initial idle-state
awareness.

I agree with Vincent: if this is really to make a difference, it should
cover wake-ups of existing tasks too, although I'm aware that would be
a more invasive change. As said from the beginning, the idea is fine,
but the current implementation should not make any measurable
difference?

>
> >> In order to be more energy efficient but without impacting the
> >> performances, let's use another criteria: the break even deadline.
> >>
> >> At idle time, when we store the idle state the CPU is entering in, we
> >> compute the next deadline where the CPU could be woken up without
> >> spending more energy to sleep.
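
For reference, the break-even deadline described in the quoted text
above can be sketched as a toy model. This is not the patch's code: the
IdleCpu type, its field names and the formula "deadline = idle entry
time + target residency" are all assumptions made for illustration, and
the selection rule (shallowest state first, then earliest deadline) is
only one reading of the patch subject.

```python
from dataclasses import dataclass


@dataclass
class IdleCpu:
    """Hypothetical snapshot of an idle CPU (names are illustrative only)."""
    cpu: int
    exit_latency_ns: int      # cost of leaving the state; smaller means shallower
    target_residency_ns: int  # assumed minimum stay for the state to pay off
    idle_entry_ns: int        # time at which the CPU entered the idle state

    @property
    def break_even_ns(self) -> int:
        # Assumption: the deadline is the entry time plus the target residency.
        return self.idle_entry_ns + self.target_residency_ns


def pick_idle_cpu(cpus):
    """Prefer the shallowest idle state, then the earliest break-even deadline."""
    return min(cpus, key=lambda c: (c.exit_latency_ns, c.break_even_ns))


cpus = [
    IdleCpu(cpu=0, exit_latency_ns=1_000, target_residency_ns=10_000, idle_entry_ns=50_000),
    IdleCpu(cpu=1, exit_latency_ns=1_000, target_residency_ns=10_000, idle_entry_ns=20_000),
    IdleCpu(cpu=2, exit_latency_ns=100_000, target_residency_ns=500_000, idle_entry_ns=0),
]

chosen = pick_idle_cpu(cpus)
print(chosen.cpu, chosen.break_even_ns)
```

In this made-up data set CPUs 0 and 1 share the shallowest state, so the
tie is broken by the deadline and CPU 1, which went idle earlier, is
picked.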
I don't follow the argument that sleeping longer should improve energy
consumption. The patch doesn't affect the number of idle state
enter/exit cycles, so you spend the same amount of energy on those
transitions. The main change is that idle time gets spread out, so CPUs
are less likely to be in the process of entering an idle state when
they are asked to wake back up again.

Isn't it fair to say that we expect the total number of wake-ups to
remain unchanged? Total busy and idle times across all CPUs should
remain the same too? Unless the chosen idle state is changed, which I
don't think we expect either, there should be no net effect on energy?
The main benefit is reduced wake-up latency, I think.

Regarding the chosen idle state, I'm wondering how this patch affects
the cpuidle governor's idle state selection. Could the spreading of
wake-ups trick the governor into picking a shallower idle state for
some idle CPUs because we actively spread wake-ups rather than
consolidating them? Just a thought.

> >>
> >> At the selection process, we use the shallowest CPU but in addition we
> >> choose the one with the minimal break even deadline instead of relying
> >> on the idle_timestamp. When the CPU is idle, the timestamp has less
> >> meaning because the CPU could have wake up and sleep again several times
> >> without exiting the idle loop. In this case the break even deadline is
> >> more relevant as it increases the probability of choosing a CPU which
> >> reached its break even.

I guess you could improve the idle time stamping without adding the
break-even time; they don't have to go together?

> >>
> >> Tested on:
> >> - a synquacer 24 cores, 6 sched domains
> >> - a hikey960 HMP 8 cores, 2 sched domains, with the EAS and energy probe
> >>
> >> sched/perf and messaging does not show a performance regression. Ran
> >> 50 times schbench, adrestia and forkbench.
> >>
> >> The tools described at https://lwn.net/Articles/724935/
> >>
> >> --------------------------------------------------------------
> >> | Synquacer           | With break even | Without break even |
> >> --------------------------------------------------------------
> >> | schbench *99.0th    |      14844.8    |       15017.6      |
> >> | adrestia / periodic |        57.95    |            57      |
> >> | adrestia / single   |         49.3    |          55.4      |
> >> --------------------------------------------------------------
> >
> > Have you got some figures or cpuidle statistics for the syncquacer ?
>
> No, and we just noticed the syncquacer has a bug in the firmware and
> does not actually go to the idle states.

I would also like some statistics to help understand what actually
changes.

I did some measurements on TX2, which only has one idle state. I don't
see the same trends as you do. adrestia single seems to be most
affected by the patch, but it _increases_ with the break_even patch
rather than decreasing. I don't trust adrestia too much though, as the
time resolution is low on TX2.

TX2                     tip             break_even
----------------------------------------------------
adrestia / single       5.21            5.51
adrestia / periodic     5.75            5.67
schbench 99.0th         45465.6         45376.0
hackbench               27.9851         27.9775

Notes:
adrestia: Avg of 100 runs: adrestia -l 25000
schbench: Avg of 10 runs: schbench -m16 -t64
hackbench: Avg of 10 runs: hackbench -g 20 -T 256 -l 100000

Morten
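
For reference, the relative differences between the two TX2 columns
above can be worked out with a few lines of Python. The numbers are
copied from the table; the relative_change_pct() helper is only an
illustrative name, and a positive value simply means the figure is
larger with the break_even patch than on tip.

```python
# (benchmark, tip, break_even) as reported in the TX2 table above.
results = [
    ("adrestia / single", 5.21, 5.51),
    ("adrestia / periodic", 5.75, 5.67),
    ("schbench 99.0th", 45465.6, 45376.0),
    ("hackbench", 27.9851, 27.9775),
]


def relative_change_pct(base: float, new: float) -> float:
    """Change of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


for name, tip, break_even in results:
    print(f"{name:<20} {relative_change_pct(tip, break_even):+.2f}%")
```

Only adrestia / single moves by more than a couple of percent, which
fits the remark above that it is the benchmark most affected by the
patch on this machine.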