Date: Mon, 9 Apr 2018 14:45:11 +0100
From: Quentin Perret
To: Peter Zijlstra
Cc: Dietmar Eggemann, linux-kernel@vger.kernel.org, Thara Gopinath,
    linux-pm@vger.kernel.org, Morten Rasmussen, Chris Redpath,
    Patrick Bellasi, Valentin Schneider, "Rafael J . Wysocki",
    Greg Kroah-Hartman, Vincent Guittot, Viresh Kumar, Todd Kjos,
    Joel Fernandes
Subject: Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs
Message-ID: <20180409134510.GA4577@e108498-lin.cambridge.arm.com>
References: <20180320094312.24081-1-dietmar.eggemann@arm.com>
    <20180320094312.24081-3-dietmar.eggemann@arm.com>
    <20180409120111.GA4043@hirez.programming.kicks-ass.net>
In-Reply-To: <20180409120111.GA4043@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.8.3 (2017-05-23)
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday 09 Apr 2018 at 14:01:11 (+0200), Peter Zijlstra wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +0000, Dietmar Eggemann wrote:
> > From: Quentin Perret
> >
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library which enables an abstract view of
> > the platform from the scheduler. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> >
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture. As long as
> > this assumption is correct, the energy models of different CPUs belonging
> > to the same frequency domain are equal.
> > Hence, this commit builds only one
> > energy model per frequency domain, and links all relevant CPUs to it in
> > order to save time and memory. If needed for future hardware platforms,
> > relaxing this assumption should imply relatively simple modifications in
> > the code but a significantly higher algorithmic complexity.
>
> What this doesn't mention is why this isn't part of the regular topology
> bits. IIRC this is because the frequency domains don't necessarily need
> to align with the existing topology, but this completely fails to state
> any of that.

Yes, that's the main reason. Frequency domains and scheduling domains don't
necessarily align. They used to align on big.LITTLE platforms, but that is
no longer the case with DynamIQ ...

>
> Also, since I'm not at all familiar with DT and the OPP library stuff,
> this code is completely unreadable to me and there isn't a nice comment
> to help me along.

Right, so I can definitely fix that. Comments in the code and a better
commit message should hopefully help. Also, it has already been suggested
that a documentation file should be added alongside the code for this
patchset, so I'll make sure we add that for the next version. In the
meantime, here is a (hopefully) better explanation.

In this specific patch, we are basically trying to figure out the
boundaries of frequency domains, and the power consumed by each CPU at each
OPP, to make them available to the scheduler. The important thing here is
that, in both cases, we rely on the OPP library to keep the code as
platform-agnostic as possible.

In the case of the frequency domains, for example, the cpufreq driver is in
charge of specifying the CPUs that are sharing frequencies. That
information can come from DT, or SCPI, or SCMI, or whatever. We probably
shouldn't have to care about that from the scheduler's standpoint. That's
why using dev_pm_opp_get_sharing_cpus() is handy: the OPP library gives us
the digested information we need.
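To make the "boundaries of frequency domains" idea concrete, here is a
minimal userspace sketch, not the code from the patch. It assumes a made-up
8-CPU platform, represents a cpumask as a plain bitmask, and uses a
hypothetical mock_get_sharing_cpus() as a stand-in for the query that
dev_pm_opp_get_sharing_cpus() answers in the kernel. All names and values
below are illustrative assumptions.

```c
#include <stdio.h>

#define NR_CPUS 8

/*
 * Hypothetical platform data (an assumption for this sketch): CPUs 0-3
 * share one clock, CPUs 4-7 share another. Bit n set means CPU n.
 */
static const unsigned int sharing_table[NR_CPUS] = {
	0x0f, 0x0f, 0x0f, 0x0f, 0xf0, 0xf0, 0xf0, 0xf0,
};

/*
 * Mock stand-in for the OPP library query: returns the mask of CPUs
 * that share a frequency with 'cpu'. Not a kernel API.
 */
static unsigned int mock_get_sharing_cpus(int cpu)
{
	return sharing_table[cpu];
}

/*
 * Store one mask per distinct frequency domain in 'domains' (which must
 * have room for NR_CPUS entries) and return the number of domains found.
 */
static int find_freq_domains(unsigned int *domains)
{
	unsigned int visited = 0;
	int nr = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		/* This CPU already belongs to a domain we recorded. */
		if (visited & (1u << cpu))
			continue;

		unsigned int mask = mock_get_sharing_cpus(cpu);

		domains[nr++] = mask;
		visited |= mask;
	}

	return nr;
}

/* Print every domain found, one mask per line. */
static void print_freq_domains(void)
{
	unsigned int domains[NR_CPUS];
	int nr = find_freq_domains(domains);

	for (int i = 0; i < nr; i++)
		printf("freq domain %d: cpumask 0x%02x\n", i, domains[i]);
}
```

The caller never needs to know whether the sharing information came from
DT, SCPI or SCMI; it only sees the resulting masks, which is the point made
above about keeping the scheduler side platform-agnostic.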
The power values (dev_pm_opp_get_power) we use right now are those already
used by the thermal subsystem (IPA), which means we don't have to introduce
any new DT binding whatsoever. In the near future, the power values could
also come from other sources (SCMI, for example), and again it's probably
not the scheduler's job to care about those things, so the OPP library is
helping us again. As mentioned in the notes, as of today, this approach
depends on other patches relating to these things, which are already on the
list [1].

The rest of the code in this patch is just about iterating over the
CPUs/freq. domains/OPPs. The algorithm is more or less the following:

 1. find a frequency domain which hasn't been visited yet;
 2. estimate the power and capacity of a CPU in this freq domain at each
    possible OPP;
 3. map all CPUs in the freq domain to this list of tuples;
 4. go to 1.

I hope that makes sense.

Thanks,
Quentin

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
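
As an illustration only, the four-step loop listed above could be sketched
in userspace C roughly as follows. This is not the code from the patch: the
platform table, the struct names and mock_domain_of() are hypothetical, the
power numbers are made up, and scaling capacity linearly with frequency
(cap = max_cap * freq / max_freq) is an assumption of this sketch.

```c
#include <stddef.h>

#define NR_CPUS    4
#define NR_OPPS    3
#define NR_DOMAINS 2

/* One OPP as the mock platform reports it (values are made up). */
struct opp {
	unsigned long freq_mhz;
	unsigned long power_mw;
};

/* Hypothetical stand-in for what the OPP library would provide. */
struct platform_domain {
	unsigned int cpus;          /* bitmask of CPUs sharing a clock */
	unsigned long max_cap;      /* capacity at the highest OPP */
	struct opp opps[NR_OPPS];   /* sorted by increasing frequency */
};

static const struct platform_domain platform[NR_DOMAINS] = {
	{ 0x3, 512,  { { 250, 30 },  { 500, 80 },   { 1000, 200 } } },
	{ 0xc, 1024, { { 500, 150 }, { 1000, 450 }, { 2000, 1300 } } },
};

/* A (capacity, power) tuple for one OPP. */
struct cap_state {
	unsigned long cap;
	unsigned long power;
};

/* One energy model per frequency domain, shared by all its CPUs. */
struct energy_model {
	unsigned int cpus;
	struct cap_state states[NR_OPPS];
};

static struct energy_model models[NR_DOMAINS];
static struct energy_model *cpu_em[NR_CPUS];

/* Mock lookup of the frequency domain containing 'cpu'. */
static const struct platform_domain *mock_domain_of(int cpu)
{
	for (int i = 0; i < NR_DOMAINS; i++)
		if (platform[i].cpus & (1u << cpu))
			return &platform[i];
	return NULL;
}

/* Returns the number of models built, or -1 on inconsistent data. */
static int build_energy_models(void)
{
	unsigned int visited = 0;
	int nr = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		/* Step 1: find a frequency domain not visited yet. */
		if (visited & (1u << cpu))
			continue;

		const struct platform_domain *pd = mock_domain_of(cpu);

		if (!pd || nr >= NR_DOMAINS)
			return -1;

		struct energy_model *em = &models[nr++];
		unsigned long max_freq = pd->opps[NR_OPPS - 1].freq_mhz;

		em->cpus = pd->cpus;

		/* Step 2: estimate capacity and power at each OPP.
		 * Linear capacity scaling is an assumption of this sketch. */
		for (int i = 0; i < NR_OPPS; i++) {
			em->states[i].cap =
				pd->max_cap * pd->opps[i].freq_mhz / max_freq;
			em->states[i].power = pd->opps[i].power_mw;
		}

		/* Step 3: map all CPUs of the domain to this one model. */
		for (int c = 0; c < NR_CPUS; c++)
			if (pd->cpus & (1u << c))
				cpu_em[c] = em;

		visited |= pd->cpus;
		/* Step 4: the loop moves on to the next unvisited CPU. */
	}

	return nr;
}
```

With the made-up table above, build_energy_models() builds two models for
four CPUs, so cpu_em[0] and cpu_em[1] point at the same struct. This is the
time and memory saving that the "one energy model per frequency domain"
choice is after.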