From: Ulf Hansson
Date: Fri, 24 Aug 2018 11:26:19 +0200
Subject: Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs
To: Lorenzo Pieralisi
Cc: "Rafael J. Wysocki", "Rafael J. Wysocki", Sudeep Holla, Mark Rutland,
    Linux PM, Kevin Hilman, Lina Iyer, Lina Iyer, Rob Herring,
    Daniel Lezcano, Thomas Gleixner, Vincent Guittot, Stephen Boyd,
    Juri Lelli, Geert Uytterhoeven, Linux ARM, linux-arm-msm,
    Linux Kernel Mailing List, Frederic Weisbecker, Ingo Molnar
In-Reply-To: <20180809153925.GA20329@red-moon>
References: <20180620172226.15012-1-ulf.hansson@linaro.org>
    <20180620172226.15012-8-ulf.hansson@linaro.org>
    <3574880.GjmnMm1lMq@aspire.rjw.lan>
    <10360149.m4MlxDWZY5@aspire.rjw.lan>
    <20180809153925.GA20329@red-moon>

On 9 August 2018 at 17:39, Lorenzo Pieralisi wrote:
> On Mon, Aug 06, 2018 at 11:20:59AM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
>> >>> > @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
>> >>> >         return false;
>> >>> >  }
>> >>> >
>> >>> > +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
>> >>> > +{
>> >>> > +       struct generic_pm_domain *genpd = pd_to_genpd(pd);
>> >>> > +       ktime_t domain_wakeup, cpu_wakeup;
>> >>> > +       s64 idle_duration_ns;
>> >>> > +       int cpu, i;
>> >>> > +
>> >>> > +       if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
>> >>> > +               return true;
>> >>> > +
>> >>> > +       /*
>> >>> > +        * Find the next wakeup for any of the online CPUs within the PM domain
>> >>> > +        * and its subdomains. Note, we only need the genpd->cpus, as it already
>> >>> > +        * contains a mask of all CPUs from subdomains.
>> >>> > +        */
>> >>> > +       domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
>> >>> > +       for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
>> >>> > +               cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
>> >>> > +               if (ktime_before(cpu_wakeup, domain_wakeup))
>> >>> > +                       domain_wakeup = cpu_wakeup;
>> >>> > +       }
>> >>
>> >> Here's a concern I have missed before. :-/
>> >>
>> >> Say, one of the CPUs you're walking here is woken up in the meantime.
>> >
>> > Yes, that can happen - when we mispredicted "next wakeup".
>> >
>> >>
>> >> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it then
>> >> to update domain_wakeup. We really should just avoid the domain power off in
>> >> that case at all IMO.
>> >
>> > Correct.
>> >
>> > However, we also want to avoid locking contentions in the idle path,
>> > which is what this boils down to.
>>
>> This already is done under genpd_lock() AFAICS, so I'm not quite sure
>> what exactly you mean.
>>
>> Besides, this is not just about increased latency, which is a concern
>> by itself but maybe not so much in all environments, but also about
>> the possibility of missing a CPU wakeup, which is a major issue.
>>
>> If one of the CPUs sharing the domain with the current one is woken up
>> during cpu_power_down_ok() and the wakeup is an edge-triggered
>> interrupt and the domain is turned off regardless, the wakeup may be
>> missed entirely if I'm not mistaken.
>>
>> It looks like there needs to be a way for the hardware to prevent a
>> domain poweroff when there's a pending interrupt or I don't quite see
>> how this can be handled correctly.
>>
>> >> Sure enough, if the domain power off is already started and one of the CPUs
>> >> in the domain is woken up then, too bad, it will suffer the latency (but in
>> >> that case the hardware should be able to help somewhat), but otherwise CPU
>> >> wakeup should prevent domain power off from being carried out.
>> >
>> > The CPU is not prevented from waking up, as we rely on the FW to deal with that.
>> >
>> > Even if the above computation turns out to wrongly suggest that the
>> > cluster can be powered off, the FW shall together with the genpd
>> > backend driver prevent it.
>>
>> Fine, but then the solution depends on specific FW/HW behavior, so I'm
>> not sure how generic it really is. At least, that expectation should
>> be clearly documented somewhere, preferably in code comments.
>>
>> > To cover this case for PSCI, we also use a per cpu variable for the
>> > CPU's power off state, as can be seen later in the series.
>>
>> Oh great, but the generic part should be independent of the underlying
>> implementation of the driver. If it isn't, then it also is not
>> generic.
>>
>> > Hope this clarifies your concern, else tell me and I will elaborate a bit more.
>>
>> Not really.
>>
>> There also is one more problem and that is the interaction between
>> this code and the idle governor.
>>
>> Namely, the idle governor may select a shallower state for some
>> reason, for example due to an additional latency limit derived from
>> CPU utilization (like in the menu governor), and how does the code in
>> cpu_power_down_ok() know what state has been selected and how does it
>> honor the selection made by the idle governor?
>
> That's a good question and it maybe gives a path towards a solution.
>
> AFAICS the genPD governor only selects the idle state parameter that
> determines the idle state at, say, GenPD cpumask level; it does not touch
> the CPUidle decision, that works on a subset of idle states (at cpu
> level).
>
> That's my understanding, which can be wrong so please correct me
> if that's the case because that's a bit confusing.
>
> Let's imagine that we flattened out the list of idle states and feed
> CPUidle with it (all of them - cpu, cluster, package, system - as it is
> in the mainline _now_). Then the GenPD governor can run through the
> CPUidle selection and _demote_ the idle state if necessary since it
> understands that some CPUs in the GenPD will wake up shortly and break
> the target residency hypothesis the CPUidle governor is expecting.
>
> The whole idea about this series is improving CPUidle decision when
> the target idle state is _shared_ among groups of cpus (again, please
> do correct me if I am wrong).

Absolutely, this is one of the main reasons for the series!

>
> It is obvious that a GenPD governor must only demote - never promote a
> CPU idle state selection given that hierarchy implies more power
> savings and higher target residencies required.

Absolutely. I apologize if I have been using the word "promote"
wrongly, I realize it may be a bit confusing.

>
> This whole series would become more generic and won't depend on
> PSCI OSI at all - actually that would become a hierarchical
> CPUidle governor.

Well, to me we need a first user of the new infrastructure code in
genpd and PSCI is probably the easiest one to start with. An option
would be to start with an old ARM32 platform, but it seems a bit silly
to me.

In regards to OS-initiated mode vs platform coordinated mode, let's
discuss that in detail in the other email thread instead.

>
> I still think that PSCI firmware and most certainly mwait() play the
> role the GenPD governor does since they can detect in FW/HW whether
> that's worthwhile to switch off a domain, the information is obviously
> there and the kernel would just add latency to the idle path in that
> case but let's gloss over this for the sake of this discussion.

Yep, let's discuss that separately.

That said, can I interpret your comments on the series up until this
change as meaning that you seem rather happy with where the series is
going?

Kind regards
Uffe
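
For reference, the "demote only, never promote" rule discussed above can
be sketched in plain userspace C. This is only an illustration under
stated assumptions and not the code from the patch: struct demo_state,
demo_domain_wakeup() and demo_demote() are hypothetical names, times are
plain nanosecond integers rather than ktime_t, and the residency check
(residency plus exit latency) is an assumed simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one domain idle state. */
struct demo_state {
	int64_t residency_ns;    /* minimum idle time for the state to pay off */
	int64_t exit_latency_ns; /* time needed to leave the state again */
};

/*
 * Earliest next wakeup among the CPUs of a domain, mirroring the loop
 * over genpd->cpus in the quoted hunk. Returns INT64_MAX when no CPU
 * is given.
 */
static int64_t demo_domain_wakeup(const int64_t *cpu_wakeup_ns, size_t ncpus)
{
	int64_t earliest = INT64_MAX;

	for (size_t i = 0; i < ncpus; i++)
		if (cpu_wakeup_ns[i] < earliest)
			earliest = cpu_wakeup_ns[i];

	return earliest;
}

/*
 * Demote-only selection. States are assumed sorted from shallow (0) to
 * deep. Start at the state already picked by the CPU idle governor and
 * walk towards shallower states until the expected idle time covers
 * the state's cost. The result is never deeper than 'selected'.
 * Returns -1 when not even state 0 fits, i.e. the domain stays on.
 */
static int demo_demote(const struct demo_state *states, int selected,
		       int64_t now_ns, int64_t domain_wakeup_ns)
{
	int64_t idle_ns = domain_wakeup_ns - now_ns;

	for (int i = selected; i >= 0; i--)
		if (idle_ns >= states[i].residency_ns + states[i].exit_latency_ns)
			return i;

	return -1;
}
```

Note that the sketch does not address the race raised earlier in the
thread: a CPU may be woken up after its next wakeup has been sampled, so
the result is only a prediction and the FW/HW still has to veto the
power off.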