Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1067569imu; Wed, 16 Jan 2019 12:10:17 -0800 (PST) X-Google-Smtp-Source: ALg8bN4gViotJ+ryzMpIZcw0kjAd7elEIJwPlFEYmZ7qUyfjteLRYuu4BtVTJt1unY0nuzox8Pc+ X-Received: by 2002:a17:902:6e0f:: with SMTP id u15mr11544818plk.175.1547669417872; Wed, 16 Jan 2019 12:10:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547669417; cv=none; d=google.com; s=arc-20160816; b=bQTVAwl+vDkcLxZrBwvQ6HoYd0Cf+EwJiEAkMeUWWS2llq/x8cDhSPtGtQf/+GAdBj r7/J8GRIWQqRfLhV+Jw+13CIB2ie3nhf/2bT0taZ0QuJ2YArKwzEQA1OKFTFot+7AGMD Dp3SFUhL4iK7SbIF/v2yRE4TyZgGFRs8gNfDOCjp9LvrDnPw1GLyBYvT2rc/GuNtoLua TmKhohfP6Q89PGacNR/Bi9wN6x9kB/SSEpGU8Il72O0+wYv4Suymyur8X1eW5LgXZch2 xn+ddg/01tSY2CYbCc4RjPn6iFpi99nkV0SpCudZRgNlHJEM0P2FpU0HW65VkxZnbt3h czqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=pgxm3kChNw8oK1BV6R/D92bOBtVUjtxC+UQG7yRRsDE=; b=RiBYRxHt0Nj5W+/d3RjtMHV1rXTSIObGkR+oaIEsvfpvMR78Q2TLEjm3lPdcCFd5Qw VNwEHGpKDwpwZzov9b4iXjfDRqG1ToB6bVGiVULh4OBnkyHaAmzFCLAJtHQbQAso4YLI IUyMAisyCkxvBN8dxvaheCl8wCtX9k5QCRNVZWMaQW6gRL1xQrzAUTJaqYyRGbgEp0rp VGD/7KJ0xhGpLCQ2QVJkwgBxTHVtwHvAD1uoBj26Ct6BwXqX2y2LVNdCS7hZaiqXTCxS yIfmKBF5grs6c1ZZ2KR1LKhtjhbAWtngEJmH3fwo8qluSOIm1EE77ftVOapDUCIO4Omm Xgeg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JxJS7eYn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d7si7222908pfo.108.2019.01.16.12.10.01; Wed, 16 Jan 2019 12:10:17 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=JxJS7eYn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391608AbfAPMAv (ORCPT + 99 others); Wed, 16 Jan 2019 07:00:51 -0500 Received: from mail-ua1-f65.google.com ([209.85.222.65]:37723 "EHLO mail-ua1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391450AbfAPMAv (ORCPT ); Wed, 16 Jan 2019 07:00:51 -0500 Received: by mail-ua1-f65.google.com with SMTP id u19so2062673uae.4 for ; Wed, 16 Jan 2019 04:00:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=pgxm3kChNw8oK1BV6R/D92bOBtVUjtxC+UQG7yRRsDE=; b=JxJS7eYnc6IfjfgEpI48yRwQyPcr7WfXJQZh2wt9L5ImQwQ3xcYZ6DV7Vs7j5sKNqh Qk43YwkxJgZ8zZXEQyyOnTyYMublI9RNq3b2tuQDRRrsg6K7kjxxCWpSxhrL9//It4tG xyHIeNjLjAUWLUS+rllZWzZZ3LU40DYiUmsyQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=pgxm3kChNw8oK1BV6R/D92bOBtVUjtxC+UQG7yRRsDE=; b=GhUV7/1ZArwUNFf9hyLd7aav2WY6YEOpP4G3WEQG+iHXaA0NcwZZn+zUMsS0KD2IRS 8XN1GyTEGtg7eK8AsHbH5hTeeCUWHfOci8wVzHrXLH7ojxW7p431444qSyDm5ug/0crr fMCFQSnXH0L31IJb5rlsIkEVVBx80MVOEIUu5rlqT6GoPMZPG3P6LayeJVwLg+MfUrts T/lvwWrS/k3L7PcekJvQ5G9BJQDivPY59bnhZuZdmyVkQAeirFxdGJCjz5cgHw+7xnj1 3kOK71Xg9+NiVTRamXkF+4a+mnTPSkL5LbGmAILHiWoJruqdzIytP2urrblUqDohQxjV IJIw== X-Gm-Message-State: AJcUukefJg2fwRcZMgHvomwbX3UytkmP3c4mpQq84AS9g4qAIYq6aDcs nDTtnkwa7FjGjDVqFuQTLOek6GQmwmjW9bE4zxoCAA== X-Received: by 2002:ab0:526:: with SMTP id 35mr3521294uax.25.1547640049661; Wed, 16 Jan 2019 04:00:49 -0800 (PST) MIME-Version: 1.0 References: <20181129174700.16585-1-ulf.hansson@linaro.org> <20181129174700.16585-4-ulf.hansson@linaro.org> <119031115.C8gHTGbYmQ@aspire.rjw.lan> In-Reply-To: From: Ulf Hansson Date: Wed, 16 Jan 2019 13:00:13 +0100 Message-ID: Subject: Re: [PATCH v10 03/27] timer: Export next wakeup time of a CPU To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , Sudeep Holla , Lorenzo Pieralisi , Mark Rutland , Daniel Lezcano , Linux PM , "Raju P . L . S . S . S . N" , Stephen Boyd , Tony Lindgren , Kevin Hilman , Lina Iyer , Viresh Kumar , Vincent Guittot , Geert Uytterhoeven , Linux ARM , linux-arm-msm , Linux Kernel Mailing List , Lina Iyer , Thomas Gleixner , Frederic Weisbecker , Ingo Molnar Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 16 Jan 2019 at 11:59, Rafael J. Wysocki wrote: > > On Wed, Jan 16, 2019 at 8:58 AM Ulf Hansson wrote: > > > > On Fri, 11 Jan 2019 at 12:07, Rafael J. Wysocki wrote: > > > > > > On Thursday, November 29, 2018 6:46:36 PM CET Ulf Hansson wrote: > > > > From: Lina Iyer > > > > > > > > Knowing the sleep duration of CPUs, is known to be needed while selecting > > > > the most energy efficient idle state for a CPU or a group of CPUs. > > > > > > > > However, to be able to compute the sleep duration, we need to know at what > > > > time the next expected wakeup is for the CPU. Therefore, let's export this > > > > information via a new function, tick_nohz_get_next_wakeup(). Following > > > > changes make use of it. > > > > > > > > Cc: Thomas Gleixner > > > > Cc: Daniel Lezcano > > > > Cc: Lina Iyer > > > > Cc: Frederic Weisbecker > > > > Cc: Ingo Molnar > > > > Signed-off-by: Lina Iyer > > > > Co-developed-by: Ulf Hansson > > > > Signed-off-by: Ulf Hansson > > > > --- > > > > > > > > Changes in v10: > > > > - Updated function header of tick_nohz_get_next_wakeup(). > > > > > > > > --- > > > > include/linux/tick.h | 8 ++++++++ > > > > kernel/time/tick-sched.c | 13 +++++++++++++ > > > > 2 files changed, 21 insertions(+) > > > > > > > > diff --git a/include/linux/tick.h b/include/linux/tick.h > > > > index 55388ab45fd4..e48f6b26b425 100644 > > > > --- a/include/linux/tick.h > > > > +++ b/include/linux/tick.h > > > > @@ -125,6 +125,7 @@ extern bool tick_nohz_idle_got_tick(void); > > > > extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next); > > > > extern unsigned long tick_nohz_get_idle_calls(void); > > > > extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu); > > > > +extern ktime_t tick_nohz_get_next_wakeup(int cpu); > > > > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); > > > > extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time); > > > > > > > > @@ -151,6 +152,13 @@ static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next) > > > > *delta_next = TICK_NSEC; > > > > return *delta_next; > > > > } > > > > + > > > > +static inline ktime_t tick_nohz_get_next_wakeup(int cpu) > > > > +{ > > > > + /* Next wake up is the tick period, assume it starts now */ > > > > + return ktime_add(ktime_get(), TICK_NSEC); > > > > +} > > > > + > > > > static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; } > > > > static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; } > > > > > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > > > index 69e673b88474..7a9166506503 100644 > > > > --- a/kernel/time/tick-sched.c > > > > +++ b/kernel/time/tick-sched.c > > > > @@ -1089,6 +1089,19 @@ unsigned long tick_nohz_get_idle_calls(void) > > > > return ts->idle_calls; > > > > } > > > > > > > > +/** > > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU > > > > + * @cpu: the particular CPU to get next wake up for > > > > + * > > > > + * Called for idle CPUs only. > > > > + */ > > > > +ktime_t tick_nohz_get_next_wakeup(int cpu) > > > > +{ > > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu); > > > > + > > > > + return dev->next_event; > > > > +} > > > > + > > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts) > > > > { > > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE > > > > > > > > > > Well, I have concerns regarding this one. > > > > > > I don't believe it is valid to call this new function for non-idle CPUs and > > > the kerneldoc kind of says so, but the next patch doesn't actually prevent > > > it from being called for a non-idle CPU (at the time it is called in there > > > the target CPU may not be idle any more AFAICS). > > > > You are correct, but let me clarify things. > > > > We are calling this new API from the new genpd governor, which may > > have a cpumask indicating there is more than one CPU attached to its > > PM domain+sub-PM domains. In other words, we may call the API for > > another CPU than the one we are executing on. > > > > When the new genpd governor is called, all CPUs in the cpumask of the > > genpd in question, are already runtime suspended and will remain so > > throughout the decisions made by the governor. > > > > However, because of the race condition, which needs to be manged by > > the genpd backend driver and its corresponding FW, one of the CPU in > > the genpd cpumask could potentially wake up from idle when the genpd > > governor runs. However, as a part of exiting from idle, that CPU needs > > to wait for the call to pm_runtime_get_sync() to return before > > completing the exit patch of idle. This also means waiting for the > > genpd governor to finish. > > OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off. Correct. This is the part that is not very nice, but ideally it should be a rather rare condition as it only happens during the last man standing point. > > > The point is, no matter what decision the governor takes under these > > circumstances, the genpd backend driver and its FW must manage this > > race condition during the last man standing. For PSCI OSI mode, it > > means that if a cluster idle state is suggested by Linux during these > > circumstances, it must be prevented and aborted. > > I would suggest putting a comment to explain that somewhere as it is > not really obvious. Let me see if can squeeze in that somewhere, probably it's best suited in the new genpd governor code somewhere. > > > > > > > In principle, the cpuidle core can store this value, say in struct > > > cpuidle_device of the given CPU, and expose a helper to access it from > > > genpd, but that would be extra overhead totally unnecessary on everthing > > > that doesn't use genpd for cpuidle. > > > > > > So maybe the driver could store it in its ->enter callback? After all, > > > the driver knows that genpd is going to be used later. > > > > This would work, but it wouldn't really change much when it comes to > > the race condition described above. > > No, it wouldn't make the race go away. > > > Of course it would turn the code > > into being more cpuidle specific, which seems reasonable to me. > > > > Anyway, if I understand your suggestion, in principle it means > > changing $subject patch in such way that the API should not take "int > > cpu" as an in-parameter, but instead only use __this_cpu() to read out > > the next event for current idle CPU. > > Yes. > > > Additionally, we need another new cpuidle API, which genpd can call to > > retrieve a new per CPU "next event data" stored by the cpuidle driver > > from its ->enter() callback. Is this a correct interpretation of your > > suggestion? > > Yes, it is. Thanks for confirming! > > Generally, something like "cpuidle, give me the wakeup time of this > CPU". And it may very well give you 0 if the CPU has woken up > already. :-) Yep, I was thinking something like that, so in principle it may minimize the window of receiving in-correct "next wakeup data" in genpd for a non-idle CPU, but again it doesn't solve the race condition. Alright, I re-spin this according to your suggestions. Thanks for reviewing! Kind regards Uffe