Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp289842imu; Fri, 25 Jan 2019 02:19:44 -0800 (PST) X-Google-Smtp-Source: ALg8bN7rIO2q3lC+LnHy/LzXVOON+hjqczvojXW3W3aqetIW7I8Y5RRuKoid+ZTlyI80YU+TLZdX X-Received: by 2002:a62:2a4b:: with SMTP id q72mr10208979pfq.61.1548411584718; Fri, 25 Jan 2019 02:19:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548411584; cv=none; d=google.com; s=arc-20160816; b=0lUWzRHTt0tpWBDXNl+jAdXm9+S/rh+yZP+CnASIqcardIC3rSb20CPf7P3N4JzqKS 0xnilTEUxX1uykgTln0L8u1LIeQ3wf5CnRlOhC3dtMGotFW73Cu4DC7anYwIT2hcxe0q RfT5HHYh5XRCbDo+Iys3biBcwYybn+6WmVERkPlxeA3rVwhWhxzriXwonYh9yFQq9fxI 2cJkIhkMcPOV7gfKISaizkqjnOLsNCrKlNB4dpjZpc6z7NIQQgkh6Kg/giZmv2WYhOhY WLwGTXCb0yef0FJDQNEexw1La/6DoOuwA2uBuKP/CNGH49WhBIZx2W7pJoyTshxEydOC d5Ow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=cieXeQ//y7xkUGvpDjyVuY5IzlnKYqpyI5hEWYBCr7M=; b=IBdLxGU4Wgy/Nyvt0DOKskm2srL7c93sR4E1ElqRV4y0ZrzuZJvH6SpRQhtYUrkMNu sWg69JnAUYclnKjG7zWHeiFnt4WXB5XZVOXUIj2x7/57pZs5C5u5yTEGRuJlg/e2hlYH HRGFayqEVeNBgBFJMKbvRLVltRuBeV6pXP4EOQpPlg+fxEg4WYiN8YwikshVaZhFcmGi jy1vZLGA+mkJ35gEoHp2KlI+b5NgffU9HXVuKGTfiNvf+v3KwKSZjvK8ZLPWM9qmNE/A UfsgBqDOiU/EQKqL1XWVIMzdR3drM5o4aea1APTYU4UYDwL/c9WqXZk9ga+ijN5QzBCh EeZA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o12si25733426plg.250.2019.01.25.02.19.29; Fri, 25 Jan 2019 02:19:44 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727264AbfAYKTG (ORCPT + 99 others); Fri, 25 Jan 2019 05:19:06 -0500 Received: from mail-ot1-f68.google.com ([209.85.210.68]:40470 "EHLO mail-ot1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726111AbfAYKTG (ORCPT ); Fri, 25 Jan 2019 05:19:06 -0500 Received: by mail-ot1-f68.google.com with SMTP id s5so8038756oth.7; Fri, 25 Jan 2019 02:19:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cieXeQ//y7xkUGvpDjyVuY5IzlnKYqpyI5hEWYBCr7M=; b=ieEbK9WT99i+De/a6/qsw+CjIERtD99tEVR3gjsqYqEi/3tokG6Qq7PD2W/uTgUCUO /QPeiIJO9HZb+etoTN0rsGWiB4JYzOyUhJXaf5c23rYMth8n++wA/uWITeBA1YxeFJqQ MlhtyNiYZ32vUNCIEPyWBrTkMGwULfxTIrqe3Amw9RUSeM12AjDm/DATZ/0+28BWLhMW XzOLqdDgXf5t6L9tkyBSzdKVKM2Uk2rIuXmR7yoi/xI7kuMr/ZPl6RpdiqSzGZMuW3+1 dEzc17k2M6nZe/uwdEcQAwSc7jqlrq22OLzo5SDo7tV9cEr8b0gvh5yF0fNUoVrSQrdH cg4w== X-Gm-Message-State: AJcUukdkMMtu8Lhf6U1JsRWE4fYfag4PdcLjKoDAw7jjLFAwfNqRqVl+ FSTqAjj43KJCVrzkCaRt8zzmsdrIit539jQnjNg= X-Received: by 2002:a9d:2062:: with SMTP id n89mr6920554ota.244.1548411544379; Fri, 25 Jan 2019 02:19:04 -0800 (PST) MIME-Version: 1.0 References: <20181129174700.16585-1-ulf.hansson@linaro.org> <20181129174700.16585-4-ulf.hansson@linaro.org> <119031115.C8gHTGbYmQ@aspire.rjw.lan> In-Reply-To: From: "Rafael J. Wysocki" Date: Fri, 25 Jan 2019 11:18:53 +0100 Message-ID: Subject: Re: [PATCH v10 03/27] timer: Export next wakeup time of a CPU To: Ulf Hansson Cc: "Rafael J. Wysocki" , "Rafael J. Wysocki" , Sudeep Holla , Lorenzo Pieralisi , Mark Rutland , Daniel Lezcano , Linux PM , "Raju P . L . S . S . S . N" , Stephen Boyd , Tony Lindgren , Kevin Hilman , Lina Iyer , Viresh Kumar , Vincent Guittot , Geert Uytterhoeven , Linux ARM , linux-arm-msm , Linux Kernel Mailing List , Lina Iyer , Thomas Gleixner , Frederic Weisbecker , Ingo Molnar Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 25, 2019 at 11:04 AM Ulf Hansson wrote: > > [...] > > > > > > > +/** > > > > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU > > > > > > + * @cpu: the particular CPU to get next wake up for > > > > > > + * > > > > > > + * Called for idle CPUs only. > > > > > > + */ > > > > > > +ktime_t tick_nohz_get_next_wakeup(int cpu) > > > > > > +{ > > > > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu); > > > > > > + > > > > > > + return dev->next_event; > > > > > > +} > > > > > > + > > > > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts) > > > > > > { > > > > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE > > > > > > > > > > > > > > > > Well, I have concerns regarding this one. > > > > > > > > > > I don't believe it is valid to call this new function for non-idle CPUs and > > > > > the kerneldoc kind of says so, but the next patch doesn't actually prevent > > > > > it from being called for a non-idle CPU (at the time it is called in there > > > > > the target CPU may not be idle any more AFAICS). > > > > > > > > You are correct, but let me clarify things. > > > > > > > > We are calling this new API from the new genpd governor, which may > > > > have a cpumask indicating there is more than one CPU attached to its > > > > PM domain+sub-PM domains. In other words, we may call the API for > > > > another CPU than the one we are executing on. > > > > > > > > When the new genpd governor is called, all CPUs in the cpumask of the > > > > genpd in question, are already runtime suspended and will remain so > > > > throughout the decisions made by the governor. > > > > > > > > However, because of the race condition, which needs to be manged by > > > > the genpd backend driver and its corresponding FW, one of the CPU in > > > > the genpd cpumask could potentially wake up from idle when the genpd > > > > governor runs. However, as a part of exiting from idle, that CPU needs > > > > to wait for the call to pm_runtime_get_sync() to return before > > > > completing the exit patch of idle. This also means waiting for the > > > > genpd governor to finish. > > > > > > OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off. > > > > Correct. > > > > This is the part that is not very nice, but ideally it should be a > > rather rare condition as it only happens during the last man standing > > point. > > > > > > > > > The point is, no matter what decision the governor takes under these > > > > circumstances, the genpd backend driver and its FW must manage this > > > > race condition during the last man standing. For PSCI OSI mode, it > > > > means that if a cluster idle state is suggested by Linux during these > > > > circumstances, it must be prevented and aborted. > > > > > > I would suggest putting a comment to explain that somewhere as it is > > > not really obvious. > > > > Let me see if can squeeze in that somewhere, probably it's best suited > > in the new genpd governor code somewhere. > > > > > > > > > > > > > > > In principle, the cpuidle core can store this value, say in struct > > > > > cpuidle_device of the given CPU, and expose a helper to access it from > > > > > genpd, but that would be extra overhead totally unnecessary on everthing > > > > > that doesn't use genpd for cpuidle. > > > > > > > > > > So maybe the driver could store it in its ->enter callback? After all, > > > > > the driver knows that genpd is going to be used later. > > > > > > > > This would work, but it wouldn't really change much when it comes to > > > > the race condition described above. > > > > > > No, it wouldn't make the race go away. > > > > > > > Of course it would turn the code > > > > into being more cpuidle specific, which seems reasonable to me. > > > > > > > > Anyway, if I understand your suggestion, in principle it means > > > > changing $subject patch in such way that the API should not take "int > > > > cpu" as an in-parameter, but instead only use __this_cpu() to read out > > > > the next event for current idle CPU. > > > > > > Yes. > > I have looked closer to this and it turned out that it seems that I > should probably not need introduce an entirely new thing here. Instead > I should likely be able to re-factor the current > tick_nohz_get_sleep_length() and tick_nohz_next_event(), as those are > in principle doing the similar things as I need. So I started hacking > on that, when Daniel Lezcano told me that he already have a patch > doing exactly what I want. :-) However, in the context of his "next > wakeup prediction" work, but that shouldn't matter. > > If I can make it work, I will fold in his patch in the next version of > the series instead. > > Please tell if you already at this point, see any issues with this approach. Not in principle as long as you do that in the context of the cpuidle framework. That is, I still would like to have this "cpuidle, give me the wakeup time of this CPU" I/F to the genpd governor, but you can do the above to implement it as long as I'm concerned.