Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp277195imu; Fri, 25 Jan 2019 02:05:08 -0800 (PST) X-Google-Smtp-Source: ALg8bN5nemCZJ3q6XrrZyudatGpxbP1DBCDpcPMDOmhu55rjctxesI2VGgKLDqgLf4hXKQ4DxwTb X-Received: by 2002:a17:902:4601:: with SMTP id o1mr10274405pld.243.1548410708807; Fri, 25 Jan 2019 02:05:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548410708; cv=none; d=google.com; s=arc-20160816; b=yFkOY92llVeuTq+YxiQpgijsXGJEOmYS9T5iQk3LHgmi1ARswwxnV9L97ptg+8nuuS LndqZBV3JJ9jBmnBpAuff3gEPG01oXAirgw482OZeq2cl8+nowrexFVLQDElGkEVoy0l 1/0xQLd9z3V0ndsnfr9nVt78n4ABSX+r0rkMK6yFB0gXL/x/POX6rsU5abAD1IfnrLlF r+bGyUsRgic0sRipcDdXCgSCLQ6J4XaigzSSZKewiX5J75vAEH6Y6CW+YWcD47Yuj6B+ bH+LZ5Iyhv8kM5rnvKbeM6UErygKlZBb/mvo+fUhUBgrbqtkmwN9UqFisVeWACGvOwGz uY7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=36F1UyrhDpsJESxu4ZGkKTzuU4bNd6HP1thiwh5cFRo=; b=ddytOAFBw1xcZFJARCcknOOLVdA7K3HD3M8lFCYTEgehOfteLkRs2HIVoILMUOTNC2 yoEb1Pi6m7PP9g4fe95R17ald4MnuBhswut9HRIEsNMP/D4E6SKsCES7fqX4m6Y2EB+o z47t6EOs8wuCXNL7aT5vhWeh+umEFIUZ0DgJYYXLx+ASWH6Ts4r+o8ApM8CyxiKJISXL zqggd4GheePhcWBH3/ILgdR0nRXFBBrNF4yHTM/cp9GPJZ1mqh1nnZ2cXdQjJDFi/yHT tNhO7hm8Uq14uf1mfYuy9H/LaRDDMkbpGFKBISfdoAQ5nSy3DAEHm3VbcYncBlwjA2eo kkjw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ENrEsONO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w67si5595650pgb.45.2019.01.25.02.04.53; Fri, 25 Jan 2019 02:05:08 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=ENrEsONO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728920AbfAYKEo (ORCPT + 99 others); Fri, 25 Jan 2019 05:04:44 -0500 Received: from mail-ua1-f68.google.com ([209.85.222.68]:42273 "EHLO mail-ua1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728238AbfAYKEo (ORCPT ); Fri, 25 Jan 2019 05:04:44 -0500 Received: by mail-ua1-f68.google.com with SMTP id d21so3040895uap.9 for ; Fri, 25 Jan 2019 02:04:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=36F1UyrhDpsJESxu4ZGkKTzuU4bNd6HP1thiwh5cFRo=; b=ENrEsONOxK7lU6jrUha4iAGX+QSos40HGj9coQLsdczjqQ4ivfEoIVMme5KT5qeEyf UbFSQV4DrpmUG/11C0Qjhv/sg7fQ/wvvmWBYkRG+IqjpeQbBTE6rRFyaGYOneyIsIBDc FvWxIYjKzfZvLHIyKv24yJOOvPZ5jYSnBONTI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=36F1UyrhDpsJESxu4ZGkKTzuU4bNd6HP1thiwh5cFRo=; b=NA6bNEbg3cv0+S80IjQPGS2nVDvQ1jgKAGLdAIrUuJNfhTuneV3WuhZiNvuZr0jT70 nj+lkMkrluSwPxGX3qLBRnofrlbXIxnfmL2MP5M3hzIhPSZyEhVG63oEyYOZ6lQeULmz vdUcZZ0ed0M4LYUpxYr/ungzSKpzp81mXIjFw79wBYcFrws/VSJL4xGKpwsrlT4OjvqA Gsv+YhWmxuftcw+/+N8H+nOSGLg3KL7pwA2Ai2qFRuvrfNkkXzx7VD1DQ/QQTVnSCn0C gr1fkLCwhuogGvldu7Q/p8Ir0KEcgU8yxiv8RUcyb0AMvrAx+9lj83gq8bkxkOs5HPzk Gycg== X-Gm-Message-State: AJcUukdD+ZLmhkfzYXoaoo/HEVzrSGyTztbuuUOxddY4Aeu3zxQYssDn 04BApRffW3TkDxGk6oQvrE/hA1Qw15HEjwpVjsTH8Q== X-Received: by 2002:ab0:3484:: with SMTP id c4mr4314298uar.39.1548410683029; Fri, 25 Jan 2019 02:04:43 -0800 (PST) MIME-Version: 1.0 References: <20181129174700.16585-1-ulf.hansson@linaro.org> <20181129174700.16585-4-ulf.hansson@linaro.org> <119031115.C8gHTGbYmQ@aspire.rjw.lan> In-Reply-To: From: Ulf Hansson Date: Fri, 25 Jan 2019 11:04:06 +0100 Message-ID: Subject: Re: [PATCH v10 03/27] timer: Export next wakeup time of a CPU To: "Rafael J. Wysocki" Cc: "Rafael J. Wysocki" , Sudeep Holla , Lorenzo Pieralisi , Mark Rutland , Daniel Lezcano , Linux PM , "Raju P . L . S . S . S . N" , Stephen Boyd , Tony Lindgren , Kevin Hilman , Lina Iyer , Viresh Kumar , Vincent Guittot , Geert Uytterhoeven , Linux ARM , linux-arm-msm , Linux Kernel Mailing List , Lina Iyer , Thomas Gleixner , Frederic Weisbecker , Ingo Molnar Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org [...] > > > > > +/** > > > > > + * tick_nohz_get_next_wakeup - return the next wake up of the CPU > > > > > + * @cpu: the particular CPU to get next wake up for > > > > > + * > > > > > + * Called for idle CPUs only. > > > > > + */ > > > > > +ktime_t tick_nohz_get_next_wakeup(int cpu) > > > > > +{ > > > > > + struct clock_event_device *dev = per_cpu(tick_cpu_device.evtdev, cpu); > > > > > + > > > > > + return dev->next_event; > > > > > +} > > > > > + > > > > > static void tick_nohz_account_idle_ticks(struct tick_sched *ts) > > > > > { > > > > > #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE > > > > > > > > > > > > > Well, I have concerns regarding this one. > > > > > > > > I don't believe it is valid to call this new function for non-idle CPUs and > > > > the kerneldoc kind of says so, but the next patch doesn't actually prevent > > > > it from being called for a non-idle CPU (at the time it is called in there > > > > the target CPU may not be idle any more AFAICS). > > > > > > You are correct, but let me clarify things. > > > > > > We are calling this new API from the new genpd governor, which may > > > have a cpumask indicating there is more than one CPU attached to its > > > PM domain+sub-PM domains. In other words, we may call the API for > > > another CPU than the one we are executing on. > > > > > > When the new genpd governor is called, all CPUs in the cpumask of the > > > genpd in question, are already runtime suspended and will remain so > > > throughout the decisions made by the governor. > > > > > > However, because of the race condition, which needs to be manged by > > > the genpd backend driver and its corresponding FW, one of the CPU in > > > the genpd cpumask could potentially wake up from idle when the genpd > > > governor runs. However, as a part of exiting from idle, that CPU needs > > > to wait for the call to pm_runtime_get_sync() to return before > > > completing the exit patch of idle. This also means waiting for the > > > genpd governor to finish. > > > > OK, so the CPU spins on a spin lock inside of the idle loop with interrupts off. > > Correct. > > This is the part that is not very nice, but ideally it should be a > rather rare condition as it only happens during the last man standing > point. > > > > > > The point is, no matter what decision the governor takes under these > > > circumstances, the genpd backend driver and its FW must manage this > > > race condition during the last man standing. For PSCI OSI mode, it > > > means that if a cluster idle state is suggested by Linux during these > > > circumstances, it must be prevented and aborted. > > > > I would suggest putting a comment to explain that somewhere as it is > > not really obvious. > > Let me see if can squeeze in that somewhere, probably it's best suited > in the new genpd governor code somewhere. > > > > > > > > > > > In principle, the cpuidle core can store this value, say in struct > > > > cpuidle_device of the given CPU, and expose a helper to access it from > > > > genpd, but that would be extra overhead totally unnecessary on everthing > > > > that doesn't use genpd for cpuidle. > > > > > > > > So maybe the driver could store it in its ->enter callback? After all, > > > > the driver knows that genpd is going to be used later. > > > > > > This would work, but it wouldn't really change much when it comes to > > > the race condition described above. > > > > No, it wouldn't make the race go away. > > > > > Of course it would turn the code > > > into being more cpuidle specific, which seems reasonable to me. > > > > > > Anyway, if I understand your suggestion, in principle it means > > > changing $subject patch in such way that the API should not take "int > > > cpu" as an in-parameter, but instead only use __this_cpu() to read out > > > the next event for current idle CPU. > > > > Yes. I have looked closer to this and it turned out that it seems that I should probably not need introduce an entirely new thing here. Instead I should likely be able to re-factor the current tick_nohz_get_sleep_length() and tick_nohz_next_event(), as those are in principle doing the similar things as I need. So I started hacking on that, when Daniel Lezcano told me that he already have a patch doing exactly what I want. :-) However, in the context of his "next wakeup prediction" work, but that shouldn't matter. If I can make it work, I will fold in his patch in the next version of the series instead. Please tell if you already at this point, see any issues with this approach. > > > > > Additionally, we need another new cpuidle API, which genpd can call to > > > retrieve a new per CPU "next event data" stored by the cpuidle driver > > > from its ->enter() callback. Is this a correct interpretation of your > > > suggestion? > > > > Yes, it is. > > Thanks for confirming! > > > > > Generally, something like "cpuidle, give me the wakeup time of this > > CPU". And it may very well give you 0 if the CPU has woken up > > already. :-) > > Yep, I was thinking something like that, so in principle it may > minimize the window of receiving in-correct "next wakeup data" in > genpd for a non-idle CPU, but again it doesn't solve the race > condition. > Kind regards Uffe