Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp4142729imm; Mon, 25 Jun 2018 10:24:28 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJnmMGgUKG9gKrbjsbWevYtpcUkzJ/UHR/onuf0lSkPN8IPNF0pCx2TLnVCUBv3EznmTPWG X-Received: by 2002:a17:902:5a83:: with SMTP id r3-v6mr11181195pli.78.1529947468892; Mon, 25 Jun 2018 10:24:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529947468; cv=none; d=google.com; s=arc-20160816; b=k+T1Oei/koHbuY00ftMoHR3jzTfzZY1ZfVzl/4cwbAjUivEnodsAb2iFtZtVckB+uZ ZeR4cuKjF3feCc74qWN2eDZcrZVW3ThupfetbPTbpE0oPanKUcm3s7e2fXqeliJOp+CP uzGKy+Cj1wXWhHMX5MW62d6F1cYCR6zoGdYinekDsEZ8+q5Bt27hrXwEgMFPG5eiyU08 y3RBYE2Xx2NmA9mLhan89K6XFzN6mSTjdecypmT512xRCoACYLBRPX4UdeEs/1a+dZYA re9XtW/MR04VKdqBr9g+n09+iaaOqYwY4mRtcXPzTPntJsb/lTjJJ1Zi3IrLXMJpuzW7 kPaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=VIsYKmRj3P8bTRimULhECiDoE4k8r6cNBbjxz/IgH54=; b=Q7EwRlWY6oKDMJiYJf5w2XFB1en7nN2CVikXIoUxseUjqw1RsVjFTxC7XosS1QU7KJ JuSI+0OlkG+qRMX9bkp4cUwBK3Alze8plL0J/krUET2f2nR/dX+/cQSPrXQSGdHxntKz vV41O33Z34UIU8wZMMTkUICA4s4sfhn1qPDzpEsxUZcWr3YizNWhFt9psBT26lCbWt2t 9/r0jYrE1QQ6ZgWpcbO2/Vh2GhVB4VHwRbvPE0dP/8BsrwbD/bliI8+ZEF8qbvG/sjLN XZfuL47/hKjfT4fJ/q/Eg3mufxCMNmzGKVxFxaRlrTlwtrRe+OkDczM0S8iEx91LY/zb 7u8A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=OaUVU4eD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 1-v6si14141881plo.20.2018.06.25.10.24.14; Mon, 25 Jun 2018 10:24:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=OaUVU4eD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755498AbeFYRX2 (ORCPT + 99 others); Mon, 25 Jun 2018 13:23:28 -0400 Received: from mail-wr0-f175.google.com ([209.85.128.175]:37376 "EHLO mail-wr0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755041AbeFYRXZ (ORCPT ); Mon, 25 Jun 2018 13:23:25 -0400 Received: by mail-wr0-f175.google.com with SMTP id k6-v6so14483754wrp.4 for ; Mon, 25 Jun 2018 10:23:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=VIsYKmRj3P8bTRimULhECiDoE4k8r6cNBbjxz/IgH54=; b=OaUVU4eD9bYyv0jgdP6+5zgYxLXT+7ceT4GxlplCIi0A30OJx49LdodZ8JrhUd9bSy /BrdrEGUehnmJv+n0c2pqOdSFeyoNUG6yUCtP3gHb2rVmQM7RIWRPjDbfLQnf3CDOSXT v8bpm+46ziHbaQn4IOSDyFFiB4xVMk9gHYH0A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=VIsYKmRj3P8bTRimULhECiDoE4k8r6cNBbjxz/IgH54=; b=pNfCeiAF+MKfcBC1Ei6Rb5hgNU//e2avKJYvBZ20qYWZRThtnTmLuWrGeQtWFfCHL8 tmYV8PcHy3/JR9vUCfT9/NkAOxu52B7ybX8/YXb2nb7MdY/v6GFdj3jTrwZAJ/7VSZze 9RcjEVDDZVIsvKDY6KmG72u30DSw1GxbNima8mWbOYfxX3NKAnId4rXbFTrf32MDqFpP qRPG87v0QpD6fY3lg4ul/NTbRGysYXSEjX0+xTfFDwkpWr/QqhlZUtGIzNbeX3WBgvtD H8LmxIaCubVzzJ3dyrzcJvtQyHlAp3iAaXu/sdEoe7tLjCHNtlz0fFaxvSlhCz90O3g0 d7mw== X-Gm-Message-State: APt69E21ypuBvCLnoeYESPQAUHc12nMF63LaaqaYI37EkfjmVqNLaJoH 0wo5tDGgcNXjMoo2VE5rHmo8lHEj9cfjEVyvNy6Odw== X-Received: by 2002:adf:ab95:: with SMTP id s21-v6mr5343044wrc.90.1529947403924; Mon, 25 Jun 2018 10:23:23 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a1c:c202:0:0:0:0:0 with HTTP; Mon, 25 Jun 2018 10:23:22 -0700 (PDT) In-Reply-To: References: From: John Stultz Date: Mon, 25 Jun 2018 10:23:22 -0700 Message-ID: Subject: Re: [PATCH 1/8] time: Add persistent clock support To: Thomas Gleixner Cc: Baolin Wang , Daniel Lezcano , Arnd Bergmann , Tony Lindgren , aaro.koskinen@iki.fi, Russell King - ARM Linux , Mark Rutland , Marc Zyngier , Mark Brown , "Paul E. McKenney" , Miroslav Lichvar , Randy Dunlap , Kate Stewart , Greg KH , pombredanne@nexb.com, Thierry Reding , Jon Hunter , =?UTF-8?Q?Heiko_St=C3=BCbner?= , Linus Walleij , Viresh Kumar , Ingo Molnar , "H. Peter Anvin" , Peter Zijlstra , douly.fnst@cn.fujitsu.com, Len Brown , rajvi.jingar@intel.com, alexandre.belloni@bootlin.com, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-tegra@vger.kernel.org, lkml , linux-omap@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 23, 2018 at 5:14 PM, Thomas Gleixner wrote: > On Wed, 13 Jun 2018, Baolin Wang wrote: >> Moreover we can register the clocksource with CLOCK_SOURCE_SUSPEND_NONSTOP >> to be one persistent clock, then we can simplify the suspend/resume >> accounting by removing CLOCK_SOURCE_SUSPEND_NONSTOP timing. After that >> we can only compensate the OS time by persistent clock or RTC. > > That makes sense because it adds a gazillion lines of code and removes 5? > Not really, > >> +/** >> + * persistent_clock_read_data - data required to read persistent clock >> + * @read: Returns a cycle value from persistent clock. >> + * @last_cycles: Clock cycle value at last update. >> + * @last_ns: Time value (nanoseconds) at last update. >> + * @mask: Bitmask for two's complement subtraction of non 64bit clocks. >> + * @mult: Cycle to nanosecond multiplier. >> + * @shift: Cycle to nanosecond divisor. >> + */ >> +struct persistent_clock_read_data { >> + u64 (*read)(void); >> + u64 last_cycles; >> + u64 last_ns; >> + u64 mask; >> + u32 mult; >> + u32 shift; >> +}; >> +/** >> + * persistent_clock - represent the persistent clock >> + * @read_data: Data required to read from persistent clock. >> + * @seq: Sequence counter for protecting updates. >> + * @freq: The frequency of the persistent clock. >> + * @wrap: Duration for persistent clock can run before wrapping. >> + * @alarm: Update timeout for persistent clock wrap. >> + * @alarm_inited: Indicate if the alarm has been initialized. >> + */ >> +struct persistent_clock { >> + struct persistent_clock_read_data read_data; >> + seqcount_t seq; >> + u32 freq; >> + ktime_t wrap; >> + struct alarm alarm; >> + bool alarm_inited; >> +}; > > NAK! > > There is no reason to invent yet another set of data structures and yet > more read functions with a sequence counter. which are just a bad and > broken copy of the existing timekeeping/clocksource code. And of course the > stuff is not serialized against multiple registrations, etc. etc. > > Plus the utter nonsense that any call site has to do the same thing over > and over: > > register(): > start_alarm_timer(); > > Why is this required in the first place? It's not at all. The only place > where such an alarm timer will be required is when the system actually goes > to suspend. Starting it at registration time is pointless and even counter > productive. Assume the clocksource wraps every 2 hours. So you start it at > boot time and after 119 minutes uptime the system suspends. So it will > wakeup one minute later to update the clocksource. Heck no. If the timer is > started when the machine actually suspends it will wake up earliest in 120 > minutes. > > And you even add that to the TSC which does not need it at all. It will > wrap in about 400 years on a 2GHZ machine. So you degrade the functionality > instead of improving it. > > So no, this is not going anywhere. > > Let's look at the problem itself: > > You want to use one clocksource for timekeeping during runtime which is > fast and accurate and another one for suspend time injection which is > slower and/or less accurate because the fast one stops in suspend. > > Plus you need an alarmtimer which makes sure that the clocksource does > not wrap around during suspend. > > Now lets look what we have already: > > Both clocksources already exist and are registered as clocksources with > all required data in the clocksource core. > > Ergo the only sane and logical conclusion is to expand the existing > infrastructure to handle that. > > When a clocksource is registered, then the registration function already > makes decisions about using it as timekeeping clocksource. So add a few > lines of code to check whether the newly registered clocksource is suitable > and preferred for suspend. > > if (!stops_in_suspend(newcs)) { > if (!suspend_cs || is_preferred_suspend_cs(newcs)) > suspend_cs = newcs; > } > > The is_preferred_suspend_cs() can be based on rating, the maximum suspend > length which can be achieved or whatever is sensible. It should start of as > a very simple decision function based on rating and not an prematurely > overengineered monstrosity. > > The suspend/resume() code needs a few very simple changes: > > xxx_suspend(): > clocksource_prepare_suspend(); > > Note, this is _NOT_ timekeeping_suspend() because that is invoked _AFTER_ > alarmtimer_suspend(). So if an alarm timer is required it needs to be > armed before that. A trivial solution might be to just call it from > alarmtimer_suspend(), but that a minor detail to worry about. > > timekeeping_suspend() > { > clocksource_enter_suspend(); > ... > > timekeeping_resume() > { > ... > if (clocksource_leave_suspend(&nsec)) { > ts_delta = ns_to_timespec64(nsec); > sleeptime_injected = true; > } else if (...... > > > Now lets look at the new functions: > > void clocksource_prepare_suspend(void) > { > if (!suspend_cs) > return; > > if (needs_alarmtimer(suspend_cs)) > start_suspend_alarm(suspend_cs); > } > > void clocksource_enter_suspend(void) > { > if (!suspend_cs) > return; > suspend_start = suspend_cs->read(); > } > > bool clocksource_leave_suspend(u64 *nsec) > { > u64 now, delta; > > if (!suspend_cs) > return false; > > now = suspend_cs->read(); > delta = clocksource_delta(now, suspend_start, suspend_cs->mask); > *nsec = mul_u64_u32_shr(delta, suspend_cs->mult, suspend_cs->shift); > return true; > } > > See? It does not need any of this totally nonsensical stuff in your > registration function and not any new read functions and whatever, because > it simply can use the bog standard mult/shift values. Why? Because the > conversion above can cope with a full 64 bit * 32 bit multiply without > falling apart. It's already there in timekeeping_resume() otherwise > resuming with a NONSTOP TSC would result in bogus sleep times after a few > minutes. It's slower than the normal clocksource conversion which is > optimized for performance, but thats completely irrelevant on resume. This > whole blurb about requiring separate mult/shift values is just plain > drivel. > > Plus any reasonably broad clocksource will not need an alarmtimer at > all. Because the only reason it is needed is when the clocksource itself > wraps around. And that has absolutely nothing to do with mult/shift > values. That just depends on the frequency and the bitwidth of the counter, > > So it does not need an update function either because in case of broad > enough clocksources there is absolutely no need for update and in case of > wrapping ones the alarmtimer brings it out of suspend on time. And because > the only interesting thing is the delta between suspend and resume this is > all a complete non issue. > > The clocksource core already has all the registration/unregistration > functionality plus an interface to reconfigure the frequency, so > clocksources can come and go and be reconfigured and all of this just > works. > > Once the extra few lines of code are in place, then you can go and cleanup > the existing mess of homebrewn interfaces and claim that this is > consolidation and simplification. > > > > What's wrong with you people? Didn't they teach you in school that the > first thing to do is proper problem and code analysis? If they did not, go > back to them and ask your money back, > > I'm really tired of these overengineered trainwrecks which are then > advertised with bullshit marketing like the best invention since sliced > bread. This might work in random corporates, but not here. > > Thomas, what is wrong with *you*? This is completely unnecessary. First of all, I sent Baolin on this approach, as I was getting sick of seeing the suspend/resume timing continually adding extra warts and complexity to handle odd hardware that needs some solution. So I suggested he clean that up in a way that lets these multiple ways of solving the problem be done in device specific code w/o adding more complexity to the core logic - in a fashion how the clocksources allowed us to support nice fast/well-behaving hardware w/o preventing ugly-but-neccessary solutions like the pit clocksource. Now, I've also been quite a poor maintainer of late, and haven't done much to help with pre-review of Baloin's code. So that's on me, not him. Admittedly, my taste isn't always great, so there is likely to be a better approach, and having your guidance here is *really* valued. I just wish we could get it without all this unnecessary vinegar. I was really lucky to have your guidance and support when I was starting in the community. Your helping bring me up as a co-maintainer (only a a relatively small set of responsibility compared to your much wider maintainership), does let me see that having the responsibility of billions of devices running your code is immense and comes with no small amount of stress. Juggling the infinite incoming stream of review requests on naieve or otherwise lacking code with other important development priorities is a major burden, so *I really get how frustrating it is*. And its super annoying to have lots of short-term development requests being thrown at you when you're the one who has to maintain it for the long term. But long-long-term, no one is going to be a maintainer forever, and we are not going to develop more olympic swimmers if we keep peeing in the pool. Baolin: My apologies for leading you poorly here. Work on something else for a day to let the flames burn off you, then review Thomas' technical feedback, ignoring the harsh style, and try to address his technical comments with the approach. I'll try to do better in finding time to review your work. -john