Date: Sun, 24 Jun 2018 02:14:30 +0200 (CEST)
From: Thomas Gleixner
To: Baolin Wang
Cc: john.stultz@linaro.org, daniel.lezcano@linaro.org, arnd@arndb.de, tony@atomide.com, aaro.koskinen@iki.fi, linux@armlinux.org.uk, mark.rutland@arm.com, marc.zyngier@arm.com, broonie@kernel.org, paulmck@linux.vnet.ibm.com, mlichvar@redhat.com, rdunlap@infradead.org, kstewart@linuxfoundation.org, gregkh@linuxfoundation.org, pombredanne@nexb.com, thierry.reding@gmail.com, jonathanh@nvidia.com, heiko@sntech.de, linus.walleij@linaro.org, viresh.kumar@linaro.org, mingo@kernel.org, hpa@zytor.com, peterz@infradead.org, douly.fnst@cn.fujitsu.com, len.brown@intel.com, rajvi.jingar@intel.com, alexandre.belloni@bootlin.com, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org, linux-omap@vger.kernel.org
Subject: Re: [PATCH 1/8] time: Add persistent clock support

On Wed, 13 Jun 2018, Baolin Wang wrote:

> Moreover we can register the clocksource with CLOCK_SOURCE_SUSPEND_NONSTOP
> to be one persistent clock, then we can simplify the suspend/resume
> accounting by removing CLOCK_SOURCE_SUSPEND_NONSTOP timing. After that
> we can only compensate the OS time by persistent clock or RTC.

That makes sense because it adds a gazillion lines of code and removes 5? Not really.

> +/**
> + * persistent_clock_read_data - data required to read persistent clock
> + * @read: Returns a cycle value from persistent clock.
> + * @last_cycles: Clock cycle value at last update.
> + * @last_ns: Time value (nanoseconds) at last update.
> + * @mask: Bitmask for two's complement subtraction of non 64bit clocks.
> + * @mult: Cycle to nanosecond multiplier.
> + * @shift: Cycle to nanosecond divisor.
> + */
> +struct persistent_clock_read_data {
> +	u64 (*read)(void);
> +	u64 last_cycles;
> +	u64 last_ns;
> +	u64 mask;
> +	u32 mult;
> +	u32 shift;
> +};
> +/**
> + * persistent_clock - represent the persistent clock
> + * @read_data: Data required to read from persistent clock.
> + * @seq: Sequence counter for protecting updates.
> + * @freq: The frequency of the persistent clock.
> + * @wrap: Duration for persistent clock can run before wrapping.
> + * @alarm: Update timeout for persistent clock wrap.
> + * @alarm_inited: Indicate if the alarm has been initialized.
> + */
> +struct persistent_clock {
> +	struct persistent_clock_read_data read_data;
> +	seqcount_t seq;
> +	u32 freq;
> +	ktime_t wrap;
> +	struct alarm alarm;
> +	bool alarm_inited;
> +};

NAK!

There is no reason to invent yet another set of data structures and yet more read functions with a sequence counter, which are just a bad and broken copy of the existing timekeeping/clocksource code. And of course the stuff is not serialized against multiple registrations, etc. etc.

Plus the utter nonsense that any call site has to do the same thing over and over:

register():
	start_alarm_timer();

Why is this required in the first place? It's not at all. The only place where such an alarm timer will be required is when the system actually goes to suspend. Starting it at registration time is pointless and even counterproductive. Assume the clocksource wraps every 2 hours. So you start it at boot time and after 119 minutes of uptime the system suspends. So it will wake up one minute later to update the clocksource. Heck no. If the timer is started when the machine actually suspends, it will wake up after 120 minutes at the earliest.

And you even add that to the TSC, which does not need it at all. A 64-bit counter will wrap in about 290 years on a 2 GHz machine. So you degrade the functionality instead of improving it.

So no, this is not going anywhere.

Let's look at the problem itself:

  You want to use one clocksource for timekeeping during runtime which is fast and accurate, and another one for suspend time injection which is slower and/or less accurate, because the fast one stops in suspend.

  Plus you need an alarmtimer which makes sure that the clocksource does not wrap around during suspend.

Now let's look at what we have already:

  Both clocksources already exist and are registered as clocksources with all required data in the clocksource core.

Ergo the only sane and logical conclusion is to expand the existing infrastructure to handle that.
When a clocksource is registered, the registration function already makes decisions about using it as the timekeeping clocksource. So add a few lines of code to check whether the newly registered clocksource is suitable and preferred for suspend:

	if (!stops_in_suspend(newcs)) {
		if (!suspend_cs || is_preferred_suspend_cs(newcs))
			suspend_cs = newcs;
	}

is_preferred_suspend_cs() can be based on rating, the maximum suspend length which can be achieved, or whatever is sensible. It should start off as a very simple decision function based on rating and not as a prematurely overengineered monstrosity.

The suspend/resume code needs a few very simple changes:

xxx_suspend():
	clocksource_prepare_suspend();

Note, this is _NOT_ timekeeping_suspend() because that is invoked _AFTER_ alarmtimer_suspend(). So if an alarm timer is required, it needs to be armed before that. A trivial solution might be to just call it from alarmtimer_suspend(), but that's a minor detail to worry about.

timekeeping_suspend()
{
	clocksource_enter_suspend();
	...

timekeeping_resume()
{
	...
	if (clocksource_leave_suspend(&nsec)) {
		ts_delta = ns_to_timespec64(nsec);
		sleeptime_injected = true;
	} else if (......

Now let's look at the new functions:

void clocksource_prepare_suspend(void)
{
	if (!suspend_cs)
		return;

	if (needs_alarmtimer(suspend_cs))
		start_suspend_alarm(suspend_cs);
}

void clocksource_enter_suspend(void)
{
	if (!suspend_cs)
		return;
	/* The clocksource ->read() callback takes the clocksource itself */
	suspend_start = suspend_cs->read(suspend_cs);
}

bool clocksource_leave_suspend(u64 *nsec)
{
	u64 now, delta;

	if (!suspend_cs)
		return false;

	now = suspend_cs->read(suspend_cs);
	delta = clocksource_delta(now, suspend_start, suspend_cs->mask);
	*nsec = mul_u64_u32_shr(delta, suspend_cs->mult, suspend_cs->shift);
	return true;
}

See? It does not need any of this totally nonsensical stuff in your registration function, no new read functions and whatever, because it simply can use the bog standard mult/shift values. Why?
Because the conversion above can cope with a full 64-bit * 32-bit multiply without falling apart. It's already there in timekeeping_resume(); otherwise resuming with a NONSTOP TSC would result in bogus sleep times after a few minutes. It's slower than the normal clocksource conversion, which is optimized for performance, but that's completely irrelevant on resume. This whole blurb about requiring separate mult/shift values is just plain drivel.

Plus any reasonably broad clocksource will not need an alarmtimer at all, because the only reason it is needed is when the clocksource itself wraps around. And that has absolutely nothing to do with mult/shift values. That just depends on the frequency and the bit width of the counter.

So it does not need an update function either, because in case of broad enough clocksources there is absolutely no need for an update, and in case of wrapping ones the alarmtimer brings it out of suspend on time. And because the only interesting thing is the delta between suspend and resume, this is all a complete non-issue.

The clocksource core already has all the registration/unregistration functionality plus an interface to reconfigure the frequency, so clocksources can come and go and be reconfigured, and all of this just works.

Once the extra few lines of code are in place, then you can go and clean up the existing mess of homebrewed interfaces and claim that this is consolidation and simplification.

What's wrong with you people? Didn't they teach you in school that the first thing to do is proper problem and code analysis? If they did not, go back to them and ask for your money back.

I'm really tired of these overengineered trainwrecks which are then advertised with bullshit marketing like the best invention since sliced bread. This might work in random corporates, but not here.

Thanks,

	tglx