Date: 10 Jun 2015 04:47:03 -0400
Message-ID: <20150610084703.3659.qmail@ns.horizon.com>
From: "George Spelvin"
To: linux@horizon.com, mingo@kernel.org
Subject: Re: Discussion: quick_pit_calibrate is slow
Cc: a.p.zijlstra@chello.nl, adrian.hunter@intel.com, ak@linux.intel.com,
	akpm@linux-foundation.org, arjan@infradead.org, bp@alien8.de,
	hpa@zytor.com, linux-kernel@vger.kernel.org, luto@amacapital.net,
	penberg@iki.fi, tglx@linutronix.de, torvalds@linux-foundation.org
In-Reply-To: <20150610073009.GA14879@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Ingo Molnar wrote:
>* George Spelvin wrote:
> As a side note: so VMs often want to skip the whole calibration business,
> because they are running on a well-calibrated host.
>
> 1,000 msecs is also an eternity: consider for example the KVM + tools/kvm
> based "Clear Containers" feature from Arjan:
> ...
> which boots up a generic Linux kernel to generic Linux user-space in 32
> milliseconds, i.e. it boots in 0.03 seconds (!).

Agreed, if you're paravirtualized, you can just pass this stuff in from
the host.  But there's plenty of hardware virtualization that boots
a generic Linux.

I pulled generous numbers out of my ass because I didn't want to
over-reach in the argument that it's taking too long.  The shorter
the boot time, the stronger the point.

>> With a total of 0.84 us of read uncertainty (1/12 of quick_pit_calibrate
>> currently), we can get within 500 ppm within 1.75 us.  Or do better
>> within 5 or 10.

> (msec you mean I suspect?)

Yes, typo; that should be 1.75 ms.
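The figures quoted above follow from dividing the total read uncertainty
by the length of the measurement interval.  A minimal sketch of that
arithmetic in userspace C; the helper name `error_ppm` and the choice of
nanosecond units are illustrative assumptions, not anything taken from
the kernel source:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): worst-case relative
 * error, in parts per million, when a total read uncertainty of
 * uncertainty_ns is spread over a measurement lasting interval_ns.
 * interval_ns must be nonzero.
 */
static uint64_t error_ppm(uint64_t uncertainty_ns, uint64_t interval_ns)
{
	return uncertainty_ns * 1000000u / interval_ns;
}
```

With 0.84 us (840 ns) of uncertainty, a 1.75 ms interval gives 480 ppm,
just under the 500 ppm mark; stretching the interval to 5 ms or 10 ms
gives 168 ppm and 84 ppm respectively.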
>> The loop I'd write would start the PIT (and the RTC, if we want to)
>> and then go round-robin reading all the time sources and associated
>> TSC values.

> I'd just start with the PIT to have as few balls in flight as possible.

Once I get the loop structured properly, additional timers really aren't
a problem.  The biggest PITA is the PM_TMR and all its brokenness (do I
have a PIIX machine in the closet somewhere?), but the quick_pit_calibrate
patch I already posted to LKML shows how to handle that.

I set up a small circular buffer of captured values, and when I'm (say)
three captures past the "interesting" one, go back and see if the reads
look good.

> Could you please structure it the following way:
>
> - first a patch that fixes bogus comments about the current code. It has
>   bitrotten and if we change it significantly we better have a well
>   documented starting point that is easier to compare against.
>
> - then a patch that introduces your more accurate calibration method and
>   uses it as the first method to calibrate. If it fails (and it should have a
>   notion of failing) then it should fall back to the other two methods.
>
> - possibly add a boot option to skip your new calibration method -
>   i.e. to make the kernel behave in the old way. This would be useful
>   for tracking down any regressions in this.
>
> - then maybe add a patch for the RTC method, but as a .config driven opt-in
>   initially.

Sounds good, but when do we get to the decruftification?

I'd prefer to prepare the final patch (if nothing else, so Linus will be
reassured by the diffstat), although I can see holding it back for a
few releases.

> Please also add calibration tracing code (.config driven and default-off),
> so that the statistical properties of calibration can be debugged and
> validated without patching the kernel.

Definitely desired, but I have to be careful here.
Obviously I can't print during the timing loop, so it will take either
a lot of memory, or add significant computation to the loop.

I also don't want to flood the kernel log before syslog is started.
Do you have any specific suggestions?  Should I just capture everything
into a permanently-allocated buffer and export it via debugfs?

>> I realize this is a far bigger overhaul than Adrian proposed, but do other
>> people agree that some decruftification is warranted?

> Absolutely!

Thanks for the encouragement!

>> Any suggestions for a reasonable time/quality tradeoff?  500 ppm ASAP?
>> Best I can do in 10 ms?  Wait until the PIT is 500 ppm and then use
>> the better result from a higher-resolution timer if available?

> So I'd suggest a minimum polling interval (at least 1 msecs?) plus a
> ppm target. Would 100ppm be too aggressive?

How about 122 ppm (1/8192) because I'm lazy? :-)

What I imagine is this:

- The code will loop until it reaches 122 ppm or 55 ms, whichever comes
  first.  (There's also a minimum, before which 122 ppm isn't checked.)
- Initially, failure to reach 122 ppm will print a message and fall back.
- In the final cleanup patch, I'll accept anything up to 500 ppm and
  only fail (and disable TSC) if I can't reach that.
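The stopping rule in the list above can be sketched as a single decision
function.  This is a userspace illustration under stated assumptions,
not the posted patch: the names `calib_check`, `CALIB_*`, `MIN_NS`,
`TIMEOUT_NS` and `PPM_SHIFT` are made up here, the 1 ms minimum is
borrowed from the "at least 1 msecs?" suggestion quoted above, and all
times are taken to be in nanoseconds.

```c
#include <stdint.h>

#define MIN_NS      1000000ull	/* assumed minimum interval: 1 ms */
#define TIMEOUT_NS 55000000ull	/* stop trying after 55 ms */
#define PPM_SHIFT  13		/* 1/8192, roughly 122 ppm */

enum calib_status { CALIB_CONTINUE, CALIB_DONE, CALIB_FALLBACK };

/*
 * Hypothetical per-iteration check: elapsed_ns is the time measured so
 * far, uncertainty_ns the total read uncertainty on that measurement.
 */
static enum calib_status calib_check(uint64_t elapsed_ns,
				     uint64_t uncertainty_ns)
{
	/* Before the minimum interval, the ppm target is not checked. */
	if (elapsed_ns < MIN_NS)
		return CALIB_CONTINUE;

	/* uncertainty / elapsed <= 1/8192, written without a division. */
	if ((uncertainty_ns << PPM_SHIFT) <= elapsed_ns)
		return CALIB_DONE;

	/* Target not reached in time: report it and fall back. */
	if (elapsed_ns >= TIMEOUT_NS)
		return CALIB_FALLBACK;

	return CALIB_CONTINUE;
}
```

A power-of-two target turns the comparison into a shift.  With 840 ns of
read uncertainty, this check first returns CALIB_DONE once
840 * 8192 = 6881280 ns (about 6.9 ms) have elapsed.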