Date: 10 Jun 2015 04:47:03 -0400
Message-ID: <20150610084703.3659.qmail@ns.horizon.com>
From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, mingo@kernel.org
Subject: Re: Discussion: quick_pit_calibrate is slow
Cc: a.p.zijlstra@chello.nl, adrian.hunter@intel.com, ak@linux.intel.com,
        akpm@linux-foundation.org, arjan@infradead.org, bp@alien8.de,
        hpa@zytor.com, linux-kernel@vger.kernel.org, luto@amacapital.net,
        penberg@iki.fi, tglx@linutronix.de, torvalds@linux-foundation.org
In-Reply-To: <20150610073009.GA14879@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4433
Lines: 104

Ingo Molnar wrote:
>* George Spelvin <linux@horizon.com> wrote:

> As a side note: so VMs often want to skip the whole calibration business,
> because they are running on a well-calibrated host.

> 1,000 msecs is also an eternity: consider for example the KVM + tools/kvm
> based "Clear Containers" feature from Arjan:
> ... which boots up a generic Linux kernel to generic Linux user-space in 32 
> milliseconds, i.e. it boots in 0.03 seconds (!).

Agreed, if you're paravirtualized, you can just pass this stuff in from
the host.  But there's plenty of hardware virtualization that boots
a generic Linux.

I pulled generous numbers out of my ass because I didn't want to over-reach
in the argument that it's taking too long.  The shorter the boot
time, the stronger the point.

>> With a total of 0.84 us of read uncertainty (1/12 of quick_pit_calibrate
>> currently), we can get within 500 ppm within 1.75 us.  Or do better
>> within 5 or 10.

> (msec you mean I suspect?)

Yes, typo; that should be 1.75 ms.
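
A rough sketch of the arithmetic behind those figures, assuming the
relative error is simply the total read uncertainty divided by the
measurement interval (an assumption, not something stated above).  The
helper names are hypothetical, not existing kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shortest measurement interval (ns) over which a fixed read
 * uncertainty (ns) stays within the given relative error (ppm).
 * Rounds up.  Hypothetical helper for illustration only.
 */
static uint64_t min_interval_ns(uint64_t uncertainty_ns, uint32_t ppm)
{
	assert(ppm > 0);
	return (uncertainty_ns * 1000000u + ppm - 1) / ppm;
}

/* Relative error (ppm, rounded up) after measuring for interval_ns. */
static uint64_t error_ppm(uint64_t uncertainty_ns, uint64_t interval_ns)
{
	assert(interval_ns > 0);
	return (uncertainty_ns * 1000000u + interval_ns - 1) / interval_ns;
}
```

Under that assumption, 840 ns of uncertainty needs about 1.68 ms to
reach 500 ppm, which is in the same range as the 1.75 ms quoted, and
5 ms or 10 ms would give roughly 168 ppm or 84 ppm.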

>> The loop I'd write would start the PIT (and the RTC, if we want to)
>> and then go round-robin reading all the time sources and associated
>> TSC values.

> I'd just start with the PIT to have as few balls in flight as possible.

Once I get the loop structured properly, additional timers really
aren't a problem.  The biggest PITA is the PM_TMR and all its
brokenness (do I have a PIIX machine in the closet somewhere?),
but the quick_pit_calibrate patch I already posted to LKML shows
how to handle that.  I set up a small circular buffer of captured
values, and when I'm (say) three captures past the "interesting"
one, go back and see if the reads look good.
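
A minimal sketch of that capture-and-look-back idea, not the code from
the posted patch.  The ring size, the 24-bit timer mask, the MAX_STEP
bound and the definition of "looks good" (TSC strictly increasing, timer
advancing by a small step between neighbours) are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE  8u          /* assumed; must exceed LOOKAHEAD + 1 */
#define LOOKAHEAD  3u          /* "three captures past" */
#define TIMER_MASK 0xffffffu   /* assumed 24-bit wrapping counter */
#define MAX_STEP   64u         /* assumed largest sane step between reads */

struct capture {
	uint64_t tsc;     /* TSC value read alongside the timer */
	uint32_t timer;   /* raw timer count */
};

struct capture_ring {
	struct capture slot[RING_SIZE];
	unsigned int head;   /* total number of captures pushed so far */
};

static void ring_push(struct capture_ring *r, uint64_t tsc, uint32_t timer)
{
	r->slot[r->head % RING_SIZE] = (struct capture){ .tsc = tsc, .timer = timer };
	r->head++;
}

/* Fetch a capture by absolute index; it must not have been overwritten. */
static const struct capture *ring_get(const struct capture_ring *r, unsigned int idx)
{
	assert(idx < r->head && r->head - idx <= RING_SIZE);
	return &r->slot[idx % RING_SIZE];
}

/*
 * Judge capture idx only after LOOKAHEAD later captures exist, by
 * comparing each neighbouring pair from idx-1 through idx+LOOKAHEAD.
 */
static bool capture_looks_good(const struct capture_ring *r, unsigned int idx)
{
	unsigned int i;

	if (idx == 0 || r->head < idx + LOOKAHEAD + 1)
		return false;   /* no predecessor, or too early to judge */
	if (r->head - (idx - 1) > RING_SIZE)
		return false;   /* predecessor already overwritten */

	for (i = idx; i <= idx + LOOKAHEAD; i++) {
		const struct capture *prev = ring_get(r, i - 1);
		const struct capture *cur = ring_get(r, i);
		uint32_t step = (cur->timer - prev->timer) & TIMER_MASK;

		if (cur->tsc <= prev->tsc || step > MAX_STEP)
			return false;
	}
	return true;
}
```

Deferring the check keeps the hot loop to plain stores; the comparison
work only happens for the one capture that matters.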

> Could you please structure it the following way:
>
> - first a patch that fixes bogus comments about the current code. It has 
>   bitrotten and if we change it significantly we better have a well
>   documented starting point that is easier to compare against.
>
> - then a patch that introduces your more accurate calibration method and
>   uses it as the first method to calibrate. If it fails (and it should have a 
>   notion of failing) then it should fall back to the other two methods.
>
> - possibly add a boot option to skip your new calibration method -
>   i.e. to make the kernel behave in the old way. This would be useful
>   for tracking down any regressions in this.
>
>  - then maybe add a patch for the RTC method, but as a .config driven opt-in 
>    initially.

Sounds good, but when do we get to the decruftification?  I'd prefer to
prepare the final patch (if nothing else, so Linus will be reassured by
the diffstat), although I can see holding it back for a few releases.

> Please also add calibration tracing code (.config driven and default-off),
> so that the statistical properties of calibration can be debugged and
> validated without patching the kernel.

Definitely desired, but I have to be careful here.  Obviously I can't
print during the timing loop, so it will take either a lot of memory,
or add significant computation to the loop.

I also don't want to flood the kernel log before syslog is
started.

Do you have any specific suggestions?  Should I just capture everything
into a permanently-allocated buffer and export it via debugfs?

>> I realize this is a far bigger overhaul than Adrian proposed, but do other 
>> people agree that some decruftification is warranted?

> Absolutely!

Thanks for the encouragement!

>> Any suggestions for a reasonable time/quality tradeoff?  500 ppm ASAP?
>> Best I can do in 10 ms?  Wait until the PIT is 500 ppm and then use
>> the better result from a higher-resolution timer if available?

> So I'd suggest a minimum polling interval (at least 1 msecs?) plus a
> ppm target.  Would 100ppm be too aggressive?

How about 122 ppm (1/8192) because I'm lazy? :-)
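
Presumably the "lazy" part is that 1/8192 is 2^-13 (about 122.07 ppm),
so the tolerance test needs a shift rather than a division.  A minimal
sketch, with a hypothetical function name and the assumption that the
estimate and its uncertainty are in the same units:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOLERANCE_SHIFT 13   /* 2^-13 = 1/8192, roughly 122 ppm */

/* True if the uncertainty is at most 1/8192 of the estimate. */
static bool within_tolerance(uint64_t estimate, uint64_t uncertainty)
{
	return uncertainty <= (estimate >> TOLERANCE_SHIFT);
}
```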

What I imagine is this:

- The code will loop until it reaches 122 ppm or 55 ms, whichever comes
  first.  (There's also a minimum, before which 122 ppm isn't checked.)
- Initially, failure to reach 122 ppm will print a message and fall back.
- In the final cleanup patch, I'll accept anything up to 500 ppm
  and only fail (and disable TSC) if I can't reach that.
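
The policy in the list above could be sketched roughly as follows.  This
is an illustration, not proposed kernel code: the names are
hypothetical, and MIN_US is an assumed value, since the minimum is not
specified above.

```c
#include <stdbool.h>
#include <stdint.h>

enum calib_state { CALIB_CONTINUE, CALIB_DONE, CALIB_FAIL };

#define MIN_US      1000u    /* assumed minimum before 122 ppm is checked */
#define MAX_US      55000u   /* give up after 55 ms */
#define TARGET_PPM  122u     /* stop early once this is reached */
#define ACCEPT_PPM  500u     /* final-cleanup policy: good enough at timeout */

/*
 * Decide what the calibration loop should do next.
 * final_policy selects the later behaviour, where anything up to
 * ACCEPT_PPM is accepted at timeout instead of falling back.
 */
static enum calib_state calib_check(uint32_t elapsed_us, uint32_t err_ppm,
				    bool final_policy)
{
	if (elapsed_us >= MIN_US && err_ppm <= TARGET_PPM)
		return CALIB_DONE;
	if (elapsed_us < MAX_US)
		return CALIB_CONTINUE;
	/* Timed out without reaching the target. */
	if (final_policy && err_ppm <= ACCEPT_PPM)
		return CALIB_DONE;
	return CALIB_FAIL;   /* print a message and fall back / disable TSC */
}
```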