Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755864Ab1FHNpP (ORCPT ); Wed, 8 Jun 2011 09:45:15 -0400 Received: from eu1sys200aog116.obsmtp.com ([207.126.144.141]:56798 "EHLO eu1sys200aog116.obsmtp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754779Ab1FHNpM (ORCPT ); Wed, 8 Jun 2011 09:45:12 -0400 Message-ID: <4DEF7CA1.8080809@stericsson.com> Date: Wed, 8 Jun 2011 15:44:01 +0200 From: Mattias Wallin User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-GB; rv:1.9.2.15) Gecko/20110303 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: john stultz Cc: Santosh Shilimkar , Russell King - ARM Linux , Thomas Gleixner , Lee Jones , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" Subject: Re: [PATCHv2 0/3] clocksource: add db8500 PRCMU timer References: <1307007271-1004-1-git-send-email-mattias.wallin@stericsson.com> <20110602094622.GS3660@n2100.arm.linux.org.uk> <4DE7637B.3060901@stericsson.com> <20110602110137.GU3660@n2100.arm.linux.org.uk> <4DE77D9A.6090301@stericsson.com> <4DE788B2.5060308@ti.com> <1307040455.11492.52.camel@work-vm> In-Reply-To: <1307040455.11492.52.camel@work-vm> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6217 Lines: 141 On 06/02/2011 08:47 PM, john stultz wrote: > On Thu, 2011-06-02 at 18:27 +0530, Santosh Shilimkar wrote: >> On 6/2/2011 5:40 PM, Mattias Wallin wrote: >>> On 06/02/2011 01:01 PM, Russell King - ARM Linux wrote: >>>> On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: >>>>> On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: >>>>>> Why don't we just find a way of fixing sched_clock so that the value >>>>>> doesn't reset over a suspend/resume cycle? >>>>> Even if the value isn't reset during suspend/resume we want the >>>>> clocksource to keep counting. Or is it ok to have the clocksource stop >>>>> or freeze during suspend? >>>> >>>> kernel/time/timekeeping.c:timekeeping_suspend(): >>>> >>>> timekeeping_forward_now(); >>>> >>>> which does: >>>> cycle_now = clock->read(clock); >>>> cycle_delta = (cycle_now - clock->cycle_last)& clock->mask; >>>> clock->cycle_last = cycle_now; >>>> >>>> So that updates the time with the current offset between cycle_last and >>>> the current value. >>>> >>>> kernel/time/timekeeping.c:timekeeping_resume(): >>>> /* re-base the last cycle value */ >>>> timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); >>>> >>>> So this re-sets cycle_last to the current value of the clocksource. This >>>> means that on resume, the clocksource can start counting from any >>>> value it >>>> likes. >>>> >>>> So, without any additional external inputs, time resumes incrementing at >>>> the point where the suspend occurred without any jump backwards or >>>> forwards. >>>> >>>> The code accounts for the sleep time by using read_persistent_clock() >>>> read >>>> a timer which continues running during sleep to calculate the delta >>>> between >>>> suspend and resume, and injects the delta between them to wind the time >>>> forward. >>>> >>>>> Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle >>>>> sleep states? >>>> >>>> During _idle_ (iow, no task running) sched_clock and the clocksource >>>> should both continue to run - the scheduler needs to know how long the >>>> system has been idle for, and the clocksource can't stop because we'll >>>> lose track of time. >>>> >>>> Remember that the clockevent stuff is used as a trigger to the >>>> timekeeping >>>> code to read the clocksource, and update the current time. Time is moved >>>> forward by the delta between a previous clocksource read and the current >>>> clocksource read. So stopping or resetting the clocksource in unexpected >>>> ways (other than over suspend/resume as mentioned above) will result in >>>> time going weird. >>> >>> Hmm, I have missed the existence of the read_persistent_clock(). It >>> sounds like I should keep the MTU as my clocksource / sched_clock and >>> have the PRCMU Timer as a persistent_clock instead. >>> >>> Then one problem remains. The MTU will be powered during cstates: >>> running, wfi, ApIdle (arm retenetion). The MTU will loose power during >>> cstates ApSleep and ApDeepSleep. So I need to do a similar sync as >>> suspend does against the persistent_clock but when leaving and enter the >>> deeper cstates. >>> >>> Should I solve it in the clocksource framework with a flag telling which >>> cstates the timer will stop/freeze and then inject time from the >>> persistent_clock for those cstates? (I am thinking something like the >>> CLOCK_EVT_FEAT_C3STOP flag) >>> >>> Am I on the wrong track here or how should I solve it? >>> > [snip] >> John mentioned that because of frequent clock-source >> switching, will affect the NTP correction badly to an >> extent that NTP correction may not work. >> >> Here is what John suggested to do but I got busy with >> other stuff and this one got pushed out from my todo. >> >> -------------------- >> John wrote ... >> A different approach would be to create a meta-clocksource, that >> utilizes the two underlying clocks to create a what looks like a unified >> counter. >> >> Basically you use the slow-always running counter as your long-term freq >> adjusted clock, but supplement its low granularity with the highres >> halting clock. >> >> This works best if both counters are driven by the same crystal, so >> there won't be much drift between them. >> ---------------------- > > Yea. The basic idea is to interpolate between two counters to produce a > a continuous clocksource for the timekeeping core. > > As Russell pointed out, error is injected to the timekeeping code every > time you switch between clocksources or the persistent clock. Doing this > very frequently will provide very poor time. > > So by using an interpolated clocksource, we would be able to maintain > clock adjustments via ntp and provide accurate long term time, while > still having fine-grained resolution (along with some short term error). > > However, its all easier said then done. There are lots of gotchas and > corner-cases with interpolation. That's the whole reason we moved from > tick based timekeeping to continuous clocksource based timekeeping. I > believe the KVM folks have used similar interpolation method for their > kvm clocksource, and I know they had numerous issues. But it seems to > work well enough, so I would take a peek at their code to start. Russell, What is your gut feeling about above interpolated clocksource strategy suggestion? While we (I) look at an alternative clocksource strategy and experiment are you ok with the three patches I sent? I thought about the persistant_clock that I did not know about but that will only help me in suspend so I conclude that I still need the a clocksource that counts in all low power modes. I would like to have something merged in case the alternate strategy fails. Something that work in the current kernel. The warnings above hints that it will be difficult and can take a long time to find a solution where both my timers can work together. It also easier to discuss framework changes with improvements of in-kernel drivers. Thanks, /Mattias Wallin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/