Date: Mon, 27 Jul 2009 13:55:50 +0200
From: Martin Schwidefsky
To: john stultz
Cc: Daniel Walker, linux-kernel@vger.kernel.org, Ingo Molnar, Thomas Gleixner
Subject: Re: [RFC][patch 1/5] move clock source related code to clocksource.c

On Fri, 24 Jul 2009 17:08:47 -0700
john stultz wrote:

> On Thu, 2009-07-23 at 12:52 +0200, Martin Schwidefsky wrote:
> > On Wed, 22 Jul 2009 17:28:20 -0700
> > john stultz wrote:
> > > 2) Structure names aren't too great right now. Not sure timeclock is
> > > what I want to use, probably system_time or something. Will find/replace
> > > before the next revision is sent out.
> >
> > I've picked the name struct timekeeper.
>
> Nice. I like this name.

Agreed then.
> > > 5) The TSC clocksource uses cycles_last to avoid very slight skew issues
> > > (that otherwise would not be noticed). Not sure how to fix that if we're
> > > pulling cycles_last (which is purely timekeeping state) out of the
> > > clocksource. Will have to think of something.
> >
> > That is an ugly one. A similar thing exists in the s390 backend where I
> > want to reset the timekeeping to precise values after the clocksource
> > switch from jiffies. The proper solution probably is to allow
> > architectures to override the default clocksource. The jiffies
> > clocksource doesn't make any sense on s390.
>
> Yea. Still not sure what a good fix is here. We need some way for the
> TSC to check if it's slightly behind another cpu. Alternatively we could
> create a flag (SLIGHTLY_UNSYNCED or something) where we then do the
> quick comparison in the timekeeping code instead of the clocksource read
> implementation? It would really only be useful for the TSC and would
> clutter the code up somewhat, so I'm not sure I really like it, but
> maybe that would work.

For my next version of the patch series I will leave cycle_last in
struct clocksource. Small, manageable steps; if we find a way to move
it, we can do that later on as well.

> > > 3) Once all arches are converted to using read_persistent_clock(), then
> > > the arch specific time initialization can be dropped, removing the
> > > majority of direct xtime structure accesses.
> >
> > Only if read_persistent_clock allows for a better resolution than
> > seconds. With my cputime accounting hat on: seconds are not good
> > enough.
>
> cputime accounting? read_persistent_clock is only used for time
> initialization. But yea, it could be extended to return a timespec.

Exactly!

> > > 4) Then once the remaining direct wall_to_monotonic and xtime accessors
> > > are moved to timekeeping.c we can make those both static and embed them
> > > into the core timekeeping structure.
> >
> > Both should not be accessed at a rate that makes it necessary to read
> > from the values directly. An accessor should be fine I think.
>
> Yea, it's just a matter of cleaning things up.

Nod.

> > > But let me know if this patch doesn't achieve most of the cleanup you
> > > wanted to see.
> >
> > We are getting there. I wonder if it is really necessary to pull
> > xtime_cache, raw_time, total_sleep_time and timekeeping_suspended into
> > the struct timeclock. I would prefer the semantics that the struct
> > timekeeper / timeclock contains the internal values of the timekeeping
> > code for the currently selected clock source. xtime is not clock
> > specific.
>
> It's not necessary. Although I do like the idea of pulling all the values
> protected by the xtime_lock into one structure so it's clear what we're
> protecting. As it is now, jiffies, ntp state and a whole host of other
> data is covered by it, so it would be a ways off until all that logic
> could be dethreaded.

Yea, I guess that makes sense. Gives us something to do in the near
future. For now I'll do what is not too complicated to implement.

> The other reason for putting those values into the timekeeping struct is
> that they are actually fairly tightly connected with the clocksource
> state. There is some redundancy: xtime_nsec stores the highres remainder
> of xtime.tv_nsec, xtime_cache is also mostly duplicate, so hopefully we
> can combine those once xtime is made static.
>
> > For reference here is the current stack of patches I have on my disk.
> > The stop_machine conversion to install a new clocksource is currently
> > missing.
> >
> > PRELIMINARY PATCHES, USE AT YOUR OWN RISK.
> > -------------------------------------------------------------------
> >
> > Subject: [PATCH] introduce timekeeping_leap_insert
> >
> > From: john stultz
> >
> > ---
> >  include/linux/time.h      |    1 +
> >  kernel/time/ntp.c         |    7 ++-----
> >  kernel/time/timekeeping.c |    7 +++++++
> >  3 files changed, 10 insertions(+), 5 deletions(-)
> >
> [snip]
>
> I think this one is pretty easy and should be able to be submitted on
> the sooner side.

A good question is who will upstream all this; will you take care of it?

> > -------------------------------------------------------------------
> >
> > Subject: [PATCH] remove clocksource inline functions
> >
> > From: Martin Schwidefsky
> >
> > Remove clocksource_read, clocksource_enable and clocksource_disable
> > inline functions. No functional change.
> >
> > Cc: Ingo Molnar
> > Cc: Thomas Gleixner
> > Cc: john stultz
> > Cc: Daniel Walker
> > Signed-off-by: Martin Schwidefsky
> > ---
> >  include/linux/clocksource.h |   46 --------------------------------------------
> >  kernel/time/timekeeping.c   |   32 +++++++++++++++++-------------
> >  2 files changed, 18 insertions(+), 60 deletions(-)
> [snip]
>
> Again, the outstanding mult_orig fix (I need to pester Thomas or Andrew
> to push that before 2.6.31 comes out) will collide with this, but
> overall it looks ok.

The conflict with mult_orig is a small issue; I'll work around it. In
the end mult_orig will be gone anyway.

> > -------------------------------------------------------------------
> >
> > Subject: [PATCH] cleanup clocksource selection
> >
> > From: Martin Schwidefsky
> >
> > Introduce clocksource_dequeue & clocksource_update and move the spinlock
> > calls. clocksource_update does nothing for GENERIC_TIME=n since
> > change_clocksource does nothing as well.
> >
> > Cc: Ingo Molnar
> > Cc: Thomas Gleixner
> > Cc: john stultz
> > Cc: Daniel Walker
> > Signed-off-by: Martin Schwidefsky
> [snip]
>
> Haven't had enough time to closely review or test this, but it looks
> like an agreeable change.

Don't know yet; the change to the clocksource code for the stop_machine
rework will be bigger than I thought. The clocksource watchdog code
changes the rating of the TSC clock from interrupt context. I need to
get out of interrupt context if I want to use stop_machine for the
clocksource update. Not nice.

> > +/* Structure holding internal timekeeping values. */
> > +struct timekeeper {
> > +	struct clocksource *clock;
> > +	cycle_t cycle_interval;
> > +	u64 xtime_interval;
> > +	u32 raw_interval;
> > +	u64 xtime_nsec;
> > +	s64 ntp_error;
> > +	int xtime_shift;
> > +	int ntp_shift;
> > +
> > +	/*
> > +	 * The following is written at each timer interrupt.
> > +	 * Keep it in a different cache line to dirty no
> > +	 * more than one cache line.
> > +	 */
> > +	cycle_t cycle_last ____cacheline_aligned_in_smp;
> > +};
>
> The cacheline aligned bit probably needs to cover almost everything here
> (not clock, not the shift values), as it will all be updated each tick.

cycle_last is back in the clocksource structure for now.

> [snip]
> > @@ -564,8 +628,8 @@ static __always_inline int clocksource_b
> >  	 * Now calculate the error in (1 << look_ahead) ticks, but first
> >  	 * remove the single look ahead already included in the error.
> >  	 */
> > -	tick_error = tick_length >> (NTP_SCALE_SHIFT - clock->shift + 1);
> > -	tick_error -= clock->xtime_interval >> 1;
> > +	tick_error = tick_length >> (timekeeper.ntp_shift + 1);
> > +	tick_error -= timekeeper.xtime_interval >> 1;
> >  	error = ((error - tick_error) >> look_ahead) + tick_error;
> >
> >  	/* Finally calculate the adjustment shift value. */
> > @@ -590,18 +654,18 @@ static __always_inline int clocksource_b
> >   * this is optimized for the most common adjustments of -1,0,1,
> >   * for other values we can do a bit more work.
> >   */
> > -static void clocksource_adjust(s64 offset)
> > +static void timekeeping_adjust(s64 offset)
> >  {
> > -	s64 error, interval = clock->cycle_interval;
> > +	s64 error, interval = timekeeper.cycle_interval;
> >  	int adj;
> >
> > -	error = clock->error >> (NTP_SCALE_SHIFT - clock->shift - 1);
> > +	error = timekeeper.ntp_error >> (timekeeper.ntp_shift - 1);
> >  	if (error > interval) {
> >  		error >>= 2;
> >  		if (likely(error <= interval))
> >  			adj = 1;
> >  		else
> > -			adj = clocksource_bigadjust(error, &interval, &offset);
> > +			adj = timekeeping_bigadjust(error, &interval, &offset);
> >  	} else if (error < -interval) {
> >  		error >>= 2;
> >  		if (likely(error >= -interval)) {
> > @@ -609,15 +673,14 @@ static void clocksource_adjust(s64 offse
> >  			interval = -interval;
> >  			offset = -offset;
> >  		} else
> > -			adj = clocksource_bigadjust(error, &interval, &offset);
> > +			adj = timekeeping_bigadjust(error, &interval, &offset);
> >  	} else
> >  		return;
> >
> > -	clock->mult += adj;
> > -	clock->xtime_interval += interval;
> > -	clock->xtime_nsec -= offset;
> > -	clock->error -= (interval - offset) <<
> > -		(NTP_SCALE_SHIFT - clock->shift);
> > +	timekeeper.clock->mult += adj;
> > +	timekeeper.xtime_interval += interval;
> > +	timekeeper.xtime_nsec -= offset;
> > +	timekeeper.ntp_error -= (interval - offset) << timekeeper.ntp_shift;
> >  }
> >
> >  /**
> > @@ -627,53 +690,56 @@ static void clocksource_adjust(s64 offse
> >   */
> >  void update_wall_time(void)
> >  {
> > +	struct clocksource *clock;
> >  	cycle_t offset;
> >
> >  	/* Make sure we're fully resumed: */
> >  	if (unlikely(timekeeping_suspended))
> >  		return;
> >
> > +	clock = timekeeper.clock;
> >  #ifdef CONFIG_GENERIC_TIME
> > -	offset = (clock->read(clock) - clock->cycle_last) & clock->mask;
> > +	offset = (clock->read(clock) - timekeeper.cycle_last) & clock->mask;
> >  #else
> > -	offset = clock->cycle_interval;
> > +	offset = timekeeper.cycle_interval;
> >  #endif
> > -	clock->xtime_nsec = (s64)xtime.tv_nsec << clock->shift;
> > +	timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.xtime_shift;
> >
> >  	/* normally this loop will run just once, however in the
> >  	 * case of lost or late ticks, it will accumulate correctly.
> >  	 */
> > -	while (offset >= clock->cycle_interval) {
> > +	while (offset >= timekeeper.cycle_interval) {
> >  		/* accumulate one interval */
> > -		offset -= clock->cycle_interval;
> > -		clock->cycle_last += clock->cycle_interval;
> > +		offset -= timekeeper.cycle_interval;
> > +		timekeeper.cycle_last += timekeeper.cycle_interval;
> >
> > -		clock->xtime_nsec += clock->xtime_interval;
> > -		if (clock->xtime_nsec >= (u64)NSEC_PER_SEC << clock->shift) {
> > -			clock->xtime_nsec -= (u64)NSEC_PER_SEC << clock->shift;
> > +		timekeeper.xtime_nsec += timekeeper.xtime_interval;
> > +		if (timekeeper.xtime_nsec >= (u64)NSEC_PER_SEC << timekeeper.xtime_shift) {
> > +			timekeeper.xtime_nsec -= (u64)NSEC_PER_SEC << timekeeper.xtime_shift;
> >  			xtime.tv_sec++;
> >  			second_overflow();
> >  		}
> >
> > -		clock->raw_time.tv_nsec += clock->raw_interval;
> > -		if (clock->raw_time.tv_nsec >= NSEC_PER_SEC) {
> > -			clock->raw_time.tv_nsec -= NSEC_PER_SEC;
> > -			clock->raw_time.tv_sec++;
> > +		raw_time.tv_nsec += timekeeper.raw_interval;
> > +		if (raw_time.tv_nsec >= NSEC_PER_SEC) {
> > +			raw_time.tv_nsec -= NSEC_PER_SEC;
> > +			raw_time.tv_sec++;
> >  		}
> >
> >  		/* accumulate error between NTP and clock interval */
> > -		clock->error += tick_length;
> > -		clock->error -= clock->xtime_interval << (NTP_SCALE_SHIFT - clock->shift);
> > +		timekeeper.ntp_error += tick_length;
> > +		timekeeper.ntp_error -= timekeeper.xtime_interval <<
> > +					timekeeper.ntp_shift;
> >  	}
>
> Just to be super careful, I'd probably separate the introduction of the
> timekeeper code and the ntp_shift changes into separate patches. Just to
> keep the amount of change to very complex code down to a more easily
> review-able amount.

Done, I now have multiple patches for the cleanup. I will resend soon.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.