LinuxLists.cc - Re: [PATCH 0/5] clocksource patches

2006-05-06 02:04:16

Subject: Re: [PATCH 0/5] clocksource patches

On Thu, 2006-04-27 at 22:33 +0200, Roman Zippel wrote:
> I really don't understand your problem with a clocksource specific
> nsec_offset(), the author already has to provide most of the basic
> parameters, it's not that difficult to put them together and for the
> really lazy we can provide a:
>
> my_clock_nsec_offset(struct clocksource *cs)
> {
> return generic_nsec_offset(cs, my_get_cycles(), MASK, SHIFT);
> }
>
> this is _very_ simple and if some of the parameters are constant you
> already saved a few cycles.

We already have this, its the per-arch gettimeofday().

My issue here is that by pushing this functionality off into the
clocksource driver, not only do you complicate the driver, you also are
implicitly binding the driver to a form of tick based timekeeping
(question: from what point is the nsec_offset measuring from?). Also,
this has implications w/ NTP, as the nsec_offset() function now must do
its own scaling internally.

I think this sort of get_nsec_offset() interface is needed, but it
doesn't belong in the clocksource driver, because that fouls the
abstraction.

Instead, why don't we have two implementations of __get_nsec_offset()?
One which uses the clean clocksource abstraction, and one that can call
your arch_get_nsec_offset()?

Here's what I'm proposing for generic code:

do_get(ns)timeofday():
xtime + __get_nsec_offset()

update_wall_time():
now = clocksource_read() /* <- that's jiffies in tick-clock case */
while (now - last > interval_cycles):
last += interval_cycles
xtime += interval_nsecs
error += interval_ntp - interval_nsecs
ntp_adjust(...)
...

#ifdef CONTINUOUS_CLOCK
__get_nsec_offset():
now = clock_read
return ((now - last)*mult)>>shift
#else /*TICK_CLOCK*/
__get_nsec_offset():
return arch_get_nsec_offset()
#endif

How's that sound?

> John, we have to find some compromise here, but I think sacrificing
> everything to driver simplicity is IMO a huge mistake.

We're coming up on the one year mark for this discussion. Every pass
I've taken your feedback, worked to understand it (which, i admit can
take me awhile :), and implemented parts of your ideas into the code. I
really do appreciate this cycle, so I hope you don't find me
thick-headed or stubborn. I believe I've been happy to compromise every
step of the way.

I do want to find a solution that we both like, but I am feeling some
fatigue and we need to make some progress. The existing tick based
assumptions in mainline is blocking the ability to sanely implement both
bug fixes and new features.

> A "simple" driver
> gives you a lot of flexibility on the generic side, but you remove this
> flexibility from the driver side, i.e. if a driver doesn't fit into this
> cycle model (e.g. tick based drivers) it has to do extra work to provide
> this abstraction.

I'm not trying to fit tick based clocks into the continuous model. I'm
trying to allow the currently jiffies based timekeeping code to possibly
use something else, cleaning up a lot of code in the process.

> A good abstraction should concentrate on the _common_ properties and I
> don't think that the continous cycle model is a good general abstraction
> for all types of clocks. Tick based clocks are already more complex due
> the extra work needed to emulate the cycle counter. Some clocks may
> already provide a nsec value (e.g. virtual clocks in a virtualized
> environment), where your generic nsec calculation is a complete waste of
> time. A common property of all clocks is that we want a nsec value from
> them, so why not simply ask the clock for some kind of nsec value and
> provide the clock driver with the necessary library routines to convert
> the cycle value to a nsec value, where you actually have the necessary
> information to create efficient code. As long as you try to pull the cycle
> model into the generic model it will seriously suck from a performance
> perspective, as you separate the cycle value from the information how to
> deal with it efficiently.

For features like robust timekeeping in the face of lost ticks (needed
for virtualization, and realtime), as well as high-res timers and
dynamic/no-idle ticks, we *NEED* a continuous clock.

I made a quick audit of the arches to see what the breakdown was for
continuous vs tick clocks:

Continuous:
i386,x86-64, ia64, powerpc, ppc, sparc, alpha, mips, parisc, s390,
xtensa, and arm (pxa, sa1100, plat-omap)

Tick:
arm (other), cris, m32, m68k, sh

xtime/no intertick resolution:
frv, h8300, v850

I'll admit, the continuos cycle model doesn't fit tick based clocks, but
we shouldn't limit the abstraction to the lowest common denominator. By
providing what I suggested above (w/ the two __get_nsec_offset()
implementations), we reduce code for both models, and allow progress for
systems w/ continuous clocks in a generic fashion.

> > > I'm also still interested in opinions about different possibilities to
> > > organize the clocksource infrastructure (although I'd more appreciate
> > > pro/con arguments than outright rejections). One possibility here would be
> > > to also shorten the function names a bit (clocksource_ is a lot to type :)
> > > ), cs_... or clock_... would IMO work too.
> >
> > I *much* prefer the clarity of clocksource over the wear and tear typing
> > it might take on my fingers.
>
> What is the special meaning of "clocksource" vs e.g. just "clock"?

"clock" is already overloaded. We have the
CLOCK_MONOTONIC/CLOCK_REALTIME clocks, we have the RTC clocks... Its a
cycle counter we're using to accumulate time, thus clocksource seemed
understandable and unique.

> > > I also kept it separate from the do_timer() callback and simply created a
> > > new callback function, which archs can use. This makes it less likely to
> > > break any existing setup and even allows to coexists both methods until
> > > archs have been completely converted.
> >
> > One of the reasons I did the significant rework of the TOD patches was
> > so that we could generically introduce the clocksource abstraction first
> > (using jiffies) for all arches. I feel this provides a much smoother
> > transition to the generic timekeeping code (and greatly reduces the
> > amount of CONFIG_GENERIC_TIME specific code), so I'm not sure I
> > understand why you want to go back to separate do_timer() functions.
>
> Apropos code size: I checked the generated code size of timer.o (default
> i386 config), before it was 3248 bytes, with your patches it goes to 5907
> bytes and by disabling CONFIG_GENERIC_TIME it still is 4554. With my
> patches it's 4767/4347 bytes (removing the old time code saves another
> 315 bytes).

This isn't really a fair comparison (yet atleast), as your patches don't
appear to handle suspend/resume correctly. Nor did your patches even
boot on my laptop. :(

I'll admit that my code could use more low-level optimization, and I
welcome patches against my code to improve it. My code has already
gotten a good amount of testing in both -mm and -rt, so I know it works.
Lets make it fast too, but just in steps.

> In my version I was very careful to keep the common interrupt paths as
> short as possible, so that most of this code is not even executed most of
> the time. You can't just "hope" that gcc does the work for you (especially
> with 64bit math), you actually have to check the code and you would have
> the noticed the bloated code it generates.

Good point. I just sent a patch to Andrew fixing the assumption that gcc
could avoid a few of the mults. And I'm working on more patches to
reduce the text size as well.

Once again, I do appreciate the feedback.

thanks
-john

2006-05-06 16:25:28

by Roman Zippel

[permalink] [raw]

Subject: Re: [PATCH 0/5] clocksource patches

Hi,

On Fri, 5 May 2006, john stultz wrote:

> > A good abstraction should concentrate on the _common_ properties and I
> > don't think that the continous cycle model is a good general abstraction
> > for all types of clocks. Tick based clocks are already more complex due
> > the extra work needed to emulate the cycle counter. Some clocks may
> > already provide a nsec value (e.g. virtual clocks in a virtualized
> > environment), where your generic nsec calculation is a complete waste of
> > time. A common property of all clocks is that we want a nsec value from
> > them, so why not simply ask the clock for some kind of nsec value and
> > provide the clock driver with the necessary library routines to convert
> > the cycle value to a nsec value, where you actually have the necessary
> > information to create efficient code. As long as you try to pull the cycle
> > model into the generic model it will seriously suck from a performance
> > perspective, as you separate the cycle value from the information how to
> > deal with it efficiently.
>
>
> For features like robust timekeeping in the face of lost ticks (needed
> for virtualization, and realtime), as well as high-res timers and
> dynamic/no-idle ticks, we *NEED* a continuous clock.

Let's concentrate on the core issue.
I do agree that we need a continuous clock, but why has this clock to be
cycle based? As I tried to explain above the cycle based abstraction hurts
performance, as it cuts us off from further optimizations.
What we want in the end is continous _nanosecond_ value, so why not let
the abstraction base on this? Why is the cycle value so important?

> I made a quick audit of the arches to see what the breakdown was for
> continuous vs tick clocks:
>
> Continuous:
> i386,x86-64, ia64, powerpc, ppc, sparc, alpha, mips, parisc, s390,
> xtensa, and arm (pxa, sa1100, plat-omap)
>
> Tick:
> arm (other), cris, m32, m68k, sh
>
> xtime/no intertick resolution:
> frv, h8300, v850
>
> I'll admit, the continuos cycle model doesn't fit tick based clocks, but
> we shouldn't limit the abstraction to the lowest common denominator. By
> providing what I suggested above (w/ the two __get_nsec_offset()
> implementations), we reduce code for both models, and allow progress for
> systems w/ continuous clocks in a generic fashion.

Why can't we reach the same goal by providing library functions to the
drivers, which would allow far better optimizations?
Of these archs ppc has a higly optimized lock and condition free
gettimeofday implementation, which you simply throw away. I'm afraid that
archs which care about performance have to work around your slow generic
implementation. I have a big problem seeing progress in this.

John, I see three options:
1. I'm missing the point in your design which allows for further
optimizations in a generic way.
2. Something is wrong with your cycle based design.
3. Performance isn't important anymore.

> > > I *much* prefer the clarity of clocksource over the wear and tear typing
> > > it might take on my fingers.
> >
> > What is the special meaning of "clocksource" vs e.g. just "clock"?
>
>
> "clock" is already overloaded. We have the
> CLOCK_MONOTONIC/CLOCK_REALTIME clocks, we have the RTC clocks... Its a
> cycle counter we're using to accumulate time, thus clocksource seemed
> understandable and unique.

The first would actually be my reason for calling it "clock", i.e. if this
is going to be _the_ central abstraction for further time based services,
it deserves the shorter and concise "clock" name.

> This isn't really a fair comparison (yet atleast), as your patches don't
> appear to handle suspend/resume correctly. Nor did your patches even
> boot on my laptop. :(

Why didn't you mentioned this earlier? :(

> I'll admit that my code could use more low-level optimization, and I
> welcome patches against my code to improve it. My code has already
> gotten a good amount of testing in both -mm and -rt, so I know it works.

Andrew immediately drops my patches, so they don't even get a chance to
get any testing. This is a complete unfair argument, I don't even get the
chance to prove that my code might be better. :-(

> Lets make it fast too, but just in steps.

The first step would be to keep it separate from the current
update_wall_time() code. I just got rid of clock read in the hrtimer
interrupt code you are about to introduce it again here. Many clocks don't
need that much precision, and especially if it's an expensive operation
it's a complete waste of time.

bye, Roman

2006-05-08 18:33:36

by john stultz

[permalink] [raw]

Subject: Re: [PATCH 0/5] clocksource patches

On Sat, 2006-05-06 at 18:25 +0200, Roman Zippel wrote:
> On Fri, 5 May 2006, john stultz wrote:
>
> > > A good abstraction should concentrate on the _common_ properties and I
> > > don't think that the continous cycle model is a good general abstraction
> > > for all types of clocks. Tick based clocks are already more complex due
> > > the extra work needed to emulate the cycle counter. Some clocks may
> > > already provide a nsec value (e.g. virtual clocks in a virtualized
> > > environment), where your generic nsec calculation is a complete waste of
> > > time. A common property of all clocks is that we want a nsec value from
> > > them, so why not simply ask the clock for some kind of nsec value and
> > > provide the clock driver with the necessary library routines to convert
> > > the cycle value to a nsec value, where you actually have the necessary
> > > information to create efficient code. As long as you try to pull the cycle
> > > model into the generic model it will seriously suck from a performance
> > > perspective, as you separate the cycle value from the information how to
> > > deal with it efficiently.
> >
> >
> > For features like robust timekeeping in the face of lost ticks (needed
> > for virtualization, and realtime), as well as high-res timers and
> > dynamic/no-idle ticks, we *NEED* a continuous clock.
>
> Let's concentrate on the core issue.
> I do agree that we need a continuous clock, but why has this clock to be
> cycle based? As I tried to explain above the cycle based abstraction hurts
> performance, as it cuts us off from further optimizations.

First of all, you didn't reply specifically, but I hope we're in
agreement w/ the tick based clocks being not part of this specific
discussion. I'm fine letting systems w/ tick based clocks have an
get_nsec_offset() that is fully arch specific. And I don't love it, but
I can deal w/ having two update_wall_time() paths so tick based systems
can get some extra constant based optimizations.

Now, on to the continuous clocksource discussion :)

What arch specific optimizations for continuous clocks do you have in
mind? In other words, what would be an example of an architecture
specific optimization for generating time from a continuous counter?

For the sake of this discussion, I claim that optimizations made on
converting a continuous cycle based clock to an NTP adjusted time can be
made to all arches, and pushing the nanosecond conversion into the
driver is messy and needless. What are examples contrary to this claim?

> What we want in the end is continous _nanosecond_ value, so why not let
> the abstraction base on this? Why is the cycle value so important?

Because doing the NTP adjustment correctly on a cycle based clock is
difficult with the current code (I think ppc is the only one that does
it correctly, in my mind, by changing the frequency multiplier).

It can be done generically, and I do not see what sort of optimizations
you're imagining, so why keep it arch specific?

> Of these archs ppc has a higly optimized lock and condition free
> gettimeofday implementation, which you simply throw away. I'm afraid that
> archs which care about performance have to work around your slow generic
> implementation. I have a big problem seeing progress in this.

The ppc's lockfree implementation is interesting (putting aside for a
moment the fact that the current ppc vsyscall-gtod added locks back to
the code).

However I don't see how its an arch specific optimization! Its simply
doing atomic updates via pointer switches between two structures. This
doesn't need to be ppc specific, yet because of the current mess it is.
This is a great example of why the generic code would be useful.

> > This isn't really a fair comparison (yet atleast), as your patches don't
> > appear to handle suspend/resume correctly. Nor did your patches even
> > boot on my laptop. :(
>
> Why didn't you mentioned this earlier? :(

I apologize for not trying to run your patches earlier, but being time
constrained as well, I have been trying to focus understand the
algorithmic differences.

> > Lets make it fast too, but just in steps.
>
> The first step would be to keep it separate from the current
> update_wall_time() code. I just got rid of clock read in the hrtimer
> interrupt code you are about to introduce it again here. Many clocks don't
> need that much precision, and especially if it's an expensive operation
> it's a complete waste of time.

With continuous cycle based counters, the clock read is *necessary* when
updating xtime for robust timekeeping. We can move update_wall_time so
we don't run it every timer interrupt, but we cannot keep correct time
by just guessing how much time has passed and adding it in.

On tick based systems, the code in -mm, would just be reading jiffies
which is equivalent to how its done in mainline. But I'll grant you we
probably miss out on some of the optimizations where we could use
constants, so I'll add in a tick based update_wall_time path soon.

thanks
-john

2006-05-08 21:15:48

by Roman Zippel

[permalink] [raw]

Subject: Re: [PATCH 0/5] clocksource patches

Hi,

On Mon, 8 May 2006, john stultz wrote:

> > Let's concentrate on the core issue.
> > I do agree that we need a continuous clock, but why has this clock to be
> > cycle based? As I tried to explain above the cycle based abstraction hurts
> > performance, as it cuts us off from further optimizations.
>
>
> First of all, you didn't reply specifically, but I hope we're in
> agreement w/ the tick based clocks being not part of this specific
> discussion. I'm fine letting systems w/ tick based clocks have an
> get_nsec_offset() that is fully arch specific. And I don't love it, but
> I can deal w/ having two update_wall_time() paths so tick based systems
> can get some extra constant based optimizations.

Please leave the tick based stuff out of this completely. We want to get
rid of the arch specific hooks and not add them back.
The point is to give the _clock_ control over this kind of stuff, only the
clock driver knows how to deal with this efficiently, so as long as you
try to create a "dumb" clock driver you're going to make things only
worse. Now the archs have most control over it, but creating a single
bloated and slow blop instead will not be an improvement.
It's not about moving everything into the clock driver here, it's about
creating a _powerful_ API, which leaves control in the hands of the clock
driver, but at the same time keeps them as _simple_ (and not as dumb) as
possible.

> What arch specific optimizations for continuous clocks do you have in
> mind? In other words, what would be an example of an architecture
> specific optimization for generating time from a continuous counter?

The best example is the powerpc gettimeofday.

> For the sake of this discussion, I claim that optimizations made on
> converting a continuous cycle based clock to an NTP adjusted time can be
> made to all arches, and pushing the nanosecond conversion into the
> driver is messy and needless. What are examples contrary to this claim?

What kind of NTP adjustments are you talking about? A nsec_offset function
could look like this:

unsigned long nsec_offset(cs)
{
return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
}

This is fucking simple, what about this is "messy"? There is no NTP
adjustment here, this is all happening somewhere else. Keeping it in the
driver allows to make parameter constant, skip unnecessary steps and
allows to do it within 32bit. This is something you can _never_ do in a
generic centralized way without making it truly messy. I'd be happy to be
proven otherwise, but I simply don't see it.

> > What we want in the end is continous _nanosecond_ value, so why not let
> > the abstraction base on this? Why is the cycle value so important?
>
> Because doing the NTP adjustment correctly on a cycle based clock is
> difficult with the current code (I think ppc is the only one that does
> it correctly, in my mind, by changing the frequency multiplier).

I was talking about the nsec offset (see above), which is outside of the
NTP business. What is your fascination with the cycle counter, that you
want to make it so generic?

> It can be done generically, and I do not see what sort of optimizations
> you're imagining, so why keep it arch specific?

John, it's _clock_ specific.

> > Of these archs ppc has a higly optimized lock and condition free
> > gettimeofday implementation, which you simply throw away. I'm afraid that
> > archs which care about performance have to work around your slow generic
> > implementation. I have a big problem seeing progress in this.
>
> The ppc's lockfree implementation is interesting (putting aside for a
> moment the fact that the current ppc vsyscall-gtod added locks back to
> the code).
>
> However I don't see how its an arch specific optimization! Its simply
> doing atomic updates via pointer switches between two structures. This
> doesn't need to be ppc specific, yet because of the current mess it is.
> This is a great example of why the generic code would be useful.

Part of this can be done generically and I'd like to see it implemented,
but not all the optimizations can be done generically, e.g. not every arch
has a convenient mulhdu instruction like ppc64.
The arch can provide some base optimizations, but in the end it's the
clock which knows best (e.g. 32bit vs. 64bit resolution). Only the clock
driver knows the basic clock parameters at _compile_ time and can generate
an efficient implementation. With the help of some library functions we
can keep the amount of source needed in the driver to a minimum and
simple. Where is the problem with this?

> > The first step would be to keep it separate from the current
> > update_wall_time() code. I just got rid of clock read in the hrtimer
> > interrupt code you are about to introduce it again here. Many clocks don't
> > need that much precision, and especially if it's an expensive operation
> > it's a complete waste of time.
>
> With continuous cycle based counters, the clock read is *necessary* when
> updating xtime for robust timekeeping. We can move update_wall_time so
> we don't run it every timer interrupt, but we cannot keep correct time
> by just guessing how much time has passed and adding it in.

It has almost nothing to do with continuous cycles. On an UP system only
anything running with a higher priority than the timer interrupt or if the
cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can
see the adjustment. Again it depends on the clock whether the common
adjustment is significant bigger than the time needed to read the clock,
otherwise it's just a waste of time.

> On tick based systems, the code in -mm, would just be reading jiffies
> which is equivalent to how its done in mainline. But I'll grant you we
> probably miss out on some of the optimizations where we could use
> constants, so I'll add in a tick based update_wall_time path soon.

John, again, it's not about tick based systems, you'll get it wrong as
long as you think it's about special exceptions for tick based systems.
It's about creating a flexible system, which can handle about anything and
only the clock driver really knows what this is, otherwise you create a
bloated and slow generic system.

bye, Roman

2006-05-09 00:29:26

by john stultz

[permalink] [raw]

Subject: Re: [PATCH 0/5] clocksource patches

On Mon, 2006-05-08 at 23:15 +0200, Roman Zippel wrote:
> The point is to give the _clock_ control over this kind of stuff, only the
> clock driver knows how to deal with this efficiently, so as long as you
> try to create a "dumb" clock driver you're going to make things only
> worse. Now the archs have most control over it, but creating a single
> bloated and slow blop instead will not be an improvement.
> It's not about moving everything into the clock driver here, it's about
> creating a _powerful_ API, which leaves control in the hands of the clock
> driver, but at the same time keeps them as _simple_ (and not as dumb) as
> possible.

Part of my concern here is keeping the code manageable and hackable.
What you're suggesting sounds very similar to what we have for the i386
timer_opts code, which I don't want to duplicate.

Maybe it would help here if you would better define the API for this
abstraction. As it stands, it appears incomplete. Define the state
values stored, and list what interfaces modify which state values, etc.

ie: clock->get_nsec_offset():
What exactly does this measure? Is this strictly defined by the state
stored in the clocksource? If not, how will this interact w/ dynamic
ticks? What is the maximum time we can delay interrupts before the clock
wraps? How do we know if its tick based?

Another issue: if get_nsec_offset() is clock specific in its
implementation, but the state values it uses in at least the common case
are modified by generic code, how can we implement something like the
ppc lock free read? That will affect both the generic code and the clock
specific code.

> > What arch specific optimizations for continuous clocks do you have in
> > mind? In other words, what would be an example of an architecture
> > specific optimization for generating time from a continuous counter?
>
> The best example is the powerpc gettimeofday.
>
> > For the sake of this discussion, I claim that optimizations made on
> > converting a continuous cycle based clock to an NTP adjusted time can be
> > made to all arches, and pushing the nanosecond conversion into the
> > driver is messy and needless. What are examples contrary to this claim?
>
> What kind of NTP adjustments are you talking about? A nsec_offset function
> could look like this:
>
> unsigned long nsec_offset(cs)
> {
> return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
> }
>
> This is fucking simple, what about this is "messy"? There is no NTP
> adjustment here, this is all happening somewhere else.

That's my point. if nsec_offset is opaque, then what is the interface
for making NTP adjustments to that function? Are all nsec_offset
functions required to use the xtime_nsec, last_offset, and mult values?

> Keeping it in the
> driver allows to make parameter constant, skip unnecessary steps and
> allows to do it within 32bit. This is something you can _never_ do in a
> generic centralized way without making it truly messy. I'd be happy to be
> proven otherwise, but I simply don't see it.

Well, my issue w/ you desire to have a 32bit continuous clock is that I
don't know how useful it would be.

For a 1Mhz clock,
You've got 1,000 ns per cycle and the algorithm looks something like
say:
get_nsec_offset: cycles * 1,000 >> 0
which gives you ~4 seconds of leeway, which is pretty good.
However, the short term jitter is 1000ppm.

So lets bump up the SHIFT value,
get_nsec_offset: cycles * 1,024,000 >> 10

The short term jitter: <1ppm, so that's good.
But the max cycles is then ~4,000 (4ms), which is too short.

You can wiggle around here and maybe come up with something that works,
but it only gets worse for clocks that are faster.

For robust timekeeping using continuous cycles, I think the 64bit mult
in gettimeofday is going to be necessary in the majority of cases (we
can wrap it in some sort of arch optimized macro for a mul_lxl_ll or
something if possible).

> > > The first step would be to keep it separate from the current
> > > update_wall_time() code. I just got rid of clock read in the hrtimer
> > > interrupt code you are about to introduce it again here. Many clocks don't
> > > need that much precision, and especially if it's an expensive operation
> > > it's a complete waste of time.
> >
> > With continuous cycle based counters, the clock read is *necessary* when
> > updating xtime for robust timekeeping. We can move update_wall_time so
> > we don't run it every timer interrupt, but we cannot keep correct time
> > by just guessing how much time has passed and adding it in.
>
> It has almost nothing to do with continuous cycles. On an UP system only
> anything running with a higher priority than the timer interrupt or if the
> cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can
> see the adjustment. Again it depends on the clock whether the common
> adjustment is significant bigger than the time needed to read the clock,
> otherwise it's just a waste of time.

Eh? I didn't understand that. Mind clarifying/expanding here?

thanks
-john

2006-05-09 22:48:56

by Roman Zippel

[permalink] [raw]

Subject: Re: [PATCH 0/5] clocksource patches

Hi,

On Mon, 8 May 2006, john stultz wrote:

> On Mon, 2006-05-08 at 23:15 +0200, Roman Zippel wrote:
> > The point is to give the _clock_ control over this kind of stuff, only the
> > clock driver knows how to deal with this efficiently, so as long as you
> > try to create a "dumb" clock driver you're going to make things only
> > worse. Now the archs have most control over it, but creating a single
> > bloated and slow blop instead will not be an improvement.
> > It's not about moving everything into the clock driver here, it's about
> > creating a _powerful_ API, which leaves control in the hands of the clock
> > driver, but at the same time keeps them as _simple_ (and not as dumb) as
> > possible.
>
> Part of my concern here is keeping the code manageable and hackable.
> What you're suggesting sounds very similar to what we have for the i386
> timer_opts code, which I don't want to duplicate.

What are you talking about???

Let's take a simple example:

static cycle_t read_tsc(void)
{
cycle_t ret;

rdtscll(ret);

return ret;
}

which could become:

static unsigned long tsc_nsec_offset(struct clocksource *cs)
{
cycle_t cyc;

rdtscll(cyc);

return generic_cont_cycle2nsec_offset(cs, cyc, 22, (cycle_t)-1);
}

Please tell me what kind of problem you have with this? Even in this
simple case it will already be faster due to the constant parameters.
What is wrong with a little extra performance?

> Maybe it would help here if you would better define the API for this
> abstraction. As it stands, it appears incomplete. Define the state
> values stored, and list what interfaces modify which state values, etc.
>
> ie: clock->get_nsec_offset():
> What exactly does this measure? Is this strictly defined by the state
> stored in the clocksource? If not, how will this interact w/ dynamic
> ticks? What is the maximum time we can delay interrupts before the clock
> wraps? How do we know if its tick based?

I explained this already at the start of this thread and please read the
patch. I'm truly baffled, that you actually have to ask these questions.

> Another issue: if get_nsec_offset() is clock specific in its
> implementation, but the state values it uses in at least the common case
> are modified by generic code, how can we implement something like the
> ppc lock free read? That will affect both the generic code and the clock
> specific code.

You basically have two copies of the time parameters, for
get_nsec_offset() this only means it gets a different structure argument,
the rest is the same. The pointer switch between these two copies can be
done generically.

> > What kind of NTP adjustments are you talking about? A nsec_offset function
> > could look like this:
> >
> > unsigned long nsec_offset(cs)
> > {
> > return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
> > }
> >
> > This is fucking simple, what about this is "messy"? There is no NTP
> > adjustment here, this is all happening somewhere else.
>
> That's my point. if nsec_offset is opaque, then what is the interface
> for making NTP adjustments to that function? Are all nsec_offset
> functions required to use the xtime_nsec, last_offset, and mult values?

These values are not opaque, from an average kernel hacker I would at
least expect to understand what these values mean, but he doesn't has to
know how they are adjusted (or even calculated) and for this function it's
also not really important.

> > Keeping it in the
> > driver allows to make parameter constant, skip unnecessary steps and
> > allows to do it within 32bit. This is something you can _never_ do in a
> > generic centralized way without making it truly messy. I'd be happy to be
> > proven otherwise, but I simply don't see it.
>
> Well, my issue w/ you desire to have a 32bit continuous clock is that I
> don't know how useful it would be.
>
> For a 1Mhz clock,
> You've got 1,000 ns per cycle and the algorithm looks something like
> say:
> get_nsec_offset: cycles * 1,000 >> 0
> which gives you ~4 seconds of leeway, which is pretty good.
> However, the short term jitter is 1000ppm.

I think you mean 1000ppb or 1ppm.
Anyway, it's still not entirely correct. The short term jitter (for a
continuous clock) is defined by two parameters. First you have the base
jitter defined by the clock frequency of 1/freq seconds (which is also
independent of the used shift), so even with a perfectly synchronized
signal there is a jitter of +-0.5ppm. The second parameter is defined by
the size of the adjustment steps, which depends on the shift and HZ:
freq/2^shift/HZ nsec. This means in this case with HZ=1000 adjustment adds
a jitter of +-0.5ppm or with HZ=100 it's +-5ppm.

> So lets bump up the SHIFT value,
> get_nsec_offset: cycles * 1,024,000 >> 10
>
> The short term jitter: <1ppm, so that's good.
> But the max cycles is then ~4,000 (4ms), which is too short.
>
> You can wiggle around here and maybe come up with something that works,
> but it only gets worse for clocks that are faster.

You don't have to "wiggle" around anything. Using log2(freq^2/10^9/HZ) you
can calculate a shift value with an adjustment step which matches the
clock resolution, so with HZ=100 and a shift of 4 the adjustment step is
+-0.3125ppm and with a shift of 5 it's +-0.15625ppm, which is good enough
and wraps around after 134msec (If you lose that many update ticks you
have other problems already and with a tick based clock you wouldn't
notice it anyway). So if the clock resolution is only around 1usec anyway,
a 32bit calculation of the offset is perfectly usable.

I don't really expect to know these details from many people, but if you
want to maintain the clock system, you have to know this, you have to know
what is possible and what isn't and why. I'm really scared if you create
your code by "wiggling" around with it until it somehow fits.

> For robust timekeeping using continuous cycles, I think the 64bit mult
> in gettimeofday is going to be necessary in the majority of cases (we
> can wrap it in some sort of arch optimized macro for a mul_lxl_ll or
> something if possible).

Actually currently you use a mul_llxl_ll, which is considerably more
expensive on many archs. With current GHz clocks and the possible delays
of the timer interrupt in the rt kernel we are currently pushing the
limits and a mul_llxl_ll may be needed, which is perfectly fine. What is
not ok, is forcing it onto everyone by design. If something is not
generally true, you cannot make it generic!
What makes this worse that you don't optimize for the majority of cases,
but that you have to do it for the worst case, this is the reason it's a
mul_llxl_ll not mul_lxl_ll. Giving the clock control over this you could
register two different clock and have a choice whether you want the slow,
low priority clock, which can go a few seconds without updates, or the
fast clock, which needs updates a few times a second. With your cycle
based design you'll never have this kind of flexibility, are you really
sure nobody ever needs that?

> > > With continuous cycle based counters, the clock read is *necessary* when
> > > updating xtime for robust timekeeping. We can move update_wall_time so
> > > we don't run it every timer interrupt, but we cannot keep correct time
> > > by just guessing how much time has passed and adding it in.
> >
> > It has almost nothing to do with continuous cycles. On an UP system only
> > anything running with a higher priority than the timer interrupt or if the
> > cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can
> > see the adjustment. Again it depends on the clock whether the common
> > adjustment is significant bigger than the time needed to read the clock,
> > otherwise it's just a waste of time.
>
> Eh? I didn't understand that. Mind clarifying/expanding here?

If the clock is updated at exactly every freq/HZ cycles, you won't see a
jump due to the clock adjustment. This means only if you delay the timer
interrupt can you cause a time jump and this delay has to be large enough
to produce a large enough jump to be noticable (without correcting for
it).
The nsec offset is adjusted by (offset >> (shift - mult_adj)) nsec, this
means IOW the correction is adjustment_offset*delay, e.g. if the clock is
adjusted by 1msec/sec and the delay is 1msec (one tick!), the correction
is 1usec. IOW you already need quite large adjustments and delay to
produce a significant adjustment for the delay, so for slow clocks it's
pretty much guaranteed to be a complete waste of time.

bye, Roman