2005-05-04 17:49:19

by George Anzinger

[permalink] [raw]
Subject: Re: Help with the high res timers

~

>>The issue is not its length, but can we interrupt at its end. The key to high
>>res is being on time. If we want a 100usec resolution timer, we need to have
>>the low res timer hit its mark with something real close to this.
>
>
> Indeed.
>
> As an aside, one nice thing with Nish's code is that the maximum soft-
> timer latency is only the maximum interval between ticks. This avoids
> accumulating error caused by lost ticks.

I think you are confusing the x86 problems with lost ticks with the wider world.
Most archs have reasonable time resources and, while they may suffer from late
ticks, do not, therefor loose time. In as much as you can find a way to
eliminate time loss in the x86 due to lost ticks of the time base, you can also
do so with the current time keeping system. The current code is just to
dependant on catching every timer tick. It should use the other resources
available to it to cover any missing ticks. (We do this rather nicely in the
HRT patch, by the way.)
>
>
>
>>>Now when the new code expires timers it does so against the timeofday
>>>subsystem's notion of time instead of jiffies. It simply goes through
>>>all the timer bucket entries between now and the last time we expired
>>>timers.
>>
>>This is not unlike what happens now. I would hope that the number of buckets
>>visited averages to to something real close to 1 per run_timers entery.
>
>
> Well, as long as the HZ period is close to the timer-interval unit
> length, this is true. However if the timer-interval unit is smaller,
> multiple bucket entries would be expired. The performance considerations
> here are being looked at and this may be an area where the concepts in
> HRT might help (having a HRT specific sub-bucket).

This is where we get in trouble with HR timers. For a HR timer, we need to know
how to get a timer to expire (i.e. appear in the call back) at a well defined
and presise time (leaving aside latency issues). The above discription allows
timers to be put in buckets without (as near as I can tell) making transparent
exactly when the bucket will be emptied, only saying that it will be after the
latest timer in the bucket is due.
>
>
>>>Now an interesting point you bring up above is when to schedule timer
>>>interrupts. One could just have a fine-grained timer-interval unit and
>>>crank up HZ, but clearly we don't want to schedule interrupts to go off
>>>too frequently or the overhead won't be worth it. Instead to get high-
>>>res timers, the idea is to look at the timer list and schedule
>>>interrupts appropriately. Then we only take an interrupt when there is
>>>work to do, rather then at regular periodic intervals when there may or
>>>may not be anything to expire.
>>
>>This would suggest that all timers are high res. I don't think this makes sense:
>>1) because most users just don't care that much about resolution and high-res
>>carries some overhead.
>>2) Not all platform hardware is able to handle high res.
>
>
> Indeed, in our system all timers could be high res, but don't
> necessarily need to be by default. It ends up being a function of how
> finely-grained the timer-interval units are set to be and how
> efficiently the hardware can be scheduled.

We currently ship HRT with a resolution of 1usec. Lets agree that you don't
want to even try to do this by adjusting the timer-interval.
>
>
>>I think high res timers should only be used when the user asks for them. This
>>keeps the overhead under control.
>
>
> Abstractly that makes sense, but I'm not sure how you mean "when the
> user asks for them". Is this a runtime consideration, or is it compile
> time?

Run time, he uses the POSIX clocks and timers interface and uses a seperate high
res clock. Only timers on high res clocks are high res.
>
>
~
>>
>>A small, measureable latency is ok and need not be backed out by the software.
>>If you go this route you risk expiring a timer early and the standard says bad
>>things about this.
>
>
> Since we expire based upon time instead of ticks, we can never expire
> early.

Think of it this way. Decompose a HR timer into corse and fine units (you
choose, but here let say jiffies and nanoseconds). Now we want the normal timer
system to handle the jiffies part of the time and to turn the timer over to the
HR timer code to take care of the nanosecond remainder. If the jiffie part is
late, depending on the nanosecond part, it could make the timer late (i.e for
low values of the nanosecond part). For high values of the nanosecond part, we
can compenstate...

This decomposition makes a lot of sense, by the way, for, at least, the
following reasons:
1) it keeps the most of the HR issues out of the normal timer code,
2) it keeps high res and low res timer in the correct time order, i.e. a low res
timer for jiffie X will expire prior to a high res timer for jiffie X + Y
nanoseconds.
3) handling the high res timer list is made vastly easier as it will only need
to have a rather small number of timers in it at any given time (i.e. those that
are to expire prior to the next corse timer tick).
>
>
>>What is missing in this is that the flag ship arch (x86) has piss poor
>>capability to schedule timer interrupts. I.e. there really is no commonly
>>available timer to generate interrupts at some arbitrary time. There is,
>>however, the PIT, which, if you don't mess with it, is very low overhead but
>>periodic.
>>
>>Then there is the issue of what the time standard is. In the x86, the PIT is
>>it. The pm_timer is close, but the read problem adds so much overhead as to
>>make it hard to justify. Possibly the HPET does a better job, but it is still
>>an I/O access (read SLOW) and is not commonly available as yet.
>
>
> With my rework, the time standard issue is separate problem domain from
> HRT. If the timeofday code cannot give correct time, its a bug in that
> subsystem. While you're point is a fair critique of my timeofday work
> (which I am trying to address), the clear interface between soft-timers
> and timeofday allows for the soft-timer subsystem to not worry about
> that.

Leaving aside timers, one issue I am trying to get you to address is that on
most X86 machines the "clock" rock is only expressed via PIT interrupts. While
we must express time with more resolution than this, we must also "use" that
rock (i.e. PIT) to keep decent long term time.
>
>
>>You also have to do something reasonable for the accounting subsystems.
>>Currently this is done by sampling the system at a periodic rate. What ever is
>>running gets charged with the last period. If you go non-periodic, either you
>>have to charge a variable period when the sample is taken or you have to set up
>>timers as part of context switching. This ladder is not wise as the context
>>switch overhead then gets larger. It is rather easy to show that the accounting
>>overhead in such a system with a modest load is higher than in a periodic
>>sampled system. This system is also charged with doing the time slicing. With
>>out a periodic tick, a timer needs to be set up each context switch.
>
>
> I've not looked at the accounting subsystem yet, but I'll try to dig in
> and see what we can do here. Thanks for the heads up.

This is the main reason a tick less system is unwise. It is overload prone.
>
>
>>In addition to all of this, there is the issue of how to organize the timer list
>>so that you can find the next timer. With the current organization this is
>>something you just don't want to do. On the other hand, I am convinced that,
>>for periodic timers, it is about as good as it gets.
>
>
> You might have to go a bit more into detail on this last point.

First, lets agree that we need not be in love with any given timer list structure.

The issues to be addressed are:
1) Fast insert.
2) Fast removal prior to expire, (almost all timers never expire).
3) Fast look up and removal of timers to expire NOW

If you want to find the next timer to expire at some time in the furture:
4) Fast look up of the next timer.

The current cascade timer list does a very good job of 1, 2, and 3 but is not
set up to make 4 easy or fast.


>
> One thing I'd like to emphasize is that while Nish and my work do change
> a large amount of code that collides with your code, we want to make
> sure that what the HRT patch achieves is still possible. As I get more
> familiar with the HRT code needs and you get more familiar with what
> we're providing I hope things will work smoothly.
>
As do I.
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/


2005-05-04 22:14:05

by john stultz

[permalink] [raw]
Subject: Re: Help with the high res timers

On Wed, 2005-05-04 at 10:46 -0700, George Anzinger wrote:
> > Well, as long as the HZ period is close to the timer-interval unit
> > length, this is true. However if the timer-interval unit is smaller,
> > multiple bucket entries would be expired. The performance considerations
> > here are being looked at and this may be an area where the concepts in
> > HRT might help (having a HRT specific sub-bucket).
>
> This is where we get in trouble with HR timers. For a HR timer, we need to know
> how to get a timer to expire (i.e. appear in the call back) at a well defined
> and presise time (leaving aside latency issues). The above discription allows
> timers to be put in buckets without (as near as I can tell) making transparent
> exactly when the bucket will be emptied, only saying that it will be after the
> latest timer in the bucket is due.

<snip>

>
> Think of it this way. Decompose a HR timer into corse and fine units (you
> choose, but here let say jiffies and nanoseconds). Now we want the normal timer
> system to handle the jiffies part of the time and to turn the timer over to the
> HR timer code to take care of the nanosecond remainder. If the jiffie part is
> late, depending on the nanosecond part, it could make the timer late (i.e for
> low values of the nanosecond part). For high values of the nanosecond part, we
> can compenstate...
>
> This decomposition makes a lot of sense, by the way, for, at least, the
> following reasons:
> 1) it keeps the most of the HR issues out of the normal timer code,
> 2) it keeps high res and low res timer in the correct time order, i.e. a low res
> timer for jiffie X will expire prior to a high res timer for jiffie X + Y
> nanoseconds.
> 3) handling the high res timer list is made vastly easier as it will only need
> to have a rather small number of timers in it at any given time (i.e. those that
> are to expire prior to the next corse timer tick).


Hmmm. Ok I think I see what your getting at. Let me know where I go
wrong:

1. Since HR soft-timers are a special case, their absolute nanosecond
expire values (exp_ns) are decomposed into a low-res portion and a high-
res portion. In your case it is units of jiffies (exp_jf) and
arch_cycles (exp_ac) respectively.

2. Since jiffies units map directly to a periodic tick, one can set a
regular soft-timer to expire at exp_jf. Then when it is expired, it is
added to a separate HR-timers list to expire in exp_ac arch_cycles
units.

3. With the new-timeofday rework and Nish's soft-timers code, the soft-
timers bucket entries map to actual nanosecond time values, rather then
ticks. This causes a problem with your two level (regular periodic and
hr-timer) timer-lists because since entries don't strictly map to
interrupts, you don't how to decompose the nanosecond expiration into
low-res and high-res portions.


Here is my possible solution: Still keeping Nish's soft-timer rework
where we use nanoseconds instead of ticks or jiffies, provide an
expected interrupt-period value, which gives you the maximum interval
length between ticks. Thus you schedule a regular-low-res timer for the
nanosecond value (exp_ns - expected_interrupt_period). When that timer
expires, you move the timer to the HR timer list and schedule an
interrupt for the remaining time.

Let me know how that sounds.

thanks
-john

2005-05-04 22:48:29

by George Anzinger

[permalink] [raw]
Subject: Re: Help with the high res timers

john stultz wrote:
> On Wed, 2005-05-04 at 10:46 -0700, George Anzinger wrote:
>
>>>Well, as long as the HZ period is close to the timer-interval unit
>>>length, this is true. However if the timer-interval unit is smaller,
>>>multiple bucket entries would be expired. The performance considerations
>>>here are being looked at and this may be an area where the concepts in
>>>HRT might help (having a HRT specific sub-bucket).
>>
>>This is where we get in trouble with HR timers. For a HR timer, we need to know
>>how to get a timer to expire (i.e. appear in the call back) at a well defined
>>and precise time (leaving aside latency issues). The above description allows
>>timers to be put in buckets without (as near as I can tell) making transparent
>>exactly when the bucket will be emptied, only saying that it will be after the
>>latest timer in the bucket is due.
>
>
> <snip>
>
>>Think of it this way. Decompose a HR timer into coarse and fine units (you
>>choose, but here let say jiffies and nanoseconds). Now we want the normal timer
>>system to handle the jiffies part of the time and to turn the timer over to the
>>HR timer code to take care of the nanosecond remainder. If the jiffie part is
>>late, depending on the nanosecond part, it could make the timer late (i.e for
>>low values of the nanosecond part). For high values of the nanosecond part, we
>>can compenstate...
>>
>>This decomposition makes a lot of sense, by the way, for, at least, the
>>following reasons:
>>1) it keeps the most of the HR issues out of the normal timer code,
>>2) it keeps high res and low res timer in the correct time order, i.e. a low res
>>timer for jiffie X will expire prior to a high res timer for jiffie X + Y
>>nanoseconds.
>>3) handling the high res timer list is made vastly easier as it will only need
>>to have a rather small number of timers in it at any given time (i.e. those that
>>are to expire prior to the next coarse timer tick).
>
>
>
> Hmmm. Ok I think I see what your getting at. Let me know where I go
> wrong:
>
> 1. Since HR soft-timers are a special case, their absolute nanosecond
> expire values (exp_ns) are decomposed into a low-res portion and a high-
> res portion. In your case it is units of jiffies (exp_jf) and
> arch_cycles (exp_ac) respectively.
>
> 2. Since jiffies units map directly to a periodic tick, one can set a
> regular soft-timer to expire at exp_jf. Then when it is expired, it is
> added to a separate HR-timers list to expire in exp_ac arch_cycles
> units.
>
> 3. With the new-timeofday rework and Nish's soft-timers code, the soft-
> timers bucket entries map to actual nanosecond time values, rather then
> ticks. This causes a problem with your two level (regular periodic and
> hr-timer) timer-lists because since entries don't strictly map to
> interrupts, you don't how to decompose the nanosecond expiration into
> low-res and high-res portions.
>
>
> Here is my possible solution: Still keeping Nish's soft-timer rework
> where we use nanoseconds instead of ticks or jiffies, provide an
> expected interrupt-period value, which gives you the maximum interval
> length between ticks. Thus you schedule a regular-low-res timer for the
> nanosecond value (exp_ns - expected_interrupt_period). When that timer
> expires, you move the timer to the HR timer list and schedule an
> interrupt for the remaining time.

Oh, I have been thinking along these same lines. I don't recall at the moment,
but I depend on the timer retaining the absolute expire time as I set it. This
is used both by the secondary timer, and by the rearm code. I need to look more
closely at that. But this is picking things apart at a very low level and does
not, I think, address timer ordering to the point where I feel completely safe.
I.e. will such a scheme allow a LR time at time X to expire after a HR timer
for time X+y.
>
> Let me know how that sounds.
>
> thanks
> -john
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/

2005-05-04 23:27:10

by john stultz

[permalink] [raw]
Subject: Re: Help with the high res timers

On Wed, 2005-05-04 at 15:48 -0700, George Anzinger wrote:
> john stultz wrote:
> > Here is my possible solution: Still keeping Nish's soft-timer rework
> > where we use nanoseconds instead of ticks or jiffies, provide an
> > expected interrupt-period value, which gives you the maximum interval
> > length between ticks. Thus you schedule a regular-low-res timer for the
> > nanosecond value (exp_ns - expected_interrupt_period). When that timer
> > expires, you move the timer to the HR timer list and schedule an
> > interrupt for the remaining time.
>
> Oh, I have been thinking along these same lines. I don't recall at the moment,
> but I depend on the timer retaining the absolute expire time as I set it. This
> is used both by the secondary timer, and by the rearm code. I need to look more
> closely at that.

Glad we're on similar wavelengths. :)

> But this is picking things apart at a very low level and does
> not, I think, address timer ordering to the point where I feel completely safe.
> I.e. will such a scheme allow a LR time at time X to expire after a HR timer
> for time X+y.

Hmm. That sounds like an interesting problem as well, although I'm not
familiar enough with it to make a reasonable comment. Let me ponder it
for awhile and see what can be done.

thanks
-john