LinuxLists.cc - [patch 00/21] hrtimer - High-resolution timer subsystem

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi Thomas,

On Tue, 6 Dec 2005 [email protected] wrote:

Before I get into a detailed review, I have to asked a question I already
asked earlier: are even interested in a discussion about this?

Since I posted the ptimer patches, I haven't gotten a single direct
response from you, except some generic description in your last patch.
I would prefer if we could work together on this, but this requires some
communication. I know I'm sometimes a little hard to understand, but you
don't even try to ask if something is unclear or to explain the details
from your perspective.
Slowly I'm asking myself why I should bother, the alternative would be to
just continue my own patch set. I don't really want that and Andrew
certainly doesn't want to choose between two versions either. So Thomas,
please get over yourself and start talking.

> We worked through the subsystem and its users and further reduced the
> implementation to the basic required infrastructure and generally
> streamlined it. (We did this with easy extensibility for the high
> resolution clock support still in mind, so we kept some small extras
> around.)

It looks better, but could you please explain, what these extras are good
for?

> After reading the Posix specification again, we came to the conclusion
> that it is possible to do no rounding at all for the ktime_t values, and
> to still ensure that the timer is not delivered early.

Nice, that you finally also come to that conclusion, after I said that
already for ages. (It's also interesting how you do that without giving me
any credit for it.)
Nevertheless, if you read my explanation of the rounding carefully and
look at my implementation, you may notice that I still disagree with the
actual implementation.

BTW there is one thing I'm currently curious about. Why did you rename the
timer from high-precision timer to high-resolution timer? hrtimer was just
a suggestion from Andrew and ptimer would have been fine as well.

bye, Roman

2005-12-06 19:07:14

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Roman Zippel <[email protected]> wrote:

> Hi Thomas,
>
> On Tue, 6 Dec 2005 [email protected] wrote:
>
> Before I get into a detailed review, I have to asked a question I
> already asked earlier: are even interested in a discussion about this?

we are certainly interested in a technical discussion!

> I would prefer if we could work together on this, but this requires
> some communication. I know I'm sometimes a little hard to understand,
> but you don't even try to ask if something is unclear or to explain
> the details from your perspective.

you think the reason is that you are "sometimes a little hard to
understand". Which, as i guess it implies, comes from your superior
intellectual state of mind, and i am really thankful for your efforts
trying to educate us mere mortals.

but do you honestly believe that this is the only possible reason? How
about "your message often gets lost because you often offend people and
thus do not respect their work" as a possibility? How about "hence it
has not been much fun to work with you" as a further explanation?

to be able to comprehend what kind of mood we might be in when reading
your emails these days, how about this little snippet from you, from the
second email you wrote in the ktimers threads:

"First off, I can understand that you're rather upset with what I wrote,
unfortunately you got overly defensive, so could you please next time
not reply immediately and first sleep over it, an overly emotional
reply is funny to read but not exactly useful."

http://marc.theaimsgroup.com/?l=linux-kernel&m=112743074308613&w=2

and to tell you my personal perspective, the insults coming from you in
our direction have not appeared to have stopped since. I am being dead
serious here, and i'd love nothing else if you stopped doing what you
are doing and if i didnt have to write such mails and if things got more
constructive in the end. Insults like the following sentence in this
very email:

> [...] So Thomas, please get over yourself and start talking.

let me be frank, and show you my initial reply that came to my mind when
reading the above sentence: "who the f*ck do you think you are to talk
to _anyone_ like that?". Now i'm usually polite and wont reply like
that, but one thing is sure: no technical thought was triggered by your
sentence and no eternal joy filled my mind aching to reply to your
questions. Suggestion: if you want communication and cooperation, then
be cooperative to begin with. We are doing Linux kernel programming for
the fun of it, and the style of discussions matters just as much as the
style of code.

i'm not sure what eternal sin we've committed to have deserved the
sometimes hidden, sometimes less hidden trash-talk you've been
practicing ever since we announced ktimers.

in any case, from me you'll definitely get a reply to every positive or
constructive question you ask in this thread, but you wont get many
replies to mails that also include high-horse insults, question or
statements. Frankly, i dont have that much time to burn, we've been
through one ktimer flamewar already and it wasnt overly productive :)

Ingo

2005-12-06 22:04:48

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi Roman,

On Tue, 2005-12-06 at 18:32 +0100, Roman Zippel wrote:
> Before I get into a detailed review, I have to asked a question I already
> asked earlier: are even interested in a discussion about this?

Yes, I am and always was, as long it is on a technical level.

> Slowly I'm asking myself why I should bother, the alternative would be
> to just continue my own patch set. I don't really want that and Andrew
> certainly doesn't want to choose between two versions either. So
> Thomas, please get over yourself and start talking.

I'm interested in working with others and I do that a lot. It depends a
bit on the attitude of the person who wants to do that. I did not have
the feeling that you are interested in working together. Usually people
who want to participate in a project send patches, suggestions or
testing feedback. Your reaction throughout the whole mail threads was
neither cooperative nor appealing to me. I have no problem at all to
accept critizism and help from others, but your attitude of teaching me
how to do my work was just annoying.

When others have done the hard chores of analysing the underlying
problems and trying to solve them in various ways it is a simple task to
jump in and tell them the big truth of the right solution. Acknowledging
the work of others which led to a maybe imperfect solution in the first
pass and helping in a constructive way to bring it to a better shape is
a different thing.

Sure you can fork off your own project and do what you want if you feel
the urge to do so. We'd prefer to see patches against our queue, but
it's up to you.

I'm replying to the technical points in a different mail.

tglx

2005-12-06 22:21:57

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi Roman,

On Tue, 2005-12-06 at 18:32 +0100, Roman Zippel wrote:

> > We worked through the subsystem and its users and further reduced the
> > implementation to the basic required infrastructure and generally
> > streamlined it. (We did this with easy extensibility for the high
> > resolution clock support still in mind, so we kept some small extras
> > around.)
>
> It looks better, but could you please explain, what these extras are
> good for?

One extra we kept is the list and the state field, which are useful for
the high resolution implementation. We wanted to keep as much as
possible common code [shared between the current low-resolution clock
based hrtimer code and future high-resolution clock based hrtimercode]
to avoid the big #ifdeffery all over the place.

It might turn out in the rework of the hrt bits that the list can go
away, but this is a nobrainer to do. (The very first unpublished version
of ktimers had no list, it got introduced during the hrt addon and
stayed there to make the hrt patch less intrusive.)

> > After reading the Posix specification again, we came to the conclusion
> > that it is possible to do no rounding at all for the ktime_t values, and
> > to still ensure that the timer is not delivered early.
>
> Nice, that you finally also come to that conclusion, after I said that
> already for ages. (It's also interesting how you do that without
> giving me any credit for it.)

Sorry if it was previously your idea and if we didnt credit you for it.
I did not keep track of each word said in these endless mail threads. We
credited every suggestion and idea which we picked up from you, see our
previous emails. If we missed one, it was definitely not intentional.

The decision to change the rounding implementation was not made based on
reading old mail threads. It was made by doing tests and analysis and it
differs a lot from your implementation.

Setting up a periodic timer leads to a summing-up error versus the
expected timeline. This applies both to vanilla, ktimers and ptimers.
The difference is how this error is showing up:

The vanilla 2.6.15-rc5 kernel has an rather constant error which is in
the range of roughly 1ms per interval mostly independent of the given
interval and the system load. This is roughly 1 sec per 1000 intervals.

Ktimers had a contant summing error which is exactly the delta of the
rounded interval to the real interval. The error is number of intervals
* delta. The error could be deduced by the application, but thats not a
really good idea. For a 50ms interval it is 2sec / 1000 intervals, which
is exactly the 2ms delta between the 50ms requested and the 52ms real
interval on a system with HZ=250

Ptimers have a rounding error which depends on the delta to the jiffy
resolution and the system load. So it gets rather unpredicitble what
happens. The basic error is roughly the same as with ktimers, but the
addon component due to system load is not. For a 50ms interval a summing
error between 2sec and 7sec per 1000 intervals was measured.

So while vanilla and ktimers have a systematic error, ptimer introduces
random Brownean motion!

We analysed the problem again and went through the spec and came to the
conclusion that rounding can be completely omitted. We changed the code
accordingly and did the same tests. The result is systematic deviation
of the timeline which wanders between 0 and resolution - 1 [i.e. 0-4msec
with HZ=250], but does not introduce a summing error. This behaviour
will be the same when high resolution bits are put on top. Of course the
error then will be significantly smaller.

To sum up the effects of various implementations (and
non-implementations in our hrtimers case) of rounding, a 50 msec
interval timer accumulates the following timeline error (precision
error) over 1000 periods (50 seconds):

vanilla: 1000 msecs
ktimers: 2000 msecs
ptimers: 2000-7000 msecs
hrtimers: 4 msecs

In the interim low-res ktimers version we were concentrating on the
'multiples of exposed resolution case. E.g. with 40 msec intervals
(which is 10x 4msec jiffy) you'd only get 0-4msecs longterm error:

vanilla: 1000 msecs
ktimers: 4 msecs
ptimers: 8-2000 msecs
hrtimers: 4 msecs

> Nevertheless, if you read my explanation of the rounding carefully and
> look at my implementation, you may notice that I still disagree with
> the actual implementation.

I started to read it, but your explanation seems to be completely
detached from the testing results and the code.

I can imagine that you dont agree, but you might also elaborate why. I
definitely disagree with your implementation for the following reasons:

You define that absolute interval timers on clock realtime change their
behaviour when the initial time value is expired into relative timers
which are not affected by time settings. I have no idea how you came to
that interpretation of the spec. I completely disagree [but if you would
like I can go into more detail why I think it's wrong.]

Beside of that, the implementation is also completely broken. (You
rebase the timer from the realtime base to the monotonic base inside of
the timer callback function. On return you lock the realtime base and
enqueue the timer into the realtime queue, but the base field of the
timer points to the monotonic queue. It needs not much phantasy to get
this exploited into a crash.)

Furthermore, your implementation is calculating the next expiry value
based on the current time of the expiration rather than on the previous
expected expiry time, which would be the natural thing to do. This
detail also explains the system-load dependent random drifting of
ptimers quite well.

The changes you did to the timer locking code (also in timer.c) are racy
and simply exposable. Oleg's locking implementation is there for a good
reason.

Neither do I understand the jiffie boundness you re-introduced all over
the place. The softirq code is called once per jiffy and the expiry is
checked versus the current time. Basing a new design on jiffies, where
the design intends to be easily extensible to high resolution clocks, is
wrong IMNSHO. Doing a high resolution extension on top of it is just
introducing a lot of #ifdef mess in places where none has to be. We had
that before, and dont want to go back there.

One of your main, often repeated arguments was the complexity of
ktimers. While ktimers held a lot of complex functionality, the "simple"
ptimers .text size is larger than the ktimers one! I know that you claim
that .text size is not a criteria, but Andrew seriously asked what he
gets for the increase of .text.

> BTW there is one thing I'm currently curious about. Why did you rename
> the timer from high-precision timer to high-resolution timer? hrtimer
> was just a suggestion from Andrew and ptimer would have been fine as
> well.

We decided to rename 'ktimer' because Andrew pretty much vetoed the
'ktimeout' queue, and "timer_list" plus "ktimer" looked and sounded
confusing (as we've explained it before). Of the possible target names,
we decided against "ptimer" because it could be confused with "process
timers" and "posix timers". hrtimers is a clear term that indicates what
those timers do, so we picked up Andrew's suggestion as a way out the
endless naming discussion. Does this satisfy your curiosity?

tglx

2005-12-07 03:05:30

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Tue, 6 Dec 2005, Ingo Molnar wrote:

> you think the reason is that you are "sometimes a little hard to
> understand". Which, as i guess it implies, comes from your superior
> intellectual state of mind, and i am really thankful for your efforts
> trying to educate us mere mortals.

I can assure you my "superior intellectual state of mind" is not much
different from many other kernel hackers. I have at times strong opinions,
but who here hasn't?

> to be able to comprehend what kind of mood we might be in when reading
> your emails these days, how about this little snippet from you, from the
> second email you wrote in the ktimers threads:
>
> "First off, I can understand that you're rather upset with what I wrote,
> unfortunately you got overly defensive, so could you please next time
> not reply immediately and first sleep over it, an overly emotional
> reply is funny to read but not exactly useful."

Here we probably get to the root of the problem: we got off on the wrong
foot.
In my first email I hadn't much good to say about the initial
announcement, but at any time it was meant technical. Anyone who compares
the first and the following announcement will notice the big improvement.
Unfortunately Thomas seemed to have to taken it rather personal (although
it never was meant that way) and I never got past this first impression
and ever since I can't get him back to a normal conversation.

> Insults like the following sentence in this very email:
>
> > [...] So Thomas, please get over yourself and start talking.

I must say it's completely beyond me how this could be "insulting". This
is my desperate attempt at getting any conversation started. If Thomas
isn't talking to me at all, I can't resolve any issue he might have with
me. Instead he's just moping around, pissed at me and simply ignores me,
which makes a conversation over this channel nearly impossible.

> let me be frank, and show you my initial reply that came to my mind when
> reading the above sentence: "who the f*ck do you think you are to talk
> to _anyone_ like that?". Now i'm usually polite and wont reply like
> that,...

You may haven't said it openly like that, but this hostility was still
noticable. You disagreed with me on minor issues and used the smallest
mistake to simply lecture me. From my point the attitude you showed
towards me is not much different from what you're accusing me of here.
I'm not saying that I'm innocent about this, but any "insult" was never
intentional and I tried my best to correct any issues after we got off on
the wrong foot, but I obviously failed at that, I simply never got past
the initial impression.

> in any case, from me you'll definitely get a reply to every positive or
> constructive question you ask in this thread, but you wont get many
> replies to mails that also include high-horse insults, question or
> statements.

Let's take the ptimer patches, I got _zero_ direct responses to it and
it's difficult for me to understand how this could be taken as "high-horse
insult". As I obviously failed to make my criticism understandable before,
I produced these patches to provide a technical base for a discussion of
how this functionality could be merged in the hopes of "Patches wont be
ignored, i can assure you", unfortunately they were.
Ingo, you might now start to understand my frustration. One positive
effect at least is that finally some movement got into this mess and you
managed to produce a simplified version of the timer. OTOH since I never
got a reply to these patches does that mean they were neither positive nor
constructive?

bye, Roman

2005-12-07 03:11:45

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Tue, 6 Dec 2005, Thomas Gleixner wrote:

> I'm interested in working with others and I do that a lot. It depends a
> bit on the attitude of the person who wants to do that. I did not have
> the feeling that you are interested in working together. Usually people
> who want to participate in a project send patches, suggestions or
> testing feedback. Your reaction throughout the whole mail threads was
> neither cooperative nor appealing to me. I have no problem at all to
> accept critizism and help from others, but your attitude of teaching me
> how to do my work was just annoying.

See my mail to Ingo about most of this. The basic point is you should have
told me about this earlier, simply ignoring the problem won't make it go
away. Your annoyance was quite noticable, but this seemed to include my
complete contribution. You never said what annoyed you, which makes it
rather hard for me to you to change it into a more acceptable form.

bye, Roman

2005-12-07 09:32:03

by Andrew Morton

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Thomas Gleixner <[email protected]> wrote:
>
> We decided to rename 'ktimer' because Andrew pretty much vetoed the
> 'ktimeout' queue

Well I whined about the rename of timer_list to ktimeout and asked why it
happened. I don't think anyone replied.

I assume from your above statement that there wasn't really a strong reason
for the rename, and that a new patch series is in the offing, which adds
ktimers and leaves timer_list alone?

Is ktimer a better name than ptimer or hrtimer?

2005-12-07 10:11:39

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Andrew Morton <[email protected]> wrote:

> Thomas Gleixner <[email protected]> wrote:
> >
> > We decided to rename 'ktimer' because Andrew pretty much vetoed the
> > 'ktimeout' queue

let me defuse things a bit here: the above sentence might sound a bit
bitter, but we really, truly are not. You were more like the sane voice
slapping us back into reality: for the next 2 years we do not want to be
buried in tons of timer_list->ktimeout patches, causing disruption all
across the kernel (and external trees). You definitely did not 'veto' it
in any way, and in fact you are carrying it in -mm currently.

I did the ktimeout queue in an hour or so, and i dont have strong
feelings about it. I very much agree with you that a mass rename could
easily cause more problems than the added clarity adds - still i had to
try the ktimeout queue, because i'm hopelessly purist at heart :)

Maybe in a few years all substantial kernel code will be managed by a
network of GIT repositories, and GIT will be extended with automatic
'mass namespace change' capabilities, making an overnight switchover
much more practical.

> Well I whined about the rename of timer_list to ktimeout and asked why
> it happened. I don't think anyone replied.

we thought we had this issue covered way too many times :) but find
below my original justification for the ktimeout patch-queue. This is
just for historical interest now i think.

[ insert the text below here. Time passes as everyone reads it :-) ]

once we take 'mass change of timer_list to ktimeout' out of the possible
things to do, we've only got these secondary possibilities:

'struct timer_list, struct ktimer'
'struct timer_list, struct ptimer'
'struct timer_list, struct hrtimer'

and having eliminated the first option due to being impractical to pull
off, we had the choice between 'ptimer' and 'hrtimer', and went for the
last one, for the following reason [snipped from a mail to Roman]:

| we decided against "ptimer" because it could be confused with "process
| timers" and "posix timers". hrtimers is a clear term that indicates
| what those timers do, so we picked up Andrew's suggestion as a way out
| the endless naming discussion.

but really ... facing an imperfect naming situation (i do not think
timer_list is the correct name - just as much as struct inode_list would
not be correct - but it is the historic name and i think you are right
that we've got to live with it) i'm alot less passionate about which one
to pick. If we had the chance to have perfect naming, i'd definitely
spend the effort to get it right, but now lets just go with the most
descriptive one: 'struct hrtimer'.

Ingo

-----
regarding naming. This is something best left to native speakers, but
i'm not giving up the issue just yet :-)

i always sensed and agreed that 'struct ktimer' and 'struct timer_list'
together is confusing. Same for kernel/ktimers.c and kernel/timers.c. So
no argument about that, this situation cannot continue.

but the reason i am still holding on to 'struct ktimer' is that i think
the end result should be:

- 'struct ktimer' (for precise timers where the event of expiry is the
main function)

- 'struct ktimeout' (for the wheel-based timeouts, where expiry is an
exception)

Similarly, kernel/ktimer.c for ktimers, and kernel/ktimeout.c for
timeouts.

see the attached finegrained patchqueue that does all the changes to
rename 'timers' to 'timeouts' [and to clean up the resulting subsystem],
to see what i'm trying to achieve.

For now i'm ignoring the feasibility of a 'mass API change' issues -
those are better up to lkml. The queue does build and compile fine on a
rather big .config so the only question is - do we want it. Note that
the patch does not have to touch even a single non-subsystem user of the
timer.c APIs, so the renaming is robust.

IMO it looks a lot less confusing and dualistic that way. The rename is
technically feasible and robust mainly because we can do this:

#define timer_list timeout

for the transition period (see the patch-queue). Fortunately timer_list
is not a generic name. (it's also an incorrect name because it implies
implementation) Here's the full list of mappings that occur:

#define timer_list ktimeout

#define TIMER_INITIALIZER KTIMEOUT_INITIALIZER
#define DEFINE_TIMER DEFINE_KTIMEOUT
#define init_timer ktimeout_init
#define setup_timer ktimeout_setup
#define timer_pending ktimeout_pending
#define add_timer_on ktimeout_add_on
#define del_timer ktimeout_del
#define __mod_timer __ktimeout_mod
#define mod_timer ktimeout_mod
#define next_timer_interrupt ktimeout_next_interrupt
#define add_timer ktimeout_add
#define try_to_del_timer_sync ktimeout_try_to_del_sync
#define del_timer_sync ktimeout_del_sync
#define del_singleshot_timer_sync ktimeout_del_singleshot_sync
#define init_timers init_ktimeouts
#define run_local_timers run_local_ktimeouts

but maybe 'struct ptimer' and 'struct ktimeout' is the better choice? I
dont think so, but it's a possibility.

so i believe that:

- 'struct ktimer', 'struct ktimeout'

is in theory superior naming, compared to:

- 'struct ptimer', 'struct timer_list'

again, ignoring all the 'do we want to have a massive namespace change'
issues.

Ingo

2005-12-07 10:20:35

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Ingo Molnar <[email protected]> wrote:

> once we take 'mass change of timer_list to ktimeout' out of the possible
> things to do, we've only got these secondary possibilities:
>
> 'struct timer_list, struct ktimer'
> 'struct timer_list, struct ptimer'
> 'struct timer_list, struct hrtimer'
>
> and having eliminated the first option due to being impractical to pull
> off, [...]

(correction: due to being confusing.)

Ingo

2005-12-07 10:23:37

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Ingo Molnar wrote:

> so i believe that:
>
> - 'struct ktimer', 'struct ktimeout'
>
> is in theory superior naming, compared to:
>
> - 'struct ptimer', 'struct timer_list'
>

Just curious -- why the "k" thing?

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-12-07 10:48:58

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Nick Piggin <[email protected]> wrote:

> Ingo Molnar wrote:
>
> >so i believe that:
> >
> > - 'struct ktimer', 'struct ktimeout'
> >
> >is in theory superior naming, compared to:
> >
> > - 'struct ptimer', 'struct timer_list'
> >
>
> Just curious -- why the "k" thing?

yeah. 'struct timer' and 'struct timeout' is even better. I tried it on
real code and sometimes it looked a bit funny: often we have a 'timeout'
parameter somewhere that is a scalar or a timeval/timespec. So at least
for variable names it was useful to have it in this form:

struct timeout *ktimeout;

struct timer *ktimer;

otherwise it looked OK. This is also in line with most other 'object
names' we have in the kernel: struct inode, struct dentry.

Ingo

2005-12-07 11:09:35

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>Ingo Molnar wrote:
>>
>>
>>>so i believe that:
>>>
>>> - 'struct ktimer', 'struct ktimeout'
>>>
>>>is in theory superior naming, compared to:
>>>
>>> - 'struct ptimer', 'struct timer_list'
>>>
>>
>>Just curious -- why the "k" thing?
>
>
> yeah. 'struct timer' and 'struct timeout' is even better. I tried it on

Oh good, glad you think so :)

> real code and sometimes it looked a bit funny: often we have a 'timeout'
> parameter somewhere that is a scalar or a timeval/timespec. So at least

Sure... hmm, the names timeout and timer themselves have something
vagely wrong about them, but I can't quite place my finger on it,
not a real worry though...

Maybe it is that timeout is an end result, but timer is a mechanism.
So maybe it should be 'struct interval', 'struct timeout';
or 'struct timer', 'struct timeout_timer'.

But I don't know really, it isn't a big deal.

Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-12-07 11:33:23

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Nick Piggin <[email protected]> wrote:

> >>Just curious -- why the "k" thing?
> >
> >
> >yeah. 'struct timer' and 'struct timeout' is even better. I tried it on
>
> Oh good, glad you think so :)
>
> >real code and sometimes it looked a bit funny: often we have a 'timeout'
> >parameter somewhere that is a scalar or a timeval/timespec. So at least
>
> Sure... hmm, the names timeout and timer themselves have something
> vagely wrong about them, but I can't quite place my finger on it, not
> a real worry though...
>
> Maybe it is that timeout is an end result, but timer is a mechanism.

hm, i think you are right.

> So maybe it should be 'struct interval', 'struct timeout'; or 'struct
> timer', 'struct timeout_timer'.

maybe 'struct timer' and 'struct hrtimer' is the right solution after
all, and our latest queue doing 'struct timer_list' + 'struct hrtimer'
is actually quite close to it.

'struct ptimer' does have a bit of vagueness in it at first sight, do
you agree with that? (does it mean 'process'? 'posix'? 'precision'?)

also, hrtimers on low-res clocks do have high internal resolution, but
they are not precise timing mechanisms in the end, due to the low-res
clock. So the more generic name would be 'high-resolution timers', not
'precision timers'. (also, the name 'precision timers' sounds a bit
funny too, but i dont really know why.)

Ingo

2005-12-07 11:41:00

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>

>>Maybe it is that timeout is an end result, but timer is a mechanism.
>
>
> hm, i think you are right.
>
>
>>So maybe it should be 'struct interval', 'struct timeout'; or 'struct
>>timer', 'struct timeout_timer'.
>
>
> maybe 'struct timer' and 'struct hrtimer' is the right solution after
> all, and our latest queue doing 'struct timer_list' + 'struct hrtimer'
> is actually quite close to it.
>
> 'struct ptimer' does have a bit of vagueness in it at first sight, do
> you agree with that? (does it mean 'process'? 'posix'? 'precision'?)
>

Yes I would agree that the p doesn't add much, wheras hrtimer at least
*rules out* the obvious process and posix.

I can't see a problem with timer and hrtimer myself.

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-12-07 12:18:52

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi

On Tue, 6 Dec 2005, Thomas Gleixner wrote:

> > It looks better, but could you please explain, what these extras are
> > good for?
>
> One extra we kept is the list and the state field, which are useful for
> the high resolution implementation. We wanted to keep as much as
> possible common code [shared between the current low-resolution clock
> based hrtimer code and future high-resolution clock based hrtimercode]
> to avoid the big #ifdeffery all over the place.
>
> It might turn out in the rework of the hrt bits that the list can go
> away, but this is a nobrainer to do. (The very first unpublished version
> of ktimers had no list, it got introduced during the hrt addon and
> stayed there to make the hrt patch less intrusive.)

If it's such a nobrainer to remove it, why don't you add it later? As it
is right now, it's not needed and nobody but you knows what it's good for.
This way yoy make it only harder to review the code, if one stumbles over
these pieces all the time.

> > Nice, that you finally also come to that conclusion, after I said that
> > already for ages. (It's also interesting how you do that without
> > giving me any credit for it.)
>
> Sorry if it was previously your idea and if we didnt credit you for it.
> I did not keep track of each word said in these endless mail threads. We
> credited every suggestion and idea which we picked up from you, see our
> previous emails. If we missed one, it was definitely not intentional.

http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537

A bit later ktime_t looked pretty much like the 64bit part of my ktimespec.
I don't won't to imply any intention, but please try to see this from my
perspective, after this happens a number of times.

> Setting up a periodic timer leads to a summing-up error versus the
> expected timeline. This applies both to vanilla, ktimers and ptimers.
> The difference is how this error is showing up:
>
> The vanilla 2.6.15-rc5 kernel has an rather constant error which is in
> the range of roughly 1ms per interval mostly independent of the given
> interval and the system load. This is roughly 1 sec per 1000 intervals.
>
> Ktimers had a contant summing error which is exactly the delta of the
> rounded interval to the real interval. The error is number of intervals
> * delta. The error could be deduced by the application, but thats not a
> really good idea. For a 50ms interval it is 2sec / 1000 intervals, which
> is exactly the 2ms delta between the 50ms requested and the 52ms real
> interval on a system with HZ=250
>
> Ptimers have a rounding error which depends on the delta to the jiffy
> resolution and the system load. So it gets rather unpredicitble what
> happens. The basic error is roughly the same as with ktimers, but the
> addon component due to system load is not. For a 50ms interval a summing
> error between 2sec and 7sec per 1000 intervals was measured.
>
> So while vanilla and ktimers have a systematic error, ptimer introduces
> random Brownean motion!

Thomas, you unfortunately don't provide enough context for these numbers
for me to reproduce them.
I don't want to guess, so please provide an example to demonstrate this.

> > Nevertheless, if you read my explanation of the rounding carefully and
> > look at my implementation, you may notice that I still disagree with
> > the actual implementation.
>
> I started to read it, but your explanation seems to be completely
> detached from the testing results and the code.

Thomas, if you don't ask me, I can't help you to understand any issues,
there might have been.

> You define that absolute interval timers on clock realtime change their
> behaviour when the initial time value is expired into relative timers
> which are not affected by time settings. I have no idea how you came to
> that interpretation of the spec. I completely disagree [but if you would
> like I can go into more detail why I think it's wrong.]

Please do. I explained it in one of my patches:

[PATCH 5/9] remove relative timer from abs_list

When an absolute timer expires, it becomes a relative timer, so remove
it from the abs_list. The TIMER_ABSTIME flag for timer_settime()
changes the interpretation of the it_value member, but it_interval is
always a relative value and clock_settime() only affects absolute time
services.

> Beside of that, the implementation is also completely broken. (You
> rebase the timer from the realtime base to the monotonic base inside of
> the timer callback function. On return you lock the realtime base and
> enqueue the timer into the realtime queue, but the base field of the
> timer points to the monotonic queue. It needs not much phantasy to get
> this exploited into a crash.)

If you provide the wrong parameters, you can crash a lot of stuff in the
kernel. "exploited" usually implies it can be abused from userspace, which
is not the case here.

> Furthermore, your implementation is calculating the next expiry value
> based on the current time of the expiration rather than on the previous
> expected expiry time, which would be the natural thing to do. This
> detail also explains the system-load dependent random drifting of
> ptimers quite well.

Is this conclusion based on actual testing? The behaviour of ptimer should
be quite close to the old jiffie based timer, so I'm a bit at a loss here,
how you get to this conclusion. Please provide more details.

> The changes you did to the timer locking code (also in timer.c) are racy
> and simply exposable. Oleg's locking implementation is there for a good
> reason.

Thomas, bringing up this issue is really weak. With Oleg's help it's
already solved, you don't have to warm it up. :(

> Neither do I understand the jiffie boundness you re-introduced all over
> the place. The softirq code is called once per jiffy and the expiry is
> checked versus the current time. Basing a new design on jiffies, where
> the design intends to be easily extensible to high resolution clocks, is
> wrong IMNSHO. Doing a high resolution extension on top of it is just
> introducing a lot of #ifdef mess in places where none has to be. We had
> that before, and dont want to go back there.

I don't understand where you get this from, I explicitely said that higher
resolution requires a better clock abstraction, bascially any place which
mentions TICK_NSEC has to be cleaned up like this. I'm at loss why you
think this requires "a lot of #ifdef mess".

> One of your main, often repeated arguments was the complexity of
> ktimers. While ktimers held a lot of complex functionality, the "simple"
> ptimers .text size is larger than the ktimers one! I know that you claim
> that .text size is not a criteria, but Andrew seriously asked what he
> gets for the increase of .text.

Sorry, but I didn't had as much time as you to finetune my implementation.
I did some quick tests by also deinlining some stuff, which quickly
brought it down to the ktimer size, integrating some more cleanups should
do the rest.
Thomas, this is hardly an argument against my implementation, I never
claimed it to be complete. It was meant as sanely mergable version
compared to your large ktimers patch.

Anyway, thanks for finally responding, there seem have to piled up a
number of misconceptions, please give it some time to clear up.

bye, Roman

2005-12-07 12:46:52

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 7 Dec 2005, Nick Piggin wrote:

> Sure... hmm, the names timeout and timer themselves have something
> vagely wrong about them, but I can't quite place my finger on it,
> not a real worry though...
>
> Maybe it is that timeout is an end result, but timer is a mechanism.
> So maybe it should be 'struct interval', 'struct timeout';
> or 'struct timer', 'struct timeout_timer'.
>
> But I don't know really, it isn't a big deal.

Nick, thanks for speaking up about this.
My mistake was to make a big deal out of it, because I knew it would
confuse more people. After I got the heat for this, it seems nobody else
want to get flamed for it.

bye, Roman

2005-12-07 13:07:34

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> maybe 'struct timer' and 'struct hrtimer' is the right solution after
> all, and our latest queue doing 'struct timer_list' + 'struct hrtimer'
> is actually quite close to it.
>
> 'struct ptimer' does have a bit of vagueness in it at first sight, do
> you agree with that? (does it mean 'process'? 'posix'? 'precision'?)
>
> also, hrtimers on low-res clocks do have high internal resolution, but
> they are not precise timing mechanisms in the end, due to the low-res
> clock. So the more generic name would be 'high-resolution timers', not
> 'precision timers'. (also, the name 'precision timers' sounds a bit
> funny too, but i dont really know why.)

My ptimer suggestion was based on your "funny" name "high-precision
timer". Sorry Ingo, that joke is on you. :-)
Anyway, anything else than ktimer is fine with me.

bye, Roman

2005-12-07 16:55:49

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Roman Zippel <[email protected]> wrote:

> > Sorry if it was previously your idea and if we didnt credit you for it.
> > I did not keep track of each word said in these endless mail threads. We
> > credited every suggestion and idea which we picked up from you, see our
> > previous emails. If we missed one, it was definitely not intentional.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537
>
> A bit later ktime_t looked pretty much like the 64bit part of my
> ktimespec.

and Thomas credited you for that point in his announcement:

" Roman pointed out that the penalty for some architectures
would be quite big when using the nsec_t (64bit) scalar time
storage format. "

http://marc.theaimsgroup.com/?l=linux-kernel&m=112794069605537&w=2

also, once you came up with actual modifications to the ktimers concept
(the ptimer queue) we noticed a further refinement of ktime_t in that
code: the elimination of the plain scalar type. We gave you credit
again:

" - eliminate the plain s64 scalar type, and always use the union.
This simplifies the arithmetics. Idea from Roman Zippel. "

see:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113339663027117&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113382965626004&w=2

we couldnt take your actual patch/code though, due to the way you
created the ptimer queue: you took our ktimer queue, added a dozen
changes to it (intermixed, without keeping/disclosing the changes), then
you split up the resulting queue. This structure made it largely
incompatible with our queue, the diff between ktimers and ptimers was
larger than the two patches themselves, due to the stacked changed! This
is not a complaint - we are happy you are writing ktimer based code -
this is just an explanation of why we couldnt take the code/patch as-is
but had to redo that portion from scratch, based on your idea.

> I don't won't to imply any intention, but please try to see this from
> my perspective, after this happens a number of times.

What the hell are you talking about? I bloody well know how it all
happened, because i did those simplifications to ktime.h myself, and i
added the changelog too, crediting you for the idea.

Ingo

2005-12-07 17:18:00

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> > A bit later ktime_t looked pretty much like the 64bit part of my
> > ktimespec.
>
> and Thomas credited you for that point in his announcement:
>
> " Roman pointed out that the penalty for some architectures
> would be quite big when using the nsec_t (64bit) scalar time
> storage format. "

"pointed out that the penalty" is a bit different from "provided the
basic idea of the ktime_t union and half the implementation"...

bye, Roman

2005-12-07 17:57:11

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Roman Zippel <[email protected]> wrote:

> > > (It's also interesting how you do that without giving me any
> > > credit for it.)
> >
> > Sorry if it was previously your idea and if we didnt credit you for
> > it.
> > [...]
> >
> > > A bit later ktime_t looked pretty much like the 64bit part of my
> > > ktimespec.
> >
> > and Thomas credited you for that point in his announcement:
> >
> > " Roman pointed out that the penalty for some architectures
> > would be quite big when using the nsec_t (64bit) scalar time
> > storage format. "
>
> "pointed out that the penalty" is a bit different from "provided the
> basic idea of the ktime_t union and half the implementation"...

so ... did you change your position from accusing us of not giving you
_any_ credit:

"It's also interesting how you do that without giving me
any credit for it."

to accusing us of not giving you _enough_ credit? Did i get that right?

And ontop of that, you now want the credit for providing the basic idea
for half of the ktimer/hrtimer implementation? Sorry that i did not find
out in advance that you wanted _that_ ;-)

Ingo

2005-12-07 18:02:48

by Paul Baxter

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

> "pointed out that the penalty" is a bit different from "provided the
> basic idea of the ktime_t union and half the implementation"...
>
> bye, Roman

This is getting bloody ridiculous.

Roman, you won't get credited for every nuance of what you've said and done,
neither will Ingo and Thomas and the many others that work hard to make
Linux a better Operating System.

I admire the fact that you did pick up the gauntlett and produce code which
has helped further the whole work.

I keep reading this thread because, against all odds, there is a lot of
technical progress but the constant bickering really does your credibility
no favours.

Please stop trying to portray yourself as a victim, those that care will
form an opinion from your words. Please, please have the grace to rise above
any perceived 'insults' both actual, and, in my view, mostly insubstantial.

A frustrated lurker, who can't wait for high resolution timers and maybe
even high precision timers one day.

Thank you, Roman, for all your technical efforts.

2005-12-07 18:19:13

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> so ... did you change your position from accusing us of not giving you
> _any_ credit:
>
> "It's also interesting how you do that without giving me
> any credit for it."
>
> to accusing us of not giving you _enough_ credit? Did i get that right?
>
> And ontop of that, you now want the credit for providing the basic idea
> for half of the ktimer/hrtimer implementation? Sorry that i did not find
> out in advance that you wanted _that_ ;-)

Ingo, please stay serious and don't take things out of context.
I was just asking for some more/better credit for some of the ktime_t core
ideas, I'll leave it now to you what you make with this and I won't
further pursue this issue.
Can we please now get back to the more important issues?

bye, Roman

2005-12-07 23:12:41

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi Roman,

Roman Zippel wrote:
>
> Nick, thanks for speaking up about this.
> My mistake was to make a big deal out of it, because I knew it would
> confuse more people. After I got the heat for this, it seems nobody else
> want to get flamed for it.
>

I didn't mean to trivialise the issue. I think good naming is
important, however I added the disclaimer because of course I
didn't write any code, so my opinion didn't carry much weight
in that particular situation compared to you guys.

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-12-08 05:19:00

by Paul Jackson

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

> > > [...] So Thomas, please get over yourself and start talking.
>
> I must say it's completely beyond me how this could be "insulting".

Well ... fools rush in where angels fear to tread ...

As you note, people had better not take comments on their code as
insulting, if they are going to survive for long on lkml.

However comments on the person that don't match that person's current
self image often cause distress, even if (sometimes -especially- if)
they are accurate.

This 'get over yourself' implies a comment on the person, not the code.
It suggests you think they are on a high horse.

Since Thomas (apparently) didn't think he was afflicted at the
moment with something he needed to 'get over', he probably found that
instruction annoying on first reading.

That he called it an 'insult' is an irrelevant detail. He's just
saying he found it annoying to read, but like most of us, instead
of saying "I hurt", he's saying "dog bites." Nevermind that it was
actually a cat.

There is an easy way around this however. When I feel like telling
someone they are an idiot (or any other ad hominem comment less than
puppy dog happiness), I have better luck calling myself that, as in
"sorry for being such a stupid git, but ...". Few people object
to the -other- person confessing to being a stupid rude bastard.
They just don't want themselves to be thought of that way, or anything
remotely resembling that.

As I recall, Linus has called himself a bastard more than once, with
just such good affect.

So just take all descriptions of other persons, and flip them around,
pretending to describe yourself. It will be a bold faced lie, and
totally illogical ... but that's typical in the realm of human
emotions. The human species is definitely one dorked up bunch.

Imagine that this subthread had begun "Sorry, Thomas, let me get off
my high horse and start talking ...".

Thanks, by the way, for your help back then on cpuset locking. It was
invaluable.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-12-08 08:12:20

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Paul Jackson <[email protected]> wrote:

> So just take all descriptions of other persons, and flip them around,
> pretending to describe yourself. It will be a bold faced lie, and
> totally illogical ... but that's typical in the realm of human
> emotions. The human species is definitely one dorked up bunch.

> "sorry for being such a stupid git, but ...".

it does not matter that it's a bold faced lie, people technically 'lie'
about little things all the time and for a good reason - the human
species (especially males) are way to agressive by default, so a certain
conscious buffer zone is needed to even that out. (It is in fact a
medical condition if that buffer-zone does not exist.)

But there's a crutial difference. A statement from Linus (or you) like
the one above also shows three more things that are important:

1) that you have a sense of humor, and that you dont take things too
seriously :) Humor goes a long way defusing differences.

2) that you actually entertain the possibility of being wrong and that
you do not just want to steamroll the other side with your opinion.

3) that you actually care about the other person. This matters alot.
There's a reason why hundreds of people who never met each other join
on a mailing list and write code, and the reason is definitely not to
have others piss on their code.

> Thanks, by the way, for your help back then on cpuset locking. It was
> invaluable.

i took some time and re-read your thread with Roman about cpusets back
in September. It was like fresh air! Roman was totally reasonable and
positive in that thread - so i dont think it's me or Thomas misreading
his style in this case or something.

anyway, the reason i confronted Roman with the situation was in the
renewed hope to improve direct communication. I'm still hopeful :)

Ingo

2005-12-08 09:26:39

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

* Roman Zippel <[email protected]> wrote:

> > to be able to comprehend what kind of mood we might be in when reading
> > your emails these days, how about this little snippet from you, from the
> > second email you wrote in the ktimers threads:
> >
> > "First off, I can understand that you're rather upset with what I wrote,
> > unfortunately you got overly defensive, so could you please next time
> > not reply immediately and first sleep over it, an overly emotional
> > reply is funny to read but not exactly useful."
>
> Here we probably get to the root of the problem: we got off on the
> wrong foot.

yes.

> In my first email I hadn't much good to say about the initial
> announcement, but at any time it was meant technical. Anyone who
> compares the first and the following announcement will notice the big
> improvement. Unfortunately Thomas seemed to have to taken it rather
> personal (although it never was meant that way) [...]

what you did was in essence to piss on his code, concepts and
description. [oh, i wrote roughly half of ktimers, so you pissed on my
code too ;-) ]

Here are a few snippets from you that show that most of the negative
messaging from you was directed against the text Thomas wrote (or
against Thomas), not against the code:

" How you get to these conclusions is still rather unclear, I don't even
really know what the problem is from just reading the pretext. "

" What is seriously missing here is the big picture. "

" This no answer at all, you only repeat what you already said above. "

" The main problem with your text is that you jump from one topic to the
next, making it impossible to create a coherent picture from it. "

" You don't think that having that much timers in first place is little
insane (especially if these are kernel timers)? "

" Later it becomes clear that you want high resolution timers, what
doesn't become clear is why it's such an essential feature that
everyone has to pay the price for it (this does not only include
changes in runtime behaviour, but also the proposed API changes). "

" It's nice that you're sure of it, but as long don't provide the means
to verify them, they are just assertions. "

note that these were pretty much out of the blue sky, and they pretty
much set the stage. Given that Thomas is a volunteer too, he does not
have to bear with what he senses as arbitrary abuse.

> [...] and I never got past this first impression and ever since I
> can't get him back to a normal conversation.

maybe because your 'get him back to a normal conversation' attempt used
precisely the same somewhat dismissive and somewhat derogatory tone and
type of language that your initial mails were using? Past experience is
extrapolated to the future, so even small negative messages get easily
blown up, and the "cycle of violence" never stops, because nobody breaks
the loop.

> > Insults like the following sentence in this very email:
> >
> > > [...] So Thomas, please get over yourself and start talking.
>
> I must say it's completely beyond me how this could be "insulting".

maybe it is insulting because the "get over yourself" implicitly
suggests that the fault is with Thomas?

Let me give you a few alternatives, that might have had a completely
different effect and which do not contain any implicit insults:

"So Thomas, I know we got on to the wrong footing, but lets start
talking again."

or:

"So Thomas, I know we had a couple of nasty exchanges in the past, but
lets bury the hatchet and try again. I apologize if I offended you in
any way in the past."

just try it, really. Even if it's a bold faced lie ;)

> This is my desperate attempt at getting any conversation started. If
> Thomas isn't talking to me at all, I can't resolve any issue he might
> have with me. [...]

Thomas wrote you 11 replies in 2.5 months, and some of those were
extremely detailed. That's a far cry from not talking at all. He did try
hard, he did get involved in a flamewar with you, which wasnt overly
productive. But he is a volunteer, he has no obligation to waste time on
flamewars.

> > let me be frank, and show you my initial reply that came to my mind when
> > reading the above sentence: "who the f*ck do you think you are to talk
> > to _anyone_ like that?". Now i'm usually polite and wont reply like
> > that,...
>
> You may haven't said it openly like that, but this hostility was still
> noticable. You disagreed with me on minor issues and used the smallest
> mistake to simply lecture me. From my point the attitude you showed
> towards me is not much different from what you're accusing me of here.

yes, i definitely was not nice in a couple of mails, and i'd like to
apologize for it.

> I'm not saying that I'm innocent about this, but any "insult" was
> never intentional and I tried my best to correct any issues after we
> got off on the wrong foot, but I obviously failed at that, I simply
> never got past the initial impression.

ok, apology taken :)

> > in any case, from me you'll definitely get a reply to every positive or
> > constructive question you ask in this thread, but you wont get many
> > replies to mails that also include high-horse insults, question or
> > statements.
>
> Let's take the ptimer patches, I got _zero_ direct responses to it
> [...]

well, direct communication with you has proven to be very unproductive a
number of times, so what would have been the point? But we did mention
lots of technical points in our subsequent mails, referring to your
ptimers queue a number of times. We even adopted the ktime.h
simplification idea and credited you for it.

also, what did you expect? Basically you came out with a patch-queue
based on ktimers, but you did not send any changes against the ktimers
patch itself, which made it very hard to map the real finegrained
changes you did to ktimers. You provided a writeup of differences, but
they did not fully cover the full scope of changes, relative to ktimers.
You based your queue on a weeks old version of ktimers, which also
raises the possibility that you were working on this for some time, in
private, for whatever reason. (Again, this is not a complaint - we are
happy you are communicating via code - whatever form that code is in.)

> [...] and it's difficult for me to understand how this could be taken
> as "high-horse insult". As I obviously failed to make my criticism
> understandable before, I produced these patches to provide a technical
> base for a discussion of how this functionality could be merged in the
> hopes of "Patches wont be ignored, i can assure you", unfortunately
> they were.

they were not ignored - we mentioned the ptimer changes in our
subsequent announcements, and we Cc:-ed you to those annoucements so
that you get it first-hand.

Ingo

2005-12-08 13:09:12

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Thu, 8 Dec 2005, Ingo Molnar wrote:

> Here are a few snippets from you that show that most of the negative
> messaging from you was directed against the text Thomas wrote (or
> against Thomas), not against the code:

Do you really think quoting me out of context is helping? From my
perspective you're trying to show me now as the bad guy and I'm not
accepting that. I don't know what you're trying to do, if you're trying to
mediate, then you're really suck at it, if you just want to piss me off,
it's working great. :-(

Technically I still stand behind everything I said in that context, in the
meantime I learned a few new things and I understand them better, so
some things have become nonissues and I even changed my mind about some
other things.
OTOH I'm the first to admit that I could have said things nicer, but mail
is a rather bad channel to transport emotions and whatever I say can be
taken badly. I really try my best to avoid this, but sometimes it's really
hard, especially if I can't get past the initial resentment. I gladly
apologize for any mistake I did and I'll do my best to learn from it, but
I'm not going to make amends for it forever. At some point it would be
really nice if you stopped to rub it in what a insensitive clod I am, I
know that already.
Ingo, if you want to help me, why don't you go with a good example ahead
and I'll try to follow you. How about this?

> > > > [...] So Thomas, please get over yourself and start talking.
> >
> > I must say it's completely beyond me how this could be "insulting".
>
> maybe it is insulting because the "get over yourself" implicitly
> suggests that the fault is with Thomas?

This is a nice example, that _whatever_ I'm saying can be misunderstood.
Why don't you even try to give me a little credit that above was not meant
as insult? You make an assumption about what I said and you don't even
give me a chance to correct myself.
Thomas obviously has some kind of problem with me and unless he starts to
talk to me, I can't help him to get over whatever problem that is. I'm
not going away, so we have to get along somehow and this means we have to
_talk_.
Ingo, you only want to see the "get over yourself" part, whereas my
emphasis was and is on "talking".

> just try it, really. Even if it's a bold faced lie ;)

I'm a bad liar and as long as I don't know what the problem is, I'll make
the same mistake over and over. I have no intention of becoming a
notorious liar.

> Thomas wrote you 11 replies in 2.5 months, and some of those were
> extremely detailed. That's a far cry from not talking at all.

Some of it was indeed more verbose, but I never got very far with my
followup questions. Thomas used very often a phrase like "we analyzed the
problem and we came to the conclusion...". It's great that you and Thomas
get so well along with each other, but I'm in the disadvantage that I lack
the information context that you have. What is "extremely detailed" for
you is lacking context to create a coherent picture for me, so it's
sometimes really frustrating to pull some information out of you both.

> also, what did you expect? Basically you came out with a patch-queue
> based on ktimers, but you did not send any changes against the ktimers
> patch itself, which made it very hard to map the real finegrained
> changes you did to ktimers.

At the time I only had the huge ktimers patch from -mm to work with.
One primary target was to split out the core (without all the extra
complexity and extra cleanups) into mergable pieces, which makes it a bit
pointless to do it relative to this huge patch.
The other main target was the resolution handling, I tried very hard to
explain the details of it and why I did them this way. A discussion about
this would have required a _direct_ response, where you point out with
what you disagree. Random comments in other mails are not helping at all.
The rest are some smaller patches, which are completely independent of
hrtimer, but even for this I got no response except from Oleg.

> You provided a writeup of differences, but
> they did not fully cover the full scope of changes, relative to ktimers.

I've seen this claim now a few times, but why the hell don't you just ask
about the things that you think were missing?

bye, Roman

2005-12-08 15:37:21

by Steven Rostedt

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

On Thu, 2005-12-08 at 14:08 +0100, Roman Zippel wrote:
> Hi,
>
> On Thu, 8 Dec 2005, Ingo Molnar wrote:
>
> > Here are a few snippets from you that show that most of the negative
> > messaging from you was directed against the text Thomas wrote (or
> > against Thomas), not against the code:
>
> Do you really think quoting me out of context is helping? From my
> perspective you're trying to show me now as the bad guy and I'm not
> accepting that. I don't know what you're trying to do, if you're trying to
> mediate, then you're really suck at it, if you just want to piss me off,
> it's working great. :-(
>

Ingo, Thomas, Roman, Please!!!! Lets all say sorry to each other and
start over. This thread is starting to become really annoying. You
three are very intelligent, and I really want to know what you have to
say technically to each other, and that's why I'm not ignoring this
thread. Yes, all of you were not nice to each other. Suck it up and
forget about it.

How about this, if you want to flame each other, do it privately. But
when posting to the LKML, Talk technically only. Don't even try to be
nice, since I'm not sure now any of you can do that to each other. But
also leave out _any_ personal comments. So talk _only_ about the code.
If you don't like it, give a technical reason why.

Remember, the rest of the world is watching here.

Thank you,

-- Steve

2005-12-09 17:16:43

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

sorry for the late reply. I was travelling and cut off the net for a
while.

On Wed, 2005-12-07 at 13:18 +0100, Roman Zippel wrote:
> > It might turn out in the rework of the hrt bits that the list can go
> > away, but this is a nobrainer to do. (The very first unpublished version
> > of ktimers had no list, it got introduced during the hrt addon and
> > stayed there to make the hrt patch less intrusive.)
>
> If it's such a nobrainer to remove it, why don't you add it later? As it
> is right now, it's not needed and nobody but you knows what it's good for.
> This way yoy make it only harder to review the code, if one stumbles over
> these pieces all the time.

Well, if it makes it simpler for you to review the code.

But can you please explain to a code review unaware kernel developer
newbie like me, why this makes a lot of difference and why it makes it
so much easier to review ?

Actually the change adds more code lines and removes one field of the
hrtimer structure, but it has exactly the same functionality: Fast
access to the first expiring timer without walking the rb_tree.

include/linux/hrtimer.h | 7 ++-----
kernel/hrtimer.c | 26 ++++++++++++++------------
2 files changed, 16 insertions(+), 17 deletions(-)

Index: linux-2.6.15-rc5/include/linux/hrtimer.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/hrtimer.h
+++ linux-2.6.15-rc5/include/linux/hrtimer.h
@@ -49,8 +49,6 @@ struct hrtimer_base;
* struct hrtimer - the basic hrtimer structure
*
* @node: red black tree node for time ordered insertion
- * @list: list head for easier access to the time ordered list,
- * without walking the red black tree.
* @expires: the absolute expiry time in the hrtimers internal
* representation. The time is related to the clock on
* which the timer is based.
@@ -63,7 +61,6 @@ struct hrtimer_base;
*/
struct hrtimer {
struct rb_node node;
- struct list_head list;
ktime_t expires;
enum hrtimer_state state;
int (*function)(void *);
@@ -78,7 +75,7 @@ struct hrtimer {
* to a base on another cpu.
* @lock: lock protecting the base and associated timers
* @active: red black tree root node for the active timers
- * @pending: list of pending timers for simple time ordered access
+ * @first: pointer to the timer node which expires first
* @resolution: the resolution of the clock, in nanoseconds
* @get_time: function to retrieve the current time of the clock
* @curr_timer: the timer which is executing a callback right now
@@ -87,7 +84,7 @@ struct hrtimer_base {
clockid_t index;
spinlock_t lock;
struct rb_root active;
- struct list_head pending;
+ struct rb_node *first;
unsigned long resolution;
ktime_t (*get_time)(void);
struct hrtimer *curr_timer;
Index: linux-2.6.15-rc5/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/hrtimer.c
+++ linux-2.6.15-rc5/kernel/hrtimer.c
@@ -313,7 +313,6 @@ hrtimer_forward(struct hrtimer *timer, c
static void enqueue_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
{
struct rb_node **link = &base->active.rb_node;
- struct list_head *prev = &base->pending;
struct rb_node *parent = NULL;
struct hrtimer *entry;

@@ -329,22 +328,23 @@ static void enqueue_hrtimer(struct hrtim
*/
if (timer->expires.tv64 < entry->expires.tv64)
link = &(*link)->rb_left;
- else {
+ else
link = &(*link)->rb_right;
- prev = &entry->list;
- }
}

/*
- * Insert the timer to the rbtree and to the sorted list:
+ * Insert the timer to the rbtree and check whether it
+ * replaces the first pending timer
*/
rb_link_node(&timer->node, parent, link);
rb_insert_color(&timer->node, &base->active);
- list_add(&timer->list, prev);

timer->state = HRTIMER_PENDING;
-}

+ if (!base->first || timer->expires.tv64 <
+ rb_entry(base->first, struct hrtimer, node)->expires.tv64)
+ base->first = &timer->node;
+}

/*
* __remove_hrtimer - internal function to remove a timer
@@ -354,9 +354,11 @@ static void enqueue_hrtimer(struct hrtim
static void __remove_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
{
/*
- * Remove the timer from the sorted list and from the rbtree:
+ * Remove the timer from the rbtree and replace the
+ * first entry pointer if necessary.
*/
- list_del(&timer->list);
+ if (base->first == &timer->node)
+ base->first = rb_next(&timer->node);
rb_erase(&timer->node, &base->active);
}

@@ -528,16 +530,17 @@ int hrtimer_get_res(const clockid_t whic
static inline void run_hrtimer_queue(struct hrtimer_base *base)
{
ktime_t now = base->get_time();
+ struct rb_node *node;

spin_lock_irq(&base->lock);

- while (!list_empty(&base->pending)) {
+ while ((node = base->first)) {
struct hrtimer *timer;
int (*fn)(void *);
int restart;
void *data;

- timer = list_entry(base->pending.next, struct hrtimer, list);
+ timer = rb_entry(node, struct hrtimer, node);
if (now.tv64 <= timer->expires.tv64)
break;

@@ -590,7 +593,6 @@ static void __devinit init_hrtimers_cpu(

for (i = 0; i < MAX_HRTIMER_BASES; i++) {
spin_lock_init(&base->lock);
- INIT_LIST_HEAD(&base->pending);
base++;
}
}

> > > Nice, that you finally also come to that conclusion, after I said that
> > > already for ages. (It's also interesting how you do that without
> > > giving me any credit for it.)
> >
> > Sorry if it was previously your idea and if we didnt credit you for it.
> > I did not keep track of each word said in these endless mail threads. We
> > credited every suggestion and idea which we picked up from you, see our
> > previous emails. If we missed one, it was definitely not intentional.
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537
>
> A bit later ktime_t looked pretty much like the 64bit part of my ktimespec.
> I don't won't to imply any intention, but please try to see this from my
> perspective, after this happens a number of times.

I have seen your and Ingos conversation on that and I dont want to add
more flames into this.

> > You define that absolute interval timers on clock realtime change their
> > behaviour when the initial time value is expired into relative timers
> > which are not affected by time settings. I have no idea how you came to
> > that interpretation of the spec. I completely disagree [but if you would
> > like I can go into more detail why I think it's wrong.]
>
> Please do. I explained it in one of my patches:
>
> [PATCH 5/9] remove relative timer from abs_list
>
> When an absolute timer expires, it becomes a relative timer, so remove
> it from the abs_list. The TIMER_ABSTIME flag for timer_settime()
> changes the interpretation of the it_value member, but it_interval is
> always a relative value and clock_settime() only affects absolute time
> services.

This is your interpretation and I disagree.

If I set up a timer with a 24 hour interval, which should go off
everyday at 6:00 AM, then I expect that this timer does this even when
the clock is set e.g. by daylight saving. I think, that this is a
completely valid interpretation and makes a lot of sense from a
practical point of view. The existing implementation does it that way
already, so why do we want to change this ?

Also you treat the interval relative to the current time of the callback
function:

timer->expires = ktime_add(timer->base->last_expired,
timr->it.real.incr);

This leads to a summing up error and even if the result is similar to
the summing up error of the current vanilla implementation I prefer a
solution which adds the interval to the previous set expiry time

timer->expires = ktime_add(timer->expires,
timr->it.real.incr);

The spec says:
"Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock."

So there is no requirement to do so. Of course you may, but this takes
simply the name "precision" ad absurdum.

> > Beside of that, the implementation is also completely broken. (You
> > rebase the timer from the realtime base to the monotonic base inside of
> > the timer callback function. On return you lock the realtime base and
> > enqueue the timer into the realtime queue, but the base field of the
> > timer points to the monotonic queue. It needs not much phantasy to get
> > this exploited into a crash.)
>
> If you provide the wrong parameters, you can crash a lot of stuff in the
> kernel. "exploited" usually implies it can be abused from userspace, which
> is not the case here.

Thanks for teaching me what an "exploit" usally means!

I intentionally wrote "exploited into a crash".

How do you think I got this to crash ? By hacking up some complex kernel
code ? No, simply by running my unmodified test scripts from user space
with completely valid and correct parameters. Of course its also
possible to write a program which actually exploits this.

The implementation is simply broken. Can you just accept this ?

> > Furthermore, your implementation is calculating the next expiry value
> > based on the current time of the expiration rather than on the previous
> > expected expiry time, which would be the natural thing to do. This
> > detail also explains the system-load dependent random drifting of
> > ptimers quite well.
>
> Is this conclusion based on actual testing? The behaviour of ptimer should
> be quite close to the old jiffie based timer, so I'm a bit at a loss here,
> how you get to this conclusion. Please provide more details.

It is based on testing. Do you think I pulled numbers out of my nose ?

But I have to admit that I did not look close enough into your code and
so I missed the ptimer_run_queue call inside of the lost jiffie loop.
Sorry, my conclusion was wrong.

The problem seems to be related to the rebase code, which leads to a
wrong expiry value for clock realtime interval timers with the ABSTIME
flag set.

> > The changes you did to the timer locking code (also in timer.c) are racy
> > and simply exposable. Oleg's locking implementation is there for a good
> > reason.
>
> Thomas, bringing up this issue is really weak. With Oleg's help it's
> already solved, you don't have to warm it up. :(

I did not warm anything up. I was not aware that Oleg jumped already in
on this - I was not cc'ed and I really did not pay much attention on
LKML during this time.
I'm familiar enough with locking, that I can recognize such a problem on
my own.

> > Neither do I understand the jiffie boundness you re-introduced all over
> > the place. The softirq code is called once per jiffy and the expiry is
> > checked versus the current time. Basing a new design on jiffies, where
> > the design intends to be easily extensible to high resolution clocks, is
> > wrong IMNSHO. Doing a high resolution extension on top of it is just
> > introducing a lot of #ifdef mess in places where none has to be. We had
> > that before, and dont want to go back there.
>
> I don't understand where you get this from, I explicitely said that higher
> resolution requires a better clock abstraction, bascially any place which
> mentions TICK_NSEC has to be cleaned up like this. I'm at loss why you
> think this requires "a lot of #ifdef mess".

Why do you need all this jiffie stuff in the first place? It is not
necessary at all. The hrtimer code does not contain a single reference
of jiffies and therefor it does not need anything to clean up. I
consider even a single high resolution timer related #ifdef outside of
hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
I still do not see the point why it is necessary to put it there first.
It just makes it overly complex to review and understand :)

I'm happy that we at least agree that we need better clock abstraction
layers. How do you think does our existing high resolution timer
implementation work ? While you explicitely said that it is required, we
explicitely used exactly such a mechanism from the very first day.

Please stop your absurd schoolmasterly attempts to teach me stuff which
I'm well aware off. Can you please accept, that I exactly know what I'm
talking about?

> Anyway, thanks for finally responding, there seem have to piled up a
> number of misconceptions, please give it some time to clear up.

Roman, I have no interest and no intention to spend any more of my
private time on a discussion like this.

I always was and I'm still up for a technical discussion and
cooperation. I'm not vengeful at all and if I ever meet you in person,
the first beers at the bar are on my bill.

But I seriously will ignore you completely, if you keep this tone and
attitude with me.

I'm well aware that LKML is not a nun convent, but the basic rules of
human behaviour and respect still apply.

tglx

2005-12-12 13:39:28

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Fri, 9 Dec 2005, Thomas Gleixner wrote:

> Actually the change adds more code lines and removes one field of the
> hrtimer structure, but it has exactly the same functionality: Fast
> access to the first expiring timer without walking the rb_tree.

Together with the state field this would save 12 bytes, which is IMO very
well worth considering.
You seem to have some plans for it, the best hint I've found for it is:

+ (This seperate list is also useful for high-resolution timers where we
+ need seperate pending and expired queues while keeping the time-order
+ intact.)"

Could you please elaborate on this?

> > [PATCH 5/9] remove relative timer from abs_list
> >
> > When an absolute timer expires, it becomes a relative timer, so remove
> > it from the abs_list. The TIMER_ABSTIME flag for timer_settime()
> > changes the interpretation of the it_value member, but it_interval is
> > always a relative value and clock_settime() only affects absolute time
> > services.
>
> This is your interpretation and I disagree.
>
> If I set up a timer with a 24 hour interval, which should go off
> everyday at 6:00 AM, then I expect that this timer does this even when
> the clock is set e.g. by daylight saving. I think, that this is a
> completely valid interpretation and makes a lot of sense from a
> practical point of view. The existing implementation does it that way
> already, so why do we want to change this ?

I don't know whether this behaviour was intentional and why it was done
this way, so I did this patch to initiate a discussion about this.

I wouldn't say a 1 day interval timer is a very realistic example and the
old timer wouldn't be very precise for this.
The rationale for example talks about "a periodic timer with an absolute
_initial_ expiration time", so I could also construct a valid example with
this expectation. The more I read the spec the more I think the current
behaviour is not correct, e.g. that ABS_TIME is only relevant for
it_value.
So I'm interested in specific interpretations of the spec which support
the current behaviour.

> Also you treat the interval relative to the current time of the callback
> function:
>
> timer->expires = ktime_add(timer->base->last_expired,
> timr->it.real.incr);
>
> This leads to a summing up error and even if the result is similar to
> the summing up error of the current vanilla implementation I prefer a
> solution which adds the interval to the previous set expiry time
>
> timer->expires = ktime_add(timer->expires,
> timr->it.real.incr);
>
> The spec says:
> "Also note that some implementations may choose to adjust time and/or
> interval values to exactly match the ticks of the underlying clock."
>
> So there is no requirement to do so. Of course you may, but this takes
> simply the name "precision" ad absurdum.

Your current implementation contradicts the requirement that values should
be rounded up to the resolution of the timer, that's exactly what my
implementation does. The resolution of the timer is currently TICK_NSEC
(+- ntp correction) and one expiry of it should only cause at most one
expiry of all pending timer. If I set a 1msec timer in your implementation
(with HZ=250), I automatically get 3 overruns, even though the timer
really did only expire once.

Since you don't do any rounding at all anymore, your timer may now expire
early with low resolution clocks (the old jiffies + 1 problem I explained
in my ktime_t patch).

Also in the ktimer patch you wrote:

+- also, there is an application surprise factor, the 'do not round
+ intervals' technique can lead to the following sample sequence of
+ events:
+
+ Interval: 1.7ms
+ Resolution: 1ms
+
+ Event timeline:
+
+ 2ms - 4ms - 6ms - 7ms - 9ms - 11ms - 12ms - 14ms - 16ms - 17ms ...
+
+ this 2,2,1,2,2,1...msec 'unpredictable and uneven' relative distance
+ of events could surprise applications.

But this is now exactly the bevhaviour your timer has, why is not
"surprising" anymore?

> > > Beside of that, the implementation is also completely broken. (You
> > > rebase the timer from the realtime base to the monotonic base inside of
> > > the timer callback function. On return you lock the realtime base and
> > > enqueue the timer into the realtime queue, but the base field of the
> > > timer points to the monotonic queue. It needs not much phantasy to get
> > > this exploited into a crash.)
> >
> > If you provide the wrong parameters, you can crash a lot of stuff in the
> > kernel. "exploited" usually implies it can be abused from userspace, which
> > is not the case here.
>
> Thanks for teaching me what an "exploit" usally means!
>
> I intentionally wrote "exploited into a crash".
>
> How do you think I got this to crash ? By hacking up some complex kernel
> code ? No, simply by running my unmodified test scripts from user space
> with completely valid and correct parameters. Of course its also
> possible to write a program which actually exploits this.
>
> The implementation is simply broken. Can you just accept this ?

I can accept that you found bug, but for "simply broken" I'm not convinced
yet.
Sorry, I have not been specific enough, I disagree with your analysis
above. On return the timer isn't requeued into the realtime queue at all,
so this can't be the reason for the crash. I guess it's more likely you
managed to trigger the locking bug.

> > > Furthermore, your implementation is calculating the next expiry value
> > > based on the current time of the expiration rather than on the previous
> > > expected expiry time, which would be the natural thing to do. This
> > > detail also explains the system-load dependent random drifting of
> > > ptimers quite well.
> >
> > Is this conclusion based on actual testing? The behaviour of ptimer should
> > be quite close to the old jiffie based timer, so I'm a bit at a loss here,
> > how you get to this conclusion. Please provide more details.
>
> It is based on testing. Do you think I pulled numbers out of my nose ?

Jeez, sorry for asking. :(
You didn't specify anywhere how you got to this conclusion, so I could
reproduce it myself. Could you please elaborate on this "system-load
dependent random drifting"?

> > I don't understand where you get this from, I explicitely said that higher
> > resolution requires a better clock abstraction, bascially any place which
> > mentions TICK_NSEC has to be cleaned up like this. I'm at loss why you
> > think this requires "a lot of #ifdef mess".
>
> Why do you need all this jiffie stuff in the first place? It is not
> necessary at all. The hrtimer code does not contain a single reference
> of jiffies and therefor it does not need anything to clean up. I
> consider even a single high resolution timer related #ifdef outside of
> hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
> you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
> I still do not see the point why it is necessary to put it there first.
> It just makes it overly complex to review and understand :)

In this regard I had two major goals: a) keep it as simple as possible, b)
preserve the current behaviour and I still think I found the best
compromise so far. This would allow to first merge the basic
infrastructure, while reducing the risk of breaking anything.

I don't mind changing the behaviour, but I would prefer to do this in a
separate step and with an analysis of the possible consequences. This is
not just about posix-timers, but it also affects itimers, nanosleep and
possibly other systems in the future. Actually my main focus is not on HR
posix timer, my main interest is that everythings else keeps working and
doesn't has to pay the price for it.

It's rather likely that if there is a subtle change in behaviour, which
causes something to break, it's not noticed until it hits a release
kernel, so I think it's very well worth it to understand and document the
differences between the implementations.

> Please stop your absurd schoolmasterly attempts to teach me stuff which
> I'm well aware off. Can you please accept, that I exactly know what I'm
> talking about?

Sure, I can. I'm sorry I tried to explain things you already know, but if
you know these things already, then please show it. At this point I'm
mostly still trying to understand, why you did certain things and
sometimes I explain things from my perspective in the hopes you would fill
in the holes from your perspective.

You mostly just post your patches and only explain the conclusion, you're
make it rather short on how you get to these conclusions, e.g. what other
alternatives you've already considered. This makes it hard for me to
figure out what you know exactly from what you're talking about.

> But I seriously will ignore you completely, if you keep this tone and
> attitude with me.

Jeez, cut me some slack, would you? Especially in the last mail I mostly
just asked for more information. You read something into my mails, that is
simply not there.

bye, Roman

2005-12-12 16:36:15

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Mon, 2005-12-12 at 14:39 +0100, Roman Zippel wrote:
> > Actually the change adds more code lines and removes one field of the
> > hrtimer structure, but it has exactly the same functionality: Fast
> > access to the first expiring timer without walking the rb_tree.
>
> Together with the state field this would save 12 bytes, which is IMO very
> well worth considering.
> You seem to have some plans for it, the best hint I've found for it is:
>
> + (This seperate list is also useful for high-resolution timers where we
> + need seperate pending and expired queues while keeping the time-order
> + intact.)"
>
> Could you please elaborate on this?

Sure. I have already removed the list_head for the non high resolution
case as it turned out that it does not hurt the high resolution
implementation.

For the high resolution implementation we have to move the expired
timers to a seperate list, as we do not want to do complex callback
functions from the event interrupt itself. But we have to reprogramm the
next event interrupt, so we need simple access to the timer which
expires first.

The initial implementation did simply move the timer from the pending
list to the expired list without doing the rb_tree removal inside of the
event interrupt handler. That way the next event for reprogramming was
the first event in the pending list.

The new rebased version with the pending list removed does the rb_tree
removal inside the event interrupt and enqueues the timer, for which the
callback function has to be executed in the softirq, to the expired
list. One exception are simple wakeup callback functions, as they are
reasonably fast and we save two context switches. The next event for
reprogramming the event interrupt is retrieved by the pointer in the
base structure.

This way the list head is only necessary for the high resolution case.

The state field is not removed

> > > [PATCH 5/9] remove relative timer from abs_list
> > >
> > > When an absolute timer expires, it becomes a relative timer, so remove
> > > it from the abs_list. The TIMER_ABSTIME flag for timer_settime()
> > > changes the interpretation of the it_value member, but it_interval is
> > > always a relative value and clock_settime() only affects absolute time
> > > services.
> >
> > This is your interpretation and I disagree.
> >
> > If I set up a timer with a 24 hour interval, which should go off
> > everyday at 6:00 AM, then I expect that this timer does this even when
> > the clock is set e.g. by daylight saving. I think, that this is a
> > completely valid interpretation and makes a lot of sense from a
> > practical point of view. The existing implementation does it that way
> > already, so why do we want to change this ?
>
> I don't know whether this behaviour was intentional and why it was done
> this way, so I did this patch to initiate a discussion about this.

Ok.

> I wouldn't say a 1 day interval timer is a very realistic example and the
> old timer wouldn't be very precise for this.

Sure, as all comparisons are flawed. I just used a simple example to
illustrate my POV.

> The rationale for example talks about "a periodic timer with an absolute
> _initial_ expiration time", so I could also construct a valid example with
> this expectation. The more I read the spec the more I think the current
> behaviour is not correct, e.g. that ABS_TIME is only relevant for
> it_value.
> So I'm interested in specific interpretations of the spec which support
> the current behaviour.

Unfortunately you find just the spec all over the place. I fear we have
to find and agree on an interpretation ourself.

I agree, that the restriction to the initial it_value is definitely
something you can read out of the spec. But it does not make a lot of
sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
as it converts an CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
never understood the rationale behind that.

> > The spec says:
> > "Also note that some implementations may choose to adjust time and/or
> > interval values to exactly match the ticks of the underlying clock."
> >
> > So there is no requirement to do so. Of course you may, but this takes
> > simply the name "precision" ad absurdum.
>
> Your current implementation contradicts the requirement that values should
> be rounded up to the resolution of the timer, that's exactly what my
> implementation does. The resolution of the timer is currently TICK_NSEC
> (+- ntp correction) and one expiry of it should only cause at most one
> expiry of all pending timer. If I set a 1msec timer in your implementation
> (with HZ=250), I automatically get 3 overruns, even though the timer
> really did only expire once.

Damn, you are right. We did not take this into account.

> Since you don't do any rounding at all anymore, your timer may now expire
> early with low resolution clocks (the old jiffies + 1 problem I explained
> in my ktime_t patch).

It does not expire early. The timer->expires field is still compared
against now.

> Also in the ktimer patch you wrote:
>
> +- also, there is an application surprise factor, the 'do not round
> + intervals' technique can lead to the following sample sequence of
> + events:
> +
> + Interval: 1.7ms
> + Resolution: 1ms
> +
> + Event timeline:
> +
> + 2ms - 4ms - 6ms - 7ms - 9ms - 11ms - 12ms - 14ms - 16ms - 17ms ...
> +
> + this 2,2,1,2,2,1...msec 'unpredictable and uneven' relative distance
> + of events could surprise applications.
>
> But this is now exactly the bevhaviour your timer has, why is not
> "surprising" anymore?

Yes, we wrote that before. After reconsidering the results we came to
the conclusion, that we actually dont need the rounding at all because
the uneven distance is equally surprising as the summing up errors
introduced by rounding.

> I can accept that you found bug, but for "simply broken" I'm not convinced
> yet. Sorry, I have not been specific enough, I disagree with your analysis
> above. On return the timer isn't requeued into the realtime queue at all,
> so this can't be the reason for the crash. I guess it's more likely you
> managed to trigger the locking bug.

Ok. Maybe I did not understand the code at this point.

> You didn't specify anywhere how you got to this conclusion, so I could
> reproduce it myself. Could you please elaborate on this "system-load
> dependent random drifting"?

As I said already, my conclusion was wrong. This showed up on a SMP
machine not on UP, when the system load was high. (The timeline was
randomly off)

> > > I don't understand where you get this from, I explicitely said that higher
> > > resolution requires a better clock abstraction, bascially any place which
> > > mentions TICK_NSEC has to be cleaned up like this. I'm at loss why you
> > > think this requires "a lot of #ifdef mess".
> >
> > Why do you need all this jiffie stuff in the first place? It is not
> > necessary at all. The hrtimer code does not contain a single reference
> > of jiffies and therefor it does not need anything to clean up. I
> > consider even a single high resolution timer related #ifdef outside of
> > hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
> > you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
> > I still do not see the point why it is necessary to put it there first.
> > It just makes it overly complex to review and understand :)
>
> In this regard I had two major goals: a) keep it as simple as possible, b)
> preserve the current behaviour and I still think I found the best
> compromise so far. This would allow to first merge the basic
> infrastructure, while reducing the risk of breaking anything.
>
> I don't mind changing the behaviour, but I would prefer to do this in a
> separate step and with an analysis of the possible consequences. This is
> not just about posix-timers, but it also affects itimers, nanosleep and
> possibly other systems in the future. Actually my main focus is not on HR
> posix timer, my main interest is that everythings else keeps working and
> doesn't has to pay the price for it.

While my focus is a clean merging of high resolution timers without
breaking the non hrt case, I still believe that we can find a solution,
where we can have the base implementation without any reference to
jiffies.

> It's rather likely that if there is a subtle change in behaviour, which
> causes something to break, it's not noticed until it hits a release
> kernel, so I think it's very well worth it to understand and document the
> differences between the implementations.

Sure.

Our goal was to keep the code almost identical independend of the
driving clock source.

I try to compare and contrast the two possible solutions:

Rounding the initial expiry time and the interval results in a summing
up error, which depends on the delta of the interval and the
resolution.

The non rounding solution results in a summing up error for intervals
which are less than the resolution. For intervals >= resolution no
summing up error is happening, but for intervals, which are not a
multiple of the resolution, an uneven interval as close as possible to
the timeline is delivered.

In both cases the timers never expire early and I think both variants
are compliant with the specification.

> Sure, I can. I'm sorry I tried to explain things you already know, but if
> you know these things already, then please show it. At this point I'm
> mostly still trying to understand, why you did certain things and
> sometimes I explain things from my perspective in the hopes you would fill
> in the holes from your perspective.
>
> You mostly just post your patches and only explain the conclusion, you're
> make it rather short on how you get to these conclusions, e.g. what other
> alternatives you've already considered. This makes it hard for me to
> figure out what you know exactly from what you're talking about.

Ok. Will try to get this better.

tglx

2005-12-12 18:31:18

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

> On Mon, 2005-12-12 at 14:39 +0100, Roman Zippel wrote:
> > > Actually the change adds more code lines and removes one field of the
> > > hrtimer structure, but it has exactly the same functionality: Fast
> > > access to the first expiring timer without walking the rb_tree.
> >
> > Together with the state field this would save 12 bytes, which is IMO very
> > well worth considering.
> > You seem to have some plans for it, the best hint I've found for it is:
> >
> > + (This seperate list is also useful for high-resolution timers where we
> > + need seperate pending and expired queues while keeping the time-order
> > + intact.)"
> >
> > Could you please elaborate on this?
>
> Sure. I have already removed the list_head for the non high resolution
> case as it turned out that it does not hurt the high resolution
> implementation.
>
> For the high resolution implementation we have to move the expired
> timers to a seperate list, as we do not want to do complex callback
> functions from the event interrupt itself. But we have to reprogramm the
> next event interrupt, so we need simple access to the timer which
> expires first.
>
> The initial implementation did simply move the timer from the pending
> list to the expired list without doing the rb_tree removal inside of the
> event interrupt handler. That way the next event for reprogramming was
> the first event in the pending list.
>
> The new rebased version with the pending list removed does the rb_tree
> removal inside the event interrupt and enqueues the timer, for which the
> callback function has to be executed in the softirq, to the expired
> list. One exception are simple wakeup callback functions, as they are
> reasonably fast and we save two context switches. The next event for
> reprogramming the event interrupt is retrieved by the pointer in the
> base structure.
>
> This way the list head is only necessary for the high resolution case.
>
> The state field is not removed

Oops, I somehow managed to remove the rest of this paragraph :(

The state field is not removed because I'm not a big fan of those
overloaded fields and I prefer to pay the 4 byte penalty for the
seperation.
Of course if there is the absolute requirement to reduce the size, I'm
not insisting on keeping it.

tglx

2005-12-13 01:26:53

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Thomas Gleixner wrote:
~
>
>
>>I wouldn't say a 1 day interval timer is a very realistic example and the
>>old timer wouldn't be very precise for this.
>
>
> Sure, as all comparisons are flawed. I just used a simple example to
> illustrate my POV.
>
>
>>The rationale for example talks about "a periodic timer with an absolute
>>_initial_ expiration time", so I could also construct a valid example with
>>this expectation. The more I read the spec the more I think the current
>>behaviour is not correct, e.g. that ABS_TIME is only relevant for
>>it_value.
>>So I'm interested in specific interpretations of the spec which support
>>the current behaviour.

My $0.02 worth: It is clear (from the standard) that the initial time
is to be ABS_TIME. It is also clear that the interval is to be added
to that time. IMHO then, the result should have the same property,
i.e. ABS_TIME. Sort of like adding an offset to a relative address.
The result is still relative.
>
>
> Unfortunately you find just the spec all over the place. I fear we have
> to find and agree on an interpretation ourself.
>
> I agree, that the restriction to the initial it_value is definitely
> something you can read out of the spec. But it does not make a lot of
> sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> as it converts an CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> never understood the rationale behind that.

I don't think it really does that. The TIMER_ABSTIME flag just says
that the time requested is to be taken as "clock" time (which ever
clock) AND that this is to be the expire time regardless of clock
setting. We, in an attempt to simplify the lists, convert the expire
time into some common time notation (in most cases we convert relative
times to absolute times) but this introduces problems because the
caller has _asked_ for a relative or absolute time and not the other.
If the clock can not be set this is not a problem. If it can, well,
we need to keep track of what the caller wanted, absolute or relative.

It might help others to understand this if you were to remove the
clock names from your queues and instead call them "absolute_real" and
"up_time". Then it would be more clear, I think, that we are mapping
user requests onto these queues based on the desired functionality
without a predilection to put a timer on a given queue just because a
particular clock was requested. At this point it becomes clear, for
example, that a TIMER_ABSTIME request on the real clock is the _only_
request that should be mapped to the "absolute_real" list.
>
~
--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-13 09:11:34

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

On Mon, 2005-12-12 at 17:25 -0800, George Anzinger wrote:
> >>The rationale for example talks about "a periodic timer with an absolute
> >>_initial_ expiration time", so I could also construct a valid example with
> >>this expectation. The more I read the spec the more I think the current
> >>behaviour is not correct, e.g. that ABS_TIME is only relevant for
> >>it_value.
> >>So I'm interested in specific interpretations of the spec which support
> >>the current behaviour.
>
> My $0.02 worth: It is clear (from the standard) that the initial time
> is to be ABS_TIME. It is also clear that the interval is to be added
> to that time. IMHO then, the result should have the same property,
> i.e. ABS_TIME. Sort of like adding an offset to a relative address.
> The result is still relative.

So the only difference between a timer with ABSTIME set and one without
is the notion of the initial expiry value, aside the
clock_settime(CLOCK_REALTIME) speciality.

ABSTIME:
firstexp = it_value
firstexp, firstexp + it_interval, ... firstexp + n * it_interval

non ABSTIME:
firstexp = now + it_value
firstexp, firstexp + it_interval, ... firstexp + n * it_interval

The only limitation of this is that the interval value can not be less
than the resolution of the clock in order to avoid the wrong accounting
of the overflow.

> > Unfortunately you find just the spec all over the place. I fear we have
> > to find and agree on an interpretation ourself.
> >
> > I agree, that the restriction to the initial it_value is definitely
> > something you can read out of the spec. But it does not make a lot of
> > sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> > as it converts an CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> > never understood the rationale behind that.
>
> I don't think it really does that. The TIMER_ABSTIME flag just says
> that the time requested is to be taken as "clock" time (which ever
> clock) AND that this is to be the expire time regardless of clock
> setting. We, in an attempt to simplify the lists, convert the expire
> time into some common time notation (in most cases we convert relative
> times to absolute times) but this introduces problems because the
> caller has _asked_ for a relative or absolute time and not the other.
> If the clock can not be set this is not a problem. If it can, well,
> we need to keep track of what the caller wanted, absolute or relative.
>
> It might help others to understand this if you were to remove the
> clock names from your queues and instead call them "absolute_real" and
> "up_time". Then it would be more clear, I think, that we are mapping
> user requests onto these queues based on the desired functionality
> without a predilection to put a timer on a given queue just because a
> particular clock was requested. At this point it becomes clear, for
> example, that a TIMER_ABSTIME request on the real clock is the _only_
> request that should be mapped to the "absolute_real" list.

In other words. If there is only CLOCK_REALTIME, then the implementation
has to keep track of absolute and relative timers.

The existance of CLOCK_MONOTONIC and the fact that CLOCK_MONOTONIC is
using the same clock source as CLOCK_REALTIME allows us to optimize the
implementation by putting the relative timers on the monotonic list.

tglx

2005-12-13 12:45:53

by Nicolas Mailhot

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

"This is your interpretation and I disagree.

If I set up a timer with a 24 hour interval, which should go off
everyday at 6:00 AM, then I expect that this timer does this even when
the clock is set e.g. by daylight saving. I think, that this is a
completely valid interpretation and makes a lot of sense from a
practical point of view. The existing implementation does it that way
already, so why do we want to change this ?"

Please do not hardcode anywhere 1 day = 24h or something like this.
Relative timers should stay relative not depend on DST.

If someone needs a timer that sets of everyday at the same (legal) time,
make him ask for everyday at that time not one time + n x 24h.

Some processes need an exact legal hour
Other processes need an exact duration

In a DST world that's not the same thing at all - don't assume one or the
other, have coders request exactly what they need and everyone will be
happy.

I can tell from experience trying to fix code which assumed one day = 24h
is not fun at all. And yes sometimes the difference between legal and UTC
time matters a lot.

--
Nicolas Mailhot

2005-12-13 23:39:32

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Nicolas Mailhot wrote:
> "This is your interpretation and I disagree.
>
> If I set up a timer with a 24 hour interval, which should go off
> everyday at 6:00 AM, then I expect that this timer does this even when
> the clock is set e.g. by daylight saving. I think, that this is a
> completely valid interpretation and makes a lot of sense from a
> practical point of view. The existing implementation does it that way
> already, so why do we want to change this ?"

I think that there is a miss understanding here. The kernel timers,
at this time, do not know or care about daylight savings time. This
is not really a clock set but a time zone change which does not
intrude on the kernels notion of time (that being, more or less UTC).
>
> Please do not hardcode anywhere 1 day = 24h or something like this.
> Relative timers should stay relative not depend on DST.

As far as timers go, it is only the user who understands any
abstraction above the second. I.e. hour, day, min. all are
abstractions done in user land.

There is, however, one exception, the leap second. The kernel inserts
this at midnight UTC and does use a fixed constant (86400) to find
midnight.
>
> If someone needs a timer that sets of everyday at the same (legal) time,
> make him ask for everyday at that time not one time + n x 24h.
>
> Some processes need an exact legal hour
> Other processes need an exact duration

I think what we are saying is that ABS time flag says that the timer
is supposed to expire at the given time "by the specified clock",
however that time is arrived at, be it the initial time or the initial
time plus one or more intervals. We are NOT saying that these
intervals are the same size, but only that the given clock says that
they are the same size, thus any clock setting done during an interval
can cause that interval to be of a different size.

Without the ABS time flag, we are talking about intervals (the initial
and subsequent) that are NOT affected by clock setting and are thus as
close to the requested duration as possible.
>
> In a DST world that's not the same thing at all - don't assume one or the
> other, have coders request exactly what they need and everyone will be
> happy.

This is why the standard introduced the ABS time flag. It does NOT,
however, IMHO touch on the issue of time zone changes introduced by
shifting into and out of day light savings time.
>
> I can tell from experience trying to fix code which assumed one day = 24h
> is not fun at all. And yes sometimes the difference between legal and UTC
> time matters a lot.
>

--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-14 08:58:44

by Kyle Moffett

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

On Dec 13, 2005, at 18:38, George Anzinger wrote:
> I think that there is a miss understanding here. The kernel
> timers, at this time, do not know or care about daylight savings
> time. This is not really a clock set but a time zone change which
> does not intrude on the kernels notion of time (that being, more or
> less UTC).

One question I have right now is: How does the kernel treat time
slewing? Sometimes I might want to say: "The clock has continuous
error and measures 24hours and 2 seconds for every 24 hours of real
time", in which case the monotonic time should be slewed -2sec/
24hours. On the other hand, I might also want to say: "The clock has
fixed error and is 2 hours ahead cause some dummy messed up the
time", so I'm going to fix this over the next 2 weeks by slewing
backwards 1 hour per 7 days, in which case I do _not_ want the
monotonic time to be affected (I'm passing 2 days, not 1 day and 22
hours). How does the kernel handle this? I've never seen any good
description of the NTP and time-control APIs; if there is one out
there (that's not 42 pages of dry standard), I would love a link.

Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially
drag on in court for _years_, then you have no business playing with
the legal system at all.
-- Rob Landley

2005-12-14 10:04:22

by Nicolas Mailhot

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

On Mer 14 décembre 2005 00:38, George Anzinger wrote:
> Nicolas Mailhot wrote:
>> "This is your interpretation and I disagree.
>>
>> If I set up a timer with a 24 hour interval, which should go off
>> everyday at 6:00 AM, then I expect that this timer does this even when
>> the clock is set e.g. by daylight saving. I think, that this is a
>> completely valid interpretation and makes a lot of sense from a
>> practical point of view. The existing implementation does it that way
>> already, so why do we want to change this ?"
>
> I think that there is a miss understanding here. The kernel timers,
> at this time, do not know or care about daylight savings time. This
> is not really a clock set but a time zone change which does not
> intrude on the kernels notion of time (that being, more or less UTC).

Probably. I freely admit I didn't follow the whole discussion. But the
example quoted strongly hinted at fudging timers in case of DST, which
would be very bad if done systematically and not on explicit user request.

What I meant to write is "do not assume any random clock adjustement
should change timer duration". Some people want it, others definitely
don't.

I case of kernel code legal time should be pretty much irrelevant, so if
24h timers are adjusted so they still go of at the same legal hour, that
would be a bug IMHO.

--
Nicolas Mailhot

2005-12-14 20:49:25

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Mon, 12 Dec 2005, Thomas Gleixner wrote:

> For the high resolution implementation we have to move the expired
> timers to a seperate list, as we do not want to do complex callback
> functions from the event interrupt itself. But we have to reprogramm the
> next event interrupt, so we need simple access to the timer which
> expires first.
>
> The initial implementation did simply move the timer from the pending
> list to the expired list without doing the rb_tree removal inside of the
> event interrupt handler. That way the next event for reprogramming was
> the first event in the pending list.
>
> The new rebased version with the pending list removed does the rb_tree
> removal inside the event interrupt and enqueues the timer, for which the
> callback function has to be executed in the softirq, to the expired
> list. One exception are simple wakeup callback functions, as they are
> reasonably fast and we save two context switches. The next event for
> reprogramming the event interrupt is retrieved by the pointer in the
> base structure.
>
> This way the list head is only necessary for the high resolution case.

Thanks for the explanation. If it's just for reprogramming the interrupt,
it should be cheaper to just check the rbtree than walk the list to find
the next expiration time (at least theoretically). This leaves only
optimizations for rt kernel and from the base kernel point of view I
prefer the immediate space savings.

> The state field is not removed because I'm not a big fan of those
> overloaded fields and I prefer to pay the 4 byte penalty for the
> seperation.
> Of course if there is the absolute requirement to reduce the size, I'm
> not insisting on keeping it.

Well, I'm not a big fan of redundant state information, e.g. the pending
information can be included in the rb_node (it's not as quite simple as
with the timer_list, but it's the same thing). The expired information
(together with the data field) is an optimization for simple sleeps that
AFAICT only makes a difference in the rt kernel (the saved context switch
you mentioned above). What makes me more uncomfortable is that this is a
special case optimization and other callbacks are probably fast as well
(e.g. wakeup + timer restart).

I can understand you want to keep the difference to the rt kernel small,
but for me it's more about immediate benefits against uncertain long term
benefits.

> > The rationale for example talks about "a periodic timer with an absolute
> > _initial_ expiration time", so I could also construct a valid example with
> > this expectation. The more I read the spec the more I think the current
> > behaviour is not correct, e.g. that ABS_TIME is only relevant for
> > it_value.
> > So I'm interested in specific interpretations of the spec which support
> > the current behaviour.
>
> Unfortunately you find just the spec all over the place. I fear we have
> to find and agree on an interpretation ourself.
>
> I agree, that the restriction to the initial it_value is definitely
> something you can read out of the spec. But it does not make a lot of
> sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> as it converts an CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> never understood the rationale behind that.

As George already said, it's easier to keep these clocks separate. I think
the spec rationale is also more clear about the intended usage. About
timers it says:

"Two timer types are required for a system to support realtime
applications:

1. One-shot
...

2. Periodic
..."

Basically you have two independent timer types. It's quite explicit about
that only the "initial timer expiration" can be relative or absolute. It
doesn't say anywhere that there are relative and absolute periodic timer,
all references to "absolute" or "relative" are only in connection with the
initial expiration time and after the initial expiration, it becomes a
periodic timer. At every timer expiration the timer is reloaded with a
relative time interval.
I can understand that you find this behaviour useful (although other
people may disagree) and the spec doesn't explicitely say that you must
not do this, but I really can't convince myself that this is the
_intendend_ behaviour.

> > Since you don't do any rounding at all anymore, your timer may now expire
> > early with low resolution clocks (the old jiffies + 1 problem I explained
> > in my ktime_t patch).
>
> It does not expire early. The timer->expires field is still compared
> against now.

I don't think that's enough (unless I missed something). Steven maybe
explained it better than I did in
http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935

Even if you set the timer resolution to 1 nsec, there is still the
resolution of the actual hardware clock and it has to be taken into
account somewhere when you start a relative timer. Even if the clock
resolution is usually higher than the normal latency, so the problem won't
be visible for most people, the general timer code should take this into
account. If someone doesn't care about high resolution timer, he can still
use it with a low resolution clock (e.g. jiffies) and then this becomes a
problem.

> > But this is now exactly the bevhaviour your timer has, why is not
> > "surprising" anymore?
>
> Yes, we wrote that before. After reconsidering the results we came to
> the conclusion, that we actually dont need the rounding at all because
> the uneven distance is equally surprising as the summing up errors
> introduced by rounding.

I don't think that the summing up error is surprising at all, the spec is
quite clear that the time values have to be rounded up to the resolution
of the timer and it's also the behaviour of the current timer.
This error is actually the expected behaviour for any timer with a
resolution different from 1 nsec. I don't want to say that we can't have
such a timer, but I'm not so sure whether this should be the default
behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME
and CLOCK_REALTIME_HR, where one is possibly faster and the other is more
precise. Even within the kernel I would prefer to map itimer and nanosleep
to the first clock (maybe also based on arch/kconfig defaults).
OTOH if the hardware allows it, both clocks can do the same thing, but I
really would like to have the possibility to give higher (and thus
possibly more expensive) resolution only to those asking for it.

> > I don't mind changing the behaviour, but I would prefer to do this in a
> > separate step and with an analysis of the possible consequences. This is
> > not just about posix-timers, but it also affects itimers, nanosleep and
> > possibly other systems in the future. Actually my main focus is not on HR
> > posix timer, my main interest is that everythings else keeps working and
> > doesn't has to pay the price for it.
>
> While my focus is a clean merging of high resolution timers without
> breaking the non hrt case, I still believe that we can find a solution,
> where we can have the base implementation without any reference to
> jiffies.

This is what I think requires the better clock abstraction, most of it is
related to the clock resolution, the generic timer code currently has no
idea of the real resolution of the underlying clock, so I assumed a worst
case of TICK_NSEC everywhere.

> I try to compare and contrast the two possible solutions:
>
> Rounding the initial expiry time and the interval results in a summing
> up error, which depends on the delta of the interval and the
> resolution.
>
> The non rounding solution results in a summing up error for intervals
> which are less than the resolution. For intervals >= resolution no
> summing up error is happening, but for intervals, which are not a
> multiple of the resolution, an uneven interval as close as possible to
> the timeline is delivered.
>
> In both cases the timers never expire early and I think both variants
> are compliant with the specification.

What I'd like to avoid is that we have to commit ourselves to only one
solution right now. I think the first solution is better suited to the low
resolution timer, that we have right now. The second solution requires a
better clock framework - this includes better time keeping and
reprogrammable timer interrupts.
At this point I wouldn't like to settle on just one solution, I had to
see more of the infrastructure integrated before doing this. At this point
I see more advantages in having a choice (may it be as Kconfig or even a
runtime option).

bye, Roman

2005-12-14 22:24:01

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 2005-12-14 at 21:48 +0100, Roman Zippel wrote:
> >
> > This way the list head is only necessary for the high resolution case.
>
> Thanks for the explanation. If it's just for reprogramming the interrupt,
> it should be cheaper to just check the rbtree than walk the list to find
> the next expiration time (at least theoretically). This leaves only
> optimizations for rt kernel and from the base kernel point of view I
> prefer the immediate space savings.

The current -hrt queue contains the removal patch of the list_head
already and you interrupted my attempt to send out the patch for -mm :)

> > The state field is not removed because I'm not a big fan of those
> > overloaded fields and I prefer to pay the 4 byte penalty for the
> > seperation.
> > Of course if there is the absolute requirement to reduce the size, I'm
> > not insisting on keeping it.
>
> Well, I'm not a big fan of redundant state information, e.g. the pending
> information can be included in the rb_node (it's not as quite simple as
> with the timer_list, but it's the same thing).

I do not consider this as redundant information. It's a design decision
whether to use a state variable for state information and the rbnode for
rbtree handling or to overload the meaning of the rbnode with
information which is not the natural associated content.

I'm well aware of those optimization and space saving tricks. I did
microcontroller programming long enough, but - and out of the experience
- I want to avoid it for new designs where ever it is possible for
clarity and extensibility reasons.

> The expired information
> (together with the data field) is an optimization for simple sleeps that
> AFAICT only makes a difference in the rt kernel (the saved context switch
> you mentioned above). What makes me more uncomfortable is that this is a
> special case optimization and other callbacks are probably fast as well
> (e.g. wakeup + timer restart).

Not only in a -rt kernel, it also saves the context switch for a high
resolution timer enabled kernel, where you actually can execute the
callback in hard interrupt context. We can solve it differently, but we
should carefully think about the extensiblity issues. The wakeup +
restart szenario is a good reason to reconsider this.

> I can understand you want to keep the difference to the rt kernel small,
> but for me it's more about immediate benefits against uncertain long term
> benefits.

If you have a clear target and the experience of having implemented the
extensions, you have to carefully weigh up the consequences of such
decisions. I'm not talking about "might be implemented by somebody
sometimes features", I'm talking about existing proof of concept
implementations. There is no real justification to ignore well known
consequences.

Of course if you consider the possibility of including high resolution
timers (I'm not talking about -rt) as zero, your requests make sense.

I disagree because I'm convinced that the problems "high res timers",
"dynamic ticks", "timekeeping", "clock event abstraction" are closely
related and we have high demands to get those solved in a clean way. So
providing some jiffies bound minimal solution in the first place is more
than shortsighted IMO.

> > > The rationale for example talks about "a periodic timer with an absolute
> > > _initial_ expiration time", so I could also construct a valid example with
> > > this expectation. The more I read the spec the more I think the current
> > > behaviour is not correct, e.g. that ABS_TIME is only relevant for
> > > it_value.
> > > So I'm interested in specific interpretations of the spec which support
> > > the current behaviour.
> >
> > Unfortunately you find just the spec all over the place. I fear we have
> > to find and agree on an interpretation ourself.
> >
> > I agree, that the restriction to the initial it_value is definitely
> > something you can read out of the spec. But it does not make a lot of
> > sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> > as it converts an CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> > never understood the rationale behind that.
>
> As George already said, it's easier to keep these clocks separate. I think
> the spec rationale is also more clear about the intended usage. About
> timers it says:
>
> "Two timer types are required for a system to support realtime
> applications:
>
> 1. One-shot
> ...
>
> 2. Periodic
> ..."
>
> Basically you have two independent timer types. It's quite explicit about
> that only the "initial timer expiration" can be relative or absolute. It
> doesn't say anywhere that there are relative and absolute periodic timer,
> all references to "absolute" or "relative" are only in connection with the
> initial expiration time and after the initial expiration, it becomes a
> periodic timer. At every timer expiration the timer is reloaded with a
> relative time interval.
> I can understand that you find this behaviour useful (although other
> people may disagree) and the spec doesn't explicitely say that you must
> not do this, but I really can't convince myself that this is the
> _intendend_ behaviour.

Goerge said explicitely:

> My $0.02 worth: It is clear (from the standard) that the initial time
> is to be ABS_TIME. It is also clear that the interval is to be added
> to that time. IMHO then, the result should have the same property,
> i.e. ABS_TIME.

I dont find a way to read out that the interval should not have the
ABSTIME property.

> > > Since you don't do any rounding at all anymore, your timer may now expire
> > > early with low resolution clocks (the old jiffies + 1 problem I explained
> > > in my ktime_t patch).
> >
> > It does not expire early. The timer->expires field is still compared
> > against now.
>
> I don't think that's enough (unless I missed something). Steven maybe
> explained it better than I did in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935

Steven said:

> Interesting though, I tried to force this scenario, by changing the
> base->get_time to return jiffies. I have a jitter test and ran this
> several times, and I could never get it to expire early. I even changed
> HZ back to 100.
>
> Then I looked at run_ktimer_queue. And here we have the compare:
>
> timer = list_entry(base->pending.next, struct ktimer, list);
> if (ktime_cmp(now, <=, timer->expires))
> break;
>
> So, the timer does _not_ get processed if it is after or _equal_ to the
> current time. So although the timer may go off early, the expired queue
> does not get executed. So the above example would not go off at 3.2,
> but some time in the 4 category

Again, I'm not able to find the problem.

while(timers_pending()) {
timer = getnext_timer();
if (timer->expires > now)
break;
execute_callback();
}

Please elaborate how the timer can expire early.

> Even if you set the timer resolution to 1 nsec, there is still the
> resolution of the actual hardware clock and it has to be taken into
> account somewhere when you start a relative timer. Even if the clock
> resolution is usually higher than the normal latency, so the problem won't
> be visible for most people, the general timer code should take this into
> account. If someone doesn't care about high resolution timer, he can still
> use it with a low resolution clock (e.g. jiffies) and then this becomes a
> problem.

I'm completely lost on this.
Can you please make up a simple example with numbers?

If you disable high resolution timers then the resolution is jiffies.
The simple comparision which determines whether the timer is expired or
not is still valid.

> > > But this is now exactly the bevhaviour your timer has, why is not
> > > "surprising" anymore?
> >
> > Yes, we wrote that before. After reconsidering the results we came to
> > the conclusion, that we actually dont need the rounding at all because
> > the uneven distance is equally surprising as the summing up errors
> > introduced by rounding.
>
> I don't think that the summing up error is surprising at all, the spec is
> quite clear that the time values have to be rounded up to the resolution
> of the timer and it's also the behaviour of the current timer.

No, the spec is not clear at all about this.

I pointed this out before and I still think that the part of the
RATIONALE section is the key to this decision.

"Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock"

You decide to do the adjustment. I prefer not to do so and I dont buy
any argument which says, that the current behaviour is the yardstick for
everything. It can't be. Otherwise we would not be able to introduce
high resolution timers at all. Every application which relies on some
behaviour of the kernel which is not explicitely required by the
specification is broken by definition.

The compliance requirement is the yardstick, not some random
implementation detail which happens to be compliant.

> This error is actually the expected behaviour for any timer with a
> resolution different from 1 nsec. I don't want to say that we can't have
> such a timer, but I'm not so sure whether this should be the default
> behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME
> and CLOCK_REALTIME_HR, where one is possibly faster and the other is more
> precise. Even within the kernel I would prefer to map itimer and nanosleep
> to the first clock (maybe also based on arch/kconfig defaults).
> OTOH if the hardware allows it, both clocks can do the same thing, but I
> really would like to have the possibility to give higher (and thus
> possibly more expensive) resolution only to those asking for it.

Thats an rather odd approach for me. If we drag this further then we
might consider that only some users (i.e. applications) of -rt patches
are using the enhanced functionalities, which introduces interesting
computational problems (e.g when to treat a mutex as a concurrency
control which is capable of priority inversion or not).

I vote strongly against introducing private, special purpose APIs and I
consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
introduce any penalties for people who do not enable a future high
resolution extension. It gives us the benefit of a clean code base which
is capable to be switched simply and non intrusive to the high
resolution mode. We have done extensive tests on the impact of
converting all users unconditionally to high resolution mode once it is
switched on and the penalty is within the noise range.

You are explicitely asking for increased complexity with your approach.

> > > I don't mind changing the behaviour, but I would prefer to do this in a
> > > separate step and with an analysis of the possible consequences. This is
> > > not just about posix-timers, but it also affects itimers, nanosleep and
> > > possibly other systems in the future. Actually my main focus is not on HR
> > > posix timer, my main interest is that everythings else keeps working and
> > > doesn't has to pay the price for it.
> >
> > While my focus is a clean merging of high resolution timers without
> > breaking the non hrt case, I still believe that we can find a solution,
> > where we can have the base implementation without any reference to
> > jiffies.
>
> This is what I think requires the better clock abstraction, most of it is
> related to the clock resolution, the generic timer code currently has no
> idea of the real resolution of the underlying clock, so I assumed a worst
> case of TICK_NSEC everywhere.

Well, can you please show where the current hrtimer implementation is
doing something which requires a better clock abstraction ?

The clock abstraction layer is relevant if you actuallly want to switch
to high resolution mode and until then the coarse interface is
sufficient.

> > I try to compare and contrast the two possible solutions:
> >
> > Rounding the initial expiry time and the interval results in a summing
> > up error, which depends on the delta of the interval and the
> > resolution.
> >
> > The non rounding solution results in a summing up error for intervals
> > which are less than the resolution. For intervals >= resolution no
> > summing up error is happening, but for intervals, which are not a
> > multiple of the resolution, an uneven interval as close as possible to
> > the timeline is delivered.
> >
> > In both cases the timers never expire early and I think both variants
> > are compliant with the specification.
>
> What I'd like to avoid is that we have to commit ourselves to only one
> solution right now. I think the first solution is better suited to the low
> resolution timer, that we have right now. The second solution requires a
> better clock framework - this includes better time keeping and
> reprogrammable timer interrupts.

We have to choose one. Everything else is a bad compromise. There is
nothing worse than making no decision when you are at a point where a
decision is required.

Please provide one reproducable scenario why a better time keeping and a
reprogrammable timer interrupt is required. The current hrtimer code
does not need this at all. Only if you want to have higher resolution
you need this and we use it in the high resolution timer queue - both
the timekeeping and the reprogamming abstraction layer.

> At this point I wouldn't like to settle on just one solution, I had to
> see more of the infrastructure integrated before doing this. At this point
> I see more advantages in having a choice (may it be as Kconfig or even a
> runtime option).

Well, I do not see a point why we want to have Kconfig, runtime or
whatever choices for a non existant problem at all.

tglx

2005-12-15 00:56:43

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Thomas Gleixner wrote:
> Hi,
~

>
>
>>This error is actually the expected behaviour for any timer with a
>>resolution different from 1 nsec. I don't want to say that we can't have
>>such a timer, but I'm not so sure whether this should be the default
>>behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME
>>and CLOCK_REALTIME_HR, where one is possibly faster and the other is more
>>precise. Even within the kernel I would prefer to map itimer and nanosleep
>>to the first clock (maybe also based on arch/kconfig defaults).
>>OTOH if the hardware allows it, both clocks can do the same thing, but I
>>really would like to have the possibility to give higher (and thus
>>possibly more expensive) resolution only to those asking for it.
>
>
> Thats an rather odd approach for me. If we drag this further then we
> might consider that only some users (i.e. applications) of -rt patches
> are using the enhanced functionalities, which introduces interesting
> computational problems (e.g when to treat a mutex as a concurrency
> control which is capable of priority inversion or not).

Er... what? This is a non-compute.
>
> I vote strongly against introducing private, special purpose APIs and I
> consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> introduce any penalties for people who do not enable a future high
> resolution extension. It gives us the benefit of a clean code base which
> is capable to be switched simply and non intrusive to the high
> resolution mode. We have done extensive tests on the impact of
> converting all users unconditionally to high resolution mode once it is
> switched on and the penalty is within the noise range.
>
> You are explicitely asking for increased complexity with your approach.

I beg to differ here. The fact that high res timers, in general,
require an interrupt per expiry, and that, by definition, we are
changing the resolution by, I would guess, a couple of orders of
magnitude implies a rather much larger over head. If we sum this over
all user timers it can IMHO get out of control. Given that only a
very small number of applications really need the extra resolution, I
think it makes a lot of sense that those applications incur the
overhead and others, which don't need nor want the higher resolution,
just use the old low resolution timers. The notion of switching this
at configure time implies that a given kernel is going to be used ONLY
one way or another for all applications, which, AFAICT is just not the
way most users do things.

As to CLOCK_XXX_HR being a special purpose API, this is only half
true. It is a POSIX conforming extension and I do think you can find
it used elsewhere as well. On the other hand, it if you want to limit
the higher overhead timers to only those who ask, well, I guess you
could call that "special purpose".

On the complexity thing, your new organization makes the added
"complexity" rather non-complex, in fact, you might say it is down
right simple, for which, thank you.
>
>
~
--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-15 01:11:49

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Nicolas Mailhot wrote:
> On Mer 14 dÃ©cembre 2005 00:38, George Anzinger wrote:
>
>>Nicolas Mailhot wrote:
>>
>>>"This is your interpretation and I disagree.
>>>
>>>If I set up a timer with a 24 hour interval, which should go off
>>>everyday at 6:00 AM, then I expect that this timer does this even when
>>>the clock is set e.g. by daylight saving. I think, that this is a
>>>completely valid interpretation and makes a lot of sense from a
>>>practical point of view. The existing implementation does it that way
>>>already, so why do we want to change this ?"
>>
>>I think that there is a miss understanding here. The kernel timers,
>>at this time, do not know or care about daylight savings time. This
>>is not really a clock set but a time zone change which does not
>>intrude on the kernels notion of time (that being, more or less UTC).
>
>
> Probably. I freely admit I didn't follow the whole discussion. But the
> example quoted strongly hinted at fudging timers in case of DST, which
> would be very bad if done systematically and not on explicit user request.
>
> What I meant to write is "do not assume any random clock adjustement
> should change timer duration". Some people want it, others definitely
> don't.
>
> I case of kernel code legal time should be pretty much irrelevant, so if
> 24h timers are adjusted so they still go of at the same legal hour, that
> would be a bug IMHO.

I am not quite sure what you are asking for here, but, as things set
today, the kernels notion of time starts with a set time somewhere
around boot up, be it from RT clock hardware or, possibly some script
that quires some other system to find the time (NTP or otherwise).
This time is then kept up to date by timer ticks which are assumed to
have some fixed duration with, possibly, small drift corrections via
NTP code. And then there is the random settimeofday, which the kernel
has to assume is "needed" and correct.

On top of this the POSIX clocks and timers code implements clocks
which read the current time and a system relative time called
monotonic time. We, by convention, roll the monotonic time and uptime
together, and, assuming that the NTP corrections are telling us
something about our "rock", we correct both the monotonic time and the
time of day as per the NTP requests.

Timers are then built on top of these clocks in two ways, again, as
per the POSIX standard: 1) the relative timer, and 2) the absolute
timer. For the relative timer, the specified expiry time is defined
to be _now_ plus the given interval. For the absolute timer the
expiry time is defined as that time when the given clock reaches the
requested time.

The only thing in here that might relate to your "legal" hour is that
we adjust (via NTP) the clocks so that they, supposedly, run at the
NBS (or is it the Naval Observatory) rate, give or take a small,
hopefully, well defined error.
>

--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-15 01:35:41

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Mon, 12 Dec 2005, George Anzinger wrote:

> My $0.02 worth: It is clear (from the standard) that the initial time is to be
> ABS_TIME.

Yes.

> It is also clear that the interval is to be added to that time.

Not necessarily. It says it_interval is a "reload value", it's used to
reload the timer to count down to the next expiration.
It's up to the implementation, whether it really counts down this time or
whether it converts it first into an absolute value.

> IMHO then, the result should have the same property, i.e. ABS_TIME. Sort of
> like adding an offset to a relative address. The result is still relative.

If the result is relative, why should have a clock set any effect?
IMO the spec makes it quite clear that initial timer and the periodic
timer are two different types of the timer. The initial timer only
specifies how the periodic timer is started and the periodic timer itself
is a "relative time service".

bye, Roman

2005-12-15 02:30:34

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Roman Zippel wrote:
> Hi,
>
> On Mon, 12 Dec 2005, George Anzinger wrote:
>
>
>>My $0.02 worth: It is clear (from the standard) that the initial time is to be
>>ABS_TIME.
>
>
> Yes.
>
>
>> It is also clear that the interval is to be added to that time.
>
>
> Not necessarily. It says it_interval is a "reload value", it's used to
> reload the timer to count down to the next expiration.
> It's up to the implementation, whether it really counts down this time or
> whether it converts it first into an absolute value.
>
>
>>IMHO then, the result should have the same property, i.e. ABS_TIME. Sort of
>>like adding an offset to a relative address. The result is still relative.
>
>
> If the result is relative, why should have a clock set any effect?
> IMO the spec makes it quite clear that initial timer and the periodic
> timer are two different types of the timer. The initial timer only
> specifies how the periodic timer is started and the periodic timer itself
> is a "relative time service".
>
Well, I guess we will have to agree to disagree. That which the
interval is added to is an absolute time, so I, and others, take the
result as absolute. At this point there really is no "conversion" to
an absolute timer. Once the timer initial time is absolute,
everything derived from it, i.e. all intervals added to it, must be
absolute.

For what its worth, I do think that the standards folks could have
done a bit better here. I, for example, would have liked to have seen
a discussion about what to do with overrun in the face of clock setting.

--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-15 14:18:39

by Steven Rostedt

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

On Wed, 2005-12-14 at 23:30 +0100, Thomas Gleixner wrote:

> >
> > I don't think that's enough (unless I missed something). Steven maybe
> > explained it better than I did in
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935
>
> Steven said:
>
> > Interesting though, I tried to force this scenario, by changing the
> > base->get_time to return jiffies. I have a jitter test and ran this
> > several times, and I could never get it to expire early. I even changed
> > HZ back to 100.
> >
> > Then I looked at run_ktimer_queue. And here we have the compare:
> >
> > timer = list_entry(base->pending.next, struct ktimer, list);
> > if (ktime_cmp(now, <=, timer->expires))
> > break;
> >
> > So, the timer does _not_ get processed if it is after or _equal_ to the
> > current time. So although the timer may go off early, the expired queue
> > does not get executed. So the above example would not go off at 3.2,
> > but some time in the 4 category
>
> Again, I'm not able to find the problem.
>
> while(timers_pending()) {
> timer = getnext_timer();
> if (timer->expires > now)
> break;
> execute_callback();
> }
>
> Please elaborate how the timer can expire early.

Actually Thomas, the above code doesn't handle it correctly, although,
the code you have in hrtimer.c does. Here you say ">" where it should
be ">=", otherwise you can have the affect that I explained in the
reference that Roman stated.

Although run_hrtimer_queue is still correct, I think you might want to
change the hrtimer_forward. It has a ">" where it should be ">=".

-- Steve

2005-12-19 14:50:43

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 14 Dec 2005, Thomas Gleixner wrote:

> The current -hrt queue contains the removal patch of the list_head
> already and you interrupted my attempt to send out the patch for -mm :)

Ok.

> > > The state field is not removed because I'm not a big fan of those
> > > overloaded fields and I prefer to pay the 4 byte penalty for the
> > > seperation.
> > > Of course if there is the absolute requirement to reduce the size, I'm
> > > not insisting on keeping it.
> >
> > Well, I'm not a big fan of redundant state information, e.g. the pending
> > information can be included in the rb_node (it's not as quite simple as
> > with the timer_list, but it's the same thing).
>
> I do not consider this as redundant information. It's a design decision
> whether to use a state variable for state information and the rbnode for
> rbtree handling or to overload the meaning of the rbnode with
> information which is not the natural associated content.
>
> I'm well aware of those optimization and space saving tricks. I did
> microcontroller programming long enough, but - and out of the experience
> - I want to avoid it for new designs where ever it is possible for
> clarity and extensibility reasons.

It's not just about optimization tricks, it's about redundant information.
Right now the primary function of the state is to tell whether the timer
node is in the tree or now, so I prefer to add this information directly
to the rbnode, similiar to what we do with normal lists.
The other problem is that such "simple" state variables quickly become
inadequate as the state machine becomes more complex, especially if
multiple processes run at the same time (e.g. a timer can be "running"
and/or "active"). So for me it's also for clarity and extensibility
reasons, that I want to avoid overloaded state machines and rather keep
it as simple as possible.

> > The expired information
> > (together with the data field) is an optimization for simple sleeps that
> > AFAICT only makes a difference in the rt kernel (the saved context switch
> > you mentioned above). What makes me more uncomfortable is that this is a
> > special case optimization and other callbacks are probably fast as well
> > (e.g. wakeup + timer restart).
>
> Not only in a -rt kernel, it also saves the context switch for a high
> resolution timer enabled kernel, where you actually can execute the
> callback in hard interrupt context. We can solve it differently, but we
> should carefully think about the extensiblity issues. The wakeup +
> restart szenario is a good reason to reconsider this.

I don't think executing something in the soft or hard interrupt context
makes a big difference on a normal kernel (at least I wouldn't call it a
context switch).

> > I can understand you want to keep the difference to the rt kernel small,
> > but for me it's more about immediate benefits against uncertain long term
> > benefits.
>
> If you have a clear target and the experience of having implemented the
> extensions, you have to carefully weigh up the consequences of such
> decisions. I'm not talking about "might be implemented by somebody
> sometimes features", I'm talking about existing proof of concept
> implementations. There is no real justification to ignore well known
> consequences.
>
> Of course if you consider the possibility of including high resolution
> timers (I'm not talking about -rt) as zero, your requests make sense.

Thomas, please don't treat me like an idiot, you may have more experiance
with hrtimer, but after working on it for a while I also know what I'm
talking about. Please accept that I have different focus on this, I want
to keep things as simple as possible. New features should stand on his
own and this includes the complexity they add to the kernel. The new
hrtimer especially cleans up greatly the posix timer stuff, this and
keeping all other users working is my primary focus now.

New features add new complexity and I want to see and evaluate it at the
time it's added to the kernel, primarily to find solutions to avoid the
(runtime) complexity for clocks which don't want to or can't support such
high resolutions, so they don't have to pay the price for these new
features. I want to keep things flexible and keeping things simple is IMO
a much better starting point.

> I disagree because I'm convinced that the problems "high res timers",
> "dynamic ticks", "timekeeping", "clock event abstraction" are closely
> related and we have high demands to get those solved in a clean way. So
> providing some jiffies bound minimal solution in the first place is more
> than shortsighted IMO.

You're misunderstanding me, I don't want "some jiffies bound minimal
solution", I want to solve one problem at a time and fixing the jiffies
problem requires solving problems in the clock abstraction first,
otherwise you produce a crutch which works in most cases, but leaves a few
problem cases behind.

> Goerge said explicitely:
>
> > My $0.02 worth: It is clear (from the standard) that the initial time
> > is to be ABS_TIME. It is also clear that the interval is to be added
> > to that time. IMHO then, the result should have the same property,
> > i.e. ABS_TIME.
>
> I dont find a way to read out that the interval should not have the
> ABSTIME property.

That's not what you wrote earlier: "I agree, that the restriction to the
initial it_value is definitely something you can read out of the spec."

clock_settime says: "..., these time services shall expire when the
requested relative interval elapses, independently of the new or old value
of the clock." it_interval is a relative interval and otherwise the spec
only talks about "an initial expiration time, again either relative or
absolute," I can't really find a direct connection that TIMER_ABSTIME
should apply to the interval as well.

> > > > Since you don't do any rounding at all anymore, your timer may now expire
> > > > early with low resolution clocks (the old jiffies + 1 problem I explained
> > > > in my ktime_t patch).
> > >
> > > It does not expire early. The timer->expires field is still compared
> > > against now.
> >
> > I don't think that's enough (unless I missed something). Steven maybe
> > explained it better than I did in
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935
>
> Steven said:
>
> > Interesting though, I tried to force this scenario, by changing the
> > base->get_time to return jiffies. I have a jitter test and ran this
> > several times, and I could never get it to expire early.
> [..]
> Please elaborate how the timer can expire early.

At this time you still did the rounding of the values, so it actually
worked.
When reading a time t from a clock with resolution r, the real time can be
anything from t to t+r-1. Assuming it's currently t+r-1 and you try to set
a relative timer to r-1, you will read t from the clock and arm the timer
for t+r-1, which will cause the timer to expire at t+r, where it must not
expire before t+r-1+r-1.
It currently only works because latencies are usually larger than the
clock resolution, but if I want to configure hrtimer with a low resolution
clock, the problem can become visible.

> > > > But this is now exactly the bevhaviour your timer has, why is not
> > > > "surprising" anymore?
> > >
> > > Yes, we wrote that before. After reconsidering the results we came to
> > > the conclusion, that we actually dont need the rounding at all because
> > > the uneven distance is equally surprising as the summing up errors
> > > introduced by rounding.
> >
> > I don't think that the summing up error is surprising at all, the spec is
> > quite clear that the time values have to be rounded up to the resolution
> > of the timer and it's also the behaviour of the current timer.
>
> No, the spec is not clear at all about this.
>
> I pointed this out before and I still think that the part of the
> RATIONALE section is the key to this decision.
>
> "Also note that some implementations may choose to adjust time and/or
> interval values to exactly match the ticks of the underlying clock"

You basically use this sentence as loophole to take the whole resolution
rounding rule ad absurdum and I disagree that this sentence means
"ignore everything above and do whatever you want".

> You decide to do the adjustment. I prefer not to do so and I dont buy
> any argument which says, that the current behaviour is the yardstick for
> everything. It can't be. Otherwise we would not be able to introduce
> high resolution timers at all. Every application which relies on some
> behaviour of the kernel which is not explicitely required by the
> specification is broken by definition.

The problem is that you force this behaviour also on other hrtimer users
(itimers and nanosleep) and we should be very careful with such behaviour
changes. My proposal keeps the current behaviour and is less likely to
break anything. As I said before I'm not against changing the behaviour,
but it should be done carefully.

> I vote strongly against introducing private, special purpose APIs and I
> consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> introduce any penalties for people who do not enable a future high
> resolution extension. It gives us the benefit of a clean code base which
> is capable to be switched simply and non intrusive to the high
> resolution mode. We have done extensive tests on the impact of
> converting all users unconditionally to high resolution mode once it is
> switched on and the penalty is within the noise range.
>
> You are explicitely asking for increased complexity with your approach.

Which in this case it would be good thing. Right now we don't have much
choice in clock source, but that will change soon and I think it would be
a good to be able to map a timer to specific clock source. The gained
flexibility outweighs the required complexity greatly.

> > > While my focus is a clean merging of high resolution timers without
> > > breaking the non hrt case, I still believe that we can find a solution,
> > > where we can have the base implementation without any reference to
> > > jiffies.
> >
> > This is what I think requires the better clock abstraction, most of it is
> > related to the clock resolution, the generic timer code currently has no
> > idea of the real resolution of the underlying clock, so I assumed a worst
> > case of TICK_NSEC everywhere.
>
> Well, can you please show where the current hrtimer implementation is
> doing something which requires a better clock abstraction ?

1. clock resolution is unknown (see above).
2. reprogrammable timer interrupts.

> The clock abstraction layer is relevant if you actuallly want to switch
> to high resolution mode and until then the coarse interface is
> sufficient.

Right and until then it's also not really avoidable, that you find a few
references to jiffies. I'm not saying that we have to keep them, but
please only one step at a time.

> > What I'd like to avoid is that we have to commit ourselves to only one
> > solution right now. I think the first solution is better suited to the low
> > resolution timer, that we have right now. The second solution requires a
> > better clock framework - this includes better time keeping and
> > reprogrammable timer interrupts.
>
> We have to choose one. Everything else is a bad compromise. There is
> nothing worse than making no decision when you are at a point where a
> decision is required.

Do you really have so little trust in your own code, that we can't afford
the flexibility and have to hardcode the timer resolution now?

bye, Roman

2005-12-19 14:56:29

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Wed, 14 Dec 2005, George Anzinger wrote:

> > > IMHO then, the result should have the same property, i.e. ABS_TIME. Sort
> > > of
> > > like adding an offset to a relative address. The result is still relative.
> >
> >
> > If the result is relative, why should have a clock set any effect?
> > IMO the spec makes it quite clear that initial timer and the periodic timer
> > are two different types of the timer. The initial timer only specifies how
> > the periodic timer is started and the periodic timer itself is a "relative
> > time service".
> >
> Well, I guess we will have to agree to disagree.

That's easy for you to say. :)
You don't think the current behaviour is wrong.

> That which the interval is
> added to is an absolute time, so I, and others, take the result as absolute.
> At this point there really is no "conversion" to an absolute timer. Once the
> timer initial time is absolute, everything derived from it, i.e. all intervals
> added to it, must be absolute.

With this argumentation, any relative timer could be treated this way, you
have to base a relative timer on something.
While searching for more information I found the NetBSD code and they
do exactly this, they just convert everything to absolute values and clock
set affects all timers equally. Is this now more correct?

> For what its worth, I do think that the standards folks could have done a bit
> better here. I, for example, would have liked to have seen a discussion about
> what to do with overrun in the face of clock setting.

Maybe they thought it wouldn't be necessary :), because a periodic is a
relative timer and thus not affected...

bye, Roman

2005-12-19 20:56:39

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Roman Zippel wrote:
> Hi,
>
> On Wed, 14 Dec 2005, George Anzinger wrote:
>
>
>>>>IMHO then, the result should have the same property, i.e. ABS_TIME. Sort
>>>>of
>>>>like adding an offset to a relative address. The result is still relative.
>>>
>>>
>>>If the result is relative, why should have a clock set any effect?
>>>IMO the spec makes it quite clear that initial timer and the periodic timer
>>>are two different types of the timer. The initial timer only specifies how
>>>the periodic timer is started and the periodic timer itself is a "relative
>>>time service".
>>>
>>
>>Well, I guess we will have to agree to disagree.
>
>
> That's easy for you to say. :)
> You don't think the current behaviour is wrong.
>
>
On of the issues I see with using your assumption is that moving the
timer to an absolute clock after the initial expiry _may_ lead to
additional qauntization errors, depending on how aligned the two
clocks are.

>> That which the interval is
>>added to is an absolute time, so I, and others, take the result as absolute.
>>At this point there really is no "conversion" to an absolute timer. Once the
>>timer initial time is absolute, everything derived from it, i.e. all intervals
>>added to it, must be absolute.
>
>
> With this argumentation, any relative timer could be treated this way, you
> have to base a relative timer on something.
> While searching for more information I found the NetBSD code and they
> do exactly this, they just convert everything to absolute values and clock
> set affects all timers equally. Is this now more correct?
>
I would guess, then, that either the non-absolute or the absolute
timer behaves badly in the face of clock setting. Could you provide a
pointer to the NetBSD code so I can have a look too?
>
>>For what its worth, I do think that the standards folks could have done a bit
>>better here. I, for example, would have liked to have seen a discussion about
>>what to do with overrun in the face of clock setting.
>
>
> Maybe they thought it wouldn't be necessary :), because a periodic is a
> relative timer and thus not affected...

Well, then they could have said that :) Might have prevented a lot of
lkml bandwidth usage as well as several days of my time trying to do
something other than what they might say is the right thing.

--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-12-19 22:04:42

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Mon, 2005-12-19 at 15:50 +0100, Roman Zippel wrote:

> It's not just about optimization tricks, it's about redundant information.
> Right now the primary function of the state is to tell whether the timer
> node is in the tree or now, so I prefer to add this information directly
> to the rbnode, similiar to what we do with normal lists.
> The other problem is that such "simple" state variables quickly become
> inadequate as the state machine becomes more complex, especially if
> multiple processes run at the same time (e.g. a timer can be "running"
> and/or "active"). So for me it's also for clarity and extensibility
> reasons, that I want to avoid overloaded state machines and rather keep
> it as simple as possible.

You want to avoid overloaded state machines and therefor overload a
rbnode struct with state information ?

Sorry, -ENOPARSE.

> > Not only in a -rt kernel, it also saves the context switch for a high
> > resolution timer enabled kernel, where you actually can execute the
> > callback in hard interrupt context. We can solve it differently, but we
> > should carefully think about the extensiblity issues. The wakeup +
> > restart szenario is a good reason to reconsider this.
>
> I don't think executing something in the soft or hard interrupt context
> makes a big difference on a normal kernel (at least I wouldn't call it a
> context switch).

Well. How would you call it then?

Thread A runs
hrtimer interrupt
timer X is expired, softirq is woken up

context switch to softirq

softirq runs
timer X callback is executed, thread B is woken up

context switch to thread B

versus

Thread A runs
hrtimer interrupt
timer X is expired, callback is executed thread B is woken up

context switch to thread B

I still call it a context switch, because it is one, except for the case
where the softirq is called in the interrupt return path, but also then
we gain the advantage that we do not have to execute it.

> > > I can understand you want to keep the difference to the rt kernel small,
> > > but for me it's more about immediate benefits against uncertain long term
> > > benefits.
> >
> > If you have a clear target and the experience of having implemented the
> > extensions, you have to carefully weigh up the consequences of such
> > decisions. I'm not talking about "might be implemented by somebody
> > sometimes features", I'm talking about existing proof of concept
> > implementations. There is no real justification to ignore well known
> > consequences.
> >
> > Of course if you consider the possibility of including high resolution
> > timers (I'm not talking about -rt) as zero, your requests make sense.
>
> Thomas, please don't treat me like an idiot, you may have more experiance
> with hrtimer, but after working on it for a while I also know what I'm
> talking about. Please accept that I have different focus on this, I want
> to keep things as simple as possible. New features should stand on his
> own and this includes the complexity they add to the kernel. The new
> hrtimer especially cleans up greatly the posix timer stuff, this and
> keeping all other users working is my primary focus now.

The basic hrtimer patch without any addons does not introduce
complexities and is simple and keeps everything working.

Can you please elaborate the new features and complexities instead of
repeating this over and over without pointing out exactly what and where
it is?

> New features add new complexity and I want to see and evaluate it at the
> time it's added to the kernel, primarily to find solutions to avoid the
> (runtime) complexity for clocks which don't want to or can't support such
> high resolutions, so they don't have to pay the price for these new
> features. I want to keep things flexible and keeping things simple is IMO
> a much better starting point.

The code is flexible to handle low resolution as well as a later high
resolution extension. It does not introduce additional complexity to
anything. Stop this prayer wheel argumentation and show exactly which
complexitiy it introduces.

> > I disagree because I'm convinced that the problems "high res timers",
> > "dynamic ticks", "timekeeping", "clock event abstraction" are closely
> > related and we have high demands to get those solved in a clean way. So
> > providing some jiffies bound minimal solution in the first place is more
> > than shortsighted IMO.
>
> You're misunderstanding me, I don't want "some jiffies bound minimal
> solution", I want to solve one problem at a time and fixing the jiffies
> problem requires solving problems in the clock abstraction first,
> otherwise you produce a crutch which works in most cases, but leaves a few
> problem cases behind.

Which problem cases please ?

> > Goerge said explicitely:
> >
> > > My $0.02 worth: It is clear (from the standard) that the initial time
> > > is to be ABS_TIME. It is also clear that the interval is to be added
> > > to that time. IMHO then, the result should have the same property,
> > > i.e. ABS_TIME.
> >
> > I dont find a way to read out that the interval should not have the
> > ABSTIME property.
>
> That's not what you wrote earlier: "I agree, that the restriction to the
> initial it_value is definitely something you can read out of the spec."

I was talking about Georges citiation and not about some random pieces
of text cut out of the original context. I still dont find a way to
interprete Georges writing in the way you did.

> clock_settime says: "..., these time services shall expire when the
> requested relative interval elapses, independently of the new or old value
> of the clock." it_interval is a relative interval and otherwise the spec
> only talks about "an initial expiration time, again either relative or
> absolute," I can't really find a direct connection that TIMER_ABSTIME
> should apply to the interval as well.

timer_set says: "... The reload value of the timer is set to the value
specified by the it_interval member of value. When a timer is armed with
a non-zero it_interval, a periodic (or repetitive) timer is specified."

I dont see a notion that an interval does remove the ABSTIME property
from a timer which was set up with the ABSTIME flag set.

And you are claiming not to change anything in order not to break
anything. The current upstream code is keeping ABSTIME interval timers
on the abslist, so why are you changing this at will for no real good
reason.

> > Please elaborate how the timer can expire early.
>
> At this time you still did the rounding of the values, so it actually
> worked.
> When reading a time t from a clock with resolution r, the real time can be
> anything from t to t+r-1. Assuming it's currently t+r-1 and you try to set
> a relative timer to r-1, you will read t from the clock and arm the timer
> for t+r-1, which will cause the timer to expire at t+r, where it must not
> expire before t+r-1+r-1.

I dont see where you pull this from.

At any given point the clock reads a value between two ticks.

t(tickn) <= now < t(tickn+1), where t(tickn+1) - t(tickn) = resolution

In any given case the interval is added to now.

expiry = now + interval

The expiry check is still

if (expiry <= now)
expire_timer()

The softirq which handles the expiry is called every tick, which
guarantees that the timer is always expired at or past the tick
boundary, but never ever it can be expired early.

> It currently only works because latencies are usually larger than the
> clock resolution, but if I want to configure hrtimer with a low resolution
> clock, the problem can become visible.

Where is this configuration switch in the posted code ? The posted
hrtimers code is low resolution.

Show me a simple example code which makes this become visible.

> > > > > But this is now exactly the bevhaviour your timer has, why is not
> > > > > "surprising" anymore?
> > > >
> > > > Yes, we wrote that before. After reconsidering the results we came to
> > > > the conclusion, that we actually dont need the rounding at all because
> > > > the uneven distance is equally surprising as the summing up errors
> > > > introduced by rounding.
> > >
> > > I don't think that the summing up error is surprising at all, the spec is
> > > quite clear that the time values have to be rounded up to the resolution
> > > of the timer and it's also the behaviour of the current timer.
> >
> > No, the spec is not clear at all about this.
> >
> > I pointed this out before and I still think that the part of the
> > RATIONALE section is the key to this decision.
> >
> > "Also note that some implementations may choose to adjust time and/or
> > interval values to exactly match the ticks of the underlying clock"
>
> You basically use this sentence as loophole to take the whole resolution
> rounding rule ad absurdum and I disagree that this sentence means
> "ignore everything above and do whatever you want".

I do not whatever I want and you well know that. The rounding is still
done on expiry time, which means the rounding happens on the tick
boundary.

"Time values that are between two consecutive non-negative integer
multiples of the resolution of the specified timer will be rounded up to
the larger multiple of the resolution. Quantization error will not cause
the timer to expire earlier than the rounded time value.
....
Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock."

Please tell me where my interpretation is violating the spec.

It is different to your interpretation, thats all.

> > You decide to do the adjustment. I prefer not to do so and I dont buy
> > any argument which says, that the current behaviour is the yardstick for
> > everything. It can't be. Otherwise we would not be able to introduce
> > high resolution timers at all. Every application which relies on some
> > behaviour of the kernel which is not explicitely required by the
> > specification is broken by definition.
>
> The problem is that you force this behaviour also on other hrtimer users
> (itimers and nanosleep) and we should be very careful with such behaviour
> changes. My proposal keeps the current behaviour and is less likely to
> break anything. As I said before I'm not against changing the behaviour,
> but it should be done carefully.
>
> > I vote strongly against introducing private, special purpose APIs and I
> > consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> > introduce any penalties for people who do not enable a future high
> > resolution extension. It gives us the benefit of a clean code base which
> > is capable to be switched simply and non intrusive to the high
> > resolution mode. We have done extensive tests on the impact of
> > converting all users unconditionally to high resolution mode once it is
> > switched on and the penalty is within the noise range.
> >
> > You are explicitely asking for increased complexity with your approach.
>
> Which in this case it would be good thing. Right now we don't have much
> choice in clock source, but that will change soon and I think it would be
> a good to be able to map a timer to specific clock source. The gained
> flexibility outweighs the required complexity greatly.

I really want to know what you are talking about. I have proven, that
without clock abstraction improvements the current code is working
without one single reference to jiffies. When clock abstractions are
available the code will make use of them just by changing the functions
which read the crurrent time.

> > > > While my focus is a clean merging of high resolution timers without
> > > > breaking the non hrt case, I still believe that we can find a solution,
> > > > where we can have the base implementation without any reference to
> > > > jiffies.
> > >
> > > This is what I think requires the better clock abstraction, most of it is
> > > related to the clock resolution, the generic timer code currently has no
> > > idea of the real resolution of the underlying clock, so I assumed a worst
> > > case of TICK_NSEC everywhere.
> >
> > Well, can you please show where the current hrtimer implementation is
> > doing something which requires a better clock abstraction ?
>
> 1. clock resolution is unknown (see above).
> 2. reprogrammable timer interrupts.

There is no single reference to a reprogrammable timer interrupt in the
current hrtimer code which was posted and replaced the initial ktimer
code.

Please stop mixing up the high resolution timer implementation on top of
hrtimers with the hrtimers base patch for a cheap argument. We have been
there and I dont see a point to get back to this kind of discussion.

> > The clock abstraction layer is relevant if you actuallly want to switch
> > to high resolution mode and until then the coarse interface is
> > sufficient.
>
> Right and until then it's also not really avoidable, that you find a few
> references to jiffies. I'm not saying that we have to keep them, but
> please only one step at a time.

No, it is not necessary at all. Thats proven and it is one step at a
time. If we can avoid jiffies in the first place why should we put them
there ? What would we gain ?

> > > What I'd like to avoid is that we have to commit ourselves to only one
> > > solution right now. I think the first solution is better suited to the low
> > > resolution timer, that we have right now. The second solution requires a
> > > better clock framework - this includes better time keeping and
> > > reprogrammable timer interrupts.
> >
> > We have to choose one. Everything else is a bad compromise. There is
> > nothing worse than making no decision when you are at a point where a
> > decision is required.
>
> Do you really have so little trust in your own code, that we can't afford
> the flexibility and have to hardcode the timer resolution now?

I trust my code, but you seem to trust jiffies more than simple math.

What flexibility do you gain with your jiffies implementation ?
The flexibility to add some replacement code which will look basically
the same way as hrtimers are looking now ?

tglx

2005-12-21 23:04:27

[permalink] [raw]

Subject: Re: [patch 00/21] hrtimer - High-resolution timer subsystem

Hi,

On Mon, 19 Dec 2005, George Anzinger wrote:

> > You don't think the current behaviour is wrong.
> >
> >
> On of the issues I see with using your assumption is that moving the timer to
> an absolute clock after the initial expiry _may_ lead to additional
> qauntization errors, depending on how aligned the two clocks are.

What do you mean by "moving the timer to an an absolute clock"?

> I would guess, then, that either the non-absolute or the absolute timer
> behaves badly in the face of clock setting. Could you provide a pointer to
> the NetBSD code so I can have a look too?

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_time.c?rev=1.98&content-type=text/x-cvsweb-markup
AFAICT TIMER_ABSTIME is only used to convert the relative value to an
absolute value.

bye, Roman

2005-12-22 04:31:48