2005-11-30 23:51:19

by Thomas Gleixner

Subject: [patch 00/43] ktimer reworked

this patch series is a refactored version of the ktimer-subsystem patch.

firstly, we fixed two bugs noticed by Roman Zippel (and mentioned in the
ptimer patch-queue):

- clock_nanosleep(): interrupted absolute sleeps are now handled
correctly

- overrun accounting in posix interval timers is fixed.

furthermore, we have split the ktimer patch series up into 23 patches -
each stage introduces one new (and mostly small) conceptual step. Each
stage builds & boots fine. The ktimer design has not been changed
radically, but some improvements and reshaping have been done in the
course of the rework.

based on the idea from Roman Zippel, we have eliminated the 'pure scalar
type' in ktime.h, and have gone for the ktime union for both 32-bit and
64-bit platforms. This resulted in further simplifications, e.g. the
ktime_cmp() ugliness has been removed.
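
As a rough illustration of why a union can make a separate comparison helper unnecessary, here is a minimal user-space sketch. It is not the code from the patch series: the type name demo_ktime_t, the helper names and the exact field layout are assumptions made for this example. The idea is that when the seconds field occupies the high half of the 64-bit view and nanoseconds stay in [0, 1e9), comparing the tv64 views orders values the same way as comparing (sec, nsec) pairs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical union: one 64-bit scalar view and one (sec, nsec) view of
 * the same storage. The field order is picked per byte order so that
 * 'sec' always lands in the high 32 bits of tv64. If the compiler does
 * not define __BYTE_ORDER__, little-endian is assumed.
 */
typedef union {
	int64_t tv64;
	struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		int32_t sec, nsec;
#else
		int32_t nsec, sec;
#endif
	} tv;
} demo_ktime_t;

/* Build a value from seconds and nanoseconds (nsec must be normalized). */
demo_ktime_t demo_ktime_set(int32_t sec, int32_t nsec)
{
	demo_ktime_t kt;

	assert(nsec >= 0 && nsec < 1000000000);
	kt.tv.sec = sec;
	kt.tv.nsec = nsec;
	return kt;
}

/* Ordering needs no special helper: a plain 64-bit compare is enough. */
int demo_ktime_before(demo_ktime_t a, demo_ktime_t b)
{
	return a.tv64 < b.tv64;
}
```

With this layout, demo_ktime_before(demo_ktime_set(1, 999999999), demo_ktime_set(2, 0)) is true on both 32-bit and 64-bit builds, without any per-field comparison code.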

to address the naming debate: we do agree that having both 'struct ktimer'
and 'struct timer_list' is confusing and awkward - both are about timers.
Same goes for kernel/ktimer.c and kernel/timer.c.

but what we'd like to achieve as an end-result is the clear separation
of 'timer' vs. 'timeout' APIs. Our proposed end result would be to have
'struct ktimer' for timers, and 'struct ktimeout' for timeouts.

to prove the feasibility of this separation, we have also included
another 20 patches on top of the ktimer patch-queue, which
introduce the 'ktimeout' types and APIs. This is mostly a careful
renaming and refactoring of the current 'struct timer_list' and APIs
into a 'struct ktimeout' object and API. (see the patches for details)
These patches too build & boot at every stage.

we have done this to show that the 'ktimer' vs. 'timer_list' confusion
is only temporary, and that the intended end-result is 'struct ktimer'
and 'struct ktimeout' - two separate concepts.

on the other hand, 'struct ptimer' and 'struct ktimeout' [or 'struct
ptimer' and 'struct timer_list'] would be just as confusing.

here's a list of other differences to the ptimer patchqueue:

- ptimer.c is smaller, for the price of making users of the APIs more
complex. ktimer.c carries more features and thus more code, so that
users can have simpler code.

these features are not unconditional: we tried to be careful to not
let 'cruft' creep into ktimer.c: e.g. quite some Posix 'special timer
code' is still kept in posix-timers.c.

Right now the extra code in ktimer.c is roughly offset by the code
reduction in posix-timers.c, itimer.c and timer.c/ktimeout.c, and we
have not converted posix-cpu-timers to ktimers yet. So we believe this
choice of ours is a net win, even with the relatively low number of
ktimer users at this early stage.

- the ktimer subsystem is designed with the extensibility for high
resolution timers and dynamic ticks in mind, without having to do
further rewrites. The practical feasibility and cleanliness of the
ktimer approach has been proven since the very beginning: both the
-khrt and the -rt trees have carried those patches for months now,
with a real HRT implementation on top of it. We believe that some
details that the ptimers patch-queue chopped off the ktimers codebase
will have to be reintroduced for HRT timers later on.

- the resolution handling is implemented without any jiffy relations and
resembles the behaviour of the current implementation. The first view
of more complexity has to be carefully weighed against the flexibility
for further extensions.

- ktimer is fully docbook documented and well commented. The ptimer
patchqueue was based off an older ktimer tree with fewer comments and
no documentation.

if there are any other substantial differences between the ptimer
patchqueue and the current ktimer queue then please speak up. (there
was no documentation of all ktimer->ptimer changes, so we might have
missed something)

the patchset has been compiled & booted against recent -linus trees, on
x86 and x64.

The patch series is also available from
http://www.tglx.de/projects/ktimers/patch-2.6.15-rc2-kt-rework.tar.bz2

Thomas, Ingo

--


2005-12-01 00:40:08

by Andrew Morton

Subject: Re: [patch 00/43] ktimer reworked

Thomas Gleixner wrote:
>
> this patch series is a refactored version of the ktimer-subsystem patch.

25 files changed, 3364 insertions(+), 1827 deletions(-)

allnoconfig, before:

   text    data     bss     dec     hex filename
 764888  157221   53748  975857   ee3f1 vmlinux

after:

   text    data     bss     dec     hex filename
 766712  157741   53748  978201   eed19 vmlinux


Remind me what we gained for this?

2005-12-01 02:19:41

by Ingo Molnar

Subject: Re: [patch 00/43] ktimer reworked


* Andrew Morton wrote:

> Thomas Gleixner wrote:
> >
> > this patch series is a refactored version of the ktimer-subsystem patch.
>
> 25 files changed, 3364 insertions(+), 1827 deletions(-)
>
> allnoconfig, before:
>
>    text    data     bss     dec     hex filename
>  764888  157221   53748  975857   ee3f1 vmlinux
>
> after:
>
>    text    data     bss     dec     hex filename
>  766712  157741   53748  978201   eed19 vmlinux
>
> Remind me what we gained for this?

well, for 1824 bytes of code [*] and 520 bytes of data you got a new,
clean timer subsystem, which is per-clock tree based and hres-timers
ready. It also doesn't scan all active timers linearly and fix them up
whenever NTP decides to mend the clock a bit. It also has no jiffy
dependencies and has nsec resolution with timeouts of up to 292 years,
to the nanosec. It has no subjiffies, no HZ, no tradeoffs.
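
The 292-year figure is consistent with a signed 64-bit count of nanoseconds. A small sketch of the arithmetic follows; the function name is made up for this example, and a plain 365-day year is an assumption used only to get a round number.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define SEC_PER_YEAR (365LL * 24 * 60 * 60)	/* 365-day year, an assumption */

/* Whole years that fit into a signed 64-bit nanosecond counter. */
int64_t max_years_in_s64_nsec(void)
{
	int64_t max_seconds = INT64_MAX / NSEC_PER_SEC;

	assert(max_seconds > 0);
	return max_seconds / SEC_PER_YEAR;
}
```

INT64_MAX nanoseconds is about 9.22e9 seconds, which divides out to 292 whole years.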

note that ktimer.o itself is larger than 1824 bytes:

size kernel/ktimer.o
   text    data     bss     dec     hex filename
   3912     100       0    4012     fac kernel/ktimer.o

so it has already offset roughly half of its size.

we can (and will) try to improve it further, but if anyone desires to
get it for free, that's probably not possible. (only 'probable' because
we have not converted posix-cpu-timers yet, another ktimer conversion
candidate with code reduction potential)

it had to be a new set of APIs, which all take text space. We'll try to
shave off some more .text, but miracles are not expected.

Ingo

[*] if you enable CONFIG_KTIME_SCALAR, then on x86 we get denser
ktime_t code. We keep it off by default to give the union
representation testing (the scalar representation is the more
trivial case). It should shave off another 300 bytes from your
kernel's size. We'll probably enable KTIME_SCALAR on x86 later on.

2005-12-01 03:32:59

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Thomas Gleixner wrote:

> to address the naming debate: we do agree that 'struct ktimer' and
> 'struct timer_list' is confusing and awkward - both are about timers.
> Same goes for kernel/ktimer.c and kernel/timer.c.
>
> but what we'd like to achieve as an end-result is the clear separation
> of 'timer' vs. 'timeout' APIs. Our proposed end result would be to have
> 'struct ktimer' for timers, and 'struct ktimeout' for timeouts.

Sorry, but calling it "ktimeout" would be completely wrong.

"timeout" is a rather imprecise term, which can have different meanings
depending on the context, e.g. any timer usually has a "timeout value",
but what is meant here is a "timeout timer". So basically this is supposed
to be about "timer" vs "timeout timer".
First, the problem here is that these "timeout timers" are not restricted to
just a single use case, so there is no reason to name them like this only
because they are most commonly used as timeout timers in the kernel. There
are many types of timer usage in the kernel and there is no reason to
give every one of them its own API.
Second, "timer" is a generic term, which would include all types of timers,
and this suggests that this is a generally usable timer. The emphasis of
this timer is to provide a precise, high resolution timer, but this purpose
is not reflected in the name at all.

This is a really bad choice of names: both are timers. timer_list is so far
_the_ generic API for _any_ type of timer, and this is unlikely to change for
some time, so restricting this via a new name is confusing and wrong. The
focus of the new timer system is high resolution timers, and a good name
would include something which describes this purpose and would clearly
distinguish it from the current common timer.
Calling them "timer" and "timeout" would completely reverse the roles,
which makes it completely wrong.

> - ptimer.c is smaller, for the price of making users of the APIs more
> complex. ktimer.c carries more features and thus more code, so that
> users can have simpler code.

What complexity are you talking about? Let's look at the itimer:

int it_real_fn(struct ptimer *timer)
{
	struct signal_struct *sig = container_of(timer, struct signal_struct, real_timer);

	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, sig->tsk);

	if (sig->it_real_incr.tv64 == KTIME_ZERO)
		return 0;
	sig->real_timer.expires = ktime_add(timer->base->last_expired, sig->it_real_incr);
	return 1;
}

This looks really simple to me. The last few lines could be moved into the
timer code, but I decided against it because this version leaves more
flexibility to the user. Currently we have only few users, should we have
more users we could for example change it to:

	return ptimer_rearm(timer, sig->it_real_incr);

But this is currently not needed, we can still refine the API as soon as
there are more users, so we see what is really needed. What my patch does
is to provide the basic functionality upon which further improvements are
possible.

Posix timers are OTOH indeed more complex, but here I left the complexity
there instead of moving it into ptimer.

> - the ktimer subsystem is designed with the extensibility for high
> resolution timers and dynamic ticks in mind, without having to do
> further rewrites. The practical feasibility and cleanliness of the
> ktimer approach has been proven since the very beginning, both the
> -khrt and the -rt trees have carried those patches for months now,
> with a real HRT implementation ontop of it. We believe that some
> details that the ptimers patch-queue chopped off the ktimers codebase
> will have to be reintroduced for HRT timers later on.

This is very vague. What kind of "further rewrites" will be required? What
are the "details" you "believe" are so important?

> - the resolution handling is implemented without any jiffy relations and
> resembles the behaviour of the current implementation. The first view
> of more complexity has to be carefully weighed against the flexibility
> for further extensions.

Again, what "further extensions"?
ptimer is not limited to jiffy resolutions, it's the resolution it
_currently_ uses and it's the best currently possible with the current
clock abstraction.

The resolution handling in your patch is overly complex, most of the
rounding is off and sometimes even wrong. I explained the details of the
rounding in the ktime_t patch and I would like to encourage you to pick up
from there and explain what the hell you're talking about.

> - ktimer is fully docbook documented and well commented. The ptimer
> patchqueue was based off an older ktimer tree with less comments and
> no documentation.

I intentionally left out some of the documentation, because I wanted some
discussion about the implementation first and then update the
documentation based on the discussion, so it reflects a common view
instead of finishing the discussion before it even started.

> if there are any other substantial differences between the ptimer
> patchqueue and the current ktimer queue then please speak up. (there
> was no documentation of all ktimer->ptimer changes, so we might have
> missed something)

If there is anything unclear, you could just ask.

> The patch series is also available from
> http://www.tglx.de/projects/ktimers/patch-2.6.15-rc2-kt-rework.tar.bz2

I don't want to go into the details here. Most of the initial cleanup
patches could easily be done afterwards and are not critical enough to be
done first; the ktimer part needs an explanation of what the extra complexity
is needed for; and I explained above what I think about the last part.

BTW a quick test shows the overrun handling is still broken. timer_gettime
seems to be broken now for these cases as well.

bye, Roman

2005-12-01 03:58:51

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Nov 30, 2005, at 22:32, Roman Zippel wrote:
> On Thu, 1 Dec 2005, Thomas Gleixner wrote:
>> but what we'd like to achieve as an end-result is the clear
>> separation of 'timer' vs. 'timeout' APIs. Our proposed end result
>> would be to have 'struct ktimer' for timers, and 'struct ktimeout'
>> for timeouts.
>
> Sorry, but calling it "ktimeout" would be completely wrong.
>
> "timeout" is a rather imprecise term, which can have different
> meanings depending on the context, e.g. any timer usually has a
> "timeout value", but what is meant here is a "timeout timer". So
> basically this is supposed to be about "timer" vs "timeout timer".
> [snip lengthy discussion]

If I recall correctly, this whole naming mess has been discussed to
death before, with the result that almost everybody but Roman thought
the names were perfectly clear such that a timer is _expected_ to
expire and a timeout is not, therefore timers should be optimized for
add=>run=>expire and timeouts optimized for add=>run=>remove.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/E/U d- s++: a18 C++++>$ ULBX*++++(+++)>$ P++++(+++)>$ L++++
(+++)>$ !E- W+++(++) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP
+ t+(+++) 5 X R? !tv-(--) b++++(++) DI+(++) D+++ G e>++++$ h*(+)>++$ r
%(--) !y?-(--)
------END GEEK CODE BLOCK------



2005-12-01 15:40:38

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Wed, 30 Nov 2005, Kyle Moffett wrote:

> If I recall correctly, this whole naming mess has been discussed to death
> before, with the result that almost everybody but Roman thought the names were
> perfectly clear such that a timer is _expected_ to expire and a timeout is
> not, therefore timers should be optimized for add=>run=>expire and timeouts
> optimized for add=>run=>remove.

The human language is a bit more complicated than this (at least English
and related languages). Depending on the context a word can have different
meanings, e.g. if you ask an athlete what "timeout" means, you'll get a
different answer than you would get from an engineer. Even if we limit it
to the technical field one can define "timeout" very generally as "a
period of time after which an event is generated". Does this imply this
timeout is usually aborted? For some people it obviously does, but I
highly doubt this is generally true. Without any context "timeout" can
mean many similar, but still different things. If you don't provide any
context, it will trigger different associations and people will add their
own context of how they use "timeout". You will of course find a large
overlap, but the less context you provide, the more likely are
misunderstandings.

A good name provides enough context to minimize misunderstandings, the
name is important for how people will perceive and use something. Here we
get to a larger problem, which goes beyond simple naming issues. Thomas
and Ingo seem to want to completely redefine how time is managed in the
kernel. The consequences of this would be very far-reaching and should be
discussed independently. Discussing this under the topic of high
resolution timer would provide the entirely wrong context and lead to
misunderstandings.

Whatever it is Thomas and Ingo are trying to do with the current kernel
timer, they have to explain it in the proper context. I'm not going to
second-guess their intentions and sneaking these changes in as part of the
high resolution timer is unacceptable.

bye, Roman

2005-12-01 16:22:04

by Ray Lee

Subject: Re: [patch 00/43] ktimer reworked

On 12/1/05, Roman Zippel wrote:
> The human language is a bit more complicated than this (at least English
> and related languages). Depending on the context a word can have different
> meanings, e.g. if you ask an athlete what "timeout" means, you'll get a
> different answer than you would get from an engineer.

Actually, no you won't. The athlete will say "A timeout? Something out
of the ordinary happened, and coach wants me to go to the sidelines to
talk." Timeouts are unexpected and exceptional, whether you're an
athlete or a piece of code. On the other hand, they have a timer that
everyone *expects* to expire at the end of the quarter or game.

Ray, who is both an athlete and a native English speaker, who thinks
the naming is the clearest of anything to come across this list in
ages.

2005-12-01 16:52:00

by Russell King

Subject: Re: [patch 00/43] ktimer reworked

On Thu, Dec 01, 2005 at 08:22:01AM -0800, Ray Lee wrote:
> On 12/1/05, Roman Zippel wrote:
> > The human language is a bit more complicated than this (at least English
> > and related languages). Depending on the context a word can have different
> > meanings, e.g. if you ask an athlete what "timeout" means, you'll get a
> > different answer than you would get from an engineer.
>
> Actually, no you won't. The athlete will say "A timeout? Something out
> of the ordinary happened, and coach wants me to go to the sidelines to
> talk." Timeouts are unexpected and exceptional, whether you're an
> athlete or a piece of code. On the other hand, they have a timer that
> everyone *expects* to expire at the end of the quarter or game.
>
> Ray, who is both an athlete and a native English speaker, who thinks
> the naming is the clearest of anything to come across this list in
> ages.

rmk, also a native English speaker, agrees with Ray, Thomas and Ingo.
As does dictionary.reference.com's definitions of timeout and timer:

timeout

A period of time after which an error condition is raised if some event
has not occurred. A common example is sending a message. If the receiver
does not acknowledge the message within some preset timeout period, a
transmission error is assumed to have occurred.

timer

a timepiece that measures a time interval and signals its end

Hence, timers have the implication that they are _expected_ to expire.
Timeouts have the implication that their expiry is an exceptional
condition.

So can we stop rehashing this stupid discussion?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-12-01 16:52:50

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Ray Lee wrote:

> Actually, no you won't. The athlete will say "A timeout? Something out
> of the ordinary happened, and coach wants me to go to the sidelines to
> talk." Timeouts are unexpected and exceptional, whether you're an
> athlete or a piece of code. On the other hand, they have a timer that
> everyone *expects* to expire at the end of the quarter or game.

Please be precise, there is of course a common base, but it's not the
same. In sports a timeout is the actual event that interrupts something.
In code it's a time _period_ until an exceptional event. A timer delivers
an asynchronous event after a specified timeout, so they're always
"unexpected and exceptional". (You can of course constantly poll the timer
to make it a little less unexpected, but then you don't really need to set
a timer in the first place.)

bye, Roman

2005-12-01 17:45:00

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Russell King wrote:

> timeout
>
> A period of time after which an error condition is raised if some event
> has not occurred. A common example is sending a message. If the receiver
> does not acknowledge the message within some preset timeout period, a
> transmission error is assumed to have occurred.
>
> timer
>
> a timepiece that measures a time interval and signals its end
>
> Hence, timers have the implication that they are _expected_ to expire.
> Timeouts have the implication that their expiry is an exceptional
> condition.

IOW a timeout uses a timer to implement an exceptional condition after a
period of time expires.

> So can we stop rehashing this stupid discussion?

The naming isn't actually my primary concern. I want a precise definition
of the expected behaviour and usage of the old and new timer system. If I
had this, it would be far easier to choose a proper name.
E.g. I still don't know why ktimeout should be restricted to raise just
"error conditions", as the name implies.

bye, Roman

2005-12-01 19:09:41

by Steven Rostedt

Subject: Re: [patch 00/43] ktimer reworked

On Thu, 2005-12-01 at 18:44 +0100, Roman Zippel wrote:
> Hi,
>
> On Thu, 1 Dec 2005, Russell King wrote:
...
> > Hence, timers have the implication that they are _expected_ to expire.
> > Timeouts have the implication that their expiry is an exceptional
> > condition.
>
> IOW a timeout uses a timer to implement an exceptional condition after a
> period of time expires.
>
> > So can we stop rehashing this stupid discussion?
>
> The naming isn't actually my primary concern. I want a precise definition
> of the expected behaviour and usage of the old and new timer system. If I
> had this, it would be far easier to choose a proper name.
> E.g. I still don't know why ktimeout should be restricted to raise just
> "error conditions", as the name implies.
>

ktimeout may not need to be restricted to anything.

It should just be documented simply as: If you need to set some timed
event to happen that you don't expect to occur then use a ktimeout.
Where this timed event is an event that lets you know that another event
hasn't happened in a given time. (I want to know if x didn't happen by
this time).

If a timed event is expected to occur then use ktimer. Where it is
mostly used for the event itself (I want x to happen at this time).

Now you could use ktimeout on something that will always occur, but this
will just be inefficient, since ktimeout is optimized for removal and
not expiration. Which usually happens when something "times out".

And you could use ktimer on something that isn't going to occur. But
again this is just inefficient.
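
The two usage patterns described above can be sketched with a toy user-space model. None of these names come from the patches: struct demo_event and the demo_* helpers are hypothetical, and time is just an integer that the caller advances. The sketch only shows the difference in life cycle (add, then remove before the deadline, versus add, then expire), not any of the data-structure optimizations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timed event; 'expires' is an absolute time in arbitrary units. */
struct demo_event {
	int64_t expires;
	bool armed;
	bool fired;
};

/* Arm the event to fire 'delay' units after 'now'. */
void demo_arm(struct demo_event *ev, int64_t now, int64_t delay)
{
	assert(delay >= 0);
	ev->expires = now + delay;
	ev->armed = true;
	ev->fired = false;
}

/* Timeout pattern: the awaited event arrived in time, so remove the guard. */
void demo_cancel(struct demo_event *ev)
{
	ev->armed = false;
}

/* Timer pattern: time moves on and the event fires once its deadline passes. */
void demo_advance(struct demo_event *ev, int64_t now)
{
	if (ev->armed && now >= ev->expires) {
		ev->armed = false;
		ev->fired = true;
	}
}
```

A timeout-style user calls demo_arm() and then, in the common case, demo_cancel() before the deadline, so it never fires. A timer-style user calls demo_arm() and simply waits for demo_advance() to reach the deadline.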

So Roman, please have someone else speak up and let us know that they
are just as confused on these names as you are. Currently, it seems
that you are the only one that doesn't understand the difference between
a timeout and a timer. You seem very intelligent and that could be why
you are getting confused. You're looking too deep into the
implementation of timers and timeouts, where they seem to use each
other. You just need to take a step back and look at this at a higher
view. Think about what to use when you need to implement being told
when something has timed out (timeout) or when you just want to do
something that happens at a certain time (timer).

-- Steve

2005-12-01 20:25:56

by Andrew Morton

Subject: Re: [patch 00/43] ktimer reworked

Russell King wrote:
>
> rmk, also a native English speaker, agrees with Ray, Thomas and Ingo.
> As does dictionary.reference.com's definitions of timeout and timer:
>
> timeout
>
> A period of time after which an error condition is raised if some event
> has not occurred. A common example is sending a message. If the receiver
> does not acknowledge the message within some preset timeout period, a
> transmission error is assumed to have occurred.
>
> timer
>
> a timepiece that measures a time interval and signals its end
>
> Hence, timers have the implication that they are _expected_ to expire.
> Timeouts have the implication that their expiry is an exceptional
> condition.

Well timer_lists get around the problem quite neatly by handling both
situations. In a way which has been learned by thousands of developers
over many years.

The whole concept of separating "timers" from "timeouts" seems a step
backward to me. A large one. Why was it done, and can it be undone?

2005-12-01 21:12:32

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Steven Rostedt wrote:

> It should just be documented simply as: If you need to set some timed
> event to happen that you don't expect to occur then use a ktimeout.
> Where this timed event is an event that lets you know that another event
> hasn't happened in a given time. (I want to know if x didn't happen by
> this time).
>
> If a timed event is expected to occur then use ktimer. Where it is
> mostly used for the event itself (I want x to happen at this time).

If that is really _the_ defining difference, then we are _seriously_
screwed.

Here are a few items I would consider in choosing a timer:

- reading time: to program a timer you have to read the time first,
reading jiffies is practically free, whereas reading the precise time can
be very expensive. With the right hardware it can be optimized to be quite
cheap, but if portability is important you may want to avoid the extra
cost.

- calculations: jiffies is a long integer whereas ktime_t is 64bit, so if
you need a lot of complex time calculation, you should take the cost for
32bit archs into account.

- resolution: how precise must the timer be? jiffies can't represent time
values less than 1ms, but if time is e.g. measured in 10th of a second,
jiffies may be enough.

- timer life time: if only a short interval is needed (e.g. a fraction of
a second) timer_list is often a lot cheaper.
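
To make the resolution item concrete, here is a small sketch of how a tick rate limits what a jiffy-based timer can express. It is not the kernel's conversion code; the function names are invented for this example and the tick rate is passed in as a plain parameter.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Length of one tick in nanoseconds for a given tick rate (HZ). */
int64_t tick_nsec(int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC / hz;
}

/* Smallest number of whole ticks that covers 'nsec' (rounds up). */
int64_t nsec_to_ticks_round_up(int64_t nsec, int hz)
{
	int64_t tick = tick_nsec(hz);

	assert(nsec >= 0);
	return (nsec + tick - 1) / tick;
}
```

At a tick rate of 1000 one tick is 1 ms, so a 100 us request still costs a full tick, while a tenth of a second maps exactly onto 100 ticks. At a tick rate of 100 the same 100 us request is rounded up to 10 ms.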

> So Roman, please have someone else speak up and let us know that they
> are just as confused on these names as you are.

Let's ignore the name for a moment, let's instead prioritize the above
list.
If your item of whether a timer does expire or not is really the most
important criterion for choosing a timer, I will accept any name you want.

bye, Roman

2005-12-01 21:19:48

by Ingo Molnar

Subject: Re: [patch 00/43] ktimer reworked


* Andrew Morton wrote:

> The whole concept of separating "timers" from "timeouts" seems a step
> backward to me. A large one. Why was it done, and can it be undone?

i had a very similar opinion when i first talked to Thomas about HRES
timers a couple of months ago. I told him that the only sane way to add
HRES timers to the -rt tree is to integrate them into the existing timer
wheel, and to avoid a duality of APIs. I told him that adding a separate
HRES implementation is pretty much a 'non-starter'.

then many months of experimentation followed. We (well, Thomas mostly)
patiently tried a sub-jiffy method, a split-lists method, all sorts of
ways to merge high-res timers into the timer wheel. We got HRES timers
to work in every such design, but it looked way too ugly and had bad
performance and latencies - and we wanted upstream integration.

so after many months we realized that the core issue is that the
requirements of 'timers' are unmixable with the requirements of
'timeouts'. See a more detailed analysis at:

http://lwn.net/Articles/152436/

i'll try to sum it up again very briefly: the timer wheel is a very
well-optimized data structure geared towards 'timeout' type of use. But
it is at the very edge of its 'feature reach', and we found no workable
way to extend it into the directions we wanted to go. The moment we
tried to extend it in one direction (e.g. to increase HZ to 1000000 to
get 1 usec resolution), it started creaking in some other spots.

the only clean solution we found was to totally separate them, and to
use the natural data structures for both of them: to keep the highly
scalable timer wheel on the timeout side, and to use the slower but more
flexible [and deterministic] timer trees on the timer side. [ktimers are
still very fast - but they cannot possibly be as fast as the single
list_add() of add_timer()!]. The two usage scenarios (timeouts and
timers) do not care about each other.
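
A toy comparison of the two insertion costs, as a sketch only. This is not the kernel's timer wheel (which cascades over several levels) and not the ktimer tree: the wheel here has a single level, and a sorted linked list stands in for the tree purely to show ordered insertion. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WHEEL_SLOTS 256u	/* must be a power of two */

struct demo_timer {
	uint64_t expires;		/* absolute expiry, in ticks */
	struct demo_timer *next;
};

struct demo_wheel {
	struct demo_timer *slot[WHEEL_SLOTS];
};

/* O(1) insert: pick the bucket from the low bits and push onto its list. */
void wheel_add(struct demo_wheel *w, struct demo_timer *t)
{
	size_t idx = (size_t)(t->expires & (WHEEL_SLOTS - 1));

	t->next = w->slot[idx];
	w->slot[idx] = t;
}

/*
 * Ordered insert: walk to the right position so the earliest expiry is
 * always at the head. O(n) for this list; a balanced tree would need
 * O(log n) steps, still more than the single push above.
 */
void sorted_add(struct demo_timer **head, struct demo_timer *t)
{
	while (*head && (*head)->expires <= t->expires)
		head = &(*head)->next;
	t->next = *head;
	*head = t;
}
```

Note that in the single-level wheel, expiries 5 and 261 share a bucket although they are a full rotation apart. Finer resolution means more ticks per second, so more timers wrap around like this or need extra levels, whereas the ordered structure pays more per insert but always knows the exact next expiry.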

we could merge the two by driving 'timeouts' via ktimers too - but there
would be some unavoidable overhead to things like the TCP stack. But
ktimers cannot be merged into timeouts, that's sure.

Ingo

2005-12-01 21:52:46

by Andrew Morton

Subject: Re: [patch 00/43] ktimer reworked

Ingo Molnar wrote:
>
> we could merge the two by driving 'timeouts' via ktimers too - but there
> would be some unavoidable overhead to things like the TCP stack. But
> ktimers cannot be merged into timeouts, that's sure.

I think you guys have an advantage over me because you've been discussing
and thinking about this terminology for months. IOW, your lips are moving
but all I hear is blah, blah, blah ;)

For instance, when Kyle came out with his one-sentence description of
timers versus timeouts, I thought he had them backwards. Only apparently
he didn't.

So either it's all confusing, or I'm dumb, or both. I can evade
investigation of that by claiming that we should seek something which is
unconfusing to even dumb people.

We have timer_lists. But you say they don't suit precision timers. Fine.
So why cannot we call the new precision timers something like "precision
timers" and avoid this semantic confusion over timeouts versus timers?

IOW: leave timer_lists alone. Just add the needed new subsystem and use it.

I guess old-timers can mentally do s/ktimeout/timer_list/ whenever they
come across the danged thing, but it's a bit painful. If we called them
"timer_list" and "hrtimer", things would be much clearer. Plus that's a
description of what they *are*, rather than of how we expect them to be
applied.

2005-12-01 22:04:04

by Steven Rostedt

Subject: Re: [patch 00/43] ktimer reworked


On Thu, 1 Dec 2005, Roman Zippel wrote:

> Hi,

Thank you Roman, this has cleared up a little where you are coming from.
The name issue should never have been brought up, and what you state here
should have been mentioned first. (If you did mention this earlier, I
guess it was just lost in the noise).

>
> On Thu, 1 Dec 2005, Steven Rostedt wrote:
>
> > It should just be documented simply as: If you need to set some timed
> > event to happen that you don't expect to occur then use a ktimeout.
> > Where this timed event is an event that lets you know that another event
> > hasn't happened in a given time. (I want to know if x didn't happen by
> > this time).
> >
> > If a timed event is expected to occur then use ktimer. Where it is
> > mostly used for the event itself (I want x to happen at this time).
>
> If that is really _the_ defining difference, then we are _seriously_
> screwed.

I'm not convinced that we are.

>
> Here are a few items I would consider in choosing a timer:
>
> - reading time: to program a timer you have to read the time first,
> reading jiffies is practically free, whereas reading the precise time can
> be very expensive. With the right hardware it can be optimized to be quite
> cheap, but if portability is important you may want to avoid the extra
> cost.

If you are only concerned with ms resolution, then the added expense of
precise time is pretty negligible. And these times are still for usage
with timers that are expected to expire. Remember that timeouts are still
using jiffies.

As for portability, I believe John Stultz has some nice plugins coming
for whichever timer source you want to use, so if there's a better way to get a
time, these should make things easy to add.

>
> - calculations: jiffies is a long integer whereas ktime_t is 64bit, so if
> you need a lot of complex time calculation, you should take the cost for
> 32bit archs into account.

Again, the cost is negligible for ms resolution. And we're getting close
to 2038 so we need to be thinking in 64 bits anyway ;)

I'm not running huge multiuser systems, so I don't know what the average
number of timers in use is (ones that are expected to expire), or how
much of an overhead it would be for these timers to be using 64 bit
calculations and precise timers.

>
> - resolution: how precise must the timer be? jiffies can't represent time
> values less than 1ms, but if time is e.g. measured in 10th of a second,
> jiffies may be enough.

And they would be if that is all you need. But coming from an embedded
point of view, that is not nearly enough. I really see HighRes making it
into the kernel soon, and any new code in this area really needs to take
that into account.

>
> - timer life time: if only a short interval is needed (e.g. a fraction of
> a second) timer_list is often a lot cheaper.

And again, you are only limited to 1000 choices to go off in that fraction
of a second if jiffies is the resolution (with jiffies at 1000HZ).
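
To make the resolution point concrete, here is a minimal userspace sketch
(not kernel code; tick_usecs() and usecs_to_ticks() are made-up names for
illustration, and HZ is assumed to divide one million evenly). It rounds a
requested delay up to whole ticks, so a tick-based timer can only be late,
never early:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of one tick in microseconds for a given HZ. */
static uint64_t tick_usecs(uint32_t hz)
{
    assert(hz > 0 && hz <= 1000000);
    return 1000000u / hz;
}

/* Round a delay up to whole ticks so the timer never fires early. */
static uint64_t usecs_to_ticks(uint64_t usecs, uint32_t hz)
{
    uint64_t tick = tick_usecs(hz);

    return (usecs + tick - 1) / tick;
}
```

Under these assumptions a 250us request at HZ=1000 becomes one full 1ms
tick, which is the kind of loss an embedded user would care about.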

As for the timer_list, you mean a linear list? If you have a bunch of
timers to go off within a fraction of a second, couldn't a linear list
actually take longer?

-- Steve

2005-12-01 22:13:24

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 01, 2005, at 16:51, Andrew Morton wrote:
> Ingo Molnar wrote:
>> we could merge the two by driving 'timeouts' via ktimers too - but
>> there would be some unavoidable overhead to things like the TCP
>> stack. But ktimers cannot be merged into timeouts, that's sure.
>
> I think you guys have an advantage over me because you've been
> discussing and thinking about this terminology for months. IOW,
> your lips are moving but all I hear is blah, blah, blah ;)
>
> For instance, when Kyle came out with his one-sentence description
> of timers versus timeouts, I thought he had them backwards. Only
> apparently he didn't.
>
> So either it's all confusing, or I'm dumb, or both. I can evade
> investigation of that by claiming that we should seek something
> which is unconfusing to even dumb people.

I think part of this confusion was the little flamewar over naming.
I think I can provide a succinct and simple description of this
stuff, possibly as the start of some documentation:

In this patch there are two ways of setting up code to run at some
point in the future: timers and timeouts.

A timeout (like waiting for somebody to answer the phone) is
optimized to never happen (they will hopefully pick up first). If
everything works perfectly, it will be stopped before it has a chance
to go off.

A timer (like a kitchen timer telling you the cookies are done) is
optimized to be added and sit around until it expires. You just
don't turn off the timer and take the cookies out before they are done.

For the most part, you don't really care much about accuracy with a
timeout. It needs to happen no earlier than the specified time, but
if it occurs a second late, so what? The person might sit around
waiting an extra few seconds for their friend to pick up, but that's
not a major issue. On the other hand, you really *do* care about how
accurate your timer is. If you wait an extra minute or two after the
timer goes off before pulling the cookies from the oven, you have
some rather inedible cookies.

> IOW: leave timer_lists alone. Just add the needed new subsystem
> and use it.

I think that this is a relatively useful distinction to make, and
perhaps we _should_ rename the subsystem and educate developers about
the difference between timers and timeouts.

Cheers,
Kyle Moffett

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
-- Brian Kernighan


2005-12-01 22:16:17

by Christoph Hellwig

Subject: Re: [patch 00/43] ktimer reworked

On Thu, Dec 01, 2005 at 05:13:17PM -0500, Kyle Moffett wrote:
> In this patch there are two ways of setting up code to run at some
> point in the future: timers and timeouts.
>
> A timeout (like waiting for somebody to answer the phone) is
> optimized to never happen (they will hopefully pick up first). If
> everything works perfectly; it will be stopped before it has a chance
> to go off.
>
> A timer (like a kitchen timer telling you the cookies are done) is
> optimized to be added and sit around until it expires. You just
> don't turn off the timer and take the cookies out before they are done.

Heh, in my dumb non-native speaker mind I'd expect it the other way around,
as in a timeout is expected to time out :) and a timer is expected to happen,
as in say the timer that tells you your breakfast egg is ready.

2005-12-01 23:56:40

by Thomas Gleixner

Subject: Re: [patch 00/43] ktimer reworked

On Thu, 2005-12-01 at 22:15 +0000, Christoph Hellwig wrote:
> On Thu, Dec 01, 2005 at 05:13:17PM -0500, Kyle Moffett wrote:
> > In this patch there are two ways of setting up code to run at some
> > point in the future: timers and timeouts.
> >
> > A timeout (like waiting for somebody to answer the phone) is
> > optimized to never happen (they will hopefully pick up first). If
> > everything works perfectly; it will be stopped before it has a chance
> > to go off.
> >
> > A timer (like a kitchen timer telling you the cookies are done) is
> > optimized to be added and sit around until it expires. You just
> > don't turn off the timer and take the cookies out before they are done.
>
> Heh, in my dumb non-native speaker mind I'd expect it the other way around,
> as in a timeout is expected to time out :) and a timer is expected to happen,
> as in say the timer that tells you your breakfast egg is ready.

Which is perfectly the point Kyle made.

The timer tells you that the cookies or your breakfast eggs are well
done. You take them out of the oven or the pot at exactly the time when
the timer event happens. You won't turn off the timer before your
cookies/eggs are done, because you want them well done.

The timeout you set up is to remind you to switch off the oven before
your kitchen starts to burn. This timeout is likely - not sure in your
personal case :) - to be cancelled because you did think about switching
off the oven in time. If you forgot it, it is not a big difference if
you get the reminder 5 minutes earlier or later.

In the case of the egg / cookie timer, a deviation of 1 to 5 minutes
makes a big difference.

tglx


2005-12-02 00:30:27

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Steven Rostedt wrote:

> The name issue should never have been brought up,

Well, I haven't brought it up, I'm trying to get rid of it.


> > - reading time: to program a timer you have to read the time first,
> > reading jiffies is practically free, whereas reading the precise time can
> > be very expensive. With the right hardware it can be optimized to be quite
> > cheap, but if portability is important you may want to avoid the extra
> > cost.
>
> If you are only concerned with ms resolution, then the added expense of
> precise time is pretty negligible. And these times are still for usage
> with timers that are expected to expire. Remember that timeouts are still
> using jiffies.
>
> As for portability, I believe John Stultz has some nice plugins coming
> to what timer source you want to use, so if there's a better way to get a
> time, these should make things easy to add.

These plugins can do no magic, if the hardware timer is slow, the whole
thing gets slow.

> > - calculations: jiffies is a long integer whereas ktime_t is 64bit, so if
> > you need a lot of complex time calculation, you should take the cost for
> > 32bit archs into account.
>
> Again, the cost is negligible for ms resolution. And we're getting close
> to 2038 so we need to be thinking in 64 bits anyway ;)

That one is not really an issue in the kernel, unless you want an uptime
of a hundred years.

> > - resolution: how precise must the timer be? jiffies can't represent time
> > values less than 1ms, but if time is e.g. measured in 10th of a second,
> > jiffies may be enough.
>
> And they would be if that is all you need. But coming from an embedded
> point of view, that is not nearly enough. I really see HighRes making it
> into the kernel soon, and any new code in this area really needs to take
> that into account.

I'm not against HR timer, I have a problem with using them as timer for
everything.

> > - timer life time: if only a short interval is needed (e.g. a fraction of
> > a second) timer_list is often a lot cheaper.
>
> And again, you are only limited to 1000 choices to go off in that fraction
> of a second if jiffies is the resolution (with jiffies at 1000HZ).

The point is still valid: short-interval timers are cheaper using the normal
timer, independent of whether they are removed or they expire.

bye, Roman

2005-12-02 00:36:29

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 01, 2005, at 19:02, Thomas Gleixner wrote:
> On Thu, 2005-12-01 at 22:15 +0000, Christoph Hellwig wrote:
>> Heh, in my dumb non-native speaker mind I'd expect it the other way
>> around, as in a timeout is expected to time out :) and a timer is
>> expected to happen, as in say the timer that tells you your breakfast
>> egg is ready.
>
> Which is perfectly the point Kyle made.

In any case, the real important note here is that the two are pretty
different concepts, ones that lend themselves to _very_ different
optimizations, that are currently lumped together. The very fact
that some developers easily get them confused says that we need a
good clean implementation of both distinct APIs with comparable
documentation, including a bunch of good example usages.

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you
looked at it in the right way, did not become still more complicated.
-- Poul Anderson



2005-12-02 00:41:51

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 01, 2005, at 19:29, Roman Zippel wrote:
> Hi,
>
>> As for portability, I believe John Stultz has some nice plugins
>> coming to what timer source you want to use, so if there's a
>> better way to get a time, these should make things easy to add.
>
> These plugins can do no magic, if the hardware timer is slow, the
> whole thing gets slow.

The point is that you could switch both the timer and timeout
implementations to jiffies if you wanted to, at the expense of the
accuracy that a lot of people care about.

>>> - resolution: how precise must the timer be? jiffies can't
>>> represent time values less than 1ms, but if time is e.g. measured
>>> in 10th of a second, jiffies may be enough.
>>
>> And they would be if that is all you need. But coming from an
>> embedded point of view, that is not nearly enough. I really see
>> HighRes making it into the kernel soon, and any new code in this
>> area really needs to take that into account.
>
> I'm not against HR timer, I have a problem with using them as timer
> for everything.

This is _exactly_ why there is the timer/timeout distinction. Some
things don't care, and as a result use a timer wheel exactly like
they always have. For the things that do, however, the new timer API
provides it using the fastest hardware interface available.

Cheers,
Kyle Moffett

--
I didn't say it would work as a defense, just that they can spin that
out for years in court if it came to it.
-- Rob Landley



2005-12-02 00:46:35

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Kyle Moffett wrote:

> A timeout (like waiting for somebody to answer the phone) is optimized to
> never happen (they will hopefully pick up first). If everything works
> perfectly; it will be stopped before it has a chance to go off.
>
> A timer (like a kitchen timer telling you the cookies are done) is optimized
> to be added and sit around until it expires. You just don't turn off the
> timer and take the cookies out before they are done.

Making this the primary criterion for choosing a timer system would be a
huge mistake. As I wrote in a previous mail there are other, more
important criteria.
So far I still thought this was about kernel programming and not about
Aunt Tillie's cooking show. Can we please bring this back to a technical
level?

bye, Roman

2005-12-02 00:58:12

by john stultz

Subject: Re: [patch 00/43] ktimer reworked

On Thu, 2005-12-01 at 19:41 -0500, Kyle Moffett wrote:
> On Dec 01, 2005, at 19:29, Roman Zippel wrote:
> > Hi,
> >
> > >> As for portability, I believe John Stultz has some nice plugins
> >> coming to what timer source you want to use, so if there's a
> >> better way to get a time, these should make things easy to add.
> >
> > These plugins can do no magic, if the hardware timer is slow, the
> > whole thing gets slow.
>
> The point is that you could switch both the timer and timeout
> implementations to jiffies if you wanted to, at the expense of the
> accuracy that a lot of people care about.

While I'm not challenging the possibility of doing this, my timekeeping
work does not provide quite the level of flexibility you imply. Indeed
one could use jiffies as a clocksource, limiting all time users
(including ktimers) to jiffies resolution, but I would consider that to
be abusing the interface.

thanks
-john

2005-12-02 01:02:28

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Kyle Moffett wrote:

> > I'm not against HR timer, I have a problem with using them as timer for
> > everything.
>
> This is _exactly_ why there is the timer/timeout distinction. Some things
> don't care, and as a result use a timer wheel exactly like they always have.
> For the things that do, however, the new timer API provides it using the
> fastest hardware interface available.

This is about kernel programming - people should care. We have enough crap
as it is. The timer wheel is fast as well, but everything has its limits;
putting the focus completely on delivery is nonsense. It can't be that
difficult to put together a decent list of criteria for where to use which
timer. Both are still _timers_; introducing this timer/timeout thing is
only confusing.

bye, Roman

2005-12-02 01:08:12

by Andrew Morton

Subject: Re: [patch 00/43] ktimer reworked

Kyle Moffett wrote:
>
> On Dec 01, 2005, at 19:02, Thomas Gleixner wrote:
> > On Thu, 2005-12-01 at 22:15 +0000, Christoph Hellwig wrote:
> >> Heh, in my dumb non-native speaker mind I'd expect it the other way
> >> around, as in a timeout is expected to time out :) and a timer is
> >> expected to happen, as in say the timer that tells you your breakfast
> >> egg is ready.
> >
> > Which is perfectly the point Kyle made.
>
> In any case, the real important note here is that the two are pretty
> different concepts, ones that lend themselves to _very_ different
> optimizations, that are currently lumped together. The very fact
> that some developers easily get them confused says that we need a
> good clean implementation of both distinct APIs with comparable
> documentation, including a bunch of good example usages.
>

Or just leave the timer_lists as they are.

If I'm going to spend the next two years buried in helpful
s/timer_list/ktimeout/ patches then there'd better be a darn good reason
for the rename, thanks. I don't see one.

2005-12-02 01:09:25

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 01, 2005, at 20:01, Roman Zippel wrote:
> Hi,
>
> On Thu, 1 Dec 2005, Kyle Moffett wrote:
>>> I'm not against HR timer, I have a problem with using them as
>>> timer for everything.
>>
>> This is _exactly_ why there is the timer/timeout distinction.
>> Some things don't care, and as a result use a timer wheel exactly
>> like they always have. For the things that do, however, the new
>> timer API provides it using the fastest hardware interface available.
>
> This is about kernel programming - people should care.

My _point_ is that some code doesn't care about accuracy. If a
networking timeout occurs a half-second later than it should, nothing
bad happens. We have configurable SCSI drive timeouts, precisely
because it doesn't really matter all that much if we deliver it now
or give the drive a couple seconds extra time to try to respond
before signalling a reset. And I agree with you that people should
care, this distinction is important.

> We have enough crap as it is. timer wheel is fast as well, but
> everything has its limits, putting this focus completely to
> delivery is nonsense. It can't be that difficult to put together a
> decent list of criteria, where to use which timer.

A ktimer should be used where the common case is the timer being
added and expiring. A ktimeout should be used where the common case
is the timer being added and removed before it expires. Simple enough?

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
-- Anthony de Boer


2005-12-02 01:25:22

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, Kyle Moffett wrote:

> My _point_ is that some code doesn't care about accuracy.

That's not how it works, the timer wheel is accurate within its
resolution.

> A ktimer should be used where the common case is the timer being added and
> expiring. A ktimeout should be used where the common case is the timer being
> added and removed before it expires. Simple enough?

As I said before there are other more important criteria.
Check my mail to Steven, I'm not repeating it here.

bye, Roman

2005-12-02 01:47:55

by David Lang

Subject: Re: [patch 00/43] ktimer reworked

On Fri, 2 Dec 2005, Roman Zippel wrote:

> Hi,
>
> On Thu, 1 Dec 2005, Kyle Moffett wrote:
>
>> My _point_ is that some code doesn't care about accuracy.
>
> That's not how it works, the timer wheel is accurate within its
> resolution.

but the point that is being made is that while this is true, there is a
large group of functions that really don't care (the timeout case), and
for that type of use it's possible to do some optimizations that make it
extremely efficient.

In addition, once you remove the bulk of these uses from the picture (by
making them use a new timer type that's optimized for their usage
pattern, the 'unlikely to expire' case) the remainder of the timer users
easily fall into the category where the timer is expected to expire, so
that code can accept a performance hit for removing events prior to them
going off that would not be acceptable in a general case version.

the example of this is the networking timers. they almost never go off,
but one is set for just about every packet that's processed. so adding any
overhead in removing the unexpired timer has a large impact on
performance.

but once this large group of timer users are removed, (along with a few
other similar watchdog timers for disk I/O, etc) the remaining users will
almost never remove the event before it goes off, so the code can be
optimized for that situation (including things that would increase the
cost to remove the unexpired event) and gain precision and possibly
performance as well.

would the term 'watchdog' or 'watchdog_timer' for what's been referred to
as the timeout timer make more sense to people? it's used when you need to
set up a safety net around the possibility that an event won't happen. it's
guaranteed not to fire before the time specified, but may have its
activation delayed slightly past that point.

then the rest of the uses could use the term 'timer', and that code is
optimized for the timer actually expiring; removing an event that has not
expired will be relatively costly.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2005-12-02 02:22:33

by Steven Rostedt

Subject: Re: [patch 00/43] ktimer reworked

On Fri, 2005-12-02 at 01:02 +0100, Thomas Gleixner wrote:
> On Thu, 2005-12-01 at 22:15 +0000, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2005 at 05:13:17PM -0500, Kyle Moffett wrote:
> > > In this patch there are two ways of setting up code to run at some
> > > point in the future: timers and timeouts.
> > >
> > > A timeout (like waiting for somebody to answer the phone) is
> > > optimized to never happen (they will hopefully pick up first). If
> > > everything works perfectly; it will be stopped before it has a chance
> > > to go off.
> > >
> > > A timer (like a kitchen timer telling you the cookies are done) is
> > > optimized to be added and sit around until it expires. You just
> > > don't turn off the timer and take the cookies out before they are done.
> >
> > Heh, in my dumb non-native speaker mind I'd expect it the other way around,
> > as in a timeout is expected to time out :) and a timer is expected to happen,
> > as in say the timer that tells you your breakfast egg is ready.
>
> Which is perfectly the point Kyle made.
>
> The timer tells you that the cookies or your breakfast eggs are well
> done. You take them out of the oven or the pot at exactly the time when
> the timer event happens. You won't turn off the timer before your
> cookies/eggs are done, because you want them well done.

As I mentioned before, this is where you expect an event to happen (I
want x to happen at some time). The event being to take out the
cookies.

>
> The timeout you set up is to remind you to switch off the oven before
> your kitchen starts to burn. This timeout is likely - not sure in your
> personal case :) - to be cancelled because you did think about switching
> off the oven in time. If you forgot it, it is not a big difference if
> you get the reminder 5 minutes earlier or later.

And this is where a time out is done when some event doesn't happen (I
want to know if x didn't happen by some time). The event here is to
turn off the oven and the time out is to let you know that you didn't
and the house will soon burn down if you don't turn it off.

> In the case of the egg / cookie timer, a deviation of 1 to 5 minutes
> makes a big difference.


I started kernel programming by writing experimental network stacks.
So to me "timeout" is strongly engraved in my head as something that lets
you know when something didn't happen, and that you need to do something
to recover from it. If you send out a packet and you don't receive an ACK,
you have a timeout to let you know that it probably didn't make it (or the
ACK didn't make it to you), and you send another packet out. But most
likely (unless you are on a really unreliable network) you get your ACK
and you can remove the timeout.

I know that timeouts are used tremendously in the networking code, so
separating them from precision timers would likely be a good thing.
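
The ACK case can be sketched in a few lines of userspace C. This is only
an illustration of the add->remove pattern, not code from any real network
stack; struct pending_packet and the packet_*() functions are hypothetical
names, and time is counted in abstract ticks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet state; not taken from any real network stack. */
struct pending_packet {
    uint64_t deadline;     /* tick at which we stop waiting for the ACK */
    bool     armed;        /* true while the timeout is pending */
    unsigned retransmits;  /* how often the timeout actually fired */
};

/* Send (or resend) the packet and arm the timeout. */
static void packet_send(struct pending_packet *p, uint64_t now, uint64_t rto)
{
    assert(rto > 0);
    p->deadline = now + rto;
    p->armed = true;
}

/* Common case: the ACK arrives first and the timeout is simply removed. */
static void packet_acked(struct pending_packet *p)
{
    p->armed = false;
}

/* Rare case: the deadline passed without an ACK, so resend and re-arm.
 * Returns true if a retransmit happened. */
static bool packet_tick(struct pending_packet *p, uint64_t now, uint64_t rto)
{
    if (!p->armed || now < p->deadline)
        return false;
    p->retransmits++;
    packet_send(p, now, rto);
    return true;
}
```

In the common path only packet_send() and packet_acked() run, which is why
the cost of removing an unexpired timeout matters so much here.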

-- Steve


2005-12-02 02:51:57

by Steven Rostedt

Subject: Re: [patch 00/43] ktimer reworked

On Fri, 2005-12-02 at 01:29 +0100, Roman Zippel wrote:
> Hi,

> > As for portability, I believe John Stultz has some nice plugins coming
> > to what timer source you want to use, so if there's a better way to get a
> > time, these should make things easy to add.
>
> These plugins can do no magic, if the hardware timer is slow, the whole
> thing gets slow.

I was stating that if you have a faster timer, you can switch to it
without suffering many portability problems. Anyway, you can turn off
HR if you don't need it and you still just get jiffy resolution. And we
still lose the overhead of the timer wheel having a lot of timers that
will expire.

...

> > And they would be if that is all you need. But coming from an embedded
> > point of view, that is not nearly enough. I really see HighRes making it
> > into the kernel soon, and any new code in this area really needs to take
> > that into account.
>
> I'm not against HR timer, I have a problem with using them as timer for
> everything.

When do we know to use HR or not? So do you think that adding HR should
be specific to certain areas?

So now we could have three separate timers :)

expire_timer - A timer that is expected to expire
nonexpire_timer - A timer not expected to expire (timeout)
precision_timer - A timer that will expire at a specific high
resolution time.

This would clear things up a bit, but I don't think Andrew would go for
three different timer interfaces ;) But it would give you what you and
I both want.

1) a timer that won't use high resolution timers but is still efficient
for the add->expire case.
2) a timer that will most likely not expire and is efficient for the
add->remove case.
3) a timer that will expire but will use the slower but more precise
hardware timer.

If this didn't cause more confusion for developers not knowing which
timer to use, I would say this would be a good idea.

Actually, we could make a single API for this and the default being the
nonexpire_timer (timeout). Just add a flag field that would tell it to
use the expire or precision timer. Have the guts of the API be
implemented separately.

Just a suggestion.
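
A rough sketch of what such a single entry point could look like, in plain
userspace C. Every name here (struct any_timer, any_timer_add(), the
TIMER_F_* flags, the stub backends) is hypothetical and only illustrates
the dispatch; the assumption that the precision flag wins when both flags
are set is mine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TIMER_F_EXPIRE    0x1u  /* expected to expire, jiffy resolution */
#define TIMER_F_PRECISION 0x2u  /* expected to expire at a high-res time */

enum timer_backend {
    BACKEND_NONE,
    BACKEND_TIMEOUT,
    BACKEND_EXPIRE,
    BACKEND_PRECISION,
};

struct any_timer {
    uint64_t expires_ns;
    unsigned flags;
    enum timer_backend queued_on;  /* which backend took the timer */
};

/* Stub backends: real ones would insert into a wheel, a tree, etc. */
static void timeout_backend_add(struct any_timer *t)   { t->queued_on = BACKEND_TIMEOUT; }
static void expire_backend_add(struct any_timer *t)    { t->queued_on = BACKEND_EXPIRE; }
static void precision_backend_add(struct any_timer *t) { t->queued_on = BACKEND_PRECISION; }

/* One API for all users; no flags means the default timeout behaviour. */
static void any_timer_add(struct any_timer *t, uint64_t expires_ns, unsigned flags)
{
    assert(t != NULL);
    t->expires_ns = expires_ns;
    t->flags = flags;

    if (flags & TIMER_F_PRECISION)
        precision_backend_add(t);
    else if (flags & TIMER_F_EXPIRE)
        expire_backend_add(t);
    else
        timeout_backend_add(t);
}
```

Callers that pass no flags get the timeout behaviour, so existing users
would not have to think about the other two cases.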

>
> > > - timer life time: if only a short interval is needed (e.g. a fraction of
> > > a second) timer_list is often a lot cheaper.
> >
> > And again, you are only limited to 1000 choices to go off in that fraction
> > of a second if jiffies is the resolution (with jiffies at 1000HZ).
>
> The point is still valid: short-interval timers are cheaper using the normal
> timer, independent of whether they are removed or they expire.

As long as they don't need to be rehashed.

-- Steve


2005-12-02 14:42:10

by John Stoffel

Subject: Re: [patch 00/43] ktimer reworked

>>>>> "Kyle" == Kyle Moffett writes:

Kyle> On Dec 01, 2005, at 19:02, Thomas Gleixner wrote:
>> On Thu, 2005-12-01 at 22:15 +0000, Christoph Hellwig wrote:
>>> Heh, in my dumb non-native speaker mind I'd expect it the other way
>>> around, as in a timeout is expected to time out :) and a timer is
>>> expected to happen, as in say the timer that tells you your breakfast
>>> egg is ready.
>>
>> Which is perfectly the point Kyle made.

Kyle> In any case, the real important note here is that the two are
Kyle> pretty different concepts, ones that lend themselves to _very_
Kyle> different optimizations, that are currently lumped together.
Kyle> The very fact that some developers easily get them confused says
Kyle> that we need a good clean implementation of both distinct APIs
Kyle> with comparable documentation, including a bunch of good example
Kyle> usages.

I think the problem is in using the word 'time' in both. Split that
so that they are separate, and a lot of the confusion will go away. Do
I have a useful suggestion? No, I'm being fairly dumb this
morning... but just seeing all you smart guys getting confused makes
me think the rest of us would be lost too.

Hmm... how about:

timer - gotta let me know exactly when it expires, I won't
touch it until it does.

reminder - I'll generally clean this up before it fires,
don't care if I get reminded a bit later.

John

2005-12-02 14:44:34

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Thu, 1 Dec 2005, David Lang wrote:

> In addition, once you remove the bulk of these uses from the picture (by
> making them use a new timer type that's optimized for their usage pattern,
> the 'unlikely to expire' case) the remainder of the timer users easily fall
> into the category where the timer is expected to expire, so that code can
> accept a performance hit for removing events prior to them going off that
> would not be acceptable in a general case version.

Guys, before you continue spreading nonsense, please carefully read Ingo's
description of the timer wheel at http://lwn.net/Articles/156329/.
Let me also refine the statement I made in this mail: the _focus_ on
delivery is complete nonsense.

The delivery is really not the important part, what is important is the
_lifetime_ of the timer. As Ingo said we try to delay as much work as
possible into the future, so that all the work needed for a short-lived
timer is basically:

list_add() + list_del()

This is a constant-time operation, and whether there is a callback at the
end is unimportant from the perspective of the timer system.
When the timer spends more time in the timer wheel, it has to be moved
into different slots over time, but this is not a really expensive operation
either, so e.g. all the work needed with a single cascading step is:

list_add() + list_del() + list_add() + list_del()

This is still quite cheap and with a single cascading step we cover 2^14
jiffies (2^10 for small configurations), which is quite a lot of time, and
whether in that time the timer is delivered or not doesn't change the above
cost. Another important thing to realize is that this cost is independent
of the number of timers; the per timer cost depends only on the timeout
value.
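
The cost argument above can be written down as a small userspace model.
This is a sketch under stated assumptions, not the kernel's code: the level
widths (8 bits for the first level, 6 bits for each outer level, 5 levels)
are picked so that the first two levels span the 2^14 jiffies mentioned
above, all names are made up, and the result is an upper bound because a
timer may skip a level when it is cascaded:

```c
#include <assert.h>
#include <stdint.h>

#define WHEEL_ROOT_BITS 8  /* assumed width of the first wheel level */
#define WHEEL_NODE_BITS 6  /* assumed width of each outer level */
#define WHEEL_LEVELS    5

/* Index of the wheel level a timer lands in for a timeout in jiffies. */
static unsigned wheel_level(uint64_t timeout_jiffies)
{
    unsigned level = 0;
    unsigned bits = WHEEL_ROOT_BITS;

    assert(WHEEL_ROOT_BITS + (WHEEL_LEVELS - 1) * WHEEL_NODE_BITS <= 32);
    while (level < WHEEL_LEVELS - 1 && (timeout_jiffies >> bits) != 0) {
        bits += WHEEL_NODE_BITS;
        level++;
    }
    return level;
}

/* Upper bound on list operations over the timer's life: one
 * list_add() + list_del() pair up front, plus one pair per cascade. */
static unsigned wheel_list_ops(uint64_t timeout_jiffies)
{
    return 2 * (1 + wheel_level(timeout_jiffies));
}
```

Note that the number of queued timers never appears as an input, only the
timeout value does.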

So let's look at the new timer which uses a rbtree. Its per timer cost
doesn't depend on the expiry value, but on the size of the tree instead.
All you have to do with the timer is:

tree_insert() + tree_remove()

This is not a constant operation, with O(log(n)) it grows quite slowly,
but in any case it's more expensive than a simple list_add/list_del. This
means you have to do a number of list operations before it becomes more
expensive than a single tree operation. The nonconstant cost also means
the more timers start using the rbtree, the relatively cheaper it becomes
to use the timer wheel again.
The break-even point may now be different on various machines, but I think
it's safe to assume that two list add/del pairs are at least as cheap and
usually cheaper than a tree add/del. This means timers which run for less
than 2^14 jiffies are better off using the timer wheel, unless they require
the higher resolution of the new timer system.
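
As a toy illustration of that break-even reasoning (userspace C, made-up
names, and unit costs that are caller-supplied assumptions rather than
measurements), one can compare a fixed number of list operations against a
tree cost that grows with the tree height. Charging both the insert and the
remove one root-to-leaf path is a crude upper bound, not how an rbtree
removal really behaves:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum height of a binary tree holding n nodes: ceil(log2(n + 1)). */
static unsigned tree_height(uint64_t n)
{
    unsigned h = 0;

    while (n != 0) {
        h++;
        n >>= 1;
    }
    return h;
}

/* Toy cost model: true if the wheel's list operations cost no more than
 * one tree insert plus one tree remove with tree_timers timers queued. */
static bool wheel_is_cheaper(unsigned list_ops, unsigned list_op_cost,
                             uint64_t tree_timers, unsigned tree_step_cost)
{
    assert(list_op_cost > 0 && tree_step_cost > 0);

    uint64_t wheel_cost = (uint64_t)list_ops * list_op_cost;
    uint64_t tree_cost = 2ull * tree_height(tree_timers) * tree_step_cost;

    return wheel_cost <= tree_cost;
}
```

With equal unit costs, four list operations lose against a nearly empty
tree but win once the tree holds a thousand timers, which is the "the more
timers use the rbtree" effect described above.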

Moving timers away from the timer wheel will also not help with the
problem cases of the timer wheel. If you have a million network timers, a
cascading step for thousands of timers takes time, but it doesn't change
the cost per timer, we just have to do the work that we were too lazy to
do before. In this case it would be better to look into solutions which
avoid generating millions of timers in the first place.

So can we please stop this likely/unlikely expiry nonsense? It's great if
you want to tell aunt Tillie about kernel hacking, but it's terrible
advice to kernel programmers. When it comes to choosing a timer
implementation, the delivery is completely and utterly unimportant.

bye, Roman

2005-12-02 15:41:46

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 2, 2005, at 09:43:45, Roman Zippel wrote:
> Hi,
>
> On Thu, 1 Dec 2005, David Lang wrote:
>
>> In addition, once you remove the bulk of these uses from the
>> picture (by making them use a new timer type that's optimized for
>> their usage pattern, the 'unlikely to expire' case) the remainder
>> of the timer users easily fall into the category where the timer
>> is expected to expire, so that code can accept a performance hit
>> for removing events prior to them going off that would not be
>> acceptable in a general case version.
>
> [snip timer wheel is efficient for lots of add remove, timer tree
> is not]
>
> This means timers which run for less than 2^14 jiffies are better
> off using the timer wheel, unless they require the higher
> resolution of the new timer system.

PRECISELY!!! The point is to provide a new and more flexible API
(either as two different sets of timer manipulation functions, one for
the timer wheel and one for the timer tree, or as a single set with
multiple backends), and migrate old timers to the new system, reclassifying
them based on the timer needs. Some timers/timeouts/whatevers don't
care much about delivery accuracy and could be placed into a much
smaller and more efficient timer wheel with only quarter-second or
half-second accuracy, and maybe even coalesced to one-or-two-second
boundaries without harming functionality, because they are almost
certainly going to be removed before they run. This would be a _big_
help in tickless systems, where we can schedule a bunch of networking
timers to all be expired simultaneously, keeping caches hot and
allowing longer sleep times. Likewise, we would port some to the new
API such that they just use the same old ordinary timer wheel. Other
timers that want highres guarantees would use the highres part of the
kernel API (either flags in a structure or a separate set of
functions) and would be added to a slow but very accurate timer tree.
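
To illustrate the coalescing idea in the paragraph above, here is a minimal sketch in Python (not kernel code). The helper names `coalesce_expiry` and `wakeups` and the sample timeout values are hypothetical, and the sketch assumes expiries are rounded up so that a timeout never fires early.

```python
def coalesce_expiry(expiry_ms: int, granularity_ms: int) -> int:
    """Round an expiry time up to the next multiple of granularity_ms.

    Rounding up means a coalesced timeout never fires early, only late,
    by at most granularity_ms - 1.
    """
    if granularity_ms <= 0:
        raise ValueError("granularity_ms must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-expiry_ms // granularity_ms) * granularity_ms


def wakeups(expiries_ms, granularity_ms):
    """Return the sorted distinct wakeup times after coalescing."""
    return sorted({coalesce_expiry(e, granularity_ms) for e in expiries_ms})


# Hypothetical networking timeouts, in milliseconds from now.
timeouts = [1010, 1240, 1730, 1999, 2001, 2500, 2980]
print(len(wakeups(timeouts, 1)))     # wakeups without coalescing
print(len(wakeups(timeouts, 1000)))  # wakeups after rounding to 1 s
```

With these made-up values, rounding to one-second boundaries turns seven separate wakeups into two, at the cost of each timeout firing up to just under a second late.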

The fact remains that we have two reasonably useful internal timer
structures; one which is optimized for lots of timers being added and
removed frequently, which has poor accuracy, and the other which
doesn't handle a million timers very well, and is poor at adding and
removing timers but has excellent accuracy. We should come up with a
set of recommendations for when to use each interface. The _best_
way to explain that to most kernel developers who don't really
understand the guts of it is:
1) If you need high resolution and you add the timer and let it
expire normally, use the ktimer/whatever API.
2) If you just want to time out an operation or fail when something
doesn't happen, or want a timer that doesn't care about accuracy, use the
ktimeout/whatever2 API.

> So can we please stop this likely/unlikely expiry nonsense? It's
> great if you want to tell aunt Tillie about kernel hacking, but
> it's terrible advice to kernel programmers. When it comes to
> choosing a timer implementation, the delivery is completely and
> utterly unimportant.

The fact is, we have a _lot_ of timers, a _lot_ of kernel hackers,
and we need some easy way to tell people which of two subsystems to
use. The fact that the likely/unlikely stuff is easy to tell aunt
Tillie is precisely what makes it useful to tell kernel hackers with
a half-million other things on their minds. Hopefully it will be
easy enough to understand that when they get around to using timers
for something or another, they'll pick the right API for their task.

Cheers,
Kyle Moffett

2005-12-04 01:29:16

by Andrew James Wade

Subject: Re: [patch 00/43] ktimer reworked

On Thursday 01 December 2005 14:08, Steven Rostedt wrote:
> On Thu, 2005-12-01 at 18:44 +0100, Roman Zippel wrote:
> > Hi,
> >
> > On Thu, 1 Dec 2005, Russell King wrote:
> ...
> > > Hence, timers have the implication that they are _expected_ to expire.
> > > Timeouts have the implication that their expiry is an exceptional
> > > condition.
> >
> > IOW a timeout uses a timer to implement an exceptional condition after a
> > period of time expires.
> >
> > > So can we stop rehashing this stupid discussion?
> >
> > The naming isn't actually my primary concern. I want a precise definition
> > of the expected behaviour and usage of the old and new timer system. If I
> > had this, it would be far easier to choose a proper name.
> > E.g. I still don't know why ktimeout should be restricted to raise just
> > "error conditions", as the name implies.
> >
>
> ktimeout may not need to be restricted to anything.

But does it make sense to use it in any other circumstances? It sounds
like the rb-tree based ktimer system is suitable for the general case. So
you can have a simple rule: use ktimeout for timing out when an expected
event doesn't occur, and ktimer for everything else. Are there any
situations where you want a timer optimized for the removal case that is not
also monotonic and low-res? And are there any situations in practice other
than the "timeout" one where you'd want to use a timer wheel instead of a
rb-tree?

It sounds to me that the ktimer should be the general case, leaving
ktimeout to be optimized for one particular case (by e.g. decreasing the
resolution to reduce cascades).

Andrew Wade

2005-12-05 19:41:47

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Sat, 3 Dec 2005, Andrew James Wade wrote:

> But does it make sense to use it in any other circumstances? It sounds
> like the rb-tree based ktimer system is suitable for the general case. So
> you can have a simple rule: use ktimeout for timing out when an expected
> event doesn't occur, and ktimer for everything else. Are there any
> situations where you want a timer optimized for the removal case that is not
> also monotonic and low-res? And are there any situations in practice other
> than the "timeout" one where you'd want to use a timer wheel instead of a
> rb-tree?
>
> It sounds to me that the ktimer should be the general case, leaving
> ktimeout to be optimized for one particular case (by e.g. decreasing the
> resolution to reduce cascades).

By reducing the resolution you only reduce the frequency of the cascading,
but the number of timers in the timer wheel at any time is still the same.
So in general you will have fewer cascades, but not generally smaller
cascades. The latter depends on the actual timeout values and their
distribution over the wheel. This means you can tune the resolution to
avoid most cascading for a specific situation, but that would be a rather
bad general solution.

rbtree-based timers are also not necessarily the better general case. The
timer wheel still scales better with O(n) compared to the rbtree with
O(n*log(n)).

It's really better to keep the focus of the new timer system on
high-resolution timers; that's what it's really better at, and we shouldn't
try to use it for everything only because it has such a kool name.

bye, Roman

2005-12-06 02:46:16

by Andrew James Wade

Subject: Re: [patch 00/43] ktimer reworked

On Monday 05 December 2005 14:40, Roman Zippel wrote:
> ...
> rbtree-based timers are also not necessarily the better general case. ...

... As you've mentioned before. Somehow I missed that. Thank you for your
patience.

Andrew

2005-12-07 09:36:50

by James Bruce

Subject: Re: [patch 00/43] ktimer reworked

Hi,

Roman Zippel wrote:
> On Thu, 1 Dec 2005, David Lang wrote:
>>In addition, once you remove the bulk of these uses from the picture (by
>>making them use a new timer type that's optimized for their usage pattern,
>>the 'unlikely to expire' case) the remainder of the timer users easily fall
>>into the category where the timer is expected to expire, so that code can
>>accept a performance hit for removing events prior to them going off that
>>would not be acceptable in a general case version.
>
> Guys, before you continue spreading nonsense, please carefully read Ingo's
> description of the timer wheel at http://lwn.net/Articles/156329/ .
> Let me also refine the statement I made in this mail: the _focus_ on
> delivery is complete nonsense.

Must you start every email with inflammatory rhetoric? If you want to
know why you find it difficult to get people to see things your way, the
key is in the above paragraph. In everyday life you don't insult a
person on the street and then ask them for directions.

Yes, the expiry/non-expiry distinction is an approximation and perhaps
an oversimplification. However, after insulting others for that, you
continue with your own oversimplification of the algorithms involved.
Following your words, I could say "Roman, before you continue spreading
nonsense, please go back and read your algorithms textbook". The
reality though is that both of you are approximately correct, and
neither post deserves to be called nonsense.

> The delivery is really not the important part, what is important is the
> _lifetime_ of the timer. As Ingo said we try to delay as much work as
> possible into the future, so that all the work needed for a short-lived
> timer is basically:
>
> list_add() + list_del()
>
> This is a constant operation, and whether there is a callback at the end
> is unimportant from the perspective of the timer system.

Timeout-style timers imply a short lifetime, independent of their
maximum expiry time. Regular timers expected to expire can have their
lifetime predicted accurately by looking at their expiry time. An
interface which gives a hint as to the type of timer allows us to
predict the lifetime. Please tell me how this tight relation is nonsense?

You are right in that the lifetime is what is important, but the whole
point of the ktimer distinction is that by knowing if something is a
timer vs a timeout, *we can more accurately predict the lifetime*.

> When the timer spends more time in the timer wheel, it has to be moved
> into different slots over time, but this is not a really expensive operation
> either, so e.g. all the work needed with a single cascading step is:
>
> list_add() + list_del() + list_add() + list_del()
>
> This is still quite cheap and with a single cascading step we cover 2^14
> jiffies (2^10 for small configurations), which is quite a lot of time and
> whether in that time the timer is delivered or not doesn't change the above
> cost. Another important thing to realize is that this cost is independent
> of the amount of timers, the per timer cost depends only on the timeout
> value.

On the other hand, there is a huge difference between *amortized*
constant time, and constant time. The cascade falls into the former
category, and affects latency a great deal. It's cheap *per timer*, but
by batching so much work to be done at once, it's not cheap to execute
the *cascade operation*. If you are putting important, latency
sensitive timers in the same data structure as non-latency-sensitive
timeouts, it is going to hurt accuracy and timeliness. For timeouts we
don't care much, since they rarely cascade; timers which expire *will*
go through all the cascades based on their expiry time, and if there are
many of them, worst-case latency will suffer. In a mathematical sense
then, it's not O(1) and to call it so is incorrect. It's "amortized
O(log(l))", where "l" is the lifetime of the timer, and the minimum
resolution is constant.
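
The amortized cost described above can be made concrete with a small model of a hierarchical timer wheel. This is a sketch in Python, not the kernel implementation: the function names are hypothetical, and the default bit widths (8 bits for the first level, 6 bits for each further level, 5 levels) are an assumption chosen to reproduce the 2^14 figure quoted above (6 and 4 bits reproduce the 2^10 figure for small configurations). It gives an upper bound on cascade steps, since a cascaded timer may skip a level.

```python
def wheel_level(delta_jiffies, tvr_bits=8, tvn_bits=6, levels=5):
    """Return the 0-based wheel level a newly added timer lands in.

    Level 0 holds timers expiring within 2**tvr_bits jiffies; each
    further level covers tvn_bits more bits of the expiry delta.
    """
    if delta_jiffies < 0:
        raise ValueError("delta_jiffies must be non-negative")
    span_bits = tvr_bits
    for level in range(levels - 1):
        if delta_jiffies < (1 << span_bits):
            return level
        span_bits += tvn_bits
    return levels - 1  # everything else goes into the outermost level


def max_list_ops(delta_jiffies, **layout):
    """Upper bound on list operations if the timer runs to expiry.

    One list_add() + list_del() pair for the initial insertion, plus
    one more pair for each cascade step down towards level 0.
    """
    cascades = wheel_level(delta_jiffies, **layout)
    return 2 * (1 + cascades)


# Expiry deltas in jiffies (hypothetical examples).
for delta in (100, 10_000, 300_000):
    print(delta, wheel_level(delta), max_list_ops(delta))
```

The per-timer count stays small, but every timer sitting in a slot is moved in the same cascade run, which is where the worst-case latency comes from.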

> So let's look at the new timer which uses a rbtree. Its per timer cost
> doesn't depend on the expiry value, but on the size of the tree instead.
> All you have to do with the timer is:
>
> tree_insert() + tree_remove()
>
> This is not a constant operation, with O(log(n)) it grows quite slowly,
> but in any case it's more expensive than a simple list_add/list_del, this
> means you have to do a number of list operations before it becomes more
> expensive than a single tree operation. The nonconstant cost also means
> the more timers start using the rbtree, the relatively cheaper it becomes
> to use the timer wheel again.

Thank you for a very good argument about why timeouts shouldn't use
rbtree, and should continue to use the timer wheel. Nobody disagrees on
this however, and adding ktimers will not force any existing users to
change to the new interface.

> The break-even point may now be different on various machines, but I think
> it's safe to assume that two list add/del is at least as cheap and usually
> cheaper than a tree add/del. This means timers which run for less than
> 2^14 jiffies are better off using the timer wheel, unless they require the
> higher resolution of the new timer system.

Again, putting timeouts on the timer wheel is ideal, since we know they
tend to have short lifetimes. Same goes for low-resolution timers which
only need jiffy accuracy. However, jiffy accuracy doesn't cut it for a
lot of applications. It is when we add high accuracy that the timer
wheel falls down, and requires a different approach. So let's call "dt"
the desired resolution of the timer in seconds. Then the timer wheel
becomes "amortized O(log(l/dt)) = O(log(l) + log(1/dt))". When you
start talking about resolutions where dt=25usec, then the timer wheel
all of a sudden becomes worse than a balanced tree, which is always
O(log(n)), independent of resolution.
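
The scaling argument above can be checked with a few numbers. The sketch below is a rough Python illustration, not a benchmark: the function names, the 10-second lifetime, and the timer counts are made-up assumptions, and it only compares the logarithmic terms, ignoring the constant factors that separate a list operation from a tree operation.

```python
import math


def wheel_log_term(lifetime_s, resolution_s):
    """log2(l/dt): grows as the resolution dt gets finer."""
    return math.log2(lifetime_s / resolution_s)


def tree_log_term(num_timers):
    """log2(n): depth of a balanced tree, independent of dt."""
    return math.log2(num_timers)


lifetime = 10.0  # seconds, hypothetical
for dt in (10e-3, 1e-3, 25e-6):
    print(f"dt={dt:g}s  wheel term={wheel_log_term(lifetime, dt):.1f}")
for n in (100, 10_000):
    print(f"n={n}  tree term={tree_log_term(n):.1f}")
```

Going from dt=10ms to dt=25usec adds log2(400), about 8.6, to the wheel term no matter how many timers are queued, while the tree term changes only with the number of timers.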

And that's the whole *point* about how we got here. Let the low
resolution, low lifetime timeouts stay on the timer wheel, and make a
new approach that specializes in handling longer lifetime, higher
resolution timers. That's ktimers in a nutshell. You seem to be
arguing for it rather than against it.

> Moving timers away from the timer wheel will also not help with the
> problem cases of the timer wheel. If you have a million network timers, a
> cascading step for thousands of timers takes time, but it doesn't change
> the cost per timer; we just have to do the work that we were too lazy to
> do before. In this case it would be better to look into solutions which
> avoid generating millions of timers in the first place.

Putting timers on an rbtree most definitely helps with the worst-case
latency of the timer wheel. That is an issue that some of us care very
deeply about.

You've brought up the fact that networking shouldn't use lots of timers
several times in the overall discussion. If you know how to do this,
I'm sure you can start sending patches to netdev and show them all how
stupid they've been all along. However, more likely you'll just find
out that just maybe the networking people really *have* thought about
the problem, and the solution they came up with is actually a pretty
good one.

At any rate, while you fix up all those "timer-abusing" subsystems
throughout the kernel, can we just try to improve the timer system in
the meantime?

> So can we please stop this likely/unlikely expiry nonsense? It's great if
> you want to tell aunt Tillie about kernel hacking, but it's terrible
> advice to kernel programmers. When it comes to choosing a timer
> implementation, the delivery is completely and utterly unimportant.

Expected expiry is a simple predictor of expected lifetime. If we knew
the lifetime, we could use that, but expiry is one hint that is easier
for the developer to provide. Really, we want to know "E[l]/dt" (E[] is
notation for expected value), but that's unrealistic to estimate. What
ktimers says is: if it's a timeout (E[l] is low and dt is high), use the
timer wheel, and if it's a timer (E[l] is high and dt is low), use an
rbtree. In what way is that not a reasonable approach?

Jim Bruce

2005-12-07 12:35:52

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Wed, 7 Dec 2005, James Bruce wrote:

> > Guys, before you continue spreading nonsense, please carefully read Ingo's
> > description of the timer wheel at http://lwn.net/Articles/156329/ .
> > Let me also refine the statement I made in this mail: the _focus_ on
> > delivery is complete nonsense.
>
> Must you start every email with inflammatory rhetoric? If you want to know
> why you find it difficult to get people to see things your way, the key is in
> the above paragraph. In everyday life you don't insult a person on the street
> and then ask them for directions.

Your analogy is wrong: Thomas and Ingo spread flyers for "free food", above
is my frustration about all the people wanting free food.

> And that's the whole *point* about how we got here. Let the low resolution,
> low lifetime timeouts stay on the timer wheel, and make a new approach that
> specializes in handling longer lifetime, higher resolution timers. That's
> ktimers in a nutshell. You seem to be arguing for it rather than against it.

I do, just without the focus on the lifetime, which is really unimportant
for most kernel developers.

> You've brought up the fact that networking shouldn't use lots of timers
> several times in the overall discussion. If you know how to do this, I'm sure
> you can start sending patches to netdev and show them all how stupid they've
> been all along. However, more likely you'll just find out that just maybe the
> networking people really *have* thought about the problem, and the solution
> they came up with is actually a pretty good one.
>
> At any rate, while you fix up all those "timer-abusing" subsystems throughout
> the kernel, can we just try to improve the timer system in the meantime?

James, after giving me a rhetoric lesson you maybe should be a bit more
careful with your own rhetoric. What kind of answer do you expect after
insulting me?

The short version is that I didn't bring up the network timer problem, I
only made a suggestion for how it could be solved, but nobody followed up
on it, so I guess the problem wasn't really that big. Please check the
archives for details.

bye, Roman

2005-12-07 14:15:40

by Kyle Moffett

Subject: Re: [patch 00/43] ktimer reworked

On Dec 07, 2005, at 07:34, Roman Zippel wrote:
> Hi,
>
> On Wed, 7 Dec 2005, James Bruce wrote:
>> And that's the whole *point* about how we got here. Let the low
>> resolution, low lifetime timeouts stay on the timer wheel, and
>> make a new approach that specializes in handling longer lifetime,
>> higher resolution timers. That's ktimers in a nutshell. You seem
>> to be arguing for it rather than against it.
>
> I do, just without the focus on the lifetime, which is really
> unimportant for most kernel developers.

It _is_ important. Not because kernel developers do care about it,
but because it's important for reasons of its own and therefore they
should. Networking timeouts and highres audio timers are two _VERY_
different applications of "do this thing then", and kernel developers
should be made aware of them. If you disagree, please explain in
detail exactly why you think the lifetime is unimportant. I have yet
to see an email regarding this, and I've searched the archives pretty
carefully, in addition to watching this thread.

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
-- Anthony de Boer


2005-12-07 14:18:16

by Steven Rostedt

Subject: Re: [patch 00/43] ktimer reworked

On Wed, 2005-12-07 at 13:34 +0100, Roman Zippel wrote:
> Hi,
>
> On Wed, 7 Dec 2005, James Bruce wrote:
>
> > > Guys, before you continue spreading nonsense, please carefully read Ingo's
> > > description of the timer wheel at http://lwn.net/Articles/156329/ .
> > > Let me also refine the statement I made in this mail: the _focus_ on
> > > delivery is complete nonsense.
> >
> > Must you start every email with inflammatory rhetoric? If you want to know
> > why you find it difficult to get people to see things your way, the key is in
> > the above paragraph. In everyday life you don't insult a person on the street
> > and then ask them for directions.
>
> Your analogy is wrong: Thomas and Ingo spread flyers for "free food", above
> is my frustration about all the people wanting free food.

And to think that this all seemed to have started with one simple email:

http://lkml.org/lkml/2005/9/15/128

;-)

-- Steve

2005-12-07 15:04:34

by Roman Zippel

Subject: Re: [patch 00/43] ktimer reworked

Hi,

On Wed, 7 Dec 2005, Kyle Moffett wrote:

> > I do, just without the focus on the lifetime, which is really unimportant
> > for most kernel developers.
>
> It _is_ important. Not because kernel developers do care about it, but
> because it's important for reasons of its own and therefore they should.
> Networking timeouts and highres audio timers are two _VERY_ different
> applications of "do this thing then", and kernel developers should be made
> aware of them. If you disagree, please explain in detail exactly why you
> think the lifetime is unimportant. I have yet to see an email regarding this,
> and I've searched the archives pretty carefully, in addition to watching this

Please be precise, the "for most kernel developers" part is important.
In most situations this guideline is more than enough:
- if you need a fast and simple timer, use a normal timer
- if you need higher resolution, use an hrtimer

Only if you want to push the timer system to its limits, you have to take
the corner cases into account. I consider lifetime issues an advanced
topic of timer programming, which Joe Kernelhacker doesn't have to be
overly concerned about (it worked quite well so far).

bye, Roman

2005-12-08 15:46:26

by James Bruce

Subject: Re: [patch 00/43] ktimer reworked

Roman Zippel wrote:
> Your analogy is wrong: Thomas and Ingo spread flyers for "free food", above
> is my frustration about all the people wanting free food.

New features or functionality have to be motivated by looking at the
benefits vs cost. I thought Thomas laid out a pretty good argument in
his first email, and I don't believe either he or Ingo is claiming
their approach is completely without cost. Those of us who are "wanting
free food" mostly just want a clean and acceptable hrtimer implementation.

Hrtimers is a feature that has been a very long time in coming; I first
needed to make an application that ran at a fixed 30Hz in 1999, which
was only possible with high variance per frame due to the 10 msec sleep
resolution. That mostly went away with HZ=1000 in 2.6, but is now back
again since a given linux kernel can choose HZ={100,250} (across arches,
one could never assume HZ=1000 anyway). I find it unfortunate that in
2005, it is still more accurate to sleep on Linux using serial port
writes and flushes than with nanosleep. Yes, there is rtc on some
machines, but many distributions don't allow users to access it by
default (I'm guessing they have a reason for that). So, the benefit is
clear for anyone doing media programming or emulating media devices.
That leaves the technical side, which is making sure the implementation
is right.

The discussion on ktimers unfortunately strayed from the technical very
early on, and I think that has left people in bad moods all the way
until now. To the extent that the discussion was technical however, I
think it has been worthwhile. The implementation continues to improve.

>>You've brought up the fact that networking shouldn't use lots of timers
>>several times in the overall discussion. If you know how to do this, I'm sure
>>you can start sending patches to netdev and show them all how stupid they've
>>been all along. However, more likely you'll just find out that just maybe the
>>networking people really *have* thought about the problem, and the solution
>>they came up with is actually a pretty good one.
>>
>>At any rate, while you fix up all those "timer-abusing" subsystems throughout
>>the kernel, can we just try to improve the timer system in the meantime?
>
> James, after giving me a rhetoric lesson you maybe should be a bit more
> careful with your own rhetoric. What kind of answer do you expect after
> insulting me?

It's not an insult if you truly believe you are right, and 30 years of
unix network stack designs are wrong. Please consider that your
comments might be insulting the people who worked for years to get the
Linux network stack to the point where it was able to saturate GbE
cards. I'd imagine there isn't too much low-hanging fruit left, and to
assume they are being wasteful with resources comes across as a little
arrogant. But again, if they are wrong, by all means demonstrate it
with concrete examples and patches, and we will all benefit in the end.

> The short version is that I didn't bring up the network timer problem, I
> only made a suggestion for how it could be solved, but nobody followed up
> on it, so I guess the problem wasn't really that big. Please check the
> archives for details.

I was reading then. Your solution, IIRC, was to use a coarse periodic
timer to handle multiple network timeouts together. I'd expect we'd end
up keeping track of the separate events somehow, probably on a list. In
that case I don't see the advantage compared to using the timer wheel,
which is also just list manipulation in the common case, as you have
already pointed out. The coarse periodic timer would get the benefit of
better batching though; right now the timer wheel already batches, but
based on jiffy resolution. Realistically though, the networking people
have RFCs to follow, hardware timing, and many other constraints to
worry about, so we are just hand-waving without knowing more.

One of the things that ktimers enables is moving timers with sub-10ms
resolution requirements off of the wheel, which would allow us to make
the timer wheel coarser, saving either memory or processing time. This
would get the benefits of a specialized coarse network timer such as the
approach you proposed, but without requiring the networking code to
reimplement the timer wheel. The coarse timer wheel for the
low-resolution timers may even make up for the extra processing overhead
of putting hrtimers on an rbtree; on a typical system I would expect
coarse timers (timeouts) to far outnumber hrtimers. Time will tell I guess.

So, let's try to keep focused on the real technical impediments, and
we'll achieve the best end result.

- Jim Bruce