LinuxLists.cc - [Q] jiffies overflow & timers.

2003-11-17 18:36:28

Subject: [Q] jiffies overflow & timers.

Hello!

[ ver 2.4.18/22 ]

[ I feel strongly that is FAQ - so any ptr(RTFM)!=0 apreciated. ]

[ ldd2 covers this very sparsely - overflow case is not covered at
all - just mentioned. Google - looks like I cannot find right keywords
for this ... ]

I'm trying to find correct solution for case of jiffies overflow and
standard kernel timers (./kernel/timer.c).

My module has to maintain list of timers. I cannot reuse directly
struct timer_list - since it uses jiffies and jiffies do wrap on overflow.

I decided to use struct timeval & do_gettimeofday(). But still I
have to handle case when next timer to expire will happend after jiffies
will overflow.

So my question - how to detect that jiffies had overflown?

Is the following code is sufficient?
(Assuming that I will not try to set timer longer than (~0UL/(HZ))
seconds)

unsigned long
tv_get_next_expiring_jiffies( struct timeval *target_tv )
{
struct timeval curr_tv, timeout;
ulong dif_jif;

do_gettimeofday( &curr_tv );

/* timeout = curr_tv - target_tv */
tv_sub( &timeout, &curr_tv, target_tv );

dif_jif = tv_to_jiffies( &timeout ); /* assumption above. */

if (jiffies > ~0UL - dif_jif) {
/* will overflow
* so wait for overflow, then just reschedule
*/
return ~0UL;
} else {
/* will not */
return jiffies + dif_jif;
}
}

Is (~0UL) "safe harbour"? in other words - will this value reached? /me
cannot find where jiffies is incremented... Dumd grep -r jiffies gives
no results (no assignment, no taking of pointer, literally no matches in
.S files - what I'm missing?) Because if this value will be reached and
I will detect that jiffies == ~0UL I will have to "while(jiffies ==
~0UL);" wait for overflow.

--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
-- _ _ _
"... and for $64000 question, could you get yourself |_|*|_|
vaguely familiar with the notion of on-topic posting?" |_|_|*|
-- Al Viro @ LKML |*|*|*|

2003-11-17 18:59:33

by Richard B. Johnson

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

On Mon, 17 Nov 2003, Ihar 'Philips' Filipau wrote:

> Hello!
>
[SNIPPED...]

Use jiffies as other modules use it:

tim = jiffies + TIMEOUT_IN_HZ;
while(time_before(jiffies, tim))
{
if(what_im_waiting_for())
break;
current->policy |= SCHED_YIELD;
schedule();
}
//
// Note that somebody could have taken the CPU for many seconds
// causing a 'timeout', therefore, you need to add one more check
// after loop-termination:
//
if(what_im_waiting_for())
good();
else
timed_out();

Overflow is handled up to one complete wrap of jiffies + TIMEOUT. It's
only the second wrap that will fail and if you are waiting several
months for something to happen in your code, the code is broken.

>
> My module has to maintain list of timers. I cannot reuse directly
> struct timer_list - since it uses jiffies and jiffies do wrap on overflow.
>

If you need a list of timers, you grab the current jiffies plus
the current test value. After that, you use the time_before() macro
to test them.

>
> So my question - how to detect that jiffies had overflown?
>

You don't. Correct code will not require it. Correct code uses
jiffies as a relative value, not an absolute one. If you need
an absolute time value, you need to use sys_gettimeofday(), but
you should never use this for a time-out, only some time-stamp
because not all bits are exercised.

> Is the following code is sufficient?
> (Assuming that I will not try to set timer longer than (~0UL/(HZ))
> seconds)
>

See above for code that uses jiffies.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2003-11-17 21:02:16

by Ihar 'Philips' Filipau

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

Richard B. Johnson wrote:
>
> Use jiffies as other modules use it:
>
> tim = jiffies + TIMEOUT_IN_HZ;
> while(time_before(jiffies, tim))
> {
> if(what_im_waiting_for())
> break;
> current->policy |= SCHED_YIELD;
> schedule();
> }
> //
> // Note that somebody could have taken the CPU for many seconds
> // causing a 'timeout', therefore, you need to add one more check
> // after loop-termination:
> //
> if(what_im_waiting_for())
> good();
> else
> timed_out();
>
> Overflow is handled up to one complete wrap of jiffies + TIMEOUT. It's
> only the second wrap that will fail and if you are waiting several
> months for something to happen in your code, the code is broken.
>

Thanks! Looks & sounds sane.

Will try to apply this to my case.
what_im_waiting_for() == 'any expired timer'. I'm generating event to
upper layer, if timer wasn't canceled before. This is network layer
implementation - e.g. if line is Okay for some specified time, we need
to generate event that line is 'up'. Or if we didn't get positive ack
for some specified time resend payload. Something like this.

And sure you example needs to be enhanced for the case when timer
gets canceled before. Doable in anyway.

You example implies I have to have something I can call schedule() on
- currently I'm running without any user space part. To follow your
advice I will need to create process a la kswapd/keventd?

P.S. BTW it looks like that tcp_timer.c does exactly what I do right now
- this is 2.4.22 - hope 2.6 handles this correctly? I had enlightenment
to take a look into tcp - tcp has to have timers - but found (as it
seems) bug...

--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
-- _ _ _
"... and for $64000 question, could you get yourself |_|*|_|
vaguely familiar with the notion of on-topic posting?" |_|_|*|
-- Al Viro @ LKML |*|*|*|

2003-11-17 21:25:59

by Richard B. Johnson

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

On Mon, 17 Nov 2003, Ihar 'Philips' Filipau wrote:

> Richard B. Johnson wrote:
> >
> > Use jiffies as other modules use it:
> >
> > tim = jiffies + TIMEOUT_IN_HZ;
> > while(time_before(jiffies, tim))
> > {
> > if(what_im_waiting_for())
> > break;
> > current->policy |= SCHED_YIELD;
> > schedule();
> > }
> > //
> > // Note that somebody could have taken the CPU for many seconds
> > // causing a 'timeout', therefore, you need to add one more check
> > // after loop-termination:
> > //
> > if(what_im_waiting_for())
> > good();
> > else
> > timed_out();
> >
> > Overflow is handled up to one complete wrap of jiffies + TIMEOUT. It's
> > only the second wrap that will fail and if you are waiting several
> > months for something to happen in your code, the code is broken.
> >
>
> Thanks! Looks & sounds sane.
>
> Will try to apply this to my case.
> what_im_waiting_for() == 'any expired timer'. I'm generating event to
> upper layer, if timer wasn't canceled before. This is network layer
> implementation - e.g. if line is Okay for some specified time, we need
> to generate event that line is 'up'. Or if we didn't get positive ack
> for some specified time resend payload. Something like this.
>
> And sure you example needs to be enhanced for the case when timer
> gets canceled before. Doable in anyway.
>
> You example implies I have to have something I can call schedule() on
> - currently I'm running without any user space part. To follow your
> advice I will need to create process a la kswapd/keventd?
>

schedule() is the kernel procedure that gives the CPU to somebody
while your code is waiting for something to happen. You cannot
call that in an interrupt or when a lock is held.

current->policy is a member of 'current' a pointer to a structure
that completely defines the current task being executed. It exists
inside the kernel. You don't need to create a task. Lots of tasks
already exist.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2003-11-18 09:53:40

by Ihar 'Philips' Filipau

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

Richard B. Johnson wrote:
>>Richard B. Johnson wrote:
>>
>>>Use jiffies as other modules use it:
>>>
>>> tim = jiffies + TIMEOUT_IN_HZ;
>>> while(time_before(jiffies, tim))
>>> {
>>> if(what_im_waiting_for())
>>> break;
>>> current->policy |= SCHED_YIELD;
>>> schedule();
>>> }
>>>//
>>>// Note that somebody could have taken the CPU for many seconds
>>>// causing a 'timeout', therefore, you need to add one more check
>>>// after loop-termination:
>>>//
>>> if(what_im_waiting_for())
>>> good();
>>> else
>>> timed_out();
>>>
>>>Overflow is handled up to one complete wrap of jiffies + TIMEOUT. It's
>>>only the second wrap that will fail and if you are waiting several
>>>months for something to happen in your code, the code is broken.
>>>

time_before(a,b) == (((long)a - (long)b) < 0)

Can you explain me this games with signs there?
Or this code expected to work reliably for timeouts < (ULONG_MAX/2)?
time_before/time_after - do implicit conversion to signed types,
while jiffies/friends are all unsigned. If one day gcc will be fixed -
and it will truncate data here as I expect it to do - this will not work
at all. Or this is a feature of 2-complement archs?
(ldd2 again is silent on this topic - and I'm totally confused...)

>
> schedule() is the kernel procedure that gives the CPU to somebody
> while your code is waiting for something to happen. You cannot
> call that in an interrupt or when a lock is held.
>

It is state machine, it is event driven - there is nothing that can
yield CPU to someone else, because in first place it does not take CPU ;-)))
Right now it is run from tasklet - so ksoftirqd context.

Ok.
Thinking about this gave me hints to understand userspace
implementation of timers, which was used with my network layers before I
have started kernel port.
Idea is simple: all times absolute (think struct timeval). all given
timer events are put into let us say binary heap, with timeval used as
key. Check for expiration == O(1) - and this check is called in
"while(1) { schedule(); }" loop. If we have NO expired timer - we are
fast to yield CPU to someone else. Slow case of dequeueing from heap
(what is O(log(n))) is really slow by definition - we are dequeueing
event from heap and it needs to be processed.

Looks Ok to me.
Clearer/cleaner/safer than games with sign & ./kernel/timer.c
implementation (internal_add_timer/cascade_timers/run_timer_list - what
all those mess is about?).

--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
-- _ _ _
"... and for $64000 question, could you get yourself |_|*|_|
vaguely familiar with the notion of on-topic posting?" |_|_|*|
-- Al Viro @ LKML |*|*|*|

2003-11-18 13:23:23

by Richard B. Johnson

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

On Tue, 18 Nov 2003, Ihar 'Philips' Filipau wrote:

> Richard B. Johnson wrote:
> >>Richard B. Johnson wrote:
> >>
> >>>Use jiffies as other modules use it:
> >>>
> >>> tim = jiffies + TIMEOUT_IN_HZ;
> >>> while(time_before(jiffies, tim))
> >>> {
> >>> if(what_im_waiting_for())
> >>> break;
> >>> current->policy |= SCHED_YIELD;
> >>> schedule();
> >>> }
> >>>//
> >>>// Note that somebody could have taken the CPU for many seconds
> >>>// causing a 'timeout', therefore, you need to add one more check
> >>>// after loop-termination:
> >>>//
> >>> if(what_im_waiting_for())
> >>> good();
> >>> else
> >>> timed_out();
> >>>
> >>>Overflow is handled up to one complete wrap of jiffies + TIMEOUT. It's
> >>>only the second wrap that will fail and if you are waiting several
> >>>months for something to happen in your code, the code is broken.
> >>>
>
>
> time_before(a,b) == (((long)a - (long)b) < 0)
>
> Can you explain me this games with signs there?
> Or this code expected to work reliably for timeouts < (ULONG_MAX/2)?
> time_before/time_after - do implicit conversion to signed types,
> while jiffies/friends are all unsigned. If one day gcc will be fixed -
> and it will truncate data here as I expect it to do - this will not work
> at all. Or this is a feature of 2-complement archs?
> (ldd2 again is silent on this topic - and I'm totally confused...)
>

There are no games here. The macro does explicit conversion (for the
comparison only) so that the comparison operation will work correctly.
It will work correctly if your time-outs are less than about 1.5 months
(ULONG_MAX/2)-1. If you have time-outs exceeding that, you should rethink
your line of work.

It has nothing to do with 2s-compliment and everything to do with
signed integer comparison. In 32-bit land, 0xffffffff is -1. When
I add a number, say 10 to that, causing overflow, the number is
0x00000009. When comparing, -1 is still less than +9 so the fact
that some wrap occurred is not of consequence.

If gcc fails to work this way, then gcc is broken. It is unlikely
that somebody will break gcc to make this not work since it would
make arithmetic of signed integers wrong.
-1 + -1 = -2
0xffffffff + 0xffffffff = 0xfffffffe -> a large wrap occurred
here!

>
> >
> > schedule() is the kernel procedure that gives the CPU to somebody
> > while your code is waiting for something to happen. You cannot
> > call that in an interrupt or when a lock is held.
> >
>
> It is state machine, it is event driven - there is nothing that can
> yield CPU to someone else, because in first place it does not take CPU ;-)))
> Right now it is run from tasklet - so ksoftirqd context.
>

If you are looping in a tasklet, waiting for something to happen, you
are "locked up" until the event which may never happen. If for some
"bad hardware that they won't fix" reason, you must loop, then you
need to create a kernel thread. If the hardware is good and you
decided to write a driver this way, then you need to re-think
what you are doing.

[SNIPPED rest...]

Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (797.90 BogoMips).
Note 96.31% of all statistics are fiction.

2003-11-18 14:11:57

by Ihar 'Philips' Filipau

[permalink] [raw]

Subject: Re: [Q] jiffies overflow & timers.

Richard B. Johnson wrote:

[ Very good lection on 2-complement arithmetics snipped. This is I
beleive feature of C: (type) conversion say to compiler "forget about
original data type - use this one". So this was misunderstanding on my
side. And thanks for examples ;-) ]

>>
>> It is state machine, it is event driven - there is nothing that can
>>yield CPU to someone else, because in first place it does not take CPU ;-)))
>> Right now it is run from tasklet - so ksoftirqd context.
>>
>
>
> If you are looping in a tasklet, waiting for something to happen, you
> are "locked up" until the event which may never happen. If for some
> "bad hardware that they won't fix" reason, you must loop, then you
> need to create a kernel thread. If the hardware is good and you
> decided to write a driver this way, then you need to re-think
> what you are doing.
>

As I said this is network layer.
My hardware is good - we miss nothing ;-) [ After all it is made in
room next to mine ;-))) ]

I have situations when layer needs to know that given time had passed
- so if is there any other way to do it without timers - I will really
appreciate for any advice.

The same stuff as in TCP/IP - after some time we need to tell that
this connection timed-out. and TCP/IP uses timers for this: given
routine will be called at specified jiffies, so e.g. handshake can be
invalidated or data retransmission invoked.

In my case layer service routines will generate specially formated
input for fsm - so fsm will know that given timer has expired and will
take actions accordingly. Quite flexible - I enqueueing can be put into
any context and just need to "tasklet_schedule( &layer_entry );" The
same way skbs are passed to fsm. Pretty standard telecoms layer.

So to draw conclusion: ./kernel/timer.c:
add_timer/del_timer/mod_timer are not protected against jiffies
overflow. internal_add_timer() implicitely schedules items with expires
< jiffies into the head, so they will be processed at next timer tick.
(check 2.4.22 & 2.6.0test7)

Actually I beleive that here the same trick as for Y2K problem can be
used:
if (expires < ULONG_MAX/2 && jiffies >= ULONG_MAX/2
&& (jiffies - expires) > ULONG_MAX/2)
assume_expires_has_overflown();
In any way, as you say, if someone sets the timer for time longer
that ULONG_MAX/2 - this is most likely bug. So as for it makes sens to
put checks in this routines with given above kind of hack to handle
jiffies overflow. And add const for longest timer expires period -
((ULONG_MAX/2)-1)
For example I need timer at top for about 45 seconds. Not more.
TCP timers at most about 1.5/2 minutes long.

I have no clue what those stuff in internal_add_timer() does - so I
cannot produce any patch to try. I can only guess what there really
going on. But case of expires overflow is clearly stated there as case
when ((signed long) idx < 0) - timer "in the past" comment. But whole
ide of those queueing is pretty obscure. And no external reference for
algoritms/alike...

--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
-- _ _ _
"... and for $64000 question, could you get yourself |_|*|_|
vaguely familiar with the notion of on-topic posting?" |_|_|*|
-- Al Viro @ LKML |*|*|*|