Hello Thomas, et al,
Following on from our discussion of read() on a timerfd [1], I
happened to remember a Debian bug report [2] that points out that
timer_settime() can fail with the error ECANCELED, which is both
surprising and odd (because despite the error, the timer does get
updated).
The relevant kernel code (I think, from your commit [3]) seems to be
the following in timerfd_setup():
if (texp != 0) {
if (flags & TFD_TIMER_ABSTIME)
texp = timens_ktime_to_host(clockid, texp);
if (isalarm(ctx)) {
if (flags & TFD_TIMER_ABSTIME)
alarm_start(&ctx->t.alarm, texp);
else
alarm_start_relative(&ctx->t.alarm, texp);
} else {
hrtimer_start(&ctx->t.tmr, texp, htmode);
}
if (timerfd_canceled(ctx))
return -ECANCELED;
}
Using a small test program [4] shows the behavior. The program loops,
repeatedly calling timerfd_settime() (with a delay of a few seconds
before each call). In another terminal window, enter the following
command a few times:
$ sudo date -s "5 seconds" # Add 5 secs to wall-clock time
I see behavior as follows (the /sudo date -s "5 seconds"/ command was
executed before loop iterations 0, 2, and 4):
[[
$ ./timerfd_settime_ECANCELED
0
Current time is 1585729978 secs, 868510078 nsecs
Timer value is now 0 secs, 0 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999991977 nsecs
1
Current time is 1585729982 secs, 716339545 nsecs
Timer value is now 6 secs, 152167990 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999992940 nsecs
2
Current time is 1585729991 secs, 567377831 nsecs
Timer value is now 1 secs, 148959376 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999976294 nsecs
3
Current time is 1585729995 secs, 405385503 nsecs
Timer value is now 6 secs, 161989917 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999993317 nsecs
4
Current time is 1585730004 secs, 225036165 nsecs
Timer value is now 1 secs, 180346909 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999984345 nsecs
]]
I note from the above.
(1) If the wall-clock is changed before the first timerfd_settime()
call, the call succeeds. This is of course expected.
(2) If the wall-clock is changed after a timerfd_settime() call, then
the next timerfd_settime() call fails with ECANCELED.
(3) Even if the timerfd_settime() call fails, the timer is still updated(!).
Some questions:
(a) What is the rationale for timerfd_settime() failing with ECANCELED
in this case? (Currently, the manual page says nothing about this.)
(b) It seems at the least surprising, but more likely a bug, that
timerfd_settime() fails with ECANCELED while at the same time
successfully updating the timer value.
Your thoughts?
Thanks,
Michael
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947091
[3]
commit 99ee5315dac6211e972fa3f23bcc9a0343ff58c4
Author: Thomas Gleixner <[email protected]>
Date: Wed Apr 27 14:16:42 2011 +0200
timerfd: Allow timers to be cancelled when clock was set
[4]
/* timerfd_settime_ECANCELED.c */
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/timerfd.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
int
main(int argc, char *argv[])
{
struct itimerspec ts, gts;
struct timespec start;
int tfd = timerfd_create(CLOCK_REALTIME, 0);
if (tfd == -1)
errExit("timerfd_create");
ts.it_interval.tv_sec = 0;
ts.it_interval.tv_nsec = 10;
int flags = TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET;
for (long j ; ; j++) {
/* Inject a delay into each loop, by calling getppid()
many times */
for (int k = 0; k < 10000000; k++)
getppid();
if (j % 1 == 0)
printf("%ld\n", j);
/* Display the current wall-clock time */
if (clock_gettime(CLOCK_REALTIME, &start) == -1)
errExit("clock_gettime");
printf("Current time is %ld secs, %ld nsecs\n",
start.tv_sec, start.tv_nsec);
/* Before resetting the timer, retrieve its current value
so that after the timerfd_settime() call, we can see
whether the the value has changed */
if (timerfd_gettime(tfd, >s) == -1)
perror("timerfd_gettime");
printf("Timer value is now %ld secs, %ld nsecs\n",
gts.it_value.tv_sec, gts.it_value.tv_nsec);
/* Reset the timer to now + 10 secs */
ts.it_value.tv_sec = start.tv_sec + 10;
ts.it_value.tv_nsec = start.tv_nsec;
if (timerfd_settime(tfd, flags, &ts, NULL) == -1)
perror("timerfd_settime");
else
printf("timerfd_settime() succeeded\n");
/* Display the timer value once again */
if (timerfd_gettime(tfd, >s) == -1)
perror("timerfd_gettime");
printf("Timer value is now %ld secs, %ld nsecs\n",
gts.it_value.tv_sec, gts.it_value.tv_nsec);
printf("\n");
}
}
On Wed, Apr 01, 2020 at 11:01:18AM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Thomas, et al,
>
> Following on from our discussion of read() on a timerfd [1], I
> happened to remember a Debian bug report [2] that points out that
> timer_settime() can fail with the error ECANCELED, which is both
> surprising and odd (because despite the error, the timer does get
> updated).
>
> The relevant kernel code (I think, from your commit [3]) seems to be
> the following in timerfd_setup():
>
> if (texp != 0) {
> if (flags & TFD_TIMER_ABSTIME)
> texp = timens_ktime_to_host(clockid, texp);
> if (isalarm(ctx)) {
> if (flags & TFD_TIMER_ABSTIME)
> alarm_start(&ctx->t.alarm, texp);
> else
> alarm_start_relative(&ctx->t.alarm, texp);
> } else {
> hrtimer_start(&ctx->t.tmr, texp, htmode);
> }
>
> if (timerfd_canceled(ctx))
> return -ECANCELED;
> }
>
> Using a small test program [4] shows the behavior. The program loops,
> repeatedly calling timerfd_settime() (with a delay of a few seconds
> before each call). In another terminal window, enter the following
> command a few times:
>
> $ sudo date -s "5 seconds" # Add 5 secs to wall-clock time
>
> I see behavior as follows (the /sudo date -s "5 seconds"/ command was
> executed before loop iterations 0, 2, and 4):
Hi Michael, I can be wrong (since I didn't look into timerfd code
for long time) but I guess if we wanna preserve the timer value
we will have to lock timekeeper which is inacceptable. Thus looks
like this is a tradeoff in a sake of speed (not sure though, better
wait for Thomas reply)
Michael,
"Michael Kerrisk (man-pages)" <[email protected]> writes:
> Following on from our discussion of read() on a timerfd [1], I
> happened to remember a Debian bug report [2] that points out that
> timer_settime() can fail with the error ECANCELED, which is both
> surprising and odd (because despite the error, the timer does get
> updated).
...
> (1) If the wall-clock is changed before the first timerfd_settime()
> call, the call succeeds. This is of course expected.
> (2) If the wall-clock is changed after a timerfd_settime() call, then
> the next timerfd_settime() call fails with ECANCELED.
> (3) Even if the timerfd_settime() call fails, the timer is still updated(!).
>
> Some questions:
> (a) What is the rationale for timerfd_settime() failing with ECANCELED
> in this case? (Currently, the manual page says nothing about this.)
> (b) It seems at the least surprising, but more likely a bug, that
> timerfd_settime() fails with ECANCELED while at the same time
> successfully updating the timer value.
Really good question and TBH I can't remember why this is implemented in
the way it is, but I have a faint memory that at least (a) is
intentional.
After staring at the code for a while I came up with the following
answers:
(a): If the clock was set event ("date -s ...") which triggered the
cancel was not yet consumed by user space via read(), then that
information would get lost because arming the timer to the new
value has to reset the state.
(b): Arming the timer in that case is indeed very questionable, but it
could be argued that because the clock was set event happened with
the old expiry value that the new expiry value is not affected.
I'd be happy to change that and not arm the timer in the case of a
pending cancel, but I fear that some user space already depends on
that behaviour.
Thanks,
tglx
Hi Thomas,
On 4/1/20 7:42 PM, Thomas Gleixner wrote:
> Michael,
>
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>> Following on from our discussion of read() on a timerfd [1], I
>> happened to remember a Debian bug report [2] that points out that
>> timer_settime() can fail with the error ECANCELED, which is both
>> surprising and odd (because despite the error, the timer does get
>> updated).
> ...
>> (1) If the wall-clock is changed before the first timerfd_settime()
>> call, the call succeeds. This is of course expected.
>> (2) If the wall-clock is changed after a timerfd_settime() call, then
>> the next timerfd_settime() call fails with ECANCELED.
>> (3) Even if the timerfd_settime() call fails, the timer is still updated(!).
>>
>> Some questions:
>> (a) What is the rationale for timerfd_settime() failing with ECANCELED
>> in this case? (Currently, the manual page says nothing about this.)
>> (b) It seems at the least surprising, but more likely a bug, that
>> timerfd_settime() fails with ECANCELED while at the same time
>> successfully updating the timer value.
>
> Really good question and TBH I can't remember why this is implemented in
> the way it is, but I have a faint memory that at least (a) is
> intentional.
>
> After staring at the code for a while I came up with the following
> answers:
>
> (a): If the clock was set event ("date -s ...") which triggered the
> cancel was not yet consumed by user space via read(), then that
> information would get lost because arming the timer to the new
> value has to reset the state.
>
> (b): Arming the timer in that case is indeed very questionable, but it
> could be argued that because the clock was set event happened with
> the old expiry value that the new expiry value is not affected.
>
> I'd be happy to change that and not arm the timer in the case of a
> pending cancel, but I fear that some user space already depends on
> that behaviour.
Yes, that's the risk, of course. So, shall we just document all
this in the manual page?
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Michael,
"Michael Kerrisk (man-pages)" <[email protected]> writes:
> On 4/1/20 7:42 PM, Thomas Gleixner wrote:
>> (b): Arming the timer in that case is indeed very questionable, but it
>> could be argued that because the clock was set event happened with
>> the old expiry value that the new expiry value is not affected.
>>
>> I'd be happy to change that and not arm the timer in the case of a
>> pending cancel, but I fear that some user space already depends on
>> that behaviour.
>
> Yes, that's the risk, of course. So, shall we just document all
> this in the manual page?
I think so.
Thanks,
tglx
"Michael Kerrisk (man-pages)" <[email protected]> writes:
> NOTES
> Suppose the following scenario for CLOCK_REALTIME or CLOCK_REAL‐
> TIME_ALARM timer that was created with timerfd_create():
>
> (a) The timer has been started (timerfd_settime()) with the
> TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET flags;
>
> (b) A discontinuous change (e.g. settimeofday(2)) is subsequently
> made to the CLOCK_REALTIME clock; and
>
> (c) the caller once more calls timerfd_settime() to rearm the
> timer (without first doing a read(2) on the file descriptor).
>
> In this case the following occurs:
>
> · The timerfd_settime() returns -1 with errno set to ECANCELED.
> (This enables the caller to know that the previous timer was
> affected by a discontinuous change to the clock.)
>
> · The timer is successfully rearmed with the settings provided in
> the second timerfd_settime() call. (This was probably an imple‐
> mentation accident, but won't be fixed now, in case there are
> applications that depend on this behaviour.)
Clear enough.
Thanks Michael!
On 4/2/20 10:49 AM, Thomas Gleixner wrote:
> Michael,
>
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>> On 4/1/20 7:42 PM, Thomas Gleixner wrote:
>>> (b): Arming the timer in that case is indeed very questionable, but it
>>> could be argued that because the clock was set event happened with
>>> the old expiry value that the new expiry value is not affected.
>>>
>>> I'd be happy to change that and not arm the timer in the case of a
>>> pending cancel, but I fear that some user space already depends on
>>> that behaviour.
>>
>> Yes, that's the risk, of course. So, shall we just document all
>> this in the manual page?
>
> I think so.
>
> Thanks,
Okay. How is the following?
NOTES
Suppose the following scenario for CLOCK_REALTIME or CLOCK_REAL‐
TIME_ALARM timer that was created with timerfd_create():
(a) The timer has been started (timerfd_settime()) with the
TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET flags;
(b) A discontinuous change (e.g. settimeofday(2)) is subsequently
made to the CLOCK_REALTIME clock; and
(c) the caller once more calls timerfd_settime() to rearm the
timer (without first doing a read(2) on the file descriptor).
In this case the following occurs:
· The timerfd_settime() returns -1 with errno set to ECANCELED.
(This enables the caller to know that the previous timer was
affected by a discontinuous change to the clock.)
· The timer is successfully rearmed with the settings provided in
the second timerfd_settime() call. (This was probably an imple‐
mentation accident, but won't be fixed now, in case there are
applications that depend on this behaviour.)
Thanks,
Michael
diff --git a/man2/timerfd_create.2 b/man2/timerfd_create.2
index ec137fbfe..98225dcad 100644
--- a/man2/timerfd_create.2
+++ b/man2/timerfd_create.2
@@ -477,6 +477,9 @@ is not a valid timerfd file descriptor.
.BR timerfd_settime ()
can also fail with the following errors:
.TP
+.B ECANCELED
+See NOTES.
+.TP
.B EINVAL
.I new_value
is not properly initialized (one of the
@@ -493,6 +496,52 @@ These system calls are available on Linux since kernel 2.6.
25.
Library support is provided by glibc since version 2.8.
.SH CONFORMING TO
These system calls are Linux-specific.
+.SH NOTES
+Suppose the following scenario for
+.BR CLOCK_REALTIME
+or
+.BR CLOCK_REALTIME_ALARM
+timer that was created with
+.BR timerfd_create ():
+.IP (a) 4
+The timer has been started
+.RB ( timerfd_settime ())
+with the
+.BR TFD_TIMER_ABSTIME
+and
+.BR TFD_TIMER_CANCEL_ON_SET
+flags;
+.IP (b)
+A discontinuous change (e.g.
+.BR settimeofday (2))
+is subsequently made to the
+.BR CLOCK_REALTIME
+clock; and
+.IP (c)
+the caller once more calls
+.BR timerfd_settime ()
+to rearm the timer (without first doing a
+.BR read (2)
+on the file descriptor).
+.PP
+In this case the following occurs:
+.IP \(bu 2
+The
+.BR timerfd_settime ()
+returns \-1 with
+.I errno
+set to
+.BR ECANCELED .
+(This enables the caller to know that the previous timer was affected
+by a discontinuous change to the clock.)
+.IP \(bu
+The timer
+.I "is successfully rearmed"
+with the settings provided in the second
+.BR timerfd_settime ()
+call.
+(This was probably an implementation accident, but won't be fixed now,
+in case there are applications that depend on this behaviour.)
.SH BUGS
Currently,
.\" 2.6.29
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
On 4/2/20 3:35 PM, Thomas Gleixner wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>> NOTES
>> Suppose the following scenario for CLOCK_REALTIME or CLOCK_REAL‐
>> TIME_ALARM timer that was created with timerfd_create():
>>
>> (a) The timer has been started (timerfd_settime()) with the
>> TFD_TIMER_ABSTIME and TFD_TIMER_CANCEL_ON_SET flags;
>>
>> (b) A discontinuous change (e.g. settimeofday(2)) is subsequently
>> made to the CLOCK_REALTIME clock; and
>>
>> (c) the caller once more calls timerfd_settime() to rearm the
>> timer (without first doing a read(2) on the file descriptor).
>>
>> In this case the following occurs:
>>
>> · The timerfd_settime() returns -1 with errno set to ECANCELED.
>> (This enables the caller to know that the previous timer was
>> affected by a discontinuous change to the clock.)
>>
>> · The timer is successfully rearmed with the settings provided in
>> the second timerfd_settime() call. (This was probably an imple‐
>> mentation accident, but won't be fixed now, in case there are
>> applications that depend on this behaviour.)
>
> Clear enough.
>
> Thanks Michael!
Thanks. Committed. (But, next time you change the API. maybe a
man-pages patch to go with that? :-).)
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Michael,
"Michael Kerrisk (man-pages)" <[email protected]> writes:
> Thanks. Committed. (But, next time you change the API. maybe a
> man-pages patch to go with that? :-).)
I try to remember.