2007-08-07 06:56:56

by Michael Kerrisk

[permalink] [raw]
Subject: Re: Problems with timerfd()

Andrew,

I'm working on the changes to timerfd(), but must admit I am struggling to
understand some of the kernel code for working with userspace timers (e.g.,
in kernel/posix-timers.c). Can you suggest anyone who could provide
assistance?

Cheers,

Michael

Andrew Morton wrote:
> On Wed, 25 Jul 2007 20:18:51 +0200
> Michael Kerrisk <[email protected]> wrote:
>
>> Andrew,
>>
>> Andrew Morton wrote:
>>> On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <[email protected]> wrote:
>>>
>>>> Andrew,
>>>>
>>>> The timerfd() syscall went into 2.6.22. While writing the man page for
>>>> this syscall I've found some notable limitations of the interface, and I am
>>>> wondering whether you and Linus would consider having this interface fixed
>>>> for 2.6.23.
>>>>
>>>> On the one hand, these fixes would be an ABI change, which is of course
>>>> bad. (However, as noted below, you have already accepted one of the ABI
>>>> changes that I suggested into -mm, after Davide submitted a patch.)
>>>>
>>>> On the other hand, the interface has not yet made its way into a glibc
>>>> release, and the change will not break applications. (The 2.6.22 version
>>>> of the interface would just be "broken".)
>>> I think if the need is sufficient we can do this: fix it in 2.6.23 and in
>>> 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels
>>> out in the wild, but very few I suspect.
>> So I'm still not quite clear. Can I take it from your statement above that
>> the proposed ABI changes would be admissible, as long as Davide is okay
>> with them?
>>
>
> yup, I'll send that diff into Linus and -stable and see what happens.
>

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.


2007-08-07 07:37:41

by Andrew Morton

[permalink] [raw]
Subject: Re: Problems with timerfd()

On Tue, 07 Aug 2007 08:55:01 +0200 Michael Kerrisk <[email protected]> wrote:

> I'm working on the changes to timerfd(), but must admit I am struggling to
> understand some of the kernel code for working with userspace timers (e.g.,
> in kernel/posix-timers.c).

Join the club.

> Can you suggest anyone who could provide assistance?

The code was originally contributed by George Anzinger. He has moved on
and Thomas Gleixner now does most work in there. I'd expect that Thomas is
your guy, but he is nearing the end of a two-week vacation and will
presumably return to an akpm-sized inbox, so I'd give him a little time ;)

2007-08-07 09:15:28

by Michael Kerrisk

[permalink] [raw]
Subject: Re: Problems with timerfd()

> On Tue, 07 Aug 2007 08:55:01 +0200 Michael Kerrisk <[email protected]>
> wrote:
>
> > I'm working on the changes to timerfd(), but must admit I am struggling
> > to understand some of the kernel code for working with userspace timers
> > (e.g., in kernel/posix-timers.c).
>
> Join the club.

Ahhh good to know I'm not the only one challeged by the code.

> > Can you suggest anyone who could provide assistance?
>
> The code was originally contributed by George Anzinger.

Yes, I pinged George a week or so back, but got no response.
(I think he's more or less retired nowadays.)

> He has moved on
> and Thomas Gleixner now does most work in there. I'd expect that Thomas
> is
> your guy, but he is nearing the end of a two-week vacation and will
> presumably return to an akpm-sized inbox, so I'd give him a little time
> ;)

Thanks -- Thomas CCed on this reply.

Unfortunately I myself go on holiday on Saturday for two weeks.
All of these holidays (I guess that Davide is only back in another
1.5 weeks or so) start to make the chances of getting change
in during the .23-rc window look tight...

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-08-09 21:14:03

by Michael Kerrisk

[permalink] [raw]
Subject: [PATCH] Revised timerfd() interface

Andrew,

Here's my first shot at changing the timerfd() interface; patch
is against 2.6.23-rc1-mm2.

I think I got to the bottom of understanding the timer code in
the end, but I may still have missed some things...

This is my first kernel patch of any substance. Perhaps Randy
or Christoph can give my toes a light toasting for the various
boobs I've likely made.

Thomas, would you please look over this patch to verify my
timer code?

Davide: would you please bless this interface change ;-).
It makes it possible to:

1) Fetch the previous timer settings (i.e., interval, and time
remaining until next expiration) while installing new settings

2) Retrieve the settings of an existing timer without changing
the timer.

Both of these features are supported by the older Unix timer APIs
(setitimer/getitimer and timer_create/timer_settime/timer_gettime).

The changes are as follows:

a) A further argument, outmr, can be used to retrieve the interval
and time remaining before the expiration of a previously created
timer. Thus the call signature is now:

timerfd(int utmr, int clockid, int flags,
struct itimerspec *utmr,
struct itimerspec *outmr)

The outmr argument:

1) must be NULL for a new timer (ufd == -1).
2) can be NULL if we are updating an existing
timer (i.e., utmr != NULL), but we are not
interested in the previous timer value.
3) can be used to retrieve the time until next timer
expiration, without changing the timer.
(In this case, utmr == NULL and ufd >= 0.)

b) Make the 'clockid' immutable: it can only be set
if 'ufd' is -1 -- that is, on the timerfd() call that
first creates a timer. (This is effectively
the limitation that is imposed by POSIX timers: the
clockid is specified when the timer is created with
timer_create(), and can't be changed.)

[In the 2.6.22 interface, the clockid of an existing
timer can be changed with a further call to timerfd()
that specifies the file descriptor of an existing timer.]

The use cases are as follows:

ufd = timerfd(-1, clockid, flags, utmr, NULL);
to create a new timer with given clockid, flags, and utmr (initial
expiration + interval). outmr must be NULL.

ufd = timerfd(ufd, -1, flags, utmr, NULL);
to change the flags and timer settings of an existing timer.
(The clockid argument serves no purpose when modifying an
existing timer, and as I've implemented things, the argument
must be specified as -1 for an existing timer.)

ufd = timerfd(ufd, -1, flags, utmr, &outmr);
to change the flags and timer settings of an existing timer, and
retrieve the time until the next expiration of the timer (and
the associated interval).

ufd = timerfd(ufd, -1, 0, NULL, &outmr);
Return the time until the next expiration of the timer (and the
associated interval), without changing the existing timer settings.
The flags argument has no meaning in this case, and must be
specified as 0.

Other changes:

* I've added checks for various combinations of arguments values
that could be deemed invalid; there is probably room for debate
about whether we want all of these checks. Comments in the
patch describe these checks.

* Appropriate, and possibly even correct, changes made in fs/compat.c.

* Jan Engelhardt noted that a cast could be removed from Davide's
original code; I've removed it.

The changes are correct as far as I've tested them. I've attached a
test program. Davide, could you subject this to further test please?

I'm off on holiday the day after tomorrow, back in two weeks, with
very limited computer access. I may have time for another pass of
the code if comments come in over(euro)night, otherwise Davide may
be willing to clean up whatever I messed up...

Cheers,

Michael

diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c
--- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43:30.000000000 +0200
+++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000 +0200
@@ -2211,18 +2211,38 @@
#ifdef CONFIG_TIMERFD

asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
- const struct compat_itimerspec __user *utmr)
+ const struct compat_itimerspec __user *utmr,
+ struct compat_itimerspec __user *old_utmr)
{
struct itimerspec t;
struct itimerspec __user *ut;
+ long ret;

- if (get_compat_itimerspec(&t, utmr))
- return -EFAULT;
- ut = compat_alloc_user_space(sizeof(*ut));
- if (copy_to_user(ut, &t, sizeof(t)))
- return -EFAULT;
+ /*:mtk: Framework taken from compat_sys_mq_getsetattr() */

- return sys_timerfd(ufd, clockid, flags, ut);
+ ut = compat_alloc_user_space(2 * sizeof(*ut));
+
+ if (utmr) {
+ if (get_compat_itimerspec(&t, utmr))
+ return -EFAULT;
+ if (copy_to_user(ut, &t, sizeof(t)))
+ return -EFAULT;
+ }
+
+ ret = sys_timerfd(ufd, clockid, flags,
+ utmr ? ut : NULL,
+ old_utmr ? ut + 1 : NULL);
+
+ if (ret)
+ return ret;
+
+ if (old_utmr) {
+ if (copy_from_user(&t, old_ut, sizeof(t)) ||
+ put_compat_itimerspec(&t, old_utmr))
+ return -EFAULT;
+ }
+
+ return 0;
}

#endif /* CONFIG_TIMERFD */
diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c
--- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44:24.000000000 +0200
+++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000 +0200
@@ -2,6 +2,8 @@
* fs/timerfd.c
*
* Copyright (C) 2007 Davide Libenzi <[email protected]>
+ * and Copyright (C) 2007 Google, Inc.
+ * (Author: Michael Kerrisk <[email protected]>)
*
*
* Thanks to Thomas Gleixner for code reviews and useful comments.
@@ -26,6 +28,12 @@
ktime_t tintv;
wait_queue_head_t wqh;
int expired;
+ int clockid; /* Established when timer is created, then
+ immutable */
+ int overrun; /* Number of overruns for this timer so far,
+ updated on calls to timerfd() that retrieve
+ settings of an existing timer, reset to zero
+ by timerfd_read() */
};

/*
@@ -61,6 +69,7 @@
hrtimer_init(&ctx->tmr, clockid, htmode);
ctx->tmr.expires = texp;
ctx->tmr.function = timerfd_tmrproc;
+ ctx->overrun = 0;
if (texp.tv64 != 0)
hrtimer_start(&ctx->tmr, texp, htmode);
}
@@ -130,10 +139,11 @@
* callback to avoid DoS attacks specifying a very
* short timer period.
*/
- ticks = (u64)
+ ticks = ctx->overrun +
hrtimer_forward(&ctx->tmr,
hrtimer_cb_get_time(&ctx->tmr),
ctx->tintv);
+ ctx->overrun = 0;
hrtimer_restart(&ctx->tmr);
} else
ticks = 1;
@@ -151,24 +161,69 @@
};

asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
- const struct itimerspec __user *utmr)
+ const struct itimerspec __user *utmr,
+ struct itimerspec __user *old_utmr)
{
int error;
struct timerfd_ctx *ctx;
struct file *file;
struct inode *inode;
- struct itimerspec ktmr;
+ struct itimerspec ktmr, oktmr;

- if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
- return -EFAULT;
+ /*
+ * Not sure if we really need the first check below -- it
+ * relies on all clocks having a non-negative value. That's
+ * currently true, but I'm not sure that it is anywhere
+ * documented as a requirement.
+ * (The point of the check is that clockid has no meaning if
+ * we are changing the settings of an existing timer
+ * (i.e., ufd >= 0), so let's require it to be an otherwise
+ * unused value.)
+ */

- if (clockid != CLOCK_MONOTONIC &&
- clockid != CLOCK_REALTIME)
+ if (ufd >= 0) {
+ if (clockid != -1)
+ return -EINVAL;
+ } else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME)
+ return -EINVAL;
+
+ /*
+ * Disallow any other bits in flags than those we implement.
+ */
+ if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
+ return -EINVAL;
+
+ /*
+ * Not sure if this check is strictly necessary, except to stop
+ * userspace trying to do something silly. Flags only has meaning
+ * if we are creating/modifying a timer.
+ */
+ if (utmr == NULL && flags != 0)
return -EINVAL;
- if (!timespec_valid(&ktmr.it_value) ||
- !timespec_valid(&ktmr.it_interval))
+
+ /*
+ * If we are creating a new timer, then we must supply the setting
+ * and there is no previous timer setting to retrieve.
+ */
+ if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
+ return -EINVAL;
+
+ /*
+ * Caller must provide a new timer setting, or ask for previous
+ * setting, or both.
+ */
+ if (utmr == NULL && old_utmr == NULL)
return -EINVAL;

+ if (utmr) {
+ if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
+ return -EFAULT;
+
+ if (!timespec_valid(&ktmr.it_value) ||
+ !timespec_valid(&ktmr.it_interval))
+ return -EINVAL;
+ }
+
if (ufd == -1) {
ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@@ -176,6 +231,11 @@

init_waitqueue_head(&ctx->wqh);

+ /*
+ * Save the clockid since we need it later if we modify
+ * the timer.
+ */
+ ctx->clockid = clockid;
timerfd_setup(ctx, clockid, flags, &ktmr);

/*
@@ -195,23 +255,61 @@
fput(file);
return -EINVAL;
}
- /*
- * We need to stop the existing timer before reprogramming
- * it to the new values.
- */
- for (;;) {
- spin_lock_irq(&ctx->wqh.lock);
- if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
- break;
- spin_unlock_irq(&ctx->wqh.lock);
- cpu_relax();
+
+ if (old_utmr) {
+
+ /*
+ * Retrieve interval and time remaining before
+ * next expiration of timer.
+ */
+
+ struct hrtimer *timer = &ctx->tmr;
+ ktime_t iv = ctx->tintv;
+ ktime_t now, remaining;
+
+ clockid = ctx->clockid;
+
+ memset(&oktmr, 0, sizeof(struct itimerspec));
+ if (iv.tv64)
+ oktmr.it_interval = ktime_to_timespec(iv);
+
+ now = timer->base->get_time();
+ ctx->overrun += hrtimer_forward(&ctx->tmr,
+ hrtimer_cb_get_time(&ctx->tmr),
+ ctx->tintv);
+ remaining = ktime_sub(timer->expires, now);
+ if (remaining.tv64 <= 0)
+ oktmr.it_value.tv_nsec = 1;
+ else
+ oktmr.it_value = ktime_to_timespec(remaining);
+
+ if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) {
+ fput(file);
+ return -EFAULT;
+ }
}
- /*
- * Re-program the timer to the new value ...
- */
- timerfd_setup(ctx, clockid, flags, &ktmr);

- spin_unlock_irq(&ctx->wqh.lock);
+ if (utmr) {
+ /*
+ * Modify settings of existing timer.
+ * We need to stop the existing timer before
+ * reprogramming it to the new values.
+ */
+ for (;;) {
+ spin_lock_irq(&ctx->wqh.lock);
+ if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
+ break;
+ spin_unlock_irq(&ctx->wqh.lock);
+ cpu_relax();
+ }
+
+ /*
+ * Re-program the timer to the new value ...
+ */
+ timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
+
+ spin_unlock_irq(&ctx->wqh.lock);
+ }
fput(file);
}

@@ -222,4 +320,3 @@
kfree(ctx);
return error;
}
-
diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h
--- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46:33.000000000 +0200
@@ -265,7 +265,8 @@
const compat_sigset_t __user *sigmask,
compat_size_t sigsetsize);
asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
- const struct compat_itimerspec __user *utmr);
+ const struct compat_itimerspec __user *utmr,
+ struct compat_itimerspec __user *old_utmr);

#endif /* CONFIG_COMPAT */
#endif /* _LINUX_COMPAT_H */
diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h
--- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05 10:44:32.000000000 +0200
+++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47:15.000000000 +0200
@@ -608,7 +608,8 @@
asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
- const struct itimerspec __user *utmr);
+ const struct itimerspec __user *utmr,
+ struct itimerspec __user *old_utmr);
asmlinkage long sys_eventfd(unsigned int count);
asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);



Attachments:
new_timerfd_errors_test.c (9.49 kB)

2007-08-13 23:29:28

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote:

> Andrew,
>
> Here's my first shot at changing the timerfd() interface; patch
> is against 2.6.23-rc1-mm2.
>
> I think I got to the bottom of understanding the timer code in
> the end, but I may still have missed some things...
>
> This is my first kernel patch of any substance. Perhaps Randy
> or Christoph can give my toes a light toasting for the various
> boobs I've likely made.
>
> Thomas, would you please look over this patch to verify my
> timer code?
>
> Davide: would you please bless this interface change ;-).
> It makes it possible to:
>
> 1) Fetch the previous timer settings (i.e., interval, and time
> remaining until next expiration) while installing new settings
>
> 2) Retrieve the settings of an existing timer without changing
> the timer.
>
> Both of these features are supported by the older Unix timer APIs
> (setitimer/getitimer and timer_create/timer_settime/timer_gettime).

Hi,

OK, I'll make a few comments.

1. Provide a full changelog, including reasons why the patch is
needed. Don't assume that we have read the previous email threads.
(We haven't. :)


> The changes are as follows:
>
> a) A further argument, outmr, can be used to retrieve the interval
> and time remaining before the expiration of a previously created
> timer. Thus the call signature is now:
>
> timerfd(int utmr, int clockid, int flags,

ufd,

> struct itimerspec *utmr,
> struct itimerspec *outmr)
>
> The outmr argument:
>
> 1) must be NULL for a new timer (ufd == -1).
> 2) can be NULL if we are updating an existing
> timer (i.e., utmr != NULL), but we are not
> interested in the previous timer value.
> 3) can be used to retrieve the time until next timer
> expiration, without changing the timer.
> (In this case, utmr == NULL and ufd >= 0.)
>
> b) Make the 'clockid' immutable: it can only be set
> if 'ufd' is -1 -- that is, on the timerfd() call that
> first creates a timer. (This is effectively
> the limitation that is imposed by POSIX timers: the
> clockid is specified when the timer is created with
> timer_create(), and can't be changed.)
>
> [In the 2.6.22 interface, the clockid of an existing
> timer can be changed with a further call to timerfd()
> that specifies the file descriptor of an existing timer.]
>
> The use cases are as follows:
>
> ufd = timerfd(-1, clockid, flags, utmr, NULL);
> to create a new timer with given clockid, flags, and utmr (initial
> expiration + interval). outmr must be NULL.
>
> ufd = timerfd(ufd, -1, flags, utmr, NULL);
> to change the flags and timer settings of an existing timer.
> (The clockid argument serves no purpose when modifying an
> existing timer, and as I've implemented things, the argument
> must be specified as -1 for an existing timer.)
>
> ufd = timerfd(ufd, -1, flags, utmr, &outmr);
> to change the flags and timer settings of an existing timer, and
> retrieve the time until the next expiration of the timer (and
> the associated interval).
>
> ufd = timerfd(ufd, -1, 0, NULL, &outmr);
> Return the time until the next expiration of the timer (and the
> associated interval), without changing the existing timer settings.
> The flags argument has no meaning in this case, and must be
> specified as 0.
>
> Other changes:
>
> * I've added checks for various combinations of arguments values
> that could be deemed invalid; there is probably room for debate
> about whether we want all of these checks. Comments in the
> patch describe these checks.
>
> * Appropriate, and possibly even correct, changes made in fs/compat.c.
>
> * Jan Engelhardt noted that a cast could be removed from Davide's
> original code; I've removed it.
>
> The changes are correct as far as I've tested them. I've attached a
> test program. Davide, could you subject this to further test please?
>
> I'm off on holiday the day after tomorrow, back in two weeks, with
> very limited computer access. I may have time for another pass of
> the code if comments come in over(euro)night, otherwise Davide may
> be willing to clean up whatever I messed up...
>
> Cheers,
>
> Michael
>
> diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c
> --- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43:30.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000 +0200
> @@ -2211,18 +2211,38 @@
> #ifdef CONFIG_TIMERFD
>
> asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> - const struct compat_itimerspec __user *utmr)
> + const struct compat_itimerspec __user *utmr,
> + struct compat_itimerspec __user *old_utmr)
> {
> struct itimerspec t;
> struct itimerspec __user *ut;
> + long ret;
>
> - if (get_compat_itimerspec(&t, utmr))
> - return -EFAULT;
> - ut = compat_alloc_user_space(sizeof(*ut));
> - if (copy_to_user(ut, &t, sizeof(t)))
> - return -EFAULT;
> + /*:mtk: Framework taken from compat_sys_mq_getsetattr() */

Drop the :mtk: string(s).

> - return sys_timerfd(ufd, clockid, flags, ut);
> + ut = compat_alloc_user_space(2 * sizeof(*ut));
> +
> + if (utmr) {
> + if (get_compat_itimerspec(&t, utmr))
> + return -EFAULT;
> + if (copy_to_user(ut, &t, sizeof(t)))
> + return -EFAULT;
> + }
> +
> + ret = sys_timerfd(ufd, clockid, flags,
> + utmr ? ut : NULL,
> + old_utmr ? ut + 1 : NULL);
> +
> + if (ret)
> + return ret;
> +
> + if (old_utmr) {
> + if (copy_from_user(&t, old_ut, sizeof(t)) ||

I can't find <old_ut> anywhere. Should be old_utmr I guess.

> + put_compat_itimerspec(&t, old_utmr))

put_compat_itimerspec(old_utmr, &t))

IOW, I think that the parameters are reversed (at least this way
their types match the function prototype).

> + return -EFAULT;
> + }
> +
> + return 0;
> }
>
> #endif /* CONFIG_TIMERFD */
> diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c
> --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44:24.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000 +0200
> @@ -2,6 +2,8 @@
> * fs/timerfd.c
> *
> * Copyright (C) 2007 Davide Libenzi <[email protected]>
> + * and Copyright (C) 2007 Google, Inc.
> + * (Author: Michael Kerrisk <[email protected]>)
> *
> *
> * Thanks to Thomas Gleixner for code reviews and useful comments.
> @@ -26,6 +28,12 @@
> ktime_t tintv;
> wait_queue_head_t wqh;
> int expired;
> + int clockid; /* Established when timer is created, then
> + immutable */
> + int overrun; /* Number of overruns for this timer so far,
> + updated on calls to timerfd() that retrieve
> + settings of an existing timer, reset to zero
> + by timerfd_read() */
> };
>
> /*
> @@ -61,6 +69,7 @@
> hrtimer_init(&ctx->tmr, clockid, htmode);
> ctx->tmr.expires = texp;
> ctx->tmr.function = timerfd_tmrproc;
> + ctx->overrun = 0;
> if (texp.tv64 != 0)
> hrtimer_start(&ctx->tmr, texp, htmode);
> }
> @@ -130,10 +139,11 @@
> * callback to avoid DoS attacks specifying a very
> * short timer period.
> */
> - ticks = (u64)
> + ticks = ctx->overrun +
> hrtimer_forward(&ctx->tmr,
> hrtimer_cb_get_time(&ctx->tmr),
> ctx->tintv);
> + ctx->overrun = 0;
> hrtimer_restart(&ctx->tmr);
> } else
> ticks = 1;
> @@ -151,24 +161,69 @@
> };
>
> asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> - const struct itimerspec __user *utmr)
> + const struct itimerspec __user *utmr,
> + struct itimerspec __user *old_utmr)
> {
> int error;
> struct timerfd_ctx *ctx;
> struct file *file;
> struct inode *inode;
> - struct itimerspec ktmr;
> + struct itimerspec ktmr, oktmr;
>
> - if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> - return -EFAULT;
> + /*
> + * Not sure if we really need the first check below -- it
> + * relies on all clocks having a non-negative value. That's
> + * currently true, but I'm not sure that it is anywhere
> + * documented as a requirement.
> + * (The point of the check is that clockid has no meaning if
> + * we are changing the settings of an existing timer
> + * (i.e., ufd >= 0), so let's require it to be an otherwise
> + * unused value.)
> + */
>
> - if (clockid != CLOCK_MONOTONIC &&
> - clockid != CLOCK_REALTIME)
> + if (ufd >= 0) {
> + if (clockid != -1)
> + return -EINVAL;
> + } else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME)
> + return -EINVAL;
> +
> + /*
> + * Disallow any other bits in flags than those we implement.
> + */
> + if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
> + return -EINVAL;
> +
> + /*
> + * Not sure if this check is strictly necessary, except to stop

Use tab+space instead of spaces.

> + * userspace trying to do something silly. Flags only has meaning
> + * if we are creating/modifying a timer.
> + */
> + if (utmr == NULL && flags != 0)
> return -EINVAL;
> - if (!timespec_valid(&ktmr.it_value) ||
> - !timespec_valid(&ktmr.it_interval))
> +
> + /*
> + * If we are creating a new timer, then we must supply the setting
> + * and there is no previous timer setting to retrieve.
> + */
> + if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
> + return -EINVAL;
> +
> + /*
> + * Caller must provide a new timer setting, or ask for previous
> + * setting, or both.
> + */
> + if (utmr == NULL && old_utmr == NULL)
> return -EINVAL;
>
> + if (utmr) {
> + if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> + return -EFAULT;
> +
> + if (!timespec_valid(&ktmr.it_value) ||
> + !timespec_valid(&ktmr.it_interval))
> + return -EINVAL;
> + }
> +
> if (ufd == -1) {
> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> @@ -176,6 +231,11 @@
>
> init_waitqueue_head(&ctx->wqh);
>
> + /*
> + * Save the clockid since we need it later if we modify
> + * the timer.
> + */
> + ctx->clockid = clockid;
> timerfd_setup(ctx, clockid, flags, &ktmr);
>
> /*
> @@ -195,23 +255,61 @@
> fput(file);
> return -EINVAL;
> }
> - /*
> - * We need to stop the existing timer before reprogramming
> - * it to the new values.
> - */
> - for (;;) {
> - spin_lock_irq(&ctx->wqh.lock);
> - if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> - break;
> - spin_unlock_irq(&ctx->wqh.lock);
> - cpu_relax();
> +
> + if (old_utmr) {
> +
> + /*
> + * Retrieve interval and time remaining before
> + * next expiration of timer.
> + */
> +
> + struct hrtimer *timer = &ctx->tmr;
> + ktime_t iv = ctx->tintv;
> + ktime_t now, remaining;
> +
> + clockid = ctx->clockid;
> +
> + memset(&oktmr, 0, sizeof(struct itimerspec));
> + if (iv.tv64)
> + oktmr.it_interval = ktime_to_timespec(iv);
> +
> + now = timer->base->get_time();
> + ctx->overrun += hrtimer_forward(&ctx->tmr,
> + hrtimer_cb_get_time(&ctx->tmr),
> + ctx->tintv);
> + remaining = ktime_sub(timer->expires, now);
> + if (remaining.tv64 <= 0)
> + oktmr.it_value.tv_nsec = 1;
> + else
> + oktmr.it_value = ktime_to_timespec(remaining);
> +
> + if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) {
> + fput(file);
> + return -EFAULT;
> + }
> }
> - /*
> - * Re-program the timer to the new value ...
> - */
> - timerfd_setup(ctx, clockid, flags, &ktmr);
>
> - spin_unlock_irq(&ctx->wqh.lock);
> + if (utmr) {
> + /*
> + * Modify settings of existing timer.
> + * We need to stop the existing timer before
> + * reprogramming it to the new values.
> + */
> + for (;;) {
> + spin_lock_irq(&ctx->wqh.lock);
> + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> + break;
> + spin_unlock_irq(&ctx->wqh.lock);
> + cpu_relax();
> + }
> +
> + /*
> + * Re-program the timer to the new value ...
> + */
> + timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
> +
> + spin_unlock_irq(&ctx->wqh.lock);
> + }
> fput(file);
> }
>
> @@ -222,4 +320,3 @@
> kfree(ctx);
> return error;
> }
> -
> diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h
> --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32:17.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46:33.000000000 +0200
> @@ -265,7 +265,8 @@
> const compat_sigset_t __user *sigmask,
> compat_size_t sigsetsize);
> asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> - const struct compat_itimerspec __user *utmr);
> + const struct compat_itimerspec __user *utmr,
> + struct compat_itimerspec __user *old_utmr);
>
> #endif /* CONFIG_COMPAT */
> #endif /* _LINUX_COMPAT_H */
> diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h
> --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05 10:44:32.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47:15.000000000 +0200
> @@ -608,7 +608,8 @@
> asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
> asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
> asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> - const struct itimerspec __user *utmr);
> + const struct itimerspec __user *utmr,
> + struct itimerspec __user *old_utmr);
> asmlinkage long sys_eventfd(unsigned int count);
> asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-08-15 14:40:27

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Sorry for the late commentary...as I looked this over, one thing popped
into my mind....

> b) Make the 'clockid' immutable: it can only be set
> if 'ufd' is -1 -- that is, on the timerfd() call that
> first creates a timer.

timerfd() is looking increasingly like a multiplexor system call in
disguise. There is no form of invocation which uses all of the
arguments. Are we sure we wouldn't rather have something like:

timerfd_open(clock);
timerfd_adjust(fd, new_time, old_time);

?

jon

2007-08-25 06:41:30

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

[resend, since LKML didn't like last send.]

Randy,

Thanks for the input. I will try to send a revised patch after I get back
from holiday. Some comments below.

Davide -- ping! Can you please offer your comments about this change, and
also thoughts on Jon's and my comments about a more radical API change
later in this thread.

Cheers,

Michael


Cheers,

Michael

On 8/14/07, Randy Dunlap <[email protected]> wrote:
>
> On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote:
>
> > Andrew,
> >
> > Here's my first shot at changing the timerfd() interface; patch
> > is against 2.6.23-rc1-mm2.
> >
> > I think I got to the bottom of understanding the timer code in
> > the end, but I may still have missed some things...
> >
> > This is my first kernel patch of any substance. Perhaps Randy
> > or Christoph can give my toes a light toasting for the various
> > boobs I've likely made.
> >
> > Thomas, would you please look over this patch to verify my
> > timer code?
> >
> > Davide: would you please bless this interface change ;-).
> > It makes it possible to:
> >
> > 1) Fetch the previous timer settings (i.e., interval, and time
> > remaining until next expiration) while installing new settings
> >
> > 2) Retrieve the settings of an existing timer without changing
> > the timer.
> >
> > Both of these features are supported by the older Unix timer APIs
> > (setitimer/getitimer and timer_create/timer_settime/timer_gettime).
>
> Hi,
>
> OK, I'll make a few comments.
>
> 1. Provide a full changelog, including reasons why the patch is
> needed. Don't assume that we have read the previous email threads.
> (We haven't. :)


Okay -- will do for a future revision of this patch.

In the meantime, the most relevant earlier mail thread is this:

http://article.gmane.org/gmane.linux.kernel/559193

The following are also of some relevance:

http://article.gmane.org/gmane.linux.kernel/556043
http://article.gmane.org/gmane.linux.kernel/556124

> The changes are as follows:
> >
> > a) A further argument, outmr, can be used to retrieve the interval
> > and time remaining before the expiration of a previously created
> > timer. Thus the call signature is now:
> >
> > timerfd(int utmr, int clockid, int flags,
>
> ufd,

Yes.

> struct itimerspec *utmr,
> > struct itimerspec *outmr)
> >
> > The outmr argument:
> >
> > 1) must be NULL for a new timer (ufd == -1).
> > 2) can be NULL if we are updating an existing
> > timer (i.e., utmr != NULL), but we are not
> > interested in the previous timer value.
> > 3) can be used to retrieve the time until next timer
> > expiration, without changing the timer.
> > (In this case, utmr == NULL and ufd >= 0.)
> >
> > b) Make the 'clockid' immutable: it can only be set
> > if 'ufd' is -1 -- that is, on the timerfd() call that
> > first creates a timer. (This is effectively
> > the limitation that is imposed by POSIX timers: the
> > clockid is specified when the timer is created with
> > timer_create(), and can't be changed.)
> >
> > [In the 2.6.22 interface, the clockid of an existing
> > timer can be changed with a further call to timerfd()
> > that specifies the file descriptor of an existing timer.]
> >
> > The use cases are as follows:
> >
> > ufd = timerfd(-1, clockid, flags, utmr, NULL);
> > to create a new timer with given clockid, flags, and utmr (initial
> > expiration + interval). outmr must be NULL.
> >
> > ufd = timerfd(ufd, -1, flags, utmr, NULL);
> > to change the flags and timer settings of an existing timer.
> > (The clockid argument serves no purpose when modifying an
> > existing timer, and as I've implemented things, the argument
> > must be specified as -1 for an existing timer.)
> >
> > ufd = timerfd(ufd, -1, flags, utmr, &outmr);
> > to change the flags and timer settings of an existing timer, and
> > retrieve the time until the next expiration of the timer (and
> > the associated interval).
> >
> > ufd = timerfd(ufd, -1, 0, NULL, &outmr);
> > Return the time until the next expiration of the timer (and the
> > associated interval), without changing the existing timer settings.
> > The flags argument has no meaning in this case, and must be
> > specified as 0.
> >
> > Other changes:
> >
> > * I've added checks for various combinations of arguments values
> > that could be deemed invalid; there is probably room for debate
> > about whether we want all of these checks. Comments in the
> > patch describe these checks.
> >
> > * Appropriate, and possibly even correct, changes made in fs/compat.c.
> >
> > * Jan Engelhardt noted that a cast could be removed from Davide's
> > original code; I've removed it.
> >
> > The changes are correct as far as I've tested them. I've attached a
> > test program. Davide, could you subject this to further test please?
> >
> > I'm off on holiday the day after tomorrow, back in two weeks, with
> > very limited computer access. I may have time for another pass of
> > the code if comments come in over(euro)night, otherwise Davide may
> > be willing to clean up whatever I messed up...
> >
> > Cheers,
> >
> > Michael
> >
> > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2
> /fs/compat.c
> > --- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43:
> 30.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000+0200
> > @@ -2211,18 +2211,38 @@
> > #ifdef CONFIG_TIMERFD
> >
> > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> > - const struct compat_itimerspec __user
> *utmr)
> > + const struct compat_itimerspec __user *utmr,
> > + struct compat_itimerspec __user *old_utmr)
> > {
> > struct itimerspec t;
> > struct itimerspec __user *ut;
> > + long ret;
> >
> > - if (get_compat_itimerspec(&t, utmr))
> > - return -EFAULT;
> > - ut = compat_alloc_user_space(sizeof(*ut));
> > - if (copy_to_user(ut, &t, sizeof(t)))
> > - return -EFAULT;
> > + /*:mtk: Framework taken from compat_sys_mq_getsetattr() */
>
> Drop the :mtk: string(s).


Yes, that was just a marker to myself for comments that should eventually
be removed.

> - return sys_timerfd(ufd, clockid, flags, ut);
> > + ut = compat_alloc_user_space(2 * sizeof(*ut));
> > +
> > + if (utmr) {
> > + if (get_compat_itimerspec(&t, utmr))
> > + return -EFAULT;
> > + if (copy_to_user(ut, &t, sizeof(t)))
> > + return -EFAULT;
> > + }
> > +
> > + ret = sys_timerfd(ufd, clockid, flags,
> > + utmr ? ut : NULL,
> > + old_utmr ? ut + 1 : NULL);
> > +
> > + if (ret)
> > + return ret;
> > +
> > + if (old_utmr) {
> > + if (copy_from_user(&t, old_ut, sizeof(t)) ||
>
> I can't find <old_ut> anywhere. Should be old_utmr I guess.
>
> > + put_compat_itimerspec(&t, old_utmr))
>
> put_compat_itimerspec(old_utmr, &t))
>
> IOW, I think that the parameters are reversed (at least this way
> their types match the function prototype).


I'm not sure what happened there. I had a compiled, working kernel with the patch. Somehow I've messed up while making the patch, I guess.

> + return -EFAULT;
> > + }
> > +
> > + return 0;
> > }
> >
> > #endif /* CONFIG_TIMERFD */
> > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2
> /fs/timerfd.c
> > --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44:
> 24.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000+0200
> > @@ -2,6 +2,8 @@
> > * fs/timerfd.c
> > *
> > * Copyright (C) 2007 Davide Libenzi <[email protected]>
> > + * and Copyright (C) 2007 Google, Inc.
> > + * (Author: Michael Kerrisk <[email protected]>)
> > *
> > *
> > * Thanks to Thomas Gleixner for code reviews and useful comments.
> > @@ -26,6 +28,12 @@
> > ktime_t tintv;
> > wait_queue_head_t wqh;
> > int expired;
> > + int clockid; /* Established when timer is created, then
> > + immutable */
> > + int overrun; /* Number of overruns for this timer so far,
> > + updated on calls to timerfd() that retrieve
> > + settings of an existing timer, reset to zero
> > + by timerfd_read() */
> > };
> >
> > /*
> > @@ -61,6 +69,7 @@
> > hrtimer_init(&ctx->tmr, clockid, htmode);
> > ctx->tmr.expires = texp;
> > ctx->tmr.function = timerfd_tmrproc;
> > + ctx->overrun = 0;
> > if (texp.tv64 != 0)
> > hrtimer_start(&ctx->tmr, texp, htmode);
> > }
> > @@ -130,10 +139,11 @@
> > * callback to avoid DoS attacks specifying a very
> > * short timer period.
> > */
> > - ticks = (u64)
> > + ticks = ctx->overrun +
> > hrtimer_forward(&ctx->tmr,
> >
> hrtimer_cb_get_time(&ctx->tmr),
> > ctx->tintv);
> > + ctx->overrun = 0;
> > hrtimer_restart(&ctx->tmr);
> > } else
> > ticks = 1;
> > @@ -151,24 +161,69 @@
> > };
> >
> > asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > - const struct itimerspec __user *utmr)
> > + const struct itimerspec __user *utmr,
> > + struct itimerspec __user *old_utmr)
> > {
> > int error;
> > struct timerfd_ctx *ctx;
> > struct file *file;
> > struct inode *inode;
> > - struct itimerspec ktmr;
> > + struct itimerspec ktmr, oktmr;
> >
> > - if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > - return -EFAULT;
> > + /*
> > + * Not sure if we really need the first check below -- it
> > + * relies on all clocks having a non-negative value. That's
> > + * currently true, but I'm not sure that it is anywhere
> > + * documented as a requirement.
> > + * (The point of the check is that clockid has no meaning if
> > + * we are changing the settings of an existing timer
> > + * (i.e., ufd >= 0), so let's require it to be an otherwise
> > + * unused value.)
> > + */
> >
> > - if (clockid != CLOCK_MONOTONIC &&
> > - clockid != CLOCK_REALTIME)
> > + if (ufd >= 0) {
> > + if (clockid != -1)
> > + return -EINVAL;
> > + } else if (clockid != CLOCK_MONOTONIC && clockid !=
> CLOCK_REALTIME)
> > + return -EINVAL;
> > +
> > + /*
> > + * Disallow any other bits in flags than those we implement.
> > + */
> > + if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
> > + return -EINVAL;
> > +
> > + /*
> > + * Not sure if this check is strictly necessary, except to stop
>
> Use tab+space instead of spaces.


Doh! Yes.

> + * userspace trying to do something silly. Flags only has meaning
> > + * if we are creating/modifying a timer.
> > + */
> > + if (utmr == NULL && flags != 0)
> > return -EINVAL;
> > - if (!timespec_valid(&ktmr.it_value) ||
> > - !timespec_valid(&ktmr.it_interval))
> > +
> > + /*
> > + * If we are creating a new timer, then we must supply the setting
> > + * and there is no previous timer setting to retrieve.
> > + */
> > + if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
> > + return -EINVAL;
> > +
> > + /*
> > + * Caller must provide a new timer setting, or ask for previous
> > + * setting, or both.
> > + */
> > + if (utmr == NULL && old_utmr == NULL)
> > return -EINVAL;
> >
> > + if (utmr) {
> > + if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > + return -EFAULT;
> > +
> > + if (!timespec_valid(&ktmr.it_value) ||
> > + !timespec_valid(&ktmr.it_interval))
> > + return -EINVAL;
> > + }
> > +
> > if (ufd == -1) {
> > ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> > if (!ctx)
> > @@ -176,6 +231,11 @@
> >
> > init_waitqueue_head(&ctx->wqh);
> >
> > + /*
> > + * Save the clockid since we need it later if we modify
> > + * the timer.
> > + */
> > + ctx->clockid = clockid;
> > timerfd_setup(ctx, clockid, flags, &ktmr);
> >
> > /*
> > @@ -195,23 +255,61 @@
> > fput(file);
> > return -EINVAL;
> > }
> > - /*
> > - * We need to stop the existing timer before reprogramming
> > - * it to the new values.
> > - */
> > - for (;;) {
> > - spin_lock_irq(&ctx->wqh.lock);
> > - if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> > - break;
> > - spin_unlock_irq(&ctx->wqh.lock);
> > - cpu_relax();
> > +
> > + if (old_utmr) {
> > +
> > + /*
> > + * Retrieve interval and time remaining before
> > + * next expiration of timer.
> > + */
> > +
> > + struct hrtimer *timer = &ctx->tmr;
> > + ktime_t iv = ctx->tintv;
> > + ktime_t now, remaining;
> > +
> > + clockid = ctx->clockid;
> > +
> > + memset(&oktmr, 0, sizeof(struct itimerspec));
> > + if (iv.tv64)
> > + oktmr.it_interval = ktime_to_timespec(iv);
> > +
> > + now = timer->base->get_time();
> > + ctx->overrun += hrtimer_forward(&ctx->tmr,
> > + hrtimer_cb_get_time(&ctx->tmr),
> > + ctx->tintv);
> > + remaining = ktime_sub(timer->expires, now);
> > + if (remaining.tv64 <= 0)
> > + oktmr.it_value.tv_nsec = 1;
> > + else
> > + oktmr.it_value =
> ktime_to_timespec(remaining);
> > +
> > + if (copy_to_user(old_utmr, &oktmr, sizeof
> (oktmr))) {
> > + fput(file);
> > + return -EFAULT;
> > + }
> > }
> > - /*
> > - * Re-program the timer to the new value ...
> > - */
> > - timerfd_setup(ctx, clockid, flags, &ktmr);
> >
> > - spin_unlock_irq(&ctx->wqh.lock);
> > + if (utmr) {
> > + /*
> > + * Modify settings of existing timer.
> > + * We need to stop the existing timer before
> > + * reprogramming it to the new values.
> > + */
> > + for (;;) {
> > + spin_lock_irq(&ctx->wqh.lock);
> > + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> > + break;
> > + spin_unlock_irq(&ctx->wqh.lock);
> > + cpu_relax();
> > + }
> > +
> > + /*
> > + * Re-program the timer to the new value ...
> > + */
> > + timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
> > +
> > + spin_unlock_irq(&ctx->wqh.lock);
> > + }
> > fput(file);
> > }
> >
> > @@ -222,4 +320,3 @@
> > kfree(ctx);
> > return error;
> > }
> > -
> > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h
> linux-2.6.23-rc1-mm2/include/linux/compat.h
> > --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32:
> 17.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46:
> 33.000000000 +0200
> > @@ -265,7 +265,8 @@
> > const compat_sigset_t __user *sigmask,
> > compat_size_t sigsetsize);
> > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> > - const struct compat_itimerspec __user
> *utmr);
> > + const struct compat_itimerspec __user *utmr,
> > + struct compat_itimerspec __user *old_utmr);
> >
> > #endif /* CONFIG_COMPAT */
> > #endif /* _LINUX_COMPAT_H */
> > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h
> linux-2.6.23-rc1-mm2/include/linux/syscalls.h
> > --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05
> 10:44:32.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47:
> 15.000000000 +0200
> > @@ -608,7 +608,8 @@
> > asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node,
> struct getcpu_cache __user *cache);
> > asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask,
> size_t sizemask);
> > asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > - const struct itimerspec __user *utmr);
> > + const struct itimerspec __user *utmr,
> > + struct itimerspec __user *old_utmr);
> > asmlinkage long sys_eventfd(unsigned int count);
> > asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t
> len);
>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code
> ***
>

2007-08-30 12:09:45

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Sat, 25 Aug 2007, Michael Kerrisk wrote:

> Davide -- ping! Can you please offer your comments about this change, and
> also thoughts on Jon's and my comments about a more radical API change
> later in this thread.

IMO the complexity of the resulting API (and resulting patch), and the ABI
change, is not justified by the added value.



- Davide


2007-09-04 08:06:22

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Davide,

>> Davide -- ping! Can you please offer your comments about this change, and
>> also thoughts on Jon's and my comments about a more radical API change
>> later in this thread.
>
> IMO the complexity of the resulting API (and resulting patch), and the ABI
> change, is not justified by the added value.

Neither of the proposed APIs (either my multiplexed version of timerfd()
or Jon's/my idea of using three system calls (like POSIX timers), or
the notion of timerfd() integrated with POSIX timers) is more
complicated than the existing POSIX timers API.

The ABI change doesn't really matter, since timerfd() was broken in 2.6.22
anyway.

Both previous APIs provided the features I have described provide:

* the ability to fetch the old timer value when applying
a new setting

* the ability to non-destructively fetch the amount of time remaining
on a timer.

This is clearly useful for timers -- but you have not explained why
you think this is not necessary for timerfd timers.

Please -- let's do timerfd() better. Either three syscalls like:

timerfd_create()
timerfd_settime()
timer_gettime()

(the analogs of timer_create(), timer_settime(), timer_gettime()).

Or (if possible, and even better) timerfd() integrated with POSIX timers.

Then we have a good API for the coming decades. I'm prepared to help
out with patches (for what my help is worth ;-)).

Cheers,

Michael

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

2007-09-04 08:19:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

> On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <[email protected]> wrote:
> Davide,
>
> >> Davide -- ping! Can you please offer your comments about this change, and
> >> also thoughts on Jon's and my comments about a more radical API change
> >> later in this thread.
> >
> > IMO the complexity of the resulting API (and resulting patch), and the ABI
> > change, is not justified by the added value.
>
> Neither of the proposed APIs (either my multiplexed version of timerfd()
> or Jon's/my idea of using three system calls (like POSIX timers), or
> the notion of timerfd() integrated with POSIX timers) is more
> complicated than the existing POSIX timers API.
>
> The ABI change doesn't really matter, since timerfd() was broken in 2.6.22
> anyway.
>
> Both previous APIs provided the features I have described provide:
>
> * the ability to fetch the old timer value when applying
> a new setting
>
> * the ability to non-destructively fetch the amount of time remaining
> on a timer.
>
> This is clearly useful for timers -- but you have not explained why
> you think this is not necessary for timerfd timers.

<wakes up>

I'd have thought that the existing stuff would be near-useless without the
capabilities which you describe?


2007-09-04 08:24:42

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

[...]
> > Neither of the proposed APIs (either my multiplexed version of
> > timerfd()
> > or Jon's/my idea of using three system calls (like POSIX timers), or
> > the notion of timerfd() integrated with POSIX timers) is more
> > complicated than the existing POSIX timers API.
> >
> > The ABI change doesn't really matter, since timerfd() was broken in
> > 2.6.22 anyway.
> >
> > Both previous APIs provided the features I have described provide:
> >
> > * the ability to fetch the old timer value when applying
> > a new setting
> >
> > * the ability to non-destructively fetch the amount of time remaining
> > on a timer.
> >
> > This is clearly useful for timers -- but you have not explained why
> > you think this is not necessary for timerfd timers.
>
> <wakes up>
>
> I'd have thought that the existing stuff would be near-useless without
> the capabilities which you describe?

Not useless, but certainly less functional than it can/should be
(and with not too much effort on the kernel implementation side).

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-04 15:25:46

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Tue, 4 Sep 2007, Andrew Morton wrote:

> > On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <[email protected]> wrote:
> > Davide,
> >
> > >> Davide -- ping! Can you please offer your comments about this change, and
> > >> also thoughts on Jon's and my comments about a more radical API change
> > >> later in this thread.
> > >
> > > IMO the complexity of the resulting API (and resulting patch), and the ABI
> > > change, is not justified by the added value.
> >
> > Neither of the proposed APIs (either my multiplexed version of timerfd()
> > or Jon's/my idea of using three system calls (like POSIX timers), or
> > the notion of timerfd() integrated with POSIX timers) is more
> > complicated than the existing POSIX timers API.
> >
> > The ABI change doesn't really matter, since timerfd() was broken in 2.6.22
> > anyway.
> >
> > Both previous APIs provided the features I have described provide:
> >
> > * the ability to fetch the old timer value when applying
> > a new setting
> >
> > * the ability to non-destructively fetch the amount of time remaining
> > on a timer.
> >
> > This is clearly useful for timers -- but you have not explained why
> > you think this is not necessary for timerfd timers.
>
> <wakes up>
>
> I'd have thought that the existing stuff would be near-useless without the
> capabilities which you describe?

Useless like it'd be a motorcycle w/out a cup-holder :)
Seriously, the ability to get the previous values from "something" could
have a meaning if this something is a shared global resource (like signals
for example). In the timerfd case this makes little sense, since you can
create as many timerfd as you like and you do not need to share a single
one by changing/restoring the original context.
On top of that, the cup-holder addition would cost in terms of API clarity
(or in terms of two additional system calls in the other case), and in
terms of kernel code footprint. Costs that IMO are not balanced by the
added values.



- Davide


2007-09-04 15:39:50

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Hi Davide,

> > <wakes up>
> >
> > I'd have thought that the existing stuff would be near-useless without
> > the capabilities which you describe?
>
> Useless like it'd be a motorcycle w/out a cup-holder :)
> Seriously, the ability to get the previous values from "something" could
> have a meaning if this something is a shared global resource (like
> signals
> for example). In the timerfd case this makes little sense, since you can
> create as many timerfd as you like and you do not need to share a single
> one by changing/restoring the original context.

However, one can have multipe POSIX timers, just as you can
have multiple timerfd timers; nevertheless POSIX timers provide
the get and get-while-setting functionality.

> On top of that, the cup-holder addition would cost in terms of API
> clarity

I agree my proposed API is not as clean as it could be, that's why
I would favour:

> (or in terms of two additional system calls in the other case),

Or better still, have timerfd() integrated with POSIX tiemrs (if
this is feasible). This givesus a simple API, exactly one new
syscall, and all of the functionality of the existing POSIX
timers API.

> and in terms of kernel code footprint.

Not sure what your concern is here. The total amount of
new code for all of these options is pretty small.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-04 20:49:44

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

> > > The ABI change doesn't really matter, since timerfd() was broken in
> > > 2.6.22 anyway.
> > >
> > > Both previous APIs provided the features I have described provide:
> > >
> > > * the ability to fetch the old timer value when applying
> > > a new setting
> > >
> > > * the ability to non-destructively fetch the amount of time remaining
> > > on a timer.
> > >
> > > This is clearly useful for timers -- but you have not explained why
> > > you think this is not necessary for timerfd timers.
> >
> > <wakes up>
> >
> > I'd have thought that the existing stuff would be near-useless without
> > the capabilities which you describe?
>
> Useless like it'd be a motorcycle w/out a cup-holder :)
> Seriously, the ability to get the previous values from "something" could
> have a meaning if this something is a shared global resource (like
> signals
> for example). In the timerfd case this makes little sense, since you can
> create as many timerfd as you like and you do not need to share a single
> one by changing/restoring the original context.

Davide,

As I think about this more, I see more problems with
your argument. timerfd needs the ability to get and
get-while-setting just as much as the earlier APIs.
Consider a library that creates a timerfd file descriptor that
is handed off to an application: that library may want
to modify the timer settings without having to create a
new file descriptor (the app mey not be able to be told about
the new fd). Your argument just doesn't hold, AFAICS.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-04 22:41:43

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Tue, 4 Sep 2007, Michael Kerrisk wrote:

> Hi Davide,
>
> > > <wakes up>
> > >
> > > I'd have thought that the existing stuff would be near-useless without
> > > the capabilities which you describe?
> >
> > Useless like it'd be a motorcycle w/out a cup-holder :)
> > Seriously, the ability to get the previous values from "something" could
> > have a meaning if this something is a shared global resource (like
> > signals
> > for example). In the timerfd case this makes little sense, since you can
> > create as many timerfd as you like and you do not need to share a single
> > one by changing/restoring the original context.
>
> However, one can have multipe POSIX timers, just as you can
> have multiple timerfd timers; nevertheless POSIX timers provide
> the get and get-while-setting functionality.

The fact that POSIX defined a certain API in a given way, does not
automatically mean that every other API has to look exactly like that.
POSIX has the tendency to bloat things up at times ;)



> > and in terms of kernel code footprint.
>
> Not sure what your concern is here. The total amount of
> new code for all of these options is pretty small.

>From your patch:

fs/compat.c | 34 ++++++++--
fs/timerfd.c | 147 +++++++++++++++++++++++++++++++++++++++--------
include/linux/compat.h | 3
include/linux/syscalls.h | 3
4 files changed, 153 insertions(+), 34 deletions(-)

And the API definition becomes pretty messy. The other way is to add new
system calls. 120+ lines of code more of new system calls wouldn't even be
a problem in itself, if the added value was there.
IMO, as I already said, the added value does not justify them.



- Davide


2007-09-04 22:45:03

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Tue, 4 Sep 2007, Michael Kerrisk wrote:

> > Useless like it'd be a motorcycle w/out a cup-holder :)
> > Seriously, the ability to get the previous values from "something" could
> > have a meaning if this something is a shared global resource (like
> > signals
> > for example). In the timerfd case this makes little sense, since you can
> > create as many timerfd as you like and you do not need to share a single
> > one by changing/restoring the original context.
>
> Davide,
>
> As I think about this more, I see more problems with
> your argument. timerfd needs the ability to get and
> get-while-setting just as much as the earlier APIs.
> Consider a library that creates a timerfd file descriptor that
> is handed off to an application: that library may want
> to modify the timer settings without having to create a
> new file descriptor (the app mey not be able to be told about
> the new fd). Your argument just doesn't hold, AFAICS.

Such hypotethical library, in case it really wanted to offer such
functionality, could simply return an handle instead of the raw fd, and
take care of all that stuff in userspace.
Again, mimicking POSIX APIs doesn't always take you in the right place.



- Davide


2007-09-05 00:08:41

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Davide,

> > As I think about this more, I see more problems with
> > your argument. timerfd needs the ability to get and
> > get-while-setting just as much as the earlier APIs.
> > Consider a library that creates a timerfd file descriptor that
> > is handed off to an application: that library may want
> > to modify the timer settings without having to create a
> > new file descriptor (the app mey not be able to be told about
> > the new fd). Your argument just doesn't hold, AFAICS.
>
> Such hypotethical library, in case it really wanted to offer such
> functionality, could simply return an handle instead of the raw fd, and
> take care of all that stuff in userspace.

Did I miss something? Is it not the case that as soon as the
library returns a handle, rather than an fd, then the whole
advantage of timerfd() (being able to select/poll/epoll on
the timer as well as other fds) is lost?

> Again, mimicking POSIX APIs doesn't always take you in the right place.

POSIX may goof in places, but in general it is the result of
many smart people thinking about how to design/standardize APIs.
So the onus is on us to explain why they got this point wrong.
And it is not merely POSIX that did things things in the
way I've described: so did the earlier setitimer()/getitimer().

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-05 12:04:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

> On Wed, 05 Sep 2007 02:08:31 +0200 "Michael Kerrisk" <[email protected]> wrote:
> Davide,
>
> > > As I think about this more, I see more problems with
> > > your argument. timerfd needs the ability to get and
> > > get-while-setting just as much as the earlier APIs.
> > > Consider a library that creates a timerfd file descriptor that
> > > is handed off to an application: that library may want
> > > to modify the timer settings without having to create a
> > > new file descriptor (the app mey not be able to be told about
> > > the new fd). Your argument just doesn't hold, AFAICS.
> >
> > Such hypotethical library, in case it really wanted to offer such
> > functionality, could simply return an handle instead of the raw fd, and
> > take care of all that stuff in userspace.
>
> Did I miss something? Is it not the case that as soon as the
> library returns a handle, rather than an fd, then the whole
> advantage of timerfd() (being able to select/poll/epoll on
> the timer as well as other fds) is lost?
>
> > Again, mimicking POSIX APIs doesn't always take you in the right place.
>
> POSIX may goof in places, but in general it is the result of
> many smart people thinking about how to design/standardize APIs.
> So the onus is on us to explain why they got this point wrong.
> And it is not merely POSIX that did things things in the
> way I've described: so did the earlier setitimer()/getitimer().
>

<wakes up even more>

Seems that we need to pay some more attention to this, as time is pressing
and we should get this stuff finalised in 2.6.23.

Which means that most of us have some catching-up to do.

Michael, could you please refresh our memories with a brief, from-scratch
summary of what the current interface is, followed by a summary of what you
believe to be the shortcomings to be?

2007-09-05 12:12:39

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Tuesday 04 September 2007 16:25, Davide Libenzi wrote:
> On Tue, 4 Sep 2007, Andrew Morton wrote:
>
> > > On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <[email protected]> wrote:
> > > Davide,
> > >
> > > >> Davide -- ping! Can you please offer your comments about this change, and
> > > >> also thoughts on Jon's and my comments about a more radical API change
> > > >> later in this thread.
> > > >
> > > > IMO the complexity of the resulting API (and resulting patch), and the ABI
> > > > change, is not justified by the added value.
> > >
> > > Neither of the proposed APIs (either my multiplexed version of timerfd()
> > > or Jon's/my idea of using three system calls (like POSIX timers), or
> > > the notion of timerfd() integrated with POSIX timers) is more
> > > complicated than the existing POSIX timers API.
> > >
> > > The ABI change doesn't really matter, since timerfd() was broken in 2.6.22
> > > anyway.
> > >
> > > Both previous APIs provided the features I have described provide:
> > >
> > > * the ability to fetch the old timer value when applying
> > > a new setting
> > >
> > > * the ability to non-destructively fetch the amount of time remaining
> > > on a timer.
> > >
> > > This is clearly useful for timers -- but you have not explained why
> > > you think this is not necessary for timerfd timers.
> >
> > <wakes up>
> >
> > I'd have thought that the existing stuff would be near-useless without the
> > capabilities which you describe?
>
> Useless like it'd be a motorcycle w/out a cup-holder :)
> Seriously, the ability to get the previous values from "something" could
> have a meaning if this something is a shared global resource (like signals
> for example). In the timerfd case this makes little sense, since you can
> create as many timerfd as you like and you do not need to share a single
> one by changing/restoring the original context.

I think at least ability to read remaining time from a timerfd is needed.
--
vda

2007-09-05 15:32:17

by Michael Kerrisk

[permalink] [raw]
Subject: timerfd redux

[Was: Re: [PATCH] Revised timerfd() interface]

> Michael, could you please refresh our memories with a brief,
> from-scratch summary of what the current interface is, followed
> by a summary of what you believe to be the shortcomings to be?

Andrew,

I'll break this up into parts:

1. the existing timerfd interface
2. timerfd limitations
3. possible solutions
a) Add an argument
b) Create an interface similar to POSIX timers
c) Integrate timerfd with POSIX timers

Cheers,

Michael


1: the existing timerfd interface
=================================

In 2.6.22, Davide added timerfd() with the following interface:

returned_fd = timerfd(int fd, int clockid, int flags,
struct itimerspec *utimer);

If fd is -1, a new timer is created and started. The syscall
returns a file descriptor for the timer. 'utimer' specifies
the initial expiration and interval of the timer.
'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer'
value is relative, unless TFD_TIMER_ABSTIME is specified in
'flags', in which case the initial expiration is specified
absolutely.

If 'fd' is not -1, then the call modifies the existing timer
referred to by the file descriptor 'fd'. The 'clockid', 'flags',
and 'utimer' can all be modified. The return value is 'fd'.

The key feature of timerfd() is that the caller can use
select/poll/epoll to wait on traditional file descriptors and
one or more timers.

read() from a timerfd file descriptor (should) return a 4-byte
integer that is the number of timer expirations since the last
read. (If no expiration has so far occurred, read() will block.)

IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
only a single byte of info was returned by read(). I regard
this as a virtue: it gives us something closer to a blank slate
for fixing the problems described below; furthermore,
arguably at this point we could buy ourselves time by
pulling timerfd() from 2.6.23, and taking more time to get
things right in 2.6.24.

(More details on timerfd() can be found here:
http://lwn.net/Articles/245533/)

2. timerfd limitations
======================

Unix has two older timer interfaces:

* setitimer/getitimer and

* POSIX timers (timer_create/timer_settime/timer_gettime).

timerfd() lacks two features that are present in the older
interfaces:

* Retrieve the previous setting of an existing timer when
setting a new value for the timer.

* Non-destructively fetch the timer remaining until the
next expiration of the timer.

The fact that this functionality is present in both older APIs
strongly suggests that various applications really need both
functionalities.

(Davide has argued that timerfd() doesn't need the
get-while-setting functionality because we can create multiple
timerfd timers. However, POSIX timers also allow multiple
timer instances, but nevertheless provide get-while-setting.
I would estimate that this functionality would be useful for
libraries that want to create and control a (single) timerfd
file descriptor that is returned to the caller.)

3. possible solutions
=====================

====> a) Add an argument

I proposed adding a further argument to timerfd(): old_utmr,
which could be used to return the time remaining until
expiry for an existing timer
(http://marc.info/?l=linux-kernel&m=118669430305788&w=2 ).
I proposed semantics that would allow get and
get-while-setting functionality.

Jon Corbet pointed out that my suggestion was starting
to look like a multiplexing syscall. I agree. I now
favor one of the remaining solutions.

====> b) Create an interface similar to POSIX timers

Create an interface analogous to POSIX timers:

fd = timerfd_create(clockid, flags);

timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);

timerfd_gettime(fd, &time_to_next_expire);

Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.

Disadvantage: three new system calls, rather than 1.

This solution would be sufficient, IMO, but the
next solution might be better.

====> c) Integrate timerfd with POSIX timers

Make a very simple timerfd call that is integrated with
the POSIX timers API. A POSIX timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
timer_t *timerid);

We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_gettime(timer_t timerid, struct itimerspec *value);
int timer_settime(timer_t timerid, int flags,
const struct itimerspec *value,
struct itimerspec *ovalue);

And then read from 'fd' as before.

Advantages:
1. Integration with an existing API.
2. Adds just a single system call
3. This strikes me as the most beautiful solution,
if we can do it properly.

Disadvantage: I'm not yet completely clear whether there are some
features of the POSIX timers API that might preclude a clean
integration. In particular, we would need to think a little
about the semantics of timer_getoverrun():

int timer_getoverrun(timer_t timerid);

I suspect it's fine, but we better think about it a little.

We would also have to think about how the 'evp' argument
of timer_create() would be used. This might be trickier.
(Simplest might be to require evp.sigev_notify to be
SIGEV_NONE, or perhaps a new flag, SIGEV_TIMERFD.)

=== END ===

2007-09-05 16:14:44

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Wed, 5 Sep 2007, Michael Kerrisk wrote:

> Davide,

A Michael!


> > > As I think about this more, I see more problems with
> > > your argument. timerfd needs the ability to get and
> > > get-while-setting just as much as the earlier APIs.
> > > Consider a library that creates a timerfd file descriptor that
> > > is handed off to an application: that library may want
> > > to modify the timer settings without having to create a
> > > new file descriptor (the app mey not be able to be told about
> > > the new fd). Your argument just doesn't hold, AFAICS.
> >
> > Such hypotethical library, in case it really wanted to offer such
> > functionality, could simply return an handle instead of the raw fd, and
> > take care of all that stuff in userspace.
>
> Did I miss something? Is it not the case that as soon as the
> library returns a handle, rather than an fd, then the whole
> advantage of timerfd() (being able to select/poll/epoll on
> the timer as well as other fds) is lost?

Why? The handle would simply be a little struct where the timerfd fd is
stored, and a XXX_getfd() would return it.
So my point is, I doubt such functionalities are really needed, and I also
argue that the kernel is the best place for such wrapper code to go.



- Davide


2007-09-05 16:24:08

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Hi Davide,

> > > > As I think about this more, I see more problems with
> > > > your argument. timerfd needs the ability to get and
> > > > get-while-setting just as much as the earlier APIs.
> > > > Consider a library that creates a timerfd file descriptor that
> > > > is handed off to an application: that library may want
> > > > to modify the timer settings without having to create a
> > > > new file descriptor (the app mey not be able to be told about
> > > > the new fd). Your argument just doesn't hold, AFAICS.
> > >
> > > Such hypotethical library, in case it really wanted to offer such
> > > functionality, could simply return an handle instead of the raw fd,
> > > and take care of all that stuff in userspace.
> >
> > Did I miss something? Is it not the case that as soon as the
> > library returns a handle, rather than an fd, then the whole
> > advantage of timerfd() (being able to select/poll/epoll on
> > the timer as well as other fds) is lost?
>
> Why? The handle would simply be a little struct where the timerfd fd is
> stored, and a XXX_getfd() would return it.
> So my point is, I doubt such functionalities are really needed, and I
> also argue that the kernel is the best place for such wrapper code
> to go.

So what happens if one thread (via the library) wants modify
a timer's settings at the same timer as another thread is
select()ing on it? The first thread can't do this by creating
a new timerfd timer, since it wants to affect the select()
in the other thread?

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-05 19:57:41

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Wed, 5 Sep 2007, Michael Kerrisk wrote:

> Hi Davide,
>
> > > > > As I think about this more, I see more problems with
> > > > > your argument. timerfd needs the ability to get and
> > > > > get-while-setting just as much as the earlier APIs.
> > > > > Consider a library that creates a timerfd file descriptor that
> > > > > is handed off to an application: that library may want
> > > > > to modify the timer settings without having to create a
> > > > > new file descriptor (the app mey not be able to be told about
> > > > > the new fd). Your argument just doesn't hold, AFAICS.
> > > >
> > > > Such hypotethical library, in case it really wanted to offer such
> > > > functionality, could simply return an handle instead of the raw fd,
> > > > and take care of all that stuff in userspace.
> > >
> > > Did I miss something? Is it not the case that as soon as the
> > > library returns a handle, rather than an fd, then the whole
> > > advantage of timerfd() (being able to select/poll/epoll on
> > > the timer as well as other fds) is lost?
> >
> > Why? The handle would simply be a little struct where the timerfd fd is
> > stored, and a XXX_getfd() would return it.
> > So my point is, I doubt such functionalities are really needed, and I
> > also argue that the kernel is the best place for such wrapper code
> > to go.
>
> So what happens if one thread (via the library) wants modify
> a timer's settings at the same timer as another thread is
> select()ing on it? The first thread can't do this by creating
> a new timerfd timer, since it wants to affect the select()
> in the other thread?

It can be done w/out any problems. The select thread will be notified
whenever the new timer setting expires.


- Davide


2007-09-05 22:51:00

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Hi Davide,

> > > > > > As I think about this more, I see more problems with
> > > > > > your argument. timerfd needs the ability to get and
> > > > > > get-while-setting just as much as the earlier APIs.
> > > > > > Consider a library that creates a timerfd file descriptor that
> > > > > > is handed off to an application: that library may want
> > > > > > to modify the timer settings without having to create a
> > > > > > new file descriptor (the app mey not be able to be told about
> > > > > > the new fd). Your argument just doesn't hold, AFAICS.
> > > > >
> > > > > Such hypotethical library, in case it really wanted to offer such
> > > > > functionality, could simply return an handle instead of the raw
> > > > > fd, and take care of all that stuff in userspace.
> > > >
> > > > Did I miss something? Is it not the case that as soon as the
> > > > library returns a handle, rather than an fd, then the whole
> > > > advantage of timerfd() (being able to select/poll/epoll on
> > > > the timer as well as other fds) is lost?
> > >
> > > Why? The handle would simply be a little struct where the timerfd fd
> > > is
> > > stored, and a XXX_getfd() would return it.
> > > So my point is, I doubt such functionalities are really needed, and I
> > > also argue that the kernel is the best place for such wrapper code
> > > to go.
> >
> > So what happens if one thread (via the library) wants modify
> > a timer's settings at the same timer as another thread is
> > select()ing on it? The first thread can't do this by creating
> > a new timerfd timer, since it wants to affect the select()
> > in the other thread?
>
> It can be done w/out any problems. The select thread will be notified
> whenever the new timer setting expires.

We are going in circles here. I think you are missing my point.
Consider the following

[[
Thread A: calls library function which creates a timerfd file
descriptor.

Thread B: calls select() on the timerfd file descriptor.

Thread A: calls library function which wants to:
a) modify timer settings, and retrieve copy of current timer
settings, and later
b) restore old timer settings.
]]

This seems a quite reasonable use-case to me, and the existing
interface simply can't support it.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-05 23:46:00

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Thu, 6 Sep 2007, Michael Kerrisk wrote:

> Hi Davide,
>
> > > > > > > As I think about this more, I see more problems with
> > > > > > > your argument. timerfd needs the ability to get and
> > > > > > > get-while-setting just as much as the earlier APIs.
> > > > > > > Consider a library that creates a timerfd file descriptor that
> > > > > > > is handed off to an application: that library may want
> > > > > > > to modify the timer settings without having to create a
> > > > > > > new file descriptor (the app mey not be able to be told about
> > > > > > > the new fd). Your argument just doesn't hold, AFAICS.
> > > > > >
> > > > > > Such hypotethical library, in case it really wanted to offer such
> > > > > > functionality, could simply return an handle instead of the raw
> > > > > > fd, and take care of all that stuff in userspace.
> > > > >
> > > > > Did I miss something? Is it not the case that as soon as the
> > > > > library returns a handle, rather than an fd, then the whole
> > > > > advantage of timerfd() (being able to select/poll/epoll on
> > > > > the timer as well as other fds) is lost?
> > > >
> > > > Why? The handle would simply be a little struct where the timerfd fd
> > > > is
> > > > stored, and a XXX_getfd() would return it.
> > > > So my point is, I doubt such functionalities are really needed, and I
> > > > also argue that the kernel is the best place for such wrapper code
> > > > to go.
> > >
> > > So what happens if one thread (via the library) wants modify
> > > a timer's settings at the same timer as another thread is
> > > select()ing on it? The first thread can't do this by creating
> > > a new timerfd timer, since it wants to affect the select()
> > > in the other thread?
> >
> > It can be done w/out any problems. The select thread will be notified
> > whenever the new timer setting expires.
>
> We are going in circles here. I think you are missing my point.
> Consider the following
>
> [[
> Thread A: calls library function which creates a timerfd file
> descriptor.
>
> Thread B: calls select() on the timerfd file descriptor.
>
> Thread A: calls library function which wants to:
> a) modify timer settings, and retrieve copy of current timer
> settings, and later
> b) restore old timer settings.
> ]]
>
> This seems a quite reasonable use-case to me, and the existing
> interface simply can't support it.

"Quite reasonable"? :)
I honestly doubt it, but anyway. Modulo error checking:

struct tfd {
int fd, clockid;
struct itimerspec ts;
};

struct tfd *tfd_create(int clockid, int flags, const struct itimerspec *ts) {
struct tfd *th;
th = malloc(sizeof(*th));
th->clockid = clockid;
th->ts = *ts;
th->fd = timerfd(-1, clockid, flags, ts);
return th;
}

void tfd_close(struct tfd *th) {
close(th->fd);
free(th);
}

int tfd_getfd(const struct tfd *th) {
return th->fd;
}

int tfd_gettime(const struct tfd *th, int *clockid, struct itimerspec *ts) {
*clockid = th->clockid;
*ts = th->ts;
return 0;
}

int tfd_settime(struct tfd *th, int clockid, int flags,
const struct itimerspec *ts) {
th->fd = timerfd(th->fd, clockid, flags, ts);
th->clockid = clockid;
th->ts = *ts;
return 0;
}

Wrap the get/set with a mutex in case you plan to shoot yourself in a foot
by doing get/set from multiple threads ;)
So, once again:

- I sincerly doubt the above is common usage/design patters for timerfds

* timerfds are not a common global resource, ala signals, that requires
get+set+restore pattern - you can have many of them set to different
times

- Those IMO *very* special use cases can be handled in userspace with few
lines of code, *if* really needed



- Davide


2007-09-06 06:58:26

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Hi Davide,

> > > > > > > > As I think about this more, I see more problems with
> > > > > > > > your argument. timerfd needs the ability to get and
> > > > > > > > get-while-setting just as much as the earlier APIs.
> > > > > > > > Consider a library that creates a timerfd file descriptor
> > > > > > > > that
> > > > > > > > is handed off to an application: that library may want
> > > > > > > > to modify the timer settings without having to create a
> > > > > > > > new file descriptor (the app mey not be able to be told
> > > > > > > > about
> > > > > > > > the new fd). Your argument just doesn't hold, AFAICS.
> > > > > > >
> > > > > > > Such hypotethical library, in case it really wanted to offer
> > > > > > > such
> > > > > > > functionality, could simply return an handle instead of the
> > > > > > > raw fd, and take care of all that stuff in userspace.
> > > > > >
> > > > > > Did I miss something? Is it not the case that as soon as the
> > > > > > library returns a handle, rather than an fd, then the whole
> > > > > > advantage of timerfd() (being able to select/poll/epoll on
> > > > > > the timer as well as other fds) is lost?
> > > > >
> > > > > Why? The handle would simply be a little struct where the
> > > > > timerfd fd is
> > > > > stored, and a XXX_getfd() would return it.
> > > > > So my point is, I doubt such functionalities are really needed,
> > > > > and I
> > > > > also argue that the kernel is the best place for such wrapper
^
(By the way, I assume that back there, there was a missing "not",
right?)

> > > > > code to go.
> > > >
> > > > So what happens if one thread (via the library) wants modify
> > > > a timer's settings at the same timer as another thread is
> > > > select()ing on it? The first thread can't do this by creating
> > > > a new timerfd timer, since it wants to affect the select()
> > > > in the other thread?
> > >
> > > It can be done w/out any problems. The select thread will be notified
> > > whenever the new timer setting expires.
> >
> > We are going in circles here. I think you are missing my point.
> > Consider the following
> >
> > [[
> > Thread A: calls library function which creates a timerfd file
> > descriptor.
> >
> > Thread B: calls select() on the timerfd file descriptor.
> >
> > Thread A: calls library function which wants to:
> > a) modify timer settings, and retrieve copy of current timer
> > settings, and later
> > b) restore old timer settings.
> > ]]
> >
> > This seems a quite reasonable use-case to me, and the existing
> > interface simply can't support it.
>
> "Quite reasonable"? :)
> I honestly doubt it, but anyway.

You are asserting this in the face of two previous APIs designed
by people who (at least in the case of POSIX timers) probably
thoroughly examined and discussed existing APIs and practice.

> Modulo error checking:

I'm not sure what problem this code is supposed to solve. It
doesn't solve the problem I have in mind; see below.

> struct tfd {
> int fd, clockid;
> struct itimerspec ts;
> };
>
> struct tfd *tfd_create(int clockid, int flags, const struct itimerspec
> *ts) {
> struct tfd *th;
> th = malloc(sizeof(*th));
> th->clockid = clockid;
> th->ts = *ts;
> th->fd = timerfd(-1, clockid, flags, ts);
> return th;
> }
>
> void tfd_close(struct tfd *th) {
> close(th->fd);
> free(th);
> }
>
> int tfd_getfd(const struct tfd *th) {
> return th->fd;
> }
>
> int tfd_gettime(const struct tfd *th, int *clockid, struct itimerspec *ts)
> {
> *clockid = th->clockid;
> *ts = th->ts;
> return 0;
> }

This function is *not at all* equivalent to the "get"
functionality of the previous APIs. The "get" functionality
of POSIX timers (for example) returns a structure that contains
the timer interval and the *time until the next expiration of
the timer* (not the initial timer string, as your code above
does).

So come up with a reliable, race-free way of doing that in
userspace. Then make it work for both CLOCK_MONOTONIC and
CLOCK_REALTIME timers. (You'll certainly need to be making
some additional system calls, by the way: at least some
calls to clock_gettime().)

> int tfd_settime(struct tfd *th, int clockid, int flags,
> const struct itimerspec *ts) {
> th->fd = timerfd(th->fd, clockid, flags, ts);
> th->clockid = clockid;
> th->ts = *ts;
> return 0;
> }
>
> Wrap the get/set with a mutex in case you plan to shoot yourself in a
> foot by doing get/set from multiple threads ;)
> So, once again:
>
> - I sincerly doubt the above is common usage/design patters for timerfds

See my comments above in this message. You may doubt it, but
all of the earlier API designers did not.

> * timerfds are not a common global resource, ala signals, that requires
> get+set+restore pattern - you can have many of them set to different
> times

No! In the example I described the library is able to create and
control exactly *one* timerfd file descriptor. It wants to hand
that fd back to the application and then perform arbitrary
manipulations of the timer's settings. Meanwhile the application
needs to (repeatedly) monitor that one file descriptor in a
select()/poll()/epoll() (and so the library can't just arbitrarily
create further file descriptors).

> - Those IMO *very* special use cases can be handled in userspace with few
> lines of code, *if* really needed

You have not demonstrated this ;-). Your userspace code does
not solve the problem.

Best regards,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-06 23:37:23

by Davide Libenzi

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

On Thu, 6 Sep 2007, Michael Kerrisk wrote:

> You are asserting this in the face of two previous APIs designed
> by people who (at least in the case of POSIX timers) probably
> thoroughly examined and discussed existing APIs and practice.

You really think that. Uhmm, ok.



> This function is *not at all* equivalent to the "get"
> functionality of the previous APIs. The "get" functionality
> of POSIX timers (for example) returns a structure that contains
> the timer interval and the *time until the next expiration of
> the timer* (not the initial timer string, as your code above
> does).
> So come up with a reliable, race-free way of doing that in
> userspace. Then make it work for both CLOCK_MONOTONIC and
> CLOCK_REALTIME timers. (You'll certainly need to be making
> some additional system calls, by the way: at least some
> calls to clock_gettime().)

No, I don't think I will. Since 1) it's so trivial it's not even
challenging 2) you know it can be done (I assume) 3) it solves a non-case
made up to justify an API/kernel-code bloating.
But you don't need *my* signoff. Cook a working patch, make a case for it,
and gather support and signoffs. I won't be acking it because, as I said
many times, I think it's useless bloating.



- Davide


2007-09-10 03:18:17

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Revised timerfd() interface

Hi Davide,

>> This function is *not at all* equivalent to the "get"
>> functionality of the previous APIs. The "get" functionality
>> of POSIX timers (for example) returns a structure that contains
>> the timer interval and the *time until the next expiration of
>> the timer* (not the initial timer string, as your code above
>> does).
>> So come up with a reliable, race-free way of doing that in
>> userspace. Then make it work for both CLOCK_MONOTONIC and
>> CLOCK_REALTIME timers. (You'll certainly need to be making
>> some additional system calls, by the way: at least some
>> calls to clock_gettime().)
>
> No, I don't think I will. Since 1) it's so trivial it's not even
> challenging

Well, it seems to me that the proposed library wrapper code
could start to get rather verbose by the time one adds in
these changes (plus the mutexes).

> 2) you know it can be done (I assume)

Sorry -- I was probably speaking too rhetorically. In fact, it's
not completely clear to me that it can (always) be done. For
example, the answer depends on knowing the clockid. But
what if the file descriptor has been handed off to some code
that doesn't know the clock type (i.e., isn't accessing the fd
via your suggested library)?

Cheers,

Michael

> 3) it solves a non-case
> made up to justify an API/kernel-code bloating.
> But you don't need *my* signoff. Cook a working patch, make a case for it,
> and gather support and signoffs. I won't be acking it because, as I said
> many times, I think it's useless bloating.

2007-09-13 02:40:49

by Andrew Morton

[permalink] [raw]
Subject: Re: timerfd redux

On Wed, 05 Sep 2007 17:32:01 +0200 "Michael Kerrisk" <[email protected]> wrote:

> [Was: Re: [PATCH] Revised timerfd() interface]
>
> > Michael, could you please refresh our memories with a brief,
> > from-scratch summary of what the current interface is, followed
> > by a summary of what you believe to be the shortcomings to be?
>
> Andrew,
>
> I'll break this up into parts:
>
> 1. the existing timerfd interface
> 2. timerfd limitations
> 3. possible solutions
> a) Add an argument
> b) Create an interface similar to POSIX timers
> c) Integrate timerfd with POSIX timers
>
> Cheers,
>
> Michael
>
>
> 1: the existing timerfd interface
> =================================
>
> In 2.6.22, Davide added timerfd() with the following interface:
>
> returned_fd = timerfd(int fd, int clockid, int flags,
> struct itimerspec *utimer);
>
> If fd is -1, a new timer is created and started. The syscall
> returns a file descriptor for the timer. 'utimer' specifies
> the initial expiration and interval of the timer.
> 'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer'
> value is relative, unless TFD_TIMER_ABSTIME is specified in
> 'flags', in which case the initial expiration is specified
> absolutely.
>
> If 'fd' is not -1, then the call modifies the existing timer
> referred to by the file descriptor 'fd'. The 'clockid', 'flags',
> and 'utimer' can all be modified. The return value is 'fd'.
>
> The key feature of timerfd() is that the caller can use
> select/poll/epoll to wait on traditional file descriptors and
> one or more timers.
>
> read() from a timerfd file descriptor (should) return a 4-byte
> integer that is the number of timer expirations since the last
> read. (If no expiration has so far occurred, read() will block.)
>
> IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
> only a single byte of info was returned by read(). I regard
> this as a virtue: it gives us something closer to a blank slate
> for fixing the problems described below; furthermore,
> arguably at this point we could buy ourselves time by
> pulling timerfd() from 2.6.23, and taking more time to get
> things right in 2.6.24.
>
> (More details on timerfd() can be found here:
> http://lwn.net/Articles/245533/)

OK.

> 2. timerfd limitations
> ======================
>
> Unix has two older timer interfaces:
>
> * setitimer/getitimer and
>
> * POSIX timers (timer_create/timer_settime/timer_gettime).
>
> timerfd() lacks two features that are present in the older
> interfaces:
>
> * Retrieve the previous setting of an existing timer when
> setting a new value for the timer.
>
> * Non-destructively fetch the timer remaining until the
> next expiration of the timer.
>
> The fact that this functionality is present in both older APIs
> strongly suggests that various applications really need both
> functionalities.

Yes, I can imagine applications wanting to do those things.

> (Davide has argued that timerfd() doesn't need the
> get-while-setting functionality because we can create multiple
> timerfd timers. However, POSIX timers also allow multiple
> timer instances, but nevertheless provide get-while-setting.
> I would estimate that this functionality would be useful for
> libraries that want to create and control a (single) timerfd
> file descriptor that is returned to the caller.)

Sure. If you're implementing a timeout and you want to reset it, you might
indeed want to know how close the old one was to expiring.

Davide's proposal sounds like an awkward workaround for missing
functionality.


Does Davide have a proposal for the non-destructive fetch?


> 3. possible solutions

I don't think we'll have this settled and coded in time for 2.6.23. So I
think the prudent thing to do is to push this back to 2.6.24 and not offer
sys_timerfd() in 2.6.23.

2007-09-13 06:14:37

by Michael Kerrisk

[permalink] [raw]
Subject: Re: timerfd redux

> > [Was: Re: [PATCH] Revised timerfd() interface]
> >
> > > Michael, could you please refresh our memories with a brief,
> > > from-scratch summary of what the current interface is, followed
> > > by a summary of what you believe to be the shortcomings to be?
> >
> > Andrew,
> >
> > I'll break this up into parts:
> >
> > 1. the existing timerfd interface
> > 2. timerfd limitations
> > 3. possible solutions
> > a) Add an argument
> > b) Create an interface similar to POSIX timers
> > c) Integrate timerfd with POSIX timers
> >
> > Cheers,
> >
> > Michael
> >
> >
> > 1: the existing timerfd interface
> > =================================
> >
> > In 2.6.22, Davide added timerfd() with the following interface:
> >
> > returned_fd = timerfd(int fd, int clockid, int flags,
> > struct itimerspec *utimer);
> >
> > If fd is -1, a new timer is created and started. The syscall
> > returns a file descriptor for the timer. 'utimer' specifies
> > the initial expiration and interval of the timer.
> > 'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer'
> > value is relative, unless TFD_TIMER_ABSTIME is specified in
> > 'flags', in which case the initial expiration is specified
> > absolutely.
> >
> > If 'fd' is not -1, then the call modifies the existing timer
> > referred to by the file descriptor 'fd'. The 'clockid', 'flags',
> > and 'utimer' can all be modified. The return value is 'fd'.
> >
> > The key feature of timerfd() is that the caller can use
> > select/poll/epoll to wait on traditional file descriptors and
> > one or more timers.
> >
> > read() from a timerfd file descriptor (should) return a 4-byte
> > integer that is the number of timer expirations since the last
> > read. (If no expiration has so far occurred, read() will block.)
> >
> > IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken:
> > only a single byte of info was returned by read(). I regard
> > this as a virtue: it gives us something closer to a blank slate
> > for fixing the problems described below; furthermore,
> > arguably at this point we could buy ourselves time by
> > pulling timerfd() from 2.6.23, and taking more time to get
> > things right in 2.6.24.
> >
> > (More details on timerfd() can be found here:
> > http://lwn.net/Articles/245533/)
>
> OK.
>
> > 2. timerfd limitations
> > ======================
> >
> > Unix has two older timer interfaces:
> >
> > * setitimer/getitimer and
> >
> > * POSIX timers (timer_create/timer_settime/timer_gettime).
> >
> > timerfd() lacks two features that are present in the older
> > interfaces:
> >
> > * Retrieve the previous setting of an existing timer when
> > setting a new value for the timer.
> >
> > * Non-destructively fetch the timer remaining until the
> > next expiration of the timer.
> >
> > The fact that this functionality is present in both older APIs
> > strongly suggests that various applications really need both
> > functionalities.
>
> Yes, I can imagine applications wanting to do those things.
>
> > (Davide has argued that timerfd() doesn't need the
> > get-while-setting functionality because we can create multiple
> > timerfd timers. However, POSIX timers also allow multiple
> > timer instances, but nevertheless provide get-while-setting.
> > I would estimate that this functionality would be useful for
> > libraries that want to create and control a (single) timerfd
> > file descriptor that is returned to the caller.)
>
> Sure. If you're implementing a timeout and you want to reset it, you
> might indeed want to know how close the old one was to expiring.
>
> Davide's proposal sounds like an awkward workaround for missing
> functionality.

In the other thread I commented that the userspace solution
starts to look pretty complex, and I doubt that it can
be made to work in all cases.

> Does Davide have a proposal for the non-destructive fetch?

I don't think so, since he disagrees about it's necessity.

> > 3. possible solutions
>
> I don't think we'll have this settled and coded in time for 2.6.23. So I
> think the prudent thing to do is to push this back to 2.6.24 and not
> offer sys_timerfd() in 2.6.23.

I think this would be wise. I'd like to talk with Davide some
more about the possibilities.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-13 08:14:16

by Michael Kerrisk

[permalink] [raw]
Subject: Re: timerfd redux

Andrew,

> > 3. possible solutions
>
> I don't think we'll have this settled and coded in time for 2.6.23. So I
> think the prudent thing to do is to push this back to 2.6.24 and not offer
> sys_timerfd() in 2.6.23.

Did you want a patch to remove the syscall number for now,
or will you do that?

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-13 08:22:19

by Andrew Morton

[permalink] [raw]
Subject: Re: timerfd redux

On Thu, 13 Sep 2007 10:13:59 +0200 "Michael Kerrisk" <[email protected]> wrote:

> Andrew,
>
> > > 3. possible solutions
> >
> > I don't think we'll have this settled and coded in time for 2.6.23. So I
> > think the prudent thing to do is to push this back to 2.6.24 and not offer
> > sys_timerfd() in 2.6.23.
>
> Did you want a patch to remove the syscall number for now,
> or will you do that?
>

Please send one over sometime.