2007-09-18 07:29:50

by Michael Kerrisk

[permalink] [raw]
Subject: RFC: A revised timerfd API

After my earlier mail (full thread here:
http://thread.gmane.org/gmane.linux.kernel/574430/focus=579368 )
it seems that some people agree we should give a bit more
thought to how a final timerfd interface should look.

Davide's original API and the limitations that I see in it,
are described in the message cited above. In that message,
I proposed three alternatives (and I'm going to add a fourth
now) that provide "get-while-set" and "non-destructive-get"
functionality. Before trying to implement anything I'd like to
get input on the various possible designs.

The four designs are:

a) A multiplexing timerfd() system call.
b) Creating three syscalls analogous to the POSIX timers API (i.e.,
timerfd_create/timerfd_settime/timerfd_gettime).
c) Creating a simplified timerfd() system call that is integrated
with the POSIX timers API.
d) Extending the POSIX timers API to support the timerfd concept.

My order of preference for the implementations is currently
something like (in descending order): d, b, c, a.

The details follow:

====> a) Add an argument (a multiplexing timerfd() system call)

In an earlier mail (http://thread.gmane.org/gmane.linux.kernel/559193 ).
I proposed adding a further argument to timerfd(): old_utmr, which could
be used to return the time remaining until expiry for an existing timer
The proposed semantics that would allow get and get-while-setting
functionality.

Advantages:
1. Provides the desired get and get-while-setting functionality.
2. It's a simple interface (a single system call).

Disadvantage:
Jon Corbet pointed out
(http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
that this interface was starting to look like a multiplexing syscall,
because there is no case where all of the arguments are used (see
the use-case descriptions in the earlier mail).

I'm inclined to agree with Jon; therefore one of the remaining
solutions may be preferable

====> b) Create a timerfd interface analogous to POSIX timers

Create an interface analogous to POSIX timers:

fd = timerfd_create(clockid, flags);

timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);

timerfd_gettime(fd, &time_to_next_expire);

Under this proposal, the manner of making a timer that does not
need "get-while-set" functionality remains fairly simple:

fd = timerfd_create(clockid);

timerfd_settime(fd, flags, newtimervalue, NULL);

Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.

Disadvantage: 3 new system calls, rather than 1.

This solution would be sufficient, IMO, but one of the
next solutions might be better.

====> c) Integrate timerfd with POSIX timers

Make a very simple timerfd call that is integrated with the
POSIX timers API. The POSIX timers API is detailed here:
http://linux.die.net/man/3/timer_create
http://linux.die.net/man/3/timer_settime

Under the POSIX timers API, a new timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
timer_t *timerid);

We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_settime(timer_t timerid, int flags,
const struct itimerspec *value,
struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);

And then read() from 'fd' as before.

In the simple case (no "get" or "get-while-setting" functionality),
the use of API (c) would be:

timer_create(clockid, &evp, &timerid);

fd = timerfd(timerid);

timer_settime(timerid, flags, &newvalue, NULL));

Advantages:
1. Integration with an existing API.
2. Adds just a single system call.
3. It _might_ be possible to construct an interface that allows
userland programs to do things like creating a timer fd for
a POSIX timer that was created via some library that doesn't
actually know about timer fds. (I can already see problems with
this, since that library will already expect to be delivering
timer notifications somehow (via threads or signals), and it may
be difficult to make the two notification mechanisms play
together in a sane way. But maybe someone else has a take on
this that can rescue this idea.)

Disadvantages:
1. Starts to get a little more clunky to use in the simple
case shown above.

This strikes me as a more attractive solution than (b), if we can do
it properly -- that means: if we can achieve advantage 3
in some reasonable way. If we can't achieve that, then probably
the next solution is better.

====> d) extend the POSIX timers API

Under the POSIX timers API, the evp argument of timer_create() is a
structure that allows the caller to specify how timer expirations
should be notified. There are the following possibilities
(differentiated by the value assigned to evp.sigev_notify):

i) notify via a signal: the caller specifies which signal the
kernel should deliver when the timer expires.
(SIGEV_SIGNAL)
ii) notify by delivering a signal to the thread whose thread ID
is specified in evp. (This is Linux specific.)
(SIGEV_THREAD_ID)
iii) notify via a thread: when the timer expires, the system starts
a new thread which receives an argument that was specified in
the evp structure. (SIGEV_THREAD)
iv) no notification: the caller can monitor the timer state using
timer_gettime(). (SIGEV_NONE)

In all of the above cases, the return value from timer_create()
is 0 for success, or -1 for failure.

We could extend the interface as follows:

1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
This flag indicates that the caller wants timer
notification via a file descriptor.
2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
timer_create() call return a file descriptor (i.e., an
integer >= 0).

Advantages:
1. Integration with an existing API.
2. No new system calls are required.
3. This idea might even have a chance of getting standardized in
POSIX one day, since (IMO) it integrates fairly cleanly with
an existing API.

Disadvantages:
1. The fact that the return value of a successful timer_create()
is different for the SIGEV_TIMERFD case is a bit of a wart.

=====

That's it. I welcome people's thoughts on these designs.

Cheers,

Michael


2007-09-18 07:33:35

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

[Resend, because I got one email address wrong in the earlier send]

After my earlier mail (full thread here:
http://thread.gmane.org/gmane.linux.kernel/574430/focus=579368 )
it seems that some people agree we should give a bit more
thought to how a final timerfd interface should look.

Davide's original API and the limitations that I see in it,
are described in the message cited above. In that message,
I proposed three alternatives (and I'm going to add a fourth
now) that provide "get-while-set" and "non-destructive-get"
functionality. Before trying to implement anything I'd like to
get input on the various possible designs.

The four designs are:

a) A multiplexing timerfd() system call.
b) Creating three syscalls analogous to the POSIX timers API (i.e.,
timerfd_create/timerfd_settime/timerfd_gettime).
c) Creating a simplified timerfd() system call that is integrated
with the POSIX timers API.
d) Extending the POSIX timers API to support the timerfd concept.

My order of preference for the implementations is currently
something like (in descending order): d, b, c, a.

The details follow:

====> a) Add an argument (a multiplexing timerfd() system call)

In an earlier mail (http://thread.gmane.org/gmane.linux.kernel/559193 ).
I proposed adding a further argument to timerfd(): old_utmr, which could
be used to return the time remaining until expiry for an existing timer
The proposed semantics that would allow get and get-while-setting
functionality.

Advantages:
1. Provides the desired get and get-while-setting functionality.
2. It's a simple interface (a single system call).

Disadvantage:
Jon Corbet pointed out
(http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
that this interface was starting to look like a multiplexing syscall,
because there is no case where all of the arguments are used (see
the use-case descriptions in the earlier mail).

I'm inclined to agree with Jon; therefore one of the remaining
solutions may be preferable

====> b) Create a timerfd interface analogous to POSIX timers

Create an interface analogous to POSIX timers:

fd = timerfd_create(clockid, flags);

timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);

timerfd_gettime(fd, &time_to_next_expire);

Under this proposal, the manner of making a timer that does not
need "get-while-set" functionality remains fairly simple:

fd = timerfd_create(clockid);

timerfd_settime(fd, flags, newtimervalue, NULL);

Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.

Disadvantage: 3 new system calls, rather than 1.

This solution would be sufficient, IMO, but one of the
next solutions might be better.

====> c) Integrate timerfd with POSIX timers

Make a very simple timerfd call that is integrated with the
POSIX timers API. The POSIX timers API is detailed here:
http://linux.die.net/man/3/timer_create
http://linux.die.net/man/3/timer_settime

Under the POSIX timers API, a new timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
timer_t *timerid);

We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_settime(timer_t timerid, int flags,
const struct itimerspec *value,
struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);

And then read() from 'fd' as before.

In the simple case (no "get" or "get-while-setting" functionality),
the use of API (c) would be:

timer_create(clockid, &evp, &timerid);

fd = timerfd(timerid);

timer_settime(timerid, flags, &newvalue, NULL));

Advantages:
1. Integration with an existing API.
2. Adds just a single system call.
3. It _might_ be possible to construct an interface that allows
userland programs to do things like creating a timer fd for
a POSIX timer that was created via some library that doesn't
actually know about timer fds. (I can already see problems with
this, since that library will already expect to be delivering
timer notifications somehow (via threads or signals), and it may
be difficult to make the two notification mechanisms play
together in a sane way. But maybe someone else has a take on
this that can rescue this idea.)

Disadvantages:
1. Starts to get a little more clunky to use in the simple
case shown above.

This strikes me as a more attractive solution than (b), if we can do
it properly -- that means: if we can achieve advantage 3
in some reasonable way. If we can't achieve that, then probably
the next solution is better.

====> d) extend the POSIX timers API

Under the POSIX timers API, the evp argument of timer_create() is a
structure that allows the caller to specify how timer expirations
should be notified. There are the following possibilities
(differentiated by the value assigned to evp.sigev_notify):

i) notify via a signal: the caller specifies which signal the
kernel should deliver when the timer expires.
(SIGEV_SIGNAL)
ii) notify by delivering a signal to the thread whose thread ID
is specified in evp. (This is Linux specific.)
(SIGEV_THREAD_ID)
iii) notify via a thread: when the timer expires, the system starts
a new thread which receives an argument that was specified in
the evp structure. (SIGEV_THREAD)
iv) no notification: the caller can monitor the timer state using
timer_gettime(). (SIGEV_NONE)

In all of the above cases, the return value from timer_create()
is 0 for success, or -1 for failure.

We could extend the interface as follows:

1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
This flag indicates that the caller wants timer
notification via a file descriptor.
2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
timer_create() call return a file descriptor (i.e., an
integer >= 0).

Advantages:
1. Integration with an existing API.
2. No new system calls are required.
3. This idea might even have a chance of getting standardized in
POSIX one day, since (IMO) it integrates fairly cleanly with
an existing API.

Disadvantages:
1. The fact that the return value of a successful timer_create()
is different for the SIGEV_TIMERFD case is a bit of a wart.

=====

That's it. I welcome people's thoughts on these designs.

Cheers,

Michael

2007-09-18 08:53:51

by David Härdeman

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, September 18, 2007 09:30, Michael Kerrisk wrote:
> ====> b) Create a timerfd interface analogous to POSIX timers
>
> Create an interface analogous to POSIX timers:
> fd = timerfd_create(clockid, flags);
> timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
> timerfd_gettime(fd, &time_to_next_expire);
...
> ====> c) Integrate timerfd with POSIX timers
>
> Make a very simple timerfd call that is integrated with the
> POSIX timers API. The POSIX timers API is detailed here:
> http://linux.die.net/man/3/timer_create
> http://linux.die.net/man/3/timer_settime
>
> Under the POSIX timers API, a new timer is created using:
>
> int timer_create(clockid_t clockid, struct sigevent *evp,
> timer_t *timerid);
>
> We could then have a timerfd() call that returns a file descriptor
> for the newly created 'timerid':
>
> fd = timerfd(timer_t timerid);

Wouldn't this remove some of the usefulness of the timerfd?

For example, if a timerfd is one of the fd's that is returned by a
epoll_wait syscall, you manually need to do the mapping between the
timerfd and the timerid in order to be able to modify the timer.

The advantage of solution b) above is that the fd is everything that is
needed to work with the timer. With solution c) you have to keep two
references to the same timer around and use one of them depending on what
you want to do with the timer.

Also, if the timerfd is close():d, does that remove the underlying timer
(invalidate the timerid) as well?

--
David H?rdeman

2007-09-18 09:01:21

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Hello David,

Thanks for taking a look at this.

> On Tue, September 18, 2007 09:30, Michael Kerrisk wrote:
> > ====> b) Create a timerfd interface analogous to POSIX timers
> >
> > Create an interface analogous to POSIX timers:
> > fd = timerfd_create(clockid, flags);
> > timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
> > timerfd_gettime(fd, &time_to_next_expire);
> ...
> > ====> c) Integrate timerfd with POSIX timers
> >
> > Make a very simple timerfd call that is integrated with the
> > POSIX timers API. The POSIX timers API is detailed here:
> > http://linux.die.net/man/3/timer_create
> > http://linux.die.net/man/3/timer_settime
> >
> > Under the POSIX timers API, a new timer is created using:
> >
> > int timer_create(clockid_t clockid, struct sigevent *evp,
> > timer_t *timerid);
> >
> > We could then have a timerfd() call that returns a file descriptor
> > for the newly created 'timerid':
> >
> > fd = timerfd(timer_t timerid);
>
> Wouldn't this remove some of the usefulness of the timerfd?
>
> For example, if a timerfd is one of the fd's that is returned by a
> epoll_wait syscall, you manually need to do the mapping between the
> timerfd and the timerid in order to be able to modify the timer.

You're right, that makes the interface more clumsy. +1 for
the disadvantages. And of course solution (d) also suffers
this problem.

> The advantage of solution b) above is that the fd is everything that is
> needed to work with the timer.

Yes, true. Solution (b) would also be relatively easier (for me)
to implement.

> With solution c) you have to keep two
> references to the same timer around and use one of them depending on what
> you want to do with the timer.

Yes. (And the same for option (d).)

> Also, if the timerfd is close():d, does that remove the underlying timer
> (invalidate the timerid) as well?

My gut feeling would be to say that closing the timerfd would not
remove the underlying timer (so the timerid would remain valid).
One could even do things like recreating a file descriptor
for the timer using another timerfd() call.

But now that raises the question: what are the semantics if
timerfd() is called more than once on the same timerid?
Perhaps a read() from any one of them (destructively)
reads the expiration count, as though one had read from a
dup()ed the file descriptor. All in all, solution (c)
starts to look overly complex, and maybe suffers from
various dirty corners in the API. (Solution (d) feels
slightly better, because the creation of the file descriptor
and the timerid are integrated into a single call, and the
fact that it integrates with an existing API, but
it still has the limitation you describe above.)

Cheers,

Michael

2007-09-18 09:10:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Michael,

On Tue, 2007-09-18 at 09:30 +0200, Michael Kerrisk wrote:
> ====> a) Add an argument (a multiplexing timerfd() system call)
> Disadvantage:
> Jon Corbet pointed out
> (http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
> that this interface was starting to look like a multiplexing syscall,
> because there is no case where all of the arguments are used (see
> the use-case descriptions in the earlier mail).
>
> I'm inclined to agree with Jon; therefore one of the remaining
> solutions may be preferable

I agree. It's ugly.

> ====> b) Create a timerfd interface analogous to POSIX timers
>
> Create an interface analogous to POSIX timers:
>
> fd = timerfd_create(clockid, flags);
>
> timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
>
> timerfd_gettime(fd, &time_to_next_expire);
>
> Under this proposal, the manner of making a timer that does not
> need "get-while-set" functionality remains fairly simple:
>
> fd = timerfd_create(clockid);
>
> timerfd_settime(fd, flags, newtimervalue, NULL);
>
> Advantage: this would be a clean, fully functional API, and well
> understood by virtue of its analogy with the POSIX timers API.
>
> Disadvantage: 3 new system calls, rather than 1.
>
> This solution would be sufficient, IMO, but one of the
> next solutions might be better.

I'm not scared by the 3 system calls. I rather fear that we end up
reimplementing half of the existing posix timer code.

> ====> c) Integrate timerfd with POSIX timers
>
> Make a very simple timerfd call that is integrated with the
> POSIX timers API. The POSIX timers API is detailed here:
> http://linux.die.net/man/3/timer_create
> http://linux.die.net/man/3/timer_settime
>
> Under the POSIX timers API, a new timer is created using:
>
> int timer_create(clockid_t clockid, struct sigevent *evp,
> timer_t *timerid);
>
> We could then have a timerfd() call that returns a file descriptor
> for the newly created 'timerid':
>
> fd = timerfd(timer_t timerid);
>
> We could then use the POSIX timers API to operate on the timer
> (start it / modify it / fetch timer value):
>
> int timer_settime(timer_t timerid, int flags,
> const struct itimerspec *value,
> struct itimerspec *ovalue);
> int timer_gettime(timer_t timerid, struct itimerspec *value);
>
> And then read() from 'fd' as before.
>
> In the simple case (no "get" or "get-while-setting" functionality),
> the use of API (c) would be:
>
> timer_create(clockid, &evp, &timerid);
>
> fd = timerfd(timerid);
>
> timer_settime(timerid, flags, &newvalue, NULL));
>
> Advantages:
> 1. Integration with an existing API.
> 2. Adds just a single system call.
> 3. It _might_ be possible to construct an interface that allows
> userland programs to do things like creating a timer fd for
> a POSIX timer that was created via some library that doesn't
> actually know about timer fds. (I can already see problems with
> this, since that library will already expect to be delivering
> timer notifications somehow (via threads or signals), and it may
> be difficult to make the two notification mechanisms play
> together in a sane way. But maybe someone else has a take on
> this that can rescue this idea.)
>
> Disadvantages:
> 1. Starts to get a little more clunky to use in the simple
> case shown above.
>
> This strikes me as a more attractive solution than (b), if we can do
> it properly -- that means: if we can achieve advantage 3
> in some reasonable way. If we can't achieve that, then probably
> the next solution is better.

The main problem here is, that there is no way to tell the posix timer
code that the delivery of the timer is through the file descriptor and
not via the usual posix timer mechanisms. We need something like the
SIGEV_TIMERFD flag to make the posix timer code aware of that.

> ====> d) extend the POSIX timers API
>
> Under the POSIX timers API, the evp argument of timer_create() is a
> structure that allows the caller to specify how timer expirations
> should be notified. There are the following possibilities
> (differentiated by the value assigned to evp.sigev_notify):
>
> i) notify via a signal: the caller specifies which signal the
> kernel should deliver when the timer expires.
> (SIGEV_SIGNAL)
> ii) notify by delivering a signal to the thread whose thread ID
> is specified in evp. (This is Linux specific.)
> (SIGEV_THREAD_ID)
> iii) notify via a thread: when the timer expires, the system starts
> a new thread which receives an argument that was specified in
> the evp structure. (SIGEV_THREAD)
> iv) no notification: the caller can monitor the timer state using
> timer_gettime(). (SIGEV_NONE)
>
> In all of the above cases, the return value from timer_create()
> is 0 for success, or -1 for failure.
>
> We could extend the interface as follows:
>
> 1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
> This flag indicates that the caller wants timer
> notification via a file descriptor.
> 2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
> timer_create() call return a file descriptor (i.e., an
> integer >= 0).
>
> Advantages:
> 1. Integration with an existing API.
> 2. No new system calls are required.
> 3. This idea might even have a chance of getting standardized in
> POSIX one day, since (IMO) it integrates fairly cleanly with
> an existing API.
>
> Disadvantages:
> 1. The fact that the return value of a successful timer_create()
> is different for the SIGEV_TIMERFD case is a bit of a wart.

What happens on close(fd) ? Is the posix timer automatically destroyed ?
Is the file descriptor invalidated when the timer is destroyed via
timer_delete(timer_id) ? The automatic file descriptor creation is a bit
ugly.

I'd rather see a combination of c) and d) as a solution:

Notify the posix timer code that the timer delivery is done via the file
descriptor mechanism (SIGEV_TIMERFD).

Use a new syscall to open a file descriptor on that timer.

When the file descriptor is closed the timer is not destroyed, but
delivery disabled (analogous to the SIGEV_NONE case), so you can reopen
and reactivate it later on.

This way we have it nicely integrated into the posix timer code and keep
the existing semantics of posix timers intact.

We need to think about the open file descriptor in the timer_delete()
case as well, but this should be not too hard to sort out.

tglx




2007-09-18 09:27:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, 2007-09-18 at 11:01 +0200, Michael Kerrisk wrote:
> > With solution c) you have to keep two
> > references to the same timer around and use one of them depending on what
> > you want to do with the timer.
>
> Yes. (And the same for option (d).)
>
> > Also, if the timerfd is close():d, does that remove the underlying timer
> > (invalidate the timerid) as well?
>
> My gut feeling would be to say that closing the timerfd would not
> remove the underlying timer (so the timerid would remain valid).
> One could even do things like recreating a file descriptor
> for the timer using another timerfd() call.
>
> But now that raises the question: what are the semantics if
> timerfd() is called more than once on the same timerid?
> Perhaps a read() from any one of them (destructively)
> reads the expiration count, as though one had read from a
> dup()ed the file descriptor. All in all, solution (c)
> starts to look overly complex, and maybe suffers from
> various dirty corners in the API. (Solution (d) feels
> slightly better, because the creation of the file descriptor
> and the timerid are integrated into a single call, and the
> fact that it integrates with an existing API, but
> it still has the limitation you describe above.)

I don't think it is a big problem to have several open file descriptors
on a single posix timer without having destructive reads, we just need
to store the event count per file descriptor in file->private_data. We
solved this in the UIO code already and it works perfectly fine.

tglx


2007-09-18 09:30:24

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Hi Thomas,

> On Tue, 2007-09-18 at 09:30 +0200, Michael Kerrisk wrote:
> > ====> a) Add an argument (a multiplexing timerfd() system call)
> > Disadvantage:
> > Jon Corbet pointed out
> > (http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
> > that this interface was starting to look like a multiplexing syscall,
> > because there is no case where all of the arguments are used (see
> > the use-case descriptions in the earlier mail).
> >
> > I'm inclined to agree with Jon; therefore one of the remaining
> > solutions may be preferable
>
> I agree. It's ugly.

Fair enough. I mainly tried to do things that way to minimize
the change from the Davide's original interface.

> > ====> b) Create a timerfd interface analogous to POSIX timers
> >
> > Create an interface analogous to POSIX timers:
> >
> > fd = timerfd_create(clockid, flags);
> >
> > timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);
> >
> > timerfd_gettime(fd, &time_to_next_expire);
> >
> > Under this proposal, the manner of making a timer that does not
> > need "get-while-set" functionality remains fairly simple:
> >
> > fd = timerfd_create(clockid);
> >
> > timerfd_settime(fd, flags, newtimervalue, NULL);
> >
> > Advantage: this would be a clean, fully functional API, and well
> > understood by virtue of its analogy with the POSIX timers API.
> >
> > Disadvantage: 3 new system calls, rather than 1.
> >
> > This solution would be sufficient, IMO, but one of the
> > next solutions might be better.
>
> I'm not scared by the 3 system calls. I rather fear that we end up
> reimplementing half of the existing posix timer code.

Yes. Perhaps some refactoring might be required, if we went
down this route.

> > ====> c) Integrate timerfd with POSIX timers
> >
> > Make a very simple timerfd call that is integrated with the
> > POSIX timers API. The POSIX timers API is detailed here:
> > http://linux.die.net/man/3/timer_create
> > http://linux.die.net/man/3/timer_settime
> >
> > Under the POSIX timers API, a new timer is created using:
> >
> > int timer_create(clockid_t clockid, struct sigevent *evp,
> > timer_t *timerid);
> >
> > We could then have a timerfd() call that returns a file descriptor
> > for the newly created 'timerid':
> >
> > fd = timerfd(timer_t timerid);
> >
> > We could then use the POSIX timers API to operate on the timer
> > (start it / modify it / fetch timer value):
> >
> > int timer_settime(timer_t timerid, int flags,
> > const struct itimerspec *value,
> > struct itimerspec *ovalue);
> > int timer_gettime(timer_t timerid, struct itimerspec *value);
> >
> > And then read() from 'fd' as before.
> >
> > In the simple case (no "get" or "get-while-setting" functionality),
> > the use of API (c) would be:
> >
> > timer_create(clockid, &evp, &timerid);
> >
> > fd = timerfd(timerid);
> >
> > timer_settime(timerid, flags, &newvalue, NULL));
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. Adds just a single system call.
> > 3. It _might_ be possible to construct an interface that allows
> > userland programs to do things like creating a timer fd for
> > a POSIX timer that was created via some library that doesn't
> > actually know about timer fds. (I can already see problems with
> > this, since that library will already expect to be delivering
> > timer notifications somehow (via threads or signals), and it may
> > be difficult to make the two notification mechanisms play
> > together in a sane way. But maybe someone else has a take on
> > this that can rescue this idea.)
> >
> > Disadvantages:
> > 1. Starts to get a little more clunky to use in the simple
> > case shown above.
> >
> > This strikes me as a more attractive solution than (b), if we can do
> > it properly -- that means: if we can achieve advantage 3
> > in some reasonable way. If we can't achieve that, then probably
> > the next solution is better.
>
> The main problem here is, that there is no way to tell the posix timer
> code that the delivery of the timer is through the file descriptor and
> not via the usual posix timer mechanisms. We need something like the
> SIGEV_TIMERFD flag to make the posix timer code aware of that.

Well, I left it it kind of open whether the expiration
notification might be delivered via both the traditional
mechanism, and via the tiemrfd. But I realize that all
may get overly complex.

> > ====> d) extend the POSIX timers API
> >
> > Under the POSIX timers API, the evp argument of timer_create() is a
> > structure that allows the caller to specify how timer expirations
> > should be notified. There are the following possibilities
> > (differentiated by the value assigned to evp.sigev_notify):
> >
> > i) notify via a signal: the caller specifies which signal the
> > kernel should deliver when the timer expires.
> > (SIGEV_SIGNAL)
> > ii) notify by delivering a signal to the thread whose thread ID
> > is specified in evp. (This is Linux specific.)
> > (SIGEV_THREAD_ID)
> > iii) notify via a thread: when the timer expires, the system starts
> > a new thread which receives an argument that was specified in
> > the evp structure. (SIGEV_THREAD)
> > iv) no notification: the caller can monitor the timer state using
> > timer_gettime(). (SIGEV_NONE)
> >
> > In all of the above cases, the return value from timer_create()
> > is 0 for success, or -1 for failure.
> >
> > We could extend the interface as follows:
> >
> > 1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
> > This flag indicates that the caller wants timer
> > notification via a file descriptor.
> > 2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
> > timer_create() call return a file descriptor (i.e., an
> > integer >= 0).
> >
> > Advantages:
> > 1. Integration with an existing API.
> > 2. No new system calls are required.
> > 3. This idea might even have a chance of getting standardized in
> > POSIX one day, since (IMO) it integrates fairly cleanly with
> > an existing API.
> >
> > Disadvantages:
> > 1. The fact that the return value of a successful timer_create()
> > is different for the SIGEV_TIMERFD case is a bit of a wart.
>
> What happens on close(fd) ? Is the posix timer automatically destroyed ?

I would say not (see also my reply to David H?rdeman.)

> Is the file descriptor invalidated when the timer is destroyed via
> timer_delete(timer_id) ? The automatic file descriptor creation is a bit
> ugly.

Yes, it is a little ugly.

> I'd rather see a combination of c) and d) as a solution:
>
> Notify the posix timer code that the timer delivery is done via the file
> descriptor mechanism (SIGEV_TIMERFD).
>
> Use a new syscall to open a file descriptor on that timer.
>
> When the file descriptor is closed the timer is not destroyed, but
> delivery disabled (analogous to the SIGEV_NONE case), so you can reopen
> and reactivate it later on.
>
> This way we have it nicely integrated into the posix timer code and keep
> the existing semantics of posix timers intact.
>
> We need to think about the open file descriptor in the timer_delete()
> case as well, but this should be not too hard to sort out.

This seems like a workable idea also. But note David H?rdeman's
critique of options c & d: the existence of a coupled timerfd
and a timerid means that the application must maintain a mapping
between the two, so that after an epoll call (for example) that
says the timerfd is ready, the timer can be manipulated using
the corresponding timerfd. This isn't IMO a fatal flaw, but
it does make the API a little more clumsy.

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-18 09:43:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, 2007-09-18 at 11:30 +0200, Michael Kerrisk wrote:
> > This way we have it nicely integrated into the posix timer code and keep
> > the existing semantics of posix timers intact.
> >
> > We need to think about the open file descriptor in the timer_delete()
> > case as well, but this should be not too hard to sort out.
>
> This seems like a workable idea also. But note David Härdeman's
> critique of options c & d: the existence of a coupled timerfd
> and a timerid means that the application must maintain a mapping
> between the two, so that after an epoll call (for example) that
> says the timerfd is ready, the timer can be manipulated using
> the corresponding timerfd. This isn't IMO a fatal flaw, but
> it does make the API a little more clumsy.

Hmm, we might do something like:

timer_gettime(fd | POSIX_TIMER_FD, .....);

So the kernel looks up the fd in order to figure out the timer_id, which
needs to be referenced in filep->private_data anyway.

tglx


2007-09-18 11:08:53

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Hi Thomas,

> Von: Thomas Gleixner <[email protected]>
> On Tue, 2007-09-18 at 11:30 +0200, Michael Kerrisk wrote:
> > > This way we have it nicely integrated into the posix timer code and keep
> > > the existing semantics of posix timers intact.
> > >
> > > We need to think about the open file descriptor in the timer_delete()
> > > case as well, but this should be not too hard to sort out.
> >
> > This seems like a workable idea also. But note David H?rdeman's
> > critique of options c & d: the existence of a coupled timerfd
> > and a timerid means that the application must maintain a mapping
> > between the two, so that after an epoll call (for example) that
> > says the timerfd is ready, the timer can be manipulated using
> > the corresponding timerfd. This isn't IMO a fatal flaw, but
^^^^^^^
hmmm, of course I meant "timerid" in that last line.

> > it does make the API a little more clumsy.
>
> Hmm, we might do something like:
>
> timer_gettime(fd | POSIX_TIMER_FD, .....);
>
> So the kernel looks up the fd in order to figure out the timer_id, which
> needs to be referenced in filep->private_data anyway.

And you'd need similar for timer_settime() and, perhaps,
timer_getoverrun(). But it seems slightly ugly, in the same way that
my idea in option (d) of returning a file descriptor from
timer_create() seems a slightly ugly. (And can we guarantee that
the [timerid] space is distinct from the [fd|POSIX_TIMER_FD] space?)

Cheers,

Michael
--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages ,
read the HOWTOHELP file and grep the source
files for 'FIXME'.

2007-09-18 11:30:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, 2007-09-18 at 13:08 +0200, Michael Kerrisk wrote:
> > > it does make the API a little more clumsy.
> >
> > Hmm, we might do something like:
> >
> > timer_gettime(fd | POSIX_TIMER_FD, .....);
> >
> > So the kernel looks up the fd in order to figure out the timer_id, which
> > needs to be referenced in filep->private_data anyway.
>
> And you'd need similar for timer_settime() and, perhaps,
> timer_getoverrun(). But it seems slightly ugly, in the same way that
> my idea in option (d) of returning a file descriptor from
> timer_create() seems a slightly ugly. (And can we guarantee that
> the [timerid] space is distinct from the [fd|POSIX_TIMER_FD] space?)

If we use the most significant bit for POSIX_TIMER_FD, we should be
fine.

tglx


2007-09-18 13:14:29

by David Härdeman

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, September 18, 2007 13:30, Thomas Gleixner wrote:
>>> timer_gettime(fd | POSIX_TIMER_FD, .....);
>
> If we use the most significant bit for POSIX_TIMER_FD, we should be
> fine.

I think alternative b) - three new syscalls, sounds better.

The only negatives so far are that it adds more syscalls and that it might
require code duplication with posix timers. The syscall numbers argument
seemed not to be very important and the code duplication should be fixable
by refactoring the code so that more is shared between the two systems (I
assume).

Overloading file descriptors with flags looks ugly, is there any other
syscall which does that?

--
David H?rdeman
(sorry Thomas for the dupe, I missed replying to all on the first msg).


2007-09-18 16:52:05

by Davide Libenzi

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Tue, 18 Sep 2007, Michael Kerrisk wrote:

> The four designs are:
>
> a) A multiplexing timerfd() system call.
> b) Creating three syscalls analogous to the POSIX timers API (i.e.,
> timerfd_create/timerfd_settime/timerfd_gettime).
> c) Creating a simplified timerfd() system call that is integrated
> with the POSIX timers API.
> d) Extending the POSIX timers API to support the timerfd concept.

If you really want to shoot yourself in your foot, I'd pick bullet B.
Bullet A makes me sea-sick, and bullets C and D, well, let's leave POSIX
APIs being *POSIX* APIs.
Once you remove all the "ifs" and "elses" that resulted from your previous
bullet A multiplexing implementation, timerfd_gettime and timerfd_settime
should result in being pretty slick.
I still think we could have survived w/out all this done inside the
kernel though.



- Davide


2007-09-22 13:06:15

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

David H?rdeman wrote:
> On Tue, September 18, 2007 13:30, Thomas Gleixner wrote:
>>>> timer_gettime(fd | POSIX_TIMER_FD, .....);
>> If we use the most significant bit for POSIX_TIMER_FD, we should be
>> fine.
>
> I think alternative b) - three new syscalls, sounds better.
>
> The only negatives so far are that it adds more syscalls and that it might
> require code duplication with posix timers. The syscall numbers argument
> seemed not to be very important and the code duplication should be fixable
> by refactoring the code so that more is shared between the two systems (I
> assume).

Yes, I'm inclined to agree with you on both points.

> Overloading file descriptors with flags looks ugly, is there any other
> syscall which does that?

AFAIK there is no other syscall that does that. I agree that it's not very
pretty.

Cheers,

Michael

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

2007-09-22 13:15:38

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Davide, Andrew, Linus, et al.

At the start of this thread
(http://thread.gmane.org/gmane.linux.kernel/581115 ), I proposed 4
alternatives to Davide's original timerfd API. Based on the feedback in
that thread (and one or two earlier comments):

Let's dismiss option (a), since it is an unlovely multiplexing interface.

Option (b) seems a viable. The most notable concern was from Thomas
Gleixner, that we might end up duplicating code from the POSIX timers API
within the timerfd API -- some eventual refactoring might mitigate this
problem.

Option (c) seems overly complex. In addition, David Härdeman pointed out
that option (c) (and, I realised afterwards, option (d)) require the
userland programmer to maintain a mapping between timerfd file descriptors
and POSIX timer IDs. Thomas Gleixner proposed an API that: attempts to
avoid that problem; mixes features of options (c) and (d); and probably
helps avoid redundancy of kernel code between the timerfd system and the
POSIX timers system. I'll flesh out that API now as I understand it:

====> e) Integrate timerfd() with the POSIX timers API in such a way that
the POSIX timers API understands timerfd file descriptors.

Under the POSIX timers API, a new timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
timer_t *timerid);

When making this call, we would specify evp.sigev_notify to a new flag
value SIGEV_TIMERFD, to inform the system that this timer will deliver
notification via a timerfd file descriptor.

We would then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

(A variant here would be to have timer_create() directly return a file
descriptor when SIGEV_TIMERFD is specified, although this breaks the
traditional semantics that timer_create() only returns 0 on success.)

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_settime(timer_t timerid, int flags,
const struct itimerspec *value,
struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);

The difference here is that 'timerid' could be either:

1) the timerid value returned from timer_create(); or

2) the value (fd | POSIX_TIMER_FD), where POSIX_TIMER_FD is a
flag (perhaps the topmost bit set on) that indicates that
the rest of the value is a file descriptor. With this
information, the kernel can do a lookup to find the
corresponding timerfd and perform the required operation
on it.

Advantages:
1. Userland programs don't need to maintain a mapping between
timer IDs and file descriptors.
2. Adds just a single system call.

Disadvantages:
1. This design stretches the POSIX timers API in strange
ways. My option (d) also did this to a lesser extent,
and that felt slightly uncomfortable. Option (e)
makes more uncomfortable still. As David Härdeman
pointed out, overloading file descriptors with flags looks
ugly, and I can't thing of any other syscall that does
that. In addition this idea probably breaks POSIX, since
'timer_t' is only required to be an arithmetic type: it
need not specifically be an integer type (although it is
on Linux).

=====

The upshot is that of the 5 alternatives, I favor option (b). David
Härdeman also expressed a preference for option (b) and it was Davide's
least disliked alternative ;-).

So I'm inclined to implement option (b), unless someone has strong
objections. Davide, could I persuade you to help?

Cheers,

Michael

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

2007-09-22 14:33:07

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

In article <[email protected]> you wrote:
> 1. This design stretches the POSIX timers API in strange
> ways.

Maybe it is possible to reimplement the POSIX API in usermode using the
kernel's FD implementation? (and drop the posix support from kernel)

Gruss
Bernd

2007-09-22 16:07:28

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Hello Bernd,

Please don't trim the CC list when replying! I nearly did not see
your reply, and others will have missed it also.

On 9/22/07, Bernd Eckenfels <[email protected]> wrote:
> In article <[email protected]> you wrote:
> > 1. This design stretches the POSIX timers API in strange
> > ways.
>
> Maybe it is possible to reimplement the POSIX API in usermode using the
> kernel's FD implementation?

It's a clever idea... Without thinking on it too long, I'm not sure
whether or not there might be some details which would make this
difficult.

> (and drop the posix support from kernel)

However we couldn't drop POSIX support from the kernel, because that
would break the ABI.

Cheers,

Michael

2007-09-22 17:05:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sat, 2007-09-22 at 18:07 +0200, Michael Kerrisk wrote:
> Hello Bernd,
>
> Please don't trim the CC list when replying! I nearly did not see
> your reply, and others will have missed it also.

Yup.

> On 9/22/07, Bernd Eckenfels <[email protected]> wrote:
> > In article <[email protected]> you wrote:
> > > 1. This design stretches the POSIX timers API in strange
> > > ways.
> >
> > Maybe it is possible to reimplement the POSIX API in usermode using the
> > kernel's FD implementation?

Yikes.

> It's a clever idea... Without thinking on it too long, I'm not sure
> whether or not there might be some details which would make this
> difficult.

You'd need be quite masochistic to start such a project. The POSIX timer
API consists mostly of corner cases and I doubt that you get them even
halfway under control in a pure user space implementation.

It would be a rather huge performance penalty as well. You need at least
two user space context switches to get the most simple cases resolved.

> > (and drop the posix support from kernel)
>
> However we couldn't drop POSIX support from the kernel, because that
> would break the ABI.

True. So there is no point in reinventing the wheel.

tglx


2007-09-22 17:10:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Michael,

On Sat, 2007-09-22 at 15:12 +0200, Michael Kerrisk wrote:
> Davide, Andrew, Linus, et al.
>
> At the start of this thread
> (http://thread.gmane.org/gmane.linux.kernel/581115 ), I proposed 4
> alternatives to Davide's original timerfd API. Based on the feedback in
> that thread (and one or two earlier comments):
>
> Let's dismiss option (a), since it is an unlovely multiplexing interface.
>
> Option (b) seems a viable. The most notable concern was from Thomas
> Gleixner, that we might end up duplicating code from the POSIX timers API
> within the timerfd API -- some eventual refactoring might mitigate this
> problem.

It should be possible to use the timerfd syscalls as wrappers for the
posix timer implementation and add the discussed SIGEV_TIMERFD only
internally in the kernel to signal the posix timer code new delivery
mechanism.

tglx




2007-09-22 21:08:00

by Davide Libenzi

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sat, 22 Sep 2007, Michael Kerrisk wrote:

> So I'm inclined to implement option (b), unless someone has strong
> objections. Davide, could I persuade you to help?

I guess I better do, otherwise you'll continue to stress me ;)

int timerfd_create(int clockid);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

Patch below. Builds, not tested yet (you need to remove the "broken"
status from CONFIG_TIMERFD in case you want to test - and plug the new
syscall to arch/xxx).
May that work for you?
Thomas-san, hrtimer_try_to_cancel() does not touch ->expires and I assume
it'll never do, granted?



- Davide



---
fs/compat.c | 32 ++++++++--
fs/timerfd.c | 144 +++++++++++++++++++++++++++++------------------
include/linux/compat.h | 7 +-
include/linux/syscalls.h | 7 +-
4 files changed, 126 insertions(+), 64 deletions(-)

Index: linux-2.6.mod/fs/timerfd.c
===================================================================
--- linux-2.6.mod.orig/fs/timerfd.c 2007-09-22 12:22:19.000000000 -0700
+++ linux-2.6.mod/fs/timerfd.c 2007-09-22 13:21:21.000000000 -0700
@@ -23,6 +23,7 @@

struct timerfd_ctx {
struct hrtimer tmr;
+ int clockid;
ktime_t tintv;
wait_queue_head_t wqh;
int expired;
@@ -46,7 +47,7 @@
return HRTIMER_NORESTART;
}

-static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
+static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
const struct itimerspec *ktmr)
{
enum hrtimer_mode htmode;
@@ -58,7 +59,7 @@
texp = timespec_to_ktime(ktmr->it_value);
ctx->expired = 0;
ctx->tintv = timespec_to_ktime(ktmr->it_interval);
- hrtimer_init(&ctx->tmr, clockid, htmode);
+ hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
ctx->tmr.expires = texp;
ctx->tmr.function = timerfd_tmrproc;
if (texp.tv64 != 0)
@@ -150,76 +151,109 @@
.read = timerfd_read,
};

-asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
- const struct itimerspec __user *utmr)
+asmlinkage long sys_timerfd_create(int clockid)
{
- int error;
+ int error, ufd;
struct timerfd_ctx *ctx;
struct file *file;
struct inode *inode;
- struct itimerspec ktmr;
-
- if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
- return -EFAULT;

if (clockid != CLOCK_MONOTONIC &&
clockid != CLOCK_REALTIME)
return -EINVAL;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ init_waitqueue_head(&ctx->wqh);
+ ctx->clockid = clockid;
+
+ error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
+ &timerfd_fops, ctx);
+ if (error)
+ goto err_kfree_ctx;
+
+ return ufd;
+
+err_kfree_ctx:
+ kfree(ctx);
+ return error;
+}
+
+asmlinkage long sys_timerfd_settime(int ufd, int flags,
+ const struct itimerspec __user *utmr,
+ struct itimerspec __user *otmr)
+{
+ struct file *file;
+ struct timerfd_ctx *ctx;
+ struct itimerspec ktmr, kotmr;
+
+ if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
+ return -EFAULT;
+
if (!timespec_valid(&ktmr.it_value) ||
!timespec_valid(&ktmr.it_interval))
return -EINVAL;

- if (ufd == -1) {
- ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
- if (!ctx)
- return -ENOMEM;
-
- init_waitqueue_head(&ctx->wqh);
-
- timerfd_setup(ctx, clockid, flags, &ktmr);
-
- /*
- * When we call this, the initialization must be complete, since
- * anon_inode_getfd() will install the fd.
- */
- error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
- &timerfd_fops, ctx);
- if (error)
- goto err_tmrcancel;
- } else {
- file = fget(ufd);
- if (!file)
- return -EBADF;
- ctx = file->private_data;
- if (file->f_op != &timerfd_fops) {
- fput(file);
- return -EINVAL;
- }
- /*
- * We need to stop the existing timer before reprogramming
- * it to the new values.
- */
- for (;;) {
- spin_lock_irq(&ctx->wqh.lock);
- if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
- break;
- spin_unlock_irq(&ctx->wqh.lock);
- cpu_relax();
- }
- /*
- * Re-program the timer to the new value ...
- */
- timerfd_setup(ctx, clockid, flags, &ktmr);
+ file = fget(ufd);
+ if (!file)
+ return -EBADF;
+ ctx = file->private_data;
+ if (file->f_op != &timerfd_fops) {
+ fput(file);
+ return -EINVAL;
+ }

+ /*
+ * We need to stop the existing timer before reprogramming
+ * it to the new values.
+ */
+ for (;;) {
+ spin_lock_irq(&ctx->wqh.lock);
+ if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
+ break;
spin_unlock_irq(&ctx->wqh.lock);
+ cpu_relax();
+ }
+
+ kotmr.it_value = ktime_to_timespec(ctx->tmr.expires);
+ kotmr.it_interval = ktime_to_timespec(ctx->tintv);
+
+ /*
+ * Re-program the timer to the new value ...
+ */
+ timerfd_setup(ctx, flags, &ktmr);
+
+ spin_unlock_irq(&ctx->wqh.lock);
+ fput(file);
+ if (otmr && copy_to_user(otmr, &kotmr, sizeof(kotmr)))
+ return -EFAULT;
+
+ return 0;
+}
+
+asmlinkage long sys_timerfd_gettime(int ufd, struct itimerspec __user *otmr)
+{
+ struct file *file;
+ struct timerfd_ctx *ctx;
+ struct itimerspec kotmr;
+
+ file = fget(ufd);
+ if (!file)
+ return -EBADF;
+ ctx = file->private_data;
+ if (file->f_op != &timerfd_fops) {
fput(file);
+ return -EINVAL;
}

- return ufd;
+ spin_lock_irq(&ctx->wqh.lock);
+ kotmr.it_value = ktime_to_timespec(ctx->tmr.expires);
+ kotmr.it_interval = ktime_to_timespec(ctx->tintv);
+ spin_unlock_irq(&ctx->wqh.lock);
+ fput(file);

-err_tmrcancel:
- hrtimer_cancel(&ctx->tmr);
- kfree(ctx);
- return error;
+ return copy_to_user(otmr, &kotmr, sizeof(kotmr)) ? -EFAULT: 0;
}

Index: linux-2.6.mod/include/linux/syscalls.h
===================================================================
--- linux-2.6.mod.orig/include/linux/syscalls.h 2007-09-22 12:23:08.000000000 -0700
+++ linux-2.6.mod/include/linux/syscalls.h 2007-09-22 13:08:49.000000000 -0700
@@ -607,8 +607,11 @@
size_t len);
asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
-asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
- const struct itimerspec __user *utmr);
+asmlinkage long sys_timerfd_create(int clockid);
+asmlinkage long sys_timerfd_settime(int ufd, int flags,
+ const struct itimerspec __user *utmr,
+ struct itimerspec __user *otmr);
+asmlinkage long sys_timerfd_gettime(int ufd, struct itimerspec __user *otmr);
asmlinkage long sys_eventfd(unsigned int count);
asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);

Index: linux-2.6.mod/fs/compat.c
===================================================================
--- linux-2.6.mod.orig/fs/compat.c 2007-09-22 13:31:41.000000000 -0700
+++ linux-2.6.mod/fs/compat.c 2007-09-22 13:47:02.000000000 -0700
@@ -2210,19 +2210,41 @@

#ifdef CONFIG_TIMERFD

-asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
- const struct compat_itimerspec __user *utmr)
+asmlinkage long compat_sys_timerfd_settime(int ufd, int flags,
+ const struct compat_itimerspec __user *utmr,
+ struct compat_itimerspec __user *otmr)
{
+ int error;
struct itimerspec t;
struct itimerspec __user *ut;

if (get_compat_itimerspec(&t, utmr))
return -EFAULT;
- ut = compat_alloc_user_space(sizeof(*ut));
- if (copy_to_user(ut, &t, sizeof(t)))
+ ut = compat_alloc_user_space(2 * sizeof(struct itimerspec));
+ if (copy_to_user(&ut[0], &t, sizeof(t)))
return -EFAULT;
+ error = sys_timerfd_settime(ufd, flags, &ut[0], &ut[1]);
+ if (!error && otmr)
+ error = (copy_from_user(&t, &ut[1], sizeof(struct itimerspec)) ||
+ put_compat_itimerspec(otmr, &t)) ? -EFAULT: 0;

- return sys_timerfd(ufd, clockid, flags, ut);
+ return error;
+}
+
+asmlinkage long compat_sys_timerfd_gettime(int ufd,
+ struct compat_itimerspec __user *otmr)
+{
+ int error;
+ struct itimerspec t;
+ struct itimerspec __user *ut;
+
+ ut = compat_alloc_user_space(sizeof(struct itimerspec));
+ error = sys_timerfd_gettime(ufd, ut);
+ if (!error)
+ error = (copy_from_user(&t, ut, sizeof(struct itimerspec)) ||
+ put_compat_itimerspec(otmr, &t)) ? -EFAULT: 0;
+
+ return error;
}

#endif /* CONFIG_TIMERFD */
Index: linux-2.6.mod/include/linux/compat.h
===================================================================
--- linux-2.6.mod.orig/include/linux/compat.h 2007-09-22 13:47:57.000000000 -0700
+++ linux-2.6.mod/include/linux/compat.h 2007-09-22 13:48:41.000000000 -0700
@@ -264,8 +264,11 @@
asmlinkage long compat_sys_signalfd(int ufd,
const compat_sigset_t __user *sigmask,
compat_size_t sigsetsize);
-asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
- const struct compat_itimerspec __user *utmr);
+asmlinkage long compat_sys_timerfd_settime(int ufd, int flags,
+ const struct compat_itimerspec __user *utmr,
+ struct compat_itimerspec __user *otmr);
+asmlinkage long compat_sys_timerfd_gettime(int ufd,
+ struct compat_itimerspec __user *otmr);

#endif /* CONFIG_COMPAT */
#endif /* _LINUX_COMPAT_H */

2007-09-22 21:26:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sat, 2007-09-22 at 14:07 -0700, Davide Libenzi wrote:
> On Sat, 22 Sep 2007, Michael Kerrisk wrote:
>
> > So I'm inclined to implement option (b), unless someone has strong
> > objections. Davide, could I persuade you to help?
>
> I guess I better do, otherwise you'll continue to stress me ;)
>
> int timerfd_create(int clockid);
> int timerfd_settime(int ufd, int flags,
> const struct itimerspec *utmr,
> struct itimerspec *otmr);
> int timerfd_gettime(int ufd, struct itimerspec *otmr);
>
> Patch below. Builds, not tested yet (you need to remove the "broken"
> status from CONFIG_TIMERFD in case you want to test - and plug the new
> syscall to arch/xxx).
> May that work for you?
> Thomas-san, hrtimer_try_to_cancel() does not touch ->expires and I assume
> it'll never do, granted?

Davide-san, I have no intention to change that, but remember there is
this file "Documentation/stable_api_nonsense.txt" :)

tglx



2007-09-22 23:21:33

by Davide Libenzi

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sat, 22 Sep 2007, Thomas Gleixner wrote:

> On Sat, 2007-09-22 at 14:07 -0700, Davide Libenzi wrote:
> > On Sat, 22 Sep 2007, Michael Kerrisk wrote:
> >
> > > So I'm inclined to implement option (b), unless someone has strong
> > > objections. Davide, could I persuade you to help?
> >
> > I guess I better do, otherwise you'll continue to stress me ;)
> >
> > int timerfd_create(int clockid);
> > int timerfd_settime(int ufd, int flags,
> > const struct itimerspec *utmr,
> > struct itimerspec *otmr);
> > int timerfd_gettime(int ufd, struct itimerspec *otmr);
> >
> > Patch below. Builds, not tested yet (you need to remove the "broken"
> > status from CONFIG_TIMERFD in case you want to test - and plug the new
> > syscall to arch/xxx).
> > May that work for you?
> > Thomas-san, hrtimer_try_to_cancel() does not touch ->expires and I assume
> > it'll never do, granted?
>
> Davide-san, I have no intention to change that, but remember there is
> this file "Documentation/stable_api_nonsense.txt" :)

Heh, I guess that'll work then ;)



- Davide


2007-09-23 17:36:46

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Hi Davide,

Davide Libenzi wrote:
> On Sat, 22 Sep 2007, Michael Kerrisk wrote:
>
>> So I'm inclined to implement option (b), unless someone has strong
>> objections. Davide, could I persuade you to help?
>
> I guess I better do, otherwise you'll continue to stress me ;)

Thanks -- that was more than I hoped for!

> int timerfd_create(int clockid);
> int timerfd_settime(int ufd, int flags,
> const struct itimerspec *utmr,
> struct itimerspec *otmr);
> int timerfd_gettime(int ufd, struct itimerspec *otmr);
>
> Patch below. Builds, not tested yet (you need to remove the "broken"
> status from CONFIG_TIMERFD in case you want to test - and plug the new
> syscall to arch/xxx).

I applied this patch against 2.6.27-rc7, and wired up the syscalls as shown
in the definitions below. When I ran the the program below, my system
immediately froze. Can you try it on your system please.

Cheers,

Michael

/* Link with -lrt */

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#if defined(__i386__)
#define __NR_timerfd_create 325
#define __NR_timerfd_settime 326
#define __NR_timerfd_gettime 327
17170:man-pages/man2> cat timerfd3_test.c
/* Link with -lrt */

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#if defined(__i386__)
#define __NR_timerfd_create 325
#define __NR_timerfd_settime 326
#define __NR_timerfd_gettime 327
#endif

static int
timerfd_create(int clockid)
{
return syscall(__NR_timerfd_create, clockid);
}

static int
timerfd_settime(int ufd, int flags, struct itimerspec *utmr,
struct itimerspec *outmr)
{
return syscall(__NR_timerfd_settime, ufd, flags, utmr, outmr);
}

static int
timerfd_gettime(int ufd, struct itimerspec *outmr)
{
return syscall(__NR_timerfd_gettime, ufd, outmr);
}

/*
static int
timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr,
struct itimerspec *outmr)
{
return syscall(__NR_timerfd, ufd, clockid, flags, utmr, outmr);
}

*/


/*
int timerfd_settime(int ufd, int flags,
> > const struct itimerspec *utmr,
> > struct itimerspec *otmr);
> > int timerfd_gettime(int ufd, struct itimerspec *otmr)
*/
#define TFD_TIMER_ABSTIME (1 << 0)

////////////////////////////////////////////////////////////

// #include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> /* Definition of uint32_t */

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

static void
print_elapsed_time(void)
{
static struct timespec start;
struct timespec curr;
static int first_call = 1;
int secs, nsecs;

if (first_call) {
first_call = 0;
if (clock_gettime(CLOCK_MONOTONIC, &start) == -1)
die("clock_gettime");
}

if (clock_gettime(CLOCK_MONOTONIC, &curr) == -1)
die("clock_gettime");

secs = curr.tv_sec - start.tv_sec;
nsecs = curr.tv_nsec - start.tv_nsec;
if (nsecs < 0) {
secs--;
nsecs += 1000000000;
}
printf("%d.%03d: ", secs, (nsecs + 500000) / 1000000);
}

int
main(int argc, char *argv[])
{
struct itimerspec utmr, outmr;
int ufd;
struct timespec now;
int j, s;
uint64_t exp;
time_t start;

if (argc < 2) {
fprintf(stderr, "%s init-secs [interval-secs]\n",
argv[0]);
exit(EXIT_FAILURE);
}

if (clock_gettime(CLOCK_REALTIME, &now) == -1)
die("clock_gettime");

/* Create a CLOCK_REALTIME absolute timer with initial
expiration and interval as specified in command line */

utmr.it_value.tv_sec = now.tv_sec + atoi(argv[1]);
utmr.it_value.tv_nsec = now.tv_nsec;
if (argc == 2) {
utmr.it_interval.tv_sec = 0;
} else {
utmr.it_interval.tv_sec = atoi(argv[2]);
}
utmr.it_interval.tv_nsec = 0;

ufd = timerfd_create(CLOCK_REALTIME);
if (ufd == -1)
die("timerfd");
s = timerfd_settime(ufd, TFD_TIMER_ABSTIME, &utmr, NULL);
if (ufd == -1)
die("timerfd");

start = time(NULL);
for (j = 0; ; j++) {
sleep(1);
if (0 && (j % 10) == 0) {
printf("Resetting timer\n");
utmr.it_value.tv_sec += 1;
utmr.it_interval.tv_sec += 2;
s = timerfd_settime(ufd, 0, &utmr, &outmr);
if (s == -1)
die("timerfd-2");

}
s = timerfd_gettime(ufd, &outmr);
printf("Retrieval %3d (%3ld) - Got: %ld %ld; %ld %ld\n",
j, (long) (time(NULL) - start),
(long) outmr.it_value.tv_sec,
(long) outmr.it_value.tv_nsec,
(long) outmr.it_interval.tv_sec,
(long) outmr.it_interval.tv_nsec);
if ((j % 30) == 0 && j > 0) {
printf("About to read\n");
s = read(ufd, &exp, sizeof(uint64_t));
if (s != sizeof(uint64_t))
die("read");
printf("Read: %lld\n", exp);
}
}

exit(EXIT_SUCCESS);
}

2007-09-23 18:34:09

by Davide Libenzi

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sun, 23 Sep 2007, Michael Kerrisk wrote:

> I applied this patch against 2.6.27-rc7, and wired up the syscalls as shown
> in the definitions below. When I ran the the program below, my system
> immediately froze. Can you try it on your system please.

There's an hrtimer_init() missing in timerfd_create(). I'll refactor the
patch.



- Davide


2007-09-23 18:41:43

by Davide Libenzi

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sun, 23 Sep 2007, Davide Libenzi wrote:

> On Sun, 23 Sep 2007, Michael Kerrisk wrote:
>
> > I applied this patch against 2.6.27-rc7, and wired up the syscalls as shown
> > in the definitions below. When I ran the the program below, my system
> > immediately froze. Can you try it on your system please.
>
> There's an hrtimer_init() missing in timerfd_create(). I'll refactor the
> patch.

There's the case of a timerfd_gettime return status when the timerfd has
not been set yet (ie, soon after a timerfd_create), to handle.
Current way is to return an (itimerspec) { 0, 0 }. Ok?



- Davide


2007-09-23 19:03:17

by Michael Kerrisk

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

Davide Libenzi wrote:
> On Sun, 23 Sep 2007, Davide Libenzi wrote:
>
>> On Sun, 23 Sep 2007, Michael Kerrisk wrote:
>>
>>> I applied this patch against 2.6.27-rc7, and wired up the syscalls as shown
>>> in the definitions below. When I ran the the program below, my system
>>> immediately froze. Can you try it on your system please.
>> There's an hrtimer_init() missing in timerfd_create(). I'll refactor the
>> patch.
>
> There's the case of a timerfd_gettime return status when the timerfd has
> not been set yet (ie, soon after a timerfd_create), to handle.
> Current way is to return an (itimerspec) { 0, 0 }. Ok?

Seems reasonable. In the analogous situation, the POSIX timers API
returns a structure containing all zeros, at least on Linux.

2007-10-01 16:22:34

by David Härdeman

[permalink] [raw]
Subject: Re: RFC: A revised timerfd API

On Sat, Sep 22, 2007 at 06:07:14PM +0200, Michael Kerrisk wrote:
>On 9/22/07, Bernd Eckenfels <[email protected]> wrote:
>> In article <[email protected]> you wrote:
>> > 1. This design stretches the POSIX timers API in strange
>> > ways.
>>
>> Maybe it is possible to reimplement the POSIX API in usermode using the
>> kernel's FD implementation?
>
>It's a clever idea... Without thinking on it too long, I'm not sure
>whether or not there might be some details which would make this
>difficult.

It seems to be a dangerous idea. It has the potential of breaking
userspace applications that rely on POSIX timers not creating fd's.

Image code like this:

/* Close stdin, stdout, stderr */
close(0);
close(1);
close(2);

/* Oh, a timer would be nice */
timer_create(x, y, z);

/* Create new stdin, stdout, stderr */
fd = open("/dev/null", flags);
dup(fd);
dup(fd);

Unless timer_create does some magic to avoid using the lowest available
fd, this would suddenly break as the timerfd would be fd 0.

--
David H?rdeman