2002-11-24 17:39:20

by Felix von Leitner

Subject: epoll_wait conflicts with man page

I just implemented epoll_create, epoll_ctl and epoll_wait for the diet
libc and found that epoll_wait in 2.5.59 does not expect struct
epoll_event* as second argument but actually struct pollfd*.

That makes it more useful in one sense, because porting old poll programs
is easier this way. On the other hand, it makes the whole API less
useful: the epoll_ctl call is documented to take a struct epoll_event,
which carries opaque user data, when registering a file descriptor.
That data can no longer be passed back to user space because it does not
fit in struct pollfd.

By the way: the epoll API looks great! I especially like the opaque
user specified data thing, however making it a union is not so smart
because strace can't meaningfully display it. Also, I would move the
"int fd" out of the union because it is universally useful to know the
file descriptor without having to save it in the opaque data.

Felix

PS: I noticed that the epoll syscall numbers are not #defined on several
platforms, for example sparc and mips.


2002-11-24 17:49:34

by Davide Libenzi

Subject: Re: epoll_wait conflicts with man page

On Sun, 24 Nov 2002, Felix von Leitner wrote:

> I just implemented epoll_create, epoll_ctl and epoll_wait for the diet
> libc and found that epoll_wait in 2.5.59 does not expect struct
> epoll_event* as second argument but actually struct pollfd*.

The man pages are currently under review/editing and the definitive ones
should be ready next week.



- Davide

2002-11-24 17:48:44

by Davide Libenzi

Subject: Re: epoll_wait conflicts with man page

On Sun, 24 Nov 2002, Felix von Leitner wrote:

> I just implemented epoll_create, epoll_ctl and epoll_wait for the diet
> libc and found that epoll_wait in 2.5.59 does not expect struct
> epoll_event* as second argument but actually struct pollfd*.
>
> That makes it more useful in one sense because porting old poll programs
> is easier this way. On the other hand, it makes the whole API less
> useful because the epoll_ctl call is documented to use struct
> epoll_event which contains opaque user data to enlist a file descriptor.
> This data can now not be passed back to user space because it does not
> fit in struct pollfd.
>
> By the way: the epoll API looks great! I especially like the opaque
> user specified data thing, however making it a union is not so smart
> because strace can't meaningfully display it. Also, I would move the
> "int fd" out of the union because it is universally useful to know the
> file descriptor without having to save it in the opaque data.

This will be the file that I'll submit to Ulrich for inclusion in glibc :


#ifndef _SYS_EPOLL_H
#define _SYS_EPOLL_H 1

#include <sys/types.h>


enum EPOLL_EVENTS {
    EPOLLIN = 0x001,
#define EPOLLIN EPOLLIN

    EPOLLPRI = 0x002,
#define EPOLLPRI EPOLLPRI

    EPOLLOUT = 0x004,
#define EPOLLOUT EPOLLOUT

#ifdef __USE_XOPEN

    EPOLLRDNORM = 0x040,
#define EPOLLRDNORM EPOLLRDNORM

    EPOLLRDBAND = 0x080,
#define EPOLLRDBAND EPOLLRDBAND

    EPOLLWRNORM = 0x100,
#define EPOLLWRNORM EPOLLWRNORM

    EPOLLWRBAND = 0x200,
#define EPOLLWRBAND EPOLLWRBAND

#endif /* #ifdef __USE_XOPEN */

#ifdef __USE_GNU
    EPOLLMSG = 0x400,
#define EPOLLMSG EPOLLMSG
#endif /* #ifdef __USE_GNU */

    EPOLLERR = 0x008,
#define EPOLLERR EPOLLERR

    EPOLLHUP = 0x010
#define EPOLLHUP EPOLLHUP

};


/* Valid opcodes ( "op" parameter ) to issue to epoll_ctl() */
#define EPOLL_CTL_ADD 1 /* Add a file descriptor to the interface */
#define EPOLL_CTL_DEL 2 /* Remove a file descriptor from the interface */
#define EPOLL_CTL_MOD 3 /* Change file descriptor epoll_event structure */


typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};


__BEGIN_DECLS


/*
* Creates an epoll instance.
*
* Returns an fd for the new instance.
*
* The "size" parameter is a hint specifying the number of file
* descriptors to be associated with the new instance.
*
* The fd returned by epoll_create() should be closed with close().
*/
extern int epoll_create(int size);


/*
* Manipulate an epoll instance "epfd".
*
* Returns 0 in case of success, -1 in case of error ( the "errno" variable
* will contain the specific error code )
*
* The "op" parameter is one of the EPOLL_CTL_* constants defined above.
* The "fd" parameter is the target of the operation.
* The "event" parameter describes which events the caller is interested
* in and any associated user data.
*/
extern int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event) __THROW;


/*
* Wait for events on an epoll instance "epfd".
*
* Returns the number of triggered events returned in "events" buffer.
* Or -1 in case of error with the "errno" variable set to the specific error code.
*
* The "events" parameter is a buffer that will contain triggered events.
* The "maxevents" is the maximum number of events to be returned
* ( usually size of "events" ).
* The "timeout" parameter specifies the maximum wait time in milliseconds
* ( -1 == infinite ).
*/
extern int epoll_wait(int epfd, struct epoll_event *events, int maxevents,
                      int timeout) __THROW;

__END_DECLS


#endif /* #ifndef _SYS_EPOLL_H */


This is what my current code implements. As soon as Linus merges my
latest bits ( likely in 2.5.50 ) this will be the interface exposed by the
kernel, and the one that will go into glibc. The latest change is the
removal of "revents" ( Jamie's suggestion ) and the use of "events" for
both setting the interest mask and retrieving the available events.
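
A minimal sketch of how the "events" field serves both directions with the
interface declared above: it carries the interest mask into epoll_ctl() and
the ready mask out of epoll_wait(). The helper name wait_for_readable_pipe
and the use of a pipe as the event source are assumptions made for
illustration only, not part of the header.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: registers the read end of a pipe for EPOLLIN,
 * makes it readable, and returns the "events" mask that epoll_wait()
 * reports for it. Returns 0 on any failure.
 */
uint32_t wait_for_readable_pipe(void)
{
    int pipefd[2];
    uint32_t result = 0;

    if (pipe(pipefd) == -1)
        return 0;

    int epfd = epoll_create(8);            /* "size" is only a hint */
    if (epfd != -1) {
        struct epoll_event ev;
        ev.events = EPOLLIN;               /* interest mask on the way in */
        ev.data.fd = pipefd[0];            /* opaque user data, echoed back */

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1) {
            struct epoll_event out[4];
            int n = epoll_wait(epfd, out, 4, 1000);

            /* Same field, now holding the events that actually fired. */
            if (n == 1 && out[0].data.fd == pipefd[0])
                result = out[0].events;
        }
        close(epfd);
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Calling wait_for_readable_pipe() and testing the result against EPOLLIN
shows that the mask set at registration time comes back in the same field.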




- Davide


2002-11-24 19:04:16

by Davide Libenzi

Subject: Re: epoll_wait conflicts with man page

On Sun, 24 Nov 2002, Davide Libenzi wrote:

> On Sun, 24 Nov 2002, Felix von Leitner wrote:
>
> > I just implemented epoll_create, epoll_ctl and epoll_wait for the diet
> > libc and found that epoll_wait in 2.5.59 does not expect struct
> > epoll_event* as second argument but actually struct pollfd*.
>
> The man pages are currently under review/editing and the definitive ones
> should be ready next week.

Since I received many emails about the kernel not exposing the final
interface that is currently documented, and since Linus did not merge my
latest bits, I prepared a patch that will align the kernel to the latest
API :

http://www.xmailserver.org/linux-patches/sys_epoll-2.5.49-0.58.diff

The latest API is documented here :

http://www.xmailserver.org/linux-patches/epoll.txt
http://www.xmailserver.org/linux-patches/epoll.4
http://www.xmailserver.org/linux-patches/epoll_create.txt
http://www.xmailserver.org/linux-patches/epoll_create.2
http://www.xmailserver.org/linux-patches/epoll_ctl.txt
http://www.xmailserver.org/linux-patches/epoll_ctl.2
http://www.xmailserver.org/linux-patches/epoll_wait.txt
http://www.xmailserver.org/linux-patches/epoll_wait.2

A few bits inside the man pages might change ( epoll.4 maybe heavily ) but
the API should be pretty much fixed right now. An access library is
available here :

http://www.xmailserver.org/linux-patches/epoll-lib-0.3.tar.gz




- Davide

2002-11-24 20:36:49

by Jamie Lokier

Subject: Re: epoll_wait conflicts with man page

Felix von Leitner wrote:
> Also, I would move the
> "int fd" out of the union because it is universally useful to know the
> file descriptor without having to save it in the opaque data.

Hi Felix,

It is not possible for epoll itself to keep track of the fd number,
because some more advanced applications can change or duplicate fd
numbers (using dup(), dup2() or fcntl(F_DUPFD)), and in some cases it's
possible for an event to arrive on an object which doesn't even have
_any_ valid fd value in the current process (e.g. while it's being passed
from one process to another through a unix domain socket, or with
certain uses of clone()).
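
A rough illustration of the dup() case, assuming the epoll_event layout
from the header quoted earlier in the thread. The function name
stale_fd_still_reported is hypothetical. It registers a pipe's read end,
duplicates it, closes the registered number, and then checks that an event
still arrives carrying the old, now invalid, fd value.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical demo. Returns 1 if an event is delivered with the
 * originally registered fd number after that number was closed,
 * 0 if not, and -1 if the setup failed.
 */
int stale_fd_still_reported(void)
{
    int pipefd[2];
    int outcome = -1;
    int registered_open = 1;

    if (pipe(pipefd) == -1)
        return -1;

    int registered = pipefd[0];
    int epfd = epoll_create(8);
    int dupfd = dup(registered);   /* second number for the same object */

    if (epfd != -1 && dupfd != -1) {
        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.fd = registered;   /* what the application told epoll */

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, registered, &ev) == 0) {
            /* The object stays alive through dupfd, but the number
             * stored in the user data no longer refers to it. */
            close(registered);
            registered_open = 0;

            if (write(pipefd[1], "x", 1) == 1) {
                struct epoll_event out;
                int n = epoll_wait(epfd, &out, 1, 1000);
                outcome = (n == 1 && out.data.fd == registered) ? 1 : 0;
            }
        }
    }

    if (dupfd != -1)
        close(dupfd);
    if (epfd != -1)
        close(epfd);
    if (registered_open)
        close(registered);
    close(pipefd[1]);
    return outcome;
}
```

The event is reported against the underlying object, so the value in the
user data is only as accurate as the application's own bookkeeping.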

While this might seem peculiar, it is the sort of thing that some
kinds of scalable server software do, for good reasons.

The only thing epoll could report would be the initially registered
"fd" value, which is meaningful for some applications but not
universally correct. As it would not always be correct, it is best
for the application itself to keep track of fd numbers. In practice,
all applications either use the `fd' field in the union (and no other
user-data), or a pointer in the union to a structure of flags and
other per-fd data, and that structure always includes a correct `fd'
value anyway for other reasons.
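
The second pattern, a pointer in the union to a per-fd structure, might
look roughly like this. struct conn, its bytes_seen field, and the three
function names are hypothetical and exist only for this sketch; only
epoll_create(), epoll_ctl(), epoll_wait() and struct epoll_event come from
the interface discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-fd state kept by the application itself. */
struct conn {
    int fd;                 /* the application tracks the fd number */
    unsigned bytes_seen;    /* example of other per-fd data */
};

/* Register c->fd for input and attach the structure as user data. */
int watch_conn(int epfd, struct conn *c)
{
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.ptr = c;        /* pointer member of the union */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* Wait for one event and hand back the structure it belongs to. */
struct conn *next_ready_conn(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
        return NULL;
    return ev.data.ptr;
}

/* Round trip over a pipe: returns 1 if the same structure comes back. */
int conn_roundtrip_demo(void)
{
    int pipefd[2];
    int ok = 0;

    if (pipe(pipefd) == -1)
        return 0;

    int epfd = epoll_create(8);
    if (epfd != -1) {
        struct conn c = { .fd = pipefd[0], .bytes_seen = 0 };

        if (watch_conn(epfd, &c) == 0 && write(pipefd[1], "x", 1) == 1) {
            struct conn *ready = next_ready_conn(epfd, 1000);
            ok = (ready == &c && ready->fd == pipefd[0]);
        }
        close(epfd);
    }

    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

Because the structure carries its own fd field, the application can update
that field itself if it ever duplicates or moves the descriptor.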

So, the API is perfect as it is :)

-- Jamie