2002-10-18 21:50:37

by Hanna Linder

Subject: [PATCH] sys_epoll system call interface to /dev/epoll


Linus,

You said earlier that you didn't like epoll being
in /dev. How about a system call interface instead? This
patch provides the skeleton for that system call. We
can have Davide's patch ported into it ASAP if you include
this by the freeze.
/dev/epoll has been shown to be the most scalable
of all the existing solutions and there needs to be a
scalable poll in 2.6 for various applications.

Please include or give any feedback that might help us
get it into the next release.

Thank you.

Hanna Linder
IBM Linux Technology Center
Kernel Group

----

diff -Nru -X ../dontdiff linux-2.5.43/arch/um/kernel/sys_call_table.c linux-epoll/arch/um/kernel/sys_call_table.c
--- linux-2.5.43/arch/um/kernel/sys_call_table.c Tue Oct 15 20:28:34 2002
+++ linux-epoll/arch/um/kernel/sys_call_table.c Fri Oct 18 12:21:15 2002
@@ -228,6 +228,7 @@
extern syscall_handler_t sys_io_getevents;
extern syscall_handler_t sys_io_submit;
extern syscall_handler_t sys_io_cancel;
+extern syscall_handler_t sys_epoll;
extern syscall_handler_t sys_exit_group;

#if CONFIG_NFSD
@@ -476,6 +477,7 @@
[ __NR_io_getevents ] = sys_io_getevents,
[ __NR_io_submit ] = sys_io_submit,
[ __NR_io_cancel ] = sys_io_cancel,
+ [ __NR_epoll ] = sys_epoll,
[ __NR_alloc_hugepages ] = sys_ni_syscall,
[ __NR_free_hugepages ] = sys_ni_syscall,
[ __NR_exit_group ] = sys_exit_group,
diff -Nru -X ../dontdiff linux-2.5.43/fs/select.c linux-epoll/fs/select.c
--- linux-2.5.43/fs/select.c Tue Oct 15 20:27:09 2002
+++ linux-epoll/fs/select.c Fri Oct 18 14:22:49 2002
@@ -20,6 +20,7 @@
#include <linux/personality.h> /* for STICKY_TIMEOUTS */
#include <linux/file.h>
#include <linux/fs.h>
+#include <linux/epoll.h>

#include <asm/uaccess.h>

@@ -495,3 +496,23 @@
poll_freewait(&table);
return err;
}
+
+asmlinkage long sys_epoll(unsigned int cmd, struct epoll_struct epd)
+{
+
+	switch(cmd) {
+	case EP_CREATE:
+		return -ENOSYS;
+	case EP_ADDFD:
+		return -ENOSYS;
+	case EP_DELFD:
+		return -ENOSYS;
+	case EP_POLL:
+		return -ENOSYS;
+	case EP_DELETE:
+		return -ENOSYS;
+	default:
+		return -ENOSYS;
+	}
+}
+
diff -Nru -X ../dontdiff linux-2.5.43/include/asm-x86_64/unistd.h linux-epoll/include/asm-x86_64/unistd.h
--- linux-2.5.43/include/asm-x86_64/unistd.h Tue Oct 15 20:28:29 2002
+++ linux-epoll/include/asm-x86_64/unistd.h Fri Oct 18 12:24:37 2002
@@ -480,7 +480,9 @@
__SYSCALL(__NR_io_submit, sys_io_submit)
#define __NR_io_cancel 210
__SYSCALL(__NR_io_cancel, sys_io_cancel)
-#define __NR_get_thread_area 211
+#define __NR_epoll 211
+__SYSCALL(__NR_epoll, sys_epoll)
+#define __NR_get_thread_area 212
__SYSCALL(__NR_get_thread_area, sys_get_thread_area)

#define __NR_syscall_max __NR_get_thread_area
diff -Nru -X ../dontdiff linux-2.5.43/include/linux/epoll.h linux-epoll/include/linux/epoll.h
--- linux-2.5.43/include/linux/epoll.h Wed Dec 31 16:00:00 1969
+++ linux-epoll/include/linux/epoll.h Fri Oct 18 14:20:15 2002
@@ -0,0 +1,29 @@
+#ifndef _LINUX_EPOLL_H
+#define _LINUX_EPOLL_H
+
+#include <linux/poll.h>
+
+struct evpoll {
+	int ep_timeout;
+	unsigned long ep_resoff;
+};
+
+struct epoll_struct {
+	int maxfds;
+	void *mmap_base;
+	struct pollfd pfd;
+	struct evpoll evp;
+};
+
+enum cmd {
+	EP_CREATE,
+	EP_ADDFD,
+	EP_DELFD,
+	EP_POLL,
+	EP_DELETE
+};
+
+
+#endif
+
+



---------- End Forwarded Message ----------



2002-10-18 22:03:16

by Linus Torvalds

Subject: Re: [PATCH] sys_epoll system call interface to /dev/epoll


On Fri, 18 Oct 2002, Hanna Linder wrote:
>
> You said earlier that you didn't like epoll being
> in /dev. How about a system call interface instead? This
> patch provides the skeleton for that system call. We
> can have Davide's patch ported into it ASAP if you include
> this by the freeze.

I like it noticeably better as a system call, so it's maybe worth
discussing. It's not going to happen before I leave (very early tomorrow
morning), but if people involved agree on this and clean patches to
actually add the code (not just system call stubs) can be made..

Linus

2002-10-18 22:31:30

by Davide Libenzi

Subject: Re: [PATCH] sys_epoll system call interface to /dev/epoll

On Fri, 18 Oct 2002, Linus Torvalds wrote:

>
> On Fri, 18 Oct 2002, Hanna Linder wrote:
> >
> > You said earlier that you didn't like epoll being
> > in /dev. How about a system call interface instead? This
> > patch provides the skeleton for that system call. We
> > can have Davide's patch ported into it ASAP if you include
> > this by the freeze.
>
> I like it noticeably better as a system call, so it's maybe worth
> discussing. It's not going to happen before I leave (very early tomorrow
> morning), but if people involved agree on this and clean patches to
> actually add the code (not just system call stubs) can be made..

Linus, yesterday I was suggesting that Hanna use most of the existing code
and make:

int sys_epoll_create(int maxfds);

actually return an fd. Basically, during this function call the code
allocates a file*, initializes it, allocates a free fd, maps the file* to
the fd, creates the vma* for the events area shared between the kernel and
user space, maps allocated kernel pages to the vma*, installs the vma* and
returns the fd. This way we can avoid sys_epoll_close(), and close()
can be used instead. The task cleanup also comes for free, being linked
to a file*. Users that are using /dev/epoll in the old way can also
continue to use it, since the skeleton code is the same.
Do I have to guess that you do not like this idea?



- Davide
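
The point about close() can be illustrated with a minimal user-space
sketch. It assumes the epoll_create()/epoll_ctl()/epoll_wait() calls
declared in <sys/epoll.h> on later Linux kernels, which is not the
sys_epoll interface proposed in this thread, and wait_readable() is a
hypothetical helper name chosen for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if fd becomes readable within
 * timeout_ms, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int epfd, n;

	assert(fd >= 0);
	ev.data.fd = fd;

	/* The size argument is only a hint but must be greater than zero. */
	epfd = epoll_create(1);
	if (epfd < 0)
		return -1;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}

	n = epoll_wait(epfd, &out, 1, timeout_ms);

	/* The epoll object is an ordinary descriptor: plain close() frees it. */
	close(epfd);

	if (n < 0)
		return -1;
	return (n == 1 && (out.events & EPOLLIN)) ? 1 : 0;
}
```

Because the descriptor is released with close(), no dedicated teardown
call is needed, and the kernel's normal file cleanup applies if the
process exits without closing it.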


2002-10-18 22:43:02

by Linus Torvalds

Subject: Re: [PATCH] sys_epoll system call interface to /dev/epoll


On Fri, 18 Oct 2002, Davide Libenzi wrote:
>
> Linus, yesterday I was suggesting that Hanna use most of the existing code
> and make:
>
> int sys_epoll_create(int maxfds);
>
> actually return an fd. Basically, during this function call the code
> allocates a file*, initializes it, allocates a free fd, maps the file* to
> the fd, creates the vma* for the events area shared between the kernel and
> user space, maps allocated kernel pages to the vma*, installs the vma* and
> returns the fd.

But that's what her patch infrastructure seems to do. It's not just
epoll_create(), it's all the other ioctl's too (unlink, remove etc). One
question is whether there is one epoll system call (that multiplexes, like
in Hanna's patch) or many. I personally don't like multiplexing system
calls - the system call number _is_ a multiplexor, I don't see the point
of having multiple levels.

Linus
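
The two styles under discussion can be contrasted in a small user-space
sketch. Everything here is hypothetical: the handler names, their
return values and the generic long arguments are stand-ins chosen for
illustration, and only the EP_* command names come from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Command codes mirroring the enum in the posted patch. */
enum ep_cmd { EP_CREATE, EP_ADDFD, EP_DELFD, EP_POLL, EP_DELETE };

/* Stub handlers, each with its own typed arguments (hypothetical). */
long ep_create(int maxfds)
{
	/* 3 is a placeholder descriptor number, not a real allocation. */
	return maxfds > 0 ? 3 : -EINVAL;
}

long ep_addfd(int epd, int fd)
{
	return (epd >= 0 && fd >= 0) ? 0 : -EBADF;
}

/*
 * Multiplexed style: one entry point with a second-level dispatch on
 * cmd. The arguments have to be generic because each command needs
 * something different.
 */
long ep_multiplexed(unsigned int cmd, long a, long b)
{
	switch (cmd) {
	case EP_CREATE:
		return ep_create((int)a);
	case EP_ADDFD:
		return ep_addfd((int)a, (int)b);
	default:
		return -ENOSYS;
	}
}

/* Both routes reach the same handler; the direct calls keep their types. */
void ep_demo(void)
{
	assert(ep_multiplexed(EP_CREATE, 8, 0) == ep_create(8));
	assert(ep_multiplexed(EP_ADDFD, 3, 5) == ep_addfd(3, 5));
}
```

With separate entry points the system call number does the selection
that the switch does here, and each call keeps a typed prototype
instead of sharing untyped arguments.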

2002-10-18 23:13:53

by Shailabh Nagar

Subject: Re: [PATCH] sys_epoll system call interface to /dev/epoll

Linus,

Apart from the multiple vs. single system call issue, are you okay with
the creation of an fd, file * etc. without having a device?

The former issue could certainly be avoided by having multiple syscalls.
In fact, Davide had originally suggested an interface looking somewhat like
this:

int sys_epoll_create(int maxfds);
int sys_epoll_addfd(int epd, int fd);
void sys_epoll_close(int epd);
int sys_epoll_wait(int epd, struct pollfd **pevts, int timeout);

which is roughly what Hanna tried to multiplex onto the single sys_epoll.

-- Shailabh



Linus Torvalds wrote:
> On Fri, 18 Oct 2002, Davide Libenzi wrote:
>
>>Linus, yesterday I was suggesting that Hanna use most of the existing code
>>and make:
>>
>>int sys_epoll_create(int maxfds);
>>
>>actually return an fd. Basically, during this function call the code
>>allocates a file*, initializes it, allocates a free fd, maps the file* to
>>the fd, creates the vma* for the events area shared between the kernel and
>>user space, maps allocated kernel pages to the vma*, installs the vma* and
>>returns the fd.
>
>
> But that's what her patch infrastructure seems to do. It's not just
> epoll_create(), it's all the other ioctl's too (unlink, remove etc). One
> question is whether there is one epoll system call (that multiplexes, like
> in Hanna's patch) or many. I personally don't like multiplexing system
> calls - the system call number _is_ a multiplexor, I don't see the point
> of having multiple levels.
>
> Linus




2002-10-18 23:19:11

by Linus Torvalds

Subject: Re: [PATCH] sys_epoll system call interface to /dev/epoll


On Fri, 18 Oct 2002, Shailabh Nagar wrote:
>
> Apart from the multiple vs. single system call issue, are you okay with
> the creation of an fd, file * etc. without having a device?

Hey, that's the UNIX way. Think sockets, think pipes, think futexes. It's
nothing new, it's been there in Unix since 1969, and the "everything is a
file" thing has nothing to say about using "open".

Linus
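
The pipe case can be shown in a few lines of user-space C. This is only
a sketch using the POSIX pipe(), fstat(), read() and write() calls;
pipe_roundtrip() and the MAX_MSG limit are hypothetical names chosen
for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Kept small so write() cannot block on an empty pipe. */
#define MAX_MSG 512

/*
 * Hypothetical helper: sends msg through a fresh pipe and reads it
 * back into buf. Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
	int p[2];
	struct stat st;
	size_t len = strlen(msg);
	ssize_t n = -1;

	assert(buflen > 0);
	if (len == 0 || len > MAX_MSG)
		return -1;

	/* Two descriptors appear with no open() and no path name. */
	if (pipe(p) < 0)
		return -1;

	/* They still behave like files: fstat() works and reports a FIFO. */
	if (fstat(p[0], &st) == 0 && S_ISFIFO(st.st_mode) &&
	    write(p[1], msg, len) == (ssize_t)len)
		n = read(p[0], buf, buflen - 1);

	close(p[0]);
	close(p[1]);

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Nothing in the sketch names a device node or a file system path, yet
read(), write(), fstat() and close() all apply to the descriptors.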