2019-05-16 09:00:05

by Roman Penyaev

[permalink] [raw]
Subject: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

epoll_create2() is needed to accept EPOLL_USERPOLL flags
and size, i.e. this patch wires up polling from userspace.

Signed-off-by: Roman Penyaev <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: [email protected]

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cd5f982b1e5..f0d271875f4e 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup
426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter
427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register
+428 i386 epoll_create2 sys_epoll_create2 __ia32_sys_epoll_create2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 64ca0d06259a..5ee9bb31a552 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
425 common io_uring_setup __x64_sys_io_uring_setup
426 common io_uring_enter __x64_sys_io_uring_enter
427 common io_uring_register __x64_sys_io_uring_register
+428 common epoll_create2 __x64_sys_epoll_create2

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9ff666ce7cb5..b44c3a0c4ad0 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2614,6 +2614,14 @@ static int do_epoll_create(int flags, size_t size)
return error;
}

+SYSCALL_DEFINE2(epoll_create2, int, flags, size_t, size)
+{
+ if (size == 0)
+ return -EINVAL;
+
+ return do_epoll_create(flags, size);
+}
+
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
return do_epoll_create(flags, 0);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..5049b0d16949 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -358,6 +358,7 @@ asmlinkage long sys_eventfd2(unsigned int count, int flags);

/* fs/eventpoll.c */
asmlinkage long sys_epoll_create1(int flags);
+asmlinkage long sys_epoll_create2(int flags, size_t size);
asmlinkage long sys_epoll_ctl(int epfd, int op, int fd,
struct epoll_event __user *event);
asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..fccfaab366ee 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
__SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
#define __NR_io_uring_register 427
__SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_epoll_create2 428
+__SYSCALL(__NR_epoll_create2, sys_epoll_create2)

#undef __NR_syscalls
-#define __NR_syscalls 428
+#define __NR_syscalls 429

/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d9ae5ea6caf..665908b8a326 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -65,6 +65,7 @@ COND_SYSCALL(eventfd2);

/* fs/eventfd.c */
COND_SYSCALL(epoll_create1);
+COND_SYSCALL(epoll_create2);
COND_SYSCALL(epoll_ctl);
COND_SYSCALL(epoll_pwait);
COND_SYSCALL_COMPAT(epoll_pwait);
--
2.21.0


2019-05-16 10:04:49

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]> wrote:
>
> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> and size, i.e. this patch wires up polling from userspace.

Could you add the system call to all syscall*.tbl files at the same time here?

Arnd

2019-05-16 10:23:23

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On 2019-05-16 12:03, Arnd Bergmann wrote:
> On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]>
> wrote:
>>
>> epoll_create2() is needed to accept EPOLL_USERPOLL flags
>> and size, i.e. this patch wires up polling from userspace.
>
> Could you add the system call to all syscall*.tbl files at the same
> time here?

For all other archs, you mean? Sure. But what is the rule of thumb?
Sometimes people tend to add to the most common x86 and other tables
are left untouched, but then you commit the rest, e.g.

commit 39036cd2727395c3369b1051005da74059a85317
Author: Arnd Bergmann <[email protected]>
Date: Thu Feb 28 13:59:19 2019 +0100

arch: add pidfd and io_uring syscalls everywhere



--
Roman

2019-05-16 11:01:33

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On Thu, May 16, 2019 at 12:20 PM Roman Penyaev <[email protected]> wrote:
>
> On 2019-05-16 12:03, Arnd Bergmann wrote:
> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]>
> > wrote:
> >>
> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> >> and size, i.e. this patch wires up polling from userspace.
> >
> > Could you add the system call to all syscall*.tbl files at the same
> > time here?
>
> For all other archs, you mean? Sure. But what is the rule of thumb?
> Sometimes people tend to add to the most common x86 and other tables
> are left untouched, but then you commit the rest, e.g.
>
> commit 39036cd2727395c3369b1051005da74059a85317
> Author: Arnd Bergmann <[email protected]>
> Date: Thu Feb 28 13:59:19 2019 +0100
>
> arch: add pidfd and io_uring syscalls everywhere

We only recently introduced syscall.tbl files in a common format,
which makes it much easier to add new ones. I hope we can
do it for all architectures right away from now on.

I just noticed that the new mount API assigns six new system
calls as well, but did not use the same numbers across
architectures. I hope we can still rectify that before -rc1
and use the next available ones (428..433), then yours should
be 434 on all architectures, with the exception of arch/alpha.

Arnd

2019-05-22 02:34:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <[email protected]> wrote:

> On 2019-05-16 12:03, Arnd Bergmann wrote:
> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]>
> > wrote:
> >>
> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> >> and size, i.e. this patch wires up polling from userspace.
> >
> > Could you add the system call to all syscall*.tbl files at the same
> > time here?
>
> For all other archs, you mean? Sure. But what is the rule of thumb?
> Sometimes people tend to add to the most common x86 and other tables
> are left untouched, but then you commit the rest, e.g.
>
> commit 39036cd2727395c3369b1051005da74059a85317
> Author: Arnd Bergmann <[email protected]>
> Date: Thu Feb 28 13:59:19 2019 +0100
>
> arch: add pidfd and io_uring syscalls everywhere
>

I thought the preferred approach was to wire up the architectures on
which the submitter has tested the syscall, then allow the arch
maintainers to enable the syscall independently?

And to help them in this, provide a test suite for the new syscall
under tools/testing/selftests/.

https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
will certainly help but I do think it would be better to move the test
into the kernel tree to keep it maintained and so that many people run
it in their various setups on an ongoing basis.


2019-05-22 09:13:36

by Roman Penyaev

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On 2019-05-22 04:33, Andrew Morton wrote:
> On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <[email protected]>
> wrote:
>
>> On 2019-05-16 12:03, Arnd Bergmann wrote:
>> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]>
>> > wrote:
>> >>
>> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
>> >> and size, i.e. this patch wires up polling from userspace.
>> >
>> > Could you add the system call to all syscall*.tbl files at the same
>> > time here?
>>
>> For all other archs, you mean? Sure. But what is the rule of thumb?
>> Sometimes people tend to add to the most common x86 and other tables
>> are left untouched, but then you commit the rest, e.g.
>>
>> commit 39036cd2727395c3369b1051005da74059a85317
>> Author: Arnd Bergmann <[email protected]>
>> Date: Thu Feb 28 13:59:19 2019 +0100
>>
>> arch: add pidfd and io_uring syscalls everywhere
>>
>
> I thought the preferred approach was to wire up the architectures on
> which the submitter has tested the syscall, then allow the arch
> maintainers to enable the syscall independently?
>
> And to help them in this, provide a test suite for the new syscall
> under tools/testing/selftests/.
>
> https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
> will certainly help but I do think it would be better to move the test
> into the kernel tree to keep it maintained and so that many people run
> it in their various setups on an ongoing basis.

Yes, on a next iteration I will add the tool to selftests. Thanks.

--
Roman


2019-05-22 11:16:42

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On Wed, May 22, 2019 at 4:33 AM Andrew Morton <[email protected]> wrote:
> On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <[email protected]> wrote:
> > On 2019-05-16 12:03, Arnd Bergmann wrote:
> > > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <[email protected]>
> > > wrote:
> > >>
> > >> epoll_create2() is needed to accept EPOLL_USERPOLL flags
> > >> and size, i.e. this patch wires up polling from userspace.
> > >
> > > Could you add the system call to all syscall*.tbl files at the same
> > > time here?
> >
> > For all other archs, you mean? Sure. But what is the rule of thumb?
> > Sometimes people tend to add to the most common x86 and other tables
> > are left untouched, but then you commit the rest, e.g.
> >
> > commit 39036cd2727395c3369b1051005da74059a85317
> > Author: Arnd Bergmann <[email protected]>
> > Date: Thu Feb 28 13:59:19 2019 +0100
> >
> > arch: add pidfd and io_uring syscalls everywhere
> >
>
> I thought the preferred approach was to wire up the architectures on
> which the submitter has tested the syscall, then allow the arch
> maintainers to enable the syscall independently?

I'm hoping to change that practice now, as it has not worked well
in the past:

- half the architectures now use asm-generic/unistd.h, so they are
already wired up at the same time, regardless of testing
- in the other half, not adding them at the same time actually
made it harder to test, as it was significantly harder to figure
out how to build a modified kernel for a given architecture
than to run the test case
- Not having all architectures add a new call at the same time caused
the architectures to get out of sync when some got added and others
did not. Now that we use the same numbers across all architectures,
that would be even more confusing.

My plan for the long run is to only have one file to which new
system calls get added in the future.

> And to help them in this, provide a test suite for the new syscall
> under tools/testing/selftests/.
>
> https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c
> will certainly help but I do think it would be better to move the test
> into the kernel tree to keep it maintained and so that many people run
> it in their various setups on an ongoing basis.

No disagreement on that.

Arnd

2019-05-22 18:37:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 13/13] epoll: implement epoll_create2() syscall

On Wed, 22 May 2019 13:14:41 +0200 Arnd Bergmann <[email protected]> wrote:

> > I thought the preferred approach was to wire up the architectures on
> > which the submitter has tested the syscall, then allow the arch
> > maintainers to enable the syscall independently?
>
> I'm hoping to change that practice now, as it has not worked well
> in the past:
>
> - half the architectures now use asm-generic/unistd.h, so they are
> already wired up at the same time, regardless of testing
> - in the other half, not adding them at the same time actually
> made it harder to test, as it was significantly harder to figure
> out how to build a modified kernel for a given architecture
> than to run the test case
> - Not having all architectures add a new call at the same time caused
> the architectures to get out of sync when some got added and others
> did not. Now that we use the same numbers across all architectures,
> that would be even more confusing.
>
> My plan for the long run is to only have one file to which new
> system calls get added in the future.

Fair enough. We're adding code to architectures without having tested
it on those architectures but we do that all the time anyway - I guess
there's not a lot of point in special-casing new syscalls.