2019-03-25 14:38:18

by Arnd Bergmann

Subject: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On big-endian architectures, the signal masks are different
between 32-bit and 64-bit tasks, so we have to use a different
function for reading them from user space.

io_cqring_wait() initially got this wrong, and always interprets
this as a native structure. This is ok on x86 and most arm64,
but not on s390, ppc64be, mips64be, sparc64 and parisc.

Signed-off-by: Arnd Bergmann <[email protected]>
---
fs/io_uring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6aaa30580a2b..8f48d29abf76 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		return 0;

 	if (sig) {
-		ret = set_user_sigmask(sig, &ksigmask, &sigsaved, sigsz);
+#ifdef CONFIG_COMPAT
+		if (in_compat_syscall())
+			ret = set_compat_user_sigmask((const compat_sigset_t __user *)sig,
+						      &ksigmask, &sigsaved, sigsz);
+		else
+#endif
+			ret = set_user_sigmask(sig, &ksigmask,
+					       &sigsaved, sigsz);
+
 		if (ret)
 			return ret;
 	}
--
2.20.0
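
To illustrate the layout difference the commit message describes, here is a
user-space sketch, not kernel code. It assumes a 64-signal mask that a 32-bit
big-endian task stores as two 32-bit words with the low word first, and it
compares reading that buffer as one native 64-bit word with combining the two
words explicitly. All function names are hypothetical, and byte order is
simulated by hand so the result does not depend on the host.

```c
#include <stdint.h>

/* Serialise a 32-bit word into buf in big-endian byte order. */
static void put_be32(uint8_t *buf, uint32_t v)
{
	buf[0] = (uint8_t)(v >> 24);
	buf[1] = (uint8_t)(v >> 16);
	buf[2] = (uint8_t)(v >> 8);
	buf[3] = (uint8_t)v;
}

/* Read a 32-bit word from buf in big-endian byte order. */
static uint32_t get_be32(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* What the modelled 32-bit task writes: two words, low word first. */
void store_compat_mask(uint8_t buf[8], uint64_t mask)
{
	put_be32(buf, (uint32_t)mask);			/* signals 1..32 */
	put_be32(buf + 4, (uint32_t)(mask >> 32));	/* signals 33..64 */
}

/* Mistaken reading: treat the buffer as one big-endian 64-bit word. */
uint64_t read_as_native_word(const uint8_t buf[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return v;
}

/* Compat-aware reading: combine the two 32-bit words. */
uint64_t read_as_compat_words(const uint8_t buf[8])
{
	return (uint64_t)get_be32(buf) | ((uint64_t)get_be32(buf + 4) << 32);
}
```

For a mask that blocks only signal 2 (bit 1), the compat reading returns 0x2,
while the native reading returns 0x200000000, so the bit lands on signal 34
instead. With little-endian serialisation both readings would agree, which is
consistent with the statement that x86 is unaffected.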



2019-03-25 14:50:09

by Arnd Bergmann

Subject: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Add the io_uring and pidfd_send_signal system calls to all architectures.

These system calls are designed to handle both native and compat tasks,
so all entries are the same across architectures, only arm-compat and
the generic table still use an old format.

Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/alpha/kernel/syscalls/syscall.tbl | 4 ++++
arch/arm/tools/syscall.tbl | 4 ++++
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 8 ++++++++
arch/ia64/kernel/syscalls/syscall.tbl | 4 ++++
arch/m68k/kernel/syscalls/syscall.tbl | 4 ++++
arch/microblaze/kernel/syscalls/syscall.tbl | 4 ++++
arch/mips/kernel/syscalls/syscall_n32.tbl | 4 ++++
arch/mips/kernel/syscalls/syscall_n64.tbl | 4 ++++
arch/mips/kernel/syscalls/syscall_o32.tbl | 4 ++++
arch/parisc/kernel/syscalls/syscall.tbl | 4 ++++
arch/powerpc/kernel/syscalls/syscall.tbl | 4 ++++
arch/s390/kernel/syscalls/syscall.tbl | 4 ++++
arch/sh/kernel/syscalls/syscall.tbl | 4 ++++
arch/sparc/kernel/syscalls/syscall.tbl | 4 ++++
arch/xtensa/kernel/syscalls/syscall.tbl | 4 ++++
16 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 63ed39cbd3bd..165f268beafc 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -463,3 +463,7 @@
532 common getppid sys_getppid
# all other architectures have common numbers for new syscall, alpha
# is the exception.
+534 common pidfd_send_signal sys_pidfd_send_signal
+535 common io_uring_setup sys_io_uring_setup
+536 common io_uring_enter sys_io_uring_enter
+537 common io_uring_register sys_io_uring_register
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 9016f4081bb9..0393917eaa57 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -437,3 +437,7 @@
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 310d8f1cae7a..c6946fe640e6 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -49,7 +49,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 424
+#define __NR_compat_syscalls 428
#endif

#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 5590f2623690..23f1a44acada 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -866,6 +866,14 @@ __SYSCALL(__NR_rt_sigtimedwait_time64, compat_sys_rt_sigtimedwait_time64)
__SYSCALL(__NR_futex_time64, sys_futex)
#define __NR_sched_rr_get_interval_time64 423
__SYSCALL(__NR_sched_rr_get_interval_time64, sys_sched_rr_get_interval)
+#define __NR_pidfd_send_signal 424
+__SYSCALL(__NR_pidfd_send_signal, sys_pidfd_send_signal)
+#define __NR_io_uring_setup 425
+__SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
+#define __NR_io_uring_enter 426
+__SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
+#define __NR_io_uring_register 427
+__SYSCALL(__NR_io_uring_register, sys_io_uring_register)

/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index ab9cda5f6136..56e3d0b685e1 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -344,3 +344,7 @@
332 common pkey_free sys_pkey_free
333 common rseq sys_rseq
# 334 through 423 are reserved to sync up with other architectures
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 125c14178979..df4ec3ec71d1 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -423,3 +423,7 @@
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 8ee3a8c18498..4964947732af 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -429,3 +429,7 @@
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 15f4117900ee..9392dfe33f97 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -362,3 +362,7 @@
421 n32 rt_sigtimedwait_time64 compat_sys_rt_sigtimedwait_time64
422 n32 futex_time64 sys_futex
423 n32 sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 n32 pidfd_send_signal sys_pidfd_send_signal
+425 n32 io_uring_setup sys_io_uring_setup
+426 n32 io_uring_enter sys_io_uring_enter
+427 n32 io_uring_register sys_io_uring_register
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index c85502e67b44..c4a49f7d57bb 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -338,3 +338,7 @@
327 n64 rseq sys_rseq
328 n64 io_pgetevents sys_io_pgetevents
# 329 through 423 are reserved to sync up with other architectures
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 2e063d0f837e..e849e8ffe4a2 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -411,3 +411,7 @@
421 o32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
422 o32 futex_time64 sys_futex sys_futex
423 o32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
+424 o32 pidfd_send_signal sys_pidfd_send_signal
+425 o32 io_uring_setup sys_io_uring_setup
+426 o32 io_uring_enter sys_io_uring_enter
+427 o32 io_uring_register sys_io_uring_register
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index b26766c6647d..fe8ca623add8 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -420,3 +420,7 @@
421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
422 32 futex_time64 sys_futex sys_futex
423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index b18abb0c3dae..00f5a63c8d9a 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -505,3 +505,7 @@
421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
422 32 futex_time64 sys_futex sys_futex
423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 02579f95f391..3eb56e639b96 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -426,3 +426,7 @@
421 32 rt_sigtimedwait_time64 - compat_sys_rt_sigtimedwait_time64
422 32 futex_time64 - sys_futex
423 32 sched_rr_get_interval_time64 - sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index bfda678576e4..480b057556ee 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -426,3 +426,7 @@
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index b9a5a04b2d2c..a1dd24307b00 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -469,3 +469,7 @@
421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
422 32 futex_time64 sys_futex sys_futex
423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 6af49929de85..30084eaf8422 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -394,3 +394,7 @@
421 common rt_sigtimedwait_time64 sys_rt_sigtimedwait
422 common futex_time64 sys_futex
423 common sched_rr_get_interval_time64 sys_sched_rr_get_interval
+424 common pidfd_send_signal sys_pidfd_send_signal
+425 common io_uring_setup sys_io_uring_setup
+426 common io_uring_enter sys_io_uring_enter
+427 common io_uring_register sys_io_uring_register
--
2.20.0


2019-03-25 16:07:15

by Jens Axboe

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On 3/25/19 8:34 AM, Arnd Bergmann wrote:
> On big-endian architectures, the signal masks are different
> between 32-bit and 64-bit tasks, so we have to use a different
> function for reading them from user space.
>
> io_cqring_wait() initially got this wrong, and always interprets
> this as a native structure. This is ok on x86 and most arm64,
> but not on s390, ppc64be, mips64be, sparc64 and parisc.

Thanks Arnd, applied.

Was there a 2/2 patch? I only received this one, 1/2.

--
Jens Axboe


2019-03-25 16:13:25

by Arnd Bergmann

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On Mon, Mar 25, 2019 at 5:05 PM Jens Axboe <[email protected]> wrote:
>
> On 3/25/19 8:34 AM, Arnd Bergmann wrote:
> > On big-endian architectures, the signal masks are different
> > between 32-bit and 64-bit tasks, so we have to use a different
> > function for reading them from user space.
> >
> > io_cqring_wait() initially got this wrong, and always interprets
> > this as a native structure. This is ok on x86 and most arm64,
> > but not on s390, ppc64be, mips64be, sparc64 and parisc.
>
> Thanks Arnd, applied.
>
> Was there a 2/2 patch? I only received this one, 1/2.

Sorry I missed you on Cc:
https://lore.kernel.org/lkml/[email protected]/T/#u

This one went out to all the affected arch maintainers.

Arnd

2019-03-25 16:16:17

by Jens Axboe

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On 3/25/19 10:11 AM, Arnd Bergmann wrote:
> On Mon, Mar 25, 2019 at 5:05 PM Jens Axboe <[email protected]> wrote:
>>
>> On 3/25/19 8:34 AM, Arnd Bergmann wrote:
>>> On big-endian architectures, the signal masks are different
>>> between 32-bit and 64-bit tasks, so we have to use a different
>>> function for reading them from user space.
>>>
>>> io_cqring_wait() initially got this wrong, and always interprets
>>> this as a native structure. This is ok on x86 and most arm64,
>>> but not on s390, ppc64be, mips64be, sparc64 and parisc.
>>
>> Thanks Arnd, applied.
>>
>> Was there a 2/2 patch? I only received this one, 1/2.
>
> Sorry I missed you on Cc:
> https://lore.kernel.org/lkml/[email protected]/T/#u
>
> This one went out to all the affected arch maintainers.

Ah gotcha, just wanted to make sure I didn't miss a patch.

--
Jens Axboe


2019-03-25 16:21:43

by James Bottomley

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On Mon, 2019-03-25 at 15:34 +0100, Arnd Bergmann wrote:
> On big-endian architectures, the signal masks are different
> between 32-bit and 64-bit tasks, so we have to use a different
> function for reading them from user space.
>
> io_cqring_wait() initially got this wrong, and always interprets
> this as a native structure. This is ok on x86 and most arm64,
> but not on s390, ppc64be, mips64be, sparc64 and parisc.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> fs/io_uring.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6aaa30580a2b..8f48d29abf76 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct io_ring_ctx
> *ctx, int min_events,
> return 0;
>
> if (sig) {
> - ret = set_user_sigmask(sig, &ksigmask, &sigsaved,
> sigsz);
> +#ifdef CONFIG_COMPAT
> + if (in_compat_syscall())
> + ret = set_compat_user_sigmask((const
> compat_sigset_t __user *)sig,
> + &ksigmask,
> &sigsaved, sigsz);
> + else
> +#endif

This looks a bit suboptimal: shouldn't in_compat_syscall() be hard
coded to return 0 if CONFIG_COMPAT isn't defined? That way the
compiler can do the correct optimization and we don't have to litter
#ifdefs and worry about undefined variables and other things.

James


2019-03-25 16:24:31

by Jens Axboe

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On 3/25/19 10:19 AM, James Bottomley wrote:
> On Mon, 2019-03-25 at 15:34 +0100, Arnd Bergmann wrote:
>> On big-endian architectures, the signal masks are different
>> between 32-bit and 64-bit tasks, so we have to use a different
>> function for reading them from user space.
>>
>> io_cqring_wait() initially got this wrong, and always interprets
>> this as a native structure. This is ok on x86 and most arm64,
>> but not on s390, ppc64be, mips64be, sparc64 and parisc.
>>
>> Signed-off-by: Arnd Bergmann <[email protected]>
>> ---
>> fs/io_uring.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6aaa30580a2b..8f48d29abf76 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct io_ring_ctx
>> *ctx, int min_events,
>> return 0;
>>
>> if (sig) {
>> - ret = set_user_sigmask(sig, &ksigmask, &sigsaved,
>> sigsz);
>> +#ifdef CONFIG_COMPAT
>> + if (in_compat_syscall())
>> + ret = set_compat_user_sigmask((const
>> compat_sigset_t __user *)sig,
>> + &ksigmask,
>> &sigsaved, sigsz);
>> + else
>> +#endif
>
> This looks a bit suboptimal: shouldn't in_compat_syscall() be hard
> coded to return 0 if CONFIG_COMPAT isn't defined? That way the
> compiler can do the correct optimization and we don't have to litter
> #ifdefs and worry about undefined variables and other things.

That requires the types to be valid for !CONFIG_COMPAT, as well as the
sigmask helper.

--
Jens Axboe


2019-03-25 16:26:11

by Arnd Bergmann

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On Mon, Mar 25, 2019 at 5:19 PM James Bottomley
<[email protected]> wrote:

> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct io_ring_ctx
> > *ctx, int min_events,
> > return 0;
> >
> > if (sig) {
> > - ret = set_user_sigmask(sig, &ksigmask, &sigsaved,
> > sigsz);
> > +#ifdef CONFIG_COMPAT
> > + if (in_compat_syscall())
> > + ret = set_compat_user_sigmask((const
> > compat_sigset_t __user *)sig,
> > + &ksigmask,
> > &sigsaved, sigsz);
> > + else
> > +#endif
>
> This looks a bit suboptimal: shouldn't in_compat_syscall() be hard
> coded to return 0 if CONFIG_COMPAT isn't defined? That way the
> compiler can do the correct optimization and we don't have to litter
> #ifdefs and worry about undefined variables and other things.

The check can be outside of the #ifdef, but set_compat_user_sigmask
is not declared then.

I think for the future we can consider just moving the compat logic
into set_user_sigmask(), which would simplify most of the callers,
but that seemed too invasive as a bugfix for 5.1.

Arnd

2019-03-25 17:39:37

by Paul Burton

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Hi Arnd,

On Mon, Mar 25, 2019 at 03:47:37PM +0100, Arnd Bergmann wrote:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures, only arm-compat and
> the generic table still use an old format.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
>%
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index c85502e67b44..c4a49f7d57bb 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -338,3 +338,7 @@
> 327 n64 rseq sys_rseq
> 328 n64 io_pgetevents sys_io_pgetevents
> # 329 through 423 are reserved to sync up with other architectures
> +424 common pidfd_send_signal sys_pidfd_send_signal
> +425 common io_uring_setup sys_io_uring_setup
> +426 common io_uring_enter sys_io_uring_enter
> +427 common io_uring_register sys_io_uring_register

Shouldn't these declare the ABI as "n64"?

I don't see anywhere that it would actually change the generated code,
but a comment at the top of the file says that every entry should use
"n64" and so far they all do. Did you have something else in mind here?

Thanks,
Paul

2019-03-26 00:15:22

by James Bottomley

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On Mon, 2019-03-25 at 17:24 +0100, Arnd Bergmann wrote:
> On Mon, Mar 25, 2019 at 5:19 PM James Bottomley
> <[email protected]> wrote:
>
> > > --- a/fs/io_uring.c
> > > +++ b/fs/io_uring.c
> > > @@ -1968,7 +1968,15 @@ static int io_cqring_wait(struct
> > > io_ring_ctx
> > > *ctx, int min_events,
> > > return 0;
> > >
> > > if (sig) {
> > > - ret = set_user_sigmask(sig, &ksigmask, &sigsaved,
> > > sigsz);
> > > +#ifdef CONFIG_COMPAT
> > > + if (in_compat_syscall())
> > > + ret = set_compat_user_sigmask((const
> > > compat_sigset_t __user *)sig,
> > > + &ksigmask,
> > > &sigsaved, sigsz);
> > > + else
> > > +#endif
> >
> > This looks a bit suboptimal: shouldn't in_compat_syscall() be hard
> > coded to return 0 if CONFIG_COMPAT isn't defined? That way the
> > compiler can do the correct optimization and we don't have to
> > litter #ifdefs and worry about undefined variables and other
> > things.
>
> The check can be outside of the #ifdef, but set_compat_user_sigmask
> is not declared then.

Right, but shouldn't it be declared? I thought BUILD_BUG_ON had nice
magic that allowed it to work here (meaning if the compiler doesn't
eliminate the branch we get a build bug).

> I think for the future we can consider just moving the compat logic
> into set_user_sigmask(), which would simplify most of the callers,
> but that seemed too invasive as a bugfix for 5.1.

Well, that too. I've just been on a recent bender to stop #ifdefs
after I saw what some people were doing with them.

James


2019-03-26 08:38:05

by Arnd Bergmann

Subject: Re: [PATCH 1/2] io_uring: fix big-endian compat signal mask handling

On Tue, Mar 26, 2019 at 1:13 AM James Bottomley
<[email protected]> wrote:
> On Mon, 2019-03-25 at 17:24 +0100, Arnd Bergmann wrote:
> > On Mon, Mar 25, 2019 at 5:19 PM James Bottomley
> > <[email protected]> wrote:
> > > This looks a bit suboptimal: shouldn't in_compat_syscall() be hard
> > > coded to return 0 if CONFIG_COMPAT isn't defined? That way the
> > > compiler can do the correct optimization and we don't have to
> > > litter #ifdefs and worry about undefined variables and other
> > > things.
> >
> > The check can be outside of the #ifdef, but set_compat_user_sigmask
> > is not declared then.
>
> Right, but shouldn't it be declared? I thought BUILD_BUG_ON had nice
> magic that allowed it to work here (meaning if the compiler doesn't
> eliminate the branch we get a build bug).

My y2038 series originally went in that direction by allowing much more
of the compat code to be compiled and then discarded without the
#ifdefs (and combine it with the 32-bit time_t handling on 32-bit
architectures). I went away from that after Christoph and others found
the reuse of the compat interfaces too confusing.

The current state now is that most compat_* interfaces cannot be
compiled unless CONFIG_COMPAT is set, and making that work
in general is a lot of work, so I followed the usual precedent here
and used that #ifdef. This also matches what is done elsewhere
in the same file (see io_import_iovec).

> > I think for the future we can consider just moving the compat logic
> > into set_user_sigmask(), which would simplify most of the callers,
> > but that seemed too invasive as a bugfix for 5.1.
>
> Well, that too. I've just been on a recent bender to stop #ifdefs
> after I saw what some people were doing with them.

I absolutely agree in general, and have sent many patches to
remove #ifdefs in other code when there was a good alternative
and the #ifdefs are wrong (which they are at least 30% of the time
in my experience).

The problems for doing this in general for compat code are

- some structures have a conditional compat_ioctl() callback
pointer, and need an #ifdef around the assignment until
we change the struct as well.
- Most compat handlers require the use of the compat_ptr()
wrapper, I have a patch to move this to common code, but
that was rejected previously
- many compat handlers rely on types from asm/compat.h
that does not exist on architectures without compat support.

In this specific case, compat_sigset_t is required for declaring
set_compat_user_sigmask(), and the former is not easy to
define on non-compat architectures. I still think that the best
way forward here is to move it into set_user_sigmask()
next merge window, rather than doing a larger scale rewrite
of linux/compat.h to get this bug fixed now.

Arnd
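
One possible shape for the idea of moving the compat logic into
set_user_sigmask() is sketched below as a user-space model. It is an
illustration under assumptions, not the change that was later made:
MODEL_COMPAT stands in for CONFIG_COMPAT, model_in_compat_syscall() for
in_compat_syscall(), a plain memcpy() for the copy from user space, and a
single uint64_t for the kernel-side mask. All of these names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_COMPAT; remove to model a non-compat build. */
#define MODEL_COMPAT 1

#ifdef MODEL_COMPAT
/* Stand-in for per-task state; a real kernel inspects the current task. */
bool model_task_is_compat;

static bool model_in_compat_syscall(void)
{
	return model_task_is_compat;
}
#else
static bool model_in_compat_syscall(void)
{
	return false;	/* constant, so the compat branch below is dead code */
}
#endif

/*
 * Hypothetical unified helper: reads a 64-bit signal mask from umask in
 * whichever layout the calling task uses, so callers need no #ifdef.
 */
int model_set_user_sigmask(const void *umask, size_t sigsetsize, uint64_t *kmask)
{
	if (!umask)
		return 0;		/* no mask supplied, nothing to do */
	if (sigsetsize != sizeof(uint64_t))
		return -EINVAL;

	if (model_in_compat_syscall()) {
		uint32_t w[2];		/* two 32-bit words, low word first */

		memcpy(w, umask, sizeof(w));
		*kmask = (uint64_t)w[0] | ((uint64_t)w[1] << 32);
	} else {
		memcpy(kmask, umask, sizeof(*kmask));
	}
	return 0;
}
```

In this model the only type the compat branch needs is uint32_t, which exists
everywhere. In the kernel, the analogous declaration depends on
compat_sigset_t, which is the obstacle described above.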

2019-03-26 08:42:06

by Arnd Bergmann

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On Mon, Mar 25, 2019 at 6:37 PM Paul Burton <[email protected]> wrote:
> On Mon, Mar 25, 2019 at 03:47:37PM +0100, Arnd Bergmann wrote:
> > Add the io_uring and pidfd_send_signal system calls to all architectures.
> >
> > These system calls are designed to handle both native and compat tasks,
> > so all entries are the same across architectures, only arm-compat and
> > the generic table still use an old format.
> >
> > Signed-off-by: Arnd Bergmann <[email protected]>
> > ---
> >%
> > diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> > index c85502e67b44..c4a49f7d57bb 100644
> > --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> > +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> > @@ -338,3 +338,7 @@
> > 327 n64 rseq sys_rseq
> > 328 n64 io_pgetevents sys_io_pgetevents
> > # 329 through 423 are reserved to sync up with other architectures
> > +424 common pidfd_send_signal sys_pidfd_send_signal
> > +425 common io_uring_setup sys_io_uring_setup
> > +426 common io_uring_enter sys_io_uring_enter
> > +427 common io_uring_register sys_io_uring_register
>
> Shouldn't these declare the ABI as "n64"?
>
> I don't see anywhere that it would actually change the generated code,
> but a comment at the top of the file says that every entry should use
> "n64" and so far they all do. Did you have something else in mind here?

You are right, the use of 'common' here is unintentional but harmless,
and I should have used 'n64' here.

We may decide to do things differently in the future, i.e. we could
have just a single global file for newly added system calls once
it turns out that the tables are consistent across all architectures,
but I'd probably go on with the separate identical entries for a bit
before changing that.

Arnd
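
The ABI column under discussion is the second field of each table line. As a
rough illustration, the sketch below parses one line of the form
"number abi name entry-point [compat-entry-point]", as seen in the hunks
above, and checks the ABI field. The struct and function names are
hypothetical, and this is not how the kernel's own table scripts are written.

```c
#include <stdio.h>
#include <string.h>

struct tbl_entry {
	int nr;
	char abi[16];
	char name[64];
	char entry[64];
	char compat[64];	/* empty when the line has no compat column */
};

/* Returns 0 on success, -1 for comments, blank lines or malformed lines. */
int parse_tbl_line(const char *line, struct tbl_entry *e)
{
	int n;

	memset(e, 0, sizeof(*e));
	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return -1;

	/* Field widths are one less than the buffer sizes to leave room for NUL. */
	n = sscanf(line, "%d %15s %63s %63s %63s",
		   &e->nr, e->abi, e->name, e->entry, e->compat);
	return n >= 4 ? 0 : -1;
}

/* True if the entry's ABI field equals the one expected for this table. */
int abi_matches(const struct tbl_entry *e, const char *expected)
{
	return strcmp(e->abi, expected) == 0;
}
```

Run over the four new lines in syscall_n64.tbl with "n64" as the expected
value, abi_matches() would report a mismatch for each of them, which is the
inconsistency Paul pointed out.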

2019-03-30 09:43:47

by Heiko Carstens

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On Mon, Mar 25, 2019 at 03:47:37PM +0100, Arnd Bergmann wrote:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures, only arm-compat and
> the generic table still use an old format.
>
> Signed-off-by: Arnd Bergmann <[email protected]>

> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index 02579f95f391..3eb56e639b96 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -426,3 +426,7 @@
> 421 32 rt_sigtimedwait_time64 - compat_sys_rt_sigtimedwait_time64
> 422 32 futex_time64 - sys_futex
> 423 32 sched_rr_get_interval_time64 - sys_sched_rr_get_interval
> +424 common pidfd_send_signal sys_pidfd_send_signal
> +425 common io_uring_setup sys_io_uring_setup
> +426 common io_uring_enter sys_io_uring_enter
> +427 common io_uring_register sys_io_uring_register

I was just about to write that io_uring_enter is missing compat
handling, but your first patch actually fixes that. Would have been
good to be cc'ed on both patches :)

For s390:
Acked-by: Heiko Carstens <[email protected]>


2019-03-31 09:48:21

by Michael Ellerman

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Arnd Bergmann <[email protected]> writes:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures, only arm-compat and
> the generic table still use an old format.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 4 ++++
> arch/arm/tools/syscall.tbl | 4 ++++
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 8 ++++++++
> arch/ia64/kernel/syscalls/syscall.tbl | 4 ++++
> arch/m68k/kernel/syscalls/syscall.tbl | 4 ++++
> arch/microblaze/kernel/syscalls/syscall.tbl | 4 ++++
> arch/mips/kernel/syscalls/syscall_n32.tbl | 4 ++++
> arch/mips/kernel/syscalls/syscall_n64.tbl | 4 ++++
> arch/mips/kernel/syscalls/syscall_o32.tbl | 4 ++++
> arch/parisc/kernel/syscalls/syscall.tbl | 4 ++++
> arch/powerpc/kernel/syscalls/syscall.tbl | 4 ++++

Have you done any testing?

I'd rather not wire up syscalls that have never been tested at all on
powerpc.

cheers

2019-03-31 16:31:07

by Arnd Bergmann

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On Sun, Mar 31, 2019 at 5:47 PM Michael Ellerman <[email protected]> wrote:
>
> Arnd Bergmann <[email protected]> writes:
> > Add the io_uring and pidfd_send_signal system calls to all architectures.
> >
> > These system calls are designed to handle both native and compat tasks,
> > so all entries are the same across architectures, only arm-compat and
> > the generic table still use an old format.
> >
> > Signed-off-by: Arnd Bergmann <[email protected]>
> > ---
> > arch/alpha/kernel/syscalls/syscall.tbl | 4 ++++
> > arch/arm/tools/syscall.tbl | 4 ++++
> > arch/arm64/include/asm/unistd.h | 2 +-
> > arch/arm64/include/asm/unistd32.h | 8 ++++++++
> > arch/ia64/kernel/syscalls/syscall.tbl | 4 ++++
> > arch/m68k/kernel/syscalls/syscall.tbl | 4 ++++
> > arch/microblaze/kernel/syscalls/syscall.tbl | 4 ++++
> > arch/mips/kernel/syscalls/syscall_n32.tbl | 4 ++++
> > arch/mips/kernel/syscalls/syscall_n64.tbl | 4 ++++
> > arch/mips/kernel/syscalls/syscall_o32.tbl | 4 ++++
> > arch/parisc/kernel/syscalls/syscall.tbl | 4 ++++
> > arch/powerpc/kernel/syscalls/syscall.tbl | 4 ++++
>
> Have you done any testing?
>
> I'd rather not wire up syscalls that have never been tested at all on
> powerpc.

No, I have not. I did review the system calls carefully and added the first
patch to fix the bug on x86 compat mode before adding the same bug
on the other compat architectures though ;-)

Generally, my feeling is that adding system calls is not fundamentally
different from adding other ABIs, and we should really do it at
the same time across all architectures, rather than waiting for each
maintainer to get around to reviewing and testing the new calls
first. This is not a problem on powerpc, but a lot of other architectures
are less active, which is how we have always ended up with
different sets of system calls across architectures.

The problem here is that this makes it harder for the C library to
know when a system call is guaranteed to be available. glibc
still needs a feature test for newly added syscalls to see if they
are working (they might be backported to an older kernel, or
disabled), but whenever the minimum kernel version is increased,
it makes sense to drop those checks and assume non-optional
system calls will work if they were part of that minimum version.

In the future, I'd hope that any new system calls get added
right away on all architectures when they land (it was a bit
tricky this time, because I still did a bunch of reworks that
conflicted with the new calls). Bugs will happen of course, but
I think adding them sooner makes it more likely to catch those
bugs early on so we have a chance to fix them properly,
and need fewer arch specific workarounds (ideally none)
for system calls.

Arnd

2019-04-01 08:21:53

by Geert Uytterhoeven

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On Mon, Mar 25, 2019 at 3:48 PM Arnd Bergmann <[email protected]> wrote:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures; only arm-compat and
> the generic table still use an old format.
>
> Signed-off-by: Arnd Bergmann <[email protected]>

> arch/m68k/kernel/syscalls/syscall.tbl | 4 ++++

Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2019-04-03 01:22:32

by Michael Ellerman

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Arnd Bergmann <[email protected]> writes:
> On Sun, Mar 31, 2019 at 5:47 PM Michael Ellerman <[email protected]> wrote:
>>
>> Arnd Bergmann <[email protected]> writes:
>> > Add the io_uring and pidfd_send_signal system calls to all architectures.
>> >
>> > These system calls are designed to handle both native and compat tasks,
>> > so all entries are the same across architectures; only arm-compat and
>> > the generic table still use an old format.
>> >
>> > Signed-off-by: Arnd Bergmann <[email protected]>
>> > ---
>> > arch/alpha/kernel/syscalls/syscall.tbl | 4 ++++
>> > arch/arm/tools/syscall.tbl | 4 ++++
>> > arch/arm64/include/asm/unistd.h | 2 +-
>> > arch/arm64/include/asm/unistd32.h | 8 ++++++++
>> > arch/ia64/kernel/syscalls/syscall.tbl | 4 ++++
>> > arch/m68k/kernel/syscalls/syscall.tbl | 4 ++++
>> > arch/microblaze/kernel/syscalls/syscall.tbl | 4 ++++
>> > arch/mips/kernel/syscalls/syscall_n32.tbl | 4 ++++
>> > arch/mips/kernel/syscalls/syscall_n64.tbl | 4 ++++
>> > arch/mips/kernel/syscalls/syscall_o32.tbl | 4 ++++
>> > arch/parisc/kernel/syscalls/syscall.tbl | 4 ++++
>> > arch/powerpc/kernel/syscalls/syscall.tbl | 4 ++++
>>
>> Have you done any testing?
>>
>> I'd rather not wire up syscalls that have never been tested at all on
>> powerpc.
>
> No, I have not. I did review the system calls carefully and added the first
> patch to fix the bug on x86 compat mode before adding the same bug
> on the other compat architectures though ;-)
>
> Generally, my feeling is that adding system calls is not fundamentally
> different from adding other ABIs, and we should really do it at
> the same time across all architectures, rather than waiting for each
> maintainer to get around to reviewing and testing the new calls
> first. This is not a problem on powerpc, but a lot of other architectures
> are less active, which is how we have always ended up with
> different sets of system calls across architectures.

Well it's still something of a problem on powerpc. No one has
volunteered to test io_uring on powerpc, so at this stage it will go in
completely untested.

If there was a selftest in the tree I'd be a bit happier, because at
least then our CI would start testing it as soon as the syscalls were
wired up in linux-next.

And yeah obviously I should test it, but I don't have infinite time
unfortunately.

> The problem here is that this makes it harder for the C library to
> know when a system call is guaranteed to be available. glibc
> still needs a feature test for newly added syscalls to see if they
> are working (they might be backported to an older kernel, or
> disabled), but whenever the minimum kernel version is increased,
> it makes sense to drop those checks and assume non-optional
> system calls will work if they were part of that minimum version.

But that's the thing, if we just wire them up untested they may not
actually work. And then you have the far worse situation where the
syscall exists in kernel version x but does not actually work properly.

See the mess we have with pkeys for example.

> In the future, I'd hope that any new system calls get added
> right away on all architectures when they land (it was a bit
> tricky this time, because I still did a bunch of reworks that
> conflicted with the new calls). Bugs will happen of course, but
> I think adding them sooner makes it more likely to catch those
> bugs early on so we have a chance to fix them properly,
> and need fewer arch specific workarounds (ideally none)
> for system calls.

For syscalls that have a selftest in the tree, and don't rely on
anything arch specific I agree.

I'm a bit more wary of things that are not easily tested and have the
potential to work differently across arches.

cheers

2019-04-03 02:50:38

by Michael Ellerman

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Arnd Bergmann <[email protected]> writes:
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index b18abb0c3dae..00f5a63c8d9a 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -505,3 +505,7 @@
> 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
> 422 32 futex_time64 sys_futex sys_futex
> 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
> +424 common pidfd_send_signal sys_pidfd_send_signal
> +425 common io_uring_setup sys_io_uring_setup
> +426 common io_uring_enter sys_io_uring_enter
> +427 common io_uring_register sys_io_uring_register

Acked-by: Michael Ellerman <[email protected]> (powerpc)

Lightly tested.

The pidfd_test selftest passes.

Ran the io_uring example from fio, which prints lots of:

IOPS=209952, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=116 (116), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=112 (112), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=110 (110), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=104 (104), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=102 (102), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=100 (100), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/31, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=125 (125), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=119 (119), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=111 (111), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=108 (108), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=107 (107), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%

Which is good I think?


cheers

2019-04-03 11:12:44

by Will Deacon

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Hi Michael,

On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
> Arnd Bergmann <[email protected]> writes:
> > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> > index b18abb0c3dae..00f5a63c8d9a 100644
> > --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> > @@ -505,3 +505,7 @@
> > 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
> > 422 32 futex_time64 sys_futex sys_futex
> > 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
> > +424 common pidfd_send_signal sys_pidfd_send_signal
> > +425 common io_uring_setup sys_io_uring_setup
> > +426 common io_uring_enter sys_io_uring_enter
> > +427 common io_uring_register sys_io_uring_register
>
> Acked-by: Michael Ellerman <[email protected]> (powerpc)
>
> Lightly tested.
>
> The pidfd_test selftest passes.

That reports pass for me too, although it fails to unshare the pid ns, which I
assume is benign.

> Ran the io_uring example from fio, which prints lots of:

How did you invoke that? I had a play with the tests in:

git://git.kernel.dk/liburing

but I quickly ran into the kernel oops below.

Will

--->8

will@autoplooker:~/liburing/test$ ./io_uring_register
RELIMIT_MEMLOCK: 67108864 (67108864)
[ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
[ 35.478969] Mem abort info:
[ 35.479296] ESR = 0x96000004
[ 35.479785] Exception class = DABT (current EL), IL = 32 bits
[ 35.480528] SET = 0, FnV = 0
[ 35.480980] EA = 0, S1PTW = 0
[ 35.481345] Data abort info:
[ 35.481680] ISV = 0, ISS = 0x00000004
[ 35.482267] CM = 0, WnR = 0
[ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 35.483486] [0000000000000070] pgd=0000000000000000
[ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 35.484788] Modules linked in:
[ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
[ 35.486712] Hardware name: linux,dummy-virt (DT)
[ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
[ 35.488228] pc : link_pwq+0x10/0x60
[ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
[ 35.489550] sp : ffff000017e2bbc0
[ 35.490088] x29: ffff000017e2bbc0 x28: ffff8004b9118000
[ 35.490939] x27: 0000000000000000 x26: ffff8004c21c4200
[ 35.491786] x25: 0000000000000004 x24: ffff00001123e1b0
[ 35.492640] x23: ffff8004c5390000 x22: ffff8004bb440500
[ 35.493502] x21: ffff8004bb440500 x20: 0000000000000070
[ 35.494355] x19: 0000000000000022 x18: 0000000000000000
[ 35.495202] x17: 0000000000000000 x16: 0000000000000000
[ 35.496054] x15: 0000000000000000 x14: ffff7e0012e8a240
[ 35.496910] x13: 00004a73a5e663e2 x12: 0000000000000000
[ 35.497764] x11: 0000000000000001 x10: 0000000000000070
[ 35.498611] x9 : ffff8004cb49d610 x8 : 00000000ffffffff
[ 35.499462] x7 : ffff8004c4ff9c70 x6 : ffff8004cb49ccb0
[ 35.500308] x5 : ffff8004c66cc4c0 x4 : 0000000000000001
[ 35.501173] x3 : 0000000000000000 x2 : 0000000000000040
[ 35.502019] x1 : 0000000000000004 x0 : 0000000000000000
[ 35.502872] Process io_uring_regist (pid: 3973, stack limit = 0x(____ptrval____))
[ 35.504052] Call trace:
[ 35.504463] link_pwq+0x10/0x60
[ 35.504987] apply_wqattrs_commit+0xe0/0x118
[ 35.505681] apply_workqueue_attrs_locked+0x3c/0x80
[ 35.506460] apply_workqueue_attrs+0x3c/0x60
[ 35.507152] alloc_workqueue+0x264/0x430
[ 35.507786] io_uring_setup+0x478/0x6a8
[ 35.508414] __arm64_sys_io_uring_setup+0x18/0x20
[ 35.509183] el0_svc_common+0x80/0xf0
[ 35.509786] el0_svc_handler+0x2c/0x80
[ 35.510393] el0_svc+0x8/0xc
[ 35.510873] Code: a9bd7bfd 910003fd a90153f3 9101c014 (f9403802)
[ 35.511843] ---[ end trace 0a53e45ee26def4c ]---
Segmentation fault

2019-04-03 13:50:34

by Jens Axboe

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On 4/3/19 5:11 AM, Will Deacon wrote:
> Hi Michael,
>
> On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
>> Arnd Bergmann <[email protected]> writes:
>>> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> index b18abb0c3dae..00f5a63c8d9a 100644
>>> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>>> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>>> @@ -505,3 +505,7 @@
>>> 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
>>> 422 32 futex_time64 sys_futex sys_futex
>>> 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
>>> +424 common pidfd_send_signal sys_pidfd_send_signal
>>> +425 common io_uring_setup sys_io_uring_setup
>>> +426 common io_uring_enter sys_io_uring_enter
>>> +427 common io_uring_register sys_io_uring_register
>>
>> Acked-by: Michael Ellerman <[email protected]> (powerpc)
>>
>> Lightly tested.
>>
>> The pidfd_test selftest passes.
>
> That reports pass for me too, although it fails to unshare the pid ns, which I
> assume is benign.
>
>> Ran the io_uring example from fio, which prints lots of:
>
> How did you invoke that? I had a play with the tests in:

It's t/io_uring from the fio repo:

git://git.kernel.dk/fio

and you just run it like this:

# make t/io_uring
# t/io_uring /dev/some_device

> git://git.kernel.dk/liburing
>
> but I quickly ran into the kernel oops below.
>
> Will
>
> --->8
>
> will@autoplooker:~/liburing/test$ ./io_uring_register
> RELIMIT_MEMLOCK: 67108864 (67108864)
> [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> [ 35.478969] Mem abort info:
> [ 35.479296] ESR = 0x96000004
> [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
> [ 35.480528] SET = 0, FnV = 0
> [ 35.480980] EA = 0, S1PTW = 0
> [ 35.481345] Data abort info:
> [ 35.481680] ISV = 0, ISS = 0x00000004
> [ 35.482267] CM = 0, WnR = 0
> [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [ 35.483486] [0000000000000070] pgd=0000000000000000
> [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 35.484788] Modules linked in:
> [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
> [ 35.486712] Hardware name: linux,dummy-virt (DT)
> [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
> [ 35.488228] pc : link_pwq+0x10/0x60
> [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> [ 35.489550] sp : ffff000017e2bbc0

Huh, this looks odd, it's crashing inside the wq setup.


--
Jens Axboe

2019-04-03 15:20:37

by Will Deacon

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Hi Jens,

On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
> On 4/3/19 5:11 AM, Will Deacon wrote:
> > will@autoplooker:~/liburing/test$ ./io_uring_register
> > RELIMIT_MEMLOCK: 67108864 (67108864)
> > [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> > [ 35.478969] Mem abort info:
> > [ 35.479296] ESR = 0x96000004
> > [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
> > [ 35.480528] SET = 0, FnV = 0
> > [ 35.480980] EA = 0, S1PTW = 0
> > [ 35.481345] Data abort info:
> > [ 35.481680] ISV = 0, ISS = 0x00000004
> > [ 35.482267] CM = 0, WnR = 0
> > [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> > [ 35.483486] [0000000000000070] pgd=0000000000000000
> > [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > [ 35.484788] Modules linked in:
> > [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
> > [ 35.486712] Hardware name: linux,dummy-virt (DT)
> > [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
> > [ 35.488228] pc : link_pwq+0x10/0x60
> > [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> > [ 35.489550] sp : ffff000017e2bbc0
>
> Huh, this looks odd, it's crashing inside the wq setup.

Enabling KASAN seems to indicate a double-free, which may well be related.

Will

[ 149.890370] ==================================================================
[ 149.891266] BUG: KASAN: double-free or invalid-free in io_sqe_files_unregister+0xa8/0x140
[ 149.892218]
[ 149.892411] CPU: 113 PID: 3974 Comm: io_uring_regist Tainted: G B 5.1.0-rc3-00012-g40b114779944 #3
[ 149.893623] Hardware name: linux,dummy-virt (DT)
[ 149.894169] Call trace:
[ 149.894539] dump_backtrace+0x0/0x228
[ 149.895172] show_stack+0x14/0x20
[ 149.895747] dump_stack+0xe8/0x124
[ 149.896335] print_address_description+0x60/0x258
[ 149.897148] kasan_report_invalid_free+0x78/0xb8
[ 149.897936] __kasan_slab_free+0x1fc/0x228
[ 149.898641] kasan_slab_free+0x10/0x18
[ 149.899283] kfree+0x70/0x1f8
[ 149.899798] io_sqe_files_unregister+0xa8/0x140
[ 149.900574] io_ring_ctx_wait_and_kill+0x190/0x3c0
[ 149.901402] io_uring_release+0x2c/0x48
[ 149.902068] __fput+0x18c/0x510
[ 149.902612] ____fput+0xc/0x18
[ 149.903146] task_work_run+0xf0/0x148
[ 149.903778] do_notify_resume+0x554/0x748
[ 149.904467] work_pending+0x8/0x10
[ 149.905060]
[ 149.905331] Allocated by task 3974:
[ 149.905934] __kasan_kmalloc.isra.0.part.1+0x48/0xf8
[ 149.906786] __kasan_kmalloc.isra.0+0xb8/0xd8
[ 149.907531] kasan_kmalloc+0xc/0x18
[ 149.908134] __kmalloc+0x168/0x248
[ 149.908724] __arm64_sys_io_uring_register+0x2b8/0x15a8
[ 149.909622] el0_svc_common+0x100/0x258
[ 149.910281] el0_svc_handler+0x48/0xc0
[ 149.910928] el0_svc+0x8/0xc
[ 149.911425]
[ 149.911696] Freed by task 3974:
[ 149.912242] __kasan_slab_free+0x114/0x228
[ 149.912955] kasan_slab_free+0x10/0x18
[ 149.913602] kfree+0x70/0x1f8
[ 149.914118] __arm64_sys_io_uring_register+0xc2c/0x15a8
[ 149.915009] el0_svc_common+0x100/0x258
[ 149.915670] el0_svc_handler+0x48/0xc0
[ 149.916317] el0_svc+0x8/0xc
[ 149.916817]
[ 149.917101] The buggy address belongs to the object at ffff8004ce07ed00
[ 149.917101] which belongs to the cache kmalloc-128 of size 128
[ 149.919197] The buggy address is located 0 bytes inside of
[ 149.919197] 128-byte region [ffff8004ce07ed00, ffff8004ce07ed80)
[ 149.921142] The buggy address belongs to the page:
[ 149.921953] page:ffff7e0013381f00 count:1 mapcount:0 mapping:ffff800503417c00 index:0x0 compound_mapcount: 0
[ 149.923595] flags: 0x1ffff00000010200(slab|head)
[ 149.924388] raw: 1ffff00000010200 dead000000000100 dead000000000200 ffff800503417c00
[ 149.925706] raw: 0000000000000000 0000000080400040 00000001ffffffff 0000000000000000
[ 149.927011] page dumped because: kasan: bad access detected
[ 149.927956]
[ 149.928224] Memory state around the buggy address:
[ 149.929054] ffff8004ce07ec00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 149.930274] ffff8004ce07ec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.931494] >ffff8004ce07ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 149.932712] ^
[ 149.933281] ffff8004ce07ed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.934508] ffff8004ce07ee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 149.935725] ==================================================================

2019-04-03 15:42:19

by Jens Axboe

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On 4/3/19 9:19 AM, Will Deacon wrote:
> Hi Jens,
>
> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
>> On 4/3/19 5:11 AM, Will Deacon wrote:
>>> will@autoplooker:~/liburing/test$ ./io_uring_register
>>> RELIMIT_MEMLOCK: 67108864 (67108864)
>>> [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
>>> [ 35.478969] Mem abort info:
>>> [ 35.479296] ESR = 0x96000004
>>> [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
>>> [ 35.480528] SET = 0, FnV = 0
>>> [ 35.480980] EA = 0, S1PTW = 0
>>> [ 35.481345] Data abort info:
>>> [ 35.481680] ISV = 0, ISS = 0x00000004
>>> [ 35.482267] CM = 0, WnR = 0
>>> [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>> [ 35.483486] [0000000000000070] pgd=0000000000000000
>>> [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [ 35.484788] Modules linked in:
>>> [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
>>> [ 35.486712] Hardware name: linux,dummy-virt (DT)
>>> [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
>>> [ 35.488228] pc : link_pwq+0x10/0x60
>>> [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>>> [ 35.489550] sp : ffff000017e2bbc0
>>
>> Huh, this looks odd, it's crashing inside the wq setup.
>
> Enabling KASAN seems to indicate a double-free, which may well be related.

Does this help?


diff --git a/fs/io_uring.c b/fs/io_uring.c
index bbdbd56cf2ac..07d6ef195d05 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2215,6 +2215,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
fput(ctx->user_files[i]);

kfree(ctx->user_files);
+ ctx->user_files = NULL;
ctx->nr_user_files = 0;
return ret;
}
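
To show why clearing the pointer matters, here is a small userspace sketch of the same pattern. It is not the kernel code: the struct and function names are hypothetical stand-ins, free() replaces kfree(), and it assumes the teardown path skips its free when the table pointer is NULL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two io_ring_ctx fields touched above. */
struct ctx_sketch {
	void **user_files;
	unsigned int nr_user_files;
};

/* Error path of registration: release the table and forget it. */
static void register_fail_cleanup(struct ctx_sketch *ctx)
{
	free(ctx->user_files);
	ctx->user_files = NULL;	/* without this, teardown frees it again */
	ctx->nr_user_files = 0;
}

/*
 * Teardown path, run later when the ring goes away.  Returns 1 if it
 * released a table and 0 if there was nothing left to release.
 */
static int unregister_files(struct ctx_sketch *ctx)
{
	if (!ctx->user_files)
		return 0;

	free(ctx->user_files);
	ctx->user_files = NULL;
	ctx->nr_user_files = 0;
	return 1;
}
```

With the pointer cleared on the error path, a later teardown sees NULL and returns early. With a stale pointer it would pass the NULL check and free the same allocation a second time, which matches the double-free that KASAN reported.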

--
Jens Axboe

2019-04-03 15:51:54

by Will Deacon

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On Wed, Apr 03, 2019 at 09:39:52AM -0600, Jens Axboe wrote:
> On 4/3/19 9:19 AM, Will Deacon wrote:
> > On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
> >> On 4/3/19 5:11 AM, Will Deacon wrote:
> >>> will@autoplooker:~/liburing/test$ ./io_uring_register
> >>> RELIMIT_MEMLOCK: 67108864 (67108864)
> >>> [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> >>> [ 35.478969] Mem abort info:
> >>> [ 35.479296] ESR = 0x96000004
> >>> [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
> >>> [ 35.480528] SET = 0, FnV = 0
> >>> [ 35.480980] EA = 0, S1PTW = 0
> >>> [ 35.481345] Data abort info:
> >>> [ 35.481680] ISV = 0, ISS = 0x00000004
> >>> [ 35.482267] CM = 0, WnR = 0
> >>> [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> >>> [ 35.483486] [0000000000000070] pgd=0000000000000000
> >>> [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >>> [ 35.484788] Modules linked in:
> >>> [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
> >>> [ 35.486712] Hardware name: linux,dummy-virt (DT)
> >>> [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
> >>> [ 35.488228] pc : link_pwq+0x10/0x60
> >>> [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
> >>> [ 35.489550] sp : ffff000017e2bbc0
> >>
> >> Huh, this looks odd, it's crashing inside the wq setup.
> >
> > Enabling KASAN seems to indicate a double-free, which may well be related.
>
> Does this help?

Yes, thanks for the quick patch. Feel free to add:

Reported-by: Will Deacon <[email protected]>
Tested-by: Will Deacon <[email protected]>

if you spin a proper patch.

Will

> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index bbdbd56cf2ac..07d6ef195d05 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2215,6 +2215,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
> fput(ctx->user_files[i]);
>
> kfree(ctx->user_files);
> + ctx->user_files = NULL;
> ctx->nr_user_files = 0;
> return ret;
> }
>
> --
> Jens Axboe
>

2019-04-03 15:54:07

by Jens Axboe

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

On 4/3/19 9:49 AM, Will Deacon wrote:
> On Wed, Apr 03, 2019 at 09:39:52AM -0600, Jens Axboe wrote:
>> On 4/3/19 9:19 AM, Will Deacon wrote:
>>> On Wed, Apr 03, 2019 at 07:49:26AM -0600, Jens Axboe wrote:
>>>> On 4/3/19 5:11 AM, Will Deacon wrote:
>>>>> will@autoplooker:~/liburing/test$ ./io_uring_register
>>>>> RELIMIT_MEMLOCK: 67108864 (67108864)
>>>>> [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
>>>>> [ 35.478969] Mem abort info:
>>>>> [ 35.479296] ESR = 0x96000004
>>>>> [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
>>>>> [ 35.480528] SET = 0, FnV = 0
>>>>> [ 35.480980] EA = 0, S1PTW = 0
>>>>> [ 35.481345] Data abort info:
>>>>> [ 35.481680] ISV = 0, ISS = 0x00000004
>>>>> [ 35.482267] CM = 0, WnR = 0
>>>>> [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>>>> [ 35.483486] [0000000000000070] pgd=0000000000000000
>>>>> [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>>> [ 35.484788] Modules linked in:
>>>>> [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
>>>>> [ 35.486712] Hardware name: linux,dummy-virt (DT)
>>>>> [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
>>>>> [ 35.488228] pc : link_pwq+0x10/0x60
>>>>> [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>>>>> [ 35.489550] sp : ffff000017e2bbc0
>>>>
>>>> Huh, this looks odd, it's crashing inside the wq setup.
>>>
>>> Enabling KASAN seems to indicate a double-free, which may well be related.
>>
>> Does this help?
>
> Yes, thanks for the quick patch. Feel free to add:
>
> Reported-by: Will Deacon <[email protected]>
> Tested-by: Will Deacon <[email protected]>
>
> if you spin a proper patch.

Great, thanks for reporting/testing.

--
Jens Axboe

2019-04-04 06:10:08

by Michael Ellerman

Subject: Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

Jens Axboe <[email protected]> writes:
> On 4/3/19 5:11 AM, Will Deacon wrote:
>> On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
>>> Arnd Bergmann <[email protected]> writes:
>>>> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> index b18abb0c3dae..00f5a63c8d9a 100644
>>>> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> @@ -505,3 +505,7 @@
>>>> 421 32 rt_sigtimedwait_time64 sys_rt_sigtimedwait compat_sys_rt_sigtimedwait_time64
>>>> 422 32 futex_time64 sys_futex sys_futex
>>>> 423 32 sched_rr_get_interval_time64 sys_sched_rr_get_interval sys_sched_rr_get_interval
>>>> +424 common pidfd_send_signal sys_pidfd_send_signal
>>>> +425 common io_uring_setup sys_io_uring_setup
>>>> +426 common io_uring_enter sys_io_uring_enter
>>>> +427 common io_uring_register sys_io_uring_register
>>>
>>> Acked-by: Michael Ellerman <[email protected]> (powerpc)
>>>
>>> Lightly tested.
>>>
>>> The pidfd_test selftest passes.
>>
>> That reports pass for me too, although it fails to unshare the pid ns, which I
>> assume is benign.

If you run it as root it should work?

>>> Ran the io_uring example from fio, which prints lots of:
>>
>> How did you invoke that? I had a play with the tests in:
>
> It's t/io_uring from the fio repo:
>
> git://git.kernel.dk/fio
>
> and you just run it like this:
>
> # make t/io_uring
> # t/io_uring /dev/some_device

Yeah that's all I did.

>> will@autoplooker:~/liburing/test$ ./io_uring_register
>> RELIMIT_MEMLOCK: 67108864 (67108864)
>> [ 35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
>> [ 35.478969] Mem abort info:
>> [ 35.479296] ESR = 0x96000004
>> [ 35.479785] Exception class = DABT (current EL), IL = 32 bits
>> [ 35.480528] SET = 0, FnV = 0
>> [ 35.480980] EA = 0, S1PTW = 0
>> [ 35.481345] Data abort info:
>> [ 35.481680] ISV = 0, ISS = 0x00000004
>> [ 35.482267] CM = 0, WnR = 0
>> [ 35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>> [ 35.483486] [0000000000000070] pgd=0000000000000000
>> [ 35.484041] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 35.484788] Modules linked in:
>> [ 35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
>> [ 35.486712] Hardware name: linux,dummy-virt (DT)
>> [ 35.487450] pstate: 20400005 (nzCv daif +PAN -UAO)
>> [ 35.488228] pc : link_pwq+0x10/0x60
>> [ 35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>> [ 35.489550] sp : ffff000017e2bbc0
>
> Huh, this looks odd, it's crashing inside the wq setup.

Looks like you found a bug :)

cheers