2019-06-04 16:12:28

by Christian Brauner

[permalink] [raw]
Subject: [PATCH v3 2/2] arch: wire-up clone3() syscall

Wire up the clone3() call on all arches that don't require hand-rolled
assembly.

Some of the arches look like they need special assembly massaging and it is
probably smarter if the appropriate arch maintainers would do the actual
wiring. Arches that are wired-up are:
- x86{_32,64}
- arm{64}
- xtensa

Signed-off-by: Christian Brauner <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: David Howells <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Adrian Reber <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
v1: unchanged
v2: unchanged
v3:
- Christian Brauner <[email protected]>:
- wire up clone3 on all arches that don't have hand-rolled entry points
for clone
---
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 ++
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/uapi/asm-generic/unistd.h | 4 +++-
8 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..e99a82bdb93a 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+436 common clone3 sys_clone3
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..24480c2d95da 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 434
+#define __NR_compat_syscalls 437
#endif

#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c39e90600bb3..b144ea675d70 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_clone3 436
+__SYSCALL(__NR_clone3, sys_clone3)

/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 26339e417695..3110440bcc31 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+436 common clone3 sys_clone3
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ad968b7bac72..80e26211feff 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
432 i386 fsmount sys_fsmount __ia32_sys_fsmount
433 i386 fspick sys_fspick __ia32_sys_fspick
+436 i386 clone3 sys_clone3 __ia32_sys_clone3
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b4e6f9e6204a..7968f0b5b5e8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
431 common fsconfig __x64_sys_fsconfig
432 common fsmount __x64_sys_fsmount
433 common fspick __x64_sys_fspick
+436 common clone3 __x64_sys_clone3/ptregs

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5fa0ee1c8e00..b2767c8c2b4e 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -404,3 +404,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+436 common clone3 sys_clone3
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a87904daf103..45bc87687c47 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_clone3 436
+__SYSCALL(__NR_clone3, sys_clone3)

#undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 437

/*
* 32 bit systems traditionally used different
--
2.21.0


2019-06-04 18:41:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <[email protected]> wrote:
>
> Wire up the clone3() call on all arches that don't require hand-rolled
> assembly.
>
> Some of the arches look like they need special assembly massaging and it is
> probably smarter if the appropriate arch maintainers would do the actual
> wiring. Arches that are wired-up are:
> - x86{_32,64}
> - arm{64}
> - xtensa

The ones you did look good to me. I would hope that we can do all other
architectures the same way, even if they have special assembly wrappers
for the old clone(). The most interesting cases appear to be ia64, alpha,
m68k and sparc, so it would be good if their maintainers could take a
look.

What do you use for testing? Would it be possible to override the
internal clone() function in glibc with an LD_PRELOAD library
to quickly test one of the other architectures for regressions?

Arnd

2019-06-04 21:30:57

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <[email protected]> wrote:
> >
> > Wire up the clone3() call on all arches that don't require hand-rolled
> > assembly.
> >
> > Some of the arches look like they need special assembly massaging and it is
> > probably smarter if the appropriate arch maintainers would do the actual
> > wiring. Arches that are wired-up are:
> > - x86{_32,64}
> > - arm{64}
> > - xtensa
>
> The ones you did look good to me. I would hope that we can do all other
> architectures the same way, even if they have special assembly wrappers
> for the old clone(). The most interesting cases appear to be ia64, alpha,
> m68k and sparc, so it would be good if their maintainers could take a
> look.

Yes, agreed. They can sort this out even after this lands.

>
> What do you use for testing? Would it be possible to override the
> internal clone() function in glibc with an LD_PRELOAD library
> to quickly test one of the other architectures for regressions?

I have a test program that is rather horrendously ugly and I compiled
kernels for x86 and the arms and tested in qemu. The program basically
looks like [1].

Christian

[1]:
#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t raw_clone(struct clone_args *args)
{
return syscall(__NR_clone3, args, sizeof(struct clone_args));
}

static pid_t raw_clone_legacy(int *pidfd, unsigned int flags)
{
return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
}

static int wait_for_pid(pid_t pid)
{
int status, ret;

again:
ret = waitpid(pid, &status, 0);
if (ret == -1) {
if (errno == EINTR)
goto again;

return -1;
}

if (ret != pid)
goto again;

if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
return -1;

return 0;
}

#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
#define u64_to_ptr(n) ((uintptr_t)((__u64)(n)))

int main(int argc, char *argv[])
{
int pidfd = -1;
pid_t parent_tid = -1, pid = -1;
struct clone_args args = {0};
args.parent_tid = ptr_to_u64(&parent_tid);
args.pidfd = ptr_to_u64(&pidfd);
args.flags = CLONE_PIDFD | CLONE_PARENT_SETTID;
args.exit_signal = SIGCHLD;

pid = raw_clone(&args);
if (pid < 0) {
fprintf(stderr, "%s - Failed to create new process\n",
strerror(errno));
exit(EXIT_FAILURE);
}

if (pid == 0) {
printf("I am the child with pid %d\n", getpid());
exit(EXIT_SUCCESS);
}

printf("raw_clone: I am the parent. My child's pid is %d\n", pid);
printf("raw_clone: I am the parent. My child's pidfd is %d\n",
*(int *)args.pidfd);
printf("raw_clone: I am the parent. My child's paren_tid value is %d\n",
*(pid_t *)args.parent_tid);

if (wait_for_pid(pid))
exit(EXIT_FAILURE);

if (pid != *(pid_t *)args.parent_tid)
exit(EXIT_FAILURE);

close(pidfd);

printf("\n\n");
pidfd = -1;
pid = raw_clone_legacy(&pidfd, CLONE_PIDFD | SIGCHLD);
if (pid < 0) {
fprintf(stderr, "%s - Failed to create new process\n",
strerror(errno));
exit(EXIT_FAILURE);
}

if (pid == 0) {
printf("I am the child with pid %d\n", getpid());
exit(EXIT_SUCCESS);
}

printf("raw_clone_legacy: I am the parent. My child's pid is %d\n",
pid);
printf("raw_clone_legacy: I am the parent. My child's pidfd is %d\n",
pidfd);

if (wait_for_pid(pid))
exit(EXIT_FAILURE);

if (pid != *(pid_t *)args.parent_tid)
exit(EXIT_FAILURE);

return 0;
}

2019-06-20 18:45:16

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> Wire up the clone3() call on all arches that don't require hand-rolled
> assembly.
>
> Some of the arches look like they need special assembly massaging and it is
> probably smarter if the appropriate arch maintainers would do the actual
> wiring. Arches that are wired-up are:
> - x86{_32,64}
> - arm{64}
> - xtensa
>

This patch results in build failures on various architecetures.

h8300-linux-ld: arch/h8300/kernel/syscalls.o:(.data+0x6d0): undefined reference to `sys_clone3'

nios2-linux-ld: arch/nios2/kernel/syscall_table.o:(.data+0x6d0): undefined reference to `sys_clone3'

There may be others; -next is in too bad shape right now to get a complete
picture. Wondering, though: What is special with this syscall ? Normally
one would only get a warning that a syscall is not wired up.

Guenter

> Signed-off-by: Christian Brauner <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Adrian Reber <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Florian Weimer <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> v1: unchanged
> v2: unchanged
> v3:
> - Christian Brauner <[email protected]>:
> - wire up clone3 on all arches that don't have hand-rolled entry points
> for clone
> ---
> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 2 ++
> arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
> include/uapi/asm-generic/unistd.h | 4 +++-
> 8 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index aaf479a9e92d..e99a82bdb93a 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -447,3 +447,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +436 common clone3 sys_clone3
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 70e6882853c0..24480c2d95da 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -44,7 +44,7 @@
> #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
> #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
>
> -#define __NR_compat_syscalls 434
> +#define __NR_compat_syscalls 437
> #endif
>
> #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index c39e90600bb3..b144ea675d70 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
> __SYSCALL(__NR_fsmount, sys_fsmount)
> #define __NR_fspick 433
> __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_clone3 436
> +__SYSCALL(__NR_clone3, sys_clone3)
>
> /*
> * Please add new compat syscalls above this comment and update
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 26339e417695..3110440bcc31 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -439,3 +439,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +436 common clone3 sys_clone3
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ad968b7bac72..80e26211feff 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -438,3 +438,4 @@
> 431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
> 432 i386 fsmount sys_fsmount __ia32_sys_fsmount
> 433 i386 fspick sys_fspick __ia32_sys_fspick
> +436 i386 clone3 sys_clone3 __ia32_sys_clone3
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index b4e6f9e6204a..7968f0b5b5e8 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -355,6 +355,7 @@
> 431 common fsconfig __x64_sys_fsconfig
> 432 common fsmount __x64_sys_fsmount
> 433 common fspick __x64_sys_fspick
> +436 common clone3 __x64_sys_clone3/ptregs
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 5fa0ee1c8e00..b2767c8c2b4e 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -404,3 +404,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +436 common clone3 sys_clone3
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index a87904daf103..45bc87687c47 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
> __SYSCALL(__NR_fsmount, sys_fsmount)
> #define __NR_fspick 433
> __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_clone3 436
> +__SYSCALL(__NR_clone3, sys_clone3)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 434
> +#define __NR_syscalls 437
>
> /*
> * 32 bit systems traditionally used different

2019-06-20 22:10:34

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> > Wire up the clone3() call on all arches that don't require hand-rolled
> > assembly.
> >
> > Some of the arches look like they need special assembly massaging and it is
> > probably smarter if the appropriate arch maintainers would do the actual
> > wiring. Arches that are wired-up are:
> > - x86{_32,64}
> > - arm{64}
> > - xtensa
> >
>
> This patch results in build failures on various architecetures.
>
> h8300-linux-ld: arch/h8300/kernel/syscalls.o:(.data+0x6d0): undefined reference to `sys_clone3'
>
> nios2-linux-ld: arch/nios2/kernel/syscall_table.o:(.data+0x6d0): undefined reference to `sys_clone3'
>
> There may be others; -next is in too bad shape right now to get a complete
> picture. Wondering, though: What is special with this syscall ? Normally
> one would only get a warning that a syscall is not wired up.

clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
simply define __ARCH_WANT_SYS_CLONE and are done with it.
Some however, such as nios2 and h8300 don't define it but instead
provide a sys_clone stub of their own because of architectural
requirements (or tweaks) and they are mostly written in assembly. (That
should be left to arch maintainers for sys_clone3.)

The build failures were on my radar already. I hadn't yet replied
since I haven't pushed the fixup below.
The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
cond_syscall(clone3) so we catch all architectures that do not yet
provide clone3 with a ENOSYS until maintainers have added it.

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 7a39e77984ef..aa35aa5d68dc 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -40,6 +40,7 @@
#define __ARCH_WANT_SYS_FORK
#define __ARCH_WANT_SYS_VFORK
#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3

/*
* Unimplemented (or alternatively implemented) syscalls
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 24480c2d95da..e4e0523102e2 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -48,6 +48,7 @@
#endif

#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3

#ifndef __COMPAT_SYSCALL_NR
#include <uapi/asm/unistd.h>
diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h
index 146859efd83c..097589753fec 100644
--- a/arch/x86/include/asm/unistd.h
+++ b/arch/x86/include/asm/unistd.h
@@ -54,5 +54,6 @@
# define __ARCH_WANT_SYS_FORK
# define __ARCH_WANT_SYS_VFORK
# define __ARCH_WANT_SYS_CLONE
+# define __ARCH_WANT_SYS_CLONE3

#endif /* _ASM_X86_UNISTD_H */
diff --git a/arch/xtensa/include/asm/unistd.h b/arch/xtensa/include/asm/unistd.h
index 30af4dc3ce7b..b52236245e51 100644
--- a/arch/xtensa/include/asm/unistd.h
+++ b/arch/xtensa/include/asm/unistd.h
@@ -3,6 +3,7 @@
#define _XTENSA_UNISTD_H

#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_SYS_CLONE3
#include <uapi/asm/unistd.h>

#define __ARCH_WANT_NEW_STAT
diff --git a/kernel/fork.c b/kernel/fork.c
index 08ff131f26b4..98abea995629 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2490,7 +2490,9 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,

return _do_fork(&args);
}
+#endif

+#ifdef __ARCH_WANT_SYS_CLONE3
noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
struct clone_args __user *uargs,
size_t size)
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d9ae5ea6caf..34b76895b81e 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -137,6 +137,8 @@ COND_SYSCALL(capset);
/* kernel/exit.c */

/* kernel/fork.c */
+/* __ARCH_WANT_SYS_CLONE3 */
+COND_SYSCALL(clone3);

/* kernel/futex.c */
COND_SYSCALL(futex);

2019-06-21 09:40:19

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <[email protected]> wrote:
> On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
>
> clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> simply define __ARCH_WANT_SYS_CLONE and are done with it.
> Some however, such as nios2 and h8300 don't define it but instead
> provide a sys_clone stub of their own because of architectural
> requirements (or tweaks) and they are mostly written in assembly. (That
> should be left to arch maintainers for sys_clone3.)
>
> The build failures were on my radar already. I hadn't yet replied
> since I haven't pushed the fixup below.
> The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> cond_syscall(clone3) so we catch all architectures that do not yet
> provide clone3 with a ENOSYS until maintainers have added it.
>
> diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> index 7a39e77984ef..aa35aa5d68dc 100644
> --- a/arch/arm/include/asm/unistd.h
> +++ b/arch/arm/include/asm/unistd.h
> @@ -40,6 +40,7 @@
> #define __ARCH_WANT_SYS_FORK
> #define __ARCH_WANT_SYS_VFORK
> #define __ARCH_WANT_SYS_CLONE
> +#define __ARCH_WANT_SYS_CLONE3

I never really liked having __ARCH_WANT_SYS_CLONE here
because it was the only one that a new architecture needed to
set: all the other __ARCH_WANT_* are for system calls that
are already superseded by newer ones, so a new architecture
would start out with an empty list.

Since __ARCH_WANT_SYS_CLONE3 replaces
__ARCH_WANT_SYS_CLONE for new architectures, how about
leaving __ARCH_WANT_SYS_CLONE untouched but instead
coming up with the reverse for clone3 and mark the architectures
that specifically don't want it (if any)?

Arnd

2019-06-21 11:19:15

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 12:10 AM Christian Brauner <[email protected]> wrote:
> > On Thu, Jun 20, 2019 at 11:44:51AM -0700, Guenter Roeck wrote:
> > > On Tue, Jun 04, 2019 at 06:09:44PM +0200, Christian Brauner wrote:
> >
> > clone3() was placed under __ARCH_WANT_SYS_CLONE. Most architectures
> > simply define __ARCH_WANT_SYS_CLONE and are done with it.
> > Some however, such as nios2 and h8300 don't define it but instead
> > provide a sys_clone stub of their own because of architectural
> > requirements (or tweaks) and they are mostly written in assembly. (That
> > should be left to arch maintainers for sys_clone3.)
> >
> > The build failures were on my radar already. I hadn't yet replied
> > since I haven't pushed the fixup below.
> > The solution is to define __ARCH_WANT_SYS_CLONE3 and add a
> > cond_syscall(clone3) so we catch all architectures that do not yet
> > provide clone3 with a ENOSYS until maintainers have added it.
> >
> > diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
> > index 7a39e77984ef..aa35aa5d68dc 100644
> > --- a/arch/arm/include/asm/unistd.h
> > +++ b/arch/arm/include/asm/unistd.h
> > @@ -40,6 +40,7 @@
> > #define __ARCH_WANT_SYS_FORK
> > #define __ARCH_WANT_SYS_VFORK
> > #define __ARCH_WANT_SYS_CLONE
> > +#define __ARCH_WANT_SYS_CLONE3
>
> I never really liked having __ARCH_WANT_SYS_CLONE here
> because it was the only one that a new architecture needed to
> set: all the other __ARCH_WANT_* are for system calls that
> are already superseded by newer ones, so a new architecture
> would start out with an empty list.
>
> Since __ARCH_WANT_SYS_CLONE3 replaces
> __ARCH_WANT_SYS_CLONE for new architectures, how about
> leaving __ARCH_WANT_SYS_CLONE untouched but instead

__ARCH_WANT_SYS_CLONE is left untouched. :)

> coming up with the reverse for clone3 and mark the architectures
> that specifically don't want it (if any)?

Afaict, your suggestion is more or less the same thing what is done
here. So I'm not sure it buys us anything apart from future
architectures not needing to set __ARCH_WANT_SYS_CLONE3.

I expect the macro above to be only here temporarily until all arches
have caught up and we're sure that they don't require assembly stubs
(cf. [1]). A decision I'd leave to the maintainers (since even for
nios2 we were kind of on the fence what exactly the sys_clone stub was
supposed to do).

But I'm happy to take a patch from you if it's equally or more simple
than this one right here.

In any case, linux-next should be fine on all arches with this fixup
now.

Christian


[1]: Architectures such as nios2 or h8300 simply take the asm-generic
syscall definitions and generate their syscall table from it. But
since they don't define __ARCH_WANT_SYS_CLONE the build would fail
complaining about sys_clone3 missing. The reason this doesn't
happen for legacy clone is that nios2 and h8300 provide assembly
stubs for sys_clone but they don't for sys_clone3.

2019-06-21 14:21:46

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <[email protected]> wrote:
> On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> >
> > I never really liked having __ARCH_WANT_SYS_CLONE here
> > because it was the only one that a new architecture needed to
> > set: all the other __ARCH_WANT_* are for system calls that
> > are already superseded by newer ones, so a new architecture
> > would start out with an empty list.
> >
> > Since __ARCH_WANT_SYS_CLONE3 replaces
> > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > leaving __ARCH_WANT_SYS_CLONE untouched but instead
>
> __ARCH_WANT_SYS_CLONE is left untouched. :)
>
> > coming up with the reverse for clone3 and mark the architectures
> > that specifically don't want it (if any)?
>
> Afaict, your suggestion is more or less the same thing what is done
> here. So I'm not sure it buys us anything apart from future
> architectures not needing to set __ARCH_WANT_SYS_CLONE3.
>
> I expect the macro above to be only here temporarily until all arches
> have caught up and we're sure that they don't require assembly stubs
> (cf. [1]). A decision I'd leave to the maintainers (since even for
> nios2 we were kind of on the fence what exactly the sys_clone stub was
> supposed to do).
>
> But I'm happy to take a patch from you if it's equally or more simple
> than this one right here.
>
> In any case, linux-next should be fine on all arches with this fixup
> now.

I've looked at bit more closely at the nios2 implementation, and I
believe this is purely an artifact of this file being copied over
from m68k, which also has an odd definition. The glibc side
of nios2 clone() is also odd in other ways, but that appears
to be unrelated to the kernel ABI.

I think the best option here would be to not have any special
cases and just hook up clone3() the same way on all
architectures, with no #ifdef at all. If it turns out to not work
on a particular architecture later, they can still disable the
syscall then.

Arnd

2019-06-21 15:32:16

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <[email protected]> wrote:
> > On Fri, Jun 21, 2019 at 11:37:50AM +0200, Arnd Bergmann wrote:
> > >
> > > I never really liked having __ARCH_WANT_SYS_CLONE here
> > > because it was the only one that a new architecture needed to
> > > set: all the other __ARCH_WANT_* are for system calls that
> > > are already superseded by newer ones, so a new architecture
> > > would start out with an empty list.
> > >
> > > Since __ARCH_WANT_SYS_CLONE3 replaces
> > > __ARCH_WANT_SYS_CLONE for new architectures, how about
> > > leaving __ARCH_WANT_SYS_CLONE untouched but instead
> >
> > __ARCH_WANT_SYS_CLONE is left untouched. :)
> >
> > > coming up with the reverse for clone3 and mark the architectures
> > > that specifically don't want it (if any)?
> >
> > Afaict, your suggestion is more or less the same thing what is done
> > here. So I'm not sure it buys us anything apart from future
> > architectures not needing to set __ARCH_WANT_SYS_CLONE3.
> >
> > I expect the macro above to be only here temporarily until all arches
> > have caught up and we're sure that they don't require assembly stubs
> > (cf. [1]). A decision I'd leave to the maintainers (since even for
> > nios2 we were kind of on the fence what exactly the sys_clone stub was
> > supposed to do).
> >
> > But I'm happy to take a patch from you if it's equally or more simple
> > than this one right here.
> >
> > In any case, linux-next should be fine on all arches with this fixup
> > now.
>
> I've looked at bit more closely at the nios2 implementation, and I
> believe this is purely an artifact of this file being copied over
> from m68k, which also has an odd definition. The glibc side
> of nios2 clone() is also odd in other ways, but that appears
> to be unrelated to the kernel ABI.
>
> I think the best option here would be to not have any special
> cases and just hook up clone3() the same way on all
> architectures, with no #ifdef at all. If it turns out to not work
> on a particular architecture later, they can still disable the
> syscall then.

Hm, if you believe that this is fine and want to "vouch" for it by
whipping up a patch that replaces the wiring up done in [1] I'm happy to
take it. :) Otherwise I'd feel more comfortable not adding all arches at
once.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone

Christian

2019-07-01 15:51:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Fri, Jun 21, 2019 at 5:30 PM Christian Brauner <[email protected]> wrote:
> On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> > On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <[email protected]> wrote:
> Hm, if you believe that this is fine and want to "vouch" for it by
> whipping up a patch that replaces the wiring up done in [1] I'm happy to
> take it. :) Otherwise I'd feel more comfortable not adding all arches at
> once.
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone

Sorry for my late reply. I had actually looked at the implementations
in a little
more detail and I think you are right that adding these are better
left to the arch
maintainers in case of clone3.

Arnd

2019-07-01 16:03:49

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] arch: wire-up clone3() syscall

On Mon, Jul 01, 2019 at 05:14:51PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 21, 2019 at 5:30 PM Christian Brauner <[email protected]> wrote:
> > On Fri, Jun 21, 2019 at 04:20:15PM +0200, Arnd Bergmann wrote:
> > > On Fri, Jun 21, 2019 at 1:18 PM Christian Brauner <[email protected]> wrote:
> > Hm, if you believe that this is fine and want to "vouch" for it by
> > whipping up a patch that replaces the wiring up done in [1] I'm happy to
> > take it. :) Otherwise I'd feel more comfortable not adding all arches at
> > once.
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone
>
> Sorry for my late reply. I had actually looked at the implementations
> in a little
> more detail and I think you are right that adding these are better
> left to the arch
> maintainers in case of clone3.

Perfect, thanks!

Christian

2020-01-16 00:17:16

by Vineet Gupta

[permalink] [raw]
Subject: clone3 on ARC (was Re: [PATCH v3 2/2] arch: wire-up clone3() syscall)

On 6/4/19 2:29 PM, Christian Brauner wrote:
> On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
>> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <[email protected]> wrote:
>>>
>>> Wire up the clone3() call on all arches that don't require hand-rolled
>>> assembly.
>>>
>>> Some of the arches look like they need special assembly massaging and it is
>>> probably smarter if the appropriate arch maintainers would do the actual
>>> wiring. Arches that are wired-up are:
>>> - x86{_32,64}
>>> - arm{64}
>>> - xtensa
>>
>> The ones you did look good to me. I would hope that we can do all other
>> architectures the same way, even if they have special assembly wrappers
>> for the old clone(). The most interesting cases appear to be ia64, alpha,
>> m68k and sparc, so it would be good if their maintainers could take a
>> look.
>
> Yes, agreed. They can sort this out even after this lands.
>
>>
>> What do you use for testing? Would it be possible to override the
>> internal clone() function in glibc with an LD_PRELOAD library
>> to quickly test one of the other architectures for regressions?
>
> I have a test program that is rather horrendously ugly and I compiled
> kernels for x86 and the arms and tested in qemu. The program basically
> looks like [1].

I just got around to fixing this for ARC (patch to follow after we sort out the
testing) and was trying to use the test case below for a qucik and dirty smoke
test (so existing toolchain lacking with headers lacking NR_clone3 or struct
clone_args etc). I did hack those up, but then spotted below

uapi/linux/sched.h

| struct clone_args {
| __aligned_u64 flags;
| __aligned_u64 pidfd;
| __aligned_u64 child_tid;
| __aligned_u64 parent_tid;
..
..

Are all clone3 arg fields supposed to be 64-bit wide, even things like @child_tid,
@tls .... which are traditionally ARCH word wide ?


>
> Christian
>
> [1]:
> #define _GNU_SOURCE
> #include <err.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <linux/sched.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mount.h>
> #include <sys/socket.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/sysmacros.h>
> #include <sys/types.h>
> #include <sys/un.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> static pid_t raw_clone(struct clone_args *args)
> {
> return syscall(__NR_clone3, args, sizeof(struct clone_args));
> }
>
> static pid_t raw_clone_legacy(int *pidfd, unsigned int flags)
> {
> return syscall(__NR_clone, flags, 0, pidfd, 0, 0);
> }
>
> static int wait_for_pid(pid_t pid)
> {
> int status, ret;
>
> again:
> ret = waitpid(pid, &status, 0);
> if (ret == -1) {
> if (errno == EINTR)
> goto again;
>
> return -1;
> }
>
> if (ret != pid)
> goto again;
>
> if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
> return -1;
>
> return 0;
> }
>
> #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
> #define u64_to_ptr(n) ((uintptr_t)((__u64)(n)))
>
> int main(int argc, char *argv[])
> {
> int pidfd = -1;
> pid_t parent_tid = -1, pid = -1;
> struct clone_args args = {0};
> args.parent_tid = ptr_to_u64(&parent_tid);
> args.pidfd = ptr_to_u64(&pidfd);
> args.flags = CLONE_PIDFD | CLONE_PARENT_SETTID;
> args.exit_signal = SIGCHLD;
>
> pid = raw_clone(&args);
> if (pid < 0) {
> fprintf(stderr, "%s - Failed to create new process\n",
> strerror(errno));
> exit(EXIT_FAILURE);
> }
>
> if (pid == 0) {
> printf("I am the child with pid %d\n", getpid());
> exit(EXIT_SUCCESS);
> }
>
> printf("raw_clone: I am the parent. My child's pid is %d\n", pid);
> printf("raw_clone: I am the parent. My child's pidfd is %d\n",
> *(int *)args.pidfd);
> printf("raw_clone: I am the parent. My child's paren_tid value is %d\n",
> *(pid_t *)args.parent_tid);
>
> if (wait_for_pid(pid))
> exit(EXIT_FAILURE);
>
> if (pid != *(pid_t *)args.parent_tid)
> exit(EXIT_FAILURE);
>
> close(pidfd);
>
> printf("\n\n");
> pidfd = -1;
> pid = raw_clone_legacy(&pidfd, CLONE_PIDFD | SIGCHLD);
> if (pid < 0) {
> fprintf(stderr, "%s - Failed to create new process\n",
> strerror(errno));
> exit(EXIT_FAILURE);
> }
>
> if (pid == 0) {
> printf("I am the child with pid %d\n", getpid());
> exit(EXIT_SUCCESS);
> }
>
> printf("raw_clone_legacy: I am the parent. My child's pid is %d\n",
> pid);
> printf("raw_clone_legacy: I am the parent. My child's pidfd is %d\n",
> pidfd);
>
> if (wait_for_pid(pid))
> exit(EXIT_FAILURE);
>
> if (pid != *(pid_t *)args.parent_tid)
> exit(EXIT_FAILURE);
>
> return 0;
> }
>

2020-01-16 11:27:33

by Christian Brauner

[permalink] [raw]
Subject: Re: clone3 on ARC (was Re: [PATCH v3 2/2] arch: wire-up clone3() syscall)

On Wed, Jan 15, 2020 at 10:41:20PM +0000, Vineet Gupta wrote:
> On 6/4/19 2:29 PM, Christian Brauner wrote:
> > On Tue, Jun 04, 2019 at 08:40:01PM +0200, Arnd Bergmann wrote:
> >> On Tue, Jun 4, 2019 at 6:09 PM Christian Brauner <[email protected]> wrote:
> >>>
> >>> Wire up the clone3() call on all arches that don't require hand-rolled
> >>> assembly.
> >>>
> >>> Some of the arches look like they need special assembly massaging and it is
> >>> probably smarter if the appropriate arch maintainers would do the actual
> >>> wiring. Arches that are wired-up are:
> >>> - x86{_32,64}
> >>> - arm{64}
> >>> - xtensa
> >>
> >> The ones you did look good to me. I would hope that we can do all other
> >> architectures the same way, even if they have special assembly wrappers
> >> for the old clone(). The most interesting cases appear to be ia64, alpha,
> >> m68k and sparc, so it would be good if their maintainers could take a
> >> look.
> >
> > Yes, agreed. They can sort this out even after this lands.
> >
> >>
> >> What do you use for testing? Would it be possible to override the
> >> internal clone() function in glibc with an LD_PRELOAD library
> >> to quickly test one of the other architectures for regressions?
> >
> > I have a test program that is rather horrendously ugly and I compiled
> > kernels for x86 and the arms and tested in qemu. The program basically
> > looks like [1].
>
> I just got around to fixing this for ARC (patch to follow after we sort out the
> testing) and was trying to use the test case below for a qucik and dirty smoke
> test (so existing toolchain lacking with headers lacking NR_clone3 or struct
> clone_args etc). I did hack those up, but then spotted below
>
> uapi/linux/sched.h
>
> | struct clone_args {
> | __aligned_u64 flags;
> | __aligned_u64 pidfd;
> | __aligned_u64 child_tid;
> | __aligned_u64 parent_tid;
> ..
> ..
>
> Are all clone3 arg fields supposed to be 64-bit wide, even things like @child_tid,
> @tls .... which are traditionally ARCH word wide ?

This is just the kernel ABI we expose to userspace with the intention to
make it easy for us to handle 32 and 64 bit. A libc like glibc is
expected to expose a properly typed struct to userspace. The kernel
struct kernel_clone_args has "correct" typing.

Christian