2019-05-31 19:25:09

by Palmer Dabbelt

[permalink] [raw]
Subject: Add a new fchmodat4() syscall

I spent half of dinner last night being complained to by one of our
hardware engineers about Linux's lack of support for the flags argument
to fchmodat(). This all came about because of a FUSE filesystem
implementation, and while there are some application-specific
workarounds for the issue it seemed to me like the cleanest bet was to
just go add another fchmodat() that supports flags to the kernel.

The actual implementation is super simple: essentially it's just
the same as fchmodat(), but LOOKUP_FOLLOW is conditionally set based on
the flags. I've attempted to make this match "man 2 fchmodat" as
closely as possible, which says EINVAL is returned for invalid flags (as
opposed to ENOTSUPP, which is currently returned by glibc for
AT_SYMLINK_NOFOLLOW). I have a sketch of a glibc patch that I haven't
even compiled yet, but seems fairly straight-forward:

diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
index 6d9cbc1ce9e0..b1beab76d56c 100644
--- a/sysdeps/unix/sysv/linux/fchmodat.c
+++ b/sysdeps/unix/sysv/linux/fchmodat.c
@@ -29,12 +29,36 @@
int
fchmodat (int fd, const char *file, mode_t mode, int flag)
{
- if (flag & ~AT_SYMLINK_NOFOLLOW)
- return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
-#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
+ /* There are four paths through this code:
+ - The flags are zero. In this case it's fine to call fchmodat.
+ - The flags are non-zero and glibc doesn't have access to
+ __NR_fchmodat4. In this case all we can do is emulate the error codes
+ defined by the glibc interface from userspace.
+ - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
+ fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
+ matches glibc's library interface so it can be called directly.
+ - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
+ not. In this case we must respect the error codes defined by the glibc
+ interface instead of returning ENOSYS.
+ The intent here is to ensure that the kernel is called at most once per
+ library call, and that the error types defined by glibc are always
+ respected. */
+
+#ifdef __NR_fchmodat4
+ long result;
+#endif
+
+ if (flag == 0)
+ return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+
+#ifdef __NR_fchmodat4
+ result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag);
+ if (result == 0 || errno != ENOSYS)
+ return result;
+#endif
+
if (flag & AT_SYMLINK_NOFOLLOW)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOTSUP);
-#endif

- return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+ return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
}

I've never added a new syscall before so I'm not really sure what the
proper procedure to follow is. I'm assuming any new syscall will
involve fairly significant discussion, so I've just done the minimum of
an implementation for this patch set. Specifically, I've:

* Defined a new syscall that looks like fchmodat but includes a flag
argument, which I'm calling fchmodat4 because it has 4 arguments. I
don't know if that's the correct naming convention, and don't really
have any skin in that game.
* Implemented that syscall by extending the fchmod code to handle flags,
which is pretty straight-forward. I think it's sane, but given that
it's so simple I'm not sure if I'm missing something -- specifically,
I didn't go check to make sure the semantics of AT_SYMLINK_NOFOLLOW
match !LOOKUP_FOLLOW. I'm assuming the do, but sometimes when I look
at something and say "that's so simple, how is it broken" I'm actually
just missing something entirely.
* Added an asm-generic syscall number for this, which I assume I'm
supposed to do this first as it looks like we're trying to keep the
numbers in sync everywhere.
* Added x86 syscalls for this so I could test it.

I also cleaned up a checkpatch issue in fchmodat(). I only found this
because I copied the fchmodat() interface for fchmodat4() and it threw
the warning, I don't personally care either way as to whether or not the
space is in there.

I've given this fairly minimal testing. Essentially all I've done is
booted up 5.1.6 with this patch set on my local development box and run

$ touch test-file
$ ln -s test-file test-link
$ cat > test.c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
long out;

out = syscall(428, AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW);
printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);

out = syscall(428, AT_FDCWD, "test-file", 0x888, 0);
printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, 0): %ld\n", out);

out = syscall(268, AT_FDCWD, "test-file", 0x888);
printf("fchmodat(AT_FDCWD, \"test-file\", 0x888): %ld\n", out);

out = syscall(428, AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW);
printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);

out = syscall(428, AT_FDCWD, "test-link", 0x888, 0);
printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, 0): %ld\n", out);

out = syscall(268, AT_FDCWD, "test-link", 0x888);
printf("fchmodat(AT_FDCWD, \"test-link\", 0x888): %ld\n", out);

return 0;
}
$ gcc test.c -o test
$ ./test
fchmodat4(AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW): 0
fchmodat4(AT_FDCWD, "test-file", 0x888, 0): 0
fchmodat(AT_FDCWD, "test-file", 0x888): 0
fchmodat4(AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW): -1
fchmodat4(AT_FDCWD, "test-link", 0x888, 0): 0
fchmodat(AT_FDCWD, "test-link", 0x888): 0

While I don't think there's any reason what's there is unacceptable, I
don't really consider this finished. I couldn't find a cookbook for
"here's how you add a system call", but all I really did was "git grep
add | grep syscall" so if there's something out there then please let me
know and I'll follow it. Specifically, I haven't:

* Added any sort of documentation. I don't find anything with a "git
grep fchmodat", so I'm assuming it's just the man pages that are
relevant here.
* Fixed any of the other architectures. I'm assuming this is just the
mechanical process of fixing all these in the same way I did for x86.

arch/alpha/kernel/syscalls/syscall.tbl:461 common fchmodat sys_fchmodat
arch/arm/tools/syscall.tbl:333 common fchmodat sys_fchmodat
arch/arm64/include/asm/unistd32.h:#define __NR_fchmodat 333
arch/arm64/include/asm/unistd32.h:__SYSCALL(__NR_fchmodat, sys_fchmodat)
arch/ia64/kernel/fsys.S: data8 0 // fchmodat
arch/ia64/kernel/syscalls/syscall.tbl:268 common fchmodat sys_fchmodat
arch/m68k/kernel/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat
arch/microblaze/kernel/syscalls/syscall.tbl:306 common fchmodat sys_fchmodat
arch/mips/kernel/syscalls/syscall_n32.tbl:262 n32 fchmodat sys_fchmodat
arch/mips/kernel/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat
arch/mips/kernel/syscalls/syscall_o32.tbl:299 o32 fchmodat sys_fchmodat
arch/parisc/kernel/syscalls/syscall.tbl:286 common fchmodat sys_fchmodat
arch/powerpc/kernel/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat
arch/s390/kernel/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat
arch/sh/include/uapi/asm/unistd_64.h:#define __NR_fchmodat 334
arch/sh/kernel/syscalls/syscall.tbl:306 common fchmodat sys_fchmodat
arch/sh/kernel/syscalls_64.S: .long sys_fchmodat
arch/sparc/kernel/syscalls/syscall.tbl:295 common fchmodat sys_fchmodat
arch/xtensa/kernel/syscalls/syscall.tbl:300 common fchmodat sys_fchmodat
* Looked at anything in tools. Again, I'm assuming it's just a
mechanical process of looking at all of these and adding fchmodat4.

tools/include/nolibc/nolibc.h:#ifdef __NR_fchmodat
tools/include/nolibc/nolibc.h: return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
tools/include/uapi/asm-generic/unistd.h:#define __NR_fchmodat 53
tools/include/uapi/asm-generic/unistd.h:__SYSCALL(__NR_fchmodat, sys_fchmodat)
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat
tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat compat_sys_fchmodat
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat __x64_sys_fchmodat
tools/perf/builtin-trace.c: { .name = "fchmodat",

* Done anything with userspace, aside from thinking about the glibc code
above. I'd assume that I'm meant to bring in libc-alpha to the
discussion, but I didn't want to do so this early in case this was
just a non-starter.

I'm happy dealing with all of that, but given that I'm assuming there's
going to be some discussion I wanted to send out the proof-of-concept
first to see if this has any legs. Aside from the glibc side the
remaining work smells pretty mechanical, so I figured I'd wait on that
until I knew it wasn't going to be a waste of time -- partially because
I'm lazy, but mostly because I just realized I blew my whole morning
working on this when all I really wanted to do was avoid discussing
fchmodat in the first place :)



2019-05-31 19:25:15

by Palmer Dabbelt

[permalink] [raw]
Subject: [PATCH 2/5] Add fchmodat4(), a new syscall

man 3p says that fchmodat() takes a flags argument, but the Linux
syscall does not. There doesn't appear to be a good userspace
workaround for this issue but the implementation in the kernel is pretty
straight-forward. The specific use case where the missing flags came up
was WRT a fuse filesystem implemenation, but the functionality is pretty
generic so I'm assuming there would be other use cases.

Signed-off-by: Palmer Dabbelt <[email protected]>
---
fs/open.c | 21 +++++++++++++++++++--
include/linux/syscalls.h | 5 +++++
2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index a00350018a47..cfad7684e8d3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -568,11 +568,17 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
return ksys_fchmod(fd, mode);
}

-int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int flags)
{
struct path path;
int error;
- unsigned int lookup_flags = LOOKUP_FOLLOW;
+ unsigned int lookup_flags;
+
+ if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
+ return -EINVAL;
+
+ lookup_flags = flags & AT_SYMLINK_NOFOLLOW ? 0 : LOOKUP_FOLLOW;
+
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (!error) {
@@ -586,6 +592,17 @@ int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
return error;
}

+SYSCALL_DEFINE4(fchmodat4, int, dfd, const char __user *, filename,
+ umode_t, mode, int, flags)
+{
+ return do_fchmodat4(dfd, filename, mode, flags);
+}
+
+int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+{
+ return do_fchmodat4(dfd, filename, mode, 0);
+}
+
SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
umode_t, mode)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 396871b218f4..cb040a412a4c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -435,6 +435,8 @@ asmlinkage long sys_chroot(const char __user *filename);
asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
umode_t mode);
+asmlinkage long sys_fchmodat4(int dfd, const char __user *filename,
+ umode_t mode, int flags);
asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
@@ -1315,6 +1317,9 @@ static inline long ksys_link(const char __user *oldname,

extern int do_fchmodat(int dfd, const char __user *filename, umode_t mode);

+extern int do_fchmodat4(int dfd, const char __user *filename, umode_t mode,
+ int flags);
+
static inline int ksys_chmod(const char __user *filename, umode_t mode)
{
return do_fchmodat(AT_FDCWD, filename, mode);
--
2.21.0

2019-05-31 19:25:17

by Palmer Dabbelt

[permalink] [raw]
Subject: [PATCH 4/5] x86: Add fchmodat4 to syscall_64.tbl

Signed-off-by: Palmer Dabbelt <[email protected]>
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 92ee0b4378d4..998aa3eb09e2 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -349,6 +349,7 @@
425 common io_uring_setup __x64_sys_io_uring_setup
426 common io_uring_enter __x64_sys_io_uring_enter
427 common io_uring_register __x64_sys_io_uring_register
+428 common fchmodat4 __x64_sys_fchmodat4

#
# x32-specific system call numbers start at 512 to avoid cache impact
--
2.21.0

2019-05-31 19:25:53

by Palmer Dabbelt

[permalink] [raw]
Subject: [PATCH 1/5] Non-functional cleanup of a "__user * filename"

The next patch defines a very similar interface, which I copied from
this definition. Since I'm touching it anyway I don't see any reason
not to just go fix this one up.

Signed-off-by: Palmer Dabbelt <[email protected]>
---
include/linux/syscalls.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e446806a561f..396871b218f4 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -433,7 +433,7 @@ asmlinkage long sys_chdir(const char __user *filename);
asmlinkage long sys_fchdir(unsigned int fd);
asmlinkage long sys_chroot(const char __user *filename);
asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
-asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
+asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
umode_t mode);
asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
--
2.21.0

2019-05-31 19:26:20

by Palmer Dabbelt

[permalink] [raw]
Subject: [PATCH 5/5] x86: Add fchmod4 to syscall_32.tbl

Signed-off-by: Palmer Dabbelt <[email protected]>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 1f9607ed087c..319c7a6d3f02 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -433,3 +433,4 @@
425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup
426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter
427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register
+428 i386 fchmodat4 sys_fchmodat4 __ia32_sys_fchmodat4
--
2.21.0

2019-05-31 19:26:44

by Palmer Dabbelt

[permalink] [raw]
Subject: [PATCH 3/5] asm-generic: Register fchmodat4 as syscall 428

Signed-off-by: Palmer Dabbelt <[email protected]>
---
include/uapi/asm-generic/unistd.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..f0f4cad4c416 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -833,8 +833,11 @@ __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
#define __NR_io_uring_register 427
__SYSCALL(__NR_io_uring_register, sys_io_uring_register)

+#define __NR_fchmodat4 428
+__SYSCALL(__NR_fchmodat4, sys_fchmodat4)
+
#undef __NR_syscalls
-#define __NR_syscalls 428
+#define __NR_syscalls 429

/*
* 32 bit systems traditionally used different
--
2.21.0

2019-05-31 19:53:44

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/5] Add fchmodat4(), a new syscall

On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt <[email protected]> wrote:
>
> man 3p says that fchmodat() takes a flags argument, but the Linux
> syscall does not. There doesn't appear to be a good userspace
> workaround for this issue but the implementation in the kernel is pretty
> straight-forward. The specific use case where the missing flags came up
> was WRT a fuse filesystem implemenation, but the functionality is pretty
> generic so I'm assuming there would be other use cases.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> ---
> fs/open.c | 21 +++++++++++++++++++--
> include/linux/syscalls.h | 5 +++++
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index a00350018a47..cfad7684e8d3 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -568,11 +568,17 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> return ksys_fchmod(fd, mode);
> }
>
> -int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int flags)
...
> +
> +int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +{
> + return do_fchmodat4(dfd, filename, mode, 0);
> +}
> +

There is only one external caller of do_fchmodat(), so just change that
to pass the extra argument here, and keep a single do_fchmodat()
function used by the sys_fchmod(), sys_fchmod4(), sys_chmod()
and ksys_chmod().

Arnd

2019-05-31 19:59:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 3/5] asm-generic: Register fchmodat4 as syscall 428

On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt <[email protected]> wrote:
>
> Signed-off-by: Palmer Dabbelt <[email protected]>

As usual, each patch needs a changelog text. I would prefer having a single
patch here that changes /all/ system call tables at once, rather than doing one
at a time like we used to.

In linux-next, we are at number 434 now, and there are a couple of other
new system calls being proposed right now (more than usual), so you may
have to change the number a few times.

Note: most architectures use .tbl files now, the exceptions are
include/uapi/asm-generic/unistd.h and arch/arm64/include/asm/unistd32.h,
and the latter also requires changing __NR_compat_syscalls in asm/unistd.h.

Numbers should now be the same across architectures, except for alpha,
which has a +110 offset. We have discussed ways to have a single
file to modify for a new call on all architectures, but no patches yet.

Arnd

2019-05-31 23:38:03

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH 2/5] Add fchmodat4(), a new syscall

On Fri, 31 May 2019 12:51:00 PDT (-0700), Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt <[email protected]> wrote:
>>
>> man 3p says that fchmodat() takes a flags argument, but the Linux
>> syscall does not. There doesn't appear to be a good userspace
>> workaround for this issue but the implementation in the kernel is pretty
>> straight-forward. The specific use case where the missing flags came up
>> was WRT a fuse filesystem implemenation, but the functionality is pretty
>> generic so I'm assuming there would be other use cases.
>>
>> Signed-off-by: Palmer Dabbelt <[email protected]>
>> ---
>> fs/open.c | 21 +++++++++++++++++++--
>> include/linux/syscalls.h | 5 +++++
>> 2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/open.c b/fs/open.c
>> index a00350018a47..cfad7684e8d3 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -568,11 +568,17 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
>> return ksys_fchmod(fd, mode);
>> }
>>
>> -int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
>> +int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int flags)
> ...
>> +
>> +int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
>> +{
>> + return do_fchmodat4(dfd, filename, mode, 0);
>> +}
>> +
>
> There is only one external caller of do_fchmodat(), so just change that
> to pass the extra argument here, and keep a single do_fchmodat()
> function used by the sys_fchmod(), sys_fchmod4(), sys_chmod()
> and ksys_chmod().

OK, I'll roll that up into a v2.

2019-05-31 23:59:18

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH 3/5] asm-generic: Register fchmodat4 as syscall 428

On Fri, 31 May 2019 12:56:39 PDT (-0700), Arnd Bergmann wrote:
> On Fri, May 31, 2019 at 9:23 PM Palmer Dabbelt <[email protected]> wrote:
>>
>> Signed-off-by: Palmer Dabbelt <[email protected]>
>
> As usual, each patch needs a changelog text. I would prefer having a single
> patch here that changes /all/ system call tables at once, rather than doing one
> at a time like we used to.

Works for me. That also gives me something to write about it the text :)

> In linux-next, we are at number 434 now, and there are a couple of other
> new system calls being proposed right now (more than usual), so you may
> have to change the number a few times.

OK, no problem. It'll be a bit easier to handle the number that way.

> Note: most architectures use .tbl files now, the exceptions are
> include/uapi/asm-generic/unistd.h and arch/arm64/include/asm/unistd32.h,
> and the latter also requires changing __NR_compat_syscalls in asm/unistd.h.
>
> Numbers should now be the same across architectures, except for alpha,
> which has a +110 offset. We have discussed ways to have a single
> file to modify for a new call on all architectures, but no patches yet.

OK, thanks. I'll wait a bit for feedback, but unless there's anything else
I'll go ahead and finish this.