2019-07-17 01:31:44

by Palmer Dabbelt

[permalink] [raw]
Subject: Add a new fchmodat4() syscall, v2

This patch set adds fchmodat4(), a new syscall. The actual
implementation is super simple: essentially it's just the same as
fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
I've attempted to make this match "man 2 fchmodat" as closely as
possible, which says EINVAL is returned for invalid flags (as opposed to
ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
I have a sketch of a glibc patch that I haven't even compiled yet, but
seems fairly straight-forward:

diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
index 6d9cbc1ce9e0..b1beab76d56c 100644
--- a/sysdeps/unix/sysv/linux/fchmodat.c
+++ b/sysdeps/unix/sysv/linux/fchmodat.c
@@ -29,12 +29,36 @@
int
fchmodat (int fd, const char *file, mode_t mode, int flag)
{
- if (flag & ~AT_SYMLINK_NOFOLLOW)
- return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
-#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
+ /* There are four paths through this code:
+ - The flags are zero. In this case it's fine to call fchmodat.
+ - The flags are non-zero and glibc doesn't have access to
+ __NR_fchmodat4. In this case all we can do is emulate the error codes
+ defined by the glibc interface from userspace.
+ - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
+ fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
+ matches glibc's library interface so it can be called directly.
+ - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
+ not. In this case we must respect the error codes defined by the glibc
+ interface instead of returning ENOSYS.
+ The intent here is to ensure that the kernel is called at most once per
+ library call, and that the error types defined by glibc are always
+ respected. */
+
+#ifdef __NR_fchmodat4
+ long result;
+#endif
+
+ if (flag == 0)
+ return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+
+#ifdef __NR_fchmodat4
+ result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag);
+ if (result == 0 || errno != ENOSYS)
+ return result;
+#endif
+
if (flag & AT_SYMLINK_NOFOLLOW)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOTSUP);
-#endif

- return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+ return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
}

I've never added a new syscall before so I'm not really sure what the
proper procedure to follow is. Based on the feedback from my v1 patch
set it seems this is somewhat uncontroversial. At this point I don't
think there's anything I'm missing, though note that I haven't gotten
around to testing it this time because the diff from v1 is trivial for
any platform I could reasonably test on. The v1 patches suggest a
simple test case, but I didn't re-run it because I don't want to reboot
my laptop.

$ touch test-file
$ ln -s test-file test-link
$ cat > test.c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
long out;

out = syscall(434, AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW);
printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);

out = syscall(434, AT_FDCWD, "test-file", 0x888, 0);
printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, 0): %ld\n", out);

out = syscall(268, AT_FDCWD, "test-file", 0x888);
printf("fchmodat(AT_FDCWD, \"test-file\", 0x888): %ld\n", out);

out = syscall(434, AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW);
printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out);

out = syscall(434, AT_FDCWD, "test-link", 0x888, 0);
printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, 0): %ld\n", out);

out = syscall(268, AT_FDCWD, "test-link", 0x888);
printf("fchmodat(AT_FDCWD, \"test-link\", 0x888): %ld\n", out);

return 0;
}
$ gcc test.c -o test
$ ./test
fchmodat4(AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW): 0
fchmodat4(AT_FDCWD, "test-file", 0x888, 0): 0
fchmodat(AT_FDCWD, "test-file", 0x888): 0
fchmodat4(AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW): -1
fchmodat4(AT_FDCWD, "test-link", 0x888, 0): 0
fchmodat(AT_FDCWD, "test-link", 0x888): 0

I've only built this on 64-bit x86.

Changes since v1 [[email protected]]:

* All architectures are now supported, which support squashed into a
single patch.
* The do_fchmodat() helper function has been removed, in favor of directly
calling do_fchmodat4().
* The patches are based on 5.2 instead of 5.1.


2020-06-09 13:57:42

by Florian Weimer

[permalink] [raw]
Subject: Re: Add a new fchmodat4() syscall, v2

* Palmer Dabbelt:

> This patch set adds fchmodat4(), a new syscall. The actual
> implementation is super simple: essentially it's just the same as
> fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> I've attempted to make this match "man 2 fchmodat" as closely as
> possible, which says EINVAL is returned for invalid flags (as opposed to
> ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> I have a sketch of a glibc patch that I haven't even compiled yet, but
> seems fairly straight-forward:

What's the status here? We'd really like to see this system call in the
kernel because our emulation in glibc has its problems (especially if
/proc is not mounted).

Thanks,
Florian

2023-07-11 11:32:58

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v3 0/5] Add a new fchmodat4() syscall

This patch set adds fchmodat4(), a new syscall. The actual
implementation is super simple: essentially it's just the same as
fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
I've attempted to make this match "man 2 fchmodat" as closely as
possible, which says EINVAL is returned for invalid flags (as opposed to
ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
I have a sketch of a glibc patch that I haven't even compiled yet, but
seems fairly straight-forward:

diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
index 6d9cbc1ce9e0..b1beab76d56c 100644
--- a/sysdeps/unix/sysv/linux/fchmodat.c
+++ b/sysdeps/unix/sysv/linux/fchmodat.c
@@ -29,12 +29,36 @@
int
fchmodat (int fd, const char *file, mode_t mode, int flag)
{
- if (flag & ~AT_SYMLINK_NOFOLLOW)
- return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
-#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
+ /* There are four paths through this code:
+ - The flags are zero. In this case it's fine to call fchmodat.
+ - The flags are non-zero and glibc doesn't have access to
+ __NR_fchmodat4. In this case all we can do is emulate the error codes
+ defined by the glibc interface from userspace.
+ - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
+ fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
+ matches glibc's library interface so it can be called directly.
+ - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
+ not. In this case we must respect the error codes defined by the glibc
+ interface instead of returning ENOSYS.
+ The intent here is to ensure that the kernel is called at most once per
+ library call, and that the error types defined by glibc are always
+ respected. */
+
+#ifdef __NR_fchmodat4
+ long result;
+#endif
+
+ if (flag == 0)
+ return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+
+#ifdef __NR_fchmodat4
+ result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag);
+ if (result == 0 || errno != ENOSYS)
+ return result;
+#endif
+
if (flag & AT_SYMLINK_NOFOLLOW)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOTSUP);
-#endif

- return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
+ return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
}

I've never added a new syscall before so I'm not really sure what the
proper procedure to follow is. Based on the feedback from my v1 patch
set it seems this is somewhat uncontroversial. At this point I don't
think there's anything I'm missing, though note that I haven't gotten
around to testing it this time because the diff from v1 is trivial for
any platform I could reasonably test on. The v1 patches suggest a
simple test case, but I didn't re-run it because I don't want to reboot
my laptop.

Changes since v2 [[email protected]]:

* Rebased to master.
* The lookup_flags passed to sys_fchmodat4 as suggested by Al Viro.
* Selftest added.

Changes since v1 [[email protected]]:

* All architectures are now supported, which support squashed into a
single patch.
* The do_fchmodat() helper function has been removed, in favor of directly
calling do_fchmodat4().
* The patches are based on 5.2 instead of 5.1.

---

Alexey Gladkov (1):
selftests: add fchmodat4(2) selftest

Palmer Dabbelt (4):
Non-functional cleanup of a "__user * filename"
fs: Add fchmodat4()
arch: Register fchmodat4, usually as syscall 451
tools headers UAPI: Sync files changed by new fchmodat4 syscall

arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/open.c | 18 ++-
include/linux/syscalls.h | 4 +-
include/uapi/asm-generic/unistd.h | 5 +-
tools/include/uapi/asm-generic/unistd.h | 5 +-
.../arch/mips/entry/syscalls/syscall_n64.tbl | 1 +
.../arch/powerpc/entry/syscalls/syscall.tbl | 1 +
.../perf/arch/s390/entry/syscalls/syscall.tbl | 1 +
.../arch/x86/entry/syscalls/syscall_64.tbl | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/fchmodat4/.gitignore | 2 +
tools/testing/selftests/fchmodat4/Makefile | 6 +
.../selftests/fchmodat4/fchmodat4_test.c | 151 ++++++++++++++++++
29 files changed, 207 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/fchmodat4/.gitignore
create mode 100644 tools/testing/selftests/fchmodat4/Makefile
create mode 100644 tools/testing/selftests/fchmodat4/fchmodat4_test.c

--
2.33.8


2023-07-11 11:35:01

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v3 3/5] arch: Register fchmodat4, usually as syscall 451

From: Palmer Dabbelt <[email protected]>

This registers the new fchmodat4 syscall in most places as nuber 451,
with alpha being the exception where it's 561. I found all these sites
by grepping for fspick, which I assume has found me everything.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd32.h | 2 ++
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/uapi/asm-generic/unistd.h | 5 ++++-
18 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 8ebacf37a8cf..00ceeffec7ff 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -490,3 +490,4 @@
558 common process_mrelease sys_process_mrelease
559 common futex_waitv sys_futex_waitv
560 common set_mempolicy_home_node sys_ni_syscall
+561 common fchmodat4 sys_fchmodat4
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index ac964612d8b0..0b9702d5c425 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -464,3 +464,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 604a2053d006..49c65d935049 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -907,6 +907,8 @@ __SYSCALL(__NR_process_mrelease, sys_process_mrelease)
__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
#define __NR_set_mempolicy_home_node 450
__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
+#define __NR_fchmodat4 451
+__SYSCALL(__NR_fchmodat4, sys_fchmodat4)

/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 72c929d9902b..b35225c64781 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -371,3 +371,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index b1f3940bc298..4d80cd87e089 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -450,3 +450,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 820145e47350..306bd18e5b52 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -456,3 +456,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 253ff994ed2e..2ef47a546fd3 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -389,3 +389,4 @@
448 n32 process_mrelease sys_process_mrelease
449 n32 futex_waitv sys_futex_waitv
450 n32 set_mempolicy_home_node sys_set_mempolicy_home_node
+451 n32 fchmodat4 sys_fchmodat4
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 3f1886ad9d80..6356c0a6cda0 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -365,3 +365,4 @@
448 n64 process_mrelease sys_process_mrelease
449 n64 futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 n64 fchmodat4 sys_fchmodat4
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 8f243e35a7b2..d1111edba47e 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -438,3 +438,4 @@
448 o32 process_mrelease sys_process_mrelease
449 o32 futex_waitv sys_futex_waitv
450 o32 set_mempolicy_home_node sys_set_mempolicy_home_node
+451 o32 fchmodat4 sys_fchmodat4
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 0e42fceb2d5e..0a1c13744c32 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -448,3 +448,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index a0be127475b1..ee23866fa1c8 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -537,3 +537,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index b68f47541169..d5ce80065ece 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
448 common process_mrelease sys_process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4 sys_fchmodat4
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 2de85c977f54..697b914f5c33 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -453,3 +453,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 4398cc6fb68d..79be56831f39 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -496,3 +496,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 320480a8db4f..3ab07bd87ea9 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -455,3 +455,4 @@
448 i386 process_mrelease sys_process_mrelease
449 i386 futex_waitv sys_futex_waitv
450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node
+451 i386 fchmodat4 sys_fchmodat4
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c84d12608cd2..17047878293c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -372,6 +372,7 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4

#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 52c94ab5c205..ed47fd3b2293 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -421,3 +421,4 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+451 common fchmodat4 sys_fchmodat4
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 45fa180cc56a..b7978b3ce3f1 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -886,8 +886,11 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
#define __NR_set_mempolicy_home_node 450
__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)

+#define __NR_fchmodat4 451
+__SYSCALL(__NR_fchmodat4, sys_fchmodat4)
+
#undef __NR_syscalls
-#define __NR_syscalls 451
+#define __NR_syscalls 452

/*
* 32 bit systems traditionally used different
--
2.33.8


2023-07-11 11:35:18

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v3 2/5] fs: Add fchmodat4()

From: Palmer Dabbelt <[email protected]>

On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].

There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.

The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
unlike existing fchmodat.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
[2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
---
fs/open.c | 18 ++++++++++++++----
include/linux/syscalls.h | 2 ++
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 4478adcc4f3a..58bb88c6afb6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
return err;
}

-static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+static int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
{
struct path path;
int error;
- unsigned int lookup_flags = LOOKUP_FOLLOW;
+
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (!error) {
@@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
return error;
}

+SYSCALL_DEFINE4(fchmodat4, int, dfd, const char __user *, filename,
+ umode_t, mode, int, flags)
+{
+ if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
+ return -EINVAL;
+
+ return do_fchmodat4(dfd, filename, mode,
+ flags & AT_SYMLINK_NOFOLLOW ? 0 : LOOKUP_FOLLOW);
+}
+
SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
umode_t, mode)
{
- return do_fchmodat(dfd, filename, mode);
+ return do_fchmodat4(dfd, filename, mode, LOOKUP_FOLLOW);
}

SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
{
- return do_fchmodat(AT_FDCWD, filename, mode);
+ return do_fchmodat4(AT_FDCWD, filename, mode, LOOKUP_FOLLOW);
}

/**
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 497bdd968c32..b17d37d2bad6 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -466,6 +466,8 @@ asmlinkage long sys_chroot(const char __user *filename);
asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
umode_t mode);
+asmlinkage long sys_fchmodat4(int dfd, const char __user *filename,
+ umode_t mode, int flags);
asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
--
2.33.8


2023-07-11 11:37:13

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v3 5/5] selftests: add fchmodat4(2) selftest

The test marks as skipped if a syscall with the AT_SYMLINK_NOFOLLOW flag
fails. This is because not all filesystems support changing the mode
bits of symlinks properly. These filesystems return an error but change
the mode bits:

newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(4, "symlink", {st_mode=S_IFLNK|0777, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
syscall_0x1c3(0x4, 0x55fa1f244396, 0x180, 0x100, 0x55fa1f24438e, 0x34) = -1 EOPNOTSUPP (Operation not supported)
newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0

This happens with btrfs and xfs:

$ /kernel/tools/testing/selftests/fchmodat4/fchmodat4_test
TAP version 13
1..1
ok 1 # SKIP fchmodat4(symlink)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0

$ stat /tmp/ksft-fchmodat4.*/symlink
File: /tmp/ksft-fchmodat4.3NCqlE/symlink -> regfile
Size: 7 Blocks: 0 IO Block: 4096 symbolic link
Device: 7,0 Inode: 133 Links: 1
Access: (0600/lrw-------) Uid: ( 0/ root) Gid: ( 0/ root)

Signed-off-by: Alexey Gladkov <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/fchmodat4/.gitignore | 2 +
tools/testing/selftests/fchmodat4/Makefile | 6 +
.../selftests/fchmodat4/fchmodat4_test.c | 151 ++++++++++++++++++
4 files changed, 160 insertions(+)
create mode 100644 tools/testing/selftests/fchmodat4/.gitignore
create mode 100644 tools/testing/selftests/fchmodat4/Makefile
create mode 100644 tools/testing/selftests/fchmodat4/fchmodat4_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 90a62cf75008..fe61fa55412d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -17,6 +17,7 @@ TARGETS += drivers/net/bonding
TARGETS += drivers/net/team
TARGETS += efivarfs
TARGETS += exec
+TARGETS += fchmodat4
TARGETS += filesystems
TARGETS += filesystems/binderfs
TARGETS += filesystems/epoll
diff --git a/tools/testing/selftests/fchmodat4/.gitignore b/tools/testing/selftests/fchmodat4/.gitignore
new file mode 100644
index 000000000000..82a4846cbc4b
--- /dev/null
+++ b/tools/testing/selftests/fchmodat4/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+/*_test
diff --git a/tools/testing/selftests/fchmodat4/Makefile b/tools/testing/selftests/fchmodat4/Makefile
new file mode 100644
index 000000000000..3d38a69c3c12
--- /dev/null
+++ b/tools/testing/selftests/fchmodat4/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
+TEST_GEN_PROGS := fchmodat4_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/fchmodat4/fchmodat4_test.c b/tools/testing/selftests/fchmodat4/fchmodat4_test.c
new file mode 100644
index 000000000000..50beb731d8ba
--- /dev/null
+++ b/tools/testing/selftests/fchmodat4/fchmodat4_test.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syscall.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#ifndef __NR_fchmodat4
+ #if defined __alpha__
+ #define __NR_fchmodat4 561
+ #elif defined _MIPS_SIM
+ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
+ #define __NR_fchmodat4 (451 + 4000)
+ #endif
+ #if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */
+ #define __NR_fchmodat4 (451 + 6000)
+ #endif
+ #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
+ #define __NR_fchmodat4 (451 + 5000)
+ #endif
+ #elif defined __ia64__
+ #define __NR_fchmodat4 (451 + 1024)
+ #else
+ #define __NR_fchmodat4 451
+ #endif
+#endif
+
+int sys_fchmodat4(int dfd, const char *filename, mode_t mode, int flags)
+{
+ int ret = syscall(__NR_fchmodat4, dfd, filename, mode, flags);
+ return ret >= 0 ? ret : -errno;
+}
+
+int setup_testdir(void)
+{
+ int dfd, ret;
+ char dirname[] = "/tmp/ksft-fchmodat4.XXXXXX";
+
+ /* Make the top-level directory. */
+ if (!mkdtemp(dirname))
+ ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n");
+
+ dfd = open(dirname, O_PATH | O_DIRECTORY);
+ if (dfd < 0)
+ ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n");
+
+ ret = openat(dfd, "regfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
+ if (ret < 0)
+ ksft_exit_fail_msg("setup_testdir: failed to create file in tmpdir\n");
+ close(ret);
+
+ ret = symlinkat("regfile", dfd, "symlink");
+ if (ret < 0)
+ ksft_exit_fail_msg("setup_testdir: failed to create symlink in tmpdir\n");
+
+ return dfd;
+}
+
+int expect_mode(int dfd, const char *filename, mode_t expect_mode)
+{
+ struct stat st;
+ int ret = fstatat(dfd, filename, &st, AT_SYMLINK_NOFOLLOW);
+
+ if (ret)
+ ksft_exit_fail_msg("expect_mode: %s: fstatat failed\n", filename);
+
+ return (st.st_mode == expect_mode);
+}
+
+void test_regfile(void)
+{
+ int dfd, ret;
+
+ dfd = setup_testdir();
+
+ ret = sys_fchmodat4(dfd, "regfile", 0640, 0);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("test_regfile: fchmodat4(noflag) failed\n");
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("test_regfile: wrong file mode bits after fchmodat4\n");
+
+ ret = sys_fchmodat4(dfd, "regfile", 0600, AT_SYMLINK_NOFOLLOW);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("test_regfile: fchmodat4(AT_SYMLINK_NOFOLLOW) failed\n");
+
+ if (!expect_mode(dfd, "regfile", 0100600))
+ ksft_exit_fail_msg("test_regfile: wrong file mode bits after fchmodat4 with nofollow\n");
+
+ ksft_test_result_pass("fchmodat4(regfile)\n");
+}
+
+void test_symlink(void)
+{
+ int dfd, ret;
+
+ dfd = setup_testdir();
+
+ ret = sys_fchmodat4(dfd, "symlink", 0640, 0);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("test_symlink: fchmodat4(noflag) failed\n");
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("test_symlink: wrong file mode bits after fchmodat4\n");
+
+ if (!expect_mode(dfd, "symlink", 0120777))
+ ksft_exit_fail_msg("test_symlink: wrong symlink mode bits after fchmodat4\n");
+
+ ret = sys_fchmodat4(dfd, "symlink", 0600, AT_SYMLINK_NOFOLLOW);
+
+ /*
+ * On certain filesystems (xfs or btrfs), chmod operation fails. So we
+ * first check the symlink target but if the operation fails we mark the
+ * test as skipped.
+ *
+ * https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html
+ */
+ if (ret == 0 && !expect_mode(dfd, "symlink", 0120600))
+ ksft_exit_fail_msg("test_symlink: wrong symlink mode bits after fchmodat4 with nofollow\n");
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("test_symlink: wrong file mode bits after fchmodat4 with nofollow\n");
+
+ if (ret != 0)
+ ksft_test_result_skip("fchmodat4(symlink)\n");
+ else
+ ksft_test_result_pass("fchmodat4(symlink)\n");
+}
+
+#define NUM_TESTS 2
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(NUM_TESTS);
+
+ test_regfile();
+ test_symlink();
+
+ if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
+ ksft_exit_fail();
+ else
+ ksft_exit_pass();
+}
--
2.33.8


2023-07-11 11:43:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 3/5] arch: Register fchmodat4, usually as syscall 451

On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This registers the new fchmodat4 syscall in most places as nuber 451,
> with alpha being the exception where it's 561. I found all these sites
> by grepping for fspick, which I assume has found me everything.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>

In linux-6.5-rc1, number 451 is used for __NR_cachestat, the
next free one at the moment is 452.

> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd32.h | 2 ++

Unfortunately, you still also need to change __NR_compat_syscalls
in arch/arm64/include/asm/unistd.h. Aside from these two issues,
your patch is the correct way to hook up a new syscall.

Arnd

2023-07-11 11:59:38

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 01:42:19PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> > From: Palmer Dabbelt <[email protected]>
> >
> > On the userspace side fchmodat(3) is implemented as a wrapper
> > function which implements the POSIX-specified interface. This
> > interface differs from the underlying kernel system call, which does not
> > have a flags argument. Most implementations require procfs [1][2].
> >
> > There doesn't appear to be a good userspace workaround for this issue
> > but the implementation in the kernel is pretty straight-forward.
> >
> > The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > unlike existing fchmodat.
> >
> > [1]
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > [2]
> > https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> >
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Alexey Gladkov <[email protected]>
>
> I don't know the history of why we ended up with the different
> interface, or whether this was done intentionally in the kernel
> or if we want this syscall.
>
> Assuming this is in fact needed, I double-checked that the
> implementation looks correct to me and is portable to all the
> architectures, without the need for a compat wrapper.
>
> Acked-by: Arnd Bergmann <[email protected]>

The system call itself is useful afaict. But please,

s/fchmodat4/fchmodat2/

With very few exceptions we don't version by argument number but by
revision and we should stick to one scheme:

openat()->openat2()
eventfd()->eventfd2()
clone()/clone2()->clone3()
dup()->dup2()->dup3() // coincides with nr of arguments
pipe()->pipe2() // coincides with nr of arguments
renameat()->renameat2()

2023-07-11 12:11:52

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> From: Palmer Dabbelt <[email protected]>
>
> On the userspace side fchmodat(3) is implemented as a wrapper
> function which implements the POSIX-specified interface. This
> interface differs from the underlying kernel system call, which does not
> have a flags argument. Most implementations require procfs [1][2].
>
> There doesn't appear to be a good userspace workaround for this issue
> but the implementation in the kernel is pretty straight-forward.
>
> The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> unlike existing fchmodat.
>
> [1]
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> [2]
> https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>

I don't know the history of why we ended up with the different
interface, or whether this was done intentionally in the kernel
or if we want this syscall.

Assuming this is in fact needed, I double-checked that the
implementation looks correct to me and is portable to all the
architectures, without the need for a compat wrapper.

Acked-by: Arnd Bergmann <[email protected]>

2023-07-11 12:19:23

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH v3 5/5] selftests: add fchmodat4(2) selftest

* Alexey Gladkov:

> The test marks as skipped if a syscall with the AT_SYMLINK_NOFOLLOW flag
> fails. This is because not all filesystems support changing the mode
> bits of symlinks properly. These filesystems return an error but change
> the mode bits:
>
> newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> newfstatat(4, "symlink", {st_mode=S_IFLNK|0777, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
> syscall_0x1c3(0x4, 0x55fa1f244396, 0x180, 0x100, 0x55fa1f24438e, 0x34) = -1 EOPNOTSUPP (Operation not supported)
> newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
>
> This happens with btrfs and xfs:
>
> $ /kernel/tools/testing/selftests/fchmodat4/fchmodat4_test
> TAP version 13
> 1..1
> ok 1 # SKIP fchmodat4(symlink)
> # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
>
> $ stat /tmp/ksft-fchmodat4.*/symlink
> File: /tmp/ksft-fchmodat4.3NCqlE/symlink -> regfile
> Size: 7 Blocks: 0 IO Block: 4096 symbolic link
> Device: 7,0 Inode: 133 Links: 1
> Access: (0600/lrw-------) Uid: ( 0/ root) Gid: ( 0/ root)
>
> Signed-off-by: Alexey Gladkov <[email protected]>

This looks like a bug in those file systems?

As an extra test, “echo 3 > /proc/sys/vm/drop_caches” sometimes has
strange effects in such cases because the bits are not actually stored
on disk, only in the dentry cache.

Thanks,
Florian


2023-07-11 12:43:41

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Add a new fchmodat4() syscall

* Alexey Gladkov:

> This patch set adds fchmodat4(), a new syscall. The actual
> implementation is super simple: essentially it's just the same as
> fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> I've attempted to make this match "man 2 fchmodat" as closely as
> possible, which says EINVAL is returned for invalid flags (as opposed to
> ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> I have a sketch of a glibc patch that I haven't even compiled yet, but
> seems fairly straight-forward:
>
> diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
> index 6d9cbc1ce9e0..b1beab76d56c 100644
> --- a/sysdeps/unix/sysv/linux/fchmodat.c
> +++ b/sysdeps/unix/sysv/linux/fchmodat.c
> @@ -29,12 +29,36 @@
> int
> fchmodat (int fd, const char *file, mode_t mode, int flag)
> {
> - if (flag & ~AT_SYMLINK_NOFOLLOW)
> - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
> + /* There are four paths through this code:
> + - The flags are zero. In this case it's fine to call fchmodat.
> + - The flags are non-zero and glibc doesn't have access to
> + __NR_fchmodat4. In this case all we can do is emulate the error codes
> + defined by the glibc interface from userspace.
> + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
> + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
> + matches glibc's library interface so it can be called directly.
> + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does

If you define __NR_fchmodat4 on all architectures, we can use these
constants directly in glibc. We no longer depend on the UAPI
definitions of those constants, to cut down the number of code variants,
and to make glibc's system call profile independent of the kernel header
version at build time.

Your version is based on 2.31, more recent versions have some reasonable
emulation for fchmodat based on /proc/self/fd. I even wrote a comment
describing the same buggy behavior that you witnessed:

+ /* Some Linux versions with some file systems can actually
+ change symbolic link permissions via /proc, but this is not
+ intentional, and it gives inconsistent results (e.g., error
+ return despite mode change). The expected behavior is that
+ symbolic link modes cannot be changed at all, and this check
+ enforces that. */
+ if (S_ISLNK (st.st_mode))
+ {
+ __close_nocancel (pathfd);
+ __set_errno (EOPNOTSUPP);
+ return -1;
+ }

I think there was some kernel discussion about that behavior before, but
apparently, it hasn't led to fixes.

I wonder if it makes sense to add a similar error return to the system
call implementation?

> + not. In this case we must respect the error codes defined by the glibc
> + interface instead of returning ENOSYS.
> + The intent here is to ensure that the kernel is called at most once per
> + library call, and that the error types defined by glibc are always
> + respected. */
> +
> +#ifdef __NR_fchmodat4
> + long result;
> +#endif
> +
> + if (flag == 0)
> + return INLINE_SYSCALL (fchmodat, 3, fd, file, mode);
> +
> +#ifdef __NR_fchmodat4
> + result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag);
> + if (result == 0 || errno != ENOSYS)
> + return result;
> +#endif

The last if condition is the recommended approach, but in the past, it
broke container host compatibility pretty badly due to seccomp filters
that return EPERM instead of ENOSYS. I guess we'll learn soon enough if
that's been fixed by now. 8-P

Thanks,
Florian


2023-07-11 12:45:42

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 01:25:43PM +0200, Alexey Gladkov wrote:
> -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +static int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int lookup_flags)

This function can still be called do_fchmodat(); we don't need to
version internal functions.


2023-07-11 12:59:43

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 01:52:01PM +0200, Christian Brauner wrote:
> On Tue, Jul 11, 2023 at 01:42:19PM +0200, Arnd Bergmann wrote:
> > On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> > > From: Palmer Dabbelt <[email protected]>
> > >
> > > On the userspace side fchmodat(3) is implemented as a wrapper
> > > function which implements the POSIX-specified interface. This
> > > interface differs from the underlying kernel system call, which does not
> > > have a flags argument. Most implementations require procfs [1][2].
> > >
> > > There doesn't appear to be a good userspace workaround for this issue
> > > but the implementation in the kernel is pretty straight-forward.
> > >
> > > The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > > unlike existing fchmodat.
> > >
> > > [1]
> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > > [2]
> > > https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> > >
> > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Alexey Gladkov <[email protected]>
> >
> > I don't know the history of why we ended up with the different
> > interface, or whether this was done intentionally in the kernel
> > or if we want this syscall.
> >
> > Assuming this is in fact needed, I double-checked that the
> > implementation looks correct to me and is portable to all the
> > architectures, without the need for a compat wrapper.
> >
> > Acked-by: Arnd Bergmann <[email protected]>
>
> The system call itself is useful afaict. But please,
>
> s/fchmodat4/fchmodat2/

Sure. I will.

> With very few exceptions we don't version by argument number but by
> revision and we should stick to one scheme:
>
> openat()->openat2()
> eventfd()->eventfd2()
> clone()/clone2()->clone3()
> dup()->dup2()->dup3() // coincides with nr of arguments
> pipe()->pipe2() // coincides with nr of arguments
> renameat()->renameat2()
>

--
Rgrds, legion


2023-07-11 13:18:56

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 01:28:04PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 11, 2023 at 01:25:43PM +0200, Alexey Gladkov wrote:
> > -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > +static int do_fchmodat4(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
>
> This function can still be called do_fchmodat(); we don't need to
> version internal functions.

Yes. I tried not to change too much when adopting a patch. In the new
version, I will return the old name. Thanks.

--
Rgrds, legion


2023-07-11 13:51:15

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v3 5/5] selftests: add fchmodat4(2) selftest

On Tue, Jul 11, 2023 at 02:10:58PM +0200, Florian Weimer wrote:
> * Alexey Gladkov:
>
> > The test marks as skipped if a syscall with the AT_SYMLINK_NOFOLLOW flag
> > fails. This is because not all filesystems support changing the mode
> > bits of symlinks properly. These filesystems return an error but change
> > the mode bits:
> >
> > newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> > newfstatat(4, "symlink", {st_mode=S_IFLNK|0777, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
> > syscall_0x1c3(0x4, 0x55fa1f244396, 0x180, 0x100, 0x55fa1f24438e, 0x34) = -1 EOPNOTSUPP (Operation not supported)
> > newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
> >
> > This happens with btrfs and xfs:
> >
> > $ /kernel/tools/testing/selftests/fchmodat4/fchmodat4_test
> > TAP version 13
> > 1..1
> > ok 1 # SKIP fchmodat4(symlink)
> > # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
> >
> > $ stat /tmp/ksft-fchmodat4.*/symlink
> > File: /tmp/ksft-fchmodat4.3NCqlE/symlink -> regfile
> > Size: 7 Blocks: 0 IO Block: 4096 symbolic link
> > Device: 7,0 Inode: 133 Links: 1
> > Access: (0600/lrw-------) Uid: ( 0/ root) Gid: ( 0/ root)
> >
> > Signed-off-by: Alexey Gladkov <[email protected]>
>
> This looks like a bug in those file systems?

To me this looks like a bug. I'm fine if the operation ends with
EOPNOTSUPP, but in that case the mode bits shouldn't change.

> As an extra test, “echo 3 > /proc/sys/vm/drop_caches” sometimes has
> strange effects in such cases because the bits are not actually stored
> on disk, only in the dentry cache.

tmpfs
syscall_0x1c3(0xffffff9c, 0x7ffd58758574, 0, 0x100, 0x7f6cf18adc70, 0x7ffd58756ad8) = 0
+++ exited with 0 +++
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f
=== dropping caches ===
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f

ext4
syscall_0x1c3(0xffffff9c, 0x7ffedfdb4574, 0, 0x100, 0x7f7f40b45c70, 0x7ffedfdb3ae8) = -1 EOPNOTSUPP (Operation not supported)
+++ exited with 1 +++
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f
=== dropping caches ===
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f

xfs
syscall_0x1c3(0xffffff9c, 0x7ffcd03ce574, 0, 0x100, 0x7ff2f2980c70, 0x7ffcd03cdd38) = -1 EOPNOTSUPP (Operation not supported)
+++ exited with 1 +++
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f
=== dropping caches ===
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f

btrfs
syscall_0x1c3(0xffffff9c, 0x7fff13d2e574, 0, 0x100, 0x7f9b67f59c70, 0x7fff13d2ca88) = -1 EOPNOTSUPP (Operation not supported)
+++ exited with 1 +++
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f
=== dropping caches ===
l--------- 1 root root 1 Jul 11 16:36 /tmp/dir/link -> f

reiserfs
syscall_0x1c3(0xffffff9c, 0x7ffdf75af574, 0, 0x100, 0x7f7ad0634c70, 0x7ffdf75ae478) = 0
+++ exited with 0 +++
l--------- 1 root root 1 Jul 11 16:43 /tmp/dir/link -> f
=== dropping caches ===
l--------- 1 root root 1 Jul 11 16:43 /tmp/dir/link -> f

--
Rgrds, legion


2023-07-11 14:17:34

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 02:51:01PM +0200, Alexey Gladkov wrote:
> On Tue, Jul 11, 2023 at 01:52:01PM +0200, Christian Brauner wrote:
> > On Tue, Jul 11, 2023 at 01:42:19PM +0200, Arnd Bergmann wrote:
> > > On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> > > > From: Palmer Dabbelt <[email protected]>
> > > >
> > > > On the userspace side fchmodat(3) is implemented as a wrapper
> > > > function which implements the POSIX-specified interface. This
> > > > interface differs from the underlying kernel system call, which does not
> > > > have a flags argument. Most implementations require procfs [1][2].
> > > >
> > > > There doesn't appear to be a good userspace workaround for this issue
> > > > but the implementation in the kernel is pretty straight-forward.
> > > >
> > > > The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > > > unlike existing fchmodat.
> > > >
> > > > [1]
> > > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > > > [2]
> > > > https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> > > >
> > > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Alexey Gladkov <[email protected]>
> > >
> > > I don't know the history of why we ended up with the different
> > > interface, or whether this was done intentionally in the kernel
> > > or if we want this syscall.
> > >
> > > Assuming this is in fact needed, I double-checked that the
> > > implementation looks correct to me and is portable to all the
> > > architectures, without the need for a compat wrapper.
> > >
> > > Acked-by: Arnd Bergmann <[email protected]>
> >
> > The system call itself is useful afaict. But please,
> >
> > s/fchmodat4/fchmodat2/
>
> Sure. I will.

Thanks. Can you also wire this up for every architecture, please?
I don't see that this has been done in this series.

2023-07-11 15:35:27

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Add a new fchmodat4() syscall

On Tue, Jul 11, 2023 at 02:24:51PM +0200, Florian Weimer wrote:
> * Alexey Gladkov:
>
> > This patch set adds fchmodat4(), a new syscall. The actual
> > implementation is super simple: essentially it's just the same as
> > fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> > I've attempted to make this match "man 2 fchmodat" as closely as
> > possible, which says EINVAL is returned for invalid flags (as opposed to
> > ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> > I have a sketch of a glibc patch that I haven't even compiled yet, but
> > seems fairly straight-forward:
> >
> > diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
> > index 6d9cbc1ce9e0..b1beab76d56c 100644
> > --- a/sysdeps/unix/sysv/linux/fchmodat.c
> > +++ b/sysdeps/unix/sysv/linux/fchmodat.c
> > @@ -29,12 +29,36 @@
> > int
> > fchmodat (int fd, const char *file, mode_t mode, int flag)
> > {
> > - if (flag & ~AT_SYMLINK_NOFOLLOW)
> > - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> > -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
> > + /* There are four paths through this code:
> > + - The flags are zero. In this case it's fine to call fchmodat.
> > + - The flags are non-zero and glibc doesn't have access to
> > + __NR_fchmodat4. In this case all we can do is emulate the error codes
> > + defined by the glibc interface from userspace.
> > + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
> > + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
> > + matches glibc's library interface so it can be called directly.
> > + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
>
> If you define __NR_fchmodat4 on all architectures, we can use these
> constants directly in glibc. We no longer depend on the UAPI
> definitions of those constants, to cut down the number of code variants,
> and to make glibc's system call profile independent of the kernel header
> version at build time.
>
> Your version is based on 2.31, more recent versions have some reasonable
> emulation for fchmodat based on /proc/self/fd. I even wrote a comment
> describing the same buggy behavior that you witnessed:
>
> + /* Some Linux versions with some file systems can actually
> + change symbolic link permissions via /proc, but this is not
> + intentional, and it gives inconsistent results (e.g., error
> + return despite mode change). The expected behavior is that
> + symbolic link modes cannot be changed at all, and this check
> + enforces that. */
> + if (S_ISLNK (st.st_mode))
> + {
> + __close_nocancel (pathfd);
> + __set_errno (EOPNOTSUPP);
> + return -1;
> + }
>
> I think there was some kernel discussion about that behavior before, but
> apparently, it hasn't led to fixes.

I think I've explained this somewhere else a couple of months ago but
just in case you weren't on that thread or don't remember and apologies
if you should already know.

A lot of filesystem will happily update the mode of a symlink. The VFS
doesn't do anything to prevent this from happening. This is filesystem
specific.

The EOPNOTSUPP you're seeing very likely comes from POSIX ACLs.
Specifically it comes from filesystems that call posix_acl_chmod(),
e.g., btrfs via

if (!err && attr->ia_valid & ATTR_MODE)
err = posix_acl_chmod(idmap, dentry, inode->i_mode);

Most filesystems don't implement i_op->set_acl() for POSIX ACLs.
So posix_acl_chmod() will report EOPNOTSUPP. By the time
posix_acl_chmod() is called, most filesystems will have finished
updating the inode. POSIX ACLs also often aren't integrated into
transactions so a rollback wouldn't even be possible on some
filesystems.

Any filesystem that doesn't implement POSIX ACLs at all will obviously
never fail unless it blocks mode changes on symlinks. Or filesystems
that do have a way to rollback failures from posix_acl_chmod(), or
filesystems that do return an error on chmod() on symlinks such as 9p,
ntfs, ocfs2.

>
> I wonder if it makes sense to add a similar error return to the system
> call implementation?

Hm, blocking symlink mode changes is pretty regression prone. And just
blocking it through one interface seems weird and makes things even more
inconsistent.

So two options I see:
(1) minimally invasive:
Filesystems that do call posix_acl_chmod() on symlinks need to be
changed to stop doing that.
(2) might hit us on the head invasive:
Try and block symlink mode changes in chmod_common().

Thoughts?

2023-07-11 16:09:01

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v3 2/5] fs: Add fchmodat4()

On Tue, Jul 11, 2023 at 04:01:03PM +0200, Christian Brauner wrote:
> On Tue, Jul 11, 2023 at 02:51:01PM +0200, Alexey Gladkov wrote:
> > On Tue, Jul 11, 2023 at 01:52:01PM +0200, Christian Brauner wrote:
> > > On Tue, Jul 11, 2023 at 01:42:19PM +0200, Arnd Bergmann wrote:
> > > > On Tue, Jul 11, 2023, at 13:25, Alexey Gladkov wrote:
> > > > > From: Palmer Dabbelt <[email protected]>
> > > > >
> > > > > On the userspace side fchmodat(3) is implemented as a wrapper
> > > > > function which implements the POSIX-specified interface. This
> > > > > interface differs from the underlying kernel system call, which does not
> > > > > have a flags argument. Most implementations require procfs [1][2].
> > > > >
> > > > > There doesn't appear to be a good userspace workaround for this issue
> > > > > but the implementation in the kernel is pretty straight-forward.
> > > > >
> > > > > The new fchmodat4() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > > > > unlike existing fchmodat.
> > > > >
> > > > > [1]
> > > > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > > > > [2]
> > > > > https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> > > > >
> > > > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > >
> > > > I don't know the history of why we ended up with the different
> > > > interface, or whether this was done intentionally in the kernel
> > > > or if we want this syscall.
> > > >
> > > > Assuming this is in fact needed, I double-checked that the
> > > > implementation looks correct to me and is portable to all the
> > > > architectures, without the need for a compat wrapper.
> > > >
> > > > Acked-by: Arnd Bergmann <[email protected]>
> > >
> > > The system call itself is useful afaict. But please,
> > >
> > > s/fchmodat4/fchmodat2/
> >
> > Sure. I will.
>
> Thanks. Can you also wire this up for every architecture, please?
> I don't see that this has been done in this series.

Sure. I have already added in all architectures as far as I can tell:

$ diff -s <(find arch/ -name '*.tbl' |sort -u) <(git grep -lw fchmodat2 arch/ |sort -u)
Files /dev/fd/63 and /dev/fd/62 are identical

--
Rgrds, legion


2023-07-11 16:25:02

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 0/5] Add a new fchmodat2() syscall

In glibc, the fchmodat(3) function has a flags argument according to the
POSIX specification [1], but kernel syscalls has no such argument.
Therefore, libc implementations do workarounds using /proc. However,
this requires procfs to be mounted and accessible.

This patch set adds fchmodat2(), a new syscall. The syscall allows to
pass the AT_SYMLINK_NOFOLLOW flag to disable LOOKUP_FOLLOW. In all other
respects, this syscall is no different from fchmodat().

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html

Changes since v3 [[email protected]]:

* Rebased to master because a new syscall has appeared in master.
* Increased __NR_compat_syscalls as pointed out by Arnd Bergmann.
* Syscall renamed fchmodat4 -> fchmodat2 as suggested by Christian Brauner.
* Returned do_fchmodat4() the original name. We don't need to version
internal functions.
* Fixed warnings found by checkpatch.pl.

Changes since v2 [[email protected]]:

* Rebased to master.
* The lookup_flags passed to sys_fchmodat4 as suggested by Al Viro.
* Selftest added.

Changes since v1 [[email protected]]:

* All architectures are now supported, which support squashed into a
single patch.
* The do_fchmodat() helper function has been removed, in favor of directly
calling do_fchmodat4().
* The patches are based on 5.2 instead of 5.1.

---

Alexey Gladkov (2):
fs: Add fchmodat2()
selftests: Add fchmodat2 selftest

Palmer Dabbelt (3):
Non-functional cleanup of a "__user * filename"
arch: Register fchmodat2, usually as syscall 452
tools headers UAPI: Sync files changed by new fchmodat2 syscall

arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/open.c | 18 +-
include/linux/syscalls.h | 4 +-
include/uapi/asm-generic/unistd.h | 5 +-
tools/include/uapi/asm-generic/unistd.h | 5 +-
.../arch/mips/entry/syscalls/syscall_n64.tbl | 2 +
.../arch/powerpc/entry/syscalls/syscall.tbl | 2 +
.../perf/arch/s390/entry/syscalls/syscall.tbl | 2 +
.../arch/x86/entry/syscalls/syscall_64.tbl | 2 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/fchmodat2/.gitignore | 2 +
tools/testing/selftests/fchmodat2/Makefile | 6 +
.../selftests/fchmodat2/fchmodat2_test.c | 162 ++++++++++++++++++
30 files changed, 223 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/fchmodat2/.gitignore
create mode 100644 tools/testing/selftests/fchmodat2/Makefile
create mode 100644 tools/testing/selftests/fchmodat2/fchmodat2_test.c

--
2.33.8


2023-07-11 16:26:03

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 1/5] Non-functional cleanup of a "__user * filename"

From: Palmer Dabbelt <[email protected]>

The next patch defines a very similar interface, which I copied from
this definition. Since I'm touching it anyway I don't see any reason
not to just go fix this one up.

Signed-off-by: Palmer Dabbelt <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
---
include/linux/syscalls.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 03e3d0121d5e..584f404bf868 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -438,7 +438,7 @@ asmlinkage long sys_chdir(const char __user *filename);
asmlinkage long sys_fchdir(unsigned int fd);
asmlinkage long sys_chroot(const char __user *filename);
asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
-asmlinkage long sys_fchmodat(int dfd, const char __user * filename,
+asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
umode_t mode);
asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
--
2.33.8


2023-07-11 16:26:29

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 2/5] fs: Add fchmodat2()

On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].

There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.

The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
unlike existing fchmodat.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
[2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28

Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
---
fs/open.c | 18 ++++++++++++++----
include/linux/syscalls.h | 2 ++
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 0c55c8e7f837..39a7939f0d00 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
return err;
}

-static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
+static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
{
struct path path;
int error;
- unsigned int lookup_flags = LOOKUP_FOLLOW;
+
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (!error) {
@@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
return error;
}

+SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
+ umode_t, mode, int, flags)
+{
+ if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
+ return -EINVAL;
+
+ return do_fchmodat(dfd, filename, mode,
+ flags & AT_SYMLINK_NOFOLLOW ? 0 : LOOKUP_FOLLOW);
+}
+
SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
umode_t, mode)
{
- return do_fchmodat(dfd, filename, mode);
+ return do_fchmodat(dfd, filename, mode, LOOKUP_FOLLOW);
}

SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
{
- return do_fchmodat(AT_FDCWD, filename, mode);
+ return do_fchmodat(AT_FDCWD, filename, mode, LOOKUP_FOLLOW);
}

/*
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 584f404bf868..6e852279fbc3 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -440,6 +440,8 @@ asmlinkage long sys_chroot(const char __user *filename);
asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
umode_t mode);
+asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
+ umode_t mode, int flags);
asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
gid_t group, int flag);
asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
--
2.33.8


2023-07-11 16:27:32

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

From: Palmer Dabbelt <[email protected]>

This registers the new fchmodat2 syscall in most places as nuber 452,
with alpha being the exception where it's 562. I found all these sites
by grepping for fspick, which I assume has found me everything.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 ++
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/uapi/asm-generic/unistd.h | 5 ++++-
19 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 1f13995d00d7..ad37569d0507 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -491,3 +491,4 @@
559 common futex_waitv sys_futex_waitv
560 common set_mempolicy_home_node sys_ni_syscall
561 common cachestat sys_cachestat
+562 common fchmodat2 sys_fchmodat2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 8ebed8a13874..c572d6c3dee0 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -465,3 +465,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 64a514f90131..bd77253b62e0 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)

-#define __NR_compat_syscalls 452
+#define __NR_compat_syscalls 453
#endif

#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index d952a28463e0..78b68311ec81 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
#define __NR_cachestat 451
__SYSCALL(__NR_cachestat, sys_cachestat)
+#define __NR_fchmodat2 452
+__SYSCALL(__NR_fchmodat2, sys_fchmodat2)

/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index f8c74ffeeefb..83d8609aec03 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -372,3 +372,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 4f504783371f..259ceb125367 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 858d22bf275c..a3798c2637fd 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 1976317d4e8b..152034b8e0a0 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -390,3 +390,4 @@
449 n32 futex_waitv sys_futex_waitv
450 n32 set_mempolicy_home_node sys_set_mempolicy_home_node
451 n32 cachestat sys_cachestat
+452 n32 fchmodat2 sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index cfda2511badf..cb5e757f6621 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -366,3 +366,4 @@
449 n64 futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 n64 cachestat sys_cachestat
+452 n64 fchmodat2 sys_fchmodat2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 7692234c3768..1a646813afdc 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -439,3 +439,4 @@
449 o32 futex_waitv sys_futex_waitv
450 o32 set_mempolicy_home_node sys_set_mempolicy_home_node
451 o32 cachestat sys_cachestat
+452 o32 fchmodat2 sys_fchmodat2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a0a9145b6dd4..e97c175b56f9 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -450,3 +450,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 8c0b08b7a80e..20e50586e8a2 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -538,3 +538,4 @@
449 common futex_waitv sys_futex_waitv
450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index a6935af2235c..0122cc156952 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
449 common futex_waitv sys_futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 97377e8c5025..e90d585c4d3e 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index faa835f3c54a..4ed06c71c43f 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -497,3 +497,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index bc0a3c941b35..2d0b1bd866ea 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -456,3 +456,4 @@
449 i386 futex_waitv sys_futex_waitv
450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node
451 i386 cachestat sys_cachestat
+452 i386 fchmodat2 sys_fchmodat2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 227538b0ce80..814768249eae 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -373,6 +373,7 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2

#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 2b69c3c035b6..fc1a4f3c81d9 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -422,3 +422,4 @@
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
451 common cachestat sys_cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index fd6c1cb585db..abe087c53b4b 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -820,8 +820,11 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
#define __NR_cachestat 451
__SYSCALL(__NR_cachestat, sys_cachestat)

+#define __NR_fchmodat2 452
+__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+
#undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453

/*
* 32 bit systems traditionally used different
--
2.33.8


2023-07-11 16:39:16

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 4/5] tools headers UAPI: Sync files changed by new fchmodat2 syscall

From: Palmer Dabbelt <[email protected]>

That add support for this new syscall in tools such as 'perf trace'.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
---
tools/include/uapi/asm-generic/unistd.h | 5 ++++-
tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl | 2 ++
tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2 ++
tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2 ++
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
5 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index dd7d8e10f16d..76b5922b0d39 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -817,8 +817,11 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
#define __NR_set_mempolicy_home_node 450
__SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)

+#define __NR_fchmodat2 452
+__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
+
#undef __NR_syscalls
-#define __NR_syscalls 451
+#define __NR_syscalls 453

/*
* 32 bit systems traditionally used different
diff --git a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
index 3f1886ad9d80..434728af4eaa 100644
--- a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
+++ b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
@@ -365,3 +365,5 @@
448 n64 process_mrelease sys_process_mrelease
449 n64 futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+# 451 reserved for cachestat
+452 n64 fchmodat2 sys_fchmodat2
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index a0be127475b1..6b70b6705bd7 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -537,3 +537,5 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
+# 451 reserved for cachestat
+452 common fchmodat2 sys_fchmodat2
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index b68f47541169..0ed90c9535b0 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -453,3 +453,5 @@
448 common process_mrelease sys_process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
+# 451 reserved for cachestat
+452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index c84d12608cd2..a008724a1f48 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -372,6 +372,8 @@
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
450 common set_mempolicy_home_node sys_set_mempolicy_home_node
+# 451 reserved for cachestat
+452 common fchmodat2 sys_fchmodat2

#
# Due to a historical design error, certain syscalls are numbered differently
--
2.33.8


2023-07-11 16:39:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

On Tue, Jul 11, 2023, at 18:16, Alexey Gladkov wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This registers the new fchmodat2 syscall in most places as nuber 452,
> with alpha being the exception where it's 562. I found all these sites
> by grepping for fspick, which I assume has found me everything.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>

Acked-by: Arnd Bergmann <[email protected]>

2023-07-11 16:49:52

by Alexey Gladkov

[permalink] [raw]
Subject: [PATCH v4 5/5] selftests: Add fchmodat2 selftest

The test marks as skipped if a syscall with the AT_SYMLINK_NOFOLLOW flag
fails. This is because not all filesystems support changing the mode
bits of symlinks properly. These filesystems return an error but change
the mode bits:

newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(4, "symlink", {st_mode=S_IFLNK|0777, st_size=7, ...}, AT_SYMLINK_NOFOLLOW) = 0
syscall_0x1c3(0x4, 0x55fa1f244396, 0x180, 0x100, 0x55fa1f24438e, 0x34) = -1 EOPNOTSUPP (Operation not supported)
newfstatat(4, "regfile", {st_mode=S_IFREG|0640, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0

This happens with btrfs and xfs:

$ tools/testing/selftests/fchmodat2/fchmodat2_test
TAP version 13
1..1
ok 1 # SKIP fchmodat2(symlink)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0

$ stat /tmp/ksft-fchmodat2.*/symlink
File: /tmp/ksft-fchmodat2.3NCqlE/symlink -> regfile
Size: 7 Blocks: 0 IO Block: 4096 symbolic link
Device: 7,0 Inode: 133 Links: 1
Access: (0600/lrw-------) Uid: ( 0/ root) Gid: ( 0/ root)

Signed-off-by: Alexey Gladkov <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/fchmodat2/.gitignore | 2 +
tools/testing/selftests/fchmodat2/Makefile | 6 +
.../selftests/fchmodat2/fchmodat2_test.c | 162 ++++++++++++++++++
4 files changed, 171 insertions(+)
create mode 100644 tools/testing/selftests/fchmodat2/.gitignore
create mode 100644 tools/testing/selftests/fchmodat2/Makefile
create mode 100644 tools/testing/selftests/fchmodat2/fchmodat2_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 666b56f22a41..8dca8acdb671 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -18,6 +18,7 @@ TARGETS += drivers/net/bonding
TARGETS += drivers/net/team
TARGETS += efivarfs
TARGETS += exec
+TARGETS += fchmodat2
TARGETS += filesystems
TARGETS += filesystems/binderfs
TARGETS += filesystems/epoll
diff --git a/tools/testing/selftests/fchmodat2/.gitignore b/tools/testing/selftests/fchmodat2/.gitignore
new file mode 100644
index 000000000000..82a4846cbc4b
--- /dev/null
+++ b/tools/testing/selftests/fchmodat2/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+/*_test
diff --git a/tools/testing/selftests/fchmodat2/Makefile b/tools/testing/selftests/fchmodat2/Makefile
new file mode 100644
index 000000000000..45b519eab851
--- /dev/null
+++ b/tools/testing/selftests/fchmodat2/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined
+TEST_GEN_PROGS := fchmodat2_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/fchmodat2/fchmodat2_test.c b/tools/testing/selftests/fchmodat2/fchmodat2_test.c
new file mode 100644
index 000000000000..2d98eb215bc6
--- /dev/null
+++ b/tools/testing/selftests/fchmodat2/fchmodat2_test.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syscall.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+#ifndef __NR_fchmodat2
+ #if defined __alpha__
+ #define __NR_fchmodat2 562
+ #elif defined _MIPS_SIM
+ #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */
+ #define __NR_fchmodat2 (452 + 4000)
+ #endif
+ #if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */
+ #define __NR_fchmodat2 (452 + 6000)
+ #endif
+ #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */
+ #define __NR_fchmodat2 (452 + 5000)
+ #endif
+ #elif defined __ia64__
+ #define __NR_fchmodat2 (452 + 1024)
+ #else
+ #define __NR_fchmodat2 452
+ #endif
+#endif
+
+int sys_fchmodat2(int dfd, const char *filename, mode_t mode, int flags)
+{
+ int ret = syscall(__NR_fchmodat2, dfd, filename, mode, flags);
+
+ return ret >= 0 ? ret : -errno;
+}
+
+int setup_testdir(void)
+{
+ int dfd, ret;
+ char dirname[] = "/tmp/ksft-fchmodat2.XXXXXX";
+
+ /* Make the top-level directory. */
+ if (!mkdtemp(dirname))
+ ksft_exit_fail_msg("%s: failed to create tmpdir\n", __func__);
+
+ dfd = open(dirname, O_PATH | O_DIRECTORY);
+ if (dfd < 0)
+ ksft_exit_fail_msg("%s: failed to open tmpdir\n", __func__);
+
+ ret = openat(dfd, "regfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
+ if (ret < 0)
+ ksft_exit_fail_msg("%s: failed to create file in tmpdir\n",
+ __func__);
+ close(ret);
+
+ ret = symlinkat("regfile", dfd, "symlink");
+ if (ret < 0)
+ ksft_exit_fail_msg("%s: failed to create symlink in tmpdir\n",
+ __func__);
+
+ return dfd;
+}
+
+int expect_mode(int dfd, const char *filename, mode_t expect_mode)
+{
+ struct stat st;
+ int ret = fstatat(dfd, filename, &st, AT_SYMLINK_NOFOLLOW);
+
+ if (ret)
+ ksft_exit_fail_msg("%s: %s: fstatat failed\n",
+ __func__, filename);
+
+ return (st.st_mode == expect_mode);
+}
+
+void test_regfile(void)
+{
+ int dfd, ret;
+
+ dfd = setup_testdir();
+
+ ret = sys_fchmodat2(dfd, "regfile", 0640, 0);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("%s: fchmodat2(noflag) failed\n", __func__);
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2\n",
+ __func__);
+
+ ret = sys_fchmodat2(dfd, "regfile", 0600, AT_SYMLINK_NOFOLLOW);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("%s: fchmodat2(AT_SYMLINK_NOFOLLOW) failed\n",
+ __func__);
+
+ if (!expect_mode(dfd, "regfile", 0100600))
+ ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
+ __func__);
+
+ ksft_test_result_pass("fchmodat2(regfile)\n");
+}
+
+void test_symlink(void)
+{
+ int dfd, ret;
+
+ dfd = setup_testdir();
+
+ ret = sys_fchmodat2(dfd, "symlink", 0640, 0);
+
+ if (ret < 0)
+ ksft_exit_fail_msg("%s: fchmodat2(noflag) failed\n", __func__);
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2\n",
+ __func__);
+
+ if (!expect_mode(dfd, "symlink", 0120777))
+ ksft_exit_fail_msg("%s: wrong symlink mode bits after fchmodat2\n",
+ __func__);
+
+ ret = sys_fchmodat2(dfd, "symlink", 0600, AT_SYMLINK_NOFOLLOW);
+
+ /*
+ * On certain filesystems (xfs or btrfs), chmod operation fails. So we
+ * first check the symlink target but if the operation fails we mark the
+ * test as skipped.
+ *
+ * https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html
+ */
+ if (ret == 0 && !expect_mode(dfd, "symlink", 0120600))
+ ksft_exit_fail_msg("%s: wrong symlink mode bits after fchmodat2 with nofollow\n",
+ __func__);
+
+ if (!expect_mode(dfd, "regfile", 0100640))
+ ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
+ __func__);
+
+ if (ret != 0)
+ ksft_test_result_skip("fchmodat2(symlink)\n");
+ else
+ ksft_test_result_pass("fchmodat2(symlink)\n");
+}
+
+#define NUM_TESTS 2
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(NUM_TESTS);
+
+ test_regfile();
+ test_symlink();
+
+ if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
+ ksft_exit_fail();
+ else
+ ksft_exit_pass();
+}
--
2.33.8


2023-07-11 17:26:09

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Tue, Jul 11, 2023 at 06:16:04PM +0200, Alexey Gladkov wrote:
> On the userspace side fchmodat(3) is implemented as a wrapper
> function which implements the POSIX-specified interface. This
> interface differs from the underlying kernel system call, which does not
> have a flags argument. Most implementations require procfs [1][2].
>
> There doesn't appear to be a good userspace workaround for this issue
> but the implementation in the kernel is pretty straight-forward.
>
> The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> unlike existing fchmodat.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>
> Acked-by: Arnd Bergmann <[email protected]>
> ---
> fs/open.c | 18 ++++++++++++++----
> include/linux/syscalls.h | 2 ++
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 0c55c8e7f837..39a7939f0d00 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> return err;
> }
>
> -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)

Should all be unsigned instead of int here for flags. We also had a
documentation update to that effect but smh never sent it.
user_path_at() itself takes an unsigned as well.

I'll fix that up though.

2023-07-11 17:41:15

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] tools headers UAPI: Sync files changed by new fchmodat2 syscall

Hello,

On Tue, Jul 11, 2023 at 9:18 AM Alexey Gladkov <[email protected]> wrote:
>
> From: Palmer Dabbelt <[email protected]>
>
> That add support for this new syscall in tools such as 'perf trace'.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>
> ---
> tools/include/uapi/asm-generic/unistd.h | 5 ++++-
> tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl | 2 ++
> tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2 ++
> tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2 ++
> tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
> 5 files changed, 12 insertions(+), 1 deletion(-)

It'd be nice if you route this patch separately through the
perf tools tree. We can add this after the kernel change
is accepted.

Thanks,
Namhyung


>
> diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
> index dd7d8e10f16d..76b5922b0d39 100644
> --- a/tools/include/uapi/asm-generic/unistd.h
> +++ b/tools/include/uapi/asm-generic/unistd.h
> @@ -817,8 +817,11 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
> #define __NR_set_mempolicy_home_node 450
> __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
>
> +#define __NR_fchmodat2 452
> +__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
> +
> #undef __NR_syscalls
> -#define __NR_syscalls 451
> +#define __NR_syscalls 453
>
> /*
> * 32 bit systems traditionally used different
> diff --git a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> index 3f1886ad9d80..434728af4eaa 100644
> --- a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> +++ b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> @@ -365,3 +365,5 @@
> 448 n64 process_mrelease sys_process_mrelease
> 449 n64 futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> +# 451 reserved for cachestat
> +452 n64 fchmodat2 sys_fchmodat2
> diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> index a0be127475b1..6b70b6705bd7 100644
> --- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> @@ -537,3 +537,5 @@
> 448 common process_mrelease sys_process_mrelease
> 449 common futex_waitv sys_futex_waitv
> 450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
> +# 451 reserved for cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> index b68f47541169..0ed90c9535b0 100644
> --- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> +++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> @@ -453,3 +453,5 @@
> 448 common process_mrelease sys_process_mrelease sys_process_mrelease
> 449 common futex_waitv sys_futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
> +# 451 reserved for cachestat
> +452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
> diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> index c84d12608cd2..a008724a1f48 100644
> --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -372,6 +372,8 @@
> 448 common process_mrelease sys_process_mrelease
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> +# 451 reserved for cachestat
> +452 common fchmodat2 sys_fchmodat2
>
> #
> # Due to a historical design error, certain syscalls are numbered differently
> --
> 2.33.8
>

2023-07-11 17:58:18

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] tools headers UAPI: Sync files changed by new fchmodat2 syscall

On Tue, Jul 11, 2023 at 10:19:35AM -0700, Namhyung Kim wrote:
> Hello,
>
> On Tue, Jul 11, 2023 at 9:18 AM Alexey Gladkov <[email protected]> wrote:
> >
> > From: Palmer Dabbelt <[email protected]>
> >
> > That add support for this new syscall in tools such as 'perf trace'.
> >
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Alexey Gladkov <[email protected]>
> > ---
> > tools/include/uapi/asm-generic/unistd.h | 5 ++++-
> > tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl | 2 ++
> > tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 2 ++
> > tools/perf/arch/s390/entry/syscalls/syscall.tbl | 2 ++
> > tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
> > 5 files changed, 12 insertions(+), 1 deletion(-)
>
> It'd be nice if you route this patch separately through the
> perf tools tree. We can add this after the kernel change
> is accepted.

Sure. No problem.

> >
> > diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
> > index dd7d8e10f16d..76b5922b0d39 100644
> > --- a/tools/include/uapi/asm-generic/unistd.h
> > +++ b/tools/include/uapi/asm-generic/unistd.h
> > @@ -817,8 +817,11 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
> > #define __NR_set_mempolicy_home_node 450
> > __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
> >
> > +#define __NR_fchmodat2 452
> > +__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
> > +
> > #undef __NR_syscalls
> > -#define __NR_syscalls 451
> > +#define __NR_syscalls 453
> >
> > /*
> > * 32 bit systems traditionally used different
> > diff --git a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> > index 3f1886ad9d80..434728af4eaa 100644
> > --- a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> > +++ b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
> > @@ -365,3 +365,5 @@
> > 448 n64 process_mrelease sys_process_mrelease
> > 449 n64 futex_waitv sys_futex_waitv
> > 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> > +# 451 reserved for cachestat
> > +452 n64 fchmodat2 sys_fchmodat2
> > diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> > index a0be127475b1..6b70b6705bd7 100644
> > --- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> > +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
> > @@ -537,3 +537,5 @@
> > 448 common process_mrelease sys_process_mrelease
> > 449 common futex_waitv sys_futex_waitv
> > 450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
> > +# 451 reserved for cachestat
> > +452 common fchmodat2 sys_fchmodat2
> > diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> > index b68f47541169..0ed90c9535b0 100644
> > --- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> > +++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
> > @@ -453,3 +453,5 @@
> > 448 common process_mrelease sys_process_mrelease sys_process_mrelease
> > 449 common futex_waitv sys_futex_waitv sys_futex_waitv
> > 450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
> > +# 451 reserved for cachestat
> > +452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
> > diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> > index c84d12608cd2..a008724a1f48 100644
> > --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -372,6 +372,8 @@
> > 448 common process_mrelease sys_process_mrelease
> > 449 common futex_waitv sys_futex_waitv
> > 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> > +# 451 reserved for cachestat
> > +452 common fchmodat2 sys_fchmodat2
> >
> > #
> > # Due to a historical design error, certain syscalls are numbered differently
> > --
> > 2.33.8
> >
>

--
Rgrds, legion


2023-07-11 18:17:04

by Christian Brauner

[permalink] [raw]
Subject: Re: (subset) [PATCH v4 0/5] Add a new fchmodat2() syscall

On Tue, 11 Jul 2023 18:16:02 +0200, Alexey Gladkov wrote:
> In glibc, the fchmodat(3) function has a flags argument according to the
> POSIX specification [1], but kernel syscalls has no such argument.
> Therefore, libc implementations do workarounds using /proc. However,
> this requires procfs to be mounted and accessible.
>
> This patch set adds fchmodat2(), a new syscall. The syscall allows to
> pass the AT_SYMLINK_NOFOLLOW flag to disable LOOKUP_FOLLOW. In all other
> respects, this syscall is no different from fchmodat().
>
> [...]

Tools updates usually go separately.
Flags argument ported to unsigned int; otherwise unchanged.

---

Applied to the master branch of the vfs/vfs.git tree.
Patches in the master branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: master

[1/5] Non-functional cleanup of a "__user * filename"
https://git.kernel.org/vfs/vfs/c/0f05a6af6b7e
[2/5] fs: Add fchmodat2()
https://git.kernel.org/vfs/vfs/c/8d593559ec09
[3/5] arch: Register fchmodat2, usually as syscall 452
https://git.kernel.org/vfs/vfs/c/2ee63b04f206
[5/5] selftests: Add fchmodat2 selftest
https://git.kernel.org/vfs/vfs/c/f175b92081ec

2023-07-25 07:35:41

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

On Tue, Jul 11, 2023 at 6:25 PM Alexey Gladkov <[email protected]> wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This registers the new fchmodat2 syscall in most places as nuber 452,
> with alpha being the exception where it's 562. I found all these sites
> by grepping for fspick, which I assume has found me everything.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>

> arch/m68k/kernel/syscalls/syscall.tbl | 1 +

Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2023-07-25 11:54:23

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Add a new fchmodat4() syscall

On Tue, Jul 11, 2023 at 05:14:24PM +0200, Christian Brauner wrote:
> On Tue, Jul 11, 2023 at 02:24:51PM +0200, Florian Weimer wrote:
> > * Alexey Gladkov:
> >
> > > This patch set adds fchmodat4(), a new syscall. The actual
> > > implementation is super simple: essentially it's just the same as
> > > fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> > > I've attempted to make this match "man 2 fchmodat" as closely as
> > > possible, which says EINVAL is returned for invalid flags (as opposed to
> > > ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> > > I have a sketch of a glibc patch that I haven't even compiled yet, but
> > > seems fairly straight-forward:
> > >
> > > diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
> > > index 6d9cbc1ce9e0..b1beab76d56c 100644
> > > --- a/sysdeps/unix/sysv/linux/fchmodat.c
> > > +++ b/sysdeps/unix/sysv/linux/fchmodat.c
> > > @@ -29,12 +29,36 @@
> > > int
> > > fchmodat (int fd, const char *file, mode_t mode, int flag)
> > > {
> > > - if (flag & ~AT_SYMLINK_NOFOLLOW)
> > > - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> > > -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
> > > + /* There are four paths through this code:
> > > + - The flags are zero. In this case it's fine to call fchmodat.
> > > + - The flags are non-zero and glibc doesn't have access to
> > > + __NR_fchmodat4. In this case all we can do is emulate the error codes
> > > + defined by the glibc interface from userspace.
> > > + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
> > > + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
> > > + matches glibc's library interface so it can be called directly.
> > > + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
> >
> > If you define __NR_fchmodat4 on all architectures, we can use these
> > constants directly in glibc. We no longer depend on the UAPI
> > definitions of those constants, to cut down the number of code variants,
> > and to make glibc's system call profile independent of the kernel header
> > version at build time.
> >
> > Your version is based on 2.31, more recent versions have some reasonable
> > emulation for fchmodat based on /proc/self/fd. I even wrote a comment
> > describing the same buggy behavior that you witnessed:
> >
> > + /* Some Linux versions with some file systems can actually
> > + change symbolic link permissions via /proc, but this is not
> > + intentional, and it gives inconsistent results (e.g., error
> > + return despite mode change). The expected behavior is that
> > + symbolic link modes cannot be changed at all, and this check
> > + enforces that. */
> > + if (S_ISLNK (st.st_mode))
> > + {
> > + __close_nocancel (pathfd);
> > + __set_errno (EOPNOTSUPP);
> > + return -1;
> > + }
> >
> > I think there was some kernel discussion about that behavior before, but
> > apparently, it hasn't led to fixes.
>
> I think I've explained this somewhere else a couple of months ago but
> just in case you weren't on that thread or don't remember and apologies
> if you should already know.
>
> A lot of filesystem will happily update the mode of a symlink. The VFS
> doesn't do anything to prevent this from happening. This is filesystem
> specific.
>
> The EOPNOTSUPP you're seeing very likely comes from POSIX ACLs.
> Specifically it comes from filesystems that call posix_acl_chmod(),
> e.g., btrfs via
>
> if (!err && attr->ia_valid & ATTR_MODE)
> err = posix_acl_chmod(idmap, dentry, inode->i_mode);
>
> Most filesystems don't implement i_op->set_acl() for POSIX ACLs.
> So posix_acl_chmod() will report EOPNOTSUPP. By the time
> posix_acl_chmod() is called, most filesystems will have finished
> updating the inode. POSIX ACLs also often aren't integrated into
> transactions so a rollback wouldn't even be possible on some
> filesystems.
>
> Any filesystem that doesn't implement POSIX ACLs at all will obviously
> never fail unless it blocks mode changes on symlinks. Or filesystems
> that do have a way to rollback failures from posix_acl_chmod(), or
> filesystems that do return an error on chmod() on symlinks such as 9p,
> ntfs, ocfs2.
>
> >
> > I wonder if it makes sense to add a similar error return to the system
> > call implementation?
>
> Hm, blocking symlink mode changes is pretty regression prone. And just
> blocking it through one interface seems weird and makes things even more
> inconsistent.
>
> So two options I see:
> (1) minimally invasive:
> Filesystems that do call posix_acl_chmod() on symlinks need to be
> changed to stop doing that.
> (2) might hit us on the head invasive:
> Try and block symlink mode changes in chmod_common().
>
> Thoughts?
>

We have third option. We can choose not to call chmod_common and return an
error right away:

diff --git a/fs/open.c b/fs/open.c
index 39a7939f0d00..86a427a2a083 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -679,7 +679,9 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int l
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (!error) {
- error = chmod_common(&path, mode);
+ error = -EOPNOTSUPP;
+ if (!(flags & AT_SYMLINK_NOFOLLOW) || !S_ISLNK(path.dentry->d_inode->i_mode))
+ error = chmod_common(&path, mode);
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;

It doesn't seem to be invasive.

--
Rgrds, legion


2023-07-25 12:27:16

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 0/5] Add a new fchmodat4() syscall

On Tue, Jul 25, 2023 at 01:05:40PM +0200, Alexey Gladkov wrote:
> On Tue, Jul 11, 2023 at 05:14:24PM +0200, Christian Brauner wrote:
> > On Tue, Jul 11, 2023 at 02:24:51PM +0200, Florian Weimer wrote:
> > > * Alexey Gladkov:
> > >
> > > > This patch set adds fchmodat4(), a new syscall. The actual
> > > > implementation is super simple: essentially it's just the same as
> > > > fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> > > > I've attempted to make this match "man 2 fchmodat" as closely as
> > > > possible, which says EINVAL is returned for invalid flags (as opposed to
> > > > ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> > > > I have a sketch of a glibc patch that I haven't even compiled yet, but
> > > > seems fairly straight-forward:
> > > >
> > > > diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
> > > > index 6d9cbc1ce9e0..b1beab76d56c 100644
> > > > --- a/sysdeps/unix/sysv/linux/fchmodat.c
> > > > +++ b/sysdeps/unix/sysv/linux/fchmodat.c
> > > > @@ -29,12 +29,36 @@
> > > > int
> > > > fchmodat (int fd, const char *file, mode_t mode, int flag)
> > > > {
> > > > - if (flag & ~AT_SYMLINK_NOFOLLOW)
> > > > - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> > > > -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
> > > > + /* There are four paths through this code:
> > > > + - The flags are zero. In this case it's fine to call fchmodat.
> > > > + - The flags are non-zero and glibc doesn't have access to
> > > > + __NR_fchmodat4. In this case all we can do is emulate the error codes
> > > > + defined by the glibc interface from userspace.
> > > > + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
> > > > + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
> > > > + matches glibc's library interface so it can be called directly.
> > > > + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
> > >
> > > If you define __NR_fchmodat4 on all architectures, we can use these
> > > constants directly in glibc. We no longer depend on the UAPI
> > > definitions of those constants, to cut down the number of code variants,
> > > and to make glibc's system call profile independent of the kernel header
> > > version at build time.
> > >
> > > Your version is based on 2.31, more recent versions have some reasonable
> > > emulation for fchmodat based on /proc/self/fd. I even wrote a comment
> > > describing the same buggy behavior that you witnessed:
> > >
> > > + /* Some Linux versions with some file systems can actually
> > > + change symbolic link permissions via /proc, but this is not
> > > + intentional, and it gives inconsistent results (e.g., error
> > > + return despite mode change). The expected behavior is that
> > > + symbolic link modes cannot be changed at all, and this check
> > > + enforces that. */
> > > + if (S_ISLNK (st.st_mode))
> > > + {
> > > + __close_nocancel (pathfd);
> > > + __set_errno (EOPNOTSUPP);
> > > + return -1;
> > > + }
> > >
> > > I think there was some kernel discussion about that behavior before, but
> > > apparently, it hasn't led to fixes.
> >
> > I think I've explained this somewhere else a couple of months ago but
> > just in case you weren't on that thread or don't remember and apologies
> > if you should already know.
> >
> > A lot of filesystem will happily update the mode of a symlink. The VFS
> > doesn't do anything to prevent this from happening. This is filesystem
> > specific.
> >
> > The EOPNOTSUPP you're seeing very likely comes from POSIX ACLs.
> > Specifically it comes from filesystems that call posix_acl_chmod(),
> > e.g., btrfs via
> >
> > if (!err && attr->ia_valid & ATTR_MODE)
> > err = posix_acl_chmod(idmap, dentry, inode->i_mode);
> >
> > Most filesystems don't implement i_op->set_acl() for POSIX ACLs.
> > So posix_acl_chmod() will report EOPNOTSUPP. By the time
> > posix_acl_chmod() is called, most filesystems will have finished
> > updating the inode. POSIX ACLs also often aren't integrated into
> > transactions so a rollback wouldn't even be possible on some
> > filesystems.
> >
> > Any filesystem that doesn't implement POSIX ACLs at all will obviously
> > never fail unless it blocks mode changes on symlinks. Or filesystems
> > that do have a way to rollback failures from posix_acl_chmod(), or
> > filesystems that do return an error on chmod() on symlinks such as 9p,
> > ntfs, ocfs2.
> >
> > >
> > > I wonder if it makes sense to add a similar error return to the system
> > > call implementation?
> >
> > Hm, blocking symlink mode changes is pretty regression prone. And just
> > blocking it through one interface seems weird and makes things even more
> > inconsistent.
> >
> > So two options I see:
> > (1) minimally invasive:
> > Filesystems that do call posix_acl_chmod() on symlinks need to be
> > changed to stop doing that.
> > (2) might hit us on the head invasive:
> > Try and block symlink mode changes in chmod_common().
> >
> > Thoughts?
> >
>
> We have third option. We can choose not to call chmod_common and return an
> error right away:
>
> diff --git a/fs/open.c b/fs/open.c
> index 39a7939f0d00..86a427a2a083 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -679,7 +679,9 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int l
> retry:
> error = user_path_at(dfd, filename, lookup_flags, &path);
> if (!error) {
> - error = chmod_common(&path, mode);
> + error = -EOPNOTSUPP;
> + if (!(flags & AT_SYMLINK_NOFOLLOW) || !S_ISLNK(path.dentry->d_inode->i_mode))
> + error = chmod_common(&path, mode);
> path_put(&path);
> if (retry_estale(error, lookup_flags)) {
> lookup_flags |= LOOKUP_REVAL;
>
> It doesn't seem to be invasive.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=77b652535528770217186589d97261847f15f862

2023-07-25 16:22:28

by David Howells

[permalink] [raw]
Subject: Add fchmodat2() - or add a more general syscall?

Rather than adding a fchmodat2() syscall, should we add a "set_file_attrs()"
syscall that takes a mask and allows you to set a bunch of stuff all in one
go? Basically, an interface to notify_change() in the kernel that would allow
several stats to be set atomically. This might be of particular interest to
network filesystems.

David


2023-07-25 17:15:03

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> On the userspace side fchmodat(3) is implemented as a wrapper
> function which implements the POSIX-specified interface. This
> interface differs from the underlying kernel system call, which does not
> have a flags argument. Most implementations require procfs [1][2].
>
> There doesn't appear to be a good userspace workaround for this issue
> but the implementation in the kernel is pretty straight-forward.
>
> The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> unlike existing fchmodat.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>
> Acked-by: Arnd Bergmann <[email protected]>
> ---
> fs/open.c | 18 ++++++++++++++----
> include/linux/syscalls.h | 2 ++
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 0c55c8e7f837..39a7939f0d00 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> return err;
> }
>
> -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)

I think it'd be much neater to do the conversion of AT_ flags here and
pass 0 as a flags argument for all of the wrappers (this is how most of
the other xyz(), fxyz(), fxyzat() syscall wrappers are done IIRC).

> {
> struct path path;
> int error;
> - unsigned int lookup_flags = LOOKUP_FOLLOW;
> +
> retry:
> error = user_path_at(dfd, filename, lookup_flags, &path);
> if (!error) {
> @@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> return error;
> }
>
> +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
> + umode_t, mode, int, flags)
> +{
> + if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
> + return -EINVAL;

We almost certainly want to support AT_EMPTY_PATH at the same time.
Otherwise userspace will still need to go through /proc when trying to
chmod a file handle they have.

> +
> + return do_fchmodat(dfd, filename, mode,
> + flags & AT_SYMLINK_NOFOLLOW ? 0 : LOOKUP_FOLLOW);
> +}
> +
> SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename,
> umode_t, mode)
> {
> - return do_fchmodat(dfd, filename, mode);
> + return do_fchmodat(dfd, filename, mode, LOOKUP_FOLLOW);
> }
>
> SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
> {
> - return do_fchmodat(AT_FDCWD, filename, mode);
> + return do_fchmodat(AT_FDCWD, filename, mode, LOOKUP_FOLLOW);
> }
>
> /*
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 584f404bf868..6e852279fbc3 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -440,6 +440,8 @@ asmlinkage long sys_chroot(const char __user *filename);
> asmlinkage long sys_fchmod(unsigned int fd, umode_t mode);
> asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
> umode_t mode);
> +asmlinkage long sys_fchmodat2(int dfd, const char __user *filename,
> + umode_t mode, int flags);
> asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user,
> gid_t group, int flag);
> asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group);
> --
> 2.33.8
>

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (3.86 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-25 17:15:49

by Aleksa Sarai

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

On 2023-07-25, David Howells <[email protected]> wrote:
> Rather than adding a fchmodat2() syscall, should we add a "set_file_attrs()"
> syscall that takes a mask and allows you to set a bunch of stuff all in one
> go? Basically, an interface to notify_change() in the kernel that would allow
> several stats to be set atomically. This might be of particular interest to
> network filesystems.

Presumably looking something like statx(2) (except hopefully with
extensible structs this time :P)? I think that could also be useful, but
given this is a fairly straight-forward syscall addition (and it also
would resolve the AT_EMPTY_PATH issue for chmod as well as simplify the
glibc wrapper), I think it makes sense to take this and we can do
set_statx(2) separately?

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (888.00 B)
signature.asc (235.00 B)
Download all attachments

2023-07-25 17:27:56

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This registers the new fchmodat2 syscall in most places as nuber 452,
> with alpha being the exception where it's 562. I found all these sites
> by grepping for fspick, which I assume has found me everything.

Shouldn't this patch be squashed with the patch that adds the syscall?
At least, that's how I've usually seen it done...

> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 1 +
> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 2 ++
> arch/ia64/kernel/syscalls/syscall.tbl | 1 +
> arch/m68k/kernel/syscalls/syscall.tbl | 1 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
> arch/parisc/kernel/syscalls/syscall.tbl | 1 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
> arch/s390/kernel/syscalls/syscall.tbl | 1 +
> arch/sh/kernel/syscalls/syscall.tbl | 1 +
> arch/sparc/kernel/syscalls/syscall.tbl | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
> include/uapi/asm-generic/unistd.h | 5 ++++-
> 19 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index 1f13995d00d7..ad37569d0507 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -491,3 +491,4 @@
> 559 common futex_waitv sys_futex_waitv
> 560 common set_mempolicy_home_node sys_ni_syscall
> 561 common cachestat sys_cachestat
> +562 common fchmodat2 sys_fchmodat2
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index 8ebed8a13874..c572d6c3dee0 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -465,3 +465,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 64a514f90131..bd77253b62e0 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -39,7 +39,7 @@
> #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
> #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
>
> -#define __NR_compat_syscalls 452
> +#define __NR_compat_syscalls 453
> #endif
>
> #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index d952a28463e0..78b68311ec81 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
> __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
> #define __NR_cachestat 451
> __SYSCALL(__NR_cachestat, sys_cachestat)
> +#define __NR_fchmodat2 452
> +__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
>
> /*
> * Please add new compat syscalls above this comment and update
> diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
> index f8c74ffeeefb..83d8609aec03 100644
> --- a/arch/ia64/kernel/syscalls/syscall.tbl
> +++ b/arch/ia64/kernel/syscalls/syscall.tbl
> @@ -372,3 +372,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> index 4f504783371f..259ceb125367 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -451,3 +451,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 858d22bf275c..a3798c2637fd 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -457,3 +457,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 1976317d4e8b..152034b8e0a0 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -390,3 +390,4 @@
> 449 n32 futex_waitv sys_futex_waitv
> 450 n32 set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 n32 cachestat sys_cachestat
> +452 n32 fchmodat2 sys_fchmodat2
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index cfda2511badf..cb5e757f6621 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -366,3 +366,4 @@
> 449 n64 futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 n64 cachestat sys_cachestat
> +452 n64 fchmodat2 sys_fchmodat2
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 7692234c3768..1a646813afdc 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -439,3 +439,4 @@
> 449 o32 futex_waitv sys_futex_waitv
> 450 o32 set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 o32 cachestat sys_cachestat
> +452 o32 fchmodat2 sys_fchmodat2
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> index a0a9145b6dd4..e97c175b56f9 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -450,3 +450,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 8c0b08b7a80e..20e50586e8a2 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -538,3 +538,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index a6935af2235c..0122cc156952 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -454,3 +454,4 @@
> 449 common futex_waitv sys_futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2 sys_fchmodat2
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index 97377e8c5025..e90d585c4d3e 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -454,3 +454,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> index faa835f3c54a..4ed06c71c43f 100644
> --- a/arch/sparc/kernel/syscalls/syscall.tbl
> +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> @@ -497,3 +497,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index bc0a3c941b35..2d0b1bd866ea 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -456,3 +456,4 @@
> 449 i386 futex_waitv sys_futex_waitv
> 450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 i386 cachestat sys_cachestat
> +452 i386 fchmodat2 sys_fchmodat2
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 227538b0ce80..814768249eae 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -373,6 +373,7 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
>
> #
> # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 2b69c3c035b6..fc1a4f3c81d9 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -422,3 +422,4 @@
> 449 common futex_waitv sys_futex_waitv
> 450 common set_mempolicy_home_node sys_set_mempolicy_home_node
> 451 common cachestat sys_cachestat
> +452 common fchmodat2 sys_fchmodat2
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index fd6c1cb585db..abe087c53b4b 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -820,8 +820,11 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
> #define __NR_cachestat 451
> __SYSCALL(__NR_cachestat, sys_cachestat)
>
> +#define __NR_fchmodat2 452
> +__SYSCALL(__NR_fchmodat2, sys_fchmodat2)
> +
> #undef __NR_syscalls
> -#define __NR_syscalls 452
> +#define __NR_syscalls 453
>
> /*
> * 32 bit systems traditionally used different
> --
> 2.33.8
>

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (10.72 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-25 17:29:07

by Florian Weimer

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

* David Howells:

> Rather than adding a fchmodat2() syscall, should we add a "set_file_attrs()"
> syscall that takes a mask and allows you to set a bunch of stuff all in one
> go? Basically, an interface to notify_change() in the kernel that would allow
> several stats to be set atomically. This might be of particular interest to
> network filesystems.

Do you mean atomically as in compare-and-swap (update only if old values
match), or just a way to update multiple file attributes with a single
system call?

Thanks,
Florian


2023-07-25 19:32:15

by Rich Felker

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

On Tue, Jul 25, 2023 at 07:39:51PM +0100, David Howells wrote:
> Florian Weimer <[email protected]> wrote:
>
> > > Rather than adding a fchmodat2() syscall, should we add a
> > > "set_file_attrs()" syscall that takes a mask and allows you to set a bunch
> > > of stuff all in one go? Basically, an interface to notify_change() in the
> > > kernel that would allow several stats to be set atomically. This might be
> > > of particular interest to network filesystems.
> >
> > Do you mean atomically as in compare-and-swap (update only if old values
> > match), or just a way to update multiple file attributes with a single
> > system call?
>
> I was thinking more in terms of the latter. AFAIK, there aren't any network
> filesystems support a CAS interface on file attributes like that. To be able
> to do a CAS operation, we'd need to pass in the old values as well as the new.
>
> Another thing we could look at is doing "create_and_set_attrs()", possibly
> allowing it to take a list of xattrs also.

Can we please not let " hey let's invent a new interface to do
something that will be hard for underlying filesystems to even provide
and that nothing needs because there's no standard API to do it" be
the enemy of "fixing a known problem implementing an existing standard
API that just requires a simple, clearly-scoped syscall to do it"?

Rich

2023-07-25 19:36:45

by David Howells

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

Florian Weimer <[email protected]> wrote:

> > Rather than adding a fchmodat2() syscall, should we add a
> > "set_file_attrs()" syscall that takes a mask and allows you to set a bunch
> > of stuff all in one go? Basically, an interface to notify_change() in the
> > kernel that would allow several stats to be set atomically. This might be
> > of particular interest to network filesystems.
>
> Do you mean atomically as in compare-and-swap (update only if old values
> match), or just a way to update multiple file attributes with a single
> system call?

I was thinking more in terms of the latter. AFAIK, there aren't any network
filesystems support a CAS interface on file attributes like that. To be able
to do a CAS operation, we'd need to pass in the old values as well as the new.

Another thing we could look at is doing "create_and_set_attrs()", possibly
allowing it to take a list of xattrs also.

David


2023-07-26 13:53:00

by Christian Brauner

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

On Tue, Jul 25, 2023 at 07:39:51PM +0100, David Howells wrote:
> Florian Weimer <[email protected]> wrote:
>
> > > Rather than adding a fchmodat2() syscall, should we add a
> > > "set_file_attrs()" syscall that takes a mask and allows you to set a bunch
> > > of stuff all in one go? Basically, an interface to notify_change() in the

That system call would likely be blocked in seccomp sandboxes completely
as seccomp cannot filter structs. I don't consider this an argument to
block new good functionality in general as that would mean arbitrarily
limiting us but it is something to keep in mind. If there's additional
benefit other than just being able to set mutliple values at once then
yeah might be something to discuss.

> > > kernel that would allow several stats to be set atomically. This might be
> > > of particular interest to network filesystems.
> >
> > Do you mean atomically as in compare-and-swap (update only if old values
> > match), or just a way to update multiple file attributes with a single
> > system call?
>
> I was thinking more in terms of the latter. AFAIK, there aren't any network
> filesystems support a CAS interface on file attributes like that. To be able
> to do a CAS operation, we'd need to pass in the old values as well as the new.
>
> Another thing we could look at is doing "create_and_set_attrs()", possibly
> allowing it to take a list of xattrs also.

That would likely require variable sized pointers in a struct which is
something we really try to stay away from. I also think it's not a good
idea to lump xattrs toegether with generic file attributes. They should
remain a separate api imho.

2023-07-26 14:10:55

by Alexey Gladkov

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Wed, Jul 26, 2023 at 02:36:25AM +1000, Aleksa Sarai wrote:
> On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> > On the userspace side fchmodat(3) is implemented as a wrapper
> > function which implements the POSIX-specified interface. This
> > interface differs from the underlying kernel system call, which does not
> > have a flags argument. Most implementations require procfs [1][2].
> >
> > There doesn't appear to be a good userspace workaround for this issue
> > but the implementation in the kernel is pretty straight-forward.
> >
> > The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > unlike existing fchmodat.
> >
> > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Alexey Gladkov <[email protected]>
> > Acked-by: Arnd Bergmann <[email protected]>
> > ---
> > fs/open.c | 18 ++++++++++++++----
> > include/linux/syscalls.h | 2 ++
> > 2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index 0c55c8e7f837..39a7939f0d00 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> > return err;
> > }
> >
> > -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
>
> I think it'd be much neater to do the conversion of AT_ flags here and
> pass 0 as a flags argument for all of the wrappers (this is how most of
> the other xyz(), fxyz(), fxyzat() syscall wrappers are done IIRC).

I just addressed the Al Viro's suggestion.

https://lore.kernel.org/lkml/[email protected]/

> > {
> > struct path path;
> > int error;
> > - unsigned int lookup_flags = LOOKUP_FOLLOW;
> > +
> > retry:
> > error = user_path_at(dfd, filename, lookup_flags, &path);
> > if (!error) {
> > @@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > return error;
> > }
> >
> > +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
> > + umode_t, mode, int, flags)
> > +{
> > + if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
> > + return -EINVAL;
>
> We almost certainly want to support AT_EMPTY_PATH at the same time.
> Otherwise userspace will still need to go through /proc when trying to
> chmod a file handle they have.

I'm not sure I understand. Can you explain what you mean?

--
Rgrds, legion


2023-07-26 18:34:40

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] Add a new fchmodat2() syscall

On Tue, Jul 11, 2023 at 06:16:02PM +0200, Alexey Gladkov wrote:
> In glibc, the fchmodat(3) function has a flags argument according to the
> POSIX specification [1], but kernel syscalls has no such argument.
> Therefore, libc implementations do workarounds using /proc. However,
> this requires procfs to be mounted and accessible.
>
> This patch set adds fchmodat2(), a new syscall. The syscall allows to
> pass the AT_SYMLINK_NOFOLLOW flag to disable LOOKUP_FOLLOW. In all other
> respects, this syscall is no different from fchmodat().
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html
>
> Changes since v3 [[email protected]]:
>
> * Rebased to master because a new syscall has appeared in master.
> * Increased __NR_compat_syscalls as pointed out by Arnd Bergmann.
> * Syscall renamed fchmodat4 -> fchmodat2 as suggested by Christian Brauner.
> * Returned do_fchmodat4() the original name. We don't need to version
> internal functions.
> * Fixed warnings found by checkpatch.pl.
>
> Changes since v2 [[email protected]]:
>
> * Rebased to master.
> * The lookup_flags passed to sys_fchmodat4 as suggested by Al Viro.
> * Selftest added.
>
> Changes since v1 [[email protected]]:
>
> * All architectures are now supported, which support squashed into a
> single patch.
> * The do_fchmodat() helper function has been removed, in favor of directly
> calling do_fchmodat4().
> * The patches are based on 5.2 instead of 5.1.

It's good to see this moving forward. I originally proposed this in a
patch submitted in 2020. I suspect implementation details have changed
since then, but it might make sense to look back at that discussion if
nobody has done so yet (apologies if this was already done and I
missed it) to make sure nothing is overlooked -- I remember there were
some subtleties with what fs backends might try to do with chmod on
symlinks. My proposed commit message also documented a lot of the
history of the issue that might be useful to have as context.

https://lore.kernel.org/all/[email protected]/T/

Rich

2023-07-27 04:15:26

by Eric Biggers

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

On Tue, Jul 25, 2023 at 04:58:34PM +0100, David Howells wrote:
> Rather than adding a fchmodat2() syscall, should we add a "set_file_attrs()"
> syscall that takes a mask and allows you to set a bunch of stuff all in one
> go? Basically, an interface to notify_change() in the kernel that would allow
> several stats to be set atomically. This might be of particular interest to
> network filesystems.
>
> David
>

fchmodat2() is a simple addition that fits well with the existing syscalls.
It fixes an oversight in fchmodat().

IMO we should just add fchmodat2(), and not get sidetracked by trying to add
some super-generalized syscall instead. That can always be done later.

- Eric

2023-07-27 09:26:29

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4 2/5] fs: Add fchmodat2()

From: Aleksa Sarai
> Sent: 25 July 2023 17:36
...
> We almost certainly want to support AT_EMPTY_PATH at the same time.
> Otherwise userspace will still need to go through /proc when trying to
> chmod a file handle they have.

That can't be allowed.

Just because a process has a file open and write access to
the directory that contains it doesn't mean they are allowed
to change the file permissions.

They also need directory search access from a directory
they have open through to the containing directory.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2023-07-27 10:32:08

by Christian Brauner

[permalink] [raw]
Subject: Re: Add fchmodat2() - or add a more general syscall?

On Wed, Jul 26, 2023 at 08:57:10PM -0700, Eric Biggers wrote:
> On Tue, Jul 25, 2023 at 04:58:34PM +0100, David Howells wrote:
> > Rather than adding a fchmodat2() syscall, should we add a "set_file_attrs()"
> > syscall that takes a mask and allows you to set a bunch of stuff all in one
> > go? Basically, an interface to notify_change() in the kernel that would allow
> > several stats to be set atomically. This might be of particular interest to
> > network filesystems.
> >
> > David
> >
>
> fchmodat2() is a simple addition that fits well with the existing syscalls.
> It fixes an oversight in fchmodat().
>
> IMO we should just add fchmodat2(), and not get sidetracked by trying to add
> some super-generalized syscall instead. That can always be done later.

Agreed.

2023-07-27 10:39:55

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

> > I think it'd be much neater to do the conversion of AT_ flags here and
> > pass 0 as a flags argument for all of the wrappers (this is how most of
> > the other xyz(), fxyz(), fxyzat() syscall wrappers are done IIRC).

I've fixed that up in-tree.

2023-07-27 11:04:34

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

On Wed, Jul 26, 2023 at 02:43:41AM +1000, Aleksa Sarai wrote:
> On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> > From: Palmer Dabbelt <[email protected]>
> >
> > This registers the new fchmodat2 syscall in most places as nuber 452,
> > with alpha being the exception where it's 562. I found all these sites
> > by grepping for fspick, which I assume has found me everything.
>
> Shouldn't this patch be squashed with the patch that adds the syscall?
> At least, that's how I've usually seen it done...

Depends. Iirc, someone said they'd prefer for doing it in one patch
in some circumstances on some system call we added years ago. But otoh,
having the syscall wiring done separately makes it easy for arch
maintainers to ack only the wiring up part. Both ways are valid imho.
(cachestat() did it for x86 and then all the others separately. So
really it seems a bit all over the place depending on the scenario.)

2023-07-27 17:19:12

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Thu, Jul 27, 2023 at 06:28:53PM +0200, Andreas Schwab wrote:
> On Jul 27 2023, David Laight wrote:
>
> > From: Aleksa Sarai
> >> Sent: 25 July 2023 17:36
> > ...
> >> We almost certainly want to support AT_EMPTY_PATH at the same time.
> >> Otherwise userspace will still need to go through /proc when trying to
> >> chmod a file handle they have.
> >
> > That can't be allowed.
>
> IIUC, fchmodat2(fd, "", m, AT_EMPTY_PATH) is equivalent to fchmod(fd,
> m). With that, new architectures only need to implement the fchmodat2
> syscall to cover all chmod variants.

There's a difference though as fchmod() doesn't work with O_PATH file
descriptors while AT_EMPTY_PATH does. Similar to how fchown() doesn't
work with O_PATH file descriptors.

However, we do allow AT_EMPTY_PATH with fchownat() so there's no reason
to not allow it for fchmodat2().

But it's a bit of a shame that O_PATH looks less and less like O_PATH.
It came from can-do-barely-anything to can-do-quite-a-lot-of-things over
the years.

In any case, AT_EMPTY_PATH for fchmodat2() can be an additional patch on
top.

2023-07-27 17:19:49

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Thu, Jul 27, 2023 at 09:01:06AM +0000, David Laight wrote:
> From: Aleksa Sarai
> > Sent: 25 July 2023 17:36
> ....
> > We almost certainly want to support AT_EMPTY_PATH at the same time.
> > Otherwise userspace will still need to go through /proc when trying to
> > chmod a file handle they have.
>
> That can't be allowed.
>
> Just because a process has a file open and write access to
> the directory that contains it doesn't mean they are allowed
> to change the file permissions.
>
> They also need directory search access from a directory
> they have open through to the containing directory.

Am I missing something? How is this different from fchmod?

Rich

2023-07-27 17:30:00

by Andreas Schwab

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Jul 27 2023, David Laight wrote:

> From: Aleksa Sarai
>> Sent: 25 July 2023 17:36
> ...
>> We almost certainly want to support AT_EMPTY_PATH at the same time.
>> Otherwise userspace will still need to go through /proc when trying to
>> chmod a file handle they have.
>
> That can't be allowed.

IIUC, fchmodat2(fd, "", m, AT_EMPTY_PATH) is equivalent to fchmod(fd,
m). With that, new architectures only need to implement the fchmodat2
syscall to cover all chmod variants.

--
Andreas Schwab, [email protected]
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

2023-07-27 17:30:36

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On 2023-07-26, Alexey Gladkov <[email protected]> wrote:
> On Wed, Jul 26, 2023 at 02:36:25AM +1000, Aleksa Sarai wrote:
> > On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> > > On the userspace side fchmodat(3) is implemented as a wrapper
> > > function which implements the POSIX-specified interface. This
> > > interface differs from the underlying kernel system call, which does not
> > > have a flags argument. Most implementations require procfs [1][2].
> > >
> > > There doesn't appear to be a good userspace workaround for this issue
> > > but the implementation in the kernel is pretty straight-forward.
> > >
> > > The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > > unlike existing fchmodat.
> > >
> > > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > > [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> > >
> > > Co-developed-by: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > Acked-by: Arnd Bergmann <[email protected]>
> > > ---
> > > fs/open.c | 18 ++++++++++++++----
> > > include/linux/syscalls.h | 2 ++
> > > 2 files changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/open.c b/fs/open.c
> > > index 0c55c8e7f837..39a7939f0d00 100644
> > > --- a/fs/open.c
> > > +++ b/fs/open.c
> > > @@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> > > return err;
> > > }
> > >
> > > -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > > +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
> >
> > I think it'd be much neater to do the conversion of AT_ flags here and
> > pass 0 as a flags argument for all of the wrappers (this is how most of
> > the other xyz(), fxyz(), fxyzat() syscall wrappers are done IIRC).
>
> I just addressed the Al Viro's suggestion.
>
> https://lore.kernel.org/lkml/[email protected]/

I think Al misspoke, because he also said "pass it 0 as an extra
argument", but you actually have to pass LOOKUP_FOLLOW from the
wrappers. If you look at how faccessat2 and faccessat are implemented,
it follows the behaviour I described.

> > > {
> > > struct path path;
> > > int error;
> > > - unsigned int lookup_flags = LOOKUP_FOLLOW;
> > > +
> > > retry:
> > > error = user_path_at(dfd, filename, lookup_flags, &path);
> > > if (!error) {
> > > @@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > > return error;
> > > }
> > >
> > > +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
> > > + umode_t, mode, int, flags)
> > > +{
> > > + if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
> > > + return -EINVAL;
> >
> > We almost certainly want to support AT_EMPTY_PATH at the same time.
> > Otherwise userspace will still need to go through /proc when trying to
> > chmod a file handle they have.
>
> I'm not sure I understand. Can you explain what you mean?

You should add support for AT_EMPTY_PATH (LOOKUP_EMPTY) as well as
AT_SYMLINK_NOFOLLOW. It would only require something like:

unsigned int lookup_flags = LOOKUP_FOLLOW;

if (flags & ~(AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW))
return -EINVAL;

if (flags & AT_EMPTY_PATH)
lookup_flags |= LOOKUP_EMPTY;
if (flags & AT_SYMLINK_NOFOLLOW)
lookup_flags &= ~LOOKUP_FOLLOW;

/* ... */

This would be effectively equivalent to fchmod(fd, mode). (I was wrong
when I said this wasn't already possible -- I forgot about fchmod(2).)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (3.91 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-27 17:33:47

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Thu, Jul 27, 2023 at 07:02:53PM +0200, Christian Brauner wrote:
> On Thu, Jul 27, 2023 at 06:28:53PM +0200, Andreas Schwab wrote:
> > On Jul 27 2023, David Laight wrote:
> >
> > > From: Aleksa Sarai
> > >> Sent: 25 July 2023 17:36
> > > ...
> > >> We almost certainly want to support AT_EMPTY_PATH at the same time.
> > >> Otherwise userspace will still need to go through /proc when trying to
> > >> chmod a file handle they have.
> > >
> > > That can't be allowed.
> >
> > IIUC, fchmodat2(fd, "", m, AT_EMPTY_PATH) is equivalent to fchmod(fd,
> > m). With that, new architectures only need to implement the fchmodat2
> > syscall to cover all chmod variants.
>
> There's a difference though as fchmod() doesn't work with O_PATH file
> descriptors while AT_EMPTY_PATH does. Similar to how fchown() doesn't
> work with O_PATH file descriptors.
>
> However, we do allow AT_EMPTY_PATH with fchownat() so there's no reason
> to not allow it for fchmodat2().
>
> But it's a bit of a shame that O_PATH looks less and less like O_PATH.
> It came from can-do-barely-anything to can-do-quite-a-lot-of-things over
> the years.
>
> In any case, AT_EMPTY_PATH for fchmodat2() can be an additional patch on
> top.

From a standpoint of implementing O_SEARCH/O_EXEC using it, I don't
see any reason fchown/fchmod should not work on O_PATH file
descriptors. And indeed when you have procfs available to emulate them
via procfs, it already does. So I don't see this as unwanted
functionality or an access control regression. I see it as things
behaving as expected.

Semantically, O_PATH is a reference to the inode, not to the dirent.
So there is no reason you should not be able to do things that need
permission to the inode (changing permissions on it) rather than to
the dirent (renaming/moving).

Rich

2023-07-27 17:43:38

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Thu, Jul 27, 2023 at 01:13:37PM -0400, [email protected] wrote:
> On Thu, Jul 27, 2023 at 07:02:53PM +0200, Christian Brauner wrote:
> > On Thu, Jul 27, 2023 at 06:28:53PM +0200, Andreas Schwab wrote:
> > > On Jul 27 2023, David Laight wrote:
> > >
> > > > From: Aleksa Sarai
> > > >> Sent: 25 July 2023 17:36
> > > > ...
> > > >> We almost certainly want to support AT_EMPTY_PATH at the same time.
> > > >> Otherwise userspace will still need to go through /proc when trying to
> > > >> chmod a file handle they have.
> > > >
> > > > That can't be allowed.
> > >
> > > IIUC, fchmodat2(fd, "", m, AT_EMPTY_PATH) is equivalent to fchmod(fd,
> > > m). With that, new architectures only need to implement the fchmodat2
> > > syscall to cover all chmod variants.
> >
> > There's a difference though as fchmod() doesn't work with O_PATH file
> > descriptors while AT_EMPTY_PATH does. Similar to how fchown() doesn't
> > work with O_PATH file descriptors.
> >
> > However, we do allow AT_EMPTY_PATH with fchownat() so there's no reason
> > to not allow it for fchmodat2().
> >
> > But it's a bit of a shame that O_PATH looks less and less like O_PATH.
> > It came from can-do-barely-anything to can-do-quite-a-lot-of-things over
> > the years.
> >
> > In any case, AT_EMPTY_PATH for fchmodat2() can be an additional patch on
> > top.
>
> From a standpoint of implementing O_SEARCH/O_EXEC using it, I don't
> see any reason fchown/fchmod should not work on O_PATH file
> descriptors. And indeed when you have procfs available to emulate them
> via procfs, it already does. So I don't see this as unwanted

I'm really not talking about the fact that proc is a giant loophole for
basically everyhing related to O_PATH and reopening fds.

I'm saying that both fchmod() and fchown() don't work on O_PATH fds.
They explicitly reject them.

AT_EMPTY_PATH and therefore O_PATH for fchmodat2() is fine given that we
do it for fchownat() already.

2023-07-27 18:04:02

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] arch: Register fchmodat2, usually as syscall 452

On 2023-07-27, Christian Brauner <[email protected]> wrote:
> On Wed, Jul 26, 2023 at 02:43:41AM +1000, Aleksa Sarai wrote:
> > On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> > > From: Palmer Dabbelt <[email protected]>
> > >
> > > This registers the new fchmodat2 syscall in most places as nuber 452,
> > > with alpha being the exception where it's 562. I found all these sites
> > > by grepping for fspick, which I assume has found me everything.
> >
> > Shouldn't this patch be squashed with the patch that adds the syscall?
> > At least, that's how I've usually seen it done...
>
> Depends. Iirc, someone said they'd prefer for doing it in one patch
> in some circumstances on some system call we added years ago. But otoh,
> having the syscall wiring done separately makes it easy for arch
> maintainers to ack only the wiring up part. Both ways are valid imho.
> (cachestat() did it for x86 and then all the others separately. So
> really it seems a bit all over the place depending on the scenario.)

Fair enough!

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (1.13 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-27 18:33:13

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On 2023-07-28, Aleksa Sarai <[email protected]> wrote:
> On 2023-07-26, Alexey Gladkov <[email protected]> wrote:
> > On Wed, Jul 26, 2023 at 02:36:25AM +1000, Aleksa Sarai wrote:
> > > On 2023-07-11, Alexey Gladkov <[email protected]> wrote:
> > > > On the userspace side fchmodat(3) is implemented as a wrapper
> > > > function which implements the POSIX-specified interface. This
> > > > interface differs from the underlying kernel system call, which does not
> > > > have a flags argument. Most implementations require procfs [1][2].
> > > >
> > > > There doesn't appear to be a good userspace workaround for this issue
> > > > but the implementation in the kernel is pretty straight-forward.
> > > >
> > > > The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
> > > > unlike existing fchmodat.
> > > >
> > > > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
> > > > [2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
> > > >
> > > > Co-developed-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > > Acked-by: Arnd Bergmann <[email protected]>
> > > > ---
> > > > fs/open.c | 18 ++++++++++++++----
> > > > include/linux/syscalls.h | 2 ++
> > > > 2 files changed, 16 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/open.c b/fs/open.c
> > > > index 0c55c8e7f837..39a7939f0d00 100644
> > > > --- a/fs/open.c
> > > > +++ b/fs/open.c
> > > > @@ -671,11 +671,11 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
> > > > return err;
> > > > }
> > > >
> > > > -static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > > > +static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int lookup_flags)
> > >
> > > I think it'd be much neater to do the conversion of AT_ flags here and
> > > pass 0 as a flags argument for all of the wrappers (this is how most of
> > > the other xyz(), fxyz(), fxyzat() syscall wrappers are done IIRC).
> >
> > I just addressed the Al Viro's suggestion.
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> I think Al misspoke, because he also said "pass it 0 as an extra
> argument", but you actually have to pass LOOKUP_FOLLOW from the
> wrappers. If you look at how faccessat2 and faccessat are implemented,
> it follows the behaviour I described.
>
> > > > {
> > > > struct path path;
> > > > int error;
> > > > - unsigned int lookup_flags = LOOKUP_FOLLOW;
> > > > +
> > > > retry:
> > > > error = user_path_at(dfd, filename, lookup_flags, &path);
> > > > if (!error) {
> > > > @@ -689,15 +689,25 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode)
> > > > return error;
> > > > }
> > > >
> > > > +SYSCALL_DEFINE4(fchmodat2, int, dfd, const char __user *, filename,
> > > > + umode_t, mode, int, flags)
> > > > +{
> > > > + if (unlikely(flags & ~AT_SYMLINK_NOFOLLOW))
> > > > + return -EINVAL;
> > >
> > > We almost certainly want to support AT_EMPTY_PATH at the same time.
> > > Otherwise userspace will still need to go through /proc when trying to
> > > chmod a file handle they have.
> >
> > I'm not sure I understand. Can you explain what you mean?
>
> You should add support for AT_EMPTY_PATH (LOOKUP_EMPTY) as well as
> AT_SYMLINK_NOFOLLOW. It would only require something like:
>
> unsigned int lookup_flags = LOOKUP_FOLLOW;
>
> if (flags & ~(AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW))
> return -EINVAL;
>
> if (flags & AT_EMPTY_PATH)
> lookup_flags |= LOOKUP_EMPTY;
> if (flags & AT_SYMLINK_NOFOLLOW)
> lookup_flags &= ~LOOKUP_FOLLOW;
>
> /* ... */
>
> This would be effectively equivalent to fchmod(fd, mode). (I was wrong
> when I said this wasn't already possible -- I forgot about fchmod(2).)

... with the exception (as Christian mentioned) of O_PATH descriptors.
However, there are two counter-points to this:

* fchownat(AT_EMPTY_PATH) exists but fchown() doesn't work on O_PATH
descriptors *by design* (according to open(2)).
* chmod(/proc/self/fd/$n) works on O_PATH descriptors, meaning this
behaviour is already allowed and all that AT_EMPTY_PATH would do is
allow programs to avoid depending on procfs for this.

FWIW, I agree with Christian that these behaviours are not ideal (and
I'm working on a series that might allow for these things to be properly
blocked in the future) but there's also the consistency argument -- I
don't think fchownat() is much safer to allow in this way than
fchmodat() and (again) this behaviour is already possible through
procfs.

Ultimately, we can always add AT_EMPTY_PATH later. It just seemed like
an obvious omission to me that would be easy to resolve.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (5.05 kB)
signature.asc (235.00 B)
Download all attachments

2023-07-28 09:49:37

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v4 2/5] fs: Add fchmodat2()

...
> FWIW, I agree with Christian that these behaviours are not ideal (and
> I'm working on a series that might allow for these things to be properly
> blocked in the future) but there's also the consistency argument -- I
> don't think fchownat() is much safer to allow in this way than
> fchmodat() and (again) this behaviour is already possible through
> procfs.

If the 'through procfs' involves readlink("/proc/self/fd/n") and
accessing through the returned path then the permission checks
are different.
Using the returned path requires search permissions on all the
directories.

This is all fine for xxxat() functions where a real open
directory fd is supplied.
But other cases definitely need a lot of thought to ensure
they don't let programs acquire permissions they aren't
supposed to have.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2023-07-28 19:56:37

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] fs: Add fchmodat2()

On Fri, Jul 28, 2023 at 08:43:58AM +0000, David Laight wrote:
> ....
> > FWIW, I agree with Christian that these behaviours are not ideal (and
> > I'm working on a series that might allow for these things to be properly
> > blocked in the future) but there's also the consistency argument -- I
> > don't think fchownat() is much safer to allow in this way than
> > fchmodat() and (again) this behaviour is already possible through
> > procfs.
>
> If the 'through procfs' involves readlink("/proc/self/fd/n") and
> accessing through the returned path then the permission checks
> are different.
> Using the returned path requires search permissions on all the
> directories.

That's *not* how "through procfs" works. The "magic symlinks" in
/proc/*/fd are not actual symlinks that get dereferenced to the
contents they readlink() to, but special-type objects that dereference
directly to the underlying file associated with the open file
description.

Rich