From: Alistair Francis <[email protected]>
This commit adds two futex syscall wrappers that are exposed to
userspace.
Neither the kernel or glibc currently expose a futex wrapper, so
userspace is left performing raw syscalls. This has mostly been becuase
the overloading of one of the arguments makes it impossible to provide a
single type safe function.
Until recently the single syscall has worked fine. With the introduction
of a 64-bit time_t futex call on 32-bit architectures, this has become
more complex. The logic of handling the two possible futex syscalls is
complex and often implemented incorrectly.
This patch adds two futux syscall functions that correctly handle the
time_t complexity for userspace.
This idea is based on previous discussions: https://lkml.org/lkml/2021/9/21/143
Signed-off-by: Alistair Francis <[email protected]>
---
include/uapi/linux/futex_syscall.h | 79 ++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
create mode 100644 include/uapi/linux/futex_syscall.h
diff --git a/include/uapi/linux/futex_syscall.h b/include/uapi/linux/futex_syscall.h
new file mode 100644
index 0000000000000..039d371346159
--- /dev/null
+++ b/include/uapi/linux/futex_syscall.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_FUTEX_SYSCALL_H
+#define _UAPI_LINUX_FUTEX_SYSCALL_H
+
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/time_types.h>
+
+/**
+ * futex_syscall_timeout() - __NR_futex/__NR_futex_time64 syscall wrapper
+ * @uaddr: address of first futex
+ * @op: futex op code
+ * @val: typically expected value of uaddr, but varies by op
+ * @timeout: an absolute struct timespec
+ * @uaddr2: address of second futex for some ops
+ * @val3: varies by op
+ */
+static inline int
+futex_syscall_timeout(volatile u_int32_t *uaddr, int op, u_int32_t val,
+ struct timespec *timeout, volatile u_int32_t *uaddr2, int val3)
+{
+#if defined(__NR_futex_time64)
+ if (sizeof(*timeout) != sizeof(struct __kernel_old_timespec)) {
+ int ret = syscall(__NR_futex_time64, uaddr, op, val, timeout, uaddr2, val3);
+
+ if (ret == 0 || errno != ENOSYS)
+ return ret;
+ }
+#endif
+
+#if defined(__NR_futex)
+ if (sizeof(*timeout) == sizeof(struct __kernel_old_timespec))
+ return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
+
+ if (timeout && timeout->tv_sec == (long)timeout->tv_sec) {
+ struct __kernel_old_timespec ts32;
+
+ ts32.tv_sec = (__kernel_long_t) timeout->tv_sec;
+ ts32.tv_nsec = (__kernel_long_t) timeout->tv_nsec;
+
+ return syscall(__NR_futex, uaddr, op, val, ts32, uaddr2, val3);
+ } else if (!timeout) {
+ return syscall(__NR_futex, uaddr, op, val, NULL, uaddr2, val3);
+ }
+#endif
+
+ errno = ENOSYS;
+ return -1;
+}
+
+/**
+ * futex_syscall_nr_requeue() - __NR_futex/__NR_futex_time64 syscall wrapper
+ * @uaddr: address of first futex
+ * @op: futex op code
+ * @val: typically expected value of uaddr, but varies by op
+ * @nr_requeue: an op specific meaning
+ * @uaddr2: address of second futex for some ops
+ * @val3: varies by op
+ */
+static inline int
+futex_syscall_nr_requeue(volatile u_int32_t *uaddr, int op, u_int32_t val,
+ u_int32_t nr_requeue, volatile u_int32_t *uaddr2, int val3)
+{
+#if defined(__NR_futex_time64)
+ int ret = syscall(__NR_futex_time64, uaddr, op, val, nr_requeue, uaddr2, val3);
+
+ if (ret == 0 || errno != ENOSYS)
+ return ret;
+#endif
+
+#if defined(__NR_futex)
+ return syscall(__NR_futex, uaddr, op, val, nr_requeue, uaddr2, val3);
+#endif
+
+ errno = ENOSYS;
+ return -1;
+}
+
+#endif /* _UAPI_LINUX_FUTEX_SYSCALL_H */
--
2.31.1
On Fri, Oct 15, 2021 at 2:59 AM Alistair Francis
<[email protected]> wrote:
>
> From: Alistair Francis <[email protected]>
>
> This commit adds two futex syscall wrappers that are exposed to
> userspace.
>
> Neither the kernel or glibc currently expose a futex wrapper, so
> userspace is left performing raw syscalls. This has mostly been becuase
> the overloading of one of the arguments makes it impossible to provide a
> single type safe function.
>
> Until recently the single syscall has worked fine. With the introduction
> of a 64-bit time_t futex call on 32-bit architectures, this has become
> more complex. The logic of handling the two possible futex syscalls is
> complex and often implemented incorrectly.
>
> This patch adds two futux syscall functions that correctly handle the
> time_t complexity for userspace.
>
> This idea is based on previous discussions: https://lkml.org/lkml/2021/9/21/143
>
> Signed-off-by: Alistair Francis <[email protected]>
Thanks a lot for taking this on!
The implementation looks good to me, I've just noted some details below.
The big question here is the exact one that has kept the wrappers out of
glibc and musl for now: What should the function prototype(s) be?
I see you went with having two distinct function names for the two
sets of argument types, which I think was what the glibc developers
were leaning towards.
I think it would be nicer to use a transparent union like
typedef union __attribute__((__transparent_inion__)) {
struct timespec *timeout;
u_int32_t nr_requeue;
} __futex_arg4;
which would let us provide a single function for both variants.
The main downside is that this relies on a GCC extension, but I
don't expect that to be a problem since any code using this is
already nonportable.
The other question here is what namespace to use. You went for the
non-prefixed futex_syscall, which is clearly nicer to use in an application,
but it might clash with libc providing the same function name in the
future. We could consider using __kernel_futex() as the function name,
putting it into a private namespace for the kernel headers that could
then get wrapped again by an application or libc using it.
> +#include <errno.h>
> +#include <linux/types.h>
> +#include <linux/time_types.h>
I wonder if this should also #include <asm/unistd.h> and
<sys/syscall.h>, or if you
can leave that to the caller to include either <asm/unistd.h.> or <unistd.h>.
> +/**
> + * futex_syscall_timeout() - __NR_futex/__NR_futex_time64 syscall wrapper
> + * @uaddr: address of first futex
> + * @op: futex op code
> + * @val: typically expected value of uaddr, but varies by op
> + * @timeout: an absolute struct timespec
> + * @uaddr2: address of second futex for some ops
> + * @val3: varies by op
> + */
> +static inline int
> +futex_syscall_timeout(volatile u_int32_t *uaddr, int op, u_int32_t val,
> + struct timespec *timeout, volatile u_int32_t *uaddr2, int val3)
> +{
> +#if defined(__NR_futex_time64)
> + if (sizeof(*timeout) != sizeof(struct __kernel_old_timespec)) {
> + int ret = syscall(__NR_futex_time64, uaddr, op, val, timeout, uaddr2, val3);
> +
> + if (ret == 0 || errno != ENOSYS)
> + return ret;
> + }
> +#endif
> +
> +#if defined(__NR_futex)
> + if (sizeof(*timeout) == sizeof(struct __kernel_old_timespec))
> + return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
> +
> + if (timeout && timeout->tv_sec == (long)timeout->tv_sec) {
Not sure if we actually need to reject timeouts that overflow, should probably
check what glibc does for this case. Intuitively, I'd go with setting
tv_sec=LONG_MAX.
> + struct __kernel_old_timespec ts32;
> +
> + ts32.tv_sec = (__kernel_long_t) timeout->tv_sec;
> + ts32.tv_nsec = (__kernel_long_t) timeout->tv_nsec;
> +
> + return syscall(__NR_futex, uaddr, op, val, ts32, uaddr2, val3);
You need to pass 'ts32' by reference here.
Arnd
Hi Alistair,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linux/master]
[also build test WARNING on soc/for-next linus/master v5.15-rc5 next-20211013]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alistair-Francis/uapi-futex-Add-a-futex-syscall/20211015-090054
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
config: x86_64-randconfig-r015-20211014 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project acb3b187c4c88650a6a717a1bcb234d27d0d7f54)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/38a003698424c55ceb42e9be69f47014a6b4f6bd
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alistair-Francis/uapi-futex-Add-a-futex-syscall/20211015-090054
git checkout 38a003698424c55ceb42e9be69f47014a6b4f6bd
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
In file included from <built-in>:1:
./usr/include/linux/futex_syscall.h:19:36: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
futex_syscall_timeout(__volatile__ u_int32_t *uaddr, int op, u_int32_t val,
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
./usr/include/linux/futex_syscall.h:19:62: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
futex_syscall_timeout(__volatile__ u_int32_t *uaddr, int op, u_int32_t val,
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
>> ./usr/include/linux/futex_syscall.h:20:16: warning: declaration of 'struct timespec' will not be visible outside of this function [-Wvisibility]
struct timespec *timeout, __volatile__ u_int32_t *uaddr2, int val3)
^
./usr/include/linux/futex_syscall.h:20:48: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
struct timespec *timeout, __volatile__ u_int32_t *uaddr2, int val3)
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
./usr/include/linux/futex_syscall.h:61:39: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
futex_syscall_nr_requeue(__volatile__ u_int32_t *uaddr, int op, u_int32_t val,
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
./usr/include/linux/futex_syscall.h:61:65: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
futex_syscall_nr_requeue(__volatile__ u_int32_t *uaddr, int op, u_int32_t val,
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
./usr/include/linux/futex_syscall.h:62:5: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
u_int32_t nr_requeue, __volatile__ u_int32_t *uaddr2, int val3)
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
./usr/include/linux/futex_syscall.h:62:40: error: unknown type name 'u_int32_t'; did you mean '__int128_t'?
u_int32_t nr_requeue, __volatile__ u_int32_t *uaddr2, int val3)
^~~~~~~~~
__int128_t
note: '__int128_t' declared here
1 warning and 7 errors generated.
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Fri, Oct 15, 2021 at 6:24 PM Arnd Bergmann <[email protected]> wrote:
>
> On Fri, Oct 15, 2021 at 2:59 AM Alistair Francis
> <[email protected]> wrote:
> >
> > From: Alistair Francis <[email protected]>
> >
> > This commit adds two futex syscall wrappers that are exposed to
> > userspace.
> >
> > Neither the kernel or glibc currently expose a futex wrapper, so
> > userspace is left performing raw syscalls. This has mostly been becuase
> > the overloading of one of the arguments makes it impossible to provide a
> > single type safe function.
> >
> > Until recently the single syscall has worked fine. With the introduction
> > of a 64-bit time_t futex call on 32-bit architectures, this has become
> > more complex. The logic of handling the two possible futex syscalls is
> > complex and often implemented incorrectly.
> >
> > This patch adds two futux syscall functions that correctly handle the
> > time_t complexity for userspace.
> >
> > This idea is based on previous discussions: https://lkml.org/lkml/2021/9/21/143
> >
> > Signed-off-by: Alistair Francis <[email protected]>
>
> Thanks a lot for taking this on!
No worries!
>
> The implementation looks good to me, I've just noted some details below.
>
> The big question here is the exact one that has kept the wrappers out of
> glibc and musl for now: What should the function prototype(s) be?
>
> I see you went with having two distinct function names for the two
> sets of argument types, which I think was what the glibc developers
> were leaning towards.
>
> I think it would be nicer to use a transparent union like
>
> typedef union __attribute__((__transparent_inion__)) {
> struct timespec *timeout;
> u_int32_t nr_requeue;
> } __futex_arg4;
>
> which would let us provide a single function for both variants.
> The main downside is that this relies on a GCC extension, but I
> don't expect that to be a problem since any code using this is
> already nonportable.
I don't love this. Relying on a GCC extension doesn't seem like a great idea.
As well as that we need to split the types out to check the timeout
value, and I'm not sure of how we could do that. We could have a
switch case on the futex op, but that seems likely to get out of sync.
>
> The other question here is what namespace to use. You went for the
> non-prefixed futex_syscall, which is clearly nicer to use in an application,
> but it might clash with libc providing the same function name in the
> future. We could consider using __kernel_futex() as the function name,
> putting it into a private namespace for the kernel headers that could
> then get wrapped again by an application or libc using it.
Sounds like a good idea, I'll update this.
>
> > +#include <errno.h>
> > +#include <linux/types.h>
> > +#include <linux/time_types.h>
>
> I wonder if this should also #include <asm/unistd.h> and
> <sys/syscall.h>, or if you
> can leave that to the caller to include either <asm/unistd.h.> or <unistd.h>.
I think the kernel test bot answered this question :)
I have added both.
>
> > +/**
> > + * futex_syscall_timeout() - __NR_futex/__NR_futex_time64 syscall wrapper
> > + * @uaddr: address of first futex
> > + * @op: futex op code
> > + * @val: typically expected value of uaddr, but varies by op
> > + * @timeout: an absolute struct timespec
> > + * @uaddr2: address of second futex for some ops
> > + * @val3: varies by op
> > + */
> > +static inline int
> > +futex_syscall_timeout(volatile u_int32_t *uaddr, int op, u_int32_t val,
> > + struct timespec *timeout, volatile u_int32_t *uaddr2, int val3)
> > +{
> > +#if defined(__NR_futex_time64)
> > + if (sizeof(*timeout) != sizeof(struct __kernel_old_timespec)) {
> > + int ret = syscall(__NR_futex_time64, uaddr, op, val, timeout, uaddr2, val3);
> > +
> > + if (ret == 0 || errno != ENOSYS)
> > + return ret;
> > + }
> > +#endif
> > +
> > +#if defined(__NR_futex)
> > + if (sizeof(*timeout) == sizeof(struct __kernel_old_timespec))
> > + return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
> > +
> > + if (timeout && timeout->tv_sec == (long)timeout->tv_sec) {
>
> Not sure if we actually need to reject timeouts that overflow, should probably
> check what glibc does for this case. Intuitively, I'd go with setting
> tv_sec=LONG_MAX.
gblic generally rejects these with EOVERFLOW, see:
https://github.com/bminor/glibc/blob/595c22ecd8e87a27fd19270ed30fdbae9ad25426/sysdeps/unix/sysv/linux/clock_settime.c#L46
https://github.com/bminor/glibc/blob/595c22ecd8e87a27fd19270ed30fdbae9ad25426/sysdeps/unix/sysv/linux/clock_adjtime.c#L40
>
> > + struct __kernel_old_timespec ts32;
> > +
> > + ts32.tv_sec = (__kernel_long_t) timeout->tv_sec;
> > + ts32.tv_nsec = (__kernel_long_t) timeout->tv_nsec;
> > +
> > + return syscall(__NR_futex, uaddr, op, val, ts32, uaddr2, val3);
>
> You need to pass 'ts32' by reference here.
Thanks! Fixed.
Alistair
>
> Arnd
On Mon, Oct 18, 2021 at 5:41 AM Alistair Francis <[email protected]> wrote:
(sorry I missed this part of your reply earlier, just saw the new version of the
patch and wondered what had happened to my feedback)
> On Fri, Oct 15, 2021 at 6:24 PM Arnd Bergmann <[email protected]> wrote:
> > On Fri, Oct 15, 2021 at 2:59 AM Alistair Francis
> >
> > I think it would be nicer to use a transparent union like
> >
> > typedef union __attribute__((__transparent_inion__)) {
> > struct timespec *timeout;
> > u_int32_t nr_requeue;
> > } __futex_arg4;
> >
> > which would let us provide a single function for both variants.
> > The main downside is that this relies on a GCC extension, but I
> > don't expect that to be a problem since any code using this is
> > already nonportable.
>
> I don't love this. Relying on a GCC extension doesn't seem like a great idea.
I see that even glibc uses __transparent_union__, though they have
a fallback for incompatible compilers in there, which we could
duplicate if necessary. I would expect that the kernel headers already
have a number of gcc extensions in them elsewhere.
> As well as that we need to split the types out to check the timeout
> value, and I'm not sure of how we could do that. We could have a
> switch case on the futex op, but that seems likely to get out of sync.
This is a good point. It would be ok if we could guarantee that
no new futex operations are added in the future and all new work
is on futex2(), but we probably don't want to make that a hard
requirement.
Another option would be to add the new wrappers into the
existing uapi/linux/futex.h header instead of a new header.
This is also not great, but it would have the advantage of
having the full list of operations in the same file.
Arnd