2020-06-12 18:53:37

by André Almeida

[permalink] [raw]
Subject: [RFC 0/4] futex2: Add new futex interface

Hello,

This RFC is a followup to the previous discussion initiated from my last
patch "futex: Implement mechanism to wait on any of several futexes"[1].
As stated in the thread, the correct approach to move forward with the
wait multiple operation would be to create a new syscall that would have
all new cool features.

The first patch adds the new interface and just translate the call for
the old interface, without implementing new features. The goal here is
to establish the interface and to check if everyone is happy with this
API. The rest of patches are selftests to show the interface in action.
I have the following questions:

- Has anyone stared worked on a implementation of this interface? If
yes, it would be nice to share the progress so we don't have duplicated
work.

- What suggestions do you have to implement this? Start from scratch or
reuse the most code possible?

- The interface seems correct and implements the requirements asked by you?

- The proposed interface uses ktime_t type for absolute timeout, and I
assumed that it should use values in a nsec resolution. If this is true,
we have some problems with i386 ABI, please check out the
COMPAT_32BIT_TIME implementation in patch 1 for more details. I
haven't added a time64 implementation yet, until this is clarified.

- Is expected to have a x32 ABI implementation as well? In the case of
wait and wake, we could use the same as x86_64 ABI. However, for the
waitv (aka wait on multiple futexes) we would need a proper x32 entry
since we are dealing with 32bit pointers.

Those are the cool new features that this syscall should address some
day:

- Operate with variable bit size futexes, not restricted to 32:
8, 16 and 64

- Wait on multiple futexes, using the following semantics:

struct futex_wait {
void *uaddr;
unsigned long val;
unsigned long flags;
};

sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
unsigned long flags, ktime_t *timo);

- Have NUMA optimizations: if FUTEX_NUMA_FLAG is present, the `void *uaddr`
argument won't be a u{8, 16, 32, 64} value anymore, but a struct
containing a NUMA node hint:

struct futex32_numa {
u32 value __attribute__ ((aligned (8)));
u32 hint;
};

struct futex64_numa {
u64 value __attribute__ ((aligned (16)));
u64 hint;
};

Thanks,
André

André Almeida (4):
futex2: Add new futex interface
selftests: futex: Add futex2 wake/wait test
selftests: futex: Add futex2 timeout test
selftests: futex: Add futex2 wouldblock test

MAINTAINERS | 2 +-
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
include/linux/syscalls.h | 9 ++
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/futex.h | 10 ++
init/Kconfig | 7 ++
kernel/Makefile | 1 +
kernel/futex2.c | 97 ++++++++++++++++
kernel/sys_ni.c | 5 +
tools/include/uapi/asm-generic/unistd.h | 7 +-
.../selftests/futex/functional/.gitignore | 1 +
.../selftests/futex/functional/Makefile | 4 +-
.../selftests/futex/functional/futex2_wait.c | 106 ++++++++++++++++++
.../futex/functional/futex_wait_timeout.c | 27 ++++-
.../futex/functional/futex_wait_wouldblock.c | 34 +++++-
.../testing/selftests/futex/functional/run.sh | 3 +
.../selftests/futex/include/futex2test.h | 50 +++++++++
18 files changed, 361 insertions(+), 13 deletions(-)
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h

--
2.27.0


2020-06-12 18:53:36

by André Almeida

[permalink] [raw]
Subject: [RFC 1/4] futex2: Add new futex interface

Add a new futex interface into the kernel, namely futex2. This first
piece of work just introduces the new interface without new feature for
now, using all mechanisms of the old interface in order to work. This
way we can properly formalize the expectations around the new design,
while being able to expand the code as we need and even in parallel
efforts if possible.

This new interface is able just to wait and wake on a 32-bit user futex.
The wait operation supports timeout with absolute values only, using
time_t/ktime_t (aka u64), in monotonic or realtime mode in a nsec
resolution. The code can be found in my git tree[1].

The design of this syscall was based on the discussion following the
"futex: Implement mechanism to wait on any of several futexes" patch[2].

In order to test the code, the glibc low level futex code was
modified to use the new syscall. It's possible to find the code here[3].
After running the tests of glibc (`make check subdir=nptl`), only 26
tests of 370 didn't passed, and since they are all related to FUTEX_PI,
which isn't implemented in the new interface, those failures were expected.

[1] https://gitlab.collabora.com/tonyk/linux/-/commits/futex2
[2] https://lore.kernel.org/patchwork/patch/1194339/
[3] https://gitlab.collabora.com/tonyk/glibc/-/commits/futex2

Signed-off-by: André Almeida <[email protected]>
---
MAINTAINERS | 2 +-
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
include/linux/syscalls.h | 9 +++
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/futex.h | 10 +++
init/Kconfig | 7 ++
kernel/Makefile | 1 +
kernel/futex2.c | 97 ++++++++++++++++++++++++++
kernel/sys_ni.c | 5 ++
10 files changed, 140 insertions(+), 2 deletions(-)
create mode 100644 kernel/futex2.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 50659d76976b..9da4f9e9c750 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7024,7 +7024,7 @@ F: Documentation/*futex*
F: include/asm-generic/futex.h
F: include/linux/futex.h
F: include/uapi/linux/futex.h
-F: kernel/futex.c
+F: kernel/futex*
F: tools/perf/bench/futex*
F: tools/testing/selftests/futex/

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 54581ac671b4..785c91467c23 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -442,3 +442,5 @@
435 i386 clone3 sys_clone3
437 i386 openat2 sys_openat2
438 i386 pidfd_getfd sys_pidfd_getfd
+439 i386 futex_wait sys_futex_wait_time32
+440 i386 futex_wake sys_futex_wake
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 37b844f839bc..b61f07ca401b 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -359,6 +359,8 @@
435 common clone3 sys_clone3
437 common openat2 sys_openat2
438 common pidfd_getfd sys_pidfd_getfd
+439 common futex_wait sys_futex_wait
+440 common futex_wake sys_futex_wake

#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1815065d52f3..84c82eb410c2 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -586,6 +586,15 @@ asmlinkage long sys_get_robust_list(int pid,
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);

+/* kernel/futex2.c */
+asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val,
+ unsigned long flags, ktime_t __user *timo);
+asmlinkage long sys_futex_wait_time32(void __user *uaddr, unsigned long val,
+ unsigned long flags,
+ old_time32_t __user *timo);
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long nr_wake,
+ unsigned long flags);
+
/* kernel/hrtimer.c */
asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
struct __kernel_timespec __user *rmtp);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..e050b9669fc3 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -320,6 +320,8 @@ __SYSCALL(__NR_unshare, sys_unshare)
#if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
#define __NR_futex 98
__SC_3264(__NR_futex, sys_futex_time32, sys_futex)
+#define __NR_futex_wait 439
+__SC_3264(__NR_futex_wait, sys_futex_wait_time32, sys_futex_wait)
#endif
#define __NR_set_robust_list 99
__SC_COMP(__NR_set_robust_list, sys_set_robust_list, \
@@ -856,8 +858,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
#define __NR_pidfd_getfd 438
__SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)

+#define __NR_futex_wake 440
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
+
#undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 441

/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index a89eb0accd5e..1e67778690eb 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -41,6 +41,16 @@
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)

+/* Size argument to futex2 syscall */
+#define FUTEX_8 0
+#define FUTEX_16 1
+#define FUTEX_32 2
+#if __BITS_PER_LONG == 64 || defined(__x86_64__)
+# define FUTEX_64 3
+#endif
+
+#define FUTEX_SIZE_MASK 0x3
+
/*
* Support for robust futexes: the kernel cleans up held futexes at
* thread exit time.
diff --git a/init/Kconfig b/init/Kconfig
index 74a5ac65644f..24806a672958 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1456,6 +1456,13 @@ config FUTEX
support for "fast userspace mutexes". The resulting kernel may not
run glibc-based applications correctly.

+config FUTEX2
+ bool "Enable futex2 support" if EXPERT
+ depends on FUTEX
+ default n
+ help
+ Experimental support for futex2 interface.
+
config FUTEX_PI
bool
depends on FUTEX && RT_MUTEXES
diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb4130ced32..3c74d0b4b125 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_PROFILING) += profile.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
obj-$(CONFIG_FUTEX) += futex.o
+obj-$(CONFIG_FUTEX2) += futex2.o
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
obj-$(CONFIG_SMP) += smp.o
ifneq ($(CONFIG_SMP),y)
diff --git a/kernel/futex2.c b/kernel/futex2.c
new file mode 100644
index 000000000000..427c58916dd6
--- /dev/null
+++ b/kernel/futex2.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * futex2 system call interface by André Almeida <[email protected]>
+ *
+ * Copyright 2020 Collabora Ltd.
+ */
+
+#include <linux/syscalls.h>
+
+#include <asm/futex.h>
+
+/*
+ * Set of flags that futex2 operates. If we got something that is not in this
+ * set, it can be a unsupported futex1 operation like BITSET or PI, so we
+ * refuse to accept
+ */
+#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
+SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned long, val,
+ unsigned long, flags, ktime_t __user *, timo)
+{
+ int op = FUTEX_WAIT | (flags & (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME));
+ unsigned int size = flags & FUTEX_SIZE_MASK;
+ ktime_t *kt = NULL;
+ ktime_t aux = 0;
+
+ if (flags & ~FUTEX2_MASK)
+ return -EINVAL;
+
+ if (size != FUTEX_32)
+ return -EINVAL;
+
+ if (timo) {
+ if (copy_from_user(&aux, timo, sizeof(*timo)) != 0)
+ return -EINVAL;
+ kt = &aux;
+ }
+
+ return do_futex(uaddr, op, val, kt, NULL, 0, 0);
+}
+
+/*
+ * The timeout of compat syscall implementation doesn't worked properly, given
+ * that i386 ABI apps uses time_t as int32_t. 32 bits is not enough to hold all
+ * nanoseconds since epoch nowadays. Given that, those are the solutions that I
+ * could think of:
+ *
+ * * Use timespec, as in old futex. This will be enough to get both ABIs working
+ *
+ * * Instead of using a nanosecond resolution, use everything as seconds, to
+ * time32_t will be enough to keep working until y2038. Additionally, a flag
+ * FUTEX_NSEC_FLAG can be added to archs that support 64 bits sized time_t to
+ * make available for waiters that wants nsec resolution.
+ *
+ * * Just implement sys_futex_wait_time64 for i386 ABI, and i386 apps will need
+ * to wait until glibc solves the y2038 dilemma, or need to implement time64_t
+ * by themselves.
+ */
+#ifdef CONFIG_COMPAT_32BIT_TIME
+SYSCALL_DEFINE4(futex_wait_time32, void __user *, uaddr, unsigned long, val,
+ unsigned long, flags, old_time32_t __user *, timo)
+{
+ int op = FUTEX_WAIT | (flags & (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME));
+ unsigned int size = flags & FUTEX_SIZE_MASK;
+ ktime_t *kt = NULL;
+ __kernel_old_time_t aux = 0;
+
+ if (flags & ~FUTEX2_MASK)
+ return -EINVAL;
+
+ if (size != FUTEX_32)
+ return -EINVAL;
+
+ if (timo) {
+ if (copy_from_user(&aux, timo, sizeof(*timo)) != 0)
+ return -EINVAL;
+ kt = (ktime_t *)&aux;
+ }
+
+ return do_futex(uaddr, op, val, kt, NULL, 0, 0);
+}
+#endif /* CONFIG_COMPAT_32BIT_TIME */
+
+SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake,
+ unsigned long, flags)
+{
+ int op = FUTEX_WAKE | (flags & (FUTEX_PRIVATE_FLAG));
+ unsigned int size = flags & FUTEX_SIZE_MASK;
+
+ if (flags & ~FUTEX2_MASK)
+ return -EINVAL;
+
+ if (size != FUTEX_32)
+ return -EINVAL;
+
+ return do_futex(uaddr, op, nr_wake, NULL, NULL, 0, 0);
+}
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 3b69a560a7ac..6291eacf9f28 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -148,6 +148,11 @@ COND_SYSCALL_COMPAT(set_robust_list);
COND_SYSCALL(get_robust_list);
COND_SYSCALL_COMPAT(get_robust_list);

+/* kernel/futex2.c */
+COND_SYSCALL(futex_wait);
+COND_SYSCALL(futex_wait_time32);
+COND_SYSCALL(futex_wake);
+
/* kernel/hrtimer.c */

/* kernel/itimer.c */
--
2.27.0

2020-06-12 18:53:59

by André Almeida

[permalink] [raw]
Subject: [RFC 2/4] selftests: futex: Add futex2 wake/wait test

Add a simple test to test wake/wait mechanism using futex2 interface.
Create helper files so more tests can evaluate futex2.

Signed-off-by: André Almeida <[email protected]>
---
tools/include/uapi/asm-generic/unistd.h | 7 +-
.../selftests/futex/functional/.gitignore | 1 +
.../selftests/futex/functional/Makefile | 4 +-
.../selftests/futex/functional/futex2_wait.c | 106 ++++++++++++++++++
.../testing/selftests/futex/functional/run.sh | 3 +
.../selftests/futex/include/futex2test.h | 50 +++++++++
6 files changed, 169 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h

diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 3a3201e4618e..e050b9669fc3 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -320,6 +320,8 @@ __SYSCALL(__NR_unshare, sys_unshare)
#if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
#define __NR_futex 98
__SC_3264(__NR_futex, sys_futex_time32, sys_futex)
+#define __NR_futex_wait 439
+__SC_3264(__NR_futex_wait, sys_futex_wait_time32, sys_futex_wait)
#endif
#define __NR_set_robust_list 99
__SC_COMP(__NR_set_robust_list, sys_set_robust_list, \
@@ -856,8 +858,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
#define __NR_pidfd_getfd 438
__SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)

+#define __NR_futex_wake 440
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
+
#undef __NR_syscalls
-#define __NR_syscalls 439
+#define __NR_syscalls 441

/*
* 32 bit systems traditionally used different
diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore
index 0efcd494daab..d61f1df94360 100644
--- a/tools/testing/selftests/futex/functional/.gitignore
+++ b/tools/testing/selftests/futex/functional/.gitignore
@@ -6,3 +6,4 @@ futex_wait_private_mapped_file
futex_wait_timeout
futex_wait_uninitialized_heap
futex_wait_wouldblock
+futex2_wait
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index 23207829ec75..7142a94a7ac3 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -5,6 +5,7 @@ LDLIBS := -lpthread -lrt

HEADERS := \
../include/futextest.h \
+ ../include/futex2test.h \
../include/atomic.h \
../include/logging.h
TEST_GEN_FILES := \
@@ -14,7 +15,8 @@ TEST_GEN_FILES := \
futex_requeue_pi_signal_restart \
futex_requeue_pi_mismatched_ops \
futex_wait_uninitialized_heap \
- futex_wait_private_mapped_file
+ futex_wait_private_mapped_file \
+ futex2_wait

TEST_PROGS := run.sh

diff --git a/tools/testing/selftests/futex/functional/futex2_wait.c b/tools/testing/selftests/futex/functional/futex2_wait.c
new file mode 100644
index 000000000000..93f1609210eb
--- /dev/null
+++ b/tools/testing/selftests/futex/functional/futex2_wait.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/******************************************************************************
+ *
+ * Copyright Collabora Ltd., 2020
+ *
+ * DESCRIPTION
+ * Test wait/wake mechanism of futex2, using 32bit sized futexes.
+ *
+ * AUTHOR
+ * André Almeida <[email protected]>
+ *
+ * HISTORY
+ * 2020-06-11: Initial version by André <[email protected]>
+ *
+ *****************************************************************************/
+
+#include <errno.h>
+#include <error.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <pthread.h>
+#include "futex2test.h"
+#include "logging.h"
+
+#define TEST_NAME "futex-wait-wouldblock"
+#define timeout_ns 30000000
+#define WAKE_WAIT_US 10000
+futex_t f1 = FUTEX_INITIALIZER;
+
+void usage(char *prog)
+{
+ printf("Usage: %s\n", prog);
+ printf(" -c Use color\n");
+ printf(" -h Display this help message\n");
+ printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n",
+ VQUIET, VCRITICAL, VINFO);
+}
+
+void *waiterfn(void *arg)
+{
+ time_t abs_to;
+ struct timespec to;
+
+ /* setting absolute timeout for futex2 */
+ if (clock_gettime(CLOCK_MONOTONIC, &to))
+ error("clock_gettime failed\n", errno);
+ abs_to = (NSEC_PER_SEC * to.tv_sec) + to.tv_nsec + timeout_ns;
+
+ if (futex2_wait(&f1, f1, FUTEX_PRIVATE_FLAG | FUTEX_32, &abs_to))
+ printf("waiter failed errno %d\n", errno);
+
+ return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+ pthread_t waiter;
+ int res, ret = RET_PASS;
+ int c;
+
+ while ((c = getopt(argc, argv, "cht:v:")) != -1) {
+ switch (c) {
+ case 'c':
+ log_color(1);
+ break;
+ case 'h':
+ usage(basename(argv[0]));
+ exit(0);
+ case 'v':
+ log_verbosity(atoi(optarg));
+ break;
+ default:
+ usage(basename(argv[0]));
+ exit(1);
+ }
+ }
+
+ ksft_print_header();
+ ksft_set_plan(1);
+ ksft_print_msg("%s: Test FUTEX_WAIT\n",
+ basename(argv[0]));
+
+ info("Calling futex_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1);
+
+ if (pthread_create(&waiter, NULL, waiterfn, NULL))
+ error("pthread_create failed\n", errno);
+
+ usleep(WAKE_WAIT_US);
+
+ info("Calling futex2_wake on f1: %u @ %p with val=%u\n", f1, &f1, f1);
+ res = futex2_wake(&f1, 1, FUTEX_PRIVATE_FLAG | FUTEX_32);
+ if (res != 1) {
+ ksft_test_result_fail("futex2_wake returned: %d %s\n",
+ res ? errno : res,
+ res ? strerror(errno) : "");
+ ret = RET_FAIL;
+ } else {
+ ksft_test_result_pass("futex2_wake wouldblock succeeds\n");
+ }
+
+ ksft_print_cnts();
+ return ret;
+}
diff --git a/tools/testing/selftests/futex/functional/run.sh b/tools/testing/selftests/futex/functional/run.sh
index 1acb6ace1680..3730159c865a 100755
--- a/tools/testing/selftests/futex/functional/run.sh
+++ b/tools/testing/selftests/futex/functional/run.sh
@@ -73,3 +73,6 @@ echo
echo
./futex_wait_uninitialized_heap $COLOR
./futex_wait_private_mapped_file $COLOR
+
+echo
+./futex2_wait $COLOR
diff --git a/tools/testing/selftests/futex/include/futex2test.h b/tools/testing/selftests/futex/include/futex2test.h
new file mode 100644
index 000000000000..3a560ad7e813
--- /dev/null
+++ b/tools/testing/selftests/futex/include/futex2test.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/******************************************************************************
+ *
+ * Copyright Collabora Ltd., 2020
+ *
+ * DESCRIPTION
+ * Futex2 library addons for old futex library
+ *
+ * AUTHOR
+ * André Almeida <[email protected]>
+ *
+ * HISTORY
+ * 2020-06-11: Initial version by André <[email protected]>
+ *
+ *****************************************************************************/
+#include "futextest.h"
+
+#define NSEC_PER_SEC 1000000000L
+
+#ifndef FUTEX_8
+# define FUTEX_8 0
+#endif
+#ifndef FUTEX_16
+# define FUTEX_16 1
+#endif
+#ifndef FUTEX_32
+#define FUTEX_32 2
+#endif
+#ifdef __x86_64__
+# ifndef FUTEX_64
+# define FUTEX_64 3
+# endif
+#endif
+
+/*
+ * wait for uaddr if (*uaddr == val)
+ */
+static inline int futex2_wait(volatile void *uaddr, unsigned long val,
+ unsigned long flags, time_t *timo)
+{
+ return syscall(__NR_futex_wait, uaddr, val, flags, timo);
+}
+
+/*
+ * wake nr futexes waiting for uaddr
+ */
+static inline int futex2_wake(volatile void *uaddr, unsigned int nr, unsigned long flags)
+{
+ return syscall(__NR_futex_wake, uaddr, nr, flags);
+}
--
2.27.0

2020-06-12 18:54:35

by André Almeida

[permalink] [raw]
Subject: [RFC 4/4] selftests: futex: Add futex2 wouldblock test

Adapt existing futex wait wouldblock file to test the same mechanism for
futex2.

Signed-off-by: André Almeida <[email protected]>
---
.../futex/functional/futex_wait_wouldblock.c | 34 ++++++++++++++++---
1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
index 0ae390ff8164..67374d9db70c 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -12,6 +12,7 @@
*
* HISTORY
* 2009-Nov-14: Initial version by Gowrishankar <[email protected]>
+ * 2020-Jun-11: Add futex2 test by André <[email protected]>
*
*****************************************************************************/

@@ -21,7 +22,7 @@
#include <stdlib.h>
#include <string.h>
#include <time.h>
-#include "futextest.h"
+#include "futex2test.h"
#include "logging.h"

#define TEST_NAME "futex-wait-wouldblock"
@@ -38,7 +39,8 @@ void usage(char *prog)

int main(int argc, char *argv[])
{
- struct timespec to = {.tv_sec = 0, .tv_nsec = timeout_ns};
+ time_t abs_to;
+ struct timespec to;
futex_t f1 = FUTEX_INITIALIZER;
int res, ret = RET_PASS;
int c;
@@ -61,18 +63,40 @@ int main(int argc, char *argv[])
}

ksft_print_header();
- ksft_set_plan(1);
+ ksft_set_plan(2);
ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n",
basename(argv[0]));

+ /* setting absolute timeout for futex2 */
+ if (clock_gettime(CLOCK_MONOTONIC, &to))
+ error("clock_gettime failed\n", errno);
+
+ abs_to = (NSEC_PER_SEC * to.tv_sec) + to.tv_nsec + timeout_ns;
+
+ /* initialize timeout */
+ to.tv_sec = 0;
+ to.tv_nsec = timeout_ns;
+
info("Calling futex_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
res = futex_wait(&f1, f1+1, &to, FUTEX_PRIVATE_FLAG);
if (!res || errno != EWOULDBLOCK) {
- fail("futex_wait returned: %d %s\n",
+ ksft_test_result_fail("futex_wait returned: %d %s\n",
+ res ? errno : res, res ? strerror(errno) : "");
+ ret = RET_FAIL;
+ } else {
+ ksft_test_result_pass("futex_wait wouldblock succeeds\n");
+ }
+
+ info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
+ res = futex2_wait(&f1, f1+1, FUTEX_PRIVATE_FLAG | FUTEX_32, &abs_to);
+ if (!res || errno != EWOULDBLOCK) {
+ ksft_test_result_fail("futex2_wait returned: %d %s\n",
res ? errno : res, res ? strerror(errno) : "");
ret = RET_FAIL;
+ } else {
+ ksft_test_result_pass("futex2_wait wouldblock succeeds\n");
}

- print_result(TEST_NAME, ret);
+ ksft_print_cnts();
return ret;
}
--
2.27.0

2020-06-12 18:56:17

by André Almeida

[permalink] [raw]
Subject: [RFC 3/4] selftests: futex: Add futex2 timeout test

Adapt existing futex wait timeout file to test the same mechanism for
futex2.

Signed-off-by: André Almeida <[email protected]>
---
.../futex/functional/futex_wait_timeout.c | 27 ++++++++++++++++---
1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
index ee55e6d389a3..e0b802fcb005 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c
@@ -11,6 +11,7 @@
*
* HISTORY
* 2009-Nov-6: Initial version by Darren Hart <[email protected]>
+ * 2020-Jun-11: Add futex2 test by André <[email protected]>
*
*****************************************************************************/

@@ -20,7 +21,7 @@
#include <stdlib.h>
#include <string.h>
#include <time.h>
-#include "futextest.h"
+#include "futex2test.h"
#include "logging.h"

#define TEST_NAME "futex-wait-timeout"
@@ -40,6 +41,7 @@ void usage(char *prog)
int main(int argc, char *argv[])
{
futex_t f1 = FUTEX_INITIALIZER;
+ time_t abs_to;
struct timespec to;
int res, ret = RET_PASS;
int c;
@@ -65,11 +67,17 @@ int main(int argc, char *argv[])
}

ksft_print_header();
- ksft_set_plan(1);
+ ksft_set_plan(2);
ksft_print_msg("%s: Block on a futex and wait for timeout\n",
basename(argv[0]));
ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns);

+ /* setting absolute timeout for futex2 */
+ if (clock_gettime(CLOCK_MONOTONIC, &to))
+ error("clock_gettime failed\n", errno);
+
+ abs_to = (NSEC_PER_SEC * to.tv_sec) + to.tv_nsec + timeout_ns;
+
/* initialize timeout */
to.tv_sec = 0;
to.tv_nsec = timeout_ns;
@@ -77,10 +85,21 @@ int main(int argc, char *argv[])
info("Calling futex_wait on f1: %u @ %p\n", f1, &f1);
res = futex_wait(&f1, f1, &to, FUTEX_PRIVATE_FLAG);
if (!res || errno != ETIMEDOUT) {
- fail("futex_wait returned %d\n", ret < 0 ? errno : ret);
+ ksft_test_result_fail("futex_wait returned %d\n", ret < 0 ? errno : ret);
+ ret = RET_FAIL;
+ } else {
+ ksft_test_result_pass("futex_wait timeout succeeds\n");
+ }
+
+ info("Calling futex2_wait on f1: %u @ %p\n", f1, &f1);
+ res = futex2_wait(&f1, f1, FUTEX_PRIVATE_FLAG | FUTEX_32, &abs_to);
+ if (!res || errno != ETIMEDOUT) {
+ ksft_test_result_fail("futex2_wait returned %d\n", ret < 0 ? errno : ret);
ret = RET_FAIL;
+ } else {
+ ksft_test_result_pass("futex2_wait timeout succeeds\n");
}

- print_result(TEST_NAME, ret);
+ ksft_print_cnts();
return ret;
}
--
2.27.0

2020-06-12 19:38:20

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC 0/4] futex2: Add new futex interface

On Fri, Jun 12, 2020 at 11:53 AM André Almeida via Libc-alpha
<[email protected]> wrote:
>
> Hello,
>
> This RFC is a followup to the previous discussion initiated from my last
> patch "futex: Implement mechanism to wait on any of several futexes"[1].
> As stated in the thread, the correct approach to move forward with the
> wait multiple operation would be to create a new syscall that would have
> all new cool features.
>
> The first patch adds the new interface and just translate the call for
> the old interface, without implementing new features. The goal here is
> to establish the interface and to check if everyone is happy with this
> API. The rest of patches are selftests to show the interface in action.
> I have the following questions:
>
> - Has anyone stared worked on a implementation of this interface? If
> yes, it would be nice to share the progress so we don't have duplicated
> work.
>
> - What suggestions do you have to implement this? Start from scratch or
> reuse the most code possible?
>
> - The interface seems correct and implements the requirements asked by you?
>
> - The proposed interface uses ktime_t type for absolute timeout, and I
> assumed that it should use values in a nsec resolution. If this is true,
> we have some problems with i386 ABI, please check out the
> COMPAT_32BIT_TIME implementation in patch 1 for more details. I
> haven't added a time64 implementation yet, until this is clarified.
>
> - Is expected to have a x32 ABI implementation as well? In the case of
> wait and wake, we could use the same as x86_64 ABI. However, for the
> waitv (aka wait on multiple futexes) we would need a proper x32 entry
> since we are dealing with 32bit pointers.

x32 should be able to use the same i386 compat systcall entry. Will it be
problem?

> Those are the cool new features that this syscall should address some
> day:
>
> - Operate with variable bit size futexes, not restricted to 32:
> 8, 16 and 64
>
> - Wait on multiple futexes, using the following semantics:
>
> struct futex_wait {
> void *uaddr;
> unsigned long val;
> unsigned long flags;
> };
>
> sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
> unsigned long flags, ktime_t *timo);
>
> - Have NUMA optimizations: if FUTEX_NUMA_FLAG is present, the `void *uaddr`
> argument won't be a u{8, 16, 32, 64} value anymore, but a struct
> containing a NUMA node hint:
>
> struct futex32_numa {
> u32 value __attribute__ ((aligned (8)));
> u32 hint;
> };
>
> struct futex64_numa {
> u64 value __attribute__ ((aligned (16)));
> u64 hint;
> };
>

H.J.

2020-06-13 19:28:59

by André Almeida

[permalink] [raw]
Subject: Re: [RFC 0/4] futex2: Add new futex interface

Hello H.J.,

On 6/12/20 4:35 PM, H.J. Lu wrote:
> On Fri, Jun 12, 2020 at 11:53 AM André Almeida via Libc-alpha
> <[email protected]> wrote:
>> - Is expected to have a x32 ABI implementation as well? In the case of
>> wait and wake, we could use the same as x86_64 ABI. However, for the
>> waitv (aka wait on multiple futexes) we would need a proper x32 entry
>> since we are dealing with 32bit pointers.
>
> x32 should be able to use the same i386 compat systcall entry. Will it be
> problem?
>
Indeed, you are right. In the last iteration of this work, I had some
problems dealing with x32 ABI, but this new interface doesn't have the
same problem anymore. We can use the same sys_waitv_time64 interface for
both i368 and x32.

>>
>
> H.J.
>

Thanks,
André

2020-06-25 06:49:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC 0/4] futex2: Add new futex interface

On Fri, Jun 12, 2020 at 8:51 PM André Almeida <[email protected]> wrote:

> - The proposed interface uses ktime_t type for absolute timeout, and I
> assumed that it should use values in a nsec resolution. If this is true,
> we have some problems with i386 ABI, please check out the
> COMPAT_32BIT_TIME implementation in patch 1 for more details. I
> haven't added a time64 implementation yet, until this is clarified.

ktime_t is not part of the uapi headers, and has always been considered
an implementation detail of the kernel so far. I would argue it should
stay that way. The most sensible alternatives would be to either use
a "__u64 *timeout" argument for a relative timeout, or a
"struct __kernel_timespec *timeout" for an absolute timeout.

old_time32_t also makes no sense for multiple reasons:

- It's another kernel internal type and not part of the uapi headers
- your time32 call has different calling conventions from your time64
version, not just a different type.
- there should be no need to add syscalls that are known to be buggy
when there is a replacement type that does not have that bug.

> - Is expected to have a x32 ABI implementation as well? In the case of
> wait and wake, we could use the same as x86_64 ABI. However, for the
> waitv (aka wait on multiple futexes) we would need a proper x32 entry
> since we are dealing with 32bit pointers.

For new syscalls, I'd actually recommend not having a separate
entry point, but just checking 'if (in_compat_syscall())' inside of the
implementation to pick one behavior vs the other when accessing
the user pointers. This keeps the implementation simpler and
avoids assigning a new x32 syscall number that would be different
from all the other architectures.

Arnd

2020-06-25 14:39:19

by André Almeida

[permalink] [raw]
Subject: Re: [RFC 0/4] futex2: Add new futex interface

Hello Arnd,

On 6/25/20 3:48 AM, Arnd Bergmann wrote:
> On Fri, Jun 12, 2020 at 8:51 PM André Almeida <[email protected]> wrote:
>
>> - The proposed interface uses ktime_t type for absolute timeout, and I
>> assumed that it should use values in a nsec resolution. If this is true,
>> we have some problems with i386 ABI, please check out the
>> COMPAT_32BIT_TIME implementation in patch 1 for more details. I
>> haven't added a time64 implementation yet, until this is clarified.
>
> ktime_t is not part of the uapi headers, and has always been considered
> an implementation detail of the kernel so far. I would argue it should
> stay that way. The most sensible alternatives would be to either use
> a "__u64 *timeout" argument for a relative timeout, or a
> "struct __kernel_timespec *timeout" for an absolute timeout.
>
> old_time32_t also makes no sense for multiple reasons:
>
> - It's another kernel internal type and not part of the uapi headers
> - your time32 call has different calling conventions from your time64
> version, not just a different type.
> - there should be no need to add syscalls that are known to be buggy
> when there is a replacement type that does not have that bug.
>

Thanks for the input. As stated by tglx at [1], "supporting relative
timeouts is wrong to begin with", my next patch will use "struct
__kernel_timespec *timeout" for an absolute timeout.

>> - Is expected to have a x32 ABI implementation as well? In the case of
>> wait and wake, we could use the same as x86_64 ABI. However, for the
>> waitv (aka wait on multiple futexes) we would need a proper x32 entry
>> since we are dealing with 32bit pointers.
>
> For new syscalls, I'd actually recommend not having a separate
> entry point, but just checking 'if (in_compat_syscall())' inside of the
> implementation to pick one behavior vs the other when accessing
> the user pointers. This keeps the implementation simpler and
> avoids assigning a new x32 syscall number that would be different
> from all the other architectures.
>

Cool, this will make the code cleaner.

> Arnd
>


Thanks,
André

[1] https://lkml.org/lkml/2019/7/31/1499