2018-11-21 21:22:17

by Mathieu Desnoyers

Subject: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

See the rseq(2) man page proposed here:
https://lkml.org/lkml/2018/9/19/647

This patch is based on glibc commit a502c5294. The rseq(2) system call
was merged into Linux 4.18.

Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: Carlos O'Donell <[email protected]>
CC: Florian Weimer <[email protected]>
CC: Joseph Myers <[email protected]>
CC: Szabolcs Nagy <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ben Maurer <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: Boqun Feng <[email protected]>
CC: Will Deacon <[email protected]>
CC: Dave Watson <[email protected]>
CC: Paul Turner <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
eliminate one symbol.

All libraries/programs which try to register rseq (glibc,
early-adopter applications, early-adopter libraries) should use the
rseq refcount. It becomes part of the ABI within a user-space
process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
across glibc, including memory allocator and sched_getcpu(). The
__rseq_refcount symbol is kept in nptl, because there is no reason
to use it elsewhere in glibc.
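
As a rough illustration of why a libc-wide __rseq_abi helps callers such
as sched_getcpu(), the sketch below reads the CPU number from an
rseq-style area and only takes a slow path when the area holds one of
the negative "not registered" states. Everything prefixed with demo_ is
hypothetical and is not part of this patch; the struct merely mirrors
the layout of struct libc_rseq, and the int32_t cast assumes the usual
two's-complement conversion done by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the layout in csu/rseq.c; not the real symbol.  */
struct demo_rseq
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
  uint64_t rseq_cs;
  uint32_t flags;
} __attribute__ ((aligned (4 * sizeof (uint64_t))));

/* The 20 bytes of fields are padded up to the 32-byte alignment, which
   matches the 0x20 size recorded in the abilist files.  */
_Static_assert (sizeof (struct demo_rseq) == 32, "padded to 32 bytes");

/* Return the CPU number published in AREA, or call FALLBACK when the
   area holds a negative state (uninitialized or registration failed).  */
int
demo_read_cpu (const volatile struct demo_rseq *area, int (*fallback) (void))
{
  assert (fallback != NULL);
  int32_t cpu = (int32_t) area->cpu_id;
  if (cpu >= 0)
    return cpu;
  return fallback ();
}

/* Hypothetical slow path standing in for a getcpu-style system call.  */
int
demo_slow_getcpu (void)
{
  return 0;
}
```

A caller would pass the address of its thread's rseq area and a slow
path of its choice, for example demo_read_cpu (&area, demo_slow_getcpu).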

Changes since v3:
- Use atomic_fetch_add_relaxed to update __rseq_refcount TLS in an
async-signal-safe way. This is sufficient for the required atomicity
guarantees provided that the refcount is only touched by the current
thread and nested signal handlers, but more lightweight than
atomic_increment_val and atomic_decrement_val.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.
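
The refcount protocol described in the notes above can be sketched in
isolation as follows. This is a minimal model, not the code from this
patch: it uses C11 <stdatomic.h> in place of glibc's internal
atomic_fetch_add_relaxed, and demo_rseq_syscall is a hypothetical stub
that only counts how often the kernel would have been entered. The
point it tries to show is that only the first registration and the last
unregistration on a thread reach the system call.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for rseq(2): track whether this thread is
   "registered" and how many times the stub was called.  */
static _Thread_local int demo_kernel_registered;
static _Thread_local int demo_syscall_count;

static int
demo_rseq_syscall (int unregister)
{
  demo_syscall_count++;
  if (unregister)
    {
      if (!demo_kernel_registered)
        return -1;
      demo_kernel_registered = 0;
      return 0;
    }
  if (demo_kernel_registered)
    return -1;                  /* Models an already-registered error.  */
  demo_kernel_registered = 1;
  return 0;
}

/* Per-thread user count, playing the role of __rseq_refcount.  */
static _Thread_local atomic_uint demo_refcount;

int
demo_register (void)
{
  /* fetch_add returns the old value: nonzero means another user on
     this thread has already taken a reference.  */
  if (atomic_fetch_add_explicit (&demo_refcount, 1, memory_order_relaxed) != 0)
    return 0;
  if (demo_rseq_syscall (0) == 0)
    return 0;
  /* Registration failed: drop the reference taken above.  */
  atomic_fetch_sub_explicit (&demo_refcount, 1, memory_order_relaxed);
  return -1;
}

int
demo_unregister (void)
{
  assert (atomic_load_explicit (&demo_refcount, memory_order_relaxed) > 0);
  /* fetch_sub returns the old value: 1 means this was the last user.  */
  if (atomic_fetch_sub_explicit (&demo_refcount, 1, memory_order_relaxed) != 1)
    return 0;
  return demo_rseq_syscall (1);
}
```

Relaxed ordering is used here for the same reason given above: the
counter is thread-local, so it is only ever touched by the owning thread
and by signal handlers nested on top of it.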
---
NEWS | 6 ++
csu/Makefile | 2 +-
csu/Versions | 3 +
csu/rseq.c | 38 ++++++++++
nptl/Makefile | 2 +-
nptl/Versions | 4 +
nptl/nptl-init.c | 3 +
nptl/pthreadP.h | 3 +
nptl/pthread_create.c | 8 ++
nptl/rseq.c | 42 +++++++++++
sysdeps/nptl/rseq-internal.h | 34 +++++++++
sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 +
.../sysv/linux/aarch64/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 +
.../unix/sysv/linux/alpha/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/arm/libc.abilist | 1 +
.../unix/sysv/linux/arm/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 +
.../unix/sysv/linux/hppa/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/i386/libc.abilist | 1 +
.../unix/sysv/linux/i386/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 +
.../unix/sysv/linux/ia64/libpthread.abilist | 1 +
.../sysv/linux/m68k/coldfire/libc.abilist | 1 +
.../linux/m68k/coldfire/libpthread.abilist | 1 +
.../unix/sysv/linux/m68k/m680x0/libc.abilist | 1 +
.../sysv/linux/m68k/m680x0/libpthread.abilist | 1 +
.../unix/sysv/linux/microblaze/libc.abilist | 1 +
.../sysv/linux/microblaze/libpthread.abilist | 1 +
.../sysv/linux/mips/mips32/fpu/libc.abilist | 1 +
.../sysv/linux/mips/mips32/libpthread.abilist | 1 +
.../sysv/linux/mips/mips32/nofpu/libc.abilist | 1 +
.../sysv/linux/mips/mips64/libpthread.abilist | 1 +
.../sysv/linux/mips/mips64/n32/libc.abilist | 1 +
.../sysv/linux/mips/mips64/n64/libc.abilist | 1 +
sysdeps/unix/sysv/linux/nios2/libc.abilist | 1 +
.../unix/sysv/linux/nios2/libpthread.abilist | 1 +
.../linux/powerpc/powerpc32/fpu/libc.abilist | 1 +
.../powerpc/powerpc32/libpthread.abilist | 1 +
.../powerpc/powerpc32/nofpu/libc.abilist | 1 +
.../linux/powerpc/powerpc64/libc-le.abilist | 1 +
.../sysv/linux/powerpc/powerpc64/libc.abilist | 1 +
.../powerpc/powerpc64/libpthread-le.abilist | 1 +
.../powerpc/powerpc64/libpthread.abilist | 1 +
.../unix/sysv/linux/riscv/rv64/libc.abilist | 1 +
.../sysv/linux/riscv/rv64/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/rseq-internal.h | 73 +++++++++++++++++++
.../unix/sysv/linux/s390/s390-32/libc.abilist | 1 +
.../linux/s390/s390-32/libpthread.abilist | 1 +
.../unix/sysv/linux/s390/s390-64/libc.abilist | 1 +
.../linux/s390/s390-64/libpthread.abilist | 1 +
sysdeps/unix/sysv/linux/sh/libc.abilist | 1 +
sysdeps/unix/sysv/linux/sh/libpthread.abilist | 1 +
.../sysv/linux/sparc/sparc32/libc.abilist | 1 +
.../linux/sparc/sparc32/libpthread.abilist | 1 +
.../sysv/linux/sparc/sparc64/libc.abilist | 1 +
.../linux/sparc/sparc64/libpthread.abilist | 1 +
.../unix/sysv/linux/x86_64/64/libc.abilist | 1 +
.../sysv/linux/x86_64/64/libpthread.abilist | 1 +
.../unix/sysv/linux/x86_64/x32/libc.abilist | 1 +
.../sysv/linux/x86_64/x32/libpthread.abilist | 1 +
61 files changed, 265 insertions(+), 2 deletions(-)
create mode 100644 csu/rseq.c
create mode 100644 nptl/rseq.c
create mode 100644 sysdeps/nptl/rseq-internal.h
create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h

diff --git a/NEWS b/NEWS
index f488821af1..b238eaa391 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,12 @@ Major new features:
different directory. This is a GNU extension and similar to the
Solaris function of the same name.

+* Support for automatically registering threads with the Linux rseq(2)
+ system call has been added. This system call is implemented starting
+ from Linux 4.18. In order to be activated, it requires that glibc is built
+ against kernel headers that include this system call, and that glibc
+ detects availability of that system call at runtime.
+
Deprecated and removed features, and other changes affecting compatibility:

* The glibc.tune tunable namespace has been renamed to glibc.cpu and the
diff --git a/csu/Makefile b/csu/Makefile
index 88fc77662e..81d471587f 100644
--- a/csu/Makefile
+++ b/csu/Makefile
@@ -28,7 +28,7 @@ include ../Makeconfig

routines = init-first libc-start $(libc-init) sysdep version check_fds \
libc-tls elf-init dso_handle
-aux = errno
+aux = errno rseq
elide-routines.os = libc-tls
static-only-routines = elf-init
csu-dummies = $(filter-out $(start-installed-name),crt1.o Mcrt1.o)
diff --git a/csu/Versions b/csu/Versions
index 43010c3443..0f44ebf991 100644
--- a/csu/Versions
+++ b/csu/Versions
@@ -7,6 +7,9 @@ libc {
# New special glibc functions.
gnu_get_libc_release; gnu_get_libc_version;
}
+ GLIBC_2.29 {
+ __rseq_abi;
+ }
GLIBC_PRIVATE {
errno;
}
diff --git a/csu/rseq.c b/csu/rseq.c
new file mode 100644
index 0000000000..ccc88e4582
--- /dev/null
+++ b/csu/rseq.c
@@ -0,0 +1,38 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Mathieu Desnoyers <[email protected]>, 2018.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <stdint.h>
+
+enum libc_rseq_cpu_id_state {
+ LIBC_RSEQ_CPU_ID_UNINITIALIZED = -1,
+ LIBC_RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+};
+
+/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
+ size is 20 bytes. */
+struct libc_rseq {
+ uint32_t cpu_id_start;
+ uint32_t cpu_id;
+ uint64_t rseq_cs;
+ uint32_t flags;
+} __attribute__ ((aligned(4 * sizeof(uint64_t))));
+
+__attribute__ ((weak))
+__thread volatile struct libc_rseq __rseq_abi = {
+ .cpu_id = LIBC_RSEQ_CPU_ID_UNINITIALIZED,
+};
diff --git a/nptl/Makefile b/nptl/Makefile
index 49b6faa330..3a5dc80c65 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -145,7 +145,7 @@ libpthread-routines = nptl-init nptlfreeres vars events version pt-interp \
mtx_destroy mtx_init mtx_lock mtx_timedlock \
mtx_trylock mtx_unlock call_once cnd_broadcast \
cnd_destroy cnd_init cnd_signal cnd_timedwait cnd_wait \
- tss_create tss_delete tss_get tss_set
+ tss_create tss_delete tss_get tss_set rseq
# pthread_setuid pthread_seteuid pthread_setreuid \
# pthread_setresuid \
# pthread_setgid pthread_setegid pthread_setregid \
diff --git a/nptl/Versions b/nptl/Versions
index e7f691da7a..f7890f73fc 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -277,6 +277,10 @@ libpthread {
cnd_timedwait; cnd_wait; tss_create; tss_delete; tss_get; tss_set;
}

+ GLIBC_2.29 {
+ __rseq_refcount;
+ }
+
GLIBC_PRIVATE {
__pthread_initialize_minimal;
__pthread_clock_gettime; __pthread_clock_settime;
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 907411d5bc..ab17bbb6e4 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -279,6 +279,9 @@ __pthread_initialize_minimal_internal (void)
THREAD_SETMEM (pd, cpuclock_offset, GL(dl_cpuclock_offset));
#endif

+ /* Register rseq ABI to the kernel. */
+ (void) __rseq_register_current_thread ();
+
/* Initialize the robust mutex data. */
{
#if __PTHREAD_MUTEX_HAVE_PREV
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 19efe1e35f..7fb996f12d 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -609,6 +609,9 @@ extern void __shm_directory_freeres (void) attribute_hidden;

extern void __wait_lookup_done (void) attribute_hidden;

+extern int __rseq_register_current_thread (void) attribute_hidden;
+extern int __rseq_unregister_current_thread (void) attribute_hidden;
+
#ifdef SHARED
# define PTHREAD_STATIC_FN_REQUIRE(name)
#else
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index fe75d04113..a5233cdf2f 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -378,6 +378,7 @@ __free_tcb (struct pthread *pd)
START_THREAD_DEFN
{
struct pthread *pd = START_THREAD_SELF;
+ bool has_rseq = false;

#if HP_TIMING_AVAIL
/* Remember the time when the thread was started. */
@@ -396,6 +397,9 @@ START_THREAD_DEFN
if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);

+ /* Register rseq TLS to the kernel. */
+ has_rseq = !__rseq_register_current_thread ();
+
#ifdef __NR_set_robust_list
# ifndef __ASSUME_SET_ROBUST_LIST
if (__set_robust_list_avail >= 0)
@@ -573,6 +577,10 @@ START_THREAD_DEFN
}
#endif

+ /* Unregister rseq TLS from kernel. */
+ if (has_rseq && __rseq_unregister_current_thread ())
+ abort ();
+
advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);

diff --git a/nptl/rseq.c b/nptl/rseq.c
new file mode 100644
index 0000000000..415674964f
--- /dev/null
+++ b/nptl/rseq.c
@@ -0,0 +1,42 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Mathieu Desnoyers <[email protected]>, 2018.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "pthreadP.h"
+
+__attribute__((weak))
+__thread volatile uint32_t __rseq_refcount;
+
+#ifdef __NR_rseq
+#include <sysdeps/unix/sysv/linux/rseq-internal.h>
+#else
+#include <sysdeps/nptl/rseq-internal.h>
+#endif /* __NR_rseq. */
+
+int
+attribute_hidden
+__rseq_register_current_thread (void)
+{
+ return sysdep_rseq_register_current_thread ();
+}
+
+int
+attribute_hidden
+__rseq_unregister_current_thread (void)
+{
+ return sysdep_rseq_unregister_current_thread ();
+}
diff --git a/sysdeps/nptl/rseq-internal.h b/sysdeps/nptl/rseq-internal.h
new file mode 100644
index 0000000000..96422ebd57
--- /dev/null
+++ b/sysdeps/nptl/rseq-internal.h
@@ -0,0 +1,34 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Mathieu Desnoyers <[email protected]>, 2018.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+static inline int
+sysdep_rseq_register_current_thread (void)
+{
+ return -1;
+}
+
+static inline int
+sysdep_rseq_unregister_current_thread (void)
+{
+ return -1;
+}
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index e66c741d04..36af4d0e94 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2138,4 +2138,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libpthread.abilist b/sysdeps/unix/sysv/linux/aarch64/libpthread.abilist
index 9a9e4cee85..d5b010eee1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libpthread.abilist
@@ -243,3 +243,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 8df162fe99..cdf1d53e35 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2033,6 +2033,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/alpha/libpthread.abilist b/sysdeps/unix/sysv/linux/alpha/libpthread.abilist
index b413007ccb..d477af8d10 100644
--- a/sysdeps/unix/sysv/linux/alpha/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index 43c804f9dc..f11ff5c5cf 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -123,6 +123,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.4 _Exit F
GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/libpthread.abilist b/sysdeps/unix/sysv/linux/arm/libpthread.abilist
index af82a4c632..b49c38114a 100644
--- a/sysdeps/unix/sysv/linux/arm/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libpthread.abilist
@@ -27,6 +27,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.4 _IO_flockfile F
GLIBC_2.4 _IO_ftrylockfile F
GLIBC_2.4 _IO_funlockfile F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 88b01c2e75..deb2dd860b 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -1880,6 +1880,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/hppa/libpthread.abilist b/sysdeps/unix/sysv/linux/hppa/libpthread.abilist
index bcba07f575..3a307653f4 100644
--- a/sysdeps/unix/sysv/linux/hppa/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libpthread.abilist
@@ -219,6 +219,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 6d02f31612..156235898a 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2045,6 +2045,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/i386/libpthread.abilist b/sysdeps/unix/sysv/linux/i386/libpthread.abilist
index bece86d246..e95ce92103 100644
--- a/sysdeps/unix/sysv/linux/i386/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 4249712611..39f291f97d 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -1914,6 +1914,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/ia64/libpthread.abilist b/sysdeps/unix/sysv/linux/ia64/libpthread.abilist
index ccc9449826..6e23c0e62d 100644
--- a/sysdeps/unix/sysv/linux/ia64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libpthread.abilist
@@ -219,6 +219,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index d47b808862..a18501e570 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -124,6 +124,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.4 _Exit F
GLIBC_2.4 _IO_2_1_stderr_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libpthread.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libpthread.abilist
index af82a4c632..b49c38114a 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libpthread.abilist
@@ -27,6 +27,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.4 _IO_flockfile F
GLIBC_2.4 _IO_ftrylockfile F
GLIBC_2.4 _IO_funlockfile F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index d5e38308be..cf0f41ec70 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -1989,6 +1989,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libpthread.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libpthread.abilist
index bece86d246..e95ce92103 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index 8596b84399..1405f27555 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2130,4 +2130,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libpthread.abilist b/sysdeps/unix/sysv/linux/microblaze/libpthread.abilist
index 5067375d23..2a94d9e588 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libpthread.abilist
@@ -243,3 +243,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 88e0f896d5..7b0144392f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -1967,6 +1967,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libpthread.abilist b/sysdeps/unix/sysv/linux/mips/mips32/libpthread.abilist
index 02144967c6..e507b677ee 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index aff7462c34..430426e03e 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -1965,6 +1965,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/libpthread.abilist b/sysdeps/unix/sysv/linux/mips/mips64/libpthread.abilist
index 02144967c6..e507b677ee 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 71d82444aa..60915c4d81 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -1973,6 +1973,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index de6c53d293..958488bf19 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -1968,6 +1968,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index e724bab9fb..9f4be44396 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2171,4 +2171,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/nios2/libpthread.abilist b/sysdeps/unix/sysv/linux/nios2/libpthread.abilist
index 78cac2ae27..79fbbec7bb 100644
--- a/sysdeps/unix/sysv/linux/nios2/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libpthread.abilist
@@ -241,3 +241,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index e9ecbccb71..758a88cb40 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -1993,6 +1993,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/libpthread.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/libpthread.abilist
index 09e8447b06..2def055dec 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index da83ea6028..ab8d68446a 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -1997,6 +1997,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
index 4535b40d15..d9dc159d85 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist
@@ -2228,4 +2228,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
index 65725de4f0..cc47390688 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist
@@ -123,6 +123,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 _Exit F
GLIBC_2.3 _IO_2_1_stderr_ D 0xe0
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread-le.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread-le.abilist
index 9a9e4cee85..d5b010eee1 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread-le.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread-le.abilist
@@ -243,3 +243,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread.abilist
index 8300958d47..26a5ced7a2 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libpthread.abilist
@@ -27,6 +27,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3 _IO_flockfile F
GLIBC_2.3 _IO_ftrylockfile F
GLIBC_2.3 _IO_funlockfile F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index bbb3c4a8e7..a59d67aca2 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2100,4 +2100,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libpthread.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libpthread.abilist
index c370fda73d..9da78d59d2 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libpthread.abilist
@@ -235,3 +235,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..b6616ef32c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,73 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Mathieu Desnoyers <[email protected]>, 2018.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <stdint.h>
+#include <atomic.h>
+#include <linux/rseq.h>
+
+#define RSEQ_SIG 0x53053053
+
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+extern __thread volatile uint32_t __rseq_refcount
+__attribute__ ((tls_model ("initial-exec")));
+
+static inline int
+sysdep_rseq_register_current_thread (void)
+{
+ int rc, ret = 0;
+ INTERNAL_SYSCALL_DECL (err);
+
+ if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
+ return -1;
+ if (atomic_fetch_add_relaxed (&__rseq_refcount, 1))
+ goto end;
+ rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+ 0, RSEQ_SIG);
+ if (!rc)
+ goto end;
+ if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
+ __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+ ret = -1;
+ atomic_decrement (&__rseq_refcount);
+end:
+ return ret;
+}
+
+static inline int
+sysdep_rseq_unregister_current_thread (void)
+{
+ int rc, ret = 0;
+ INTERNAL_SYSCALL_DECL (err);
+
+ if (atomic_fetch_add_relaxed (&__rseq_refcount, -1) - 1)
+ goto end;
+ rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+ RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+ if (!rc)
+ goto end;
+ ret = -1;
+end:
+ return ret;
+}
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index e85ac2a178..fca7f6de8d 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2002,6 +2002,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libpthread.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libpthread.abilist
index d05468f3b2..8876434f46 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libpthread.abilist
@@ -229,6 +229,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index d56931022c..74eebccbd0 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -1908,6 +1908,7 @@ GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
GLIBC_2.29 __fentry__ F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libpthread.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libpthread.abilist
index e8161aa747..2ae3980aba 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libpthread.abilist
@@ -221,6 +221,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index ff939a15c4..e614aa9f0c 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -1884,6 +1884,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sh/libpthread.abilist b/sysdeps/unix/sysv/linux/sh/libpthread.abilist
index bcba07f575..3a307653f4 100644
--- a/sysdeps/unix/sysv/linux/sh/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libpthread.abilist
@@ -219,6 +219,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 64fa9e10a5..03b0c2dc32 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -1996,6 +1996,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libpthread.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libpthread.abilist
index b413007ccb..d477af8d10 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libpthread.abilist
@@ -227,6 +227,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index db909d1506..800ea4d98b 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -1937,6 +1937,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libpthread.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libpthread.abilist
index ccc9449826..6e23c0e62d 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libpthread.abilist
@@ -219,6 +219,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 3b175f104b..b694817e55 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1895,6 +1895,7 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
GLIBC_2.3 __ctype_b_loc F
GLIBC_2.3 __ctype_tolower_loc F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
index 931c8277a8..889b71fb92 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
@@ -219,6 +219,7 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
GLIBC_2.3.2 pthread_cond_broadcast F
GLIBC_2.3.2 pthread_cond_destroy F
GLIBC_2.3.2 pthread_cond_init F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 1b57710477..e885d3c6eb 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2146,4 +2146,5 @@ GLIBC_2.28 thrd_current F
GLIBC_2.28 thrd_equal F
GLIBC_2.28 thrd_sleep F
GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __rseq_abi T 0x20
GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libpthread.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libpthread.abilist
index c09c9b015a..68886246a2 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libpthread.abilist
@@ -243,3 +243,4 @@ GLIBC_2.28 tss_create F
GLIBC_2.28 tss_delete F
GLIBC_2.28 tss_get F
GLIBC_2.28 tss_set F
+GLIBC_2.29 __rseq_refcount T 0x4
--
2.17.1





2018-11-21 21:22:02

by Mathieu Desnoyers

Subject: [RFC PATCH v4 2/5] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux

When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu(). Fall back to the vgetcpu vDSO when it is not.

Benchmarks:

x86-64: Intel E5-2630 [email protected], 16-core, hyperthreading

glibc sched_getcpu(): 13.7 ns (baseline)
glibc sched_getcpu() using rseq: 2.5 ns (speedup: 5.5x)
inline load cpu_id from __rseq_abi TLS: 0.8 ns (speedup: 17.1x)

Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: Carlos O'Donell <[email protected]>
CC: Florian Weimer <[email protected]>
CC: Joseph Myers <[email protected]>
CC: Szabolcs Nagy <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ben Maurer <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: Boqun Feng <[email protected]>
CC: Will Deacon <[email protected]>
CC: Dave Watson <[email protected]>
CC: Paul Turner <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index b69eeda15c..e1a206075c 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -24,8 +24,8 @@
#endif
#include <sysdep-vdso.h>

-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
{
#ifdef __NR_getcpu
unsigned int cpu;
@@ -37,3 +37,24 @@ sched_getcpu (void)
return -1;
#endif
}
+
+#ifdef __NR_rseq
+#include <linux/rseq.h>
+
+extern __attribute__ ((tls_model ("initial-exec")))
+__thread volatile struct rseq __rseq_abi;
+
+int
+sched_getcpu (void)
+{
+ int cpu_id = __rseq_abi.cpu_id;
+
+ return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+ return vsyscall_sched_getcpu ();
+}
+#endif
--
2.17.1
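
The fast-path/fallback logic of this patch can be tried in isolation. The following is only a minimal sketch under stated assumptions: `demo_rseq`, `demo_rseq_abi`, `demo_slow_getcpu` and `demo_usage` are hypothetical names that do not appear in the patch, the slow path is a stub returning a fixed value instead of calling the vDSO, and the two sentinel values are assumed to mirror RSEQ_CPU_ID_UNINITIALIZED (-1) and RSEQ_CPU_ID_REGISTRATION_FAILED (-2) from <linux/rseq.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the sentinels in <linux/rseq.h>.  */
#define DEMO_CPU_ID_UNINITIALIZED ((uint32_t) -1)
#define DEMO_CPU_ID_REGISTRATION_FAILED ((uint32_t) -2)

/* Illustrative subset of struct rseq: only the first two fields.  */
struct demo_rseq
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Hypothetical per-thread area; it starts out unregistered.  */
static __thread volatile struct demo_rseq demo_rseq_abi =
  { 0, DEMO_CPU_ID_UNINITIALIZED };

/* Hypothetical slow path standing in for the vDSO getcpu call.  */
static int
demo_slow_getcpu (void)
{
  return 7;
}

static int
demo_sched_getcpu (void)
{
  /* Read the field once; as a signed int both sentinels are negative.  */
  int cpu_id = (int) demo_rseq_abi.cpu_id;

  return cpu_id >= 0 ? cpu_id : demo_slow_getcpu ();
}

static void
demo_usage (void)
{
  /* Unregistered thread: the slow path is taken.  */
  assert (demo_sched_getcpu () == 7);

  /* Pretend the kernel registered the area and published CPU 3.  */
  demo_rseq_abi.cpu_id = 3;
  assert (demo_sched_getcpu () == 3);

  /* A failed registration also falls back to the slow path.  */
  demo_rseq_abi.cpu_id = DEMO_CPU_ID_REGISTRATION_FAILED;
  assert (demo_sched_getcpu () == 7);
}
```

The single read into a local matters because the kernel may update the field at any preemption; testing and returning the same snapshot avoids mixing two different values.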


2018-11-23 20:43:24

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Wed, Nov 21, 2018 at 01:39:32PM -0500, Mathieu Desnoyers wrote:
> Register rseq(2) TLS for each thread (including main), and unregister
> for each thread (excluding main). "rseq" stands for Restartable
> Sequences.

Maybe I'm missing something obvious, but "unregister" does not seem to
be a meaningful operation. Can you clarify what it's for?

Rich

2018-11-23 22:21:34

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 9:36 AM, Rich Felker [email protected] wrote:

> On Wed, Nov 21, 2018 at 01:39:32PM -0500, Mathieu Desnoyers wrote:
>> Register rseq(2) TLS for each thread (including main), and unregister
>> for each thread (excluding main). "rseq" stands for Restartable
>> Sequences.
>
> Maybe I'm missing something obvious, but "unregister" does not seem to
> be a meaningful operation. Can you clarify what it's for?

There are really two ways rseq TLS can end up being unregistered: either
through an explicit call to the rseq "unregister", or when the OS frees the
thread's task struct.

You bring up an interesting point here: do we need to explicitly unregister
rseq at thread exit, or can we leave that to the OS ?

The key thing to look for here is whether it's valid to access the
TLS area of the thread from preemption or signal delivery happening
at the very end of START_THREAD_DEFN. If it's OK to access it until
the very end of the thread lifetime, then we could do without an
explicit unregistration. However, if at any given point of the late
thread lifetime we end up in a situation where reading or writing to
that TLS area can cause corruption, then we need to carefully
unregister it before that memory is reclaimed/reused.

What we have below the current location for the __rseq_unregister_current_thread ()
call is as follows. I'm not all that convinced that it's valid to access the TLS
area up until __exit_thread () at the very end, especially after setting
setxid_futex back to 0.

Thoughts ?

/* Unregister rseq TLS from kernel. */
if (has_rseq && __rseq_unregister_current_thread ())
abort();

advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
pd->guardsize);

/* If the thread is detached free the TCB. */
if (IS_DETACHED (pd))
/* Free the TCB. */
__free_tcb (pd);
else if (__glibc_unlikely (pd->cancelhandling & SETXID_BITMASK))
{
/* Some other thread might call any of the setXid functions and expect
us to reply. In this case wait until we did that. */
do
/* XXX This differs from the typical futex_wait_simple pattern in that
the futex_wait condition (setxid_futex) is different from the
condition used in the surrounding loop (cancelhandling). We need
to check and document why this is correct. */
futex_wait_simple (&pd->setxid_futex, 0, FUTEX_PRIVATE);
while (pd->cancelhandling & SETXID_BITMASK);

/* Reset the value so that the stack can be reused. */
pd->setxid_futex = 0;
}

/* We cannot call '_exit' here. '_exit' will terminate the process.

The 'exit' implementation in the kernel will signal when the
process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
flag. The 'tid' field in the TCB will be set to zero.

The exit code is zero since in case all threads exit by calling
'pthread_exit' the exit status must be 0 (zero). */
__exit_thread ();

/* NOTREACHED */

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-23 22:43:05

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Thu, Nov 22, 2018 at 04:11:45PM +0100, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
> > Thoughts ?
> >
> > /* Unregister rseq TLS from kernel. */
> > if (has_rseq && __rseq_unregister_current_thread ())
> > abort();
> >
> > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
> > pd->guardsize);
> >
> > /* If the thread is detached free the TCB. */
> > if (IS_DETACHED (pd))
> > /* Free the TCB. */
> > __free_tcb (pd);
>
> Considering that we proceed to free the TCB, I really hope that all
> signals are blocked at this point. (I have not checked this, though.)
>
> Wouldn't this address your concern about access to the rseq area?

I'm not familiar with glibc's logic here, but for other reasons, I
don't think freeing it is safe until the kernel task exit futex (set
via clone or set_tid_address) has fired. I would guess __free_tcb just
sets up for it to be reclaimable when this happens rather than
immediately freeing it for reuse.

Rich

2018-11-23 23:06:13

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Rich Felker:

> On Thu, Nov 22, 2018 at 04:11:45PM +0100, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>> > Thoughts ?
>> >
>> > /* Unregister rseq TLS from kernel. */
>> > if (has_rseq && __rseq_unregister_current_thread ())
>> > abort();
>> >
>> > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
>> > pd->guardsize);
>> >
>> > /* If the thread is detached free the TCB. */
>> > if (IS_DETACHED (pd))
>> > /* Free the TCB. */
>> > __free_tcb (pd);
>>
>> Considering that we proceed to free the TCB, I really hope that all
>> signals are blocked at this point. (I have not checked this, though.)
>>
>> Wouldn't this address your concern about access to the rseq area?
>
> I'm not familiar with glibc's logic here, but for other reasons, I
> don't think freeing it is safe until the kernel task exit futex (set
> via clone or set_tid_address) has fired. I would guess __free_tcb just
> sets up for it to be reclaimable when this happens rather than
> immediately freeing it for reuse.

Right, but in case of user-supplied stacks, we actually free TLS memory
at this point, so signals need to be blocked because the TCB is
(partially) gone after that.

Thanks,
Florian

2018-11-23 23:35:18

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Thu, Nov 22, 2018 at 10:33:19AM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:
>
> > * Rich Felker:
> >
> >> On Thu, Nov 22, 2018 at 04:11:45PM +0100, Florian Weimer wrote:
> >>> * Mathieu Desnoyers:
> >>>
> >>> > Thoughts ?
> >>> >
> >>> > /* Unregister rseq TLS from kernel. */
> >>> > if (has_rseq && __rseq_unregister_current_thread ())
> >>> > abort();
> >>> >
> >>> > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
> >>> > pd->guardsize);
> >>> >
> >>> > /* If the thread is detached free the TCB. */
> >>> > if (IS_DETACHED (pd))
> >>> > /* Free the TCB. */
> >>> > __free_tcb (pd);
> >>>
> >>> Considering that we proceed to free the TCB, I really hope that all
> >>> signals are blocked at this point. (I have not checked this, though.)
> >>>
> >>> Wouldn't this address your concern about access to the rseq area?
> >>
> >> I'm not familiar with glibc's logic here, but for other reasons, I
> >> don't think freeing it is safe until the kernel task exit futex (set
> >> via clone or set_tid_address) has fired. I would guess __free_tcb just
> >> sets up for it to be reclaimable when this happens rather than
> >> immediately freeing it for reuse.
> >
> > Right, but in case of user-supplied stacks, we actually free TLS memory
> > at this point, so signals need to be blocked because the TCB is
> > (partially) gone after that.
>
> Unfortunately, disabling signals is not enough.
>
> With rseq registered, the kernel accesses the rseq TLS area when returning to
> user-space after _preemption_ of user-space, which can be triggered at any
> point by an interrupt or a fault, even if signals are blocked.
>
> So if there are cases where the TLS memory is freed while the thread is still
> running, we _need_ to explicitly unregister rseq beforehand.

OK, that makes sense. I was wrongly under the impression that the TLS
memory could not be reused until the task exit futex fired, but in
glibc that's not the case with caller-provided stacks.

I still don't understand the need for a reference count though.

Rich

2018-11-23 23:37:07

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 10:14 AM, Rich Felker [email protected] wrote:

> On Thu, Nov 22, 2018 at 10:04:16AM -0500, Mathieu Desnoyers wrote:
>> ----- On Nov 22, 2018, at 9:36 AM, Rich Felker [email protected] wrote:
>>
>> > On Wed, Nov 21, 2018 at 01:39:32PM -0500, Mathieu Desnoyers wrote:
>> >> Register rseq(2) TLS for each thread (including main), and unregister
>> >> for each thread (excluding main). "rseq" stands for Restartable
>> >> Sequences.
>> >
>> > Maybe I'm missing something obvious, but "unregister" does not seem to
>> > be a meaningful operation. Can you clarify what it's for?
>>
>> There are really two ways rseq TLS can end up being unregistered: either
>> through an explicit call to the rseq "unregister", or when the OS frees the
>> thread's task struct.
>>
>> You bring an interesting point here: do we need to explicitly unregister
>> rseq at thread exit, or can we leave that to the OS ?
>>
>> The key thing to look for here is whether it's valid to access the
>> TLS area of the thread from preemption or signal delivery happening
>> at the very end of START_THREAD_DEFN. If it's OK to access it until
>> the very end of the thread lifetime, then we could do without an
>> explicit unregistration. However, if at any given point of the late
>> thread lifetime we end up in a situation where reading or writing to
>> that TLS area can cause corruption, then we need to carefully
>> unregister it before that memory is reclaimed/reused.
>
> The thread memory cannot be reused until after kernel task exit,
> reported via the set_tid_address futex. Also, assuming signals are
> blocked (which is absolutely necessary for other reasons) nothing in
> userspace can touch the rseq state after this point anyway.

As discussed in the other leg of the email thread, disabling signals is
not enough to prevent the kernel from accessing the rseq TLS area on
preemption.

> I was more confused about the need for reference counting, though.
> Where would anything be able to observe a state other than "refcnt>0"?
> -- in which case tracking it makes no sense. If the goal is to make an
> ABI that supports environments where libc doesn't have rseq support,
> and a third-party library is providing a compatible ABI, it seems all
> that would be needed is a boolean thread-local "is_initialized" flag.
> There does not seem to be any safe way such a library could be
> dynamically unloaded (which would require unregistration in all
> threads) and thus no need for a count.

Here is one scenario: we have 2 early adopter libraries using rseq which
are deployed in an environment with an older glibc (which does not
support rseq).

Of course, none of those libraries can be dlclose'd unless they somehow
track all registered threads. But let's focus on how exactly those
libraries can handle lazily registering rseq. They can use pthread_key,
and pthread_setspecific on first use by the thread to setup a destructor
function to be invoked at thread exit. But each early adopter library
is unaware of the other, so if we just use a "is_initialized" flag, the
first destructor to run will unregister rseq while the second library
may still be using it.

The same problem arises if we have an application early adopter which
explicitly deals with rseq, alongside a library early adopter. The issue is
similar, except that the application will explicitly want to unregister
rseq before exiting the thread, which leaves a race window where rseq
is unregistered, but the library may still need to use it.

The reference counter solves this: only the last rseq user for a thread
performs unregistration.
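
The scenario above can be walked through with a small model. This is a minimal sketch under stated assumptions, not the code from the patch: the kernel is replaced by per-thread counters, the two libraries and their destructors are plain functions called in sequence, and every `demo_`, `mock_` and `lib_` name is hypothetical. It uses plain increments, so it ignores reentrancy from signal handlers.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the kernel side: counts how often the register and unregister
   system calls would have been issued for this thread.  */
static __thread int mock_register_calls;
static __thread int mock_unregister_calls;
static __thread int mock_registered;

/* Per-thread user count, playing the role of __rseq_refcount.  */
static __thread uint32_t demo_refcount;

static void
demo_rseq_get (void)
{
  /* Only the first user on this thread talks to the kernel.  */
  if (demo_refcount++ == 0)
    {
      mock_register_calls++;
      mock_registered = 1;
    }
}

static void
demo_rseq_put (void)
{
  assert (demo_refcount > 0);
  /* Only the last user on this thread unregisters.  */
  if (--demo_refcount == 0)
    {
      mock_unregister_calls++;
      mock_registered = 0;
    }
}

/* Two independent early adopters; neither knows about the other, and
   their thread-exit destructors may run in either order.  */
static void lib_a_first_use (void) { demo_rseq_get (); }
static void lib_b_first_use (void) { demo_rseq_get (); }
static void lib_a_destructor (void) { demo_rseq_put (); }
static void lib_b_destructor (void) { demo_rseq_put (); }

static void
demo_usage (void)
{
  lib_a_first_use ();
  lib_b_first_use ();
  assert (mock_register_calls == 1);

  lib_a_destructor ();
  /* Library B can still rely on the registration here.  */
  assert (mock_registered == 1);

  lib_b_destructor ();
  assert (mock_registered == 0);
  assert (mock_unregister_calls == 1);
}
```

With a boolean flag instead of the counter, the first destructor would clear the registration while the second library was still using it.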

Thanks,

Mathieu



--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 00:22:42

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Thu, Nov 22, 2018 at 10:04:16AM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 22, 2018, at 9:36 AM, Rich Felker [email protected] wrote:
>
> > On Wed, Nov 21, 2018 at 01:39:32PM -0500, Mathieu Desnoyers wrote:
> >> Register rseq(2) TLS for each thread (including main), and unregister
> >> for each thread (excluding main). "rseq" stands for Restartable
> >> Sequences.
> >
> > Maybe I'm missing something obvious, but "unregister" does not seem to
> > be a meaningful operation. Can you clarify what it's for?
>
> There are really two ways rseq TLS can end up being unregistered: either
> through an explicit call to the rseq "unregister", or when the OS frees the
> thread's task struct.
>
> You bring an interesting point here: do we need to explicitly unregister
> rseq at thread exit, or can we leave that to the OS ?
>
> The key thing to look for here is whether it's valid to access the
> TLS area of the thread from preemption or signal delivery happening
> at the very end of START_THREAD_DEFN. If it's OK to access it until
> the very end of the thread lifetime, then we could do without an
> explicit unregistration. However, if at any given point of the late
> thread lifetime we end up in a situation where reading or writing to
> that TLS area can cause corruption, then we need to carefully
> unregister it before that memory is reclaimed/reused.

The thread memory cannot be reused until after kernel task exit,
reported via the set_tid_address futex. Also, assuming signals are
blocked (which is absolutely necessary for other reasons) nothing in
userspace can touch the rseq state after this point anyway.

I was more confused about the need for reference counting, though.
Where would anything be able to observe a state other than "refcnt>0"?
-- in which case tracking it makes no sense. If the goal is to make an
ABI that supports environments where libc doesn't have rseq support,
and a third-party library is providing a compatible ABI, it seems all
that would be needed is a boolean thread-local "is_initialized" flag.
There does not seem to be any safe way such a library could be
dynamically unloaded (which would require unregistration in all
threads) and thus no need for a count.

Rich

2018-11-24 00:24:18

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

> Thoughts ?
>
> /* Unregister rseq TLS from kernel. */
> if (has_rseq && __rseq_unregister_current_thread ())
> abort();
>
> advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
> pd->guardsize);
>
> /* If the thread is detached free the TCB. */
> if (IS_DETACHED (pd))
> /* Free the TCB. */
> __free_tcb (pd);

Considering that we proceed to free the TCB, I really hope that all
signals are blocked at this point. (I have not checked this, though.)

Wouldn't this address your concern about access to the rseq area?

Thanks,
Florian

2018-11-24 00:24:46

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:

> * Rich Felker:
>
>> On Thu, Nov 22, 2018 at 04:11:45PM +0100, Florian Weimer wrote:
>>> * Mathieu Desnoyers:
>>>
>>> > Thoughts ?
>>> >
>>> > /* Unregister rseq TLS from kernel. */
>>> > if (has_rseq && __rseq_unregister_current_thread ())
>>> > abort();
>>> >
>>> > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
>>> > pd->guardsize);
>>> >
>>> > /* If the thread is detached free the TCB. */
>>> > if (IS_DETACHED (pd))
>>> > /* Free the TCB. */
>>> > __free_tcb (pd);
>>>
>>> Considering that we proceed to free the TCB, I really hope that all
>>> signals are blocked at this point. (I have not checked this, though.)
>>>
>>> Wouldn't this address your concern about access to the rseq area?
>>
>> I'm not familiar with glibc's logic here, but for other reasons, I
>> don't think freeing it is safe until the kernel task exit futex (set
>> via clone or set_tid_address) has fired. I would guess __free_tcb just
>> sets up for it to be reclaimable when this happens rather than
>> immediately freeing it for reuse.
>
> Right, but in case of user-supplied stacks, we actually free TLS memory
> at this point, so signals need to be blocked because the TCB is
> (partially) gone after that.

Unfortunately, disabling signals is not enough.

With rseq registered, the kernel accesses the rseq TLS area when returning to
user-space after _preemption_ of user-space, which can be triggered at any
point by an interrupt or a fault, even if signals are blocked.

So if there are cases where the TLS memory is freed while the thread is still
running, we _need_ to explicitly unregister rseq beforehand.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 02:37:28

by Szabolcs Nagy

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On 22/11/18 15:33, Mathieu Desnoyers wrote:
> ----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:
>> Right, but in case of user-supplied stacks, we actually free TLS memory
>> at this point, so signals need to be blocked because the TCB is
>> (partially) gone after that.
>
> Unfortunately, disabling signals is not enough.
>
> With rseq registered, the kernel accesses the rseq TLS area when returning to
> user-space after _preemption_ of user-space, which can be triggered at any
> point by an interrupt or a fault, even if signals are blocked.
>
> So if there are cases where the TLS memory is freed while the thread is still
> running, we _need_ to explicitly unregister rseq beforehand.

i think the man page should point this out.

the memory of a registered rseq object must not be freed
before thread exit. (either unregister it or free later)

and ideally also point out that c language thread storage
duration does not provide this guarantee: it may be freed
by the implementation before thread exit (which is currently
not observable, but with the rseq syscall it is).
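
The lifetime rule stated here (unregister first, or free later) can be illustrated with a toy model. This is only a sketch under stated assumptions: the kernel's per-thread pointer is a mock variable, no system call is made, and all `demo_` and `mock_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock of the kernel's per-thread pointer to the registered area.  */
static __thread void *mock_kernel_rseq_ptr;

static int
demo_register (void *area)
{
  /* The real system call also rejects a second registration.  */
  if (mock_kernel_rseq_ptr != NULL)
    return -1;
  mock_kernel_rseq_ptr = area;
  return 0;
}

static int
demo_unregister (void *area)
{
  /* Only the area that was registered can be unregistered.  */
  if (mock_kernel_rseq_ptr != area)
    return -1;
  mock_kernel_rseq_ptr = NULL;
  return 0;
}

/* Release AREA only if the mock kernel no longer points at it.  Returns 0
   on success, -1 if the caller forgot to unregister first.  */
static int
demo_release_area (void *area)
{
  if (mock_kernel_rseq_ptr == area)
    return -1;
  free (area);
  return 0;
}

static void
demo_usage (void)
{
  void *area = calloc (1, 32);
  assert (area != NULL);
  assert (demo_register (area) == 0);

  /* Freeing now would leave the kernel with a dangling pointer that it
     may write through at the next preemption.  */
  assert (demo_release_area (area) == -1);

  assert (demo_unregister (area) == 0);
  assert (demo_release_area (area) == 0);
}
```

A real implementation has no such guard available at free time, which is why the ordering has to be enforced by the thread-exit path itself.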

2018-11-24 03:16:17

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 11:28 AM, Florian Weimer [email protected] wrote:

> * Mathieu Desnoyers:
>
>> Here is one scenario: we have 2 early adopter libraries using rseq which
>> are deployed in an environment with an older glibc (which does not
>> support rseq).
>>
>> Of course, none of those libraries can be dlclose'd unless they somehow
>> track all registered threads.
>
> Well, you can always make them NODELETE so that dlclose is not an issue.
> If the library is small enough, that shouldn't be a problem.

That's indeed what I do with lttng-ust, mainly due to use of pthread_key.

>
>> But let's focus on how exactly those libraries can handle lazily
>> registering rseq. They can use pthread_key, and pthread_setspecific on
>> first use by the thread to setup a destructor function to be invoked
>> at thread exit. But each early adopter library is unaware of the
>> other, so if we just use a "is_initialized" flag, the first destructor
>> to run will unregister rseq while the second library may still be
>> using it.
>
> I don't think you need unregistering if the memory is initial-exec TLS
> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
> be freed while the thread is running, so it should be safe to put the
> rseq area there even if glibc knows nothing about it.

Is it true for user-supplied stacks as well ?

> Then you'll only
> need a mechanism to find the address of the actually active rseq area
> (which you probably have to store in a TLS variable for performance
> reasons). And that part you need whether you have reference counter or
> not.

I'm not sure I follow your thoughts here. Currently, the __rseq_abi
TLS symbol identifies a structure registered to the kernel. The
"currently active" rseq critical section is identified by the field
"rseq_cs" within the __rseq_abi structure.

So here when you say "actually active rseq area", do you mean the
currently registered struct rseq (__rseq_abi) or the currently running
rseq critical section ? (pointed to by __rseq_abi.rseq_cs)
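
To make the two candidates concrete, here is a minimal sketch of the distinction, assuming the struct layouts of the Linux 4.18 uapi header. The `demo_` types are illustrative mirrors, not the real definitions: in particular, `rseq_cs` is a plain 64-bit field here rather than the union the header uses, and the range check is written the way I understand the kernel to test an instruction pointer against a descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of struct rseq_cs: describes ONE critical section.  */
struct demo_rseq_cs
{
  uint32_t version;
  uint32_t flags;
  uint64_t start_ip;
  uint64_t post_commit_offset;
  uint64_t abort_ip;
} __attribute__ ((aligned (32)));

/* Illustrative mirror of struct rseq: the per-thread registered area.  */
struct demo_rseq
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
  uint64_t rseq_cs;		/* Address of the active descriptor, or 0.  */
  uint32_t flags;
} __attribute__ ((aligned (32)));

/* Same size as the 0x20 recorded for __rseq_abi in the abilist files.  */
_Static_assert (sizeof (struct demo_rseq) == 0x20, "unexpected size");

/* Nonzero if the registered area points at a critical section.  */
static int
demo_has_active_cs (const struct demo_rseq *area)
{
  return area->rseq_cs != 0;
}

/* Nonzero if IP lies in [start_ip, start_ip + post_commit_offset).  The
   unsigned difference wraps for IP below start_ip, so one compare does.  */
static int
demo_ip_in_cs (const struct demo_rseq_cs *cs, uint64_t ip)
{
  return ip - cs->start_ip < cs->post_commit_offset;
}

static void
demo_usage (void)
{
  struct demo_rseq_cs cs = { 0, 0, 0x1000, 0x20, 0x2000 };
  struct demo_rseq area = { 0, 0, 0, 0 };

  /* Registered area, but no critical section is running.  */
  assert (!demo_has_active_cs (&area));

  /* Entering a critical section publishes the descriptor address.  */
  area.rseq_cs = (uint64_t) (uintptr_t) &cs;
  assert (demo_has_active_cs (&area));

  assert (demo_ip_in_cs (&cs, 0x1000));
  assert (demo_ip_in_cs (&cs, 0x101f));
  assert (!demo_ip_in_cs (&cs, 0x1020));
  assert (!demo_ip_in_cs (&cs, 0x0fff));
}
```

In this picture there is exactly one registered area per thread, while the descriptor it points at changes every time the thread enters or leaves a critical section.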

One issue here is that early adopter libraries cannot always use
the IE model. I tried using it for other TLS variables in lttng-ust, and
it ended up hanging our CI tests when tracing a sample application with
lttng-ust under a Java virtual machine: being dlopen'd in a process that
possibly already exhausts the number of available backup TLS IE entries
seems to have odd effects. This is why I'm worried about using the IE model
within lttng-ust.

So using the IE model for glibc makes sense, because nobody dlopens
glibc AFAIK. But it's not so simple for early adopter libraries which
can be dlopen'd.

>
>> The same problem arises if we have an application early adopter which
>> explicitly deals with rseq, with a library early adopter. The issue is
>> similar, except that the application will explicitly want to unregister
>> rseq before exiting the thread, which leaves a race window where rseq
>> is unregistered, but the library may still need to use it.
>>
>> The reference counter solves this: only the last rseq user for a thread
>> performs unregistration.
>
> If you do explicit unregistration, you will run into issues related to
> destructor ordering. You should really find a way to avoid that.

The per-thread reference counter is a way to avoid issues that arise from
lack of destructor ordering. Is it an acceptable approach for you, or
do you have something else in mind ?
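
A minimal sketch of the per-thread reference counting described above,
assuming each rseq user calls a "get" on first use and a "put" when it is
done (for example from its pthread_key destructor). The names
rseq_user_get, rseq_user_put and rseq_refcount_sketch are hypothetical, and
the fake_rseq_* functions are stand-ins for the real rseq(2)
register/unregister calls so that the example runs on any system:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rseq(2) register/unregister system calls.
   They only record the registration state of the calling thread. */
static __thread int fake_kernel_registered;

static int fake_rseq_register(void)
{
    fake_kernel_registered = 1;
    return 0;
}

static int fake_rseq_unregister(void)
{
    fake_kernel_registered = 0;
    return 0;
}

/* Per-thread count of rseq users. It plays the role of __rseq_refcount,
   but is defined locally for this sketch. */
static __thread uint32_t rseq_refcount_sketch;

/* Called by each rseq user (library or application) on first use. */
int rseq_user_get(void)
{
    assert(rseq_refcount_sketch != UINT32_MAX);  /* guard against overflow */
    if (rseq_refcount_sketch++ == 0)
        return fake_rseq_register();             /* first user registers */
    return 0;
}

/* Called by each rseq user when it no longer needs rseq on this thread. */
int rseq_user_put(void)
{
    if (rseq_refcount_sketch == 0)
        return -1;                               /* unbalanced put */
    if (--rseq_refcount_sketch == 0)
        return fake_rseq_unregister();           /* last user unregisters */
    return 0;
}

/* Reports whether the calling thread is (pretend-)registered. */
int rseq_is_registered(void)
{
    return fake_kernel_registered;
}
```

With two users on the same thread, the first put leaves the registration in
place and only the second one unregisters, whichever destructor runs first.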

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 05:52:50

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

> Here is one scenario: we have 2 early adopter libraries using rseq which
> are deployed in an environment with an older glibc (which does not
> support rseq).
>
> Of course, none of those libraries can be dlclose'd unless they somehow
> track all registered threads.

Well, you can always make them NODELETE so that dlclose is not an issue.
If the library is small enough, that shouldn't be a problem.

> But let's focus on how exactly those libraries can handle lazily
> registering rseq. They can use pthread_key, and pthread_setspecific on
> first use by the thread to setup a destructor function to be invoked
> at thread exit. But each early adopter library is unaware of the
> other, so if we just use a "is_initialized" flag, the first destructor
> to run will unregister rseq while the second library may still be
> using it.

I don't think you need unregistering if the memory is initial-exec TLS
memory. Initial-exec TLS memory is tied directly to the TCB and cannot
be freed while the thread is running, so it should be safe to put the
rseq area there even if glibc knows nothing about it. Then you'll only
need a mechanism to find the address of the actually active rseq area
(which you probably have to store in a TLS variable for performance
reasons). And that part you need whether you have reference counter or
not.

> The same problem arises if we have an application early adopter which
> explicitly deals with rseq, with a library early adopter. The issue is
> similar, except that the application will explicitly want to unregister
> rseq before exiting the thread, which leaves a race window where rseq
> is unregistered, but the library may still need to use it.
>
> The reference counter solves this: only the last rseq user for a thread
> performs unregistration.

If you do explicit unregistration, you will run into issues related to
destructor ordering. You should really find a way to avoid that.

Thanks,
Florian

2018-11-24 06:55:42

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

> ----- On Nov 22, 2018, at 11:28 AM, Florian Weimer [email protected] wrote:
>
>> * Mathieu Desnoyers:
>>
>>> Here is one scenario: we have 2 early adopter libraries using rseq which
>>> are deployed in an environment with an older glibc (which does not
>>> support rseq).
>>>
>>> Of course, none of those libraries can be dlclose'd unless they somehow
>>> track all registered threads.
>>
>> Well, you can always make them NODELETE so that dlclose is not an issue.
>> If the library is small enough, that shouldn't be a problem.
>
> That's indeed what I do with lttng-ust, mainly due to use of pthread_key.
>
>>
>>> But let's focus on how exactly those libraries can handle lazily
>>> registering rseq. They can use pthread_key, and pthread_setspecific on
>>> first use by the thread to setup a destructor function to be invoked
>>> at thread exit. But each early adopter library is unaware of the
>>> other, so if we just use a "is_initialized" flag, the first destructor
>>> to run will unregister rseq while the second library may still be
>>> using it.
>>
>> I don't think you need unregistering if the memory is initial-exec TLS
>> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
>> be freed while the thread is running, so it should be safe to put the
>> rseq area there even if glibc knows nothing about it.
>
> Is it true for user-supplied stacks as well ?

I'm not entirely sure because the glibc terminology is confusing, but I
think it places initial-exec TLS into the static TLS area (so that it has
a fixed offset from the TCB). The static TLS area is placed on the
user-supplied stack.

>> Then you'll only need a mechanism to find the address of the actually
>> active rseq area (which you probably have to store in a TLS variable
>> for performance reasons). And that part you need whether you have
>> reference counter or not.
>
> I'm not sure I follow your thoughts here. Currently, the __rseq_abi
> TLS symbol identifies a structure registered to the kernel. The
> "currently active" rseq critical section is identified by the field
> "rseq_cs" within the __rseq_abi structure.
>
> So here when you say "actually active rseq area", do you mean the
> currently registered struct rseq (__rseq_abi) or the currently running
> rseq critical section ? (pointed to by __rseq_abi.rseq_cs)

__rseq_abi.

> One issue here is that early adopter libraries cannot always use
> the IE model. I tried using it for other TLS variables in lttng-ust, and
> it ended up hanging our CI tests when tracing a sample application with
> lttng-ust under a Java virtual machine: being dlopen'd in a process that
> possibly already exhausts the number of available backup TLS IE entries
> seems to have odd effects. This is why I'm worried about using the IE model
> within lttng-ust.

You can work around this by preloading the library. I'm not sure if
this is a compelling reason not to use initial-exec TLS memory.

>>> The same problem arises if we have an application early adopter which
>>> explicitly deals with rseq, with a library early adopter. The issue is
>>> similar, except that the application will explicitly want to unregister
>>> rseq before exiting the thread, which leaves a race window where rseq
>>> is unregistered, but the library may still need to use it.
>>>
>>> The reference counter solves this: only the last rseq user for a thread
>>> performs unregistration.
>>
>> If you do explicit unregistration, you will run into issues related to
>> destructor ordering. You should really find a way to avoid that.
>
> The per-thread reference counter is a way to avoid issues that arise from
> lack of destructor ordering. Is it an acceptable approach for you, or
> you have something else in mind ?

Only for the involved libraries. It will not help if other TLS
destructors run and use these libraries.

Thanks,
Florian

2018-11-24 07:01:24

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Thu, Nov 22, 2018 at 05:59:44PM +0100, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
> > ----- On Nov 22, 2018, at 11:28 AM, Florian Weimer [email protected] wrote:
> >
> >> * Mathieu Desnoyers:
> >>
> >>> Here is one scenario: we have 2 early adopter libraries using rseq which
> >>> are deployed in an environment with an older glibc (which does not
> >>> support rseq).
> >>>
> >>> Of course, none of those libraries can be dlclose'd unless they somehow
> >>> track all registered threads.
> >>
> >> Well, you can always make them NODELETE so that dlclose is not an issue.
> >> If the library is small enough, that shouldn't be a problem.
> >
> > That's indeed what I do with lttng-ust, mainly due to use of pthread_key.
> >
> >>
> >>> But let's focus on how exactly those libraries can handle lazily
> >>> registering rseq. They can use pthread_key, and pthread_setspecific on
> >>> first use by the thread to setup a destructor function to be invoked
> >>> at thread exit. But each early adopter library is unaware of the
> >>> other, so if we just use a "is_initialized" flag, the first destructor
> >>> to run will unregister rseq while the second library may still be
> >>> using it.
> >>
> >> I don't think you need unregistering if the memory is initial-exec TLS
> >> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
> >> be freed while the thread is running, so it should be safe to put the
> >> rseq area there even if glibc knows nothing about it.
> >
> > Is it true for user-supplied stacks as well ?
>
> I'm not entirely sure because the glibc terminology is confusing, but I
> > think it places initial-exec TLS into the static TLS area (so that it has
> a fixed offset from the TCB). The static TLS area is placed on the
> user-supplied stack.

This is an implementation detail that should not leak to applications,
and I believe it's still considered a bug, in that, with large static
TLS, it could overflow or leave unusably little space left on an
otherwise-plenty-large application-provided stack.

> > One issue here is that early adopter libraries cannot always use
> > the IE model. I tried using it for other TLS variables in lttng-ust, and
> > it ended up hanging our CI tests when tracing a sample application with
> > lttng-ust under a Java virtual machine: being dlopen'd in a process that
> > possibly already exhausts the number of available backup TLS IE entries
> > seems to have odd effects. This is why I'm worried about using the IE model
> > within lttng-ust.
>
> You can work around this by preloading the library. I'm not sure if
> this is a compelling reason not to use initial-exec TLS memory.

Use of IE model from a .so file (except possibly libc.so or something
else that inherently needs to be present at program startup for other
reasons) should be considered a bug and unsupported usage.
Encouraging libraries to perpetuate this behavior is going backwards
on progress that's being made to end it.

> >>> The same problem arises if we have an application early adopter which
> >>> explicitly deals with rseq, with a library early adopter. The issue is
> >>> similar, except that the application will explicitly want to unregister
> >>> rseq before exiting the thread, which leaves a race window where rseq
> >>> is unregistered, but the library may still need to use it.
> >>>
> >>> The reference counter solves this: only the last rseq user for a thread
> >>> performs unregistration.
> >>
> >> If you do explicit unregistration, you will run into issues related to
> >> destructor ordering. You should really find a way to avoid that.
> >
> > The per-thread reference counter is a way to avoid issues that arise from
> > lack of destructor ordering. Is it an acceptable approach for you, or
> > you have something else in mind ?
>
> Only for the involved libraries. It will not help if other TLS
> destructors run and use these libraries.

Presumably they should have registered their need for rseq too,
thereby incrementing the reference count. I'm not sure this is a good
idea, but I think I understand it now.

Rich

2018-11-24 07:04:24

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation



----- On Nov 22, 2018, at 11:59 AM, Florian Weimer [email protected] wrote:

> * Mathieu Desnoyers:
>
>> ----- On Nov 22, 2018, at 11:28 AM, Florian Weimer [email protected] wrote:
>>
>>> * Mathieu Desnoyers:
>>>
>>>> Here is one scenario: we have 2 early adopter libraries using rseq which
>>>> are deployed in an environment with an older glibc (which does not
>>>> support rseq).
>>>>
>>>> Of course, none of those libraries can be dlclose'd unless they somehow
>>>> track all registered threads.
>>>
>>> Well, you can always make them NODELETE so that dlclose is not an issue.
>>> If the library is small enough, that shouldn't be a problem.
>>
>> That's indeed what I do with lttng-ust, mainly due to use of pthread_key.
>>
>>>
>>>> But let's focus on how exactly those libraries can handle lazily
>>>> registering rseq. They can use pthread_key, and pthread_setspecific on
>>>> first use by the thread to setup a destructor function to be invoked
>>>> at thread exit. But each early adopter library is unaware of the
>>>> other, so if we just use a "is_initialized" flag, the first destructor
>>>> to run will unregister rseq while the second library may still be
>>>> using it.
>>>
>>> I don't think you need unregistering if the memory is initial-exec TLS
>>> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
>>> be freed while the thread is running, so it should be safe to put the
>>> rseq area there even if glibc knows nothing about it.
>>
>> Is it true for user-supplied stacks as well ?
>
> I'm not entirely sure because the glibc terminology is confusing, but I
> think it places initial-exec TLS into the static TLS area (so that it has
> a fixed offset from the TCB). The static TLS area is placed on the
> user-supplied stack.

You said earlier in the email thread that user-supplied stack can be
reclaimed by __free_tcb () while the thread still runs, am I correct ?
If so, then we really want to unregister the rseq TLS before that.

I notice that __free_tcb () calls __deallocate_stack (), which invokes
_dl_deallocate_tls (). Accessing the TLS from the kernel upon preemption
would appear fragile after this call.

[...]

>> One issue here is that early adopter libraries cannot always use
>> the IE model. I tried using it for other TLS variables in lttng-ust, and
>> it ended up hanging our CI tests when tracing a sample application with
>> lttng-ust under a Java virtual machine: being dlopen'd in a process that
>> possibly already exhausts the number of available backup TLS IE entries
>> seems to have odd effects. This is why I'm worried about using the IE model
>> within lttng-ust.
>
> You can work around this by preloading the library. I'm not sure if
> this is a compelling reason not to use initial-exec TLS memory.

LTTng-UST is meant to be used as a dependency for e.g. a java logger,
or a python logger. Those rely on dlopen, and it would be very painful
to ask all users to preload lttng-ust within their environment which is
sometimes already complex. It works today through dlopen, and I consider
this a user-facing behavior which I am very reluctant to break.

>
>>>> The same problem arises if we have an application early adopter which
>>>> explicitly deals with rseq, with a library early adopter. The issue is
>>>> similar, except that the application will explicitly want to unregister
>>>> rseq before exiting the thread, which leaves a race window where rseq
>>>> is unregistered, but the library may still need to use it.
>>>>
>>>> The reference counter solves this: only the last rseq user for a thread
>>>> performs unregistration.
>>>
>>> If you do explicit unregistration, you will run into issues related to
>>> destructor ordering. You should really find a way to avoid that.
>>
>> The per-thread reference counter is a way to avoid issues that arise from
>> lack of destructor ordering. Is it an acceptable approach for you, or
>> you have something else in mind ?
>
> Only for the involved libraries. It will not help if other TLS
> destructors run and use these libraries.

You bring up an interesting point. The reference counter suffices to ensure
that the kernel will not try to reference the TLS area beyond its registration
scope, but it does not guarantee that another destructor (or a signal
handler) won't try to use the rseq TLS area after it has been unregistered.

Unregistration of the TLS before freeing its memory is required for correctness.

However, a use-after-unregistration can be dealt with by other means. This
is one of the reasons why I want to upstream the "cpu_opv" system call into
Linux: this is a fallback mechanism to use when rseq cannot make forward
progress (e.g. debugger single-stepping), or to use in those scenarios
where rseq is not registered (early at thread creation, or late at thread
exit). Moreover, it allows handling use-cases of migration of data between
per-cpu data structures, which is pretty much impossible to do right if we
only have rseq available.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 07:12:07

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 11:24 AM, Szabolcs Nagy [email protected] wrote:

> On 22/11/18 15:33, Mathieu Desnoyers wrote:
>> ----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:
>>> Right, but in case of user-supplied stacks, we actually free TLS memory
>>> at this point, so signals need to be blocked because the TCB is
>>> (partially) gone after that.
>>
>> Unfortunately, disabling signals is not enough.
>>
>> With rseq registered, the kernel accesses the rseq TLS area when returning to
>> user-space after _preemption_ of user-space, which can be triggered at any
>> point by an interrupt or a fault, even if signals are blocked.
>>
>> So if there are cases where the TLS memory is freed while the thread is still
>> running, we _need_ to explicitly unregister rseq beforehand.
>
> i think the man page should point this out.

Yes, I should add this to the proposed rseq(2) man page.

>
> the memory of a registered rseq object must not be freed
> before thread exit. (either unregister it or free later)
>
> and ideally also point out that c language thread storage
> duration does not provide this guarantee: it may be freed
> by the implementation before thread exit (which is currently
> not observable, but with the rseq syscall it is).

How about the following wording ?

Memory of a registered rseq object must not be freed before the
thread exits. Reclaim of rseq object's memory must only be
done after either an explicit rseq unregistration is performed
or after the thread exit. Keep in mind that the implementation
of the Thread-Local Storage (C language __thread) lifetime does
not guarantee existence of the TLS area up until the thread exits.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 07:14:47

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Thu, Nov 22, 2018 at 01:35:44PM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 22, 2018, at 11:24 AM, Szabolcs Nagy [email protected] wrote:
>
> > On 22/11/18 15:33, Mathieu Desnoyers wrote:
> >> ----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:
> >>> Right, but in case of user-supplied stacks, we actually free TLS memory
> >>> at this point, so signals need to be blocked because the TCB is
> >>> (partially) gone after that.
> >>
> >> Unfortunately, disabling signals is not enough.
> >>
> >> With rseq registered, the kernel accesses the rseq TLS area when returning to
> >> user-space after _preemption_ of user-space, which can be triggered at any
> >> point by an interrupt or a fault, even if signals are blocked.
> >>
> >> So if there are cases where the TLS memory is freed while the thread is still
> >> running, we _need_ to explicitly unregister rseq beforehand.
> >
> > i think the man page should point this out.
>
> Yes, I should add this to the proposed rseq(2) man page.
>
> >
> > the memory of a registered rseq object must not be freed
> > before thread exit. (either unregister it or free later)
> >
> > and ideally also point out that c language thread storage
> > duration does not provide this guarantee: it may be freed
> > by the implementation before thread exit (which is currently
> > not observable, but with the rseq syscall it is).
>
> How about the following wording ?
>
> Memory of a registered rseq object must not be freed before the
> thread exits. Reclaim of rseq object's memory must only be
> done after either an explicit rseq unregistration is performed
> or after the thread exit. Keep in mind that the implementation
> of the Thread-Local Storage (C language __thread) lifetime does
> not guarantee existence of the TLS area up until the thread exits.

This is all really ugly for application/library code to have to deal
with. Maybe if the man page is considered as documenting the syscall
only, and not something you can use, it's okay, but "until the thread
exits" is not well-defined in the sense you want it here. It's more
like "until the kernel task for the thread exits", and the whole point
is that there is some interval in time between the abstract thread
exit and the kernel task exit that is not observable without rseq but
is observable if the rseq is wrongly left installed.

Rich

2018-11-24 07:17:34

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 22, 2018, at 2:01 PM, Rich Felker [email protected] wrote:

> On Thu, Nov 22, 2018 at 01:35:44PM -0500, Mathieu Desnoyers wrote:
>> ----- On Nov 22, 2018, at 11:24 AM, Szabolcs Nagy [email protected] wrote:
>>
>> > On 22/11/18 15:33, Mathieu Desnoyers wrote:
>> >> ----- On Nov 22, 2018, at 10:21 AM, Florian Weimer [email protected] wrote:
>> >>> Right, but in case of user-supplied stacks, we actually free TLS memory
>> >>> at this point, so signals need to be blocked because the TCB is
>> >>> (partially) gone after that.
>> >>
>> >> Unfortunately, disabling signals is not enough.
>> >>
>> >> With rseq registered, the kernel accesses the rseq TLS area when returning to
>> >> user-space after _preemption_ of user-space, which can be triggered at any
>> >> point by an interrupt or a fault, even if signals are blocked.
>> >>
>> >> So if there are cases where the TLS memory is freed while the thread is still
>> >> running, we _need_ to explicitly unregister rseq beforehand.
>> >
>> > i think the man page should point this out.
>>
>> Yes, I should add this to the proposed rseq(2) man page.
>>
>> >
>> > the memory of a registered rseq object must not be freed
>> > before thread exit. (either unregister it or free later)
>> >
>> > and ideally also point out that c language thread storage
>> > duration does not provide this guarantee: it may be freed
>> > by the implementation before thread exit (which is currently
>> > not observable, but with the rseq syscall it is).
>>
>> How about the following wording ?
>>
>> Memory of a registered rseq object must not be freed before the
>> thread exits. Reclaim of rseq object's memory must only be
>> done after either an explicit rseq unregistration is performed
>> or after the thread exit. Keep in mind that the implementation
>> of the Thread-Local Storage (C language __thread) lifetime does
>> not guarantee existence of the TLS area up until the thread exits.
>
> This is all really ugly for application/library code to have to deal
> with. Maybe if the man page is considered as documenting the syscall
> only, and not something you can use, it's okay,

This is indeed for the rseq(2) manpage targeting the man-pages project,
which documents system calls.

> but "until the thread
> exits" is not well-defined in the sense you want it here. It's more
> like "until the kernel task for the thread exits", and the whole point
> is that there is some interval in time between the abstract thread
> exit and the kernel task exit that is not observable without rseq but
> is observable if the rseq is wrongly left installed.

It's important to clear a possible misunderstanding here: from the
point where the thread issues the "exit" system call, the kernel won't
touch the registered rseq TLS area anymore.

So the point where the thread exits is actually well defined, even from
a user-space perspective.

The problematic scenario arises when glibc frees the TLS memory
before invoking exit() when the thread terminates. In this kind of
scenario, we need to explicitly invoke rseq unregister before TLS
memory reclaim.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 08:34:10

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Rich Felker:

>> I'm not entirely sure because the glibc terminology is confusing, but I
>> think it places initial-exec TLS into the static TLS area (so that it has
>> a fixed offset from the TCB). The static TLS area is placed on the
>> user-supplied stack.
>
> This is an implementation detail that should not leak to applications,
> and I believe it's still considered a bug, in that, with large static
> TLS, it could overflow or leave unusably little space left on an
> otherwise-plenty-large application-provided stack.

Sure, but that does not matter in this context because right now, there
is no fix for this bug, and when we fix it, we can take backwards
compatibility into account.

Any library that ends up using rseq will need to coordinate with the
toolchain. I think that's unavoidable given the kernel interface.

>> > One issue here is that early adopter libraries cannot always use
>> > the IE model. I tried using it for other TLS variables in lttng-ust, and
>> > it ended up hanging our CI tests when tracing a sample application with
>> > lttng-ust under a Java virtual machine: being dlopen'd in a process that
>> > possibly already exhausts the number of available backup TLS IE entries
>> > seems to have odd effects. This is why I'm worried about using the IE model
>> > within lttng-ust.
>>
>> You can work around this by preloading the library. I'm not sure if
>> this is a compelling reason not to use initial-exec TLS memory.
>
> Use of IE model from a .so file (except possibly libc.so or something
> else that inherently needs to be present at program startup for other
> reasons) should be considered a bug and unsupported usage.
> Encouraging libraries to perpetuate this behavior is going backwards
> on progress that's being made to end it.

Why? Just because glibc's TCB allocation strategy is problematic?
We can fix that, even with dlopen.

If you are only concerned about the interactions with dlopen, then why
do you think initial-exec TLS is the problem, and not dlopen?

>> > The per-thread reference counter is a way to avoid issues that arise from
>> > lack of destructor ordering. Is it an acceptable approach for you, or
>> > you have something else in mind ?
>>
>> Only for the involved libraries. It will not help if other TLS
>> destructors run and use these libraries.
>
> Presumably they should have registered their need for rseq too,
> thereby incrementing the reference count. I'm not sure this is a good
> idea, but I think I understand it now.

They may have to increase the reference count from 0 to 1, though, so
they have to re-register the rseq area. This tends to get rather messy.

I still think implicit destruction of the rseq area is preferable over
this complexity.

Thanks,
Florian

2018-11-24 08:35:06

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

>>>> I don't think you need unregistering if the memory is initial-exec TLS
>>>> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
>>>> be freed while the thread is running, so it should be safe to put the
>>>> rseq area there even if glibc knows nothing about it.
>>>
>>> Is it true for user-supplied stacks as well ?
>>
>> I'm not entirely sure because the glibc terminology is confusing, but I
>> think it places initial-exec TLS into the static TLS area (so that it has
>> a fixed offset from the TCB). The static TLS area is placed on the
>> user-supplied stack.
>
> You said earlier in the email thread that user-supplied stack can be
> reclaimed by __free_tcb () while the thread still runs, am I correct ?
> If so, then we really want to unregister the rseq TLS before that.

No, dynamic TLS can be reclaimed. Static TLS (which I assume includes
initial-exec TLS) is not deallocated.

> I notice that __free_tcb () calls __deallocate_stack (), which invokes
> _dl_deallocate_tls (). Accessing the TLS from the kernel upon preemption
> would appear fragile after this call.

_dl_deallocate_tls only covers dynamic TLS.

Thanks,
Florian

2018-11-24 08:37:55

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Fri, Nov 23, 2018 at 02:10:14PM +0100, Florian Weimer wrote:
> * Rich Felker:
>
> >> I'm not entirely sure because the glibc terminology is confusing, but I
> >> think it places initial-exec TLS into the static TLS area (so that it has
> >> a fixed offset from the TCB). The static TLS area is placed on the
> >> user-supplied stack.
> >
> > This is an implementation detail that should not leak to applications,
> > and I believe it's still considered a bug, in that, with large static
> > TLS, it could overflow or leave unusably little space left on an
> > otherwise-plenty-large application-provided stack.
>
> Sure, but that does not matter in this context because right now, there
> is no fix for this bug, and when we fix it, we can take backwards
> compatibility into account.
>
> Any library that ends up using rseq will need to coordinate with the
> toolchain. I think that's unavoidable given the kernel interface.

Right. I don't agree with this. What I'm saying is that this behavior
(putting static TLS in the caller-provided stack) should not be
documented as a behavior applications can rely on or accounted as a
solution to the rseq problem, since doing so would preclude fixing the
"application doesn't have as much stack as it requested" bug.

> >> > One issue here is that early adopter libraries cannot always use
> >> > the IE model. I tried using it for other TLS variables in lttng-ust, and
> >> > it ended up hanging our CI tests when tracing a sample application with
> >> > lttng-ust under a Java virtual machine: being dlopen'd in a process that
> >> > possibly already exhausts the number of available backup TLS IE entries
> >> > seems to have odd effects. This is why I'm worried about using the IE model
> >> > within lttng-ust.
> >>
> >> You can work around this by preloading the library. I'm not sure if
> >> this is a compelling reason not to use initial-exec TLS memory.
> >
> > Use of IE model from a .so file (except possibly libc.so or something
> > else that inherently needs to be present at program startup for other
> > reasons) should be considered a bug and unsupported usage.
> > Encouraging libraries to perpetuate this behavior is going backwards
> > on progress that's being made to end it.
>
> Why? Just because glibc's TCB allocation strategy is problematic?
> We can fix that, even with dlopen.
>
> If you are only concerned about the interactions with dlopen, then why
> do you think initial-exec TLS is the problem, and not dlopen?

The initial-exec model, *by design*, only works for TLS objects that
exist at initial execution time. That's why it's called initial-exec.
This is not an implementation flaw/limitation in glibc but
fundamental to the fact that you don't have an unlimited-size (or
practically unlimited) virtual address space range for each thread.
The global-dynamic model is the one that admits dynamic creation of
new TLS objects at runtime (thus the name).
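
As an illustration of the two models named above, here is a small sketch
using the GCC/Clang tls_model attribute (a compiler extension, not ISO C).
The variable and function names are hypothetical. Note that when this is
built into an executable rather than a shared object, the toolchain is free
to relax either access to a cheaper model, so the attribute mostly matters
for code compiled into a .so:

```c
#include <assert.h>
#include <stddef.h>

/* Initial-exec: the variable lives at an offset from the thread pointer
   that is resolved when the object is loaded, so space for it must exist
   in the static TLS area. */
static __thread int counter_ie __attribute__((tls_model("initial-exec")));

/* Global-dynamic: the address is looked up at run time, which is what
   allows TLS objects to appear later, for instance through dlopen. */
static __thread int counter_gd __attribute__((tls_model("global-dynamic")));

/* Address of the initial-exec variable for the calling thread. */
int *ie_address(void)
{
    return &counter_ie;
}

/* Address of the global-dynamic variable for the calling thread. */
int *gd_address(void)
{
    return &counter_gd;
}

/* Increments both per-thread counters and returns their sum. */
int bump_both(void)
{
    assert(ie_address() != NULL && gd_address() != NULL);
    return ++counter_ie + ++counter_gd;
}
```

Both variables behave identically from the caller's point of view; the
difference is only in how, and when, their per-thread addresses can be
resolved.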

> >> > The per-thread reference counter is a way to avoid issues that arise from
> >> > lack of destructor ordering. Is it an acceptable approach for you, or
> >> > you have something else in mind ?
> >>
> >> Only for the involved libraries. It will not help if other TLS
> >> destructors run and use these libraries.
> >
> > Presumably they should have registered their need for rseq too,
> > thereby incrementing the reference count. I'm not sure this is a good
> > idea, but I think I understand it now.
>
> They may have to increase the reference count from 0 to 1, though, so
> they have to re-register the rseq area. This tends to get rather messy.
>
> I still think implicit destruction of the rseq area is preferable over
> this complexity.

Absolutely. As long as it's in libc, implicit destruction will happen.
Actually I think the glibc code should unconditionally unregister the
rseq address at exit (after blocking signals, so no application code
can run) in case a third-party rseq library was linked and failed to
do so before thread exit (e.g. due to mismatched ref counts) rather
than respecting the reference count, since it knows it's the last
user. This would make potentially-buggy code safer.
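
A minimal sketch of the two policies being contrasted here, under stated assumptions: the kernel registration is simulated by a thread-local flag, and every name below (`rseq_user_get`, `rseq_user_put`, `libc_thread_exit_cleanup`, and so on) is hypothetical rather than taken from the patch. Early-adopter libraries follow the reference count, while the libc exit path ignores it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state: models __rseq_refcount and whether the
   kernel currently holds a registration for this thread. */
static __thread unsigned int rseq_refcount;
static __thread bool rseq_registered;

/* Stand-ins for the real rseq(2) register/unregister system calls. */
static void kernel_register(void)   { rseq_registered = true; }
static void kernel_unregister(void) { rseq_registered = false; }

/* Library-side protocol: register on 0 -> 1, unregister on 1 -> 0. */
void rseq_user_get(void)
{
    if (rseq_refcount++ == 0)
        kernel_register();
}

void rseq_user_put(void)
{
    assert(rseq_refcount > 0);   /* an unbalanced put is a caller bug */
    if (--rseq_refcount == 0)
        kernel_unregister();
}

/* libc-side policy suggested above: at thread exit, with signals assumed
   to be blocked already, unregister regardless of the count. */
void libc_thread_exit_cleanup(void)
{
    if (rseq_registered)
        kernel_unregister();
    rseq_refcount = 0;
}

bool rseq_is_registered(void) { return rseq_registered; }
```

With this shape, a library that forgets its final `rseq_user_put()` leaves the count at 1, yet the registration is still dropped by `libc_thread_exit_cleanup()` before the thread's memory can be reused.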

Rich

2018-11-24 08:45:13

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
> [...]
> >
> > Absolutely. As long as it's in libc, implicit destruction will happen.
> > Actually I think the glibc code should unconditionally unregister the
> > rseq address at exit (after blocking signals, so no application code
> > can run) in case a third-party rseq library was linked and failed to
> > do so before thread exit (e.g. due to mismatched ref counts) rather
> > than respecting the reference count, since it knows it's the last
> > user. This would make potentially-buggy code safer.
>
> OK, let me go ahead with a few ideas/questions along that path.
^^^^^^^^^^^^^^^
>
> Let's say our stated goal is to let the "exit" system call from the
> glibc thread exit path perform rseq unregistration (without explicit
> unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

> First, we need the TLS area to be valid until the exit system call
> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
> I'm not entirely sure we can guarantee the IE model if another library
> gets its own global-dynamic weak symbol elected at execution time. Would
> it be better to switch to a "strong" symbol for the glibc __rseq_abi
> rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

> There have been presumptions about signals being blocked when the thread
> exits throughout this email thread. Out of curiosity, what code is
> responsible for disabling signals in this situation ? Related to this,
> is it valid to access an IE model TLS variable from a signal handler at
> _any_ point where the signal handler nests over thread's execution ?
> This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Rich

2018-11-24 08:45:34

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
[...]
>
> Absolutely. As long as it's in libc, implicit destruction will happen.
> Actually I think the glibc code should unconditionally unregister the
> rseq address at exit (after blocking signals, so no application code
> can run) in case a third-party rseq library was linked and failed to
> do so before thread exit (e.g. due to mismatched ref counts) rather
> than respecting the reference count, since it knows it's the last
> user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

If we rely on implicit unregistration by the exit system call, then we
need to be really sure that the __rseq_abi TLS area can be accessed
(load and store) from kernel preemption up to the point where exit
is invoked. If we have that guarantee with the IE model, then we should
be fine. This means the memory area where the __rseq_abi sits can only
be re-used after the tid field in the TCB is set to 0 by the exit system
call. Looking at allocatestack.c, it looks like the FREE_P () macro
does exactly that.

With all the above respected, we could rely on implicit rseq unregistration
by thread exit rather than do an explicit unregister. We would still need
to increment the __rseq_refcount upon thread start however, so we can
ensure early adopter libraries won't unregister rseq while glibc is using
it. No need to bring the refcount back to 0 in glibc though.

There have been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access an IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 08:45:41

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Rich Felker:

> On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> There have been presumptions about signals being blocked when the thread
>> exits throughout this email thread. Out of curiosity, what code is
>> responsible for disabling signals in this situation ? Related to this,
>> is it valid to access an IE model TLS variable from a signal handler at
>> _any_ point where the signal handler nests over thread's execution ?
>> This includes early start and just before invoking the exit system call.
>
> It should be valid to access *any* TLS object like this, but the
> standards don't cover it well.

C++ makes it undefined:

<http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>

Thanks,
Florian

2018-11-24 08:46:05

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 23, 2018, at 12:30 PM, Rich Felker [email protected] wrote:

> On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
>> [...]
>> >
>> > Absolutely. As long as it's in libc, implicit destruction will happen.
>> > Actually I think the glibc code should unconditionally unregister the
>> > rseq address at exit (after blocking signals, so no application code
>> > can run) in case a third-party rseq library was linked and failed to
>> > do so before thread exit (e.g. due to mismatched ref counts) rather
>> > than respecting the reference count, since it knows it's the last
>> > user. This would make potentially-buggy code safer.
>>
>> OK, let me go ahead with a few ideas/questions along that path.
> ^^^^^^^^^^^^^^^
>>
>> Let's say our stated goal is to let the "exit" system call from the
>> glibc thread exit path perform rseq unregistration (without explicit
>> unregistration beforehand). Let's look at what we need.
>
> This is not "along that path". The above-quoted text is not about
> assuming it's safe to make SYS_exit without unregistering the rseq
> object, but rather about glibc being able to perform the
> rseq-unregister syscall without caring about reference counts, since
> it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.

Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?


>
>> First, we need the TLS area to be valid until the exit system call
>> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
>> I'm not entirely sure we can guarantee the IE model if another library
>> gets its own global-dynamic weak symbol elected at execution time. Would
>> it be better to switch to a "strong" symbol for the glibc __rseq_abi
>> rather than weak ?
>
> This doesn't help; still whichever comes first in link order would
> override. Either way __rseq_abi would be in static TLS, though,
> because any dynamically-loaded library is necessarily loaded after
> libc, which is loaded at initial exec time.

OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

Something can be technically correct based on the current implementation,
but fragile with respect to future changes. We need to carefully distinguish
between the two when exposing ABIs.

>
>> There have been presumptions about signals being blocked when the thread
>> exits throughout this email thread. Out of curiosity, what code is
>> responsible for disabling signals in this situation ?

This question is still open.

> Related to this,
>> is it valid to access an IE model TLS variable from a signal handler at
>> _any_ point where the signal handler nests over thread's execution ?
>> This includes early start and just before invoking the exit system call.
>
> It should be valid to access *any* TLS object like this, but the
> standards don't cover it well. Right now access to dynamic TLS from
> signal handlers is unsafe in glibc, but static is safe.

Which is a shame for the lttng-ust tracer, which needs global-dynamic
TLS variables so it can be dlopen'd, but aims at allowing tracing from
signal handlers. It looks like due to limitations of global-dynamic
TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
could crash the process if the signal handler nests early at thread start
or late before thread exit. One way out of this would be to ensure signals
are blocked at thread start/exit, but I can't find the code responsible for
doing this within glibc.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-24 08:46:19

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Fri, Nov 23, 2018 at 06:39:04PM +0100, Florian Weimer wrote:
> * Rich Felker:
>
> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> >> There have been presumptions about signals being blocked when the thread
> >> exits throughout this email thread. Out of curiosity, what code is
> >> responsible for disabling signals in this situation ? Related to this,
> >> is it valid to access an IE model TLS variable from a signal handler at
> >> _any_ point where the signal handler nests over thread's execution ?
> >> This includes early start and just before invoking the exit system call.
> >
> > It should be valid to access *any* TLS object like this, but the
> > standards don't cover it well.
>
> C++ makes it undefined:
>
> <http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>

C also leaves access to pretty much anything from a signal handler
undefined, but that makes signals basically useless. POSIX
inadvertently defines a lot more than it wanted to by ignoring
indirect ways you can access objects using AS-safe functions to pass
around their addresses; there's an open issue for this:

http://austingroupbugs.net/view.php?id=728

I think it's reasonable to say, based on how fond POSIX is of signals
for realtime stuff, that it should allow some reasonable operations,
but just be more careful about what it allows, and disallowing access
to TLS would preclude the only ways to make signals non-awful for
multithreaded processes.

Rich

2018-11-24 08:47:35

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Rich Felker:

> On Fri, Nov 23, 2018 at 06:39:04PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>>
>> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> >> There have been presumptions about signals being blocked when the thread
>> >> exits throughout this email thread. Out of curiosity, what code is
>> >> responsible for disabling signals in this situation ? Related to this,
>> >> is it valid to access an IE model TLS variable from a signal handler at
>> >> _any_ point where the signal handler nests over thread's execution ?
>> >> This includes early start and just before invoking the exit system call.
>> >
>> > It should be valid to access *any* TLS object like this, but the
>> > standards don't cover it well.
>>
>> C++ makes it undefined:
>>
>> <http://eel.is/c++draft/support.signal#def:evaluation,signal-safe>
>
> C also leaves access to pretty much anything from a signal handler
> undefined, but that makes signals basically useless.

Access to atomic variables of thread storage duration is defined,
though.
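
To make the case concrete, here is a small sketch, assuming a C11 compiler where `atomic_int` is always lock-free (the `_Static_assert` checks that assumption) and using SIGUSR1 purely as an example signal. The names `handler_hits` and `count_handler_hits` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* The handler only touches a lock-free atomic object of thread storage
   duration, which is the kind of access described as defined above. */
_Static_assert(ATOMIC_INT_LOCK_FREE == 2, "sketch assumes lock-free atomic_int");
static _Thread_local atomic_int handler_hits;

static void on_usr1(int sig)
{
    (void)sig;
    atomic_fetch_add_explicit(&handler_hits, 1, memory_order_relaxed);
}

/* Installs the handler, raises SIGUSR1 n times in the calling thread,
   restores the previous disposition, and returns the observed count. */
int count_handler_hits(int n)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
        return -1;

    atomic_store(&handler_hits, 0);
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);   /* handler runs before raise() returns */

    int rc = sigaction(SIGUSR1, &old, NULL);
    assert(rc == 0);
    return atomic_load(&handler_hits);
}
```

Because `raise()` delivers the signal to the calling thread, the handler increments that thread's own copy of `handler_hits`, so `count_handler_hits(3)` reports three hits.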

Thanks,
Florian

2018-11-24 08:49:10

by Rich Felker

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Fri, Nov 23, 2018 at 12:52:21PM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2018, at 12:30 PM, Rich Felker [email protected] wrote:
>
> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> >> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
> >> [...]
> >> >
> >> > Absolutely. As long as it's in libc, implicit destruction will happen.
> >> > Actually I think the glibc code should unconditionally unregister the
> >> > rseq address at exit (after blocking signals, so no application code
> >> > can run) in case a third-party rseq library was linked and failed to
> >> > do so before thread exit (e.g. due to mismatched ref counts) rather
> >> > than respecting the reference count, since it knows it's the last
> >> > user. This would make potentially-buggy code safer.
> >>
> >> OK, let me go ahead with a few ideas/questions along that path.
> > ^^^^^^^^^^^^^^^
> >>
> >> Let's say our stated goal is to let the "exit" system call from the
> >> glibc thread exit path perform rseq unregistration (without explicit
> >> unregistration beforehand). Let's look at what we need.
> >
> > This is not "along that path". The above-quoted text is not about
> > assuming it's safe to make SYS_exit without unregistering the rseq
> > object, but rather about glibc being able to perform the
> > rseq-unregister syscall without caring about reference counts, since
> > it knows no other code that might depend on rseq can run after it.
>
> When saying "along that path", what I mean is: if we go in that direction,
> then we should look into going all the way there, and rely on thread
> exit to implicitly unregister the TLS area.
>
> Do you see any reason for doing an explicit unregistration at thread
> exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise that it would become a public guarantee, it's not part of the
discussion around the API/ABI. Only part of the discussion around
implementation internals of the glibc rseq stuff.

Of course I may be biased thinking application code should not assume
this since it's not true on musl -- for detached threads, the thread
frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robust_list before exiting).

> >> First, we need the TLS area to be valid until the exit system call
> >> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
> >> I'm not entirely sure we can guarantee the IE model if another library
> >> gets its own global-dynamic weak symbol elected at execution time. Would
> >> it be better to switch to a "strong" symbol for the glibc __rseq_abi
> >> rather than weak ?
> >
> > This doesn't help; still whichever comes first in link order would
> > override. Either way __rseq_abi would be in static TLS, though,
> > because any dynamically-loaded library is necessarily loaded after
> > libc, which is loaded at initial exec time.
>
> OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
> sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is
meaningful or useful (weak in a shared library is largely meaningless)
but maybe I'm missing something here.

> Something can be technically correct based on the current implementation,
> but fragile with respect to future changes. We need to carefully distinguish
> between the two when exposing ABIs.

Yes.

> >> There have been presumptions about signals being blocked when the thread
> >> exits throughout this email thread. Out of curiosity, what code is
> >> responsible for disabling signals in this situation ?
>
> This question is still open.

I can't find it -- maybe it's not done in glibc. It is in musl, and I
assumed glibc would also do it, because otherwise it's possible to see
some inconsistent states from signal handlers. Maybe these are all
undefined due to AS-unsafety of pthread_exit, but I think you can
construct examples where something could be observably wrong without
breaking any rules.
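
For reference, blocking signals on the exit path can be sketched with plain POSIX calls as below. This is not code from musl or glibc; the helper names are hypothetical, and SIGKILL and SIGSTOP cannot be blocked regardless of the mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Block every blockable signal for the calling thread and hand back the
   previous mask. A thread-exit path would call this before teardown so
   that no handler can observe a half-destroyed thread. */
int block_all_signals(sigset_t *saved)
{
    sigset_t all;

    sigfillset(&all);
    return pthread_sigmask(SIG_BLOCK, &all, saved);
}

/* Restore a mask saved by block_all_signals(). Only needed when the
   thread keeps running, for example in a test. */
void restore_signals(const sigset_t *saved)
{
    int rc = pthread_sigmask(SIG_SETMASK, saved, NULL);
    assert(rc == 0);
}

/* Returns 1 if signo is blocked in the calling thread, 0 if it is not,
   and -1 on error. */
int signal_is_blocked(int signo)
{
    sigset_t cur;

    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, signo);
}
```

A real exit path would never restore the mask; `restore_signals()` exists only so the sketch can be exercised from a thread that continues to run.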

> > Related to this,
> >> is it valid to access an IE model TLS variable from a signal handler at
> >> _any_ point where the signal handler nests over thread's execution ?
> >> This includes early start and just before invoking the exit system call.
> >
> > It should be valid to access *any* TLS object like this, but the
> > standards don't cover it well. Right now access to dynamic TLS from
> > signal handlers is unsafe in glibc, but static is safe.
>
> Which is a shame for the lttng-ust tracer, which needs global-dynamic
> TLS variables so it can be dlopen'd, but aims at allowing tracing from
> signal handlers. It looks like due to limitations of global-dynamic
> TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
> could crash the process if the signal handler nests early at thread start
> or late before thread exit. One way out of this would be to ensure signals
> are blocked at thread start/exit, but I can't find the code responsible for
> doing this within glibc.

Just blocking at start/exit won't solve the problem because
global-dynamic TLS in glibc involves dynamic allocation, which is hard
to make AS-safe and of course can fail, leaving no way to make forward
progress.

Rich

2018-11-24 08:52:31

by Mathieu Desnoyers

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 23, 2018, at 1:35 PM, Rich Felker [email protected] wrote:

> On Fri, Nov 23, 2018 at 12:52:21PM -0500, Mathieu Desnoyers wrote:
>> ----- On Nov 23, 2018, at 12:30 PM, Rich Felker [email protected] wrote:
>>
>> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> >> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
>> >> [...]
>> >> >
>> >> > Absolutely. As long as it's in libc, implicit destruction will happen.
>> >> > Actually I think the glibc code should unconditionally unregister the
>> >> > rseq address at exit (after blocking signals, so no application code
>> >> > can run) in case a third-party rseq library was linked and failed to
>> >> > do so before thread exit (e.g. due to mismatched ref counts) rather
>> >> > than respecting the reference count, since it knows it's the last
>> >> > user. This would make potentially-buggy code safer.
>> >>
>> >> OK, let me go ahead with a few ideas/questions along that path.
>> > ^^^^^^^^^^^^^^^
>> >>
>> >> Let's say our stated goal is to let the "exit" system call from the
>> >> glibc thread exit path perform rseq unregistration (without explicit
>> >> unregistration beforehand). Let's look at what we need.
>> >
>> > This is not "along that path". The above-quoted text is not about
>> > assuming it's safe to make SYS_exit without unregistering the rseq
>> > object, but rather about glibc being able to perform the
>> > rseq-unregister syscall without caring about reference counts, since
>> > it knows no other code that might depend on rseq can run after it.
>>
>> When saying "along that path", what I mean is: if we go in that direction,
>> then we should look into going all the way there, and rely on thread
>> exit to implicitly unregister the TLS area.
>>
>> Do you see any reason for doing an explicit unregistration at thread
>> exit rather than simply rely on the exit system call ?
>
> Whether this is needed is an implementation detail of glibc that
> should be permitted to vary between versions. Unless glibc wants to
> promise that it would become a public guarantee, it's not part of the
> discussion around the API/ABI. Only part of the discussion around
> implementation internals of the glibc rseq stuff.
>
> Of course I may be biased thinking application code should not assume
> this since it's not true on musl -- for detached threads, the thread
> frees its own stack before exiting (and thus has to unregister
> set_tid_address and set_robust_list before exiting).

OK, so on glibc, the implementation could rely on exit side-effect to
implicitly unregister rseq. On musl, based on the scenario you describe,
the library should unregister rseq explicitly before stack reclaim.

Am I understanding the situation correctly ?

>
>> >> First, we need the TLS area to be valid until the exit system call
>> >> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
>> >> I'm not entirely sure we can guarantee the IE model if another library
>> >> gets its own global-dynamic weak symbol elected at execution time. Would
>> >> it be better to switch to a "strong" symbol for the glibc __rseq_abi
>> >> rather than weak ?
>> >
>> > This doesn't help; still whichever comes first in link order would
>> > override. Either way __rseq_abi would be in static TLS, though,
>> > because any dynamically-loaded library is necessarily loaded after
>> > libc, which is loaded at initial exec time.
>>
>> OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
>> sure I correctly understand your position.
>
> I don't think it matters, and I don't think making it weak is
> meaningful or useful (weak in a shared library is largely meaningless)
> but maybe I'm missing something here.

Using a "weak" symbol in early adopter libraries is important, so they
can be loaded together into the same process without causing loader
errors due to many definitions of the same strong symbol.

Using "weak" in a C library is something I'm not sure is a characteristic
we want or need, because I doubt we would ever want to load two libc at the
same time in a given process.

The only reason I see for using "weak" for the __rseq_abi symbol in the
libc is if we want to allow early adopter applications to define
__rseq_abi as a strong symbol, which would make some sense.


>
>> Something can be technically correct based on the current implementation,
>> but fragile with respect to future changes. We need to carefully distinguish
>> between the two when exposing ABIs.
>
> Yes.
>
>> >> There have been presumptions about signals being blocked when the thread
>> >> exits throughout this email thread. Out of curiosity, what code is
>> >> responsible for disabling signals in this situation ?
>>
>> This question is still open.
>
> I can't find it -- maybe it's not done in glibc. It is in musl, and I
> assumed glibc would also do it, because otherwise it's possible to see
> some inconsistent states from signal handlers. Maybe these are all
> undefined due to AS-unsafety of pthread_exit, but I think you can
> construct examples where something could be observably wrong without
> breaking any rules.

Good to know for the musl case.

>
>> > Related to this,
>> >> is it valid to access an IE model TLS variable from a signal handler at
>> >> _any_ point where the signal handler nests over thread's execution ?
>> >> This includes early start and just before invoking the exit system call.
>> >
>> > It should be valid to access *any* TLS object like this, but the
>> > standards don't cover it well. Right now access to dynamic TLS from
>> > signal handlers is unsafe in glibc, but static is safe.
>>
>> Which is a shame for the lttng-ust tracer, which needs global-dynamic
>> TLS variables so it can be dlopen'd, but aims at allowing tracing from
>> signal handlers. It looks like due to limitations of global-dynamic
>> TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
>> could crash the process if the signal handler nests early at thread start
>> or late before thread exit. One way out of this would be to ensure signals
>> are blocked at thread start/exit, but I can't find the code responsible for
>> doing this within glibc.
>
> Just blocking at start/exit won't solve the problem because
> global-dynamic TLS in glibc involves dynamic allocation, which is hard
> to make AS-safe and of course can fail, leaving no way to make forward
> progress.

How hard would it be to create an async-signal-safe memory pool, which would
be always accessed with signals blocked, so we could fix those corner-cases
for good ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-26 08:29:49

by Florian Weimer

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

> Using a "weak" symbol in early adopter libraries is important, so they
> can be loaded together into the same process without causing loader
> errors due to many definitions of the same strong symbol.

This is not how ELF dynamic linking works. If the symbol name is the
same, one definition interposes the others.

You need to ensure that the symbol has the same size everywhere, though.
There are some tricky interactions with symbol versions, too. (The
interposing libraries must not use symbol versioning.)

Thanks,
Florian

2018-11-26 11:57:42

by Szabolcs Nagy

Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On 23/11/18 21:09, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2018, at 1:35 PM, Rich Felker [email protected] wrote:
>
>> On Fri, Nov 23, 2018 at 12:52:21PM -0500, Mathieu Desnoyers wrote:
>>> ----- On Nov 23, 2018, at 12:30 PM, Rich Felker [email protected] wrote:
>>>
>>>> On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>>>>> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker [email protected] wrote:
>>>>> [...]
>>>>>>
>>>>>> Absolutely. As long as it's in libc, implicit destruction will happen.
>>>>>> Actually I think the glibc code should unconditionally unregister the
>>>>>> rseq address at exit (after blocking signals, so no application code
>>>>>> can run) in case a third-party rseq library was linked and failed to
>>>>>> do so before thread exit (e.g. due to mismatched ref counts) rather
>>>>>> than respecting the reference count, since it knows it's the last
>>>>>> user. This would make potentially-buggy code safer.
>>>>>
>>>>> OK, let me go ahead with a few ideas/questions along that path.
>>>> ^^^^^^^^^^^^^^^
>>>>>
>>>>> Let's say our stated goal is to let the "exit" system call from the
>>>>> glibc thread exit path perform rseq unregistration (without explicit
>>>>> unregistration beforehand). Let's look at what we need.
>>>>
>>>> This is not "along that path". The above-quoted text is not about
>>>> assuming it's safe to make SYS_exit without unregistering the rseq
>>>> object, but rather about glibc being able to perform the
>>>> rseq-unregister syscall without caring about reference counts, since
>>>> it knows no other code that might depend on rseq can run after it.
>>>
>>> When saying "along that path", what I mean is: if we go in that direction,
>>> then we should look into going all the way there, and rely on thread
>>> exit to implicitly unregister the TLS area.
>>>
>>> Do you see any reason for doing an explicit unregistration at thread
>>> exit rather than simply rely on the exit system call ?
>>
>> Whether this is needed is an implementation detail of glibc that
>> should be permitted to vary between versions. Unless glibc wants to
>> promise that it would become a public guarantee, it's not part of the
>> discussion around the API/ABI. Only part of the discussion around
>> implementation internals of the glibc rseq stuff.
>>
>> Of course I may be biased thinking application code should not assume
>> this since it's not true on musl -- for detached threads, the thread
>> frees its own stack before exiting (and thus has to unregister
>> set_tid_address and set_robust_list before exiting).
>
> OK, so on glibc, the implementation could rely on exit side-effect to
> implicitly unregister rseq. On musl, based on the scenario you describe,
> the library should unregister rseq explicitly before stack reclaim.
>
> Am I understanding the situation correctly ?

i think the point is that you don't need to know these
details in order to come up with a design that allows
both implementations. (then the libc can change later)

so
- is there a need for public unregister api (does the
user do it or the rseq library implicitly unregisters)?
- is there a need for ref counting (or the rseq lib
unconditionally unregisters at the end of a thread,
the libc can certainly do this)?

>>> OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
>>> sure I correctly understand your position.
>>
>> I don't think it matters, and I don't think making it weak is
>> meaningful or useful (weak in a shared library is largely meaningless)
>> but maybe I'm missing something here.
>
> Using a "weak" symbol in early adopter libraries is important, so they
> can be loaded together into the same process without causing loader
> errors due to many definitions of the same strong symbol.
>
> Using "weak" in a C library is something I'm not sure is a characteristic
> we want or need, because I doubt we would ever want to load two libc at the
> same time in a given process.
>
> The only reason I see for using "weak" for the __rseq_abi symbol in the
> libc is if we want to allow early adopter applications to define
> __rseq_abi as a strong symbol, which would make some sense.

weak really does not matter in dynamic linking
(unless you set the LD_DYNAMIC_WEAK env var for
backward compat with very old glibc, or if it's
an undefined weak reference)

>> Just blocking at start/exit won't solve the problem because
>> global-dynamic TLS in glibc involves dynamic allocation, which is hard
>> to make AS-safe and of course can fail, leaving no way to make forward
>> progress.
>
> How hard would it be to create an async-signal-safe memory pool, which would
> be always accessed with signals blocked, so we could fix those corner-cases
> for good ?

that is hard.

in musl tls access is as-safe, but it uses a different
approach: it does all allocations at thread creation or
dlopen time.

glibc has further issues because it supports dlclose
with module unloading and then dynamic tls related
internal structures are hard to free (it is valid to
implement dlclose as a noop, which is what musl does.
tls access needs to synchronize with dlopen and dlclose
when accessing internal structures, but you need a
lock-free mechanism if the access has to be as-safe,
and dlclose is harder to do that way than dlopen)

2018-11-26 15:54:31

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 26, 2018, at 3:28 AM, Florian Weimer [email protected] wrote:

> * Mathieu Desnoyers:
>
>> Using a "weak" symbol in early adopter libraries is important, so they
>> can be loaded together into the same process without causing loader
>> errors due to many definitions of the same strong symbol.
>
> This is not how ELF dynamic linking works. If the symbol name is the
> same, one definition interposes the others.
>
> You need to ensure that the symbol has the same size everywhere, though.
> There are some tricky interactions with symbol versions, too. (The
> interposing libraries must not use symbol versioning.)

I was under the impression that loading the same strong symbol into an
application multiple times would cause some kind of warning if non-weak. I did
some testing to figure out which case I remembered would cause this.

When compiling with "-fno-common", dynamic and static linking work fine, but
trying to add multiple instances of a given symbol into a single object fails
with:

/tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
/tmp/ccQBJBOo.o:(.bss+0x0): first defined here

Even if the symbol has the same size.

So considering that we don't care about compiling into a single object here,
and only care about static and dynamic linking of libraries, indeed the "weak"
symbol is not useful.

So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-26 16:08:58

by Florian Weimer

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

* Mathieu Desnoyers:

> So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Yes, please. (But I'm still not sure we need the reference counter.)

Thanks,
Florian

2018-11-26 16:36:18

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 26, 2018, at 10:51 AM, Mathieu Desnoyers [email protected] wrote:

> ----- On Nov 26, 2018, at 3:28 AM, Florian Weimer [email protected] wrote:
>
>> * Mathieu Desnoyers:
>>
>>> Using a "weak" symbol in early adopter libraries is important, so they
>>> can be loaded together into the same process without causing loader
>>> errors due to many definitions of the same strong symbol.
>>
>> This is not how ELF dynamic linking works. If the symbol name is the
>> same, one definition interposes the others.
>>
>> You need to ensure that the symbol has the same size everywhere, though.
>> There are some tricky interactions with symbol versions, too. (The
>> interposing libraries must not use symbol versioning.)
>
> I was under the impression that loading the same strong symbol into an
> application multiple times would cause some kind of warning if non-weak. I did
> some testing to figure out which case I remembered would cause this.
>
> When compiling with "-fno-common", dynamic and static linking work fine, but
> trying to add multiple instances of a given symbol into a single object fails
> with:
>
> /tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
> /tmp/ccQBJBOo.o:(.bss+0x0): first defined here
>
> Even if the symbol has the same size.
>
> So considering that we don't care about compiling into a single object here,
> and only care about static and dynamic linking of libraries, indeed the "weak"
> symbol is not useful.
>
> So let's make __rseq_abi and __rseq_refcount strong symbols then ?

Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
after all, especially for __rseq_abi, which needs to be initialized to a specific
value and is therefore not a common symbol.

" --warn-common
Warn when a common symbol is combined with another common symbol or with a symbol definition. Unix
linkers allow this somewhat sloppy practice, but linkers on some other operating systems do not.
This option allows you to find potential problems from combining global symbols. Unfortunately,
some C libraries use this practice, so you may get some warnings about symbols in the libraries as
well as in your programs."
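
To make the common vs. non-common distinction concrete, here is a minimal
sketch. All names are hypothetical: struct demo_rseq is a trimmed-down
stand-in for the real struct rseq, and DEMO_CPU_ID_UNINITIALIZED is assumed
to mirror the "not registered" marker of the uapi header. The weak attribute
is shown only because it is the variant under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for the kernel's "not registered yet" cpu_id marker. */
#define DEMO_CPU_ID_UNINITIALIZED ((uint32_t) -1)

/* Hypothetical, trimmed-down stand-in for struct rseq. */
struct demo_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
};

/* Tentative definition: no initializer, so a toolchain using the common
   model (e.g. -fcommon) may emit it as a common symbol. */
int demo_tentative;

/* Initialized definition: it carries a specific value, so it is emitted
   as regular data and is never a common symbol. */
__attribute__((weak)) __thread struct demo_rseq demo_rseq_abi = {
    .cpu_id = DEMO_CPU_ID_UNINITIALIZED,
};
```

Only the first kind of definition is what --warn-common talks about; the
initialized TLS variable falls outside of it.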

Thoughts ?

Thanks,

Mathieu

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-11-26 17:14:26

by Rich Felker

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Mon, Nov 26, 2018 at 11:30:51AM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 26, 2018, at 10:51 AM, Mathieu Desnoyers [email protected] wrote:
>
> > ----- On Nov 26, 2018, at 3:28 AM, Florian Weimer [email protected] wrote:
> >
> >> * Mathieu Desnoyers:
> >>
> >>> Using a "weak" symbol in early adopter libraries is important, so they
> >>> can be loaded together into the same process without causing loader
> >>> errors due to many definitions of the same strong symbol.
> >>
> >> This is not how ELF dynamic linking works. If the symbol name is the
> >> same, one definition interposes the others.
> >>
> >> You need to ensure that the symbol has the same size everywhere, though.
> >> There are some tricky interactions with symbol versions, too. (The
> >> interposing libraries must not use symbol versioning.)
> >
> > I was under the impression that loading the same strong symbol into an
> > application multiple times would cause some kind of warning if non-weak. I did
> > some testing to figure out which case I remembered would cause this.
> >
> > When compiling with "-fno-common", dynamic and static linking work fine, but
> > trying to add multiple instances of a given symbol into a single object fails
> > with:
> >
> > /tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
> > /tmp/ccQBJBOo.o:(.bss+0x0): first defined here
> >
> > Even if the symbol has the same size.
> >
> > So considering that we don't care about compiling into a single object here,
> > and only care about static and dynamic linking of libraries, indeed the "weak"
> > symbol is not useful.
> >
> > So let's make __rseq_abi and __rseq_refcount strong symbols then ?
>
> Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
> after all, especially for __rseq_abi, which needs to be initialized to a specific
> value and is therefore not a common symbol.
>
> " --warn-common
> Warn when a common symbol is combined with another common symbol or with a symbol definition. Unix
> linkers allow this somewhat sloppy practice, but linkers on some other operating systems do not.
> This option allows you to find potential problems from combining global symbols. Unfortunately,
> some C libraries use this practice, so you may get some warnings about symbols in the libraries as
> well as in your programs."
>
> Thoughts ?

AFAIK this has nothing to do with dynamic linking.

Rich

2018-11-26 17:17:53

by Rich Felker

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

On Mon, Nov 26, 2018 at 05:03:02PM +0100, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
> > So let's make __rseq_abi and __rseq_refcount strong symbols then ?
>
> Yes, please. (But I'm still not sure we need the reference counter.)

The reference counter is needed for out-of-libc implementations
interacting and using the dtor hack. An in-libc implementation doesn't
need to inspect/honor the reference counter, but it does seem to need
to indicate that it has a reference, if you want it to be compatible
with out-of-libc implementations, so that the out-of-libc one will not
unregister the rseq before libc is done with it.

Alternatively, another protocol could be chosen for this purpose, but
it has to be something stable and agreed upon, since things would
break badly if libc and the library providing rseq disagreed.
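
As a rough sketch of the reference counter protocol described above (not
the patch itself): every name below is hypothetical, the kernel
registration is mocked with a per-thread flag so the example runs without
the rseq system call, and the "dtor hack" is modelled as a plain function
rather than a real pthread key destructor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread stand-ins for __rseq_refcount and for the
   kernel-side registration state. */
static __thread unsigned int demo_refcount;
static __thread bool demo_kernel_registered;

/* Take a reference; only the first user performs the registration. */
static void demo_rseq_get(void)
{
    if (demo_refcount++ == 0)
        demo_kernel_registered = true;   /* mock of rseq register */
}

/* Drop a reference; only the last user performs the unregistration. */
static void demo_rseq_put(void)
{
    assert(demo_refcount > 0);
    if (--demo_refcount == 0)
        demo_kernel_registered = false;  /* mock of rseq unregister */
}

/* libc at thread start: indicate that it holds a reference. */
void demo_libc_thread_start(void) { demo_rseq_get(); }

/* Out-of-libc library: lazy registration on first use ... */
void demo_lib_first_use(void) { demo_rseq_get(); }

/* ... and release from its destructor when the thread winds down. */
void demo_lib_destructor(void) { demo_rseq_put(); }

bool demo_is_registered(void) { return demo_kernel_registered; }
```

With libc holding its reference, the library's destructor drops the count
from 2 to 1 and leaves the registration in place, which is the property
the protocol is meant to guarantee.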

Rich

2018-11-26 19:25:29

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 26, 2018, at 12:10 PM, Rich Felker [email protected] wrote:

> On Mon, Nov 26, 2018 at 05:03:02PM +0100, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>> > So let's make __rseq_abi and __rseq_refcount strong symbols then ?
>>
>> Yes, please. (But I'm still not sure we need the reference counter.)
>
> The reference counter is needed for out-of-libc implementations
> interacting and using the dtor hack. An in-libc implementation doesn't
> need to inspect/honor the reference counter, but it does seem to need
> to indicate that it has a reference, if you want it to be compatible
> with out-of-libc implementations, so that the out-of-libc one will not
> unregister the rseq before libc is done with it.

Let's consider two use-cases here: one (simpler) is use of rseq TLS
from thread context by out-of-libc implementations. The other is use of
rseq TLS from signal handler by out-of-libc implementations.

If we only care about users of rseq from thread context, then libc
could simply set the refcount value to 1 on thread start,
and should not care about the value on thread exit. The libc can
either directly call rseq unregister, or rely on thread calling exit
to implicitly unregister rseq, which depends on its own TLS life-time
guarantees. For instance, if the IE-model TLS is valid up until call
to exit, just calling the exit system call is fine. However, if a libc
has a window at thread exit during which the kernel can preempt the
thread with the IE-model TLS area being already reclaimed, then it
needs to explicitly call rseq unregister before freeing the TLS.

The second use-case is out-of-libc implementations using rseq from
signal handler. This one is trickier. First, pthread_setspecific
is unfortunately not async-signal-safe. I can't find a good way to
seamlessly integrate rseq into out-of-libc signal handlers while
performing lazy registration without races on thread exit. If we
figure out a way to do this though, we should increment the refcount
at thread start in libc (rather than just set it to 1) in case a
signal handler gets nested immediately over the start of the thread
and registers rseq as well.
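
A sequential simulation of that corner case, as a minimal sketch: the
names are hypothetical, the counter stands in for __rseq_refcount, and
the "signal handler" is just a function called before the libc thread
start code to model the nesting.

```c
#include <assert.h>

/* Hypothetical per-thread counter standing in for __rseq_refcount. */
static __thread unsigned int demo_refcount;

/* Signal handler nested over thread start: lazily takes a reference. */
void demo_handler_registers(void) { demo_refcount++; }

/* Variant 1: libc stores 1, discarding any reference taken earlier. */
void demo_libc_start_store(void) { demo_refcount = 1; }

/* Variant 2: libc increments, preserving the handler's reference. */
void demo_libc_start_increment(void) { demo_refcount++; }

/* Runs the scenario and returns the resulting reference count. */
unsigned int demo_scenario(int use_increment)
{
    demo_refcount = 0;
    demo_handler_registers();      /* handler runs first */
    if (use_increment)
        demo_libc_start_increment();
    else
        demo_libc_start_store();
    return demo_refcount;
}
```

With the store variant the count ends at 1 although there are two users,
so a later release by the handler's side would drop it to 0 while libc
still relies on the registration. The increment variant ends at 2.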

It looks like this is not the only issue I have with calling lttng-ust
instrumentation from signal handlers. Here is the list I have so
far:

* glibc global-dynamic TLS variables are not async-signal-safe,
  and lttng-ust cannot use IE-model TLS because it is meant to be
  dlopen'd,
* pthread_setspecific is not async-signal-safe.

There should be ways to eventually solve those issues, but it would
be nice if for now the way rseq is implemented in libc does not add
yet another limitation for signal handlers.

>
> Alternatively, another protocol could be chosen for this purpose, but
> it has to be something stable and agreed upon, since things would
> break badly if libc and the library providing rseq disagreed.

Absolutely. We need to agree on that protocol before user-space
applications/libraries start using rseq.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-12-03 21:31:11

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

----- On Nov 26, 2018, at 2:22 PM, Mathieu Desnoyers [email protected] wrote:

> ----- On Nov 26, 2018, at 12:10 PM, Rich Felker [email protected] wrote:
>
>> On Mon, Nov 26, 2018 at 05:03:02PM +0100, Florian Weimer wrote:
>>> * Mathieu Desnoyers:
>>>
>>> > So let's make __rseq_abi and __rseq_refcount strong symbols then ?
>>>
>>> Yes, please. (But I'm still not sure we need the reference counter.)
>>
>> The reference counter is needed for out-of-libc implementations
>> interacting and using the dtor hack. An in-libc implementation doesn't
>> need to inspect/honor the reference counter, but it does seem to need
>> to indicate that it has a reference, if you want it to be compatible
>> with out-of-libc implementations, so that the out-of-libc one will not
>> unregister the rseq before libc is done with it.
>
> Let's consider two use-cases here: one (simpler) is use of rseq TLS
> from thread context by out-of-libc implementations. The other is use of
> rseq TLS from signal handler by out-of-libc implementations.
>
> If we only care about users of rseq from thread context, then libc
> could simply set the refcount value to 1 on thread start,
> and should not care about the value on thread exit. The libc can
> either directly call rseq unregister, or rely on thread calling exit
> to implicitly unregister rseq, which depends on its own TLS life-time
> guarantees. For instance, if the IE-model TLS is valid up until call
> to exit, just calling the exit system call is fine. However, if a libc
> has a window at thread exit during which the kernel can preempt the
> thread with the IE-model TLS area being already reclaimed, then it
> needs to explicitly call rseq unregister before freeing the TLS.
>
> The second use-case is out-of-libc implementations using rseq from
> signal handler. This one is trickier. First, pthread_setspecific
> is unfortunately not async-signal-safe. I can't find a good way to
> seamlessly integrate rseq into out-of-libc signal handlers while
> performing lazy registration without races on thread exit. If we
> figure out a way to do this though, we should increment the refcount
> at thread start in libc (rather than just set it to 1) in case a
> signal handler gets nested immediately over the start of the thread
> and registers rseq as well.
>
> It looks like it's not the only issue I have with calling lttng-ust
> instrumentation from signal handlers, here is the list I have so
> far:
>
> * glibc global-dynamic TLS variables are not async-signal-safe,
> and lttng-ust cannot use IE-model TLS because it is meant to be
> dlopen'd,
> * pthread_setspecific is not async-signal-safe,
>
> There should be ways to eventually solve those issues, but it would
> be nice if for now the way rseq is implemented in libc does not add
> yet another limitation for signal handlers.

So, after thinking about it a bit further: current glibc does not offer
the async-signal-safe APIs required to touch global-dynamic TLS variables
from signal handlers, or to register pthread key destructors from signal
handlers, so I will end up needing glibc improvements to eventually make
lttng-ust instrumentation signal-safe.

This means that my main use-case for supporting out-of-libc rseq registration
from signal handlers does not exist today, and will require new glibc APIs
anyway. Therefore, it would make sense to require use of rseq from signal
handlers to depend on rseq registration by glibc at thread start, and limit
the use-case of out-of-libc rseq registration to those that do not nest
within signal handlers.

If we _never_ even want to allow signal handlers to register rseq, we could
set the __rseq_refcount to 1 at thread start in nptl and nptl init. However,
if we want to eventually allow rseq registration from signal handlers in the
future, we may want to consider keeping the __rseq_refcount relaxed atomic
increment at thread start, as long as it does not represent too large a
performance overhead.

At thread exit, we don't care about the __rseq_refcount value, and we can
unregister rseq unconditionally.

As long as it is not too costly to increment the __rseq_refcount at thread start,
I would be inclined to keep it as an increment rather than setting it to 1, so
we can have more flexibility with respect to future registration of rseq from
signal handlers, even though it is not possible today.
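
Roughly, the libc side of this proposal could look like the sketch below.
It is not the actual nptl code: the function names are hypothetical, the
kernel registration is mocked with a flag, and it assumes that relaxed
ordering is enough because the counter is thread-local and can only race
with signal handlers running on the same thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread stand-ins for __rseq_refcount and for the
   kernel-side registration state. */
static _Thread_local atomic_uint demo_refcount;
static _Thread_local bool demo_kernel_registered;

/* Thread start: relaxed atomic increment, register only if we are the
   first user on this thread. */
void demo_nptl_thread_start(void)
{
    if (atomic_fetch_add_explicit(&demo_refcount, 1,
                                  memory_order_relaxed) == 0)
        demo_kernel_registered = true;   /* mock of rseq register */
}

/* Some other user on the same thread taking an extra reference. */
void demo_other_user_get(void)
{
    atomic_fetch_add_explicit(&demo_refcount, 1, memory_order_relaxed);
}

/* Thread exit: the counter is deliberately not consulted. */
void demo_nptl_thread_exit(void)
{
    demo_kernel_registered = false;      /* mock of rseq unregister */
}

unsigned int demo_refcount_value(void)
{
    return atomic_load_explicit(&demo_refcount, memory_order_relaxed);
}

bool demo_is_registered(void) { return demo_kernel_registered; }
```

The exit path unregisters even when the counter is still above 1, which
matches the "we don't care about the value at thread exit" rule above.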

Thoughts ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-12-05 17:25:58

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation



----- On Nov 26, 2018, at 12:07 PM, Rich Felker [email protected] wrote:

> On Mon, Nov 26, 2018 at 11:30:51AM -0500, Mathieu Desnoyers wrote:
>> ----- On Nov 26, 2018, at 10:51 AM, Mathieu Desnoyers [email protected] wrote:
>>
>> > ----- On Nov 26, 2018, at 3:28 AM, Florian Weimer [email protected] wrote:
>> >
>> >> * Mathieu Desnoyers:
>> >>
>> >>> Using a "weak" symbol in early adopter libraries is important, so they
>> >>> can be loaded together into the same process without causing loader
>> >>> errors due to many definitions of the same strong symbol.
>> >>
>> >> This is not how ELF dynamic linking works. If the symbol name is the
>> >> same, one definition interposes the others.
>> >>
>> >> You need to ensure that the symbol has the same size everywhere, though.
>> >> There are some tricky interactions with symbol versions, too. (The
>> >> interposing libraries must not use symbol versioning.)
>> >
>> > I was under the impression that loading the same strong symbol into an
>> > application multiple times would cause some kind of warning if non-weak. I did
>> > some testing to figure out which case I remembered would cause this.
>> >
>> > When compiling with "-fno-common", dynamic and static linking work fine, but
>> > trying to add multiple instances of a given symbol into a single object fails
>> > with:
>> >
>> > /tmp/ccSakXZV.o:(.bss+0x0): multiple definition of `a'
>> > /tmp/ccQBJBOo.o:(.bss+0x0): first defined here
>> >
>> > Even if the symbol has the same size.
>> >
>> > So considering that we don't care about compiling into a single object here,
>> > and only care about static and dynamic linking of libraries, indeed the "weak"
>> > symbol is not useful.
>> >
>> > So let's make __rseq_abi and __rseq_refcount strong symbols then ?
>>
>> Actually, looking into ld(1) --warn-common, it looks like "weak" would be cleaner
>> after all, especially for __rseq_abi, which needs to be initialized to a specific
>> value and is therefore not a common symbol.
>>
>> " --warn-common
>> Warn when a common symbol is combined with another common symbol or with a symbol definition. Unix
>> linkers allow this somewhat sloppy practice, but linkers on some other operating systems do not.
>> This option allows you to find potential problems from combining global symbols. Unfortunately,
>> some C libraries use this practice, so you may get some warnings about symbols in the libraries as
>> well as in your programs."
>>
>> Thoughts ?
>
> AFAIK this has nothing to do with dynamic linking.

Reading through the ELF specification, it seems to imply that "weak" only affects the link
editor behavior when combining relocatable objects, not the behavior of the dynamic linker.
Is that what you refer to when you say "weak" has nothing to do with dynamic linking ?

If that interpretation is correct, then indeed I should remove the "weak" from the __rseq_abi
and __rseq_refcount.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com