From: Mathieu Desnoyers
To: Michael Kerrisk
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski, Dave Watson, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt, Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon, Mathieu Desnoyers
Subject: [PATCH man-pages] Add rseq manpage
Date: Thu, 6 Dec 2018 09:42:28 -0500
Message-Id: <20181206144228.9656-1-mathieu.desnoyers@efficios.com>

[ Michael, rseq(2) was merged into 4.18. Can you have a look at this
patch, which adds rseq documentation to the man-pages project? ]

Signed-off-by: Mathieu Desnoyers
CC: "Paul E. McKenney"
CC: Peter Zijlstra
CC: Paul Turner
CC: Thomas Gleixner
CC: Andy Lutomirski
CC: Andi Kleen
CC: Dave Watson
CC: Chris Lameter
CC: Ingo Molnar
CC: "H. Peter Anvin"
CC: Ben Maurer
CC: Steven Rostedt
CC: Josh Triplett
CC: Linus Torvalds
CC: Andrew Morton
CC: Russell King
CC: Catalin Marinas
CC: Will Deacon
CC: Michael Kerrisk
CC: Boqun Feng
CC: linux-api@vger.kernel.org
---
 man2/rseq.2 | 299 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 299 insertions(+)
 create mode 100644 man2/rseq.2

diff --git a/man2/rseq.2 b/man2/rseq.2
new file mode 100644
index 000000000..005c1cee4
--- /dev/null
+++ b/man2/rseq.2
@@ -0,0 +1,299 @@
+.\" Copyright 2015-2018 Mathieu Desnoyers
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH RSEQ 2 2018-09-19 "Linux" "Linux Programmer's Manual"
+.SH NAME
+rseq \- restartable sequences and CPU number cache
+.SH SYNOPSIS
+.nf
+.B #include
+.PP
+.BI "int rseq(struct rseq *" rseq ", uint32_t " rseq_len ", int " flags ", uint32_t " sig );
+.fi
+.SH DESCRIPTION
+The
+.BR rseq ()
+ABI accelerates user-space operations on per-CPU data by defining a
+shared data structure ABI between each user-space thread and the kernel.
+.PP
+It allows user-space to perform update operations on per-CPU data
+without requiring heavy-weight atomic operations.
+.PP
+The term CPU used in this documentation refers to a hardware execution
+context.
+.PP
+Restartable sequences are atomic with respect to preemption (making them
+atomic with respect to other threads running on the same CPU), as well
+as with respect to signal delivery (user-space execution contexts nested
+over the same thread). A restartable sequence either completes
+atomically with respect to preemption on the current CPU and signal
+delivery, or it is aborted.
+.PP
+Restartable sequences are suited for update operations on per-CPU data.
+They can be used on data structures shared between threads within a
+process, and on data structures shared between threads across different
+processes.
+.PP
+Some examples of operations that can be accelerated or improved
+by this ABI:
+.IP \[bu] 2
+Memory allocator per-CPU free-lists
+.IP \[bu] 2
+Querying the current CPU number
+.IP \[bu] 2
+Incrementing per-CPU counters
+.IP \[bu] 2
+Modifying data protected by per-CPU spinlocks
+.IP \[bu] 2
+Inserting/removing elements in per-CPU linked-lists
+.IP \[bu] 2
+Reading/writing the content of per-CPU ring buffers
+.IP \[bu] 2
+Accurately reading performance monitoring unit counters
+with respect to thread migration
+.PP
+Restartable sequences must not perform system calls. Doing so may result
+in termination of the process by a segmentation fault.
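A minimal C sketch of the "querying the current CPU number" use case listed above. It assumes a locally mirrored 32-byte `struct example_rseq` as a stand-in for `struct rseq` (Linux 4.18 layout; the fields are described below), an arbitrary signature `EXAMPLE_RSEQ_SIG`, and the hypothetical helpers `example_rseq_register()` and `example_current_cpu()`. glibc provides no wrapper for this system call, so it is invoked through syscall(2). On glibc 2.35 and later the C library normally registers its own rseq area for every thread, in which case this second registration fails with EINVAL and the sketch falls back to sched_getcpu(3).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror (assumption) of the original 32-byte struct rseq layout,
 * used instead of the kernel header so the sketch builds anywhere. */
struct example_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;   /* address of a struct rseq_cs, or 0 */
    uint32_t flags;
} __attribute__((aligned(32)));

/* Arbitrary signature for illustration; it only matters for abort
 * handlers and for unregistration. */
#define EXAMPLE_RSEQ_SIG 0x53053053U

/* One area per thread, initialized as this page requires. */
static __thread struct example_rseq example_rseq_area = {
    .cpu_id_start = 0,          /* a possible CPU number */
    .cpu_id = (uint32_t)-1,     /* "uninitialized" */
};

/* Register the calling thread's area. Returns 0 on success, or -1 with
 * errno set (e.g. ENOSYS, or EINVAL if another area is registered). */
static int example_rseq_register(void)
{
    /* The kernel rejects areas that are not 32-byte aligned. */
    assert((uintptr_t)&example_rseq_area % 32 == 0);
#ifdef __NR_rseq
    return (int)syscall(__NR_rseq, &example_rseq_area,
                        (uint32_t)sizeof(example_rseq_area), 0,
                        EXAMPLE_RSEQ_SIG);
#else
    errno = ENOSYS;             /* headers too old to know the syscall */
    return -1;
#endif
}

/* Current CPU number: one relaxed load when the area is registered,
 * otherwise fall back to sched_getcpu(3). */
static int example_current_cpu(void)
{
    int32_t cpu = (int32_t)__atomic_load_n(&example_rseq_area.cpu_id,
                                           __ATOMIC_RELAXED);

    if (cpu >= 0)
        return cpu;
    return sched_getcpu();
}
```

Reading cpu_id costs a single load, while the fallback goes through sched_getcpu(3). Either way the value may be stale as soon as it is read, which is why updates to per-CPU data need a restartable sequence rather than this check alone.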
+
+.PP
+The
+.I rseq
+argument is a pointer to the thread-local rseq structure to be shared
+between kernel and user-space.
+
+.PP
+The layout of
+.B struct rseq
+is as follows:
+.TP
+.B Structure alignment
+This structure is aligned on a 32-byte boundary.
+.TP
+.B Structure size
+This structure is extensible. Its size is passed as a parameter to the
+.BR rseq ()
+system call.
+.TP
+.B Fields
+
+.TP
+.in +4n
+.I cpu_id_start
+Optimistic cache of the CPU number on which the current thread is
+running. Its value is guaranteed to always be a possible CPU number,
+even when rseq is not initialized. The value it contains should always
+be confirmed by reading the cpu_id field.
+
+This field is an optimistic cache in the sense that it always holds a
+valid CPU number in the range [ 0 ..
+nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
+used as an offset in per-CPU data structures without having to
+check whether its value is within the valid bounds compared to the
+number of possible CPUs in the system.
+
+For user-space applications executed on a kernel without rseq support,
+the cpu_id_start field stays initialized at 0, which is indeed a valid
+CPU number. It is therefore valid to use it as an offset in per-CPU data
+structures, and only validate whether it is actually the current CPU
+number by comparing it with the cpu_id field within the rseq critical
+section. If the kernel does not provide rseq support, the cpu_id field
+stays initialized at -1, so the comparison always fails, as intended.
+
+It is then up to user-space to use a fallback mechanism, considering
+that rseq is not available.
+.in
+.TP
+.in +4n
+.I cpu_id
+Cache of the CPU number on which the current thread is running.
+Its value is -1 if rseq is uninitialized.
+.in
+.TP
+.in +4n
+.I rseq_cs
+The rseq_cs field is a pointer to a struct rseq_cs. It is NULL when no
+rseq assembly block critical section is active for the current thread.
+Setting it to point to a critical section descriptor (struct rseq_cs)
+marks the beginning of the critical section.
+.in
+.TP
+.in +4n
+.I flags
+Flags indicating the restart behavior for the current thread. This is
+mainly used for debugging purposes. Can be a combination of:
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+.in
+
+.PP
+The layout of
+.B struct rseq_cs
+version 0 is as follows:
+.TP
+.B Structure alignment
+This structure is aligned on a 32-byte boundary.
+.TP
+.B Structure size
+This structure has a fixed size of 32 bytes.
+.TP
+.B Fields
+
+.TP
+.in +4n
+.I version
+Version of this structure.
+.in
+.TP
+.in +4n
+.I flags
+Flags indicating the restart behavior of this structure. Can be
+a combination of:
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+.IP \[bu]
+RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+.in
+.TP
+.in +4n
+.I start_ip
+Instruction pointer address of the first instruction of the sequence of
+consecutive assembly instructions.
+.in
+.TP
+.in +4n
+.I post_commit_offset
+Offset (from the start_ip address) of the address immediately following
+the last instruction of the sequence of consecutive assembly instructions.
+.in
+.TP
+.in +4n
+.I abort_ip
+Instruction pointer address to which execution is transferred if the
+sequence of consecutive assembly instructions is aborted.
+.in
+
+.PP
+The
+.I rseq_len
+argument is the size of the
+.I struct rseq
+to register.
+
+.PP
+The
+.I flags
+argument is 0 for registration, and
+.B RSEQ_FLAG_UNREGISTER
+for unregistration.
+
+.PP
+The
+.I sig
+argument is the 32-bit signature expected immediately before the abort
+handler code.
+
+.PP
+A single library per process should keep the rseq structure in a
+thread-local storage variable.
+The
+.I cpu_id
+field should be initialized to -1, and the
+.I cpu_id_start
+field should be initialized to a possible CPU value (typically 0).
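The two layouts and the initialization rule above can be checked with the following C sketch. The `example_` types are local stand-ins (an assumption) for `struct rseq` and version 0 of `struct rseq_cs`; `example_rseq_init()`, `example_rseq_cs_fill()` and `example_ip_in_cs()` are hypothetical helpers. The code addresses passed to `example_rseq_cs_fill()` are plain integers here, whereas real code obtains them from labels in an inline-assembly critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror (assumption) of version 0 of struct rseq_cs. */
struct example_rseq_cs {
    uint32_t version;             /* 0 */
    uint32_t flags;               /* RSEQ_CS_FLAG_* bits */
    uint64_t start_ip;            /* first instruction of the sequence */
    uint64_t post_commit_offset;  /* end of sequence minus start_ip */
    uint64_t abort_ip;            /* abort handler, after the signature */
} __attribute__((aligned(32)));

/* Local mirror (assumption) of the original 32-byte struct rseq. */
struct example_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;             /* address of a struct rseq_cs, or 0 */
    uint32_t flags;
} __attribute__((aligned(32)));

/* Compile-time checks of the sizes and alignments stated on this page. */
_Static_assert(sizeof(struct example_rseq_cs) == 32, "rseq_cs is 32 bytes");
_Static_assert(_Alignof(struct example_rseq_cs) == 32, "rseq_cs 32-aligned");
_Static_assert(sizeof(struct example_rseq) == 32, "rseq is 32 bytes");
_Static_assert(_Alignof(struct example_rseq) == 32, "rseq 32-aligned");
_Static_assert(offsetof(struct example_rseq, rseq_cs) == 8, "rseq_cs at 8");

/* Prepare a thread's area before registering it. */
static void example_rseq_init(struct example_rseq *rs)
{
    rs->cpu_id_start = 0;         /* a possible CPU number */
    rs->cpu_id = (uint32_t)-1;    /* "uninitialized" */
    rs->rseq_cs = 0;              /* no critical section active */
    rs->flags = 0;
}

/* Fill a version 0 descriptor from three code addresses. */
static void example_rseq_cs_fill(struct example_rseq_cs *cs, uint64_t start,
                                 uint64_t post_commit, uint64_t abort_handler)
{
    assert(post_commit > start);
    /* The abort handler must lie outside [start, post_commit). */
    assert(abort_handler < start || abort_handler >= post_commit);
    cs->version = 0;
    cs->flags = 0;
    cs->start_ip = start;
    cs->post_commit_offset = post_commit - start;
    cs->abort_ip = abort_handler;
}

/* Non-zero if ip lies in [start_ip, start_ip + post_commit_offset). */
static int example_ip_in_cs(const struct example_rseq_cs *cs, uint64_t ip)
{
    /* Unsigned wrap-around also rejects ip < start_ip. */
    return ip - cs->start_ip < cs->post_commit_offset;
}
```

Storing an offset rather than an end address lets a single unsigned comparison reject instruction pointers on both sides of the sequence, as `example_ip_in_cs()` shows.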
+
+.PP
+Each thread is responsible for registering and unregistering its rseq
+structure. No more than one rseq structure address can be registered
+per thread at a given time.
+
+.PP
+The memory of a registered rseq structure must not be freed while it is
+registered: it may only be reclaimed after an explicit rseq
+unregistration, or after the thread has exited. Keep in mind that the
+Thread-Local Storage implementation (C language __thread) does not
+guarantee that the TLS area remains allocated until the thread exits.
+
+.PP
+In a typical usage scenario, the thread that registered the rseq
+structure performs loads and stores on that structure. Other threads
+are also allowed to read it.
+The rseq field updates performed by the kernel provide relaxed atomicity
+semantics, which guarantee that other threads performing relaxed atomic
+reads of the CPU number cache always observe a consistent value (never a
+partially updated one).
+
+.SH RETURN VALUE
+A return value of 0 indicates success. On error, \-1 is returned, and
+.I errno
+is set appropriately.
+
+.SH ERRORS
+.TP
+.B EINVAL
+Either
+.I flags
+contains an invalid value, or
+.I rseq
+contains an address which is not appropriately aligned or which differs
+from the address already registered for the calling thread, or
+.I rseq_len
+contains a size that is not valid for
+.I struct rseq
+on registration, or that does not match the size received on
+registration when unregistering.
+.TP
+.B ENOSYS
+The
+.BR rseq ()
+system call is not implemented by this kernel.
+.TP
+.B EFAULT
+.I rseq
+is an invalid address.
+.TP
+.B EBUSY
+The rseq structure is already registered for this thread.
+.TP
+.B EPERM
+The
+.I sig
+argument on unregistration does not match the signature received
+on registration.
+
+.SH VERSIONS
+The
+.BR rseq ()
+system call was added in Linux 4.18.
+
+.SH CONFORMING TO
+.BR rseq ()
+is Linux-specific.
+
+.SH SEE ALSO
+.BR membarrier (2),
+.BR sched_getcpu (3)
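The registration rules and the ERRORS section above can be exercised with the following C sketch. It assumes a locally mirrored 32-byte `struct example_rseq` as a stand-in for `struct rseq`, an arbitrary `EXAMPLE_RSEQ_SIG`, a local `EXAMPLE_RSEQ_FLAG_UNREGISTER` equal to `RSEQ_FLAG_UNREGISTER` (bit 0), and the hypothetical helpers `example_rseq_call()` and `example_rseq_lifecycle()`. It also relies on the kernel resetting cpu_id to -1 on unregistration, which is Linux implementation behavior not stated on this page. If the first registration fails (for example ENOSYS before Linux 4.18, or EINVAL when the C library has already registered an area for the thread), the walkthrough stops and reports that a fallback is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local stand-in (assumption) for the original 32-byte struct rseq. */
struct example_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
} __attribute__((aligned(32)));

#define EXAMPLE_RSEQ_SIG             0x53053053U  /* arbitrary */
#define EXAMPLE_RSEQ_FLAG_UNREGISTER (1 << 0)     /* RSEQ_FLAG_UNREGISTER */

static __thread struct example_rseq example_area = { .cpu_id = (uint32_t)-1 };

/* Invoke rseq() on this thread's area; returns 0 or a negative errno. */
static int example_rseq_call(int flags, uint32_t sig)
{
    assert((uintptr_t)&example_area % 32 == 0);
#ifdef __NR_rseq
    if (syscall(__NR_rseq, &example_area, (uint32_t)sizeof(example_area),
                flags, sig) == 0)
        return 0;
    return -errno;
#else
    (void)flags;
    (void)sig;
    return -ENOSYS;
#endif
}

/* Walk through the registration lifecycle. Returns 1 if every step
 * behaved as documented, 0 if rseq could not be registered at all
 * (the caller should use a fallback), and -1 otherwise. */
static int example_rseq_lifecycle(void)
{
    if (example_rseq_call(0, EXAMPLE_RSEQ_SIG) != 0)
        return 0;
    /* Same structure, length and signature again: EBUSY. */
    if (example_rseq_call(0, EXAMPLE_RSEQ_SIG) != -EBUSY)
        return -1;
    /* Unregistration with a mismatched signature: EPERM. */
    if (example_rseq_call(EXAMPLE_RSEQ_FLAG_UNREGISTER,
                          ~EXAMPLE_RSEQ_SIG) != -EPERM)
        return -1;
    /* Matching signature: success, after which cpu_id reads -1. */
    if (example_rseq_call(EXAMPLE_RSEQ_FLAG_UNREGISTER, EXAMPLE_RSEQ_SIG) != 0)
        return -1;
    return (int32_t)example_area.cpu_id == -1 ? 1 : -1;
}
```

Unregistering explicitly, as the last step does, is what makes it safe to reclaim the area before the thread exits.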