Date: Wed, 6 May 2015 20:27:19 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: josh@joshtriplett.org
Cc: Andrew Morton, linux-kernel@vger.kernel.org, KOSAKI Motohiro,
	Steven Rostedt, Nicholas Miell, Linus Torvalds, Ingo Molnar,
	Alan Cox, Lai Jiangshan, Stephen Hemminger, Thomas Gleixner,
	Peter Zijlstra, David Howells, Pranith Kumar, Michael Kerrisk,
	linux-api@vger.kernel.org
Message-ID: <371299002.44925.1430944039395.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150506202120.GA23011@cloud>
References: <1430940068-4326-1-git-send-email-mathieu.desnoyers@efficios.com>
	<1430940068-4326-2-git-send-email-mathieu.desnoyers@efficios.com>
	<20150506202120.GA23011@cloud>
Subject: Re: [PATCH v18 for v4.1-rc2 1/3] sys_membarrier(): system-wide memory barrier (generic, x86)

----- Original Message -----
> On Wed, May 06, 2015 at 03:21:06PM -0400, Mathieu Desnoyers wrote:
> > Here is an implementation of a new system call, sys_membarrier(), which
> > executes a memory barrier on all threads running on the system. It is
> > implemented by calling synchronize_sched(). It can be used to distribute
> > the cost of user-space memory barriers asymmetrically by transforming
> > pairs of memory barriers into pairs consisting of sys_membarrier() and a
> > compiler barrier. For synchronization primitives that distinguish
> > between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
> > read-side can be accelerated significantly by moving the bulk of the
> > memory barrier overhead to the write-side.
> >
> > It is based on kernel v4.1-rc2.
> >
> > To explain the benefit of this scheme, let's introduce two example
> > threads:
> >
> > Thread A (infrequent, e.g. executing liburcu synchronize_rcu())
> > Thread B (frequent, e.g. executing liburcu
> >           rcu_read_lock()/rcu_read_unlock())
> >
> > In a scheme where all smp_mb() in Thread A order memory accesses with
> > respect to smp_mb() present in Thread B, we can change each smp_mb()
> > within Thread A into calls to sys_membarrier() and each smp_mb()
> > within Thread B into compiler barriers "barrier()".
> >
> > Before the change, we had, for each smp_mb() pair:
> >
> >   Thread A                   Thread B
> >   previous mem accesses      previous mem accesses
> >   smp_mb()                   smp_mb()
> >   following mem accesses     following mem accesses
> >
> > After the change, these pairs become:
> >
> >   Thread A                   Thread B
> >   prev mem accesses          prev mem accesses
> >   sys_membarrier()           barrier()
> >   follow mem accesses        follow mem accesses
> >
> > As we can see, there are two possible scenarios: either Thread B memory
> > accesses do not happen concurrently with Thread A accesses (1), or they
> > do (2).
> >
> > 1) Non-concurrent Thread A vs Thread B accesses:
> >
> >   Thread A                   Thread B
> >   prev mem accesses
> >   sys_membarrier()
> >   follow mem accesses
> >                              prev mem accesses
> >                              barrier()
> >                              follow mem accesses
> >
> > In this case, thread B accesses will be weakly ordered. This is OK,
> > because at that point, thread A is not particularly interested in
> > ordering them with respect to its own accesses.
> >
> > 2) Concurrent Thread A vs Thread B accesses:
> >
> >   Thread A                   Thread B
> >   prev mem accesses          prev mem accesses
> >   sys_membarrier()           barrier()
> >   follow mem accesses        follow mem accesses
> >
> > In this case, thread B accesses, which are ensured to be in program
> > order thanks to the compiler barrier, will be "upgraded" to full
> > smp_mb() semantics by synchronize_sched().
> >
> > * Benchmarks
> >
> > On Intel Xeon E5405 (8 cores)
> > (one thread is calling sys_membarrier, the other 7 threads are busy
> > looping)
> >
> > 1000 non-expedited sys_membarrier calls in 33s = 33 milliseconds/call.
> >
> > * User-space user of this system call: Userspace RCU library
> >
> > Both the signal-based and the sys_membarrier userspace RCU schemes
> > permit us to remove the memory barrier from the userspace RCU
> > rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
> > accelerating them. These memory barriers are replaced by compiler
> > barriers on the read-side, and all matching memory barriers on the
> > write-side are turned into an invocation of a memory barrier on all
> > active threads in the process. By letting the kernel perform this
> > synchronization rather than dumbly sending a signal to every thread of
> > the process (as we currently do), we diminish the number of unnecessary
> > wake-ups and only issue the memory barriers on active threads.
> > Non-running threads do not need to execute such a barrier anyway,
> > because it is implied by the scheduler context switches.
> >
> > Results in liburcu:
> >
> > Operations in 10s, 6 readers, 2 writers:
> >
> >   memory barriers in reader:    1701557485 reads, 2202847 writes
> >   signal-based scheme:          9830061167 reads,    6700 writes
> >   sys_membarrier:               9952759104 reads,     425 writes
> >   sys_membarrier (dyn. check):  7970328887 reads,     425 writes
> >
> > The dynamic sys_membarrier availability check adds some overhead to
> > the read-side compared to the signal-based scheme, but besides that,
> > sys_membarrier slightly outperforms the signal-based scheme. However,
> > this non-expedited sys_membarrier implementation has a much slower
> > grace period than the signal-based and memory-barrier schemes.
> >
> > Besides diminishing the number of wake-ups, one major advantage of the
> > membarrier system call over the signal-based scheme is that it does not
> > need to reserve a signal. This plays much more nicely with libraries,
> > and with processes injected into for tracing purposes, for which we
> > cannot expect that signals will be unused by the application.
> >
> > An expedited version of this system call can be added later on to speed
> > up the grace period. Its implementation will likely depend on reading
> > the cpu_curr()->mm without holding each CPU's rq lock.
> >
> > This patch adds the system call to x86 and to asm-generic.
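
To make the transformation above concrete, here is a minimal user-space
sketch of how a library might pair the two sides. This is a hypothetical
illustration, not code from the patch; it assumes __NR_membarrier and
MEMBARRIER_CMD_SHARED from the patched headers, and calls syscall(2)
directly since glibc provides no wrapper:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/membarrier.h>

	#define barrier()	__asm__ __volatile__("" ::: "memory")

	static int membarrier(int cmd, int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	/* Thread A, infrequent side (e.g. liburcu synchronize_rcu()). */
	static void slow_side_mb(void)
	{
		/* previous memory accesses */
		membarrier(MEMBARRIER_CMD_SHARED, 0);	/* replaces smp_mb() */
		/* following memory accesses */
	}

	/* Thread B, frequent side (e.g. rcu_read_lock()/rcu_read_unlock()). */
	static void fast_side_mb(void)
	{
		/* previous memory accesses */
		barrier();				/* replaces smp_mb() */
		/* following memory accesses */
	}
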
> >
> > membarrier(2) man page:
> > --------------- snip -------------------
> > MEMBARRIER(2)            Linux Programmer's Manual            MEMBARRIER(2)
> >
> > NAME
> >        membarrier - issue memory barriers on a set of threads
> >
> > SYNOPSIS
> >        #include <linux/membarrier.h>
> >
> >        int membarrier(int cmd, int flags);
> >
> > DESCRIPTION
> >        The cmd argument is one of the following:
> >
> >        MEMBARRIER_CMD_QUERY
> >               Query the set of supported commands. It returns a bitmask of
> >               supported commands.
> >
> >        MEMBARRIER_CMD_SHARED
> >               Execute a memory barrier on all threads running on the
> >               system. Upon return from system call, the caller thread is
> >               ensured that all running threads have passed through a state
> >               where all memory accesses to user-space addresses match
> >               program order between entry to and return from the system
> >               call (non-running threads are de facto in such a state).
> >               This covers threads from all processes running on the
> >               system. This command returns 0.
> >
> >        The flags argument must be 0; it is reserved for future extensions.
> >
> >        All memory accesses performed in program order from each targeted
> >        thread are guaranteed to be ordered with respect to
> >        sys_membarrier(). If we use the semantic "barrier()" to represent
> >        a compiler barrier forcing memory accesses to be performed in
> >        program order across the barrier, and smp_mb() to represent
> >        explicit memory barriers forcing full memory ordering across the
> >        barrier, we have the following ordering table for each pair of
> >        barrier(), sys_membarrier() and smp_mb():
> >
> >        The pair ordering is detailed as (O: ordered, X: not ordered):
> >
> >                              barrier()  smp_mb()  sys_membarrier()
> >        barrier()                 X         X             O
> >        smp_mb()                  X         O             O
> >        sys_membarrier()          O         O             O
> >
> > RETURN VALUE
> >        On success, these system calls return zero. On error, -1 is
> >        returned, and errno is set appropriately. For a given command,
> >        with the flags argument set to 0, this system call is guaranteed
> >        to always return the same value until reboot.
> >
> > ERRORS
> >        ENOSYS System call is not implemented.
> >
> >        EINVAL Invalid arguments.
> >
> > Linux                            2015-04-15                   MEMBARRIER(2)
> > --------------- snip -------------------
> >
> > [1] http://urcu.so
> >
> > Changes since v17:
> > - Update commit message.
> >
> > Changes since v16:
> > - Update documentation.
> > - Add man page to changelog.
> > - Build sys_membarrier on !CONFIG_SMP. It allows userspace applications
> >   to not care about the number of processors on the system. Based on
> >   recommendations from Stephen Hemminger and Steven Rostedt.
> > - Check that the flags argument is 0, and update documentation to
> >   require it.
> >
> > Changes since v15:
> > - Add flags argument in addition to cmd.
> > - Update documentation.
> >
> > Changes since v14:
> > - Take care of Thomas Gleixner's comments.
> >
> > Changes since v13:
> > - Move to kernel/membarrier.c.
> > - Remove MEMBARRIER_PRIVATE flag.
> > - Add MAINTAINERS file entry.
> >
> > Changes since v12:
> > - Remove _FLAG suffix from uapi flags.
> > - Add Expert menuconfig option CONFIG_MEMBARRIER (default=y).
> > - Remove EXPEDITED mode. Only implement non-expedited for now, until
> >   reading cpu_curr()->mm can be done without holding the CPU's rq lock.
> >
> > Changes since v11:
> > - 5 years have passed.
> > - Rebase on v3.19 kernel.
> > - Add futex-alike PRIVATE vs SHARED semantic: private for per-process
> >   barriers, non-private for memory mappings shared between processes.
> > - Simplify user API.
> > - Code refactoring.
> >
> > Changes since v10:
> > - Apply Randy's comments.
> > - Rebase on 2.6.34-rc4 -tip.
> >
> > Changes since v9:
> > - Clean up #ifdef CONFIG_SMP.
> >
> > Changes since v8:
> > - Go back to rq spin locks taken by sys_membarrier() rather than adding
> >   memory barriers to the scheduler. It implies a potential RoS
> >   (reduction of service) if sys_membarrier() is executed in a busy-loop
> >   by a user, but nothing more than what is already possible with other
> >   existing system calls, and it saves memory barriers in the scheduler
> >   fast path.
> > - Re-add the memory barrier comments to x86 switch_mm() as an example
> >   to other architectures.
> > - Update documentation of the memory barriers in sys_membarrier and
> >   switch_mm().
> > - Append execution scenarios to the changelog showing the purpose of
> >   each memory barrier.
> >
> > Changes since v7:
> > - Move spinlock-mb and scheduler related changes to separate patches.
> > - Add support for sys_membarrier on x86_32.
> > - Only x86 32/64 system calls are reserved in this patch. It is planned
> >   to incrementally reserve syscall IDs on other architectures as these
> >   are tested.
> >
> > Changes since v6:
> > - Remove some unlikely() annotations that were not so unlikely.
> > - Add the proper scheduler memory barriers needed to only use the RCU
> >   read lock in sys_membarrier rather than take each runqueue spinlock:
> > - Move memory barriers from per-architecture switch_mm() to schedule()
> >   and finish_lock_switch(), where they clearly document that all data
> >   protected by the rq lock is guaranteed to have memory barriers issued
> >   between the scheduler update and the task execution. Replacing the
> >   spinlock acquire/release barriers with these memory barriers implies
> >   either no overhead (the x86 spinlock atomic instruction already
> >   implies a full mb) or some hopefully small overhead caused by the
> >   upgrade of the spinlock acquire/release barriers to more heavyweight
> >   smp_mb().
> > - The "generic" version of spinlock-mb.h declares both a mapping to
> >   standard spinlocks and full memory barriers. Each architecture can
> >   specialize this header following its own needs and declare
> >   CONFIG_HAVE_SPINLOCK_MB to use its own spinlock-mb.h.
> > - Note: benchmarks of scheduler overhead with specialized spinlock-mb.h
> >   implementations on a wide range of architectures would be welcome.
> >
> > Changes since v5:
> > - Plan ahead for extensibility by introducing mandatory/optional masks
> >   to the "flags" system call parameter. Past experience with accept4(),
> >   signalfd4(), eventfd2(), epoll_create1(), dup3(), pipe2(), and
> >   inotify_init1() indicates that this is the kind of thing we want to
> >   plan for. Return -EINVAL if the mandatory flags received are unknown.
> > - Create include/linux/membarrier.h to define these flags.
> > - Add MEMBARRIER_QUERY optional flag.
> >
> > Changes since v4:
> > - Add "int expedited" parameter, use synchronize_sched() in the
> >   non-expedited case. Thanks to Lai Jiangshan for making us consider
> >   seriously using synchronize_sched() to provide the low-overhead
> >   membarrier scheme.
> > - Check num_online_cpus() == 1, and quickly return without doing
> >   anything in that case.
> >
> > Changes since v3a:
> > - Confirm that each CPU indeed runs the current task's ->mm before
> >   sending an IPI. Ensures that we do not disturb RT tasks in the
> >   presence of lazy TLB shootdown.
> > - Document memory barriers needed in switch_mm().
> > - Surround helper functions with #ifdef CONFIG_SMP.
> >
> > Changes since v2:
> > - Simply send-to-many to the mm_cpumask. It contains the list of
> >   processors we have to IPI (those which use the mm), and this mask is
> >   updated atomically.
> >
> > Changes since v1:
> > - Only perform the IPI in CONFIG_SMP.
> > - Only perform the IPI if the process has more than one thread.
> > - Only send IPIs to CPUs involved with threads belonging to our process.
> > - Adaptive IPI scheme (single vs many IPIs with threshold).
> > - Issue smp_mb() at the beginning and end of the system call.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Reviewed-by: Paul E. McKenney
> > CC: Josh Triplett <josh@joshtriplett.org>
> 
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> 
> Thanks! But also, the "snip" and "changes since" should not be in the
> commit message, while this list of signoffs and CCs should be.

Is there a typical way to handle this while keeping it attached to a
commit locally in my git branch ?

Thanks,

Mathieu

> - Josh Triplett

> > CC: KOSAKI Motohiro
> > CC: Steven Rostedt
> > CC: Nicholas Miell
> > CC: Linus Torvalds
> > CC: Ingo Molnar
> > CC: Alan Cox
> > CC: Lai Jiangshan
> > CC: Stephen Hemminger
> > CC: Andrew Morton
> > CC: Thomas Gleixner
> > CC: Peter Zijlstra
> > CC: David Howells
> > CC: Pranith Kumar
> > CC: Michael Kerrisk
> > CC: linux-api@vger.kernel.org
> > ---
> >  MAINTAINERS                       |  8 ++++
> >  arch/x86/syscalls/syscall_32.tbl  |  1 +
> >  arch/x86/syscalls/syscall_64.tbl  |  1 +
> >  include/linux/syscalls.h          |  2 +
> >  include/uapi/asm-generic/unistd.h |  4 ++-
> >  include/uapi/linux/Kbuild         |  1 +
> >  include/uapi/linux/membarrier.h   | 53 +++++++++++++++++++++++++++++
> >  init/Kconfig                      | 12 +++++++
> >  kernel/Makefile                   |  1 +
> >  kernel/membarrier.c               | 66 +++++++++++++++++++++++++++++++++++++
> >  kernel/sys_ni.c                   |  3 ++
> >  11 files changed, 151 insertions(+), 1 deletions(-)
> >  create mode 100644 include/uapi/linux/membarrier.h
> >  create mode 100644 kernel/membarrier.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 781e099..fcb63d4 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6370,6 +6370,14 @@ W:	http://www.mellanox.com
> >  Q:	http://patchwork.ozlabs.org/project/netdev/list/
> >  F:	drivers/net/ethernet/mellanox/mlx4/en_*
> >  
> > +MEMBARRIER SUPPORT
> > +M:	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
McKenney" > > +L: linux-kernel@vger.kernel.org > > +S: Supported > > +F: kernel/membarrier.c > > +F: include/uapi/linux/membarrier.h > > + > > MEMORY MANAGEMENT > > L: linux-mm@kvack.org > > W: http://www.linux-mm.org > > diff --git a/arch/x86/syscalls/syscall_32.tbl > > b/arch/x86/syscalls/syscall_32.tbl > > index ef8187f..e63ad61 100644 > > --- a/arch/x86/syscalls/syscall_32.tbl > > +++ b/arch/x86/syscalls/syscall_32.tbl > > @@ -365,3 +365,4 @@ > > 356 i386 memfd_create sys_memfd_create > > 357 i386 bpf sys_bpf > > 358 i386 execveat sys_execveat stub32_execveat > > +359 i386 membarrier sys_membarrier > > diff --git a/arch/x86/syscalls/syscall_64.tbl > > b/arch/x86/syscalls/syscall_64.tbl > > index 9ef32d5..87f3cd6 100644 > > --- a/arch/x86/syscalls/syscall_64.tbl > > +++ b/arch/x86/syscalls/syscall_64.tbl > > @@ -329,6 +329,7 @@ > > 320 common kexec_file_load sys_kexec_file_load > > 321 common bpf sys_bpf > > 322 64 execveat stub_execveat > > +323 common membarrier sys_membarrier > > > > # > > # x32-specific system call numbers start at 512 to avoid cache impact > > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > > index 76d1e38..51a9054 100644 > > --- a/include/linux/syscalls.h > > +++ b/include/linux/syscalls.h > > @@ -884,4 +884,6 @@ asmlinkage long sys_execveat(int dfd, const char __user > > *filename, > > const char __user *const __user *argv, > > const char __user *const __user *envp, int flags); > > > > +asmlinkage long sys_membarrier(int cmd, int flags); > > + > > #endif > > diff --git a/include/uapi/asm-generic/unistd.h > > b/include/uapi/asm-generic/unistd.h > > index e016bd9..8da542a 100644 > > --- a/include/uapi/asm-generic/unistd.h > > +++ b/include/uapi/asm-generic/unistd.h > > @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create) > > __SYSCALL(__NR_bpf, sys_bpf) > > #define __NR_execveat 281 > > __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) > > +#define __NR_membarrier 282 > > +__SYSCALL(__NR_membarrier, sys_membarrier) > > > > #undef __NR_syscalls > > -#define __NR_syscalls 282 > > +#define __NR_syscalls 283 > > > > /* > > * All syscalls below here should go away really, > > diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild > > index 1a0006a..7bcc827 100644 > > --- a/include/uapi/linux/Kbuild > > +++ b/include/uapi/linux/Kbuild > > @@ -250,6 +250,7 @@ header-y += mdio.h > > header-y += media.h > > header-y += media-bus-format.h > > header-y += mei.h > > +header-y += membarrier.h > > header-y += memfd.h > > header-y += mempolicy.h > > header-y += meye.h > > diff --git a/include/uapi/linux/membarrier.h > > b/include/uapi/linux/membarrier.h > > new file mode 100644 > > index 0000000..e0b108b > > --- /dev/null > > +++ b/include/uapi/linux/membarrier.h > > @@ -0,0 +1,53 @@ > > +#ifndef _UAPI_LINUX_MEMBARRIER_H > > +#define _UAPI_LINUX_MEMBARRIER_H > > + > > +/* > > + * linux/membarrier.h > > + * > > + * membarrier system call API > > + * > > + * Copyright (c) 2010, 2015 Mathieu Desnoyers > > > > + * > > + * Permission is hereby granted, free of charge, to any person obtaining a > > copy > > + * of this software and associated documentation files (the "Software"), > > to deal > > + * in the Software without restriction, including without limitation the > > rights > > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or > > sell > > + * copies of the Software, and to permit persons to whom the Software is > > + * furnished to do so, subject to the following conditions: > > + * > > + * The above 
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> > + * SOFTWARE.
> > + */
> > +
> > +/**
> > + * enum membarrier_cmd - membarrier system call command
> > + * @MEMBARRIER_CMD_QUERY:  Query the set of supported commands. It returns
> > + *                         a bitmask of valid commands.
> > + * @MEMBARRIER_CMD_SHARED: Execute a memory barrier on all running threads.
> > + *                         Upon return from system call, the caller thread
> > + *                         is ensured that all running threads have passed
> > + *                         through a state where all memory accesses to
> > + *                         user-space addresses match program order between
> > + *                         entry to and return from the system call
> > + *                         (non-running threads are de facto in such a
> > + *                         state). This covers threads from all processes
> > + *                         running on the system. This command returns 0.
> > + *
> > + * Command to be passed to the membarrier system call. The commands need to
> > + * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned
> > + * the value 0.
> > + */
> > +enum membarrier_cmd {
> > +	MEMBARRIER_CMD_QUERY = 0,
> > +	MEMBARRIER_CMD_SHARED = (1 << 0),
> > +};
> > +
> > +#endif /* _UAPI_LINUX_MEMBARRIER_H */
> > diff --git a/init/Kconfig b/init/Kconfig
> > index dc24dec..307e406 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1583,6 +1583,18 @@ config PCI_QUIRKS
> >  	  bugs/quirks. Disable this only if your target machine is
> >  	  unaffected by PCI quirks.
> >  
> > +config MEMBARRIER
> > +	bool "Enable membarrier() system call" if EXPERT
> > +	default y
> > +	help
> > +	  Enable the membarrier() system call that allows issuing memory
> > +	  barriers across all running threads, which can be used to distribute
> > +	  the cost of user-space memory barriers asymmetrically by transforming
> > +	  pairs of memory barriers into pairs consisting of membarrier() and a
> > +	  compiler barrier.
> > +
> > +	  If unsure, say Y.
> > +
> >  config EMBEDDED
> >  	bool "Embedded system"
> >  	option allnoconfig_y
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index 60c302c..05191fd 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -98,6 +98,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
> >  obj-$(CONFIG_JUMP_LABEL) += jump_label.o
> >  obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
> >  obj-$(CONFIG_TORTURE_TEST) += torture.o
> > +obj-$(CONFIG_MEMBARRIER) += membarrier.o
> >  
> >  $(obj)/configs.o: $(obj)/config_data.h
> >  
> > diff --git a/kernel/membarrier.c b/kernel/membarrier.c
> > new file mode 100644
> > index 0000000..a20b279
> > --- /dev/null
> > +++ b/kernel/membarrier.c
> > @@ -0,0 +1,66 @@
> > +/*
> > + * Copyright (C) 2010, 2015 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > + *
> > + * membarrier system call
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +
> > +#include <linux/syscalls.h>
> > +#include <linux/membarrier.h>
> > +
> > +/*
> > + * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> > + * except MEMBARRIER_CMD_QUERY.
> > + */
> > +#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
> > +
> > +/**
> > + * sys_membarrier - issue memory barriers on a set of threads
> > + * @cmd:   Takes command values defined in enum membarrier_cmd.
> > + * @flags: Currently needs to be 0. For future extensions.
> > + *
> > + * If this system call is not implemented, -ENOSYS is returned. If the
> > + * command specified does not exist, or if the command argument is invalid,
> > + * this system call returns -EINVAL. For a given command, with flags argument
> > + * set to 0, this system call is guaranteed to always return the same value
> > + * until reboot.
> > + *
> > + * All memory accesses performed in program order from each targeted thread
> > + * are guaranteed to be ordered with respect to sys_membarrier().
> > + * If we use the semantic "barrier()" to represent a compiler barrier
> > + * forcing memory accesses to be performed in program order across the
> > + * barrier, and smp_mb() to represent explicit memory barriers forcing
> > + * full memory ordering across the barrier, we have the following ordering
> > + * table for each pair of barrier(), sys_membarrier() and smp_mb():
> > + *
> > + * The pair ordering is detailed as (O: ordered, X: not ordered):
> > + *
> > + *                        barrier()  smp_mb()  sys_membarrier()
> > + * barrier()                  X         X            O
> > + * smp_mb()                   X         O            O
> > + * sys_membarrier()           O         O            O
> > + */
> > +SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
> > +{
> > +	if (flags)
> > +		return -EINVAL;
> > +	switch (cmd) {
> > +	case MEMBARRIER_CMD_QUERY:
> > +		return MEMBARRIER_CMD_BITMASK;
> > +	case MEMBARRIER_CMD_SHARED:
> > +		if (num_online_cpus() > 1)
> > +			synchronize_sched();
> > +		return 0;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +}
> > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > index 7995ef5..eb4fde0 100644
> > --- a/kernel/sys_ni.c
> > +++ b/kernel/sys_ni.c
> > @@ -243,3 +243,6 @@ cond_syscall(sys_bpf);
> >  
> >  /* execveat */
> >  cond_syscall(sys_execveat);
> > +
> > +/* membarrier */
> > +cond_syscall(sys_membarrier);
> > --
> > 1.7.7.3
> >

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
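
The "dyn. check" variant benchmarked above corresponds to detecting
sys_membarrier() availability at runtime via MEMBARRIER_CMD_QUERY. A
hypothetical sketch of such a check, not taken from the patch, assuming
the __NR_membarrier number wired up above and the MEMBARRIER_CMD_*
values from the patched <linux/membarrier.h>:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/membarrier.h>

	static int has_sys_membarrier;

	static void membarrier_init(void)
	{
		long mask;

		/* QUERY returns a bitmask of supported commands, or -1 with
		 * errno set to ENOSYS on kernels lacking the syscall. */
		mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
		if (mask >= 0 && (mask & MEMBARRIER_CMD_SHARED))
			has_sys_membarrier = 1;
	}

	/* Slow-side barrier: upgrade via the kernel when available. */
	static void slow_side_mb(void)
	{
		if (has_sys_membarrier)
			syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
		else
			__sync_synchronize();	/* fall back to a full memory barrier */
	}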