Subject: Re: Updated sys_membarrier() speedup patch, FYI
To: paulmck@linux.vnet.ibm.com, maged.michael@gmail.com, ahh@google.com,
 gromer@google.com
Cc: linux-kernel@vger.kernel.org
References: <20170727181250.GA20183@linux.vnet.ibm.com>
From: Avi Kivity
Message-ID: <5c8c6946-ce3a-6183-76a2-027823a9948a@scylladb.com>
Date: Thu, 27 Jul 2017 22:20:14 +0300
In-Reply-To: <20170727181250.GA20183@linux.vnet.ibm.com>

On 07/27/2017 09:12 PM, Paul E. McKenney wrote:
> Hello!
>
> Please see below for a prototype sys_membarrier() speedup patch.
> Please note that there is some controversy on this subject, so the
> final version will probably be quite a bit different from this
> prototype.
>
> But my main question is whether the throttling shown below is
> acceptable for your use cases, namely only one expedited
> sys_membarrier() permitted per scheduling-clock period (1 millisecond
> on many platforms), with any excess being silently converted to
> non-expedited form. The reason for the throttling is concerns about
> DoS attacks based on user code with a tight loop invoking this system
> call.
>
> Thoughts?

Silent throttling would render it useless for me. -EAGAIN is a little
better, but I'd be forced to spin until either I get kicked out of my
loop, or it succeeds.

IPIing only the running threads of my process would be perfect. In fact
I might even be able to make use of "membarrier these threads please"
to reduce IPIs, when I change the topology from fully connected to
something more sparse, on larger machines.

My previous implementations were a signal (but that's horrible on large
machines) and trylock + mprotect (but that doesn't work on ARM).
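For reference, the mprotect() trick looks roughly like the sketch below.
This is a minimal illustration with invented names, not the actual
implementation: it relies on the kernel shooting down stale TLB entries
on every CPU that is currently running a thread of the process when a
page's protection is downgraded, and on the shootdown IPI acting as a
full memory barrier on those CPUs. On ARM the invalidation is broadcast
in hardware without IPIs, hence no barrier, which is why it doesn't work
there; it is presumably also what the commit message below means by
being "vulnerable to reasonable optimizations that might skip some
IPIs".

#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>

static void *mb_page;
static pthread_mutex_t mb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Call once at startup: map a private dummy page (4 KiB pages assumed)
 * and touch it so it is actually populated. */
static void mb_init(void)
{
	mb_page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mb_page == MAP_FAILED)
		abort();
	*(volatile char *)mb_page = 0;
}

/* Force a memory barrier on all CPUs currently running this process. */
static void mb_all_running_threads(void)
{
	/* A plain mutex for simplicity; a trylock would let concurrent
	 * callers share a single round of IPIs. */
	pthread_mutex_lock(&mb_lock);
	/* Make the page writable and dirty it, then drop write
	 * permission. The downgrade forces the kernel to flush the TLB
	 * on every CPU where this mm is active, which on x86 means an
	 * IPI and therefore a full barrier on that CPU. */
	mprotect(mb_page, 4096, PROT_READ | PROT_WRITE);
	*(volatile char *)mb_page = 0;
	mprotect(mb_page, 4096, PROT_READ);
	pthread_mutex_unlock(&mb_lock);
}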
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 4cd5253094b6d7f9501e21e13aa4e2e78e8a70cd
> Author: Paul E. McKenney
> Date:   Tue Jul 18 13:53:32 2017 -0700
>
>     sys_membarrier: Add expedited option
>
>     The sys_membarrier() system call has proven too slow for some use
>     cases, which has prompted users to instead rely on TLB shootdown.
>     Although TLB shootdown is much faster, it has the slight
>     disadvantage of not working at all on arm and arm64 and also of
>     being vulnerable to reasonable optimizations that might skip some
>     IPIs. However, the Linux kernel does not currently provide a
>     reasonable alternative, so it is hard to criticize these users for
>     doing what works for them on a given piece of hardware at a given
>     time.
>
>     This commit therefore adds an expedited option to the
>     sys_membarrier() system call, thus providing a faster mechanism
>     that is portable and is not subject to death by optimization.
>     Note that if more than one MEMBARRIER_CMD_SHARED_EXPEDITED
>     sys_membarrier() call happens within the same jiffy, all but the
>     first will use synchronize_sched() instead of
>     synchronize_sched_expedited().
>
>     Signed-off-by: Paul E. McKenney
>     [ paulmck: Fix code style issue pointed out by Boqun Feng. ]
>     Tested-by: Avi Kivity
>     Cc: Maged Michael
>     Cc: Andrew Hunter
>     Cc: Geoffrey Romer
>
> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> index e0b108bd2624..5720386d0904 100644
> --- a/include/uapi/linux/membarrier.h
> +++ b/include/uapi/linux/membarrier.h
> @@ -40,6 +40,16 @@
>   *                          (non-running threads are de facto in such a
>   *                          state). This covers threads from all processes
>   *                          running on the system. This command returns 0.
> + * @MEMBARRIER_CMD_SHARED_EXPEDITED: Execute a memory barrier on all
> + *                          running threads, but in an expedited fashion.
> + *                          Upon return from system call, the caller thread
> + *                          is ensured that all running threads have passed
> + *                          through a state where all memory accesses to
> + *                          user-space addresses match program order between
> + *                          entry to and return from the system call
> + *                          (non-running threads are de facto in such a
> + *                          state). This covers threads from all processes
> + *                          running on the system. This command returns 0.
>   *
>   * Command to be passed to the membarrier system call. The commands need to
>   * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
> @@ -48,6 +58,7 @@
>  enum membarrier_cmd {
>  	MEMBARRIER_CMD_QUERY = 0,
>  	MEMBARRIER_CMD_SHARED = (1 << 0),
> +	MEMBARRIER_CMD_SHARED_EXPEDITED = (1 << 1),
>  };
>
>  #endif /* _UAPI_LINUX_MEMBARRIER_H */
> diff --git a/kernel/membarrier.c b/kernel/membarrier.c
> index 9f9284f37f8d..587e3bbfae7e 100644
> --- a/kernel/membarrier.c
> +++ b/kernel/membarrier.c
> @@ -22,7 +22,8 @@
>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>   * except MEMBARRIER_CMD_QUERY.
>   */
> -#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
> +#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED | \
> +				 MEMBARRIER_CMD_SHARED_EXPEDITED)
>
>  /**
>   * sys_membarrier - issue memory barriers on a set of threads
> @@ -64,6 +65,20 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>  		if (num_online_cpus() > 1)
>  			synchronize_sched();
>  		return 0;
> +	case MEMBARRIER_CMD_SHARED_EXPEDITED:
> +		if (num_online_cpus() > 1) {
> +			static unsigned long lastexp;
> +			unsigned long j;
> +
> +			j = jiffies;
> +			if (READ_ONCE(lastexp) == j) {
> +				synchronize_sched();
> +			} else {
> +				WRITE_ONCE(lastexp, j);
> +				synchronize_sched_expedited();
> +			}
> +		}
> +		return 0;
>  	default:
>  		return -EINVAL;
>  	}
>
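For what it's worth, the way I'd expect to consume this from userspace
is roughly the sketch below. It assumes the MEMBARRIER_CMD_SHARED_EXPEDITED
definition from the patch above, uses a raw syscall(2) wrapper since
libc provides none, and the helper names are invented:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

/* Issue a system-wide barrier, preferring the expedited command when
 * the running kernel advertises it via MEMBARRIER_CMD_QUERY, and
 * falling back to the slow MEMBARRIER_CMD_SHARED otherwise. */
static void heavy_barrier(void)
{
	static int have_expedited = -1;	/* -1: not probed yet */

	if (have_expedited < 0) {
		int cmds = membarrier(MEMBARRIER_CMD_QUERY, 0);

		have_expedited = cmds > 0 &&
			(cmds & MEMBARRIER_CMD_SHARED_EXPEDITED);
	}
	if (!have_expedited ||
	    membarrier(MEMBARRIER_CMD_SHARED_EXPEDITED, 0) != 0)
		membarrier(MEMBARRIER_CMD_SHARED, 0);
}

Note that with the throttling as posted, the expedited call still
returns 0 even when it has been silently downgraded to
synchronize_sched(), so there is nothing for the caller to detect or
retry; that is exactly the problem for my use case.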