From: Mathieu Desnoyers
To: Nicholas Piggin
Cc: Peter Zijlstra, "Paul E. McKenney", linux-kernel, Boqun Feng, Andrew Hunter, maged michael, gromer, Avi Kivity, Michael Ellerman, Benjamin Herrenschmidt, Palmer Dabbelt, Dave Watson
Date: Mon, 31 Jul 2017 19:31:19 +0000 (UTC)
Subject: Re: [RFC PATCH v2] membarrier: expedited private command

----- On Jul 28, 2017, at 9:58 PM, Nicholas Piggin npiggin@gmail.com wrote:

> On Fri, 28 Jul 2017 17:06:53 +0000 (UTC)
> Mathieu Desnoyers wrote:
>
>> ----- On Jul 28, 2017, at 12:46 PM, Peter Zijlstra peterz@infradead.org wrote:
>>
>> > On Fri, Jul 28, 2017 at 03:38:15PM +0000, Mathieu Desnoyers wrote:
>> >> > Which only leaves PPC stranded.. but the 'good' news is that mpe says
>> >> > they'll probably need a barrier in switch_mm() in any case.
>> >>
>> >> As I pointed out in my other email, I plan to do this:
>> >>
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -2636,6 +2636,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>> >>  	vtime_task_switch(prev);
>> >>  	perf_event_task_sched_in(prev, current);
>> >
>> > Here would place it _inside_ the rq->lock, which seems to make more
>> > sense given the purpose of the barrier, but either way works given its
>> > definition.
>>
>> Given its naming "...after_unlock_lock", I thought it would be clearer to put
>> it after the unlock. Anyway, this barrier does not seem to be used to ensure
>> the release barrier per se (unlock already has release semantics), but rather
>> to provide a full memory barrier wrt memory accesses that are synchronized by
>> means other than this lock.
>>
>> >
>> >>  	finish_lock_switch(rq, prev);
>> >
>> > You could put the whole thing inside IS_ENABLED(CONFIG_SYSMEMBARRIER) or
>> > something.
>>
>> I'm tempted to wait until we hear from the powerpc maintainers, so we learn
>> whether they deeply care about this extra barrier in finish_task_switch()
>> before making it conditional on CONFIG_MEMBARRIER.
>>
>> Having a guaranteed barrier after context switch on all architectures may
>> have other uses.
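
For reference, spelled out, the placement I have in mind looks roughly like
this (illustrative only: the added lines are split up by the interleaved
quoting above, and the exact comment wording may still change):

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ ... @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
 	finish_lock_switch(rq, prev);
+	/*
+	 * Provide a guaranteed full memory barrier after context switch,
+	 * before the scheduled-in task returns to user-space, as required
+	 * by expedited sys_membarrier.  Placed after the rq->lock release
+	 * to match the "...after_unlock_lock" naming.
+	 */
+	smp_mb__after_unlock_lock();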
>
> I haven't had time to read the thread and understand exactly why you need
> this extra barrier, I'll do it next week. Thanks for cc'ing us on it.
>
> A smp_mb is pretty expensive on powerpc CPUs. Removing the sync from
> switch_to increased thread switch performance by 2-3%. Putting it in
> switch_mm may be a little less painful, but still we have to weigh it
> against the benefit of this new functionality. Would that be a net win
> for the average end-user? Seems unlikely.
>
> But we also don't want to lose sys_membarrier completely. Would it be too
> painful to make MEMBARRIER_CMD_PRIVATE_EXPEDITED return error, or make it
> fall back to a slower case if we decide not to implement it?

The need for an expedited membarrier comes from its use in implementing
user-space synchronization schemes such as hazard pointers, RCU, and
garbage collectors.

One example is the hazard-pointer use-case. If the memory free is performed
by the same thread that does the retire, the slowdown introduced by a
non-expedited membarrier is not acceptable at all. In that case, only an
expedited membarrier keeps the slowdown acceptable.

The user's current alternative is to rely on undocumented side-effects of
mprotect() to achieve the same result. This happens to work on some
architectures, and may break in the future.

If users do not have expedited membarrier on a given architecture, and are
told that mprotect() does not provide the barrier guarantees they are
looking for, then they would have to add heavy-weight memory barriers on
many user-space fast paths on those specific architectures, assuming they
are willing to go to that trouble. (A rough sketch of what such a
user-space fallback could look like is appended below my signature.)

I understand that the 2-3% overhead when switching between threads is a big
deal. Do you have numbers on the overhead added by a memory barrier in
switch_mm? I suspect that switching between processes (including the cost
of the ensuing cache line and TLB misses) is quite a bit heavier in the
first place.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
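
To make the fall-back question concrete, here is a rough sketch of how a
user could probe for and use the proposed command (illustrative only: it
assumes headers that define the proposed MEMBARRIER_CMD_PRIVATE_EXPEDITED,
and a real hazard-pointer implementation would cache the probe result
rather than retrying on every call):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdlib.h>

/* No glibc wrapper for membarrier(2); go through syscall(). */
static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

/*
 * Issue a memory barrier on all running threads of the current process,
 * e.g. from a hazard-pointer retire/free slow path.
 */
static void process_membarrier(void)
{
	if (!membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0))
		return;		/* expedited path available */
	/*
	 * Command rejected (or syscall missing): fall back to the slower,
	 * non-expedited shared command.  If even that is unavailable, the
	 * algorithm would instead need full memory barriers on its
	 * user-space fast paths.
	 */
	if (!membarrier(MEMBARRIER_CMD_SHARED, 0))
		return;
	abort();
}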