Date: Sat, 29 Jul 2017 11:58:40 +1000
From: Nicholas Piggin
To: Mathieu Desnoyers
Cc: Peter Zijlstra, "Paul E. McKenney", linux-kernel, Boqun Feng,
 Andrew Hunter, maged michael, gromer, Avi Kivity, Michael Ellerman,
 Benjamin Herrenschmidt, Palmer Dabbelt
Subject: Re: [RFC PATCH v2] membarrier: expedited private command
Message-ID: <20170729115840.7dff4ea5@roar.ozlabs.ibm.com>
In-Reply-To: <856243469.29609.1501261613685.JavaMail.zimbra@efficios.com>
References: <20170727211314.32666-1-mathieu.desnoyers@efficios.com>
 <20170728085532.ylhuz2irwmgpmejv@hirez.programming.kicks-ass.net>
 <20170728115702.5vgnvwhmbbmyrxbf@hirez.programming.kicks-ass.net>
 <2118431661.29566.1501256295573.JavaMail.zimbra@efficios.com>
 <20170728164642.jolhwyqs3swhzmrb@hirez.programming.kicks-ass.net>
 <856243469.29609.1501261613685.JavaMail.zimbra@efficios.com>
Organization: IBM

On Fri, 28 Jul 2017 17:06:53 +0000 (UTC)
Mathieu Desnoyers wrote:

> ----- On Jul 28, 2017, at 12:46 PM, Peter Zijlstra peterz@infradead.org wrote:
>
> > On Fri, Jul 28, 2017 at 03:38:15PM +0000, Mathieu Desnoyers wrote:
> >> > Which only leaves PPC stranded.. but the 'good' news is that mpe says
> >> > they'll probably need a barrier in switch_mm() in any case.
> >>
> >> As I pointed out in my other email, I plan to do this:
> >>
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -2636,6 +2636,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >>         vtime_task_switch(prev);
> >>         perf_event_task_sched_in(prev, current);
> >
> > Here would place it _inside_ the rq->lock, which seems to make more
> > sense given the purpose of the barrier, but either way works given its
> > definition.
>
> Given its naming "...after_unlock_lock", I thought it would be clearer to
> put it after the unlock. Anyway, this barrier does not seem to be there to
> provide release semantics per se (the unlock already has those), but
> rather to guarantee a full memory barrier wrt memory accesses that are
> synchronized by means other than this lock.
>
> >
> >>         finish_lock_switch(rq, prev);
> >
> > You could put the whole thing inside IS_ENABLED(CONFIG_SYSMEMBARRIER) or
> > something.
>
> I'm tempted to wait until we hear from the powerpc maintainers, so we learn
> whether they deeply care about this extra barrier in finish_task_switch()
> before making it conditional on CONFIG_MEMBARRIER.
>
> Having a guaranteed barrier after context switch on all architectures may
> have other uses.

I haven't had time to read the thread and understand exactly why you need
this extra barrier; I'll do that next week. Thanks for cc'ing us on it.

An smp_mb() is pretty expensive on powerpc CPUs. Removing the sync from
switch_to() increased thread switch performance by 2-3%. Putting it in
switch_mm() may be a little less painful, but we still have to weigh it
against the benefit of this new functionality.
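For concreteness, here is a rough sketch of the placement under
discussion, folding Peter's IS_ENABLED() suggestion into the hunk
Mathieu quotes above. The smp_mb__after_unlock_lock() call and the
CONFIG_MEMBARRIER gate are inferred from the quoted discussion, not
taken from a merged patch:

	/*
	 * Tail of finish_task_switch(), per Mathieu's "after the
	 * unlock" placement (sketch only):
	 */
	vtime_task_switch(prev);
	perf_event_task_sched_in(prev, current);
	finish_lock_switch(rq, prev);
	/*
	 * Guarantee a full memory barrier on every context switch,
	 * so an expedited sys_membarrier() can rely on it rather
	 * than IPIing every CPU running a thread of the process.
	 */
	if (IS_ENABLED(CONFIG_MEMBARRIER))
		smp_mb__after_unlock_lock();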
Would that barrier be a net win for the average end-user? It seems
unlikely. But we also don't want to lose sys_membarrier completely.
Would it be too painful to make MEMBARRIER_CMD_PRIVATE_EXPEDITED return
an error, or have it fall back to a slower path, if we decide not to
implement it?

Thanks,
Nick
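To make the question concrete, a minimal sketch of the two fallbacks
Nick floats, with assumed names: CONFIG_ARCH_HAS_MEMBARRIER_EXPEDITED
and membarrier_private_expedited() are illustrative only, while the
slow path shown is the synchronize_sched() call that already backs
MEMBARRIER_CMD_SHARED:

	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_EXPEDITED)) {
			/*
			 * Option 1: report lack of support and let
			 * userspace pick another strategy.
			 */
			return -EINVAL;
			/*
			 * Option 2: silently degrade to the existing
			 * slow path instead:
			 *
			 *	synchronize_sched();
			 *	return 0;
			 */
		}
		membarrier_private_expedited();	/* IPI-based fast path */
		return 0;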