Date: Fri, 28 Jul 2017 17:25:11 +0000 (UTC)
From: Mathieu Desnoyers
To: Andrew Hunter
Cc: "Paul E. McKenney", Avi Kivity, maged michael, gromer, linux-kernel
Message-ID: <697197060.29633.1501262711614.JavaMail.zimbra@efficios.com>
References: <20170727181250.GA20183@linux.vnet.ibm.com> <5c8c6946-ce3a-6183-76a2-027823a9948a@scylladb.com> <20170727194322.GL3730@linux.vnet.ibm.com>
Subject: Re: Updated sys_membarrier() speedup patch, FYI

----- On Jul 28, 2017, at 1:15 PM, Andrew Hunter ahh@google.com wrote:

> On Thu, Jul 27, 2017 at 12:43 PM, Paul E. McKenney wrote:
>> On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>>> IPIing only running threads of my process would be perfect. In fact
>>> I might even be able to make use of "membarrier these threads
>>> please" to reduce IPIs, when I change the topology from fully
>>> connected to something more sparse, on larger machines.
>
> We do this as well--sometimes we only need RSEQ fences against
> specific CPU(s), and thus pass a subset.
>
>> +static void membarrier_private_expedited_ipi_each(void)
>> +{
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		struct task_struct *p;
>> +
>> +		rcu_read_lock();
>> +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> +		if (p && p->mm == current->mm)
>> +			smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +		rcu_read_unlock();
>> +	}
>> +}
>> +
>
> We have the (simpler imho)
>
> const struct cpumask *mask = mm_cpumask(mm);
> /* possibly AND it with a user requested mask */
> smp_call_function_many(mask, ipi_func, ....);
>
> which I think will be faster on some archs (that support broadcast)
> and have fewer problems with out-of-sync values (though we do have to
> check in our IPI function that we haven't context switched out).
>
> Am I missing why this won't work?

The mm cpumask is unfortunately not populated on all architectures, so the
generic implementation cannot rely on it. Moreover, I recall that using it
in addition to the rq->curr checks adds extra complexity with respect to
memory barriers vs updates of the mm_cpumask.

The ipi_each loop you refer to here is only the fallback case. The common
case allocates a cpumask, populates it by looking at each runqueue's
rq->curr, and then issues a single smp_call_function_many() on that cpumask.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com