Date: Fri, 28 Jul 2017 17:48:36 +0000 (UTC)
From: Mathieu Desnoyers
To: "Paul E. McKenney"
Cc: Andrew Hunter, Avi Kivity, maged michael, gromer, linux-kernel
Message-ID: <832910184.29636.1501264116824.JavaMail.zimbra@efficios.com>
In-Reply-To: <20170728173123.GH3730@linux.vnet.ibm.com>
References: <20170727181250.GA20183@linux.vnet.ibm.com> <5c8c6946-ce3a-6183-76a2-027823a9948a@scylladb.com> <20170727194322.GL3730@linux.vnet.ibm.com> <20170728173123.GH3730@linux.vnet.ibm.com>
Subject: Re: Updated sys_membarrier() speedup patch, FYI

----- On Jul 28, 2017, at 1:31 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Fri, Jul 28, 2017 at 10:15:49AM -0700, Andrew Hunter wrote:
>> On Thu, Jul 27, 2017 at 12:43 PM, Paul E. McKenney
>> wrote:
>> > On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>> >> IPIing only running threads of my process would be perfect. In fact
>> >> I might even be able to make use of "membarrier these threads
>> >> please" to reduce IPIs, when I change the topology from fully
>> >> connected to something more sparse, on larger machines.
>>
>> We do this as well -- sometimes we only need RSEQ fences against
>> specific CPU(s), and thus pass a subset.
>
> Sounds like a good future enhancement, probably requiring a new syscall
> to accommodate the cpumask.
>
>> > +static void membarrier_private_expedited_ipi_each(void)
>> > +{
>> > +	int cpu;
>> > +
>> > +	for_each_online_cpu(cpu) {
>> > +		struct task_struct *p;
>> > +
>> > +		rcu_read_lock();
>> > +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> > +		if (p && p->mm == current->mm)
>> > +			smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> > +		rcu_read_unlock();
>> > +	}
>> > +}
>> > +
>>
>> We have the (simpler imho)
>>
>> const struct cpumask *mask = mm_cpumask(mm);
>> /* possibly AND it with a user requested mask */
>> smp_call_function_many(mask, ipi_func, ....);
>>
>> which I think will be faster on some archs (that support broadcast)
>> and have fewer problems with out-of-sync values (though we do have to
>> check in our IPI function that we haven't context switched out).
>>
>> Am I missing why this won't work?
>
> My impression is that some architectures don't provide the needed
> ordering in this case, and also that some architectures support ASIDs
> and would thus IPI CPUs that weren't actually running threads in the
> process at the current time.
>
> Mathieu, anything I am missing?

As per my other email, it's pretty much it, yes.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com