In-Reply-To: <20170727190637.GK3730@linux.vnet.ibm.com>
References: <20170727181250.GA20183@linux.vnet.ibm.com> <20170727190637.GK3730@linux.vnet.ibm.com>
From: Andrew Hunter
Date: Fri, 28 Jul 2017 10:37:25 -0700
Subject: Re: Udpated sys_membarrier() speedup patch, FYI
To: "Paul E. McKenney"
Cc: Avi Kivity, Maged Michael, Geoffrey Romer, lkml

On Thu, Jul 27, 2017 at 12:06 PM, Paul E. McKenney wrote:
> IPIing only those CPUs running threads in the same process as the
> thread invoking membarrier() would be very nice!  There is some LKML
> discussion on this topic, which is currently circling around making this
> determination reliable on all CPU families.  ARM and x86 are thought
> to be OK, PowerPC is thought to require a smallish patch, MIPS is
> a big question mark, and so on.
>

I'm not sure what you mean by the determination or how this is arch
specific?

> But I am surprised when you say that the downgrade would not work, at
> least if you are not running with nohz_full CPUs.  The rcu_sched_qs()
> function simply sets a per-CPU quiescent-state flag.  The needed strong
> ordering is instead supplied by the combination of the code starting
> the grace period, reporting the setting of the quiescent-state flag
> to core RCU, and the code completing the grace period.  Each non-idle
> CPU will execute full memory barriers either in RCU_SOFTIRQ context,
> on entry to idle, on exit from idle, or within the grace-period kthread.
> In particular, a CPU running the same usermode thread for the entire
> grace period will execute the needed memory barriers in RCU_SOFTIRQ
> context shortly after taking a scheduling-clock interrupt.
>

Recall that I need more than just a memory barrier--also to interrupt
RSEQ critical sections in progress on those CPUs. I know this isn't
general purpose; I'm just saying a trivial downgrade wouldn't work for
me. :)  It would probably be sufficient to set NOTIFY_RESUME on all
CPUs running my code (which is what my IPI function does anyway...)
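
To make the last point concrete, here is a toy userspace model of "flag only the CPUs running my code, then restart any critical section in progress when that CPU returns to user mode". It is a sketch only and not kernel code: struct toy_cpu, toy_flag_cpus_running() and toy_return_to_user() are hypothetical names invented for illustration, the integer address-space id stands in for the process, and a plain loop stands in for whatever IPI or flag-setting mechanism the real patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name in this block is hypothetical. */
struct toy_cpu {
    int  running_mm;     /* id of the address space on this CPU, -1 if idle */
    bool notify_resume;  /* stands in for a pending notify-resume flag */
};

/*
 * Flag every CPU that is currently running a thread of address space `mm`.
 * CPUs running other processes, or sitting idle, are left alone.
 * Returns the number of CPUs flagged.
 */
static size_t toy_flag_cpus_running(struct toy_cpu *cpus, size_t ncpus, int mm)
{
    size_t flagged = 0;

    assert(cpus != NULL || ncpus == 0);
    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i].running_mm == mm) {
            cpus[i].notify_resume = true;
            flagged++;
        }
    }
    return flagged;
}

/*
 * Model of the return-to-user path: consume the flag and report whether
 * a critical section that was in progress has to be restarted.
 */
static bool toy_return_to_user(struct toy_cpu *cpu, bool in_critical_section)
{
    bool restart = cpu->notify_resume && in_critical_section;

    cpu->notify_resume = false;
    return restart;
}
```

In this model a CPU that was never flagged keeps running its critical section undisturbed, which is the property the quiescent-state downgrade alone would not provide.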