Date: Mon, 5 Feb 2018 13:29:07 -0800
From: tip-bot for Mathieu Desnoyers
Message-ID:
Cc: parri.andrea@gmail.com, paulmck@linux.vnet.ibm.com,
    mathieu.desnoyers@efficios.com, hpa@zytor.com, davejwatson@fb.com,
    tglx@linutronix.de, will.deacon@arm.com, peterz@infradead.org,
    luto@kernel.org, ahh@google.com, mpe@ellerman.id.au, avi@scylladb.com,
    maged.michael@gmail.com, mingo@kernel.org, boqun.feng@gmail.com,
    linux-kernel@vger.kernel.org, benh@kernel.crashing.org,
    linux@armlinux.org.uk, torvalds@linux-foundation.org, ghackmann@google.com,
    paulus@samba.org, sehr@google.com
Reply-To: mpe@ellerman.id.au, peterz@infradead.org, will.deacon@arm.com,
    luto@kernel.org, ahh@google.com, davejwatson@fb.com, tglx@linutronix.de,
    parri.andrea@gmail.com, paulmck@linux.vnet.ibm.com,
    mathieu.desnoyers@efficios.com, hpa@zytor.com, paulus@samba.org,
    sehr@google.com, torvalds@linux-foundation.org, linux@armlinux.org.uk,
    benh@kernel.crashing.org, ghackmann@google.com, maged.michael@gmail.com,
    mingo@kernel.org, boqun.feng@gmail.com, linux-kernel@vger.kernel.org,
    avi@scylladb.com
In-Reply-To: <20180129202020.8515-9-mathieu.desnoyers@efficios.com>
References: <20180129202020.8515-9-mathieu.desnoyers@efficios.com>
To: linux-tip-commits@vger.kernel.org
Subject: [tip:sched/urgent] membarrier: Provide core serializing command, *_SYNC_CORE
Git-Commit-ID: 70216e18e519a54a2f13569e8caff99a092a92d6

Commit-ID:  70216e18e519a54a2f13569e8caff99a092a92d6
Gitweb:     https://git.kernel.org/tip/70216e18e519a54a2f13569e8caff99a092a92d6
Author:     Mathieu Desnoyers
AuthorDate: Mon, 29 Jan 2018 15:20:17 -0500
Committer:  Ingo Molnar
CommitDate: Mon, 5 Feb 2018 21:35:03 +0100

membarrier: Provide core serializing command, *_SYNC_CORE

Provide a core serializing membarrier command to support memory reclaim
by JITs.

Each architecture needs to explicitly opt into that support by
documenting in its architecture code how it provides the core
serializing instructions required when returning from the membarrier
IPI, and after the scheduler has updated the curr->mm pointer (before
going back to user-space). It should then select
ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on the
architecture.

Architectures selecting this feature need to either document that they
issue core serializing instructions when returning to user-space, or
implement their architecture-specific sync_core_before_usermode().

Signed-off-by: Mathieu Desnoyers
Acked-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Cc: Andrea Parri
Cc: Andrew Hunter
Cc: Andy Lutomirski
Cc: Avi Kivity
Cc: Benjamin Herrenschmidt
Cc: Boqun Feng
Cc: Dave Watson
Cc: David Sehr
Cc: Greg Hackmann
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Maged Michael
Cc: Michael Ellerman
Cc: Paul E. McKenney
Cc: Paul Mackerras
Cc: Russell King
Cc: Will Deacon
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar
---
 include/linux/sched/mm.h        | 18 ++++++++++++++
 include/uapi/linux/membarrier.h | 32 ++++++++++++++++++++++++-
 init/Kconfig                    |  3 +++
 kernel/sched/core.c             | 18 ++++++++++----
 kernel/sched/membarrier.c       | 53 +++++++++++++++++++++++++++++++----------
 5 files changed, 106 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1c4e40c..03a1690 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include

 /*
  * Routines for handling mm_structs
@@ -223,12 +224,26 @@ enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED = (1U << 1),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY = (1U << 2),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED = (1U << 3),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY = (1U << 4),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE = (1U << 5),
+};
+
+enum {
+	MEMBARRIER_FLAG_SYNC_CORE = (1U << 0),
 };

 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 #include
 #endif

+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+	if (likely(!(atomic_read(&mm->membarrier_state) &
+			MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
+		return;
+	sync_core_before_usermode();
+}
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
@@ -244,6 +259,9 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+}
 #endif

 #endif /* _LINUX_SCHED_MM_H */
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index d252506..5891d76 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -73,7 +73,7 @@
  *                          to and return from the system call
  *                          (non-running threads are de facto in such a
  *                          state). This only covers threads from the
- *                          same processes as the caller thread. This
+ *                          same process as the caller thread. This
  *                          command returns 0 on success. The
  *                          "expedited" commands complete faster than
  *                          the non-expedited ones, they never block,
@@ -86,6 +86,34 @@
  *                          Register the process intent to use
  *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
  *                          returns 0.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          In addition to provide memory ordering
+ *                          guarantees described in
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED, ensure
+ *                          the caller thread, upon return from system
+ *                          call, that all its running threads siblings
+ *                          have executed a core serializing
+ *                          instruction. (architectures are required to
+ *                          guarantee that non-running threads issue
+ *                          core serializing instructions before they
+ *                          resume user-space execution). This only
+ *                          covers threads from the same process as the
+ *                          caller thread. This command returns 0 on
+ *                          success. The "expedited" commands complete
+ *                          faster than the non-expedited ones, they
+ *                          never block, but have the downside of
+ *                          causing extra overhead. If this command is
+ *                          not implemented by an architecture, -EINVAL
+ *                          is returned. A process needs to register its
+ *                          intent to use the private expedited sync
+ *                          core command prior to using it, otherwise
+ *                          this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
+ *                          If this command is not implemented by an
+ *                          architecture, -EINVAL is returned.
+ *                          Returns 0 on success.
  * @MEMBARRIER_CMD_SHARED:
  *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
  *                          header backward compatibility.
@@ -101,6 +129,8 @@ enum membarrier_cmd {
 	MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED = (1 << 2),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 5),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 6),

 	/* Alias for header backward compatibility. */
 	MEMBARRIER_CMD_SHARED = MEMBARRIER_CMD_GLOBAL,
diff --git a/init/Kconfig b/init/Kconfig
index 535421f..e37f4b2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1415,6 +1415,9 @@ config USERFAULTFD
 config ARCH_HAS_MEMBARRIER_CALLBACKS
 	bool

+config ARCH_HAS_MEMBARRIER_SYNC_CORE
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 11bf4d4..ee420d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2704,13 +2704,21 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	fire_sched_in_preempt_notifiers(current);

 	/*
-	 * When transitioning from a kernel thread to a userspace
-	 * thread, mmdrop()'s implicit full barrier is required by the
-	 * membarrier system call, because the current ->active_mm can
-	 * become the current mm without going through switch_mm().
+	 * When switching through a kernel thread, the loop in
+	 * membarrier_{private,global}_expedited() may have observed that
+	 * kernel thread and not issued an IPI. It is therefore possible to
+	 * schedule between user->kernel->user threads without passing though
+	 * switch_mm(). Membarrier requires a barrier after storing to
+	 * rq->curr, before returning to userspace, so provide them here:
+	 *
+	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
+	 *   provided by mmdrop(),
+	 * - a sync_core for SYNC_CORE.
 	 */
-	if (mm)
+	if (mm) {
+		membarrier_mm_sync_core_before_usermode(mm);
 		mmdrop(mm);
+	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index d2087d5..5d07626 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -26,11 +26,20 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \
+	(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE \
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)
+#else
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK 0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK \
 	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_PRIVATE_EXPEDITED \
-	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED \
+	| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK)

 static void ipi_mb(void *info)
 {
@@ -104,15 +113,23 @@ static int membarrier_global_expedited(void)
 	return 0;
 }

-static int membarrier_private_expedited(void)
+static int membarrier_private_expedited(int flags)
 {
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;

-	if (!(atomic_read(&current->mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
-		return -EPERM;
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		if (!(atomic_read(&current->mm->membarrier_state) &
+				MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
+			return -EPERM;
+	} else {
+		if (!(atomic_read(&current->mm->membarrier_state) &
+				MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
+			return -EPERM;
+	}

 	if (num_online_cpus() == 1)
 		return 0;
@@ -205,20 +222,29 @@ static int membarrier_register_global_expedited(void)
 	return 0;
 }

-static int membarrier_register_private_expedited(void)
+static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+	}

 	/*
 	 * We need to consider threads belonging to different thread
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
+	if (atomic_read(&mm->membarrier_state) & state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
+		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
+				&mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
 		 * Ensure all future scheduler executions will observe the
@@ -226,8 +252,7 @@ static int membarrier_register_private_expedited(void)
 		 */
 		synchronize_sched();
 	}
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
-		  &mm->membarrier_state);
+	atomic_or(state, &mm->membarrier_state);

 	return 0;
 }
@@ -283,9 +308,13 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited();
+		return membarrier_private_expedited(0);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		return membarrier_register_private_expedited();
+		return membarrier_register_private_expedited(0);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	default:
 		return -EINVAL;
 	}
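
For illustration only (not part of the patch above): a minimal user-space
sketch of how a JIT might drive the new commands. It assumes a kernel and
architecture that select ARCH_HAS_MEMBARRIER_SYNC_CORE, and that
__NR_membarrier and the MEMBARRIER_CMD_* constants are available through
<sys/syscall.h> and <linux/membarrier.h>; the error handling is deliberately
simplistic.

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* Thin wrapper: glibc has no membarrier() wrapper, so go through syscall(2). */
static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/*
	 * Register intent once, early (e.g. at JIT start-up). Failure with
	 * EINVAL means the kernel/architecture does not provide SYNC_CORE.
	 */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier register");
		exit(EXIT_FAILURE);
	}

	/*
	 * ... later, after unpublishing every pointer to a JIT'd function and
	 * before reusing its memory for new code: when this returns, all
	 * running sibling threads have executed a core serializing
	 * instruction, so none of them can still be executing stale
	 * instructions from the reclaimed region.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0)) {
		perror("membarrier sync-core");
		exit(EXIT_FAILURE);
	}

	return 0;
}

If the registration call fails with EINVAL, the running kernel or
architecture does not implement the SYNC_CORE commands, and the JIT has to
fall back to another reclaim scheme.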