From: Mathieu Desnoyers
To: Peter Zijlstra, Boqun Feng
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Will Deacon,
    "Paul E. McKenney", Nicholas Piggin, Andy Lutomirski,
    Thomas Gleixner, Linus Torvalds, Alan Stern, linux-mm@kvack.org
Subject: [RFC PATCH 1/3] sched: fix exit_mm vs membarrier (v3)
Date: Thu, 24 Sep 2020 13:25:06 -0400
Message-Id: <20200924172508.8724-2-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200924172508.8724-1-mathieu.desnoyers@efficios.com>
References: <20200924172508.8724-1-mathieu.desnoyers@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

exit_mm should issue memory barriers after user-space memory accesses,
before clearing current->mm, to order user-space memory accesses
performed prior to exit_mm before clearing tsk->mm, which has the
effect of skipping the membarrier private expedited IPIs.

exit_mm should also update the runqueue's membarrier_state so
membarrier global expedited IPIs are not sent when they are not
needed.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.

Here is the scenario I have in mind:

Two thread groups are created, A and B. Thread group B is created by
issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
Let's assume we have a single thread within each thread group (Thread A
and Thread B).

AFAIU we can have:

Userspace variables:

int x = 0, y = 0;

CPU 0                   CPU 1
Thread A                Thread B
(in thread group A)     (in thread group B)

x = 1
barrier()
y = 1
exit()
exit_mm()
current->mm = NULL;
                        r1 = load y
                        membarrier()
                          skips CPU 0 (no IPI) because its current mm is NULL
                        r2 = load x
                        BUG_ON(r1 == 1 && r2 == 0)

Signed-off-by: Mathieu Desnoyers
Cc: Peter Zijlstra (Intel)
Cc: Boqun Feng
Cc: Will Deacon
Cc: Paul E. McKenney
Cc: Nicholas Piggin
Cc: Andy Lutomirski
Cc: Thomas Gleixner
Cc: Linus Torvalds
Cc: Alan Stern
Cc: linux-mm@kvack.org
---
Changes since v1:
- Use smp_mb__after_spinlock rather than smp_mb.
- Document race scenario in commit message.

Changes since v2:
- Introduce membarrier_update_current_mm,
- Use membarrier_update_current_mm to update rq's membarrier_state from
  exit_mm.
---
 include/linux/sched/mm.h  |  5 +++++
 kernel/exit.c             | 12 ++++++++++++
 kernel/sched/membarrier.c | 12 ++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f889e332912f..5dd7f56baaba 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -370,6 +370,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 
 extern void membarrier_exec_mmap(struct mm_struct *mm);
 
+extern void membarrier_update_current_mm(struct mm_struct *next_mm);
+
 #else
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
@@ -384,6 +386,9 @@ static inline void membarrier_exec_mmap(struct mm_struct *mm)
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
 }
+static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
+{
+}
 #endif
 
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 733e80f334e7..0767a2dbf245 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -475,7 +475,19 @@ static void exit_mm(void)
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
+	/*
+	 * When a thread stops operating on an address space, the loop
+	 * in membarrier_private_expedited() may not observe that
+	 * tsk->mm, and the loop in membarrier_global_expedited() may
+	 * not observe a MEMBARRIER_STATE_GLOBAL_EXPEDITED
+	 * rq->membarrier_state, so those would not issue an IPI.
+	 * Membarrier requires a memory barrier after accessing
+	 * user-space memory, before clearing tsk->mm or the
+	 * rq->membarrier_state.
+	 */
+	smp_mb__after_spinlock();
 	current->mm = NULL;
+	membarrier_update_current_mm(NULL);
 	mmap_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
 	task_unlock(current);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..8bc8b8a888b7 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -63,6 +63,18 @@ void membarrier_exec_mmap(struct mm_struct *mm)
 	this_cpu_write(runqueues.membarrier_state, 0);
 }
 
+void membarrier_update_current_mm(struct mm_struct *next_mm)
+{
+	struct rq *rq = this_rq();
+	int membarrier_state = 0;
+
+	if (next_mm)
+		membarrier_state = atomic_read(&next_mm->membarrier_state);
+	if (READ_ONCE(rq->membarrier_state) == membarrier_state)
+		return;
+	WRITE_ONCE(rq->membarrier_state, membarrier_state);
+}
+
 static int membarrier_global_expedited(void)
 {
 	int cpu;
-- 
2.17.1
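
For readers following the logic of membarrier_update_current_mm() and the
exit_mm() change, here is a minimal user-space sketch of the same
compare-then-write pattern. It is not kernel code and not part of the
patch: struct model_mm, struct model_rq, the model_* functions, the writes
counter and the MODEL_STATE_GLOBAL_EXPEDITED value are all hypothetical
stand-ins. C11 relaxed atomics only approximate READ_ONCE/WRITE_ONCE, and a
seq_cst fence only approximates smp_mb__after_spinlock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flag value only; the kernel defines its own
 * MEMBARRIER_STATE_GLOBAL_EXPEDITED. */
#define MODEL_STATE_GLOBAL_EXPEDITED 0x2

/* Hypothetical stand-in for the membarrier_state field of mm_struct. */
struct model_mm {
	atomic_int membarrier_state;
};

/* Hypothetical stand-in for the per-CPU runqueue. "writes" counts the
 * stores performed, so the early return can be observed. */
struct model_rq {
	atomic_int membarrier_state;
	int writes;
};

/* Mirrors the shape of membarrier_update_current_mm(): copy the state of
 * next_mm (or 0 when there is none) into the runqueue, skipping the store
 * when the value is already up to date. */
static void model_update_current_mm(struct model_rq *rq,
				    struct model_mm *next_mm)
{
	int state = 0;

	assert(rq != NULL);
	if (next_mm)
		state = atomic_load(&next_mm->membarrier_state);
	if (atomic_load_explicit(&rq->membarrier_state,
				 memory_order_relaxed) == state)
		return;
	atomic_store_explicit(&rq->membarrier_state, state,
			      memory_order_relaxed);
	rq->writes++;
}

/* Mirrors the ordering in the exit_mm() hunk: full barrier first, then
 * clear the task's mm pointer, then clear the runqueue state. */
static void model_exit_mm(struct model_rq *rq, struct model_mm **task_mm)
{
	/* Approximates smp_mb__after_spinlock() in this user-space model. */
	atomic_thread_fence(memory_order_seq_cst);
	*task_mm = NULL;
	model_update_current_mm(rq, NULL);
}

/* Models the check a global expedited membarrier would make before
 * deciding to IPI this CPU. */
static int model_global_expedited_would_ipi(struct model_rq *rq)
{
	int state = atomic_load_explicit(&rq->membarrier_state,
					 memory_order_relaxed);

	return (state & MODEL_STATE_GLOBAL_EXPEDITED) != 0;
}
```

In this model, calling model_update_current_mm() twice with the same mm
leaves the writes counter unchanged the second time, and model_exit_mm()
clears both the task's mm pointer and the modelled runqueue state, so the
global expedited path would no longer target that CPU.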