Date: Sun, 29 Jul 2018 15:54:52 -0400
From: Rik van Riel
To: Andy Lutomirski
Cc: LKML, kernel-team, Peter Zijlstra, X86 ML, Vitaly Kuznetsov,
 Ingo Molnar, Mike Galbraith, Dave Hansen, Catalin Marinas,
 Benjamin Herrenschmidt
Subject: [PATCH v2 11/11] mm,sched: conditionally skip lazy TLB mm refcounting
Message-ID: <20180729155452.37eddc11@imladris.surriel.com>
In-Reply-To:
References: <20180728215357.3249-1-riel@surriel.com>
 <20180728215357.3249-11-riel@surriel.com>
On Sat, 28 Jul 2018 21:21:17 -0700
Andy Lutomirski wrote:

> On Sat, Jul 28, 2018 at 2:53 PM, Rik van Riel wrote:
> > Conditionally skip lazy TLB mm refcounting. When an architecture has
> > CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING enabled, an mm that is used in
> > lazy TLB mode anywhere will get shot down from exit_mmap, and there
> > is no need to incur the cache line bouncing overhead of refcounting
> > a lazy TLB mm.
>
> Unless I've misunderstood something, this patch results in idle tasks
> whose active_mm has been freed still having active_mm pointing at
> freed memory.

---8<---

Author: Rik van Riel
Subject: [PATCH 11/11] mm,sched: conditionally skip lazy TLB mm refcounting

Conditionally skip lazy TLB mm refcounting. When an architecture has
CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING enabled, an mm that is used in
lazy TLB mode anywhere will get shot down from exit_mmap, and there
is no need to incur the cache line bouncing overhead of refcounting
a lazy TLB mm.

Implement this by moving the refcounting of a lazy TLB mm to helper
functions, which skip the refcounting when it is not necessary.

Deal with use_mm and unuse_mm by fully splitting out the refcounting
of the lazy TLB mm a kernel thread may have when entering use_mm from
the refcounting of the mm that use_mm is about to start using.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c        |  5 +++--
 fs/exec.c                |  2 +-
 include/linux/sched/mm.h | 25 +++++++++++++++++++++++++
 kernel/sched/core.c      |  6 +++---
 mm/mmu_context.c         | 21 ++++++++++++++-------
 5 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 425cb9fa2640..d53d9c19b97d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include <linux/sched/mm.h>
 
 #include
 #include
@@ -141,7 +142,7 @@ void leave_mm(void *dummy)
 	switch_mm(NULL, &init_mm, NULL);
 	current->active_mm = &init_mm;
-	mmdrop(loaded_mm);
+	drop_lazy_mm(loaded_mm);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
@@ -486,7 +487,7 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		 */
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
 		current->active_mm = &init_mm;
-		mmdrop(loaded_mm);
+		drop_lazy_mm(loaded_mm);
 		return;
 	}
diff --git a/fs/exec.c b/fs/exec.c
index bdd0eacefdf5..7a6d4811b02b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1043,7 +1043,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmput(old_mm);
 		return 0;
 	}
-	mmdrop(active_mm);
+	drop_lazy_mm(active_mm);
 	return 0;
 }
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 44d356f5e47c..7308bf38012f 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -49,6 +49,31 @@ static inline void mmdrop(struct mm_struct *mm)
 		__mmdrop(mm);
 }
 
+/*
+ * In lazy TLB mode, a CPU keeps the mm of the last process mapped while
+ * running a kernel thread or idle; we must make sure the lazy TLB mm and
+ * page tables do not disappear while a lazy TLB mode CPU uses them.
+ * There are two ways to handle the race between lazy TLB CPUs and exit_mmap:
+ * 1) Have a lazy TLB CPU hold a refcount on the lazy TLB mm.
+ * 2) Have the architecture code shoot down the lazy TLB mm from exit_mmap;
+ *    in that case, refcounting can be skipped, reducing cache line bouncing.
+ */
+static inline void grab_lazy_mm(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING))
+		return;
+
+	mmgrab(mm);
+}
+
+static inline void drop_lazy_mm(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING))
+		return;
+
+	mmdrop(mm);
+}
+
 /**
  * mmget() - Pin the address space associated with a &struct mm_struct.
  * @mm: The address space to pin.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c45de46fdf10..11724c9e88b0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2691,7 +2691,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 */
 	if (mm) {
 		membarrier_mm_sync_core_before_usermode(mm);
-		mmdrop(mm);
+		drop_lazy_mm(mm);
 	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
@@ -2805,7 +2805,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	if (!mm) {
 		next->active_mm = oldmm;
-		mmgrab(oldmm);
+		grab_lazy_mm(oldmm);
 		enter_lazy_tlb(oldmm, next);
 	} else
 		switch_mm_irqs_off(oldmm, mm, next);
@@ -5532,7 +5532,7 @@ void idle_task_exit(void)
 		current->active_mm = &init_mm;
 		finish_arch_post_lock_switch();
 	}
-	mmdrop(mm);
+	drop_lazy_mm(mm);
 }
 
 /*
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index 3e612ae748e9..d5c2524cdd9a 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -24,12 +24,15 @@ void use_mm(struct mm_struct *mm)
 	struct mm_struct *active_mm;
 	struct task_struct *tsk = current;
 
+	/* Kernel threads have a NULL tsk->mm when entering. */
+	WARN_ON(tsk->mm);
+
 	task_lock(tsk);
+	/* Previous ->active_mm was held in lazy TLB mode. */
 	active_mm = tsk->active_mm;
-	if (active_mm != mm) {
-		mmgrab(mm);
-		tsk->active_mm = mm;
-	}
+	/* Grab mm for reals; tsk->mm needs to stick around until unuse_mm. */
+	mmgrab(mm);
+	tsk->active_mm = mm;
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
 	task_unlock(tsk);
@@ -37,8 +40,9 @@ void use_mm(struct mm_struct *mm)
 	finish_arch_post_lock_switch();
 #endif
 
-	if (active_mm != mm)
-		mmdrop(active_mm);
+	/* Drop the lazy TLB mode mm. */
+	if (active_mm)
+		drop_lazy_mm(active_mm);
 }
 EXPORT_SYMBOL_GPL(use_mm);
@@ -57,8 +61,11 @@ void unuse_mm(struct mm_struct *mm)
 	task_lock(tsk);
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
-	/* active_mm is still 'mm' */
+	/* active_mm is still 'mm'; grab it as a lazy TLB mm */
+	grab_lazy_mm(mm);
 	enter_lazy_tlb(mm, tsk);
+	/* drop the tsk->mm refcount */
+	mmdrop(mm);
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(unuse_mm);
-- 
2.14.4
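
A note on the arch-side contract this patch relies on: the diff above only
adds the helpers that skip the refcount. The shootdown that
CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING depends on has to come from the
architecture's exit_mmap path, presumably wired up in the earlier patches
of this series, which are not shown here. The sketch below is only an
illustration of that contract, not code from the series:
shoot_down_lazy_tlb_mm is a hypothetical name, while leave_mm(),
mm_cpumask() and on_each_cpu_mask() are existing kernel primitives it
could plausibly be built from.

#include <linux/mm_types.h>	/* struct mm_struct, mm_cpumask() */
#include <linux/smp.h>		/* on_each_cpu_mask() */

/*
 * Hypothetical sketch, not part of this patch: before exit_mmap() frees
 * the page tables of an mm, every CPU that may still hold it as a lazy
 * TLB mm must switch over to init_mm, because with
 * CONFIG_ARCH_NO_ACTIVE_MM_REFCOUNTING no refcount pins the mm on
 * behalf of lazy TLB CPUs any more.
 */
static void shoot_down_lazy_tlb_mm(struct mm_struct *mm)
{
	/*
	 * mm_cpumask(mm) tracks the CPUs that may still have this mm
	 * loaded; leave_mm() (see the x86 hunk above) moves the calling
	 * CPU over to init_mm.  Wait for every CPU to finish.
	 */
	on_each_cpu_mask(mm_cpumask(mm), leave_mm, NULL, 1);
}

Closing the window in which a CPU could go lazy with this mm again after
the shootdown is exactly what the real arch code has to get right; Andy's
reply above points at one symptom of getting it wrong, an idle task whose
active_mm still points at freed memory.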