Date: Tue, 17 Jul 2018 02:34:07 -0700
From: tip-bot for Rik van Riel <tipbot@zytor.com>
Cc: dave.hansen@intel.com, linux-kernel@vger.kernel.org, peterz@infradead.org,
    hpa@zytor.com, riel@surriel.com, mingo@kernel.org,
    torvalds@linux-foundation.org, songliubraving@fb.com, tglx@linutronix.de
Reply-To: tglx@linutronix.de, mingo@kernel.org, songliubraving@fb.com,
    torvalds@linux-foundation.org, riel@surriel.com, hpa@zytor.com,
    peterz@infradead.org,
    linux-kernel@vger.kernel.org, dave.hansen@intel.com
In-Reply-To: <20180716190337.26133-3-riel@surriel.com>
References: <20180716190337.26133-3-riel@surriel.com>
To: linux-tip-commits@vger.kernel.org
Subject: [tip:x86/mm] x86/mm/tlb: Leave lazy TLB mode at page table free time
Git-Commit-ID: 2ff6ddf19c0ec40633bd14d8fe28a289816bd98d

Commit-ID:  2ff6ddf19c0ec40633bd14d8fe28a289816bd98d
Gitweb:     https://git.kernel.org/tip/2ff6ddf19c0ec40633bd14d8fe28a289816bd98d
Author:     Rik van Riel <riel@surriel.com>
AuthorDate: Mon, 16 Jul 2018 15:03:32 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 17 Jul 2018 09:35:31 +0200

x86/mm/tlb: Leave lazy TLB mode at page table free time

Andy discovered that speculative memory accesses while in lazy TLB
mode can crash a system when a CPU tries to dereference a speculative
access using memory contents that used to be valid page table memory,
but have since been reused for something else and point into la-la
land.

This problem can be prevented in two ways. The first is to always
send a TLB shootdown IPI to CPUs in lazy TLB mode, while the second
one is to only send the TLB shootdown at page table freeing time.

The second should result in fewer IPIs, since operations like
mprotect and madvise are very common with some workloads, but do not
involve page table freeing. Also, on munmap, batching of page table
freeing covers much larger ranges of virtual memory than the batching
of unmapped user pages.

Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
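To make the mechanism concrete, here is a minimal user-space model of
the shootdown-at-page-table-free-time flow. Every name in it (struct
cpu, struct mm, remove_tables(), NR_CPUS) is an illustrative stand-in,
not a kernel interface: the plain loop stands in for the
smp_call_function_many() IPI broadcast, and the two structs stand in
for mm_struct and cpu_tlbstate.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Toy stand-ins; the kernel's equivalents are mm_struct and cpu_tlbstate. */
struct mm { int id; };

static struct mm init_mm = { 0 };
static struct mm user_mm = { 1 };

struct cpu {
	struct mm *loaded_mm;	/* address space currently loaded */
	bool is_lazy;		/* lazy TLB: kept the old mm without flushing */
};

static struct cpu cpus[NR_CPUS];

/* Models the new IPI handler, tlb_flush_remove_tables_local(). */
static void remove_tables_local(struct cpu *cpu, struct mm *mm)
{
	if (cpu->loaded_mm == mm && cpu->is_lazy) {
		/*
		 * A lazy CPU may still hold paging-structure cache entries
		 * pointing into mm's page tables.  Switching to init_mm
		 * drops them before those tables go back to the page
		 * allocator, so nothing speculatively walks freed memory.
		 */
		cpu->loaded_mm = &init_mm;
		cpu->is_lazy = false;
	}
}

/* Models tlb_flush_remove_tables(): runs before page tables are freed. */
static void remove_tables(struct mm *mm)
{
	for (int i = 0; i < NR_CPUS; i++)	/* stands in for the IPI */
		remove_tables_local(&cpus[i], mm);
}

int main(void)
{
	cpus[2].loaded_mm = &user_mm;
	cpus[2].is_lazy = true;		/* CPU 2 went idle while in user_mm */

	remove_tables(&user_mm);	/* page tables are about to be freed */
	printf("CPU 2 on init_mm: %s\n",
	       cpus[2].loaded_mm == &init_mm ? "yes" : "no");
	return 0;
}

The trade-off named in the changelog is visible even in the toy: a
plain mprotect() or madvise() never reaches remove_tables(), so lazy
CPUs are left alone; only operations that actually free page tables
pay for the shootdown.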
 arch/x86/include/asm/tlbflush.h |  5 +++++
 arch/x86/mm/tlb.c               | 27 +++++++++++++++++++++++++++
 include/asm-generic/tlb.h       | 10 ++++++++++
 mm/memory.c                     | 22 ++++++++++++++--------
 4 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 6690cd3fc8b1..3aa3204b5dc0 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -554,4 +554,9 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 	native_flush_tlb_others(mask, info)
 #endif
 
+extern void tlb_flush_remove_tables(struct mm_struct *mm);
+extern void tlb_flush_remove_tables_local(void *arg);
+
+#define HAVE_TLB_FLUSH_REMOVE_TABLES
+
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6eb1f34c3c85..9a893673c56b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -646,6 +646,33 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	put_cpu();
 }
 
+void tlb_flush_remove_tables_local(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm &&
+			this_cpu_read(cpu_tlbstate.is_lazy)) {
+		/*
+		 * We're in lazy mode.  We need to at least flush our
+		 * paging-structure cache to avoid speculatively reading
+		 * garbage into our TLB.  Since switching to init_mm is barely
+		 * slower than a minimal flush, just switch to init_mm.
+		 */
+		switch_mm_irqs_off(NULL, &init_mm, NULL);
+	}
+}
+
+void tlb_flush_remove_tables(struct mm_struct *mm)
+{
+	int cpu = get_cpu();
+	/*
+	 * XXX: this really only needs to be called for CPUs in lazy TLB mode.
+	 */
+	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
+		smp_call_function_many(mm_cpumask(mm), tlb_flush_remove_tables_local, (void *)mm, 1);
+
+	put_cpu();
+}
+
 
 static void do_flush_tlb_all(void *info)
 {
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3063125197ad..e811ef7b8350 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -303,4 +303,14 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 
 #define tlb_migrate_finish(mm) do {} while (0)
 
+/*
+ * Used to flush the TLB when page tables are removed, when lazy
+ * TLB mode may cause a CPU to retain intermediate translations
+ * pointing to about-to-be-freed page table memory.
+ */
+#ifndef HAVE_TLB_FLUSH_REMOVE_TABLES
+#define tlb_flush_remove_tables(mm) do {} while (0)
+#define tlb_flush_remove_tables_local(mm) do {} while (0)
+#endif
+
 #endif /* _ASM_GENERIC__TLB_H */
diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..18355e0b971a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -326,16 +326,20 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
 
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 
-/*
- * See the comment near struct mmu_table_batch.
- */
-
 static void tlb_remove_table_smp_sync(void *arg)
 {
-	/* Simply deliver the interrupt */
+	struct mm_struct __maybe_unused *mm = arg;
+	/*
+	 * On most architectures this does nothing. Simply delivering the
+	 * interrupt is enough to prevent races with software page table
+	 * walking like that done in get_user_pages_fast.
+	 *
+	 * See the comment near struct mmu_table_batch.
+	 */
+	tlb_flush_remove_tables_local(mm);
 }
 
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_one(void *table, struct mmu_gather *tlb)
 {
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
@@ -344,7 +348,7 @@ static void tlb_remove_table_one(void *table)
 	 * It is however sufficient for software page-table walkers that rely on
 	 * IRQ disabling. See the comment near struct mmu_table_batch.
 	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	smp_call_function(tlb_remove_table_smp_sync, tlb->mm, 1);
 	__tlb_remove_table(table);
 }
 
@@ -365,6 +369,8 @@ void tlb_table_flush(struct mmu_gather *tlb)
 {
 	struct mmu_table_batch **batch = &tlb->batch;
 
+	tlb_flush_remove_tables(tlb->mm);
+
 	if (*batch) {
 		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
@@ -387,7 +393,7 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 	if (*batch == NULL) {
 		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		if (*batch == NULL) {
-			tlb_remove_table_one(table);
+			tlb_remove_table_one(table, tlb);
 			return;
 		}
 		(*batch)->nr = 0;
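
On the mm/memory.c side, tlb_remove_table() keeps its two paths:
batched frees that wait out an RCU grace period via tlb_table_flush()
(which now calls the new tlb_flush_remove_tables() hook first), and
the allocation-failure fallback tlb_remove_table_one(), which now
carries tlb->mm so the synchronous IPI can also take remote CPUs out
of lazy TLB mode. Below is a rough stand-alone model of those two
paths; struct gather, remove_table(), table_flush() and have_batch
are hypothetical names, and ordinary malloc/free replace the kernel's
batching, RCU and IPI machinery.

#include <stdio.h>
#include <stdlib.h>

#define MAX_TABLE_BATCH 8	/* stand-in for the real batch capacity */

/* Toy mmu_gather: collects page-table pages to free in one go. */
struct gather {
	void *batch[MAX_TABLE_BATCH];
	int nr;
	int have_batch;		/* models __get_free_page() succeeding */
};

/*
 * Fallback: no batch memory, so sync with all CPUs and free at once.
 * In the kernel: smp_call_function(tlb_remove_table_smp_sync, mm, 1),
 * which after this patch also drops remote CPUs out of lazy TLB mode.
 */
static void remove_table_one(void *table)
{
	printf("sync IPI, then free %p immediately\n", table);
	free(table);
}

/*
 * Batched path.  In the kernel this first calls the new
 * tlb_flush_remove_tables(mm) hook, then hands the batch to
 * call_rcu_sched() so software walkers finish before the pages go away.
 */
static void table_flush(struct gather *tlb)
{
	for (int i = 0; i < tlb->nr; i++)
		free(tlb->batch[i]);
	tlb->nr = 0;
}

static void remove_table(struct gather *tlb, void *table)
{
	if (!tlb->have_batch) {		/* GFP_NOWAIT allocation failed */
		remove_table_one(table);
		return;
	}
	tlb->batch[tlb->nr++] = table;
	if (tlb->nr == MAX_TABLE_BATCH)	/* batch full: flush early */
		table_flush(tlb);
}

int main(void)
{
	struct gather tlb = { .have_batch = 1 };

	remove_table(&tlb, malloc(64));	/* table freed during munmap */
	table_flush(&tlb);		/* end of munmap: flush the rest */

	tlb.have_batch = 0;		/* simulate allocation failure */
	remove_table(&tlb, malloc(64));
	return 0;
}

The point of the batching is visible even in the toy: one flush
covers many tables, so the new shootdown runs once per munmap rather
than once per freed table.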