From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
    linux-kernel@vger.kernel.org, Alexander.Levin@microsoft.com,
    dan.j.williams@intel.com, sathyanarayanan.kuppuswamy@intel.com,
    pankaj.laxminarayan.bharadiya@intel.com, akuster@mvista.com,
    cminyard@mvista.com, pasha.tatashin@oracle.com,
    gregkh@linuxfoundation.org, stable@vger.kernel.org
Subject: [PATCH 4.1 08/65] x86/mm, sched/core: Uninline switch_mm()
Date: Mon, 5 Mar 2018 19:24:41 -0500
Message-Id: <20180306002538.1761-9-pasha.tatashin@oracle.com>
X-Mailer: git-send-email 2.16.2
In-Reply-To: <20180306002538.1761-1-pasha.tatashin@oracle.com>
References: <20180306002538.1761-1-pasha.tatashin@oracle.com>
commit 69c0319aabba45bcf33178916a2f06967b4adede upstream.

It's fairly large and it has quite a few callers. This may also help
untangle some headers down the road.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 70a39c7fd167399fde76aeac314dce026a255b49)
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Conflicts:
	arch/x86/include/asm/mmu_context.h
---
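A note on the pattern (illustration only; git-am ignores everything
between the '---' marker above and the diffstat below): a static inline
function in a header is compiled into every translation unit that calls
it, so a large one with many callers duplicates its body across the
kernel. Uninlining leaves an extern declaration in the header and moves
the single definition into a .c file, which is also what lets the header
eventually drop the #includes that only the body needed. The names in
this sketch (mm_example.h, example_switch_mm) are hypothetical:

	/* mm_example.h -- the header keeps only the declaration */
	struct mm;	/* opaque here: callers no longer need the body's headers */
	extern void example_switch_mm(struct mm *prev, struct mm *next);

	/* mm_example.c -- one out-of-line definition replaces the old
	 * "static inline" body that every caller used to get a copy of.
	 */
	void example_switch_mm(struct mm *prev, struct mm *next)
	{
		/* the ~95-line body lives here exactly once */
		(void)prev;
		(void)next;
	}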
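A second note, on the ordering comment carried along in the moved body:
it can be read as a two-CPU litmus test. The stand-alone C sketch below
is an illustration under stated assumptions, not kernel code; pte,
mask_bit, cpu0_flush() and cpu1_switch() are hypothetical stand-ins, and
full_barrier() models the serializing effect that load_cr3() provides on
real hardware:

	#include <stdio.h>

	static volatile int pte;      /* stand-in for a PTE of 'next' */
	static volatile int mask_bit; /* stand-in for this CPU's mm_cpumask bit */

	/* Full memory barrier. In switch_mm() the serializing load_cr3()
	 * plays this role, since neither LOCK nor MFENCE orders TLB fills.
	 */
	static void full_barrier(void) { __sync_synchronize(); }

	/* CPU 0: publish a new PTE, then decide whether to send a flush IPI */
	static int cpu0_flush(void)
	{
		pte = 1;
		full_barrier();  /* the load below must not move above the store */
		return mask_bit; /* 0 means "no IPI sent" */
	}

	/* CPU 1: set the mm_cpumask bit, then load through the page tables */
	static int cpu1_switch(void)
	{
		mask_bit = 1;   /* cpumask_set_cpu(cpu, mm_cpumask(next)) */
		full_barrier(); /* load_cr3(): no TLB fill moves above the store */
		return pte;     /* 0 means "stale translation observed" */
	}

	int main(void)
	{
		/* The forbidden outcome is cpu0_flush() == 0 && cpu1_switch() == 0:
		 * no IPI is sent, yet CPU 1 caches a stale translation. With both
		 * barriers in place, at least one routine must observe the other's
		 * store when they race on two CPUs.
		 */
		printf("cpu0 saw mask_bit=%d, cpu1 saw pte=%d\n",
		       cpu0_flush(), cpu1_switch());
		return 0;
	}

Run sequentially as above it trivially prints 1 for whichever routine
runs second; the value is in naming the two stores and two loads whose
order the comment constrains.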
 arch/x86/include/asm/mmu_context.h |  97 +----------------------------------
 arch/x86/mm/tlb.c                  | 100 +++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 95 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 73e38f14ddeb..375fafccb32c 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -92,101 +92,8 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 #endif
 }
 
-static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
-			     struct task_struct *tsk)
-{
-	unsigned cpu = smp_processor_id();
-
-	if (likely(prev != next)) {
-#ifdef CONFIG_SMP
-		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
-		this_cpu_write(cpu_tlbstate.active_mm, next);
-#endif
-		cpumask_set_cpu(cpu, mm_cpumask(next));
-
-		/*
-		 * Re-load page tables.
-		 *
-		 * This logic has an ordering constraint:
-		 *
-		 *  CPU 0: Write to a PTE for 'next'
-		 *  CPU 0: load bit 1 in mm_cpumask. if nonzero, send IPI.
-		 *  CPU 1: set bit 1 in next's mm_cpumask
-		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
-		 *
-		 * We need to prevent an outcome in which CPU 1 observes
-		 * the new PTE value and CPU 0 observes bit 1 clear in
-		 * mm_cpumask. (If that occurs, then the IPI will never
-		 * be sent, and CPU 0's TLB will contain a stale entry.)
-		 *
-		 * The bad outcome can occur if either CPU's load is
-		 * reordered before that CPU's store, so both CPUs must
-		 * execute full barriers to prevent this from happening.
-		 *
-		 * Thus, switch_mm needs a full barrier between the
-		 * store to mm_cpumask and any operation that could load
-		 * from next->pgd. TLB fills are special and can happen
-		 * due to instruction fetches or for no reason at all,
-		 * and neither LOCK nor MFENCE orders them.
-		 * Fortunately, load_cr3() is serializing and gives the
-		 * ordering guarantee we need.
-		 *
-		 */
-		load_cr3(next->pgd);
-
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
-
-		/* Stop flush ipis for the previous mm */
-		cpumask_clear_cpu(cpu, mm_cpumask(prev));
-
-		/* Load per-mm CR4 state */
-		load_mm_cr4(next);
-
-		/*
-		 * Load the LDT, if the LDT is different.
-		 *
-		 * It's possible that prev->context.ldt doesn't match
-		 * the LDT register. This can happen if leave_mm(prev)
-		 * was called and then modify_ldt changed
-		 * prev->context.ldt but suppressed an IPI to this CPU.
-		 * In this case, prev->context.ldt != NULL, because we
-		 * never set context.ldt to NULL while the mm still
-		 * exists. That means that next->context.ldt !=
-		 * prev->context.ldt, because mms never share an LDT.
-		 */
-		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_mm_ldt(next);
-	}
-#ifdef CONFIG_SMP
-	else {
-		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
-		BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
-
-		if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
-			/*
-			 * On established mms, the mm_cpumask is only changed
-			 * from irq context, from ptep_clear_flush() while in
-			 * lazy tlb mode, and here. Irqs are blocked during
-			 * schedule, protecting us from simultaneous changes.
-			 */
-			cpumask_set_cpu(cpu, mm_cpumask(next));
-
-			/*
-			 * We were in lazy tlb mode and leave_mm disabled
-			 * tlb flush IPI delivery. We must reload CR3
-			 * to make sure to use no freed page tables.
-			 *
-			 * As above, load_cr3() is serializing and orders TLB
-			 * fills with respect to the mm_cpumask write.
-			 */
-			load_cr3(next->pgd);
-			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
-			load_mm_cr4(next);
-			load_mm_ldt(next);
-		}
-	}
-#endif
-}
+extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+		      struct task_struct *tsk);
 
 #define activate_mm(prev, next)			\
 do {						\
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a1aa5f59e3ad..40c640980720 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -59,6 +59,106 @@ void leave_mm(int cpu)
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 
+#endif /* CONFIG_SMP */
+
+void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+	       struct task_struct *tsk)
+{
+	unsigned cpu = smp_processor_id();
+
+	if (likely(prev != next)) {
+#ifdef CONFIG_SMP
+		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		this_cpu_write(cpu_tlbstate.active_mm, next);
+#endif
+		cpumask_set_cpu(cpu, mm_cpumask(next));
+
+		/*
+		 * Re-load page tables.
+		 *
+		 * This logic has an ordering constraint:
+		 *
+		 *  CPU 0: Write to a PTE for 'next'
+		 *  CPU 0: load bit 1 in mm_cpumask. if nonzero, send IPI.
+		 *  CPU 1: set bit 1 in next's mm_cpumask
+		 *  CPU 1: load from the PTE that CPU 0 writes (implicit)
+		 *
+		 * We need to prevent an outcome in which CPU 1 observes
+		 * the new PTE value and CPU 0 observes bit 1 clear in
+		 * mm_cpumask. (If that occurs, then the IPI will never
+		 * be sent, and CPU 0's TLB will contain a stale entry.)
+		 *
+		 * The bad outcome can occur if either CPU's load is
+		 * reordered before that CPU's store, so both CPUs must
+		 * execute full barriers to prevent this from happening.
+		 *
+		 * Thus, switch_mm needs a full barrier between the
+		 * store to mm_cpumask and any operation that could load
+		 * from next->pgd. TLB fills are special and can happen
+		 * due to instruction fetches or for no reason at all,
+		 * and neither LOCK nor MFENCE orders them.
+		 * Fortunately, load_cr3() is serializing and gives the
+		 * ordering guarantee we need.
+		 *
+		 */
+		load_cr3(next->pgd);
+
+		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+
+		/* Stop flush ipis for the previous mm */
+		cpumask_clear_cpu(cpu, mm_cpumask(prev));
+
+		/* Load per-mm CR4 state */
+		load_mm_cr4(next);
+
+		/*
+		 * Load the LDT, if the LDT is different.
+		 *
+		 * It's possible that prev->context.ldt doesn't match
+		 * the LDT register. This can happen if leave_mm(prev)
+		 * was called and then modify_ldt changed
+		 * prev->context.ldt but suppressed an IPI to this CPU.
+		 * In this case, prev->context.ldt != NULL, because we
+		 * never set context.ldt to NULL while the mm still
+		 * exists. That means that next->context.ldt !=
+		 * prev->context.ldt, because mms never share an LDT.
+		 */
+		if (unlikely(prev->context.ldt != next->context.ldt))
+			load_mm_ldt(next);
+	}
+#ifdef CONFIG_SMP
+	else {
+		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
+
+		if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
+			/*
+			 * On established mms, the mm_cpumask is only changed
+			 * from irq context, from ptep_clear_flush() while in
+			 * lazy tlb mode, and here. Irqs are blocked during
+			 * schedule, protecting us from simultaneous changes.
+			 */
+			cpumask_set_cpu(cpu, mm_cpumask(next));
+
+			/*
+			 * We were in lazy tlb mode and leave_mm disabled
+			 * tlb flush IPI delivery. We must reload CR3
+			 * to make sure to use no freed page tables.
+			 *
+			 * As above, load_cr3() is serializing and orders TLB
+			 * fills with respect to the mm_cpumask write.
+			 */
+			load_cr3(next->pgd);
+			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
+			load_mm_cr4(next);
+			load_mm_ldt(next);
+		}
+	}
+#endif
+}
+
+#ifdef CONFIG_SMP
+
 /*
  * The flush IPI assumes that a thread switch happens in this order:
  * [cpu0: the cpu that switches]
-- 
2.16.2