From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Andy Lutomirski, Thomas Gleixner, Konstantin Khlebnikov, Dave Hansen, Borislav Petkov
Subject: [PATCH 4.14 63/71] x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems
Date: Mon, 29 Jan 2018 13:57:31 +0100
Message-Id: <20180129123831.837370025@linuxfoundation.org>
In-Reply-To: <20180129123827.271171825@linuxfoundation.org>
References: <20180129123827.271171825@linuxfoundation.org>
User-Agent: quilt/0.65
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski

commit 5beda7d54eafece4c974cfa9fbb9f60fb18fd20a upstream.
Neil Berrington reported a double-fault on a VM with 768GB of RAM that
uses large amounts of vmalloc space with PTI enabled.

The cause is that load_new_mm_cr3() was never fixed to take the 5-level
pgd folding code into account, so, on a 4-level kernel, the pgd
synchronization logic compiles away to exactly nothing.

Interestingly, the problem doesn't trigger with nopti.  I assume this
is because the kernel is mapped with global pages if we boot with
nopti.  The sequence of operations when we create a new task is that
we first load its mm while still running on the old stack (which
crashes if the old stack is unmapped in the new mm unless the TLB
saves us), then we call prepare_switch_to(), and then we switch to the
new stack.  prepare_switch_to() pokes the new stack directly, which
will populate the mapping through vmalloc_fault().  I assume that
we're getting lucky on non-PTI systems -- the old stack's TLB entry
stays alive long enough to make it all the way through
prepare_switch_to() and switch_to() so that we make it to a valid
stack.

Fixes: b50858ce3e2a ("x86/mm/vmalloc: Add 5-level paging support")
Reported-and-tested-by: Neil Berrington
Signed-off-by: Andy Lutomirski
Signed-off-by: Thomas Gleixner
Cc: Konstantin Khlebnikov
Cc: Dave Hansen
Cc: Borislav Petkov
Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman

---
 arch/x86/mm/tlb.c |   34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -151,6 +151,34 @@ void switch_mm(struct mm_struct *prev, s
 	local_irq_restore(flags);
 }
 
+static void sync_current_stack_to_mm(struct mm_struct *mm)
+{
+	unsigned long sp = current_stack_pointer;
+	pgd_t *pgd = pgd_offset(mm, sp);
+
+	if (CONFIG_PGTABLE_LEVELS > 4) {
+		if (unlikely(pgd_none(*pgd))) {
+			pgd_t *pgd_ref = pgd_offset_k(sp);
+
+			set_pgd(pgd, *pgd_ref);
+		}
+	} else {
+		/*
+		 * "pgd" is faked.  The top level entries are "p4d"s, so sync
+		 * the p4d.  This compiles to approximately the same code as
+		 * the 5-level case.
+		 */
+		p4d_t *p4d = p4d_offset(pgd, sp);
+
+		if (unlikely(p4d_none(*p4d))) {
+			pgd_t *pgd_ref = pgd_offset_k(sp);
+			p4d_t *p4d_ref = p4d_offset(pgd_ref, sp);
+
+			set_p4d(p4d, *p4d_ref);
+		}
+	}
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -226,11 +254,7 @@ void switch_mm_irqs_off(struct mm_struct
 		 * mapped in the new pgd, we'll double-fault.  Forcibly
 		 * map it.
 		 */
-		unsigned int index = pgd_index(current_stack_pointer);
-		pgd_t *pgd = next->pgd + index;
-
-		if (unlikely(pgd_none(*pgd)))
-			set_pgd(pgd, init_mm.pgd[index]);
+		sync_current_stack_to_mm(next);
 	}
 
 	/* Stop remote flushes for the previous mm */