Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754584AbeAMGIn (ORCPT + 1 other); Sat, 13 Jan 2018 01:08:43 -0500 Received: from mail.kernel.org ([198.145.29.99]:59538 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750758AbeAMGIl (ORCPT ); Sat, 13 Jan 2018 01:08:41 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4E24F21772 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org X-Google-Smtp-Source: ACJfBosSzjYc7vBGgup7GpIeOuw+hsqAibbaX7j3sDpqNZdjSYCiUYdmHl0A4ThgOUeVfSGtdmbRtgMTA7xDRA7snC4= MIME-Version: 1.0 In-Reply-To: References: <9eb15489-da09-7a4c-0700-7b6eb99e6f7b@redhat.com> From: Andy Lutomirski Date: Fri, 12 Jan 2018 22:08:20 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Yet another KPTI regression with 4.14.x series in a VM To: Thomas Gleixner , Peter Zijlstra , Borislav Petkov Cc: Laura Abbott , X86 ML , Linux Kernel Mailing List , stable , Andy Lutomirski Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Fri, Jan 12, 2018 at 1:51 PM, Thomas Gleixner wrote: > On Fri, 12 Jan 2018, Laura Abbott wrote: > > Cc+ Andy > > I'm almost crashed out by now. Andy might have an idea. I'll look again > tomorrow with brain awake. > >> On 01/12/2018 10:51 AM, Thomas Gleixner wrote: >> > On Fri, 12 Jan 2018, Laura Abbott wrote: >> > > Fedora got a bug report on 4.14.11 of a panic when booting a >> > > Fedora guest in a CentOS 6 VM, not reproducible with nopti. >> > > The issue is still present as of 4.14.13 as well. The only >> > > report is a panic screenshot >> > > https://bugzilla.redhat.com/show_bug.cgi?id=1532458 >> > > >> > > I've lost track of all the fixes that have been flying around, >> > > is this a new issue or has a fix not yet made it to stable? >> > >> > Hmm. Looks kinda familiar, but that has been fixed I think even before >> > 4.4.11. Could you please ask the reported to provide a full console log via >> > the VM "serial console" ? >> > >> > Thanks, >> > >> > tglx >> > >> >> [ 2.910025] PANIC: double fault, error_code: 0x0 Probably a stack overflow >> [ 2.910025] CPU: 1 PID: 56 Comm: modprobe Not tainted >> 4.14.13-300.fc27.x86_64 #1 >> [ 2.910025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 >> [ 2.910025] task: ffff891f78dc3c00 task.stack: ffffa6eac0594000 >> [ 2.910025] RIP: 0010:vprintk_default+0x5/0x30 >> [ 2.910025] RSP: 0000:fffffe000002e000 EFLAGS: 00010046 >> [ 2.910025] RAX: 0000000000000000 RBX: fffffe000002e118 RCX: >> 0000000000000001 >> [ 2.910025] RDX: 0000000000000000 RSI: fffffe000002e018 RDI: >> ffffffffbe0715a0 >> [ 2.910025] RBP: fffffe000002e008 R08: ffffffffbe0bb565 R09: >> ffffffffbe07159b >> [ 2.910025] R10: fffffe000002e080 R11: 0000000000000000 R12: >> ffffffffbe070fdd >> [ 2.910025] R13: 0000000000000000 R14: 0000000000000000 R15: >> 0000000000000000 >> [ 2.910025] FS: 0000000000000000(0000) GS:ffff891f7fd00000(0000) >> knlGS:0000000000000000 >> [ 2.910025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 2.910025] CR2: fffffe000002dff8 CR3: 0000000078da6000 CR4: >> 00000000000006e0 CR2 and RSP are consistent with stack overflow, but that's not terribly important, since... >> [ 2.910025] Call Trace: >> [ 2.910025] >> [ 2.910025] ? vprintk_func+0x27/0x60 >> [ 2.910025] printk+0x52/0x6e >> [ 2.910025] __die+0x6b/0xe0 >> [ 2.910025] die+0x2f/0x50 >> [ 2.910025] do_general_protection+0x149/0x160 This is the real problem here. >> [ 2.910025] general_protection+0x2c/0x60 >> [ 2.910025] RIP: 0010:swapgs_restore_regs_and_return_to_usermode+0x6f/0x80 >> [ 2.910025] RSP: 0000:fffffe000002e1c8 EFLAGS: 00000006 >> [ 2.910025] RAX: 0000000000000000 RBX: 0000000000000000 RCX: >> 0000000000000000 >> [ 2.910025] RDX: 0000000000000000 RSI: 0000000000000000 RDI: >> 0000000078da7800 >> [ 2.910025] RBP: 0000000000000000 R08: 0000000000000000 R09: >> 0000000000000000 >> [ 2.910025] R10: 0000000000000000 R11: 0000000000000000 R12: >> 0000000000000000 >> [ 2.910025] R13: 0000000000000000 R14: 0000000000000000 R15: >> 0000000000000000 >> [ 2.910025] ffffffff81a00b3e: 65 48 0f b3 3c 25 8e btr %rdi,%gs:0x1e168e ffffffff81a00b45: 16 1e 00 ffffffff81a00b48: 48 89 c7 mov %rax,%rdi ffffffff81a00b4b: eb 08 jmp 0xffffffff81a00b55 ffffffff81a00b4d: 48 89 c7 mov %rax,%rdi ffffffff81a00b50: 48 0f ba ef 3f bts $0x3f,%rdi ffffffff81a00b55: 48 81 cf 00 18 00 00 or $0x1800,%rdi ffffffff81a00b5c: 0f 22 df mov %rdi,%cr3 <-- #GP here ffffffff81a00b5f: 58 pop %rax ffffffff81a00b60: 5f pop %rdi The CPU didn't like writing 0x0000000078da7800 to %cr3. I'm guessing that CR4.PCIDE=0. Now this is quite a strange value to write to CR3. The 0x800 part means that we're using the "user" variant of the address space that would have ASID=0 and the 0x1000 bit being set corresponds to the user pgdir, but this is nonsense, since the kernel never uses PCID 0 for user mode. We always start at 1. The only exception is if X86_FEATURE_PCID is off. But, if X86_FEATURE_PCID is off, then we shouldn't be setting any PCID bits. In fact, it looks like this code is totally bogus and has never been correct at all. Even in: commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30 Author: Peter Zijlstra Date: Mon Dec 4 15:07:59 2017 +0100 x86/mm: Use/Fix PCID to optimize user/kernel switches We have: .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI mov %cr3, \scratch_reg ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID ... .Lwrcr3_\@: /* Flip the PGD and ASID to the user version */ orq $(PTI_SWITCH_MASK), \scratch_reg mov \scratch_reg, %cr3 .Lend_\@: That's bogus. PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800. This should probably use an alternative to select between 0x1000 and 0x800 depending on X86_FEATURE_PCID or just use an entirely different label for the !PCID case. FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3 testq $(PTI_SWITCH_MASK), \scratch_reg jz .Ldone_\@ is a bit silly, too. It's *correct* (I think), but shouldn't that just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT? Was this code *ever* tested with nopcid?