From: Andy Lutomirski
Date: Wed, 6 Sep 2017 15:26:19 -0700
Subject: Re: [GIT PULL] x86/mm changes for v4.14: PCID support, 5-level paging support, Secure Memory Encryption support
To: Jiri Kosina
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, "H. Peter Anvin", Peter Zijlstra, Andrew Morton, Andy Lutomirski, Borislav Petkov
References: <20170904093158.k6pg3ytcbotjlhv5@gmail.com> <20170905214046.ishenhbj7jrtoufc@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 6, 2017 at 2:16 PM, Jiri Kosina wrote:
> On Wed, 6 Sep 2017, Jiri Kosina wrote:
>
>> This is a "me too", observed on my Lenovo thinkpad x270 (so it's not
>> specific to that XPS 13 system at all).
>>
>> The symptom I observe is that an attempt to resume from hibernation
>> proceeds up to reading 100% of the hibernation image, and then reboot
>> happens (IOW looks like triple fault).
>>
>> nopcid cures it, I haven't tried to revert 10af6235e0d3 yet, but looks
>> like it's the same thing.
>
> [ reposting the information again with LKML re-introduced to CC ]
>
> As suggested by Andy off-list, I tested with this change to always force
> ASID 0
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 5ca71d1..c3b0811 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -35,7 +35,7 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
>  {
>  	u16 asid;
>
> -	if (!static_cpu_has(X86_FEATURE_PCID)) {
> +	if (true || !static_cpu_has(X86_FEATURE_PCID)) {
>  		*new_asid = 0;
>  		*need_flush = true;
>  		return;
>
> and that fixes the issue on my system.

I got Linus' config to boot.  The problem was that I ended up with a
root-owned file (not sure which) in my tree that caused an incorrect
build but didn't generate errors.  I don't know how this happened, but
an ill-timed sudo make -j4 modules_install install was probably
involved.  git clean -ffxxxd did *not* fix it or even notice it in any
obvious way.

Anyway, the problem appears to depend on kernel config because it's
dying here on resume on secondary cpus:

        VM_BUG_ON(__read_cr3() != (__sme_pa(real_prev->pgd) | prev_asid));

in switch_mm_irqs_off().

What seems to be going on is that the wakeup CPU is exactly restoring
its original state.  All other CPUs are restoring swapper_pg_dir but
are failing to restore the PCID tag bits, which trips the assertion
with probability 5/6 per non-boot CPU.  So, if you have that debug
option set, you die with probability 1 - (1/6)^(cpus - 1), which is
pretty large.

I'll come up with a clean fix this evening, I hope.
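
A minimal userspace sketch of the arithmetic above, not kernel code. It assumes six dynamic ASIDs per CPU (inferred from the 5/6 figure), that each non-boot CPU's tracked ASID is independent and uniformly distributed, and that resume leaves the PCID bits of CR3 at 0. The names cr3_check_passes, resume_failure_probability and NR_DYN_ASIDS are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 6 dynamic ASIDs per CPU, inferred from the 5/6 figure. */
#define NR_DYN_ASIDS 6

/*
 * Hypothetical stand-in for the quoted VM_BUG_ON condition: the check
 * passes only if CR3 equals the page-table physical address OR-ed with
 * the ASID the kernel believes is current.
 */
static bool cr3_check_passes(uint64_t cr3, uint64_t pgd_pa, uint16_t prev_asid)
{
	return cr3 == (pgd_pa | prev_asid);
}

/*
 * Probability that at least one of the (cpus - 1) non-boot CPUs trips
 * the check. Under the stated assumptions each one passes only when its
 * tracked ASID happens to be 0, i.e. with probability 1/NR_DYN_ASIDS.
 */
static double resume_failure_probability(unsigned int cpus)
{
	double all_pass = 1.0;

	for (unsigned int i = 1; i < cpus; i++)
		all_pass /= NR_DYN_ASIDS;

	return 1.0 - all_pass;
}
```

Under these assumptions a 4-CPU machine gives 1 - (1/6)^3 = 1 - 1/216, roughly 0.995, which is why the failure looks deterministic on systems with the debug option enabled.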