From: Andy Lutomirski
Date: Fri, 8 Sep 2017 14:47:00 -0700
Subject: Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf
To: Markus Trippelsdorf
Cc: Andy Lutomirski, Ingo Molnar, Borislav Petkov, Thomas Gleixner, Peter Zijlstra, LKML, Ingo Molnar, Tom Lendacky
In-Reply-To: <20170908171633.GA279@x4>
References: <20170907062845.GA280@x4> <20170908053534.GA276@x4>
 <20170908080536.ninspvplibd37fj2@pd.tnic>
 <20170908091614.nmdxjnukxowlsjja@pd.tnic> <20170908094815.GA278@x4>
 <20170908103513.npjmb2kcjt2zljb2@gmail.com> <20170908103906.GB278@x4>
 <20170908113039.GA285@x4> <20170908171633.GA279@x4>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 8, 2017 at 10:16 AM, Markus Trippelsdorf wrote:
> On 2017.09.08 at 09:12 -0700, Andy Lutomirski wrote:
>> On Fri, Sep 8, 2017 at 4:30 AM, Markus Trippelsdorf wrote:
>> > On 2017.09.08 at 12:39 +0200, Markus Trippelsdorf wrote:
>> >> On 2017.09.08 at 12:35 +0200, Ingo Molnar wrote:
>> >> >
>> >> > * Markus Trippelsdorf wrote:
>> >> >
>> >> > > On 2017.09.08 at 11:16 +0200, Borislav Petkov wrote:
>> >> > > > On Fri, Sep 08, 2017 at 10:05:36AM +0200, Borislav Petkov wrote:
>> >> > > > > On Fri, Sep 08, 2017 at 08:26:44AM +0200, Thomas Gleixner wrote:
>> >> > > > > > On Fri, 8 Sep 2017, Markus Trippelsdorf wrote:
>> >> > > > > >
>> >> > > > > > CC+ Borislav. He might have access to such a beast
>> >> > > > >
>> >> > > > > Can I have /proc/cpuinfo and dmesg pls, in order to see whether I have
>> >> > > > > something similar?
>> >> > > > >
>> >> > > > > Private mail's fine too.
>> >> > > >
>> >> > > > So I don't have exactly your model - mine is model 2, stepping 3 but I see
>> >> > > > something strange too, in dmesg:
>> >> > >
>> >> > > I'm pretty sure the bug is in the merged 'x86-mm-for-linus' branch:
>> >> > > Either Andy's "PCID optimized TLB flushing" (would be my guess) or
>> >> > > 'encrypted memory' support by Tom Lendacky.
>> >> > >
>> >> > > (Bisecting is hard, because sometimes I can compile stuff for over 15
>> >> > > minutes without hitting the bug. At other times the machine locks up
>> >> > > hard when starting X11 already.)
>> >> >
>> >> > Do you have the 72c0098d92ce fix?
>> >>
>> >> Yes. The bug still happens on the current git tree (which has the fix
>> >> already):
>> >
>> > The bug is definitely caused by Andy Lutomirski's "PCID optimized TLB
>> > flushing" patches. Tom is off the hook.
>>
>> I'm pretty sure it can't be PCID per se, since these CPUs are way too
>> old and are very unlikely to have PCID.
>
> Yes, the CPU doesn't support PCID (but it does support PGE).
>
>> It could plausibly be the lazy TLB flushing changes.
>
> Yes, I've narrowed it down to:
>
> commit 94b1b03b519b81c494900cb112aa00ed205cc2d9
> Author: Andy Lutomirski
> Date:   Thu Jun 29 08:53:17 2017 -0700
>
>     x86/mm: Rework lazy TLB mode and TLB freshness tracking
>
>
> Theoretically you guys should be able to reproduce the issue by using
> the "nopcid" boot option.
>

Any chance you could test with CONFIG_DEBUG_VM=y? There are lots of
potentially useful assertions in that code.

Can you also post your /proc/cpuinfo? And can you re-confirm that a
problematic guest kernel is causing problems in the *host*?