Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751406AbdI3PVI (ORCPT ); Sat, 30 Sep 2017 11:21:08 -0400
Received: from ud10.udmedia.de ([194.117.254.50]:59780 "EHLO mail.ud10.udmedia.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751119AbdI3PVH (ORCPT ); Sat, 30 Sep 2017 11:21:07 -0400
Date: Sat, 30 Sep 2017 17:21:04 +0200
From: Markus Trippelsdorf
To: Brian Gerst
Cc: Borislav Petkov, Adam Borowski, Linux Kernel Mailing List, Andy Lutomirski, x86-ml
Subject: Re: random insta-reboots on AMD Phenom II
Message-ID: <20170930152104.GC238@x4>
References: <20170930020516.sybqsf5yn2gzuph3@angband.pl> <20170930111137.sxmygy3577iu2hj4@pd.tnic> <20170930112903.kompuesgn6jjuwil@angband.pl> <20170930115302.ezvch7pdx7ws2k5h@pd.tnic> <20170930124711.GB238@x4>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1796
Lines: 48

On 2017.09.30 at 10:20 -0400, Brian Gerst wrote:
> On Sat, Sep 30, 2017 at 8:47 AM, Markus Trippelsdorf wrote:
> > On 2017.09.30 at 13:53 +0200, Borislav Petkov wrote:
> >> On Sat, Sep 30, 2017 at 01:29:03PM +0200, Adam Borowski wrote:
> >> > On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> >> > > On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> >> > > > Any hints how to debug this?
> >> > >
> >> > > Do
> >> > > rdmsr -a 0xc0010015
> >> > > as root and paste it here.
> >> >
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> >
> >> > on both 4.13.4 and 4.14-rc2+.
> >>
> >> Boot into -rc2+ and do as root:
> >>
> >> # wrmsr -a 0xc0010015 0x1000018
> >>
> >> If the issue gets fixed then Mr. Luto better revert the new lazy TLB
> >> flushing fun'n'games for 4.14 before it is too late and that kernel
> >> releases b0rked.
> >
> > The issue does get fixed by setting TlbCacheDis to 1. I have been
> > running it for the last few weeks without any problems.
> > Performance is not affected at all. So it might be easier to just set
> > the bit for older AMD processors as a boot quirk.
> > Changing the TLB code so late might not be a good idea...
>
> Looking at the AMD K10 revision guide
> (http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf), errata #298
> that this fixes should only apply to revisions DR-BA and DR-B2, which
> include the original Phenom, but not Phenom II. The Phenom II X6 is
> revision PH-E0, which does not have this errata.

It has nothing to do with errata #298.
The new lazy TLB code causes MCEs, because the page tables may now
contain garbage.
See the long "Current mainline git (24e700e291d52bd2) hangs when
building e.g. perf" LKML thread.

-- 
Markus
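
For reference, the two MSR values quoted above (1000010 as read by rdmsr, 0x1000018 as written by wrmsr) differ only in bit 3, which appears to be the TlbCacheDis bit mentioned in the thread. The following is a minimal sketch of that bit arithmetic only. The bit position is an assumption inferred from those two values, the helper names are made up for illustration, and nothing here touches a real MSR.

```python
# Assumption: TlbCacheDis is bit 3 of MSR 0xc0010015, inferred from the
# change 0x1000010 -> 0x1000018 quoted in the thread.
TLB_CACHE_DIS_BIT = 3


def set_tlb_cache_dis(value: int) -> int:
    """Return the register value with the assumed TlbCacheDis bit set."""
    return value | (1 << TLB_CACHE_DIS_BIT)


def tlb_cache_dis_is_set(value: int) -> bool:
    """Report whether the assumed TlbCacheDis bit is set in the value."""
    return bool(value & (1 << TLB_CACHE_DIS_BIT))


if __name__ == "__main__":
    before = 0x1000010  # value reported by rdmsr on every core
    after = set_tlb_cache_dis(before)
    print(f"{before:#x} -> {after:#x}")
```

Since rdmsr printed the same value for every core, the same computed value would apply to all of them, which matches the single wrmsr -a invocation suggested in the quoted text.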