Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757641AbdIIRFy (ORCPT ); Sat, 9 Sep 2017 13:05:54 -0400
Received: from mail.skyhub.de ([5.9.137.197]:34096 "EHLO mail.skyhub.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753884AbdIIRFx (ORCPT ); Sat, 9 Sep 2017 13:05:53 -0400
Date: Sat, 9 Sep 2017 19:05:37 +0200
From: Borislav Petkov
To: Markus Trippelsdorf
Cc: Andy Lutomirski , Ingo Molnar , Thomas Gleixner , Peter Zijlstra ,
        LKML , Ingo Molnar , Tom Lendacky
Subject: Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf
Message-ID: <20170909170537.6xmxtzwripplhhwi@pd.tnic>
References: <20170909101810.a757cja7vslofyrj@pd.tnic>
        <20170909110749.GA277@x4>
        <20170909130727.3jjnc6p5g45dihmy@pd.tnic>
        <20170909133745.GA289@x4>
        <20170909133954.GB289@x4>
        <20170909140700.bp7jonmp7etlb7ov@pd.tnic>
        <20170909142014.GC289@x4>
        <20170909143335.ja2iwjsbeyfxz4ez@pd.tnic>
        <20170909144350.GA290@x4>
        <20170909163225.GA290@x4>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20170909163225.GA290@x4>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2310
Lines: 71

On Sat, Sep 09, 2017 at 06:32:25PM +0200, Markus Trippelsdorf wrote:
> Also tried the following patch. It does not help.

Ok, another theory. This one still needs to be fixed properly but that
is for later.

For some reason (insufficient coffee maybe), I mistyped your MCi_STATUS
value earlier. Your mail says it is "fa000010000b0c0f". Do you still
have a screen photo to verify it?

Because if so, the correct error type is:

MC4_STATUS[Val|Over|UC|EN|MiscV|PCC|EEC: Protocol error (link, L3, probe filter) (0x0b)|ET: BUS(pp:OBS;t:NOTIMOUT;r4:GEN;ii:GEN;ll:LG)]: 0xfa000010000b0c0f

And for that I'd need the MC4_ADDR value too.
So can you please apply the patch below on top of the syncflood quirk
patch and retrigger, take a photo of the MCE and send it to me?

Thanks.

---
commit e84e5ad290c7c26af69a721148f404766529509b
Author: Borislav Petkov
Date:   Sat Sep 9 00:55:50 2017 +0200

    x86/MCE/AMD: Collect error info even if valid bits are not set

    The MCA banks log error info into MCA_ADDR, MCA_MISC0, and MCA_SYND even
    if the corresponding valid bits are not set:

    "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
    MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
    MCA_STATUS[SyndV] are zero."

    Do so by setting those bits so that code down the MCE processing path
    doesn't need to be changed.

    Signed-off-by: Borislav Petkov

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3b413065c613..c63c7ef326c7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -436,6 +436,20 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrl(mca_cfg.rip_msr);
 	}
+
+	/*
+	 * Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
+	 * MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
+	 * MCA_STATUS[SyndV] are zero.
+	 */
+	if (m->cpuvendor == X86_VENDOR_AMD) {
+		u64 status = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;
+
+		if (mce_flags.smca)
+			status |= MCI_STATUS_SYNDV;
+
+		m->status |= status;
+	}
 }
 
 int mce_available(struct cpuinfo_x86 *c)
-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
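
For anyone who wants to double-check the decoding of 0xfa000010000b0c0f
and see why the patch matters for that value, here is a rough user-space
sketch. It is not kernel code. The MCI_STATUS_* bit positions are
redefined locally and are meant to mirror the kernel's definitions. The
helper names (status_flags, extended_error_code, force_valid_bits) are
made up for illustration, and the 5-bit extended error code field at
bits 20:16 is an assumption about this bank's layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the status bits; assumed to match the kernel's MCI_STATUS_*. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* valid error */
#define MCI_STATUS_OVER  (1ULL << 62)	/* previous errors lost */
#define MCI_STATUS_UC    (1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)	/* error enabled */
#define MCI_STATUS_MISCV (1ULL << 59)	/* misc error reg. valid */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* addr reg. valid */
#define MCI_STATUS_PCC   (1ULL << 57)	/* processor context corrupt */
#define MCI_STATUS_SYNDV (1ULL << 53)	/* synd reg. valid */

struct flag_name {
	uint64_t bit;
	const char *name;
};

static const struct flag_name flag_names[] = {
	{ MCI_STATUS_VAL,   "Val"   },
	{ MCI_STATUS_OVER,  "Over"  },
	{ MCI_STATUS_UC,    "UC"    },
	{ MCI_STATUS_EN,    "EN"    },
	{ MCI_STATUS_MISCV, "MiscV" },
	{ MCI_STATUS_ADDRV, "AddrV" },
	{ MCI_STATUS_PCC,   "PCC"   },
};

/* Hypothetical helper: write the set flags into buf as "A|B|C". */
static void status_flags(uint64_t status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if (!(status & flag_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, flag_names[i].name, len - strlen(buf) - 1);
	}
}

/* Hypothetical helper: extended error code, assuming a 5-bit field at bits 20:16. */
static unsigned int extended_error_code(uint64_t status)
{
	return (unsigned int)((status >> 16) & 0x1f);
}

/*
 * Hypothetical helper mirroring what the patch does on AMD: set the
 * valid bits unconditionally so later code reads ADDR/MISC (and SYND
 * when smca is true).
 */
static uint64_t force_valid_bits(uint64_t status, bool smca)
{
	uint64_t extra = MCI_STATUS_ADDRV | MCI_STATUS_MISCV;

	if (smca)
		extra |= MCI_STATUS_SYNDV;

	return status | extra;
}
```

Feeding 0xfa000010000b0c0f to status_flags() gives the same flag list as
in the decoded line above, with AddrV absent. That is the reason the
MC4_ADDR value would not be collected without the patch.
force_valid_bits() with smca false only adds bit 58 (AddrV) for this
value, since MiscV is already set.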