Date: Mon, 31 Jul 2017 14:22:38 +0200
From: Markus Trippelsdorf
To: Alan Cox
Cc: Satoru Takeuchi, LKML, x86@kernel.org
Subject: Re: [FYI] GCC segfaults under heavy multithreaded compilation with AMD Ryzen

On 2017.07.31 at 13:04 +0100, Alan Cox wrote:
> On Wed, 26 Jul 2017 06:54:01 +0900
> Satoru Takeuchi wrote:
>
> > # I'm a LKML subscriber, but not a x86 list subscriber
> >
> > I found the following new linux kernel bugzilla about Ryzen related problem.
> > Since many developers don't check this bugzilla and I've also
> > encountered this problem,
> > I decided to introduce this problem here.
>
> Historically we've seen exactly these symptoms on all kinds of systems
> where the memory is at fault, even in cases where memtest86 passes.
> Whether there's a specific problem on some Ryzen boards is a question for
> AMD, but if I saw this without knowing the CPU I'd suspect memory
> firstly. GCC it turns out is by accident an amazingly effective memory
> testing tool.
>
> If it is memory corruption problems then no - the kernel cannot work
> around that level of hardware failure. The BIOS may be able to if it is a
> board or compatibility problem as memory tuning is usually done by the
> BIOS.

People are seeing these segfaults even with ECC memory (and EDAC enabled).
There are no ECC-related MCEs in their logs. Also, for some the segfaults
went away after they RMAed their CPU. Others are not so lucky and still
see segfaults after the RMA.

To me it looks like a chip binning issue.

-- 
Markus