Date: Fri, 14 Apr 2023 12:24:01 +0200
From: Borislav Petkov
To: Paul Menzel
Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, x86@kernel.org, LKML,
    Yazen Ghannam
Subject: Re: AMD EPYC 25 (19h): Hardware Error: Machine Check: 0 Bank 17: d42040000000011b
Message-ID: <20230414102401.GAZDkpwUHfFM64dpIK@fat_crate.local>
References: <21a09968-296b-5b21-8079-6d9d4e0769d4@molgen.mpg.de>
    <20230412163240.GAZDbdKHjmQcxqkeDQ@fat_crate.local>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 14, 2023 at 11:26:27AM +0200, Paul Menzel wrote:
> It says “no action required”,

Yes, it means you had a single bit flip in some DIMM and it got corrected
by the ECC, so you don't need to do anything.

> but out of the identical 14 servers with the same workload this is the
> only one having shown this errors three times.

Or you could enable CONFIG_RAS_CEC and not see those errors anymore.

It all depends: a DIMM could be producing correctable errors for a long
time before going bad. If ever. If you don't want to risk whatever you're
running on that machine by a DIMM *potentially* going bad, sure, you can
replace it.

That's a budget call. :)

> Maybe the DIMM at bank 17 should just be replaced.

Bank 17 is the CPU MCA bank which reports the error - not a DIMM bank. In
order to pinpoint the location, you should have amd64_edac loaded so that
it decodes the error to the actual DIMM.
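
To illustrate why the report reads as a corrected error, here is a rough
sketch that unpacks the status word from the subject line by hand. The bit
positions are the ones I know from the kernel's arch/x86/include/asm/mce.h
(the last five are AMD-specific); the helper name decode_status and the
table are made up for this example and are not kernel code. It does not
replace the amd64_edac decoding, which is what maps the error to a DIMM.

```python
# Hypothetical helper, not kernel code. Bit positions are assumed from the
# MCI_STATUS_* definitions in arch/x86/include/asm/mce.h.
STATUS_BITS = {
    "VAL": 63,       # register contents are valid
    "OVER": 62,      # an earlier error was overwritten
    "UC": 61,        # uncorrected error
    "EN": 60,        # error reporting enabled
    "MISCV": 59,     # MISC register valid
    "ADDRV": 58,     # ADDR register valid
    "PCC": 57,       # processor context corrupt
    "SYNDV": 53,     # AMD: syndrome register valid
    "CECC": 46,      # AMD: corrected by ECC
    "UECC": 45,      # AMD: ECC detected but could not correct
    "DEFERRED": 44,  # AMD: deferred error
    "POISON": 43,    # AMD: poisoned data consumed
}


def decode_status(status: int) -> dict:
    """Return each known flag as a bool plus the low 16-bit error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}
    flags["error_code"] = status & 0xFFFF
    return flags


if __name__ == "__main__":
    decoded = decode_status(0xD42040000000011B)  # value from the subject line
    print("set:", " ".join(n for n in STATUS_BITS if decoded[n]))
    print(f"error code: {decoded['error_code']:#06x}")
```

For the value in the subject this reports VAL, OVER, EN, ADDRV, SYNDV and
CECC as set, with UC and PCC clear, which is consistent with "corrected by
the ECC, no action required". Nothing in the status word itself names a
DIMM.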

You could try loading that module and injecting all errors you have to see
what it says. It should work this way too, as the error signature has
everything needed for decoding, AFAICT. But Yazen can chime in here if I'm
off.

HTH.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette