Date: Thu, 02 Mar 2023 23:25:37 +0000
Message-ID: <865ybizqfi.wl-maz@kernel.org>
From: Marc Zyngier
To: Aristeu Rozanski, Darren Hart
Cc: linux-kernel@vger.kernel.org
Subject: Re: Error reports at boot time in Ampere Altra machines since c733ebb7c
In-Reply-To: <20230302201732.pwnhg46mum6st2bv@redhat.com>
References: <20230302201732.pwnhg46mum6st2bv@redhat.com>

On Thu, 02 Mar 2023 20:17:32 +0000,
Aristeu Rozanski wrote:
> 
> Hi Marc,
> 
> Since c733ebb7cb67d ("irqchip/gic-v3-its: Reset each ITS's BASERn
> register before probe"), Ampere Altra machines are reporting corrected
> errors during boot:
> 
> [    0.294334] HEST: Table parsing has been initialized.
> [    0.294397] sdei: SDEIv1.0 (0x0) detected in firmware.
> [    0.299622] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [    0.299626] {1}[Hardware Error]: event severity: recoverable
> [    0.299629] {1}[Hardware Error]: Error 0, type: recoverable
> [    0.299633] {1}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
> [    0.299638] {1}[Hardware Error]: section length: 0x30
> [    0.299645] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........
> [    0.299648] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
> [    0.299650] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> [    0.299714] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> [    0.299716] {2}[Hardware Error]: event severity: recoverable
> [    0.299717] {2}[Hardware Error]: Error 0, type: recoverable
> [    0.299718] {2}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
> [    0.299720] {2}[Hardware Error]: section length: 0x30
> [    0.299722] {2}[Hardware Error]: 00000000: 40000005 ec30000e 00080110 80005001 ...@..0......P..
> [    0.299724] {2}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
> [    0.299726] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> [    0.299912] GHES: APEI firmware first mode is enabled by APEI bit.
> 
> Because the errors are being reported later in boot, it's hard to
> pinpoint exactly what's causing it without decoding the error information,
> which I currently don't know how to do it.

+ Darren

Hopefully someone at Ampere can decode this and tell us what is
happening.

> There're no problems other than of course triggering tests because of
> the warnings.

It says "Hardware Error". In my book, that's pretty bad. Do you see
this on more than a single machine?

> Do you know what's going on here?

No idea. I haven't seen this on the Altra I have access to so far. It
could be related to firmware and/or things like power management, but
again, someone needs to help us with the error report above.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
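
For anyone who wants to pass the raw section payload to a decoder, here
is a minimal sketch that rebuilds the bytes of record {1} from the
hexdump rows quoted above. It does not interpret the vendor-specific
section, whose layout is unknown here. The names section_bytes,
ascii_column and SAMPLE are hypothetical helpers, and the sketch assumes
the dump prints 32-bit words in little-endian order, which is consistent
with the ASCII column in the log.

```python
import re
import struct

# One hexdump row: an 8-digit offset, a colon, then up to four 32-bit words.
ROW = re.compile(r"\b([0-9a-f]{8}):((?: [0-9a-f]{8}\b){1,4})")

# Rows of record {1}, copied from the quoted log.
SAMPLE = [
    "[    0.299638] {1}[Hardware Error]: section length: 0x30",
    "[    0.299645] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........",
    "[    0.299648] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................",
    "[    0.299650] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................",
]


def section_bytes(log_lines):
    """Concatenate the hexdump rows into the raw section payload."""
    payload = bytearray()
    for line in log_lines:
        match = ROW.search(line)
        if match is None:
            continue  # not a hexdump row (e.g. the "section length" line)
        offset = int(match.group(1), 16)
        if offset != len(payload):
            raise ValueError(f"gap in hexdump at offset {offset:#x}")
        for word in match.group(2).split():
            # Assumption: each word is a little-endian 32-bit value.
            payload += struct.pack("<I", int(word, 16))
    return bytes(payload)


def ascii_column(data):
    """Render bytes the way the dump's ASCII column does."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


if __name__ == "__main__":
    raw = section_bytes(SAMPLE)
    print(len(raw), raw.hex())
    print(ascii_column(raw[:16]))
```

Comparing ascii_column() on the first 16 bytes with the ASCII column
printed in the log is a quick check that the byte order assumption holds
for a given dump.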