Date: Fri, 03 Mar 2023 20:10:17 +0000
Message-ID: <87mt4th9zq.wl-maz@kernel.org>
From: Marc Zyngier
To: Darren Hart
Cc: Aristeu Rozanski, linux-kernel@vger.kernel.org, "D. Scott Phillips"
Subject: Re: Error reports at boot time in Ampere Altra machines since c733ebb7c
References: <20230302201732.pwnhg46mum6st2bv@redhat.com> <865ybizqfi.wl-maz@kernel.org>

On Fri, 03 Mar 2023 19:38:40 +0000,
Darren Hart wrote:
>
> On Thu, Mar 02, 2023 at 11:25:37PM +0000, Marc Zyngier wrote:
> > On Thu, 02 Mar 2023 20:17:32 +0000,
> > Aristeu Rozanski wrote:
> > >
> > > Hi Marc,
> > >
> > > Since c733ebb7cb67d ("irqchip/gic-v3-its: Reset each ITS's BASERn
> > > register before probe"), Ampere Altra machines are reporting corrected
> > > errors during boot:
> > >
> > > [ 0.294334] HEST: Table parsing has been initialized.
> > > [ 0.294397] sdei: SDEIv1.0 (0x0) detected in firmware.
> > > [ 0.299622] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> > > [ 0.299626] {1}[Hardware Error]: event severity: recoverable
> > > [ 0.299629] {1}[Hardware Error]: Error 0, type: recoverable
> > > [ 0.299633] {1}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
> > > [ 0.299638] {1}[Hardware Error]: section length: 0x30
> > > [ 0.299645] {1}[Hardware Error]: 00000000: 00000005 ec30000e 00080110 80001001 ......0.........
> > > [ 0.299648] {1}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
> > > [ 0.299650] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > [ 0.299714] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> > > [ 0.299716] {2}[Hardware Error]: event severity: recoverable
> > > [ 0.299717] {2}[Hardware Error]: Error 0, type: recoverable
> > > [ 0.299718] {2}[Hardware Error]: section type: unknown, e8ed898d-df16-43cc-8ecc-54f060ef157f
> > > [ 0.299720] {2}[Hardware Error]: section length: 0x30
> > > [ 0.299722] {2}[Hardware Error]: 00000000: 40000005 ec30000e 00080110 80005001 ...@..0......P..
> > > [ 0.299724] {2}[Hardware Error]: 00000010: 00000300 00000000 00000000 00000000 ................
> > > [ 0.299726] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > [ 0.299912] GHES: APEI firmware first mode is enabled by APEI bit.
> > >
> > > Because the errors are being reported later in boot, it's hard to
> > > pinpoint exactly what's causing it without decoding the error information,
> > > which I currently don't know how to do it.
> >
> > + Darren
> >
> > Hopefully someone at Ampere can decode this and tell us what is happening.
>
> Hi Marc,
>
> + D Scott
>
> Thanks for the connection.
>
> This is reporting that something attempted to access GITS2_BASER2, the base
> register for the gicv4 vcpu table. Altra doesn't support gicv4.
> Is c733ebb7c assuming GITS_BASER2 should be accessible on gicv3?

All the GITS_BASERn registers should be RES0 if not implemented, as
per the spec (12.19.1 GITS_BASER<n>, ITS Translation Table Descriptors,
n = 0 - 7):

    A maximum of 8 GITS_BASER registers can be provided. Unimplemented
    registers are RES 0.

Returning an error on access is thus definitely a violation of the
spec.

So either the GIC implementation you are using is buggy, or you have
some sort of HW firewalling between the CPU and the GIC that is
trigger happy. My hunch is that this is the latter, as buggy
implementations tend to return an SError when missing this sort of
detail.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
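
For readers following the RES0 argument in the reply above, here is a toy model of what "unimplemented registers are RES 0" means for a loop that resets and then probes all eight GITS_BASER<n> registers. It is only a sketch: FakeITS and reset_and_probe are hypothetical names, this is not the kernel's code, and it says nothing about what the Altra hardware or firmware actually does. The Type field position (bits [58:56]) and the type encodings are taken from the GIC architecture as I understand it; treat them as assumptions if in doubt.

```python
# Illustrative model only: not kernel code and not a model of any real SoC.
GITS_BASER_NR_REGS = 8        # the spec allows at most 8 GITS_BASER<n> registers
GITS_BASER_TYPE_SHIFT = 56    # assumed: Type field lives in bits [58:56]
GITS_BASER_TYPE_MASK = 0x7
TYPE_FIELD = GITS_BASER_TYPE_MASK << GITS_BASER_TYPE_SHIFT

# Assumed Type encodings; anything else is reported as "reserved".
TYPE_NAMES = {0: "unimplemented", 1: "devices", 2: "vPEs", 4: "collections"}


class FakeITS:
    """Hypothetical ITS: implemented registers hold state, the rest are RES0."""

    def __init__(self, implemented):
        # implemented maps register index -> value of its read-only Type field
        self._regs = {n: t << GITS_BASER_TYPE_SHIFT for n, t in implemented.items()}

    def read_baser(self, n):
        # RES0: an unimplemented register reads as zero, it does not fault
        return self._regs.get(n, 0)

    def write_baser(self, n, value):
        # RES0: a write to an unimplemented register is silently ignored
        if n in self._regs:
            kept_type = self._regs[n] & TYPE_FIELD
            self._regs[n] = (value & ~TYPE_FIELD) | kept_type


def reset_and_probe(its):
    """Write zero to every BASER<n>, then report the type each one advertises."""
    types = []
    for n in range(GITS_BASER_NR_REGS):
        its.write_baser(n, 0)
        t = (its.read_baser(n) >> GITS_BASER_TYPE_SHIFT) & GITS_BASER_TYPE_MASK
        types.append(TYPE_NAMES.get(t, "reserved"))
    return types


# A GICv3-only ITS with just a device table and a collection table.
its = FakeITS({0: 1, 1: 4})
print(reset_and_probe(its))
```

In this model, touching BASER2 through BASER7 on a GICv3-only ITS is harmless: the writes vanish and the reads return zero, which is the behaviour the spec text quoted above describes. An access that instead raises an error report falls outside that behaviour.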