Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp5572297pxb; Mon, 14 Feb 2022 02:17:42 -0800 (PST) X-Google-Smtp-Source: ABdhPJzh6Pn7kJsye/NqpPERkZThaCMiTudcfP053Nu5FE01XGD/PxrB3z5BpCpNZb1J2zEMw1B3 X-Received: by 2002:a17:902:d4c2:: with SMTP id o2mr13544417plg.39.1644833862310; Mon, 14 Feb 2022 02:17:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644833862; cv=none; d=google.com; s=arc-20160816; b=n9y2QrS4N0ZegQjqw54peNUbsPeQab2PSfhT1Nvhz7bSahG5B3tTCdRid2gHkKPjwx LMUn46jTYgiizR43daW9e6sXqYsbad/32ExCvv8l25rA5ezNMvacA0Mnc9wqv5aCJcMv qfdaU8PlQAlKJ1RSB5ZpVS7xuLsoJRs0JvUQv1Pd1NSfo1z/NscL2HMgDg+beADu2l4P nHYPS0Nb6ohpM9/linqYFdWryau+iEROA4InG+ohU+7u75wPiIFJ4YxjNWHlr3Gv4Wam nFRpH6GUYJEfCWcy0T0J4/V9EyFEASkYfmb0oX7ZDNcveV1s+utiCSV4IsPtK4Qb7i8r DjnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:to:from; bh=Usqmv/G29BV3/OSGENtYBltjmLNTPso3YospGbkzudE=; b=Ca4LaqMfsKCvSIS7lAz/77TlcLU/nDKSvYgT9XGLuYtPKhLKbTLbTC/7s6hn+vxvVM dNhlrpV78q61S2DZ/Pm7AxlKh0jX7JXq+b6JEZk9Zft/3ZwceI9ie5TiYj2isBF+c2nM DKBZi2eo0fT8lpJdYaVVvD6Kmhdv+vWXQPdg7bEneRptlHC2WqQ2bYN2rgOAnsjmLD7R +ykTc/WG1kUD7D2Og/HU4+b4Kte/NOWgEX8NTeo6GCQt/s0SXRct+A/O+LjOUjcHoBy4 dlaWutzhAeXZZ9IlfDVpqa4lyjK36HbKHfHm6u/mRsu7HU2xtXktoCf3InFXFrTcqCpQ ml2w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 196si22596259pgb.133.2022.02.14.02.17.27; Mon, 14 Feb 2022 02:17:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239349AbiBNDOm (ORCPT + 99 others); Sun, 13 Feb 2022 22:14:42 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:50570 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229715AbiBNDOl (ORCPT ); Sun, 13 Feb 2022 22:14:41 -0500 X-Greylist: delayed 362 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Sun, 13 Feb 2022 19:14:30 PST Received: from zg8tmty1ljiyny4xntqumjca.icoremail.net (zg8tmty1ljiyny4xntqumjca.icoremail.net [165.227.154.27]) by lindbergh.monkeyblade.net (Postfix) with SMTP id 5857750E3B; Sun, 13 Feb 2022 19:14:30 -0800 (PST) Received: from localhost.localdomain (unknown [123.60.114.22]) by mail-app4 (Coremail) with SMTP id cS_KCgAXHQmcxwliUo4aCQ--.6446S2; Mon, 14 Feb 2022 11:08:13 +0800 (CST) From: lostway@zju.edu.cn To: linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, bp@alien8.de, tony.luck@intel.com, james.morse@arm.com Subject: [PATCH v3] RAS: Report ARM processor information to userspace Date: Mon, 14 Feb 2022 11:08:12 +0800 Message-Id: <20220214030813.135766-1-lostway@zju.edu.cn> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: cS_KCgAXHQmcxwliUo4aCQ--.6446S2 X-Coremail-Antispam: 1UD129KBjvJXoWxtF4rGw1xJry8CF1UJr1DKFg_yoW3Ar43pF n8CryYkr4rJFsxG3y3JayF93y3X3s5uw1DK343Xay7CFs5ur1qqFs0gr42kF93Jr98J34a q3Wqgry3Ca4DJrDanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUkYb7Iv0xC_tr1lb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I2 0VC2zVCF04k26cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rw A2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xII jxv20xvEc7CjxVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwV C2z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC 0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr 1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4kE6xkIj40Ew7xC0wCF04k2 0xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI 8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_JF0_Jw1lIxkGc2Ij64vIr41l IxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Jr0_Gr1lIx AIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2 jsIEc7CjxVAFwI0_Jr0_GrUvcSsGvfC2KfnxnUUI43ZEXa7IU8dpnPUUUUU== X-CM-SenderInfo: isrxjjaqquq6lmxovvfxof0/ X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Shengwei Luo The original arm_event trace code only traces out ARM processor error information data. It's not enough for user to take appropriate action. According to UEFI_2_9 specification chapter N2.4.4, the ARM processor error section includes several ARM processor error information, several ARM processor context information and several vendor specific error information structures. In addition to these info, there are error severity and cpu logical index about the event. Report all of these information to userspace via perf i/f. So that the user can do cpu core isolation according to error severity and other info. Signed-off-by: Shengwei Luo Signed-off-by: Jason Tian --- Links: https://lore.kernel.org/lkml/20220126030906.56765-1-lostway@zju.edu.cn/ https://lore.kernel.org/lkml/20210205022229.313030-1-jason@os.amperecomputing.com/ v2->v3: Add signed-off of original author. Fix commit message to explain why a change is being done. v1->v2: Cleaned up ci warnings. --- drivers/acpi/apei/ghes.c | 3 +-- drivers/ras/ras.c | 46 ++++++++++++++++++++++++++++++++++++-- include/linux/ras.h | 15 +++++++++++-- include/ras/ras_event.h | 48 +++++++++++++++++++++++++++++++++++----- 4 files changed, 101 insertions(+), 11 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 0c5c9acc6254..f824c26057b1 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -490,9 +490,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int s int sec_sev, i; char *p; - log_arm_hw_error(err); - sec_sev = ghes_severity(gdata->error_severity); + log_arm_hw_error(err, sec_sev); if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE) return false; diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c index 95540ea8dd9d..2a7f424d59b9 100644 --- a/drivers/ras/ras.c +++ b/drivers/ras/ras.c @@ -21,9 +21,51 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id, trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len); } -void log_arm_hw_error(struct cper_sec_proc_arm *err) +void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev) { - trace_arm_event(err); + u32 pei_len; + u32 ctx_len = 0; + s32 vsei_len; + u8 *pei_err; + u8 *ctx_err; + u8 *ven_err_data; + struct cper_arm_err_info *err_info; + struct cper_arm_ctx_info *ctx_info; + int n, sz; + int cpu; + + pei_len = sizeof(struct cper_arm_err_info) * err->err_info_num; + pei_err = (u8 *)err + sizeof(struct cper_sec_proc_arm); + + err_info = (struct cper_arm_err_info *)(err + 1); + ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num); + ctx_err = (u8 *)ctx_info; + for (n = 0; n < err->context_info_num; n++) { + sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size; + ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz); + ctx_len += sz; + } + + vsei_len = err->section_length - (sizeof(struct cper_sec_proc_arm) + + pei_len + ctx_len); + if (vsei_len < 0) { + pr_warn(FW_BUG + "section length: %d\n", err->section_length); + pr_warn(FW_BUG + "section length is too small\n"); + pr_warn(FW_BUG + "firmware-generated error record is incorrect\n"); + vsei_len = 0; + } + ven_err_data = (u8 *)ctx_info; + + cpu = GET_LOGICAL_INDEX(err->mpidr); + /* when return value is invalid, set cpu index to -1 */ + if (cpu < 0) + cpu = -1; + + trace_arm_event(err, pei_err, pei_len, ctx_err, ctx_len, + ven_err_data, (u32)vsei_len, sev, cpu); } static int __init ras_init(void) diff --git a/include/linux/ras.h b/include/linux/ras.h index 1f4048bf2674..4529775374d0 100644 --- a/include/linux/ras.h +++ b/include/linux/ras.h @@ -24,7 +24,7 @@ int __init parse_cec_param(char *str); void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id, const char *fru_text, const u8 sev, const u8 *err, const u32 len); -void log_arm_hw_error(struct cper_sec_proc_arm *err); +void log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev); #else static inline void log_non_standard_event(const guid_t *sec_type, @@ -32,7 +32,18 @@ log_non_standard_event(const guid_t *sec_type, const u8 sev, const u8 *err, const u32 len) { return; } static inline void -log_arm_hw_error(struct cper_sec_proc_arm *err) { return; } +log_arm_hw_error(struct cper_sec_proc_arm *err, const u8 sev) { return; } #endif +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64) +#include +/* + * Include ARM specific SMP header which provides a function mapping mpidr to + * cpu logical index. + */ +#define GET_LOGICAL_INDEX(mpidr) get_logical_index(mpidr & MPIDR_HWID_BITMASK) +#else +#define GET_LOGICAL_INDEX(mpidr) -EINVAL +#endif /* CONFIG_ARM || CONFIG_ARM64 */ + #endif /* __RAS_H__ */ diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index d0337a41141c..92cfb61bdb20 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -168,11 +168,24 @@ TRACE_EVENT(mc_event, * This event is generated when hardware detects an ARM processor error * has occurred. UEFI 2.6 spec section N.2.4.4. */ +#define APEIL "ARM Processor Err Info data len" +#define APEID "ARM Processor Err Info raw data" +#define APECIL "ARM Processor Err Context Info data len" +#define APECID "ARM Processor Err Context Info raw data" +#define VSEIL "Vendor Specific Err Info data len" +#define VSEID "Vendor Specific Err Info raw data" TRACE_EVENT(arm_event, - TP_PROTO(const struct cper_sec_proc_arm *proc), + TP_PROTO(const struct cper_sec_proc_arm *proc, const u8 *pei_err, + const u32 pei_len, + const u8 *ctx_err, + const u32 ctx_len, + const u8 *oem, + const u32 oem_len, + u8 sev, + int cpu), - TP_ARGS(proc), + TP_ARGS(proc, pei_err, pei_len, ctx_err, ctx_len, oem, oem_len, sev, cpu), TP_STRUCT__entry( __field(u64, mpidr) @@ -180,6 +193,14 @@ TRACE_EVENT(arm_event, __field(u32, running_state) __field(u32, psci_state) __field(u8, affinity) + __field(u32, pei_len) + __dynamic_array(u8, buf, pei_len) + __field(u32, ctx_len) + __dynamic_array(u8, buf1, ctx_len) + __field(u32, oem_len) + __dynamic_array(u8, buf2, oem_len) + __field(u8, sev) + __field(int, cpu) ), TP_fast_assign( @@ -199,12 +220,29 @@ TRACE_EVENT(arm_event, __entry->running_state = ~0; __entry->psci_state = ~0; } + __entry->pei_len = pei_len; + memcpy(__get_dynamic_array(buf), pei_err, pei_len); + __entry->ctx_len = ctx_len; + memcpy(__get_dynamic_array(buf1), ctx_err, ctx_len); + __entry->oem_len = oem_len; + memcpy(__get_dynamic_array(buf2), oem, oem_len); + __entry->sev = sev; + __entry->cpu = cpu; ), - TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; " - "running state: %d; PSCI state: %d", + TP_printk("cpu: %d; error: %d; affinity level: %d; MPIDR: %016llx; MIDR: %016llx; " + "running state: %d; PSCI state: %d; " + "%s: %d; %s: %s; %s: %d; %s: %s; %s: %d; %s: %s", + __entry->cpu, + __entry->sev, __entry->affinity, __entry->mpidr, __entry->midr, - __entry->running_state, __entry->psci_state) + __entry->running_state, __entry->psci_state, + APEIL, __entry->pei_len, APEID, + __print_hex(__get_dynamic_array(buf), __entry->pei_len), + APECIL, __entry->ctx_len, APECID, + __print_hex(__get_dynamic_array(buf1), __entry->ctx_len), + VSEIL, __entry->oem_len, VSEID, + __print_hex(__get_dynamic_array(buf2), __entry->oem_len)) ); /* -- 2.27.0