From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Xiaofei Tan, James Morse, "Rafael J.
Wysocki", Sasha Levin
Subject: [PATCH 5.10 282/593] ACPI: APEI: fix synchronous external aborts in user-mode
Date: Mon, 12 Jul 2021 08:07:22 +0200
Message-Id: <20210712060915.198149954@linuxfoundation.org>
In-Reply-To: <20210712060843.180606720@linuxfoundation.org>
References: <20210712060843.180606720@linuxfoundation.org>

From: Xiaofei Tan

[ Upstream commit ccb5ecdc2ddeaff744ee075b54cdff8a689e8fa7 ]

Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise
with APEI's irq work"), do_sea() would unconditionally signal the affected
task from the arch code. Since that change, the GHES driver sends the
signals.

This exposes a problem: errors that the GHES driver doesn't understand, or
doesn't handle effectively, are silently ignored. The errors are then taken
again and circulate endlessly, leaving the user-space task stuck in this
loop.

Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records. Do memory failure handling for the ARM
Processor Error Section just like for the Memory Error Section.

Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan
Reviewed-by: James Morse
[ rjw: Subject edit ]
Signed-off-by: Rafael J.
Wysocki
Signed-off-by: Sasha Levin
---
 drivers/acpi/apei/ghes.c | 81 +++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fce7ade2aba9..0c8330ed1ffd 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -441,28 +441,35 @@ static void ghes_kick_task_work(struct callback_head *head)
 	gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len);
 }
 
-static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
-				       int sev)
+static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 {
 	unsigned long pfn;
-	int flags = -1;
-	int sec_sev = ghes_severity(gdata->error_severity);
-	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
 		return false;
 
-	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
-		return false;
-
-	pfn = mem_err->physical_addr >> PAGE_SHIFT;
+	pfn = PHYS_PFN(physical_addr);
 	if (!pfn_valid(pfn)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
-		mem_err->physical_addr);
+		physical_addr);
 		return false;
 	}
 
+	memory_failure_queue(pfn, flags);
+	return true;
+}
+
+static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
+				       int sev)
+{
+	int flags = -1;
+	int sec_sev = ghes_severity(gdata->error_severity);
+	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+
+	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
+		return false;
+
 	/* iff following two events can be handled properly by now */
 	if (sec_sev == GHES_SEV_CORRECTED &&
 	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
@@ -470,14 +477,56 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
 		flags = 0;
 
-	if (flags != -1) {
-		memory_failure_queue(pfn, flags);
-		return true;
-	}
+	if (flags != -1)
+		return ghes_do_memory_failure(mem_err->physical_addr, flags);
 
 	return false;
 }
 
+static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
+{
+	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+	bool queued = false;
+	int sec_sev, i;
+	char *p;
+
+	log_arm_hw_error(err);
+
+	sec_sev = ghes_severity(gdata->error_severity);
+	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
+		return false;
+
+	p = (char *)(err + 1);
+	for (i = 0; i < err->err_info_num; i++) {
+		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
+		bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
+		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+		const char *error_type = "unknown error";
+
+		/*
+		 * The field (err_info->error_info & BIT(26)) is fixed to set to
+		 * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
+		 * firmware won't mix corrected errors in an uncorrected section,
+		 * and don't filter out 'corrected' error here.
+		 */
+		if (is_cache && has_pa) {
+			queued = ghes_do_memory_failure(err_info->physical_fault_addr, 0);
+			p += err_info->length;
+			continue;
+		}
+
+		if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs))
+			error_type = cper_proc_error_type_strs[err_info->type];
+
+		pr_warn_ratelimited(FW_WARN GHES_PFX
+				    "Unhandled processor error type: %s\n",
+				    error_type);
+		p += err_info->length;
+	}
+
+	return queued;
+}
+
 /*
  * PCIe AER errors need to be sent to the AER driver for reporting and
  * recovery. The GHES severities map to the following AER severities and
@@ -605,9 +654,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
-			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
-
-			log_arm_hw_error(err);
+			queued = ghes_handle_arm_hw_error(gdata, sev);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
-- 
2.30.2
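
For readers following the loop in ghes_handle_arm_hw_error(), the record
walk can be sketched in user space. This is only an illustration under
stated assumptions: struct demo_err_info, the DEMO_* constants and both
helper functions are hypothetical stand-ins, not the kernel's CPER
definitions, and the bounds checks are additions of the sketch that do not
appear in the patch. It shows the same pattern of stepping through
variable-length records by each record's own length field and picking out
cache errors that carry a valid physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's CPER types. */
enum { DEMO_CACHE_ERROR = 0, DEMO_TLB_ERROR = 1 };
#define DEMO_VALID_PHYS_ADDR (1u << 0)

struct demo_err_info {
	uint8_t  length;           /* total size of this record, in bytes */
	uint8_t  type;             /* DEMO_CACHE_ERROR, DEMO_TLB_ERROR, ... */
	uint16_t validation_bits;  /* which optional fields are valid */
	uint64_t phys_addr;        /* meaningful only if the bit above is set */
};

/*
 * Walk num_records variable-length records packed back to back in buf.
 * Store the physical address of every cache-error record that has a valid
 * address into out[] (at most out_cap entries) and return how many were
 * stored. Assumption: the sketch collects addresses instead of queueing
 * memory-failure work as the kernel does.
 */
static size_t demo_collect_cache_faults(const unsigned char *buf, size_t buf_len,
					size_t num_records,
					uint64_t *out, size_t out_cap)
{
	size_t off = 0, n = 0;

	assert(buf != NULL && out != NULL);

	for (size_t i = 0; i < num_records; i++) {
		struct demo_err_info info;

		/* Sketch-only check: the fixed header must fit in what is left. */
		if (buf_len - off < sizeof(info))
			break;
		/* memcpy avoids unaligned access into the packed byte buffer. */
		memcpy(&info, buf + off, sizeof(info));

		/* Sketch-only check: reject a length that is too small or too big. */
		if (info.length < sizeof(info) || info.length > buf_len - off)
			break;

		if (info.type == DEMO_CACHE_ERROR &&
		    (info.validation_bits & DEMO_VALID_PHYS_ADDR) &&
		    n < out_cap)
			out[n++] = info.phys_addr;

		/* Advance by the record's own length, as the patch does with p. */
		off += info.length;
	}

	return n;
}

/* Test helper: write one record at offset off and return the next offset. */
static size_t demo_put(unsigned char *buf, size_t off, uint8_t length,
		       uint8_t type, uint16_t bits, uint64_t pa)
{
	struct demo_err_info info;

	memset(&info, 0, sizeof(info));
	info.length = length;
	info.type = type;
	info.validation_bits = bits;
	info.phys_addr = pa;
	memcpy(buf + off, &info, sizeof(info));
	return off + length;
}
```

Given a buffer filled with demo_put(), a call such as
demo_collect_cache_faults(buf, len, count, out, cap) returns the number of
cache-error records with a valid address and leaves their addresses in
out[]. The kernel code instead passes each such address to
ghes_do_memory_failure() and warns about every other record type.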