From: Alexandru Gagniuc
To: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, bp@alien8.de,
    tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
    shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
    linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
    austin_bolen@dell.com, shyam_iyer@dell.com, Alexandru Gagniuc
Subject: [RFC PATCH 1/4] acpi: apei: Return severity of GHES messages after handling
Date: Tue, 3 Apr 2018 12:08:27 -0500
Message-Id: <20180403170830.29282-2-mr.nuke.me@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180403170830.29282-1-mr.nuke.me@gmail.com>
References: <20180403170830.29282-1-mr.nuke.me@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The policy currently is to simply panic() on GHES fatal errors.
Oftentimes we can correct fatal errors, e.g. "Fatal" PCIe errors can
be corrected via AER. When these errors are corrected, it doesn't
make sense to panic().

Update ghes_do_proc() to return the severity of the worst error,
while marking handled errors as corrected.

Signed-off-by: Alexandru Gagniuc
---
 drivers/acpi/apei/ghes.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..25cf77a18e0a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -383,7 +383,7 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
 
-static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
 {
 #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	unsigned long pfn;
@@ -411,7 +411,10 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 
 	if (flags != -1)
 		memory_failure_queue(pfn, flags);
+
+	return true;
 #endif
+	return false;
 }
 
 /*
@@ -428,7 +431,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
  * GHES_SEV_PANIC does not make it to this handling since the kernel must
  * panic.
  */
-static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
+static bool ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
@@ -456,20 +459,33 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 				  (struct aer_capability_regs *)
 				  pcie_err->aer_info);
 	}
+
+	return true;
 #endif
+	return false;
 }
 
-static void ghes_do_proc(struct ghes *ghes,
+/*
+ * Handle GHES messages, and return the highest encountered severity.
+ * Errors which are handled are considered to be CORRECTED. The severity is
+ * taken from each GHES error data entry, not the error status block.
+ * An error is considered corrected if it can be dispatched to an appropriate
+ * handler. However, simply logging an error is not enough to "correct" it.
+ */
+static int ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
-	int sev, sec_sev;
+	int sev, sec_sev, corrected_sev;
 	struct acpi_hest_generic_data *gdata;
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	bool handled;
 
+	corrected_sev = GHES_SEV_NO;
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
+		handled = false;
 		sec_type = (guid_t *)gdata->section_type;
 		sec_sev = ghes_severity(gdata->error_severity);
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
@@ -484,10 +500,10 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
-			ghes_handle_memory_failure(gdata, sev);
+			handled = ghes_handle_memory_failure(gdata, sev);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
-			ghes_handle_aer(gdata);
+			handled = ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
@@ -500,7 +516,14 @@ static void ghes_do_proc(struct ghes *ghes,
 					       sec_sev, err,
 					       gdata->error_data_length);
 		}
+
+		if (sec_sev >= GHES_SEV_RECOVERABLE && handled)
+			sec_sev = GHES_SEV_CORRECTED;
+
+		corrected_sev = max(corrected_sev, sec_sev);
 	}
+
+	return corrected_sev;
 }
 
 static void __ghes_print_estatus(const char *pfx,
-- 
2.14.3
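
For readers who want to experiment with the aggregation rule outside the
kernel, here is a minimal userspace sketch of what the ghes_do_proc() loop
above computes. It is not kernel code: the names sev, section and
worst_unhandled_severity() are hypothetical, and the enum only assumes the
same ordering as GHES_SEV_NO < GHES_SEV_CORRECTED < GHES_SEV_RECOVERABLE <
GHES_SEV_PANIC. The "handled" flag stands in for the return value of
ghes_handle_memory_failure() or ghes_handle_aer().

```c
#include <stdbool.h>
#include <stddef.h>

/* Severity levels, ordered so that a larger value is worse (assumed to mirror GHES_SEV_*). */
enum sev { SEV_NO = 0, SEV_CORRECTED = 1, SEV_RECOVERABLE = 2, SEV_PANIC = 3 };

/* Hypothetical stand-in for one error data entry in a status block. */
struct section {
	enum sev severity;	/* severity reported for this entry */
	bool handled;		/* true if a handler took ownership of it */
};

/* Return the worst severity left after downgrading handled entries. */
static int worst_unhandled_severity(const struct section *secs, size_t n)
{
	int worst = SEV_NO;
	size_t i;

	for (i = 0; i < n; i++) {
		int sev = secs[i].severity;

		/* A recoverable-or-worse entry that was handled counts as corrected. */
		if (sev >= SEV_RECOVERABLE && secs[i].handled)
			sev = SEV_CORRECTED;
		if (sev > worst)
			worst = sev;
	}
	return worst;
}
```

With this rule, a block holding one handled fatal entry and one corrected
entry aggregates to "corrected", while a block that also holds an unhandled
recoverable entry still aggregates to "recoverable".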