From: Alexandru Gagniuc
To: linux-acpi@vger.kernel.org
Cc: alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com,
    Alexandru Gagniuc, Tony Luck, Borislav Petkov, Thomas Gleixner,
    Ingo Molnar, "H. Peter Anvin", x86@kernel.org, "Rafael J. Wysocki",
    Len Brown, Mauro Carvalho Chehab, Robert Moore, Erik Schmauss,
    Tyler Baicar, Will Deacon, James Morse, "Jonathan (Zhixiong) Zhang",
    Dongjiu Geng, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
    devel@acpica.org
Subject: [PATCH v7 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES
Date: Fri, 25 May 2018 10:53:48 -0500
Message-Id: <20180525155352.22350-4-mr.nuke.me@gmail.com>
In-Reply-To: <20180525155352.22350-1-mr.nuke.me@gmail.com>
References: <20180525155352.22350-1-mr.nuke.me@gmail.com>

As previously noted, the policy to panic on any "Fatal" GHES error is not
suitable for several classes of errors. The most notable is error
containment. The correct policy is to achieve identical behavior to native
error handling -- i.e. when not reported through GHES.
This may not be possible in special cases, as we have to exit the NMI
first, which requires special consideration.

PCIe AER errors are contained and reported at the root port. On
DPC-capable hardware, containment can be done by all downstream ports.
DPC also has the added advantage of preventing future errors. Since these
errors stop at the root port, we can do all the work we need to exit NMI
and reach the error handler.

This patch stops the unconditional crashing of the system and invokes the
AER handler instead. When AER is not enabled, or the firmware doesn't
provide sufficient information to identify the source of the error, the
original panic() behavior is maintained.

Signed-off-by: Alexandru Gagniuc
---
 drivers/acpi/apei/ghes.c | 43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1b22e18168f5..f7126f6d8d52 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -425,7 +425,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
  * GHES_SEV_RECOVERABLE -> AER_NONFATAL
  * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
  * These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_FATAL does not make it to this handler
+ * GHES_SEV_FATAL -> AER_FATAL
  */
 static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
@@ -837,6 +837,45 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
 static struct llist_head ghes_estatus_llist;
 static struct irq_work ghes_proc_irq_work;
 
+/* PCIe AER errors are safe if AER section contains enough info. */
+static int ghes_pcie_has_safe_handler(struct acpi_hest_generic_data *gdata)
+{
+	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO &&
+	    IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER))
+		return true;
+
+	return false;
+}
+
+/*
+ * Do we have an error handler that we can safely reach? We're concerned with
+ * being able to notify an error handler by crossing the NMI/IRQ boundary,
+ * being able to schedule_work, and so forth.
+ */
+static int ghes_has_fatal_handler(struct ghes *ghes)
+{
+	int worst_sev, sec_sev;
+	bool safe = true;
+	struct acpi_hest_generic_data *gdata;
+	const guid_t *section_type;
+	const struct acpi_hest_generic_status *estatus = ghes->estatus;
+
+	apei_estatus_for_each_section(estatus, gdata) {
+		section_type = (guid_t *)gdata->section_type;
+
+		if (guid_equal(section_type, &CPER_SEC_PCIE))
+			safe = ghes_pcie_has_safe_handler(gdata);
+
+		if (!safe)
+			break;
+	}
+
+	return safe;
+}
+
 /*
  * NMI may be triggered on any CPU, so ghes_in_nmi is used for
  * having only one concurrent reader.
@@ -944,7 +983,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 	}
 
 	sev = ghes_cper_severity(ghes->estatus->error_severity);
-	if (sev >= GHES_SEV_FATAL) {
+	if ((sev >= GHES_SEV_FATAL) && !ghes_has_fatal_handler(ghes)) {
 		oops_begin();
 		ghes_print_queued_estatus();
 		__ghes_panic(ghes);
-- 
2.14.3
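
For readers who want to reason about the new policy outside the kernel tree, the decision made in ghes_notify_nmi() can be modelled roughly as below. This is only a userspace sketch, not kernel code: every sketch_* name, the flattened section struct, the severity enum and the two bit values are stand-ins assumed for illustration, and aer_enabled stands in for IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the CPER validation bits (assumed values). */
#define SKETCH_PCIE_VALID_DEVICE_ID 0x0008u
#define SKETCH_PCIE_VALID_AER_INFO  0x0080u

/* Hypothetical severity ordering; only "fatal is the highest" matters here. */
enum sketch_sev {
	SKETCH_SEV_NO,
	SKETCH_SEV_CORRECTED,
	SKETCH_SEV_RECOVERABLE,
	SKETCH_SEV_FATAL,
};

/* Hypothetical, flattened view of one error status section. */
struct sketch_section {
	bool is_pcie;             /* section type equals the PCIe GUID */
	uint64_t validation_bits; /* only meaningful when is_pcie is set */
};

/* A PCIe section is safe when AER is built in and both fields are valid. */
static bool sketch_pcie_has_safe_handler(const struct sketch_section *sec,
					 bool aer_enabled)
{
	const uint64_t need = SKETCH_PCIE_VALID_DEVICE_ID |
			      SKETCH_PCIE_VALID_AER_INFO;

	return aer_enabled && (sec->validation_bits & need) == need;
}

/* Walk all sections; the first unsafe PCIe section ends the search. */
static bool sketch_has_fatal_handler(const struct sketch_section *secs,
				     size_t n, bool aer_enabled)
{
	for (size_t i = 0; i < n; i++) {
		if (secs[i].is_pcie &&
		    !sketch_pcie_has_safe_handler(&secs[i], aer_enabled))
			return false;
	}

	/* Non-PCIe sections never clear the flag, mirroring "safe = true". */
	return true;
}

/* Panic only for fatal errors that no handler can safely take over. */
static bool sketch_should_panic(enum sketch_sev sev,
				const struct sketch_section *secs,
				size_t n, bool aer_enabled)
{
	return sev >= SKETCH_SEV_FATAL &&
	       !sketch_has_fatal_handler(secs, n, aer_enabled);
}
```

With this model, a fatal error whose PCIe section carries both the device ID and the AER info does not panic when AER is enabled, while the same error with only one of the two fields valid, or with AER disabled, still does. One consequence worth noticing: because the flag starts out true and only PCIe sections can clear it, a fatal status block without any PCIe section is also reported as having a handler.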