Subject: Re: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages
To: James Morse
Cc: linux-acpi@vger.kernel.org, rjw@rjwysocki.net, lenb@kernel.org,
    tony.luck@intel.com, bp@alien8.de, tbaicar@codeaurora.org,
    will.deacon@arm.com, shiju.jose@huawei.com, zjzhang@codeaurora.org,
    gengdongjiu@huawei.com, linux-kernel@vger.kernel.org,
    alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com
References: <20180403170830.29282-1-mr.nuke.me@gmail.com>
            <20180403170830.29282-4-mr.nuke.me@gmail.com>
            <338e9bb4-a837-69f9-36e5-5ee2ddcaaa38@arm.com>
From: "Alex G."
Date: Wed, 4 Apr 2018 10:33:25 -0500

Hi James,

On 04/04/2018 02:18 AM, James Morse wrote:
> Hi Alexandru,
>
> On 03/04/18 18:08, Alexandru Gagniuc wrote:
>> BIOSes like to send NMIs for a number of silly reasons often deemed
>> to be "fatal". For example pin bounce during a PCIE hotplug/unplug
>> might cause the link to go down and retrain, with fatal PCI errors
>> being generated while the link is retraining.
>
> Sounds fun!
>
>> Instead of panic()ing in NMI context, pass fatal errors down to IRQ
>> context to see if they can be resolved.
>
> How do we know we will survive this trip?

We don't.

> On arm64 systems it may not be possible to return to the context we took the NMI
> notification from: we may bounce back into firmware with the same "world is on
> fire" error. Firmware can rightly assume the OS has made no attempt to handle
> the error.

I'm not aware of the NMI path being used on arm64:

$ git grep 'select HAVE_ACPI_APEI_NMI'
arch/x86/Kconfig:	select HAVE_ACPI_APEI_NMI		if ACPI
$

As far as jumping back into firmware, the OS should clear the GHES block
status before exiting the NMI. This tells the firmware that the OS got
the message [1]. I'm not seeing any mechanism to say "handled/not
handled", so I'm not sure how firmware can make such assumptions.
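
To make the ack concrete, here is roughly what I mean (a simplified
sketch, not the actual ghes.c code; the helper name is made up, and
GHESv2 sources additionally define a Read Ack register per [1], which
I'm glossing over):

#include <linux/acpi.h>	/* struct acpi_hest_generic_status */

/* Tell firmware the OS has consumed this error record. Clearing
 * Block Status frees the status block for the next error. The real
 * driver writes the field back to the firmware-owned buffer via
 * ghes_copy_tofrom_phys() rather than poking a mapping directly.
 */
static void ghes_ack_status_block(struct acpi_hest_generic_status *estatus)
{
	estatus->block_status = 0;
}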
> Your 'not re-arming the error' example makes this worrying.

Those are one class of bugs you get when you let firmware run the show.
It's been an issue since day one of EFI. The best you can do in such
cases is accept you may later lose the link to a device.
Side-note: We have a BIOS defect filed against this particular issue,
and it only seems to affect PCIe SMIs.

>> With these changes, PCIe errors are handled by AER. Other far less
>> common errors, such as machine check exceptions, still cause a panic()
>> in their respective handlers.
>
> I agree AER is always going to be different. Could we take a look at the CPER
> records while still in_nmi() to decide whether linux knows better than firmware?
>
> For non-standard or processor-errors I think we should always panic() if they're
> marked as fatal.
> For memory-errors we could split memory_failure() up to have
> {NMI,IRQ,process}-context helpers, all we need to know at NMI-time is whether
> the affected memory is kernel memory.

I'm trying to avoid splitting up the recovery logic based on _how_ we
were notified of the error. The only sure way to know if you've handled
an error is to try to handle it. Not handling _any_ fatal error still
results in a panic(), and I was very careful to ensure that.

The alternative is to keep ghes_do_proc() and ghes_notify_nmi() in
sync, and duplicate code between the two. But we can also get
ghes_proc_irq_work() as an entry point, and this easily turns into a
maintenance nightmare.

> For the PCI*/AER errors we should be able to try and handle it ... if we can get
> back to process/IRQ context:
> What happens if a PCIe driver has interrupts masked and is doing something to
> cause this error? We can take the NMI and setup the irq-work, but it may never
> run as we return to interrupts-masked context.

The way the recovery from NMI is set up is confusing. On NMIs,
ghes_proc_in_irq() is queued up to do further work. Despite the name,
in this case it runs as a task, which further queues up the AER work.
Either way, PCIe endpoint interrupts are irrelevant here. AER recovery
runs independently of drivers or interrupts:
 - Drivers that implement AER recovery services are expected to do the
   right thing and shut down IO. Not doing so is a bug in the driver.
 - Drivers that do not implement AER have their devices taken offline.

It is not uncommon for PCIe errors to amplify. Imagine losing the link
right after you've fired off several DMA queues. You'd get a stream of
Unsupported Request errors. AER handles this fairly well.

>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 2c998125b1d5..7243a99ea57e 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -955,7 +962,7 @@ static void __process_error(struct ghes *ghes)
>>  static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>  {
>>  	struct ghes *ghes;
>> -	int sev, ret = NMI_DONE;
>> +	int ret = NMI_DONE;
>>
>>  	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
>>  		return ret;
>> @@ -968,13 +975,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>>  		ret = NMI_HANDLED;
>>  	}
>>
>> -	sev = ghes_severity(ghes->estatus->error_severity);
>> -	if (sev >= GHES_SEV_PANIC) {
>> -		oops_begin();
>> -		ghes_print_queued_estatus();
>> -		__ghes_panic(ghes);
>> -	}
>> -
>>  	if (!(ghes->flags & GHES_TO_CLEAR))
>>  		continue;
>
> For Processor-errors I think this is the wrong thing to do, but we should be
> able to poke around in the CPER records and find out what we are dealing with.

I don't think being trigger-happy with the panic button is ever a good
thing. The logic to stop the system exists downstream, and is being
called. I can see an issue if we somehow lose the ability to run queued
tasks, but in that case, we've already lost control of the machine.

Alex

[1] ACPI 6.2 spec, 18.3.2.8 Generic Hardware Error Source version 2
    (GHESv2 - Type 10)

> Thanks,
>
> James