Subject: Re: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages
To: James Morse
Cc: linux-acpi@vger.kernel.org, rjw@rjwysocki.net, lenb@kernel.org,
 tony.luck@intel.com, bp@alien8.de, tbaicar@codeaurora.org,
 will.deacon@arm.com, shiju.jose@huawei.com, zjzhang@codeaurora.org,
 gengdongjiu@huawei.com, linux-kernel@vger.kernel.org,
 alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com
References: <20180403170830.29282-1-mr.nuke.me@gmail.com>
 <20180403170830.29282-4-mr.nuke.me@gmail.com>
 <338e9bb4-a837-69f9-36e5-5ee2ddcaaa38@arm.com>
 <9e29e5c6-b942-617e-f92e-728627799506@gmail.com>
 <2120d34a-41d2-9fff-2710-d11e9a19e12a@gmail.com>
 <855860ef-f84e-00af-ed44-55d6a5a41a94@arm.com>
From: "Alex G."
Message-ID: <70c0a230-945a-3a1a-7c49-4b0784a3cfa6@gmail.com>
Date: Mon, 16 Apr 2018 16:59:29 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <855860ef-f84e-00af-ed44-55d6a5a41a94@arm.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/13/2018 11:38 AM, James Morse wrote:
> Hi Alex,
>
> On 09/04/18 19:11, Alex G. wrote:
>> On 04/06/2018 01:24 PM, James Morse wrote:
>> Do you have any ETA on when your SEA patches are going to make it
>> upstream? There's not much point in updating my patchset if it's going
>> to conflict with your work.
>
> The SEA stuff went in with 7edda0886bc3 ("acpi: apei: handle SEA notification
> type for ARMv8"). My series is moving it to use the estatus-queue in the same
> way as x86's NOTIFY_NMI does. This lets us safely add the other two NMI-like
> notifications. I have no idea on the ETA, it depends on review feedback!

Okay. I'll get a v2 out soonish then.

(snip)
> This assumes a cache-invalidate will clear the error, which I don't think we're
> guaranteed on arm.
> It also destroys any adjacent data, "everyone's happy" includes the thread that
> got a chunk of someone-else's stack frame, I don't think it will be happy for
> very long!

Hmm, no cache-line (or page) invalidation on arm64? How does
dma_map/unmap_*() work then? You may not guarantee to fix the error, but
I don't buy into the "let's crash without trying" argument.

> (this is a side issue for AER though)

Somebody muddled up AER with these tables, so we now have to worry about
it. :)

(snip)
>> How does FFS handle race conditions that can occur when accessing HW
>> concurrently with the OS? I'm told it's the main reason why BIOS
>> doesn't release unused cores from SMM early.
>
> This is firmware's problem, it depends on whether there is any hardware that is
> shared with the OS. Some hardware can be marked 'secure' in which case only
> firmware can access it, alternatively firmware can trap or just disable the OS's
> access to the shared hardware.

It's everyone's problem. It's the firmware's responsibility.

> For example, with the v8.2 RAS Extensions, there are some per-cpu error
> registers. Firmware can disable these for the OS, so that it always reads 0 from
> them. Instead firmware takes the error via FF, reads the registers from
> firmware, and dumps CPER records into the OS's memory.
>
> If there is a shared hardware resource that both the OS and firmware may be
> accessing, yes firmware needs to pull the other CPUs in, but this depends on the
> SoC design, it doesn't necessarily happen.

The problem with shared resources is just a problem. I've seen systems
where all 100 cores are held up for 300+ ms. In latency-critical
applications reliability drops exponentially. Am I correct in assuming
your answer would be to "hide" more stuff from the OS?

(snip)
> Sure, we're quirking our behaviour based on a high level of mistrust for the
> firmware. My point here was we shouldn't duplicate the implementation because we
> want x86:{AER,CPU,MEM} to behave differently to arm64:{AER,CPU,MEM}. I'd rather
> the quirked-behaviour was along the *:{AER} versus *:{CPU,MEM} line. If we have
> extra code to spot deferrable errors, we should use it on both architectures.

It's a well earned and well deserved mistrust. Firmware is evil (*).

(*) sarcastic overstatement of facts

(snip)
> For AER we agree, these never mean 'the CPU is on fire'.

Sounds like a good marketing slogan:
"ACME's new turbo-encabulated CPU -- it's on fire!"

(snip)
>>> even if broken firmware thinks they are fatal.
>
>> I think the idea of firmware-first is broken. But it's there, it's
>> shipping in FW, so we have to accommodate it in SW.
>
> Part of our different-views here is firmware-first is taking something away from
> you, whereas for me it's giving me information that would otherwise be in
> secret-soc-specific registers.

Under this interpretation, FFS is a band-aid to the problem of "secret"
registers. "Secret" hardware doesn't really fit well into the idea of an
OS [1]. The irony of the solution is that the response is centered on
firmware extensions which were designed to stifle interoperability [2].

You are right, FFS is a poopstorm and headache for me. It takes my
sanity away. I once wrote FW for a laptop where the only use of SMM was
to run the "Enable ACPI" call -- which only disabled further SMM. We
didn't need an iota of FFS because at that time AMD wasn't secretive,
and there was no need to have "secret" registers in the first place.

[1] https://www.youtube.com/watch?v=_36yNWw_07g
[2] http://antitrust.slated.org/www.iowaconsumercase.org/011607/3000/PX03020.pdf

(snip)
>> And linux can handle a wide subset of MCEs just fine, so the
>> ghes_is_deferrable() logic would, under my argument, agree to pass
>> execution to the actual handlers.
>
> For some classes of error we can't safely get there.

Optimize for the common case.

(snip)
>> Though in that case, the problem is generalized to "how to best handle
>> error Y", rather than "how to handle error Y in FFS".
>
> (that's a good thing yes?) I assume x86 can take MCE errors out of IRQ-masked
> code. Sharing the handle_foo_error_nmi() code between the two paths would be a
> good thing.

That's a good thing, yes. But, by ignorance, FFS doesn't always let you
do this. Getting to your kernel data structures is not IRQ-safe in the
general case. In the AER case, you need pci_get_domain_bus_and_slot(),
which is not IRQ-safe. So, while in the native case, you can stage
context pointers, and separate things based on interrupt vector, you
don't get this sort of flexibility with FFS.
There is the extra step of recovering your context, which is not the
sort of situation you often get with kernel-first. Most of the
discussion has been around whether to crash before we even get this
context. Stay tuned for v2 though.

(snip)
>> Sarcasm aside, I do like the idea of a message complaining about the
>> firmware.
>
> It wouldn't be appropriate in all cases, but for AER it looks like 'fatal' is
> always an over-reaction.
> For example, given a memory error firmware can't know whether we can re-read
> some page of kernel memory from disk and then reschedule the thread.

If firmware can't know the error is recoverable, it also can't know it's
fatal. When firmware does try to make a determination from insufficient
information, it opens itself up to sarky remarks. That's fair game.

(snip)
>> If we're going to split recovery paths, then it makes better sense to
>> have a system where handleable errors are passed down to a lesser
>> context. Or even run non-blocking handlers in whatever context they are
>> received. Much more than having a special case for a specific error.
>
> Yes. I think running the NMI-safe parts of the handler first, then the IRQ-parts
> when we drop to IRQ context is the way to do this.

Coming soon to a ML near you.

>>>> I abstract things at irq_work_queue(): "queue this work and let me
>>>> return from this interrupt"
>>>
>>> ('from this NMI', to come back later as an IRQ, from where we can call
>>> schedule_work_on(). This chaining is because the scheduler takes irqsave
>>> spin-locks to schedule the work.)
>>
>> Hopefully, the handling is set up in such a way that I don't have to
>> worry about these details on a per-error basis. If I do have to worry,
>> then I must be doing something wrong.
>
> I agree, but something has to know. Let's tackle AER first,

The actual handler knows. For AER, that's do_recovery().

> where we have none of these concerns

Untamed PCIe errors can end up as MCEs and triple-fault the CPU :)
You didn't think we get a free ride with AER, did you?

(snip)
>>> On arm64 poisoned-cache locations triggering an external abort (leading to a
>>> NOTIFY_SEA NMI-like notification to APEI) in response to a load is synchronous,
>>> hence the live-lock problem.
>>
>> How is this currently handled in the kernel-first case?
>
> We don't have any kernel-first support today :( it would need the Arm version of
> MCA (v8.2 RAS Extensions) which are relatively new. Everything we've seen so far
> is using firmware-first as it has to know a little about the SoC topology.
>
> How would it be handled? For the kernel-first version of NOTIFY_SEA the code
> would get the interrupted pt_regs and can probe the RAS Error registers
(snip)

Sounds like a regular, standard, raceless error handler :)

>>>> I am not concerned about livelocking the system.
>>>
>>> This is what keeps me awake at night.
>>
>> Nothing keeps me awake at night. I keep others awake at night.
>
> (Ha! I owe you a beer!)

(I will not forget)

>>> I think your platform has NMI handled as MCE/kernel-first for synchronous CPU
>>> errors. SMI triggered by PCI-AER are notified by NMI too, but handled by
>>> firmware-first.
>>
>> MCEs come in via FFS.
>
> Oh? I thought those were two separate worlds. I assumed MCE-NMIs didn't go via SMM.

FFS is done in SMM. I'm certain there's a very simple and easy way in
which MCEs can be reported kernel-first, but I've only seen it done FFS
recently.

(snip)
> I think the way forward here is to spot defer-able errors, even if they are
> marked fatal. Initially this will just be AER, but if we can split up the
> handlers to have NMI/IRQ/Process context counterparts, we may be able to do
> enough work early-on to defer other types of error.

You're gonna love my v2.

Alex
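
For anyone skimming the thread, the staged handling discussed above
(NMI-safe part first, then the IRQ part, then process context, with AER
treated as deferrable even when firmware calls it fatal) can be modelled
roughly as below. This is a userspace sketch, not kernel code and not
the v2 patch: every name in it (err_record, is_deferrable, handle_stage,
run_to_completion, the enums) is made up for illustration, and the rule
that a non-AER error marked fatal still panics is an assumption, not
something settled in this thread.

```c
/* Userspace model only. None of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

enum err_source { SRC_AER, SRC_CPU, SRC_MEM };
enum fw_severity { SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };
enum context { CTX_NMI, CTX_IRQ, CTX_PROCESS, CTX_DONE, CTX_PANIC };

struct err_record {
	enum err_source source;
	enum fw_severity fw_sev;	/* what firmware claims */
	enum context stage;		/* where the record currently sits */
};

/*
 * Hypothetical policy: an AER record can always wait, whatever severity
 * firmware reported. Anything else is deferred only if not marked fatal
 * (assumption made for this sketch).
 */
static bool is_deferrable(const struct err_record *rec)
{
	if (rec->source == SRC_AER)
		return true;
	return rec->fw_sev != SEV_FATAL;
}

/* Advance a record by exactly one stage and return the new stage. */
static enum context handle_stage(struct err_record *rec)
{
	switch (rec->stage) {
	case CTX_NMI:
		/* NMI-safe work only: decide whether we may defer at all. */
		rec->stage = is_deferrable(rec) ? CTX_IRQ : CTX_PANIC;
		break;
	case CTX_IRQ:
		/* Stands in for the irq_work step: hand off to process context. */
		rec->stage = CTX_PROCESS;
		break;
	case CTX_PROCESS:
		/* Recovery that may sleep would run here. */
		rec->stage = CTX_DONE;
		break;
	default:
		/* CTX_DONE and CTX_PANIC are terminal. */
		break;
	}
	return rec->stage;
}

/* Drive a record until it is either handled or declared fatal. */
static enum context run_to_completion(struct err_record *rec)
{
	while (rec->stage != CTX_DONE && rec->stage != CTX_PANIC)
		handle_stage(rec);
	return rec->stage;
}
```

The point of the model is only the ordering: the decision to defer is
the one thing made in the NMI stage, and everything that needs locks or
sleeping happens after the hand-off.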