From: gengdongjiu
To: James Morse, gengdongjiu
Cc: linux-arm-kernel@lists.infradead.org, "Liujun (Jun Liu)", linux-kernel@vger.kernel.org, corbet@lwn.net, marc.zyngier@arm.com, catalin.marinas@arm.com, linux-doc@vger.kernel.org, rjw@rjwysocki.net, linux@armlinux.org.uk, will.deacon@arm.com, robert.moore@intel.com, linux-acpi@vger.kernel.org, bp@alien8.de, lv.zheng@intel.com, Huangshaoyu, kvmarm@lists.cs.columbia.edu, devel@acpica.org
Date: Fri, 13 Apr 2018 21:50:30 +0800
Subject: Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
In-Reply-To: <649fa2f6-1823-116e-fe69-aeaaa6a508af@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

James,

Thanks for this mail.

On 2018/4/13 0:14, James Morse wrote:
> Hi gengdongjiu,
>
> On 12/04/18 06:00, gengdongjiu wrote:
>> 2018-02-16 1:55 GMT+08:00 James Morse :
>>> On 05/02/18 11:24, gengdongjiu wrote:
>>>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>>>> TGE}?
>>>>
>>>> Yes, it is.
>>>
>>> ... and yet ...
>>>
>>>
>>>>> What does your firmware do when it wants to emulate SError but it's masked?
>>>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>>>> PSTATE.A set.
>>>>> e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>>>> emulated SError should go to EL1. This effectively masks SError.)
>>>>
>>>> Currently we do not consider the mask status (SPSR) much.
>>>
>>> .. this is a problem.
>>>
>>> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
>>> interrupted EL2. Even if you set up the EL1 registers correctly, EL1 can't eret to
>>> EL2. This should never happen: SError is effectively masked if you are running
>>> at an EL higher than the one it's routed to.
>>>
>>> More obviously: if the exception came from the EL that SError should be routed
>>> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only
>
>> James, I have summarized the masking and routing rules for SError below, to
>> confirm them with you for the firmware-first solution:
>
> You also said "Currently we do not consider the mask status (SPSR) much."

Yes, currently we do not take it into account much.
Now that you have clarified the rules, we want to modify the EL3 firmware to follow them.

>
>
>> 1. If HCR_EL2.{AMO,TGE} is set,
>
> If one or the other of these bits is set: (AMO==1 || TGE==1)
>
>> which means the SError should be routed to EL2:
>> when an SError happens and traps to EL3, if EL3 finds
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
>> and finds this SError came from EL2, it will not deliver an SError; it will
>> store the RAS error in the BERT and 'reboot'. But if
>> it finds that this SError came from EL1 or EL0, it still needs to deliver
>> an SError, right?
>
> Yes.
>
>
>> 2. If HCR_EL2.{AMO,TGE} is not set,
>
> If neither of these bits is set: (AMO==0 && TGE == 0)
>
>> which means the SError should be routed to EL1:
>> when an SError happens and traps to EL3, if EL3 finds
>> HCR_EL2.{AMO,TGE} and SPSR_EL3.A are all clear,
>
> (I'm reading this as all three of these bits are clear)

Sorry, that was a typo. It should read: HCR_EL2.AMO and HCR_EL2.TGE are both clear, but SPSR_EL3.A is set.

>
>> and finds this SError came from EL1, it will not deliver an SError; it will
>> store the RAS error in the BERT and 'reboot';
>
> No, (AMO==0 && TGE == 0) means SError is routed to EL1, this exception
> interrupted EL1 and the A bit was clear, so EL1 can take an SError.

Agreed.

>
> The two cases here are:
> AMO==0,TGE==0 means SError should be routed to EL1. If SPSR_EL3 says the
> exception interrupted EL1 and the A bit was set, you need to do the BERT trick.
>
> If SPSR_EL3 says the exception interrupted EL2, you need to do the BERT trick

By "BERT trick" you mean storing the RAS error in the BERT and then rebooting, right?

> regardless of the A bit, as SError is implicitly masked by running at a higher
> exception level than it was routed to.
>
>
> From your v11 reply:
>> 2. The exception came from the EL that SError should not be routed
>> to (according to HCR_EL2.{AMO, TGE}); even though PSTATE.A was set, EL3
>> firmware still delivers the SError
>
> (this is re-iterating the two cases above:)
> 'not be routed to' is one of two things: Route-to-EL2+interrupted-EL1, or
> Route-to-EL1+interrupted-EL2.
>
> Route-to-EL2+interrupted-EL1 is fine: regardless of SPSR_EL3.A the emulated
> SError can be delivered to EL2, as EL2 can't mask SError when executing at a
> lower EL.

Agreed.

>
> Route-to-EL1+interrupted-EL2 is the problem. SError is implicitly masked by
> running at a higher EL. Regardless of SPSR_EL3.A, the emulated SError cannot be
> delivered.

Does "cannot be delivered" mean storing the RAS error in the BERT and rebooting?
In Table D1-15 in "D1.14.2 Asynchronous exception masking", this case is marked "C",
which means the SError is not taken regardless of the value of the Process state interrupt mask.
In this case, could it be unsafe if the BIOS reboots directly?

> KVM does this on the way out of a guest: if an SError occurs during this time,
> the CPU will wait until execution returns to EL1 before delivering the SError.
> Your firmware has to do the same.
>
> Table D1-15 in "D1.14.2 Asynchronous exception masking" has a table with all the
> combinations. The ARM-ARM is what we need to match with this behaviour.
>
>
>> but if it finds that this SError came from EL0, it still needs to deliver an
>> SError, right?
>
> I thought interrupted-EL0 could always be delivered, but re-reading the
> ARM-ARM's "D1.14.2 Asynchronous exception masking": if asynchronous exceptions
> are routed to EL1 then EL0 and EL1 are treated the same.
> So if SError is routed to EL1, the exception interrupted EL0, and SPSR_EL3.A was
> set, you still can't deliver the emulated SError; you have to do the BERT trick.
> Linux doesn't do this today, but another OS might (e.g. UEFI), and we might do
> this in the future.
In this case, could it be unsafe if the BIOS reboots directly? For example, if EL0 sets PSTATE.A for some test purpose and an SError happens at exactly that moment, the BIOS will reboot the system. I am afraid the system will become unsafe because the BIOS reboots it.

>
> This is really tricky for firmware to get right. Another alternative would be to
> put the CPER records in a Polled buffer, unless something needs doing right now,
> in which case a BERT-reboot is probably best.

In summary:

[1]:
Route-to-EL1 + interrupted-EL1: if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated SError, so it stores the RAS error in the BERT and reboots.
Route-to-EL2 + interrupted-EL2: if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated SError, so it stores the RAS error in the BERT and reboots.

I agree with the above two cases, but maybe we need to ensure that PSTATE.A is set only in the EL2 SError handler and the EL1 SError exception handler, and is clear everywhere else. Then the BIOS can tell that this is a nested SError when it finds SPSR_EL3.A set. Can we ensure that in the Linux kernel code and the KVM code?

[2]:
Route-to-EL2 + interrupted-EL1: regardless of SPSR_EL3.A, the emulated SError can be delivered to EL2.
Route-to-EL2 + interrupted-EL0: regardless of SPSR_EL3.A, the emulated SError can be delivered to EL2.

I agree with the above two cases.

[3]:
Route-to-EL1 + interrupted-EL0: if SPSR_EL3.A is set, EL3 firmware can't deliver the emulated SError, so it stores the RAS error in the BERT and reboots.
Route-to-EL1 + interrupted-EL2: EL3 firmware stores the RAS error in the BERT and reboots, regardless of SPSR_EL3.A.

For the above two cases, I am worried the system will become unsafe because the BIOS will reboot it.

>
>
> Thanks,
>
> James
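The routing and masking rules summarized in [1] to [3] can be expressed as a small decision function that an EL3 handler might call before emulating an SError. This is a minimal sketch, not code from any existing firmware: the names `sei_decide`, `spsr_el` and `enum sei_action` are hypothetical. It assumes the interrupted context was AArch64 (so SPSR_EL3.M[3:2] holds the interrupted EL), that EL2 is implemented, and that HCR_EL2.E2H is clear (the VHE host-EL0 case is not modelled). The bit positions (HCR_EL2.AMO bit 5, HCR_EL2.TGE bit 27, SPSR A bit 8) follow the Arm ARM; Table D1-15 remains the authority for the full set of combinations.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 bit positions from the Arm ARM. */
#define HCR_EL2_AMO  (UINT64_C(1) << 5)
#define HCR_EL2_TGE  (UINT64_C(1) << 27)
#define SPSR_A_BIT   (UINT64_C(1) << 8)   /* saved PSTATE.A */

/* Hypothetical result type for this sketch. */
enum sei_action {
    SEI_DELIVER,      /* emulate the SError at the routed-to EL */
    SEI_BERT_REBOOT   /* SError is masked: record the error in the BERT, then reboot */
};

/* Hypothetical helper: interrupted EL from SPSR_EL3.M[3:2].
 * Assumes the interrupted state was AArch64 (M[4] == 0). */
static unsigned int spsr_el(uint64_t spsr)
{
    return (unsigned int)((spsr >> 2) & 0x3);
}

/* Hypothetical helper: decide whether EL3 may emulate the SError.
 * Assumes HCR_EL2.E2H == 0; VHE host EL0 is not handled here. */
static enum sei_action sei_decide(uint64_t hcr_el2, uint64_t spsr_el3)
{
    /* AMO==1 || TGE==1 routes SError to EL2, otherwise to EL1. */
    unsigned int target = (hcr_el2 & (HCR_EL2_AMO | HCR_EL2_TGE)) ? 2 : 1;
    unsigned int from = spsr_el(spsr_el3);
    bool a_set = (spsr_el3 & SPSR_A_BIT) != 0;

    /* Interrupted an EL above the target (e.g. route-to-EL1 +
     * interrupted-EL2): implicitly masked, whatever the A bit says. */
    if (from > target)
        return SEI_BERT_REBOOT;

    /* Route-to-EL2 + interrupted-EL1/EL0: EL2 cannot mask SError
     * while a lower EL is executing, so always deliver. */
    if (target == 2 && from < 2)
        return SEI_DELIVER;

    /* Interrupted the target EL itself, or route-to-EL1 + interrupted-EL0
     * (EL0 and EL1 are treated the same): the saved PSTATE.A decides. */
    return a_set ? SEI_BERT_REBOOT : SEI_DELIVER;
}
```

For example, with HCR_EL2.AMO set and an SError that interrupted EL1 with PSTATE.A set, `sei_decide()` returns `SEI_DELIVER` (case [2]); with AMO and TGE both clear and an interrupted EL2, it returns `SEI_BERT_REBOOT` whatever the A bit (case [3]).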