Subject: Re: Resend: How to handle the SMMU RAS Error in the kernel
To: James Morse, gengdongjiu
CC: arm-mail-list, "Linux Kernel Mailing List"
From: gengdongjiu
Message-ID: <7a93bf1b-7b7b-240e-bc74-68bf0ee18ac9@huawei.com>
Date: Wed, 21 Nov 2018 16:10:26 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi James,

Thanks for the mail.

On 2018/11/20 2:05, James Morse wrote:
> Hi gengdongjiu,
>
> On 17/11/2018 15:41, gengdongjiu wrote:
>> In the current kernel, it only handles three kinds of error, which is
>> memory error, PCIE device and ARM process. But now the SMMU already
>> support the RAS, how to handle the SMMU RAS error in the kernel?
>
> What errors are being detected here?
>
> I don't know much about the SMMU, but I think we should start with a list of
> errors that we want to handle.

On our platform, the SMMU RAS errors mainly include the following, which
follow the SMMU spec:
1. Single-bit ECC error, reported as CE.
2. Double-bit ECC error, reported as UEU.
3. Fetch error as defined in the SMMUv3 spec, reported as UER.

Errors 2 and 3 should be handled, but I do not know how to recover from them.

>
>
> Is this a v8.2 fault handling interrupt from the SMMU taken to EL3?
> Or a cpu-access that was returned as external-abort? or a device access that was
> told external-abort?

It follows the v8.2 RAS spec: it is a v8.2 fault handling interrupt from the
SMMU taken to EL3.

>
> What do we intend to do with this error information? Does the DMA layer have
> error handling we can hook this into?

We can get DMA-layer errors, such as DMA read errors, from the RAS registers.
Maybe the way to handle them is to disable the affected SMMU to avoid
propagation.

>
> Is this just another interface for memory-errors? (e.g the SMMU provides a
> device/address pair and the kernel works out the physical page to run
> memory_failure() on)

I need to check more.

>
>
>> I check the UEFI_SPEC_2.7, the ACPI's CPER have the IOMMU type, but it
>> seems the IOMMU type only are specific to AMD’s IOMMU specification,
>
> ... and Intel VT-d. It looks like UEFI generalises all these as types of 'DMAr'.

Yes, it does.

>
>
>> not have the ARM’s IOMMU section type, can we reuse this IOMMU section
>> type for the ARM SMMU?
>
> The architecture specific records for AMD? No. Even if the information was the
> same, the presence of this record tells you its an AMD IOMMU, which its not.
>
> The generic error section? Maybe.
>
> Assuming the 'fault reason' list in Table 285 is sufficient to cover our list or
> errors, we can use the 'DMAr Generic Errors' section, (N.2.11.1), to describe
> the generic bits of the error ... but SMMU doesn't have an 'Architecture Type',
> so we at least need to get one allocated.
>
> We will probably need an architecture specific 'DMAr Error Section'.
>
>
> I think adding the UEFI bits is probably the 'easy' bit. We should start with a
> list of errors, and the error handling code. This way we know what we need to
> add to the spec.

The list of SMMU RAS errors is shown below, but I still do not know how to
handle them.
1. Single-bit ECC error
2. Double-bit ECC error
3. Fetch error as defined in the SMMUv3 spec

>
>
>
> Thanks,
>
> James
>
> .
>
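
To make the error list above concrete, here is a minimal sketch of how the
three error sources could be mapped to their reported severities and to a
first-pass action. Everything in it is an assumption for illustration: the
enum and function names are hypothetical and are not existing kernel or SMMU
driver APIs, and "disable the SMMU" is only the tentative idea mentioned
earlier in this mail, not an agreed recovery method.

```c
/* Hypothetical error sources, taken from the list in this mail. */
enum smmu_ras_source {
    SMMU_ERR_ECC_1BIT,   /* single-bit ECC error */
    SMMU_ERR_ECC_2BIT,   /* double-bit ECC error */
    SMMU_ERR_FETCH,      /* fetch error as defined in the SMMUv3 spec */
};

/* Severities as reported on the platform described in this mail. */
enum smmu_ras_severity {
    SMMU_RAS_CE,         /* corrected error */
    SMMU_RAS_UEU,        /* uncorrected error, unrecoverable */
    SMMU_RAS_UER,        /* uncorrected error, recoverable */
};

/* Hypothetical first-pass actions; the real handling is still undecided. */
enum smmu_ras_action {
    SMMU_ACT_LOG_ONLY,   /* already corrected by hardware, just report it */
    SMMU_ACT_DISABLE,    /* assumption: disable the SMMU to avoid propagation */
};

/* Map an error source to the severity it is reported with. */
enum smmu_ras_severity smmu_ras_classify(enum smmu_ras_source src)
{
    switch (src) {
    case SMMU_ERR_ECC_1BIT:
        return SMMU_RAS_CE;
    case SMMU_ERR_ECC_2BIT:
        return SMMU_RAS_UEU;
    case SMMU_ERR_FETCH:
    default:
        return SMMU_RAS_UER;
    }
}

/* Pick a placeholder action for a severity. */
enum smmu_ras_action smmu_ras_policy(enum smmu_ras_severity sev)
{
    /* Only corrected errors need no action beyond logging. */
    if (sev == SMMU_RAS_CE)
        return SMMU_ACT_LOG_ONLY;
    return SMMU_ACT_DISABLE;
}
```

A table like this only records which errors need handling. Whether UEU and
UER should share one action, or whether UER can be routed to something like
memory_failure(), is exactly the open question in this thread.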