Subject: Re: [PATCH v8 0/7] Support RAS virtualization in KVM
From: gengdongjiu
To: James Morse
Date: Wed, 15 Nov 2017 19:06:33 +0800
Message-ID: <18af7014-abcd-4593-cd56-208a446ca7c0@huawei.com>
In-Reply-To: <5A0B132C.6070400@arm.com>
References: <1510343650-23659-1-git-send-email-gengdongjiu@huawei.com> <5A0B132C.6070400@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi James,

Thank you very much for your comments and review.

On 2017/11/15 0:00, James Morse wrote:
> Hi Dongjiu Geng,
>
> On 10/11/17 19:54, Dongjiu Geng wrote:
>> This patch series mainly does the following:
>>
>> 1. Trap accesses to the RAS ERR* registers from Non-secure EL1 to EL2.
>>    KVM does a minimal emulation: these registers are emulated as
>>    RAZ/WI in KVM.
>> 2. Route synchronous External Abort exceptions from Non-secure EL0
>>    and EL1 to EL2. When EL3 exception routing is enabled by firmware,
>>    the system traps to EL3 firmware instead of EL2 KVM. Firmware then
>>    checks whether EL2 routing is enabled: if so, it jumps to EL2 KVM,
>>    otherwise it jumps to the EL1 host kernel.
>> 3. Enable the APEI ARMv8 SEI notification so that the ACPI GHES driver
>>    parses the CPER records for SError. KVM calls handle_guest_sei() to
>>    let the ACPI driver parse the CPER record for an SError that
>>    happened in the guest.
>> 4. Although the APEI driver can handle the guest SError, not all
>>    systems support SEI notification (kernel-first, for example). So KVM
>>    also classifies the error through the Exception Syndrome Register
>>    and handles it according to the Asynchronous Error Type.
>
>> 5. If the guest SError is not propagated and not consumed, KVM returns
>>    a recoverable error status to user space, and user space specifies
>>    the guest ESR
>
> I thought we'd gone over this. There should be no RAS errors/notifications in
> user space. Only the symptoms should be sent, using the SIGBUS_MCEERR_A{O,R} if
> the kernel has handled as much as it can. This hides the actual mechanisms the
> kernel and firmware used.

Yes, I understand it.
For a guest SError that is not propagated and not consumed by the PE, where
the error address recorded by firmware is not accurate, what do you suggest?
I checked your comments in [0] again (quoted below); there you suggested a
system panic.

-----------------------------------------------------------------
"I think in this scenario your firmware should describe a memory-error with an
unknown address. (i.e. don't set the 'physical address valid' bit in CPER's
'Table 275 Memory Error Record').
When Linux gets one of these, it should panic(): We know some memory is
corrupt, we don't know where it is"
-----------------------------------------------------------------

But I do not think a panic is the better choice. You also raised this concern
in [0]:

"The fault may be in the page tables belonging to the guest kernel, even worse
they may belong to the hypervisor's stage2 page tables"

If the fault is in the page tables, killing the application frees that memory.
If another application later uses the faulty address, will it trigger another
SError? The error has still not been consumed by the PE, so we can isolate it
by killing the application.

Let's discuss host EL0: if a host EL0 application takes an SError and the
error is not consumed by the PE, do you mean we should also panic the host OS?

>
> User-space should not have to know how to handle RAS errors directly. This is a
> service the operating system provides for it. This abstraction means the same
> user-space code is portable between x86, arm64, powerpc etc.
>
> What if the firmware uses another notification method? User space should expect
> the kernel to hide things like this from it.
>
> If the kernel has no information to interpret a notification, how is user space
> supposed to know?
>
> I understand you are trying to work around your 'memory corruption at an unknown
> address'[0] problem, but if the kernel can't know where this corrupt memory is
> it should really reboot. What stops this corrupt data being swapped to disk?
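For reference, the user-space side of the SIGBUS_MCEERR_A{O,R} interface mentioned above could look roughly like the sketch below. It only illustrates that the process sees a si_code and an address, not the firmware or kernel mechanism behind them. The enum mce_action, classify_sigbus() and install_sigbus_handler() are hypothetical names made up for this sketch; BUS_MCEERR_AO, BUS_MCEERR_AR, si_code and si_addr are the real Linux siginfo interface.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks for old headers; values match the Linux UAPI definitions. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical policy values, used only in this sketch. */
enum mce_action {
    MCE_NOT_MEMORY_ERROR = 0, /* some other SIGBUS cause */
    MCE_DISCARD_PAGE_LATER,   /* action optional: data not yet consumed */
    MCE_MUST_RECOVER_NOW,     /* action required: this thread hit the error */
};

/* Map the si_code of a SIGBUS to what the application has to do. */
enum mce_action classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return MCE_DISCARD_PAGE_LATER;
    case BUS_MCEERR_AR:
        return MCE_MUST_RECOVER_NOW;
    default:
        return MCE_NOT_MEMORY_ERROR;
    }
}

/* Written by the handler, read by the main program. */
volatile sig_atomic_t last_action = MCE_NOT_MEMORY_ERROR;
void *volatile last_addr = NULL;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    /* Only record the symptom here. For BUS_MCEERR_AR a real program must
     * not simply return, because the faulting access would be retried. */
    last_action = classify_sigbus(info->si_code);
    last_addr = info->si_addr;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The same handler works regardless of whether the notification reached the kernel as an SEI, an SEA or another mechanism, which is the portability argument made in the quoted text.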
>
> Killing 'the thing' that was running at the time is not sufficient because we
> don't know that this 'got' all the users of the corrupt memory. KSM can merge

I think it would be better to use the ESB to isolate errors between EL0 and
EL1, and between different guests. An error that happens at EL0 is then
isolated to the EL0 application. When KSM is running, the ESB can synchronize
the error out instead of letting it spread to other guests.

> pages between guests. This is the difference between the error persisting
> forever killing off all the VMs one by one, and the corrupt page being silently
> re-read from disk clearing the error.
>
>
>>    and inject a virtual SError. For the other Asynchronous Error Types,
>>    KVM directly injects a virtual SError with an IMPLEMENTATION DEFINED
>>    ESR, or KVM panics if the error is fatal. With the RAS extension the
>>    guest virtual ESR must be set, because all-zero means 'RAS error:
>>    Uncategorized' instead of 'no valid ISS', so this ESR is set to
>>    IMPLEMENTATION DEFINED by default if user space does not specify it.
>
>
> Thanks,
>
> James
>
>
> [0] https://www.spinics.net/lists/arm-kernel/msg605345.html
>
> .
>
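To make the "classify the error through the Exception Syndrome Register" step and the all-zero ISS remark concrete, here is a minimal sketch of decoding an SError syndrome. It is not the code from this series. The field positions (EC in bits [31:26], IDS in bit 24, AET in bits [12:10], DFSC in bits [5:0]) and the AET encodings are my reading of the Arm ARM with the RAS extension and should be checked against it; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     0x3fu
#define ESR_EC_SERROR   0x2fu        /* exception class: SError interrupt */
#define ESR_IDS         (1u << 24)   /* IMPLEMENTATION DEFINED syndrome */
#define ESR_AET_SHIFT   10
#define ESR_AET_MASK    0x7u
#define ESR_DFSC_MASK   0x3fu
#define ESR_DFSC_SERROR 0x11u        /* asynchronous SError; AET is valid */

/* Hypothetical classification result for this sketch. */
enum serror_class {
    SERROR_NOT_SERROR,     /* EC is not the SError class */
    SERROR_IMPDEF,         /* IDS set: ISS format is implementation defined */
    SERROR_UNCATEGORIZED,  /* no severity information in the ISS */
    SERROR_CORRECTED,      /* AET = 0b110 (CE) */
    SERROR_RESTARTABLE,    /* AET = 0b010 (UEO) */
    SERROR_RECOVERABLE,    /* AET = 0b011 (UER) */
    SERROR_UNRECOVERABLE,  /* AET = 0b001 (UEU) */
    SERROR_UNCONTAINABLE,  /* AET = 0b000 (UC), or a reserved encoding */
};

enum serror_class classify_serror(uint32_t esr)
{
    if (((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_SERROR)
        return SERROR_NOT_SERROR;
    if (esr & ESR_IDS)
        return SERROR_IMPDEF;
    /* An all-zero ISS lands here: a RAS error without a category. */
    if ((esr & ESR_DFSC_MASK) != ESR_DFSC_SERROR)
        return SERROR_UNCATEGORIZED;

    switch ((esr >> ESR_AET_SHIFT) & ESR_AET_MASK) {
    case 0x6: return SERROR_CORRECTED;
    case 0x2: return SERROR_RESTARTABLE;
    case 0x3: return SERROR_RECOVERABLE;
    case 0x1: return SERROR_UNRECOVERABLE;
    default:  return SERROR_UNCONTAINABLE; /* treat unknown as worst case */
    }
}
```

Treating reserved AET values as uncontainable is a conservative choice made for this sketch; which classes lead to a virtual SError injection and which lead to a panic is exactly the policy under discussion in this thread.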