Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp1528540ybz; Thu, 16 Apr 2020 10:45:16 -0700 (PDT) X-Google-Smtp-Source: APiQypIjZusfFhys7X/DPsQwZBCfjVpMCRy72B8DSIN5nHEl+6jJfdKY4PPgP0ic7sQ6PKr+ySA9 X-Received: by 2002:aa7:c152:: with SMTP id r18mr30438793edp.378.1587059116474; Thu, 16 Apr 2020 10:45:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1587059116; cv=none; d=google.com; s=arc-20160816; b=YYHxyU/Au65nLJZw9ej+9m9hOviFkYmlD/mWkDHOLQnpLWebJo6pmapo0pjRq1l5XT KhJs4PGKeDMW0cZJcx3sMdbul/8KlDrJWhsE1rdIaZ0QtHzSFJZJ9T1dOIk8k/GgRkia +fpdClthjCmxSi3zUANmJqvnNKPtnhhtLRgoLgZNtBMYSonEhg5K6ZP1U3fWL0wcglXN BlbPEFmMj+CZvzgV4iNvP/qAbgLDKvdLd52nEIYu3dbGgfcqN6Sz3kvZYhdtlcy1S27w RsMO0d3zhcOcih9x7OWl1TneIXSksfdV39dERXCjhil+RgrSecoAAXcn580vHld1780e TzfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=3CYNJ2mt0BKK+tFl/2WUyHZaYUl8r+mqzeOqE5JmTMk=; b=TmOtLcNNjyPnSxRnFu0D0s9jDD9qXLis0qm1BRIwRSCh2SCUVVZBIOoZTyliD2lgWs WSkp1wcjHD73kD79uZfJDZiYoJAX5LRGulgrQZV8faSSXhWjF68tqT58XOzkVFNoi4QR LsrA18rTSsbouCRtggkn96IJnIoeg5YaaQdULYYvWADpYAa8PZmn+Ku30YRdpthv5eBw +FmXjZhX+dINqDvwnVUwCoMgJnPqRkWJ7gzF978BXbUtEHEnjSTEEGmf9E3cuvO2Almt 054Q4Yywwxbov6RePamRbyryMAYQZXM/s1w5IEfGVwt6Lp1e7jB6n7/abr2Cg4fmgMDE kbJQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w29si5092902edi.197.2020.04.16.10.44.53; Thu, 16 Apr 2020 10:45:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2635031AbgDPMIS (ORCPT + 99 others); Thu, 16 Apr 2020 08:08:18 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:38952 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2634744AbgDPMIK (ORCPT ); Thu, 16 Apr 2020 08:08:10 -0400 Received: from DGGEMS406-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 0B70046E24AE8485149C; Thu, 16 Apr 2020 20:08:06 +0800 (CST) Received: from [127.0.0.1] (10.142.68.147) by DGGEMS406-HUB.china.huawei.com (10.3.19.206) with Microsoft SMTP Server id 14.3.487.0; Thu, 16 Apr 2020 20:07:55 +0800 Subject: Re: [PATCH] KVM: handle the right RAS SEA(Synchronous External Abort) type To: James Morse CC: , , , , , , , , , , References: <20200411121740.37615-1-gengdongjiu@huawei.com> <0fa259ab-0e2f-a8b3-783d-24a725b4cc5d@arm.com> From: gengdongjiu Message-ID: <65414dc5-1cd7-003d-7c6a-5da62c6a4a1d@huawei.com> Date: Thu, 16 Apr 2020 20:07:52 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: <0fa259ab-0e2f-a8b3-783d-24a725b4cc5d@arm.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.142.68.147] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi James On 2020/4/14 20:18, James Morse wrote: > Hi Geng, > > On 11/04/2020 13:17, Dongjiu Geng wrote: >> When the RAS Extension is implemented, b0b011000, 0b011100, >> 0b011101, 0b011110, and 0b011111, are not used and reserved >> to the DFSC[5:0] of ESR_ELx, but the code still checks these >> unused bits, so remove them. > > They aren't unused: CPUs without the RAS extensions may still generate these. > > kvm_handle_guest_abort() wants to know if this is an external abort. > KVM doesn't really care if the CPU has the RAS extensions or not, its the arch code's job > to sort all that out. No, handle_guest_sea() ---> ghes_notify_sea ---> apei driver If it is an external abort, it will call apei driver to handle it, but it should be only SEA will call the apei driver. other type of external abort should not call apei driver. I am not see arch code sort all that out. /* Synchronous External Abort? */ if (kvm_vcpu_dabt_isextabt(vcpu)) { /* * For RAS the host kernel may handle this abort. * There is no need to pass the error into the guest. */ if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu))) return 1; } > > >> If the handling of guest ras data error fails, it should >> inject data instead of SError to let the guest recover as >> much as possible. In some hardware platform, it supports RAS, but the RAS error address will be not recorded, so it is better to inject a data abort instead of SError for thtat platform. because guest will try to do recovery for the Synchronous data abort, such as kill the error application. But for SError, guest will be panic. > > (I don't quite follow your point here). > > If KVM injected a synchronous external abort due to a RAS error here, then you wouldn't be > able to support firmware-first RAS with Qemu. I don't think this is what you want. > > > The handling is (and should be) decoupled. > > KVM guests aren't special. Whatever happens for a normal user-space process is what should > happen here. KVM is just doing the plumbing: > > When the hypervisor takes an external abort due to the guest, it should plumb the error > into the arch code to be handled. This is what would happen for a normal EL0 process. > This is what do_sea() and kvm_handle_guest_sea() do with apei_claim_sea(). > > If the RAS code says it handled this error, then we can continue. For user-space, we > return to user-space. For a guest, we return to the guest. (for user-space this piece is > not quite complete in mainline, see: > https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@arm.com/ ) > > This first part happens even if the errors are notified by IRQs, or found in a polled buffer. > > The RAS code may have 'handled' the memory by unmapping it, and marking the corresponding > page as HWPOISONed. If user-space tries to access this, it will be give an > SIGBUS:MCEERR_AR. If a guest tries to do this, the same things happens. (The signal goes > to Qemu). > (See do_page_fault()s use of the MCEERR si_code's, and kvm_send_hwpoison_signal) > > This second part is the same regardless of how the kernel discovered the RAS error in the > first place. > > > If the RAS code says it did not handle this error, it means it wasn't a RAS error, or your > platform doesn't support RAS. For an external-abort there is very little the hypervisor > can do in this situation. It does what KVM has always done: inject an asynchronous > external abort. > This should only happen if the host has failed to handle the error. KVM's use of > asynchronous abort is the simplest one size fits all. > > Are you seeing this happen? If so, what are the circumstances. Did the host handle the > error? (if not: why not!) Yes, some platform supports RAS but will not record the error address, so the host has failed to handle the error. > > > Thanks, > > James > . >