From: gengdongjiu
Date: Sun, 21 Jan 2018 10:54:45 +0800
Subject: Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
To: James Morse
Cc: Dongjiu Geng, wuquanming, linux-doc@vger.kernel.org, kvm@vger.kernel.org,
    Marc Zyngier, linux@armlinux.org.uk, linuxarm@huawei.com,
    Linux Kernel Mailing List, linux-acpi@vger.kernel.org, arm-mail-list,
    Huangshaoyu, kvmarm@lists.cs.columbia.edu, devel@acpica.org
X-Mailing-List: linux-kernel@vger.kernel.org

2018-01-13 2:05 GMT+08:00 James Morse:
> Hi gengdongjiu,
>
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
>>>> +		/*
>>>> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
>>>> +		 * error has not been propagated
>>>> +		 */
>>>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>>>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> +		return 0;
>>>
>>> We should not pass RAS notifications to user space.
>>> The kernel either handles them, or it panics(). User space shouldn't even
>>> know if the kernel supports RAS
>>
>> For ESR_ELx_AET_UER (Recoverable error), let us look at its definition
>> below, which is taken from [0]
>
> [..]
>
>> so we can see that the exception is precise and the PE can recover execution
>> from the preferred return address of the exception,
>
>> so letting the guest handle it is
>> better. For example, if it is a guest application RAS error, we can kill
>> the guest application instead of panicking the whole OS; if it is a guest
>> kernel RAS error, the guest will panic.
>
> If the kernel takes an unhandled RAS error it should panic - we don't know where
> the error is.

OK, here I will panic.

>
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
>
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
>
>
>> The host does not know which guest application has the error, so the host
>> cannot handle it,
>
> It has to work this out, otherwise the errors we can handle never get a chance.
>
> This kernel is expected to look at the error description, (which for some reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining signals
> on all affected processes.
>
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
>
> If you pass the notification to user-space, all it can do is signal the guest to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
>
>
>> Panicking the OS is not a good choice for a Recoverable error.
>
> If we don't know where the error is, and we can't make progress, it's the only
> sane choice.

OK, I will panic here.

>
> This code is never expected to run! (why are we arguing about it?) We should get
> RAS errors as GHES notifications from firmware via some mechanism. If those are
> NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
> handling based on the CPER records. If/when we get kernel-first, that can claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
>
>
>
> James
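The ordering James describes (let APEI or a future kernel-first handler claim the SError first; if nobody finds an error description, decide from the severity and never exit to user space) can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel's implementation: every name here (`handle_guest_serror`, `serror_severity`, the `SERR_*` constants, the `fake_ghes_*` stub) is hypothetical, a returned `SERROR_PANIC` stands in for calling `panic()`, and the field positions are assumed from the Arm ARM ESR_ELx ISS encoding for an SError (IDS at bit 24, DFSC in bits [5:0] with 0x11 meaning an asynchronous SError, AET in bits [12:10]). Treating unclaimed CE and UEO errors as survivable is also an assumption, based on the architecture describing them as errors the PE can make progress past.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the Arm ARM ESR_ELx
 * ISS encoding for an SError interrupt. */
#define SERR_IDS        (UINT32_C(1) << 24)  /* IMPLEMENTATION DEFINED syndrome */
#define SERR_FSC_MASK   UINT32_C(0x3f)       /* DFSC, bits [5:0] */
#define SERR_FSC_SERROR UINT32_C(0x11)       /* asynchronous SError interrupt */
#define SERR_AET_SHIFT  10                   /* AET, bits [12:10] */
#define SERR_AET_MASK   (UINT32_C(0x7) << SERR_AET_SHIFT)
#define SERR_AET_UC     (UINT32_C(0) << SERR_AET_SHIFT) /* uncontainable (also "can't tell") */
#define SERR_AET_UEU    (UINT32_C(1) << SERR_AET_SHIFT) /* unrecoverable */
#define SERR_AET_UEO    (UINT32_C(2) << SERR_AET_SHIFT) /* restartable */
#define SERR_AET_UER    (UINT32_C(3) << SERR_AET_SHIFT) /* recoverable */
#define SERR_AET_CE     (UINT32_C(6) << SERR_AET_SHIFT) /* corrected */

enum serror_action {
	SERROR_HANDLED,  /* an APEI or kernel-first handler claimed it */
	SERROR_CONTINUE, /* unclaimed, but the CPU can make progress */
	SERROR_PANIC,    /* unclaimed, and we don't know where the error is */
};

/* A claimer returns true if it found an error description (e.g. CPER
 * records) and took responsibility for recovery. */
typedef bool (*serror_claimer_fn)(uint32_t esr);

/* Hypothetical stand-in for APEI/GHES: claims only when "firmware"
 * left a CPER record for this error. */
static bool fake_ghes_has_cper;

static bool fake_ghes_claim(uint32_t esr)
{
	(void)esr;
	return fake_ghes_has_cper;
}

/* Severity from AET; anything we cannot interpret counts as UC. */
static uint32_t serror_severity(uint32_t esr)
{
	if (esr & SERR_IDS)
		return SERR_AET_UC;	/* implementation-defined syndrome */
	if ((esr & SERR_FSC_MASK) != SERR_FSC_SERROR)
		return SERR_AET_UC;	/* AET is only valid when DFSC reports an SError */
	return esr & SERR_AET_MASK;
}

static enum serror_action handle_guest_serror(uint32_t esr,
					      const serror_claimer_fn *claimers,
					      size_t n_claimers)
{
	assert(claimers != NULL || n_claimers == 0);

	/* Firmware-first (e.g. NOTIFY_SEI via GHES) or kernel-first goes first. */
	for (size_t i = 0; i < n_claimers; i++)
		if (claimers[i](esr))
			return SERROR_HANDLED;

	/* No one found an error description: decide from the severity alone.
	 * Note that no path here forwards the error to user space. */
	switch (serror_severity(esr)) {
	case SERR_AET_CE:
	case SERR_AET_UEO:
		return SERROR_CONTINUE;
	case SERR_AET_UER:
	case SERR_AET_UEU:
	case SERR_AET_UC:
	default:
		return SERROR_PANIC;
	}
}
```

In this sketch an unclaimed UER ends in `SERROR_PANIC` rather than a `KVM_EXIT_EXCEPTION` exit, matching the outcome agreed in the thread; the claimers run first because only they see the error description and physical address needed for real recovery.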