Received: by 10.213.65.68 with SMTP id h4csp3971256imn; Tue, 10 Apr 2018 07:23:03 -0700 (PDT) X-Google-Smtp-Source: AIpwx4/Nl+mNq+n7RzoeUl8clzf2EGbjdcyPkIVfa50sse5Yrhs8gGEqjUhh7gEacZbcmHRGgnVf X-Received: by 10.167.134.1 with SMTP id p1mr544766pfn.77.1523370183644; Tue, 10 Apr 2018 07:23:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523370183; cv=none; d=google.com; s=arc-20160816; b=NEQK2r3X7RJhwYaJYZm7lCyO21Wlhq8xP8hZCOjc1QdOzqgWN/K0qLBI7w9rda2Rwy zjWrUrRgWwCHp43UzgEWg3cbDaS4dhtNwJ2mKIJyFas6EuFMxXor8hR4txHcMGA7Wxdc H9jbsMJoll4Jo3bjyla8U9sEFUTmZO66BJxeh8dkRsTz4ktBP6tenaBuupsgdwLWWqFq QLLHasHsNU5f0fnYEAa3UFgVO/WytKd2qH66TAmJ3ugYbtyJbwILOwUqUcb55DpM+SNH FMsMY1Yl9zt8jAjg3f6a31j5HC3vmrs31UlOdRYX5ZofXe2UJFwsKcRD/fhX4brcTmrI F7jA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=Vl77CB5c5VXgLS7qF+UgqtKnZB0qgjzj3tEnujkaIts=; b=Y5GnvMzmOgZwOPT076C+gfSJ38tOd/jJM+wJxhHDUUa5s1gqc7pNjINO3Q8wmm50cO 7UVaI+Xdbi/+KS8eXc3gidRcEjv5CU1BfkRL0qPUPjZxUIuNatEH7/XPt5GpwJkvNt0z nAvFCJfbX8UGeplyXeR0gW5EYQI2GSC8lfjh8kck/hL1I95TAruH1GZNyVC+CvPO5+nK zIxQ8g6TY1C5CkajCALbRpll5NRGLqmXnjXoRHYfsIHTjLRw+K6nLYsYkU/Nadk14mcN X0NCAgs+/sTdwHWYITo/F8KFaHw6msCbbPsa/BH23AeNcNyGivdazVteaRHWi22cTvyx oMSw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i8si1880934pgt.279.2018.04.10.07.22.26; Tue, 10 Apr 2018 07:23:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754030AbeDJOSe (ORCPT + 99 others); Tue, 10 Apr 2018 10:18:34 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:38978 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753824AbeDJOS2 (ORCPT ); Tue, 10 Apr 2018 10:18:28 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C50621435; Tue, 10 Apr 2018 07:18:27 -0700 (PDT) Received: from [10.1.207.55] (melchizedek.cambridge.arm.com [10.1.207.55]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2792A3F59F; Tue, 10 Apr 2018 07:18:24 -0700 (PDT) Subject: Re: [PATCH v11 2/4] arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS To: Dongjiu Geng , devel@acpica.org Cc: rkrcmar@redhat.com, corbet@lwn.net, christoffer.dall@linaro.org, marc.zyngier@arm.com, linux@armlinux.org.uk, catalin.marinas@arm.com, rjw@rjwysocki.net, bp@alien8.de, lenb@kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, linux-acpi@vger.kernel.org, huangshaoyu@huawei.com, zhengxiang9@huawei.com References: <1523309796-36423-1-git-send-email-gengdongjiu@huawei.com> <1523309796-36423-3-git-send-email-gengdongjiu@huawei.com> From: James Morse Message-ID: Date: Tue, 10 Apr 2018 15:15:35 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <1523309796-36423-3-git-send-email-gengdongjiu@huawei.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Dongjiu Geng, On 09/04/18 22:36, Dongjiu Geng wrote: > This new IOCTL exports user-invisible states related to SError. > Together with appropriate user space changes, it can inject > SError with specified syndrome to guest by setup kvm_vcpu_events > value. > Also it can support live migration. Could you explain what user-space is expected to do for this? (this is also relevant for snapshot-ing/suspending VMs) It's probably worth noting that this solves an existing problem: KVM may make an SError pending, but user-space has no way to discover/migrate this. > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 8a3d708..45719b4 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -819,11 +819,13 @@ struct kvm_clock_data { > > Capability: KVM_CAP_VCPU_EVENTS > Extended by: KVM_CAP_INTR_SHADOW > -Architectures: x86 > +Architectures: x86, arm, arm64 > Type: vm ioctl > Parameters: struct kvm_vcpu_event (out) > Returns: 0 on success, -1 on error > > +X86: > + > Gets currently pending exceptions, interrupts, and NMIs as well as related > states of the vcpu. > > @@ -865,15 +867,31 @@ Only two fields are defined in the flags field: > - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that > smi contains a valid state. > > +ARM, ARM64: > + > +Gets currently pending SError exceptions as well as related states of the vcpu. > + > +struct kvm_vcpu_events { > + struct { > + __u8 serror_pending; > + __u8 serror_has_esr; > + /* Align it to 4 bytes */ > + __u8 pad[2]; > + __u64 serror_esr; > + } exception; > +}; > + I'm not convinced we should change this struct from the layout/size x86 has. Its confusing for the documentation, is this API call really the same on all architectures? What if we want to add some future interrupt, NMI or related state? We've found ourselves needing to add this API, it seems odd to remove its other uses on x86. We can't put them back in the future. Having a different layout would force user-space to ifdef/duplicate any code that accesses this between architectures. The compiler will want that __u64 to be naturally aligned to 8-bytes, so your 4-byte padding still causes some secret compiler-padding to be inserted. Different versions of the compiler may put it in different places. > 4.32 KVM_SET_VCPU_EVENTS > > Capability: KVM_CAP_VCPU_EVENTS > Extended by: KVM_CAP_INTR_SHADOW > -Architectures: x86 > +Architectures: x86, arm, arm64 > Type: vm ioctl > Parameters: struct kvm_vcpu_event (in) > Returns: 0 on success, -1 on error > > +X86: > + > Set pending exceptions, interrupts, and NMIs as well as related states of the > vcpu. > > @@ -894,6 +912,12 @@ shall be written into the VCPU. > > KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. > > +ARM, ARM64: > + > +Set pending SError exceptions as well as related states of the vcpu. > + > +See KVM_GET_VCPU_EVENTS for the data structure. > + > > 4.33 KVM_GET_DEBUGREGS > > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h > index 9abbf30..855cc9a 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -39,6 +39,7 @@ > #define __KVM_HAVE_GUEST_DEBUG > #define __KVM_HAVE_IRQ_LINE > #define __KVM_HAVE_READONLY_MEM > +#define __KVM_HAVE_VCPU_EVENTS > > #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 > > @@ -153,6 +154,17 @@ struct kvm_sync_regs { > struct kvm_arch_memory_slot { > }; > > +/* for KVM_GET/SET_VCPU_EVENTS */ > +struct kvm_vcpu_events { > + struct { > + __u8 serror_pending; > + __u8 serror_has_esr; > + /* Align it to 4 bytes */ > + __u8 pad[2]; (padding noted above) > + __u64 serror_esr; > + } exception; > +}; > + > /* If you need to interpret the index values, here is the key: */ > #define KVM_REG_ARM_COPROC_MASK 0x000000000FFF0000 > #define KVM_REG_ARM_COPROC_SHIFT 16 > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > index 5c7f657..42e1222 100644 > --- a/arch/arm64/kvm/guest.c > +++ b/arch/arm64/kvm/guest.c > @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, > return -EINVAL; > } > > +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu, > + struct kvm_vcpu_events *events) > +{ > + events->exception.serror_pending = (vcpu_get_hcr(vcpu) & HCR_VSE); > + events->exception.serror_has_esr = > + cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && > + (!!vcpu_get_vsesr(vcpu)); > + events->exception.serror_esr = vcpu_get_vsesr(vcpu); This will return a stale ESR even if nothing is pending. On systems without the RAS extensions it will return 'ESR_ELx_ISV' if kvm_inject_vabt() has ever been called for this CPU. Could we hide this behind (pending && has_esr), setting it to 0 otherwise. This is just to avoid exposing the stale value. > + > + return 0; > +} > +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, > + struct kvm_vcpu_events *events) > +{ > + bool injected = events->exception.serror_pending; > + bool has_esr = events->exception.serror_has_esr; > + > + if (injected && has_esr) { > + if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) > + return -EINVAL; > + > + kvm_set_sei_esr(vcpu, events->exception.serror_esr); > + > + } else if (injected) { > + kvm_inject_vabt(vcpu); Nit: looks like 'injected' is misnamed. > + } > + > + return 0; > +} > + > int __attribute_const__ kvm_target_cpu(void) > { > unsigned long implementor = read_cpuid_implementor(); > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 38c8a64..20e919a 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -82,6 +82,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) > break; > case KVM_CAP_SET_GUEST_DEBUG: > case KVM_CAP_VCPU_ATTRIBUTES: > + case KVM_CAP_VCPU_EVENTS: > r = 1; > break; > default: > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 7e3941f..945655d 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -1051,6 +1051,27 @@ long kvm_arch_vcpu_ioctl(struct file *filp, > return -EFAULT; > return kvm_arm_vcpu_has_attr(vcpu, &attr); > } > + case KVM_GET_VCPU_EVENTS: { > + struct kvm_vcpu_events events; > + > + memset(&events, 0, sizeof(struct kvm_vcpu_events)); sizeof(events) is the normal style here, it means if someone changes event's type we don't get any surprises... > + if (kvm_arm_vcpu_get_events(vcpu, &events)) > + return -EINVAL; > + > + if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events))) sizeof(events) > + return -EFAULT; > + > + return 0; > + } > + case KVM_SET_VCPU_EVENTS: { > + struct kvm_vcpu_events events; > + > + if (copy_from_user(&events, argp, > + sizeof(struct kvm_vcpu_events))) > + return -EFAULT; > + > + return kvm_arm_vcpu_set_events(vcpu, &events); > + } > default: > return -EINVAL; > } > Despite KVM_CAP_VCPU_EVENTS not being advertised on 32bit, any attempt to call it will still end up in here, but will always fail as the {g,s}et_events() calls always return -EINVAL. I don't think this will cause us any problems. Thanks, James