Received: by 10.223.176.5 with SMTP id f5csp2320864wra; Mon, 5 Feb 2018 01:59:24 -0800 (PST) X-Google-Smtp-Source: AH8x224xO2Gy0b2E3LFhLz3IS1RzyqnnQoxUHPLJQLMNJCX2G2289DHM9KT0i8Qfeg7lvlnGARdG X-Received: by 2002:a17:902:b2c2:: with SMTP id x2-v6mr13266744plw.414.1517824764817; Mon, 05 Feb 2018 01:59:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517824764; cv=none; d=google.com; s=arc-20160816; b=lBX2WUoXXUrhLcteBbDt3ivwXmavcOkaQykQuUeOMomn9R8FlL1u/mNmr7VGsEBDfn LC3T+RJf6pQ90VIPpvy3zlmOWnq+phPHZxBlxUINSInoDGlvmbAPF/Ssv2nRFA8spMgJ PVQ2P3Q24ssBG4aEZUnJGQYXd4t5NdgUoUvTmX8Z+/26ioP6naPOpHoHTG/yOQonDuxQ UguQTwlXk+Vfr+1ScarN+pXitTbbmxpstx3WIJHRibU1Miok8e2/WKhL2H8nhBqauVBx BM7TLo5KruKXdANmIxjNWNJug9G455XdylZrg3PIyRoDZhS7tEYAvXs4CJzmLyvIxm0Y FuBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=FfMH88+evAb7MJNNgAqwHH0W0u/vAITfgSAkZeq3naA=; b=Bh3ce/jXyk2SblVbPhGhDscvFiTn1suVjv/dmkS3kKDMWjTJ3bYl3n+VF/spXBG+fs mRxt/pgX6s9vgPdfoiV1Rwcb0IQRawd+fzQUcEjYsW3WTl1CGt0nSTUO990ZfCMgCSwa jm3wnrdhNnW0h3yP7FYEZSy+gLQYopmT1pAIg//vmyRMuiPC9PVymAtwysivIuzryjys ta6jcw2dQVn/zsU5o9W596EC3NpbZhXMZ6wJRClhQ5TH5wpzm/oX6rmCLwakIb0MtM0W ACEdiO2V3FG1Hbje+H0921vpnuQISD/KVAifDcZ/9qqB7rR3e8pSZ8SN72S99i7XNnWg UqKA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v10-v6si6553662ply.177.2018.02.05.01.59.09; Mon, 05 Feb 2018 01:59:24 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752719AbeBEJ6g (ORCPT + 99 others); Mon, 5 Feb 2018 04:58:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:48996 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751705AbeBEJ63 (ORCPT ); Mon, 5 Feb 2018 04:58:29 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AC6BDC04D317; Mon, 5 Feb 2018 09:58:29 +0000 (UTC) Received: from hawk.localdomain (unknown [10.40.205.34]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1D0A95D6A3; Mon, 5 Feb 2018 09:58:25 +0000 (UTC) Date: Mon, 5 Feb 2018 10:58:23 +0100 From: Andrew Jones To: Marc Zyngier Cc: Christoffer Dall , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, Catalin Marinas , Will Deacon , Peter Maydell , Lorenzo Pieralisi , Mark Rutland , Robin Murphy , Ard Biesheuvel , Hanjun Guo , Jayachandran C , Jon Masters , Russell King - ARM Linux Subject: Re: [PATCH v3 08/18] arm/arm64: KVM: Add PSCI version selection API Message-ID: <20180205095823.r5vlfkovrgksykxh@hawk.localdomain> References: <20180201114657.7323-1-marc.zyngier@arm.com> <20180201114657.7323-9-marc.zyngier@arm.com> <20180202201706.dj6jmff7lh7bleze@kamzik.brq.redhat.com> <20180203115932.24c4bcfb@why.wild-wind.fr.eu.org> <20180204123701.GK21802@cbox> <6b0d5f23-950c-b2a6-3cc8-63c3145893b4@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6b0d5f23-950c-b2a6-3cc8-63c3145893b4@arm.com> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 05 Feb 2018 09:58:29 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 05, 2018 at 09:24:33AM +0000, Marc Zyngier wrote: > On 04/02/18 12:37, Christoffer Dall wrote: > > On Sat, Feb 03, 2018 at 11:59:32AM +0000, Marc Zyngier wrote: > >> On Fri, 2 Feb 2018 21:17:06 +0100 > >> Andrew Jones wrote: > >> > >>> On Thu, Feb 01, 2018 at 11:46:47AM +0000, Marc Zyngier wrote: > >>>> Although we've implemented PSCI 1.0 and 1.1, nothing can select them > >>>> Since all the new PSCI versions are backward compatible, we decide to > >>>> default to the latest version of the PSCI implementation. This is no > >>>> different from doing a firmware upgrade on KVM. > >>>> > >>>> But in order to give a chance to hypothetical badly implemented guests > >>>> that would have a fit by discovering something other than PSCI 0.2, > >>>> let's provide a new API that allows userspace to pick one particular > >>>> version of the API. > >>>> > >>>> This is implemented as a new class of "firmware" registers, where > >>>> we expose the PSCI version. This allows the PSCI version to be > >>>> save/restored as part of a guest migration, and also set to > >>>> any supported version if the guest requires it. > >>>> > >>>> Signed-off-by: Marc Zyngier > >>>> --- > >>>> Documentation/virtual/kvm/api.txt | 3 +- > >>>> Documentation/virtual/kvm/arm/psci.txt | 30 +++++++++++++++ > >>>> arch/arm/include/asm/kvm_host.h | 3 ++ > >>>> arch/arm/include/uapi/asm/kvm.h | 6 +++ > >>>> arch/arm/kvm/guest.c | 13 +++++++ > >>>> arch/arm64/include/asm/kvm_host.h | 3 ++ > >>>> arch/arm64/include/uapi/asm/kvm.h | 6 +++ > >>>> arch/arm64/kvm/guest.c | 14 ++++++- > >>>> include/kvm/arm_psci.h | 9 +++++ > >>>> virt/kvm/arm/psci.c | 68 +++++++++++++++++++++++++++++++++- > >>>> 10 files changed, 151 insertions(+), 4 deletions(-) > >>>> create mode 100644 Documentation/virtual/kvm/arm/psci.txt > >>>> > >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > >>>> index 57d3ee9e4bde..334905202141 100644 > >>>> --- a/Documentation/virtual/kvm/api.txt > >>>> +++ b/Documentation/virtual/kvm/api.txt > >>>> @@ -2493,7 +2493,8 @@ Possible features: > >>>> and execute guest code when KVM_RUN is called. > >>>> - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode. > >>>> Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only). > >>>> - - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU. > >>>> + - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision > >>>> + backward compatible with v0.2) for the CPU. > >>>> Depends on KVM_CAP_ARM_PSCI_0_2. > >>>> - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. > >>>> Depends on KVM_CAP_ARM_PMU_V3. > >>>> diff --git a/Documentation/virtual/kvm/arm/psci.txt b/Documentation/virtual/kvm/arm/psci.txt > >>>> new file mode 100644 > >>>> index 000000000000..aafdab887b04 > >>>> --- /dev/null > >>>> +++ b/Documentation/virtual/kvm/arm/psci.txt > >>>> @@ -0,0 +1,30 @@ > >>>> +KVM implements the PSCI (Power State Coordination Interface) > >>>> +specification in order to provide services such as CPU on/off, reset > >>>> +and power-off to the guest. > >>>> + > >>>> +The PSCI specification is regularly updated to provide new features, > >>>> +and KVM implements these updates if they make sense from a virtualization > >>>> +point of view. > >>>> + > >>>> +This means that a guest booted on two different versions of KVM can > >>>> +observe two different "firmware" revisions. This could cause issues if > >>>> +a given guest is tied to a particular PSCI revision (unlikely), or if > >>>> +a migration causes a different PSCI version to be exposed out of the > >>>> +blue to an unsuspecting guest. > >>>> + > >>>> +In order to remedy this situation, KVM exposes a set of "firmware > >>>> +pseudo-registers" that can be manipulated using the GET/SET_ONE_REG > >>>> +interface. These registers can be saved/restored by userspace, and set > >>>> +to a convenient value if required. > >>>> + > >>>> +The following register is defined: > >>>> + > >>>> +* KVM_REG_ARM_PSCI_VERSION: > >>>> + > >>>> + - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set > >>>> + (and thus has already been initialized) > >>>> + - Returns the current PSCI version on GET_ONE_REG (defaulting to the > >>>> + highest PSCI version implemented by KVM and compatible with v0.2) > >>>> + - Allows any PSCI version implemented by KVM and compatible with > >>>> + v0.2 to be set with SET_ONE_REG > >>>> + - Affects the whole VM (even if the register view is per-vcpu) > >>> > >> > >> Hi Drew, > >> > >> Thanks for looking into this, and for the exhaustive data. > >> > >>> > >>> I've put some more thought and experimentation into this. I think we > >>> should change to a vcpu feature bit. The feature bit would be used to > >>> force compat mode, v0.2, so KVM would still enable the new PSCI > >>> version by default. Below are two tables describing why I think we > >>> should switch to something other than a new sysreg, and below those > >>> tables are notes as to why I think we should use a vcpu feature. The > >>> asterisks in the tables point out behaviors that aren't what we want. > >>> While both tables have an asterisk, the sysreg approach's issue is > >>> bug. The vcpu feature approach's issue is risk incurred from an > >>> unsupported migration, albeit one that is hard to detect without a > >>> new machine type. > >>> > >>> +-----------------------------------------------------------------------+ > >>> | sysreg approach | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | migration | userspace | works | notes | > >>> | | change | | | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | new -> new | NO | YES | Expected | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | old -> new | NO | YES | PSCI 1.0 is backward compatible| > >>> +------------------+-----------+-------+--------------------------------+ > >>> | new -> old | NO | NO | Migration fails due to the new | > >>> | | | | sysreg. Migration shouldn't | > >>> | | | | have been attempted, but no | > >>> | | | | way to know without a new | > >>> | | | | machine type. | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | compat -> old | YES | NO* | Even when setting PSCI version | > >>> | | | | to 0.2, we add a new sysreg, | > >>> | | | | so migration will still fail. | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | old -> compat | YES | YES | It's OK for the destination to | > >>> | | | | support more sysregs than the | > >>> | | | | source sends. | > >>> +------------------+-----------+-------+--------------------------------+ > >>> > >>> > >>> +-----------------------------------------------------------------------+ > >>> | vcpu feature approach | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | migration | userspace | works | notes | > >>> | | change | | | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | new -> new | NO | YES | Expected | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | old -> new | NO | YES | PSCI 1.0 is backward compatible| > >>> +------------------+-----------+-------+--------------------------------+ > >>> | new -> old | NO | YES* | Migrates, but it's not safe | > >>> | | | | for the guest kernel, and no | > >>> | | | | way to know without a new | > >>> | | | | machine type. | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | compat -> old | YES | YES | Expected | > >>> +------------------+-----------+-------+--------------------------------+ > >>> | old -> compat | YES | YES | Expected | > >>> +------------------+-----------+-------+--------------------------------+ > >>> > >>> > >>> Notes as to why the vcpu feature approach was selected: > >>> > >>> 1) While this is VM state, and thus a VM control would be a more natural > >>> fit, a new vcpu feature bit would be much less new code. We also > >>> already have a PSCI vcpu feature bit, so a new one will actually fit > >>> quite well. > >>> > >>> 2) No new state needs to be migrated, as we already migrate the feature > >>> bitmap. Unlike, KVM, QEMU doesn't track the max number of features, > >>> so bumping it one more in KVM doesn't require a QEMU change. > >>> > >>> > >>> If we switch to a vcpu feature bit, then I think this patch can be > >>> replaced with something like this > >> > >> A couple of remarks: > >> > >> - My worry with this feature bit is that it is a point fix, and it > >> doesn't scale. Come PSCI 1.2 and WORKAROUND_2, what do we do? Add > >> another feature bit that says "force to 1.0"? I'd really like > >> something we can live with in the long run, and "KVM as firmware" > >> needs to be able to evolve without requiring a new userspace > >> interface each time we rev it. > >> > >> - The "compat->old" entry in your sysreg table is not quite fair. In > >> the feature table, you teach userspace about the new feature bit. You > >> could just as well teach userspace about the new sysreg. Yes, things > >> may be different in QEMU, but that's not what we're talking about > >> here. > >> > >> - Allowing a guest to migrate in an unsafe way seems worse than failing > >> a migration unexpectedly. Or at least not any better. > >> > >> To be clear: I'm not dismissing the idea at all, but I want to make sure > >> we're not cornering ourselves into an uncomfortable place. > >> > >> Christoffer, Peter, what are your thoughts on this? > >> > > > > Taking a step back, the only reasons why this patch isn't simply > > enabling PSCI v1.0 by default (without any selection method) are that we > > (1) want to support guests that complain about PSCI_VERSION != 0.2 > > (which isn't completely outside the realm of a reasonable implementation > > if you read the description of PSCI_VERSION in the 0.2 spec) and (2) to > > provide migration support for guests that call > > PSCI_1_0_FN_PSCI_FEATURES. > > > > If we ignore (1) because we don't know of any guests where this is an > > issue, then it's all about (2), migration from "new -> old". > > > > As far as I can tell, the use case we are worried about here is updating > > the kernel (and not QEMU) on half of your data center and then trying to > > migrate from the upgraded kernel machine to a legacy (and potentially > > variant 2 vulnerable) machine. For this specific move from PSCI 0.2 to > > 1.0 with the included mitigation, I don't really think this is an > > important use case to support. > > I'm not so sure. Promising mitigation to a guest, and then seeing that > mitigation being silently taken away because we've allowed it to migrate > seem bad to me. > > > In terms of the more general approach to "KVM firmware upgrades" and > > migration, I think something like the proposed FW register interface > > here should work, but I'm concerned about the lack of opportunity from > > userspace to predict a migration failure. But I don't understand why > > Userspace could predict some of the failure cases, if only by checking > that all registers can be restored in a new guest. I'm not sure how > viable this is in a data centre type of environment. > > > this requires a new machine type? Why can't we simply provide a KVM > > capability that libvirt etc. can query for? > > > > Also, is it generally true that we can't expose any additional system > > registers from KVM without breaking migration and we don't have any > > method to deal with that in userspace and upper layers? If that's true, > > It is my understanding that each time we add a new sysreg to KVM, > migration in QEMU breaks in the new->old direction. > > > that's a bigger problem in general and something we should work on > > trying to solve. If it's not true, then there should be some method to > > deal with the FW register already (like capabilities). > > > > Given the urgency of adding mitigation towards variant 2 which is the > > driver for this work, I think we should drop the compat functionality in > > this series and work this out later on if needed. I think we can just > > tweak the previous patch to enable PSCI 1.0 by default and drop this > > patch for the current merge window. > > I'd be fine with that, as long as we have a clear agreement on the > impact of such a move. Yeah, that's what I was trying to figure out with my fancy tables. I might be coming around more to your approach now, though. Ensuring the new->old migration fails is a nice feature of this series. It would be good if we could preserve that behavior without committing to a new userspace interface, but I'm not sure how. Maybe I should just apologize for the noise, and this patch be left as is... Thanks, drew