Received: by 10.223.176.5 with SMTP id f5csp2313819wra; Mon, 5 Feb 2018 01:50:05 -0800 (PST) X-Google-Smtp-Source: AH8x224Zy3vYCI4k4WL/15y6PUGrHncWMs+SOBPHBydmMfCgN6gwcCDNbgTvW7+mqaFj8SpPW1uR X-Received: by 10.98.20.77 with SMTP id 74mr9934248pfu.45.1517824205295; Mon, 05 Feb 2018 01:50:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517824205; cv=none; d=google.com; s=arc-20160816; b=bSQWUu9ESbYjOF6OC9JMsAl9oHK3XJqaq1KJ4a0dcRAlxAQUMpvWCa6GHSn9gEhBYt Pp3aW5gKod6p15x84P9p19K2l7YbvElNk6LBVRaf6feg1KWD7OciEUTr1nW84+d2SS4Z zgoxITeiGubCDrSEftfadjob7/2xZJ6XGBRw9hdQlJdXgTSJEjGK7qLS2iinezAdy95b gpMB97XNLuQQMUPoPQ2GtOAz9QRkYfV+vJrKMYua5Wj2k8rA62owRQ/hYYCx9lltopFB GpBg//eL/ESx0OaW7RvG7gA/l+JD1sen4C/QONtNEuXnD8mMz7UrVZsE7bFyLQEHr+BU OY4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=XHRnHQNp1RRUCdK0CrbmjoTb0HyDjfoijklUEsyx7CQ=; b=rP2F5uOhbrjoGW+EjN5esNKItiZPF8g+P6AXh2cq8ibq02B3+5+zNXBE9mexSCawhQ +ugfMfIOjE3vLZKojj8RO0SHLZHIVn26dbvwNSWyvhPCHtHrovp6qAukXotdINsBtwEM TmiN6aez3UUPXONdkPHZYOtbzpDPKIr6oXiKh8gRnIU9IGNfa69RpA2CSn52uGyRDoCh agqy1iSIZaHPuDZzXJVGT0zriGNaG6aBnRL+ktRcH0QPxxljsEPeqD026BqhTEOeOjJt aApCLcmGeR8K5J2g1Esr+et5BttXW5PLMTObg6P3TKjX1NO8H/bZkuLF8drRr94Y6Uar GF0g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j78si4737259pfa.332.2018.02.05.01.49.50; Mon, 05 Feb 2018 01:50:05 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752851AbeBEJsF (ORCPT + 99 others); Mon, 5 Feb 2018 04:48:05 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752779AbeBEJru (ORCPT ); Mon, 5 Feb 2018 04:47:50 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C32F7C047B97; Mon, 5 Feb 2018 09:47:49 +0000 (UTC) Received: from hawk.localdomain (unknown [10.40.205.34]) by smtp.corp.redhat.com (Postfix) with ESMTPS id DFE9C2C8DA; Mon, 5 Feb 2018 09:47:45 +0000 (UTC) Date: Mon, 5 Feb 2018 10:47:42 +0100 From: Andrew Jones To: Christoffer Dall Cc: Marc Zyngier , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, Catalin Marinas , Will Deacon , Peter Maydell , Lorenzo Pieralisi , Mark Rutland , Robin Murphy , Ard Biesheuvel , Hanjun Guo , Jayachandran C , Jon Masters , Russell King - ARM Linux Subject: Re: [PATCH v3 08/18] arm/arm64: KVM: Add PSCI version selection API Message-ID: <20180205094742.dxcejlvmpndwcupg@hawk.localdomain> References: <20180201114657.7323-1-marc.zyngier@arm.com> <20180201114657.7323-9-marc.zyngier@arm.com> <20180202201706.dj6jmff7lh7bleze@kamzik.brq.redhat.com> <20180203115932.24c4bcfb@why.wild-wind.fr.eu.org> <20180204123701.GK21802@cbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180204123701.GK21802@cbox> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Mon, 05 Feb 2018 09:47:50 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Feb 04, 2018 at 01:37:01PM +0100, Christoffer Dall wrote: > On Sat, Feb 03, 2018 at 11:59:32AM +0000, Marc Zyngier wrote: > > On Fri, 2 Feb 2018 21:17:06 +0100 > > Andrew Jones wrote: > > > > > On Thu, Feb 01, 2018 at 11:46:47AM +0000, Marc Zyngier wrote: > > > > Although we've implemented PSCI 1.0 and 1.1, nothing can select them > > > > Since all the new PSCI versions are backward compatible, we decide to > > > > default to the latest version of the PSCI implementation. This is no > > > > different from doing a firmware upgrade on KVM. > > > > > > > > But in order to give a chance to hypothetical badly implemented guests > > > > that would have a fit by discovering something other than PSCI 0.2, > > > > let's provide a new API that allows userspace to pick one particular > > > > version of the API. > > > > > > > > This is implemented as a new class of "firmware" registers, where > > > > we expose the PSCI version. This allows the PSCI version to be > > > > save/restored as part of a guest migration, and also set to > > > > any supported version if the guest requires it. > > > > > > > > Signed-off-by: Marc Zyngier > > > > --- > > > > Documentation/virtual/kvm/api.txt | 3 +- > > > > Documentation/virtual/kvm/arm/psci.txt | 30 +++++++++++++++ > > > > arch/arm/include/asm/kvm_host.h | 3 ++ > > > > arch/arm/include/uapi/asm/kvm.h | 6 +++ > > > > arch/arm/kvm/guest.c | 13 +++++++ > > > > arch/arm64/include/asm/kvm_host.h | 3 ++ > > > > arch/arm64/include/uapi/asm/kvm.h | 6 +++ > > > > arch/arm64/kvm/guest.c | 14 ++++++- > > > > include/kvm/arm_psci.h | 9 +++++ > > > > virt/kvm/arm/psci.c | 68 +++++++++++++++++++++++++++++++++- > > > > 10 files changed, 151 insertions(+), 4 deletions(-) > > > > create mode 100644 Documentation/virtual/kvm/arm/psci.txt > > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > > > index 57d3ee9e4bde..334905202141 100644 > > > > --- a/Documentation/virtual/kvm/api.txt > > > > +++ b/Documentation/virtual/kvm/api.txt > > > > @@ -2493,7 +2493,8 @@ Possible features: > > > > and execute guest code when KVM_RUN is called. > > > > - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode. > > > > Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only). > > > > - - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU. > > > > + - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision > > > > + backward compatible with v0.2) for the CPU. > > > > Depends on KVM_CAP_ARM_PSCI_0_2. > > > > - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. > > > > Depends on KVM_CAP_ARM_PMU_V3. > > > > diff --git a/Documentation/virtual/kvm/arm/psci.txt b/Documentation/virtual/kvm/arm/psci.txt > > > > new file mode 100644 > > > > index 000000000000..aafdab887b04 > > > > --- /dev/null > > > > +++ b/Documentation/virtual/kvm/arm/psci.txt > > > > @@ -0,0 +1,30 @@ > > > > +KVM implements the PSCI (Power State Coordination Interface) > > > > +specification in order to provide services such as CPU on/off, reset > > > > +and power-off to the guest. > > > > + > > > > +The PSCI specification is regularly updated to provide new features, > > > > +and KVM implements these updates if they make sense from a virtualization > > > > +point of view. > > > > + > > > > +This means that a guest booted on two different versions of KVM can > > > > +observe two different "firmware" revisions. This could cause issues if > > > > +a given guest is tied to a particular PSCI revision (unlikely), or if > > > > +a migration causes a different PSCI version to be exposed out of the > > > > +blue to an unsuspecting guest. > > > > + > > > > +In order to remedy this situation, KVM exposes a set of "firmware > > > > +pseudo-registers" that can be manipulated using the GET/SET_ONE_REG > > > > +interface. These registers can be saved/restored by userspace, and set > > > > +to a convenient value if required. > > > > + > > > > +The following register is defined: > > > > + > > > > +* KVM_REG_ARM_PSCI_VERSION: > > > > + > > > > + - Only valid if the vcpu has the KVM_ARM_VCPU_PSCI_0_2 feature set > > > > + (and thus has already been initialized) > > > > + - Returns the current PSCI version on GET_ONE_REG (defaulting to the > > > > + highest PSCI version implemented by KVM and compatible with v0.2) > > > > + - Allows any PSCI version implemented by KVM and compatible with > > > > + v0.2 to be set with SET_ONE_REG > > > > + - Affects the whole VM (even if the register view is per-vcpu) > > > > > > > Hi Drew, > > > > Thanks for looking into this, and for the exhaustive data. > > > > > > > > I've put some more thought and experimentation into this. I think we > > > should change to a vcpu feature bit. The feature bit would be used to > > > force compat mode, v0.2, so KVM would still enable the new PSCI > > > version by default. Below are two tables describing why I think we > > > should switch to something other than a new sysreg, and below those > > > tables are notes as to why I think we should use a vcpu feature. The > > > asterisks in the tables point out behaviors that aren't what we want. > > > While both tables have an asterisk, the sysreg approach's issue is > > > bug. The vcpu feature approach's issue is risk incurred from an > > > unsupported migration, albeit one that is hard to detect without a > > > new machine type. > > > > > > +-----------------------------------------------------------------------+ > > > | sysreg approach | > > > +------------------+-----------+-------+--------------------------------+ > > > | migration | userspace | works | notes | > > > | | change | | | > > > +------------------+-----------+-------+--------------------------------+ > > > | new -> new | NO | YES | Expected | > > > +------------------+-----------+-------+--------------------------------+ > > > | old -> new | NO | YES | PSCI 1.0 is backward compatible| > > > +------------------+-----------+-------+--------------------------------+ > > > | new -> old | NO | NO | Migration fails due to the new | > > > | | | | sysreg. Migration shouldn't | > > > | | | | have been attempted, but no | > > > | | | | way to know without a new | > > > | | | | machine type. | > > > +------------------+-----------+-------+--------------------------------+ > > > | compat -> old | YES | NO* | Even when setting PSCI version | > > > | | | | to 0.2, we add a new sysreg, | > > > | | | | so migration will still fail. | > > > +------------------+-----------+-------+--------------------------------+ > > > | old -> compat | YES | YES | It's OK for the destination to | > > > | | | | support more sysregs than the | > > > | | | | source sends. | > > > +------------------+-----------+-------+--------------------------------+ > > > > > > > > > +-----------------------------------------------------------------------+ > > > | vcpu feature approach | > > > +------------------+-----------+-------+--------------------------------+ > > > | migration | userspace | works | notes | > > > | | change | | | > > > +------------------+-----------+-------+--------------------------------+ > > > | new -> new | NO | YES | Expected | > > > +------------------+-----------+-------+--------------------------------+ > > > | old -> new | NO | YES | PSCI 1.0 is backward compatible| > > > +------------------+-----------+-------+--------------------------------+ > > > | new -> old | NO | YES* | Migrates, but it's not safe | > > > | | | | for the guest kernel, and no | > > > | | | | way to know without a new | > > > | | | | machine type. | > > > +------------------+-----------+-------+--------------------------------+ > > > | compat -> old | YES | YES | Expected | > > > +------------------+-----------+-------+--------------------------------+ > > > | old -> compat | YES | YES | Expected | > > > +------------------+-----------+-------+--------------------------------+ > > > > > > > > > Notes as to why the vcpu feature approach was selected: > > > > > > 1) While this is VM state, and thus a VM control would be a more natural > > > fit, a new vcpu feature bit would be much less new code. We also > > > already have a PSCI vcpu feature bit, so a new one will actually fit > > > quite well. > > > > > > 2) No new state needs to be migrated, as we already migrate the feature > > > bitmap. Unlike, KVM, QEMU doesn't track the max number of features, > > > so bumping it one more in KVM doesn't require a QEMU change. > > > > > > > > > If we switch to a vcpu feature bit, then I think this patch can be > > > replaced with something like this > > > > A couple of remarks: > > > > - My worry with this feature bit is that it is a point fix, and it > > doesn't scale. Come PSCI 1.2 and WORKAROUND_2, what do we do? Add > > another feature bit that says "force to 1.0"? I'd really like > > something we can live with in the long run, and "KVM as firmware" > > needs to be able to evolve without requiring a new userspace > > interface each time we rev it. > > > > - The "compat->old" entry in your sysreg table is not quite fair. In > > the feature table, you teach userspace about the new feature bit. You > > could just as well teach userspace about the new sysreg. Yes, things > > may be different in QEMU, but that's not what we're talking about > > here. > > > > - Allowing a guest to migrate in an unsafe way seems worse than failing > > a migration unexpectedly. Or at least not any better. > > > > To be clear: I'm not dismissing the idea at all, but I want to make sure > > we're not cornering ourselves into an uncomfortable place. > > > > Christoffer, Peter, what are your thoughts on this? > > > > Taking a step back, the only reasons why this patch isn't simply > enabling PSCI v1.0 by default (without any selection method) are that we > (1) want to support guests that complain about PSCI_VERSION != 0.2 > (which isn't completely outside the realm of a reasonable implementation > if you read the description of PSCI_VERSION in the 0.2 spec) and (2) to > provide migration support for guests that call > PSCI_1_0_FN_PSCI_FEATURES. > > If we ignore (1) because we don't know of any guests where this is an > issue, then it's all about (2), migration from "new -> old". > > As far as I can tell, the use case we are worried about here is updating > the kernel (and not QEMU) on half of your data center and then trying to > migrate from the upgraded kernel machine to a legacy (and potentially > variant 2 vulnerable) machine. For this specific move from PSCI 0.2 to > 1.0 with the included mitigation, I don't really think this is an > important use case to support. > > In terms of the more general approach to "KVM firmware upgrades" and > migration, I think something like the proposed FW register interface > here should work, but I'm concerned about the lack of opportunity from > userspace to predict a migration failure. But I don't understand why > this requires a new machine type? Why can't we simply provide a KVM > capability that libvirt etc. can query for? Right, just exposing (or not) a property (which will become a libvirt capability) should work for the management layers to determine if a migration will fail, and then not attempt it. We just tend to avoid allowing new properties from appearing in old machine types, and thus new machine types would be the only ones exposing it. I'm not sure if we need to be so strict though. > > Also, is it generally true that we can't expose any additional system > registers from KVM without breaking migration and we don't have any > method to deal with that in userspace and upper layers? If that's true, > that's a bigger problem in general and something we should work on > trying to solve. If it's not true, then there should be some method to > deal with the FW register already (like capabilities). Capabilities can certainly be added for anything that need them, including a new sysreg. QEMU just currently manages the sysregs as an array, without concern for which ones are old and which ones are new. Of course that can be changed. It may not even be that difficult to do, we just need to filter registers before adding the array to the migration stream. > > Given the urgency of adding mitigation towards variant 2 which is the > driver for this work, I think we should drop the compat functionality in > this series and work this out later on if needed. I think we can just > tweak the previous patch to enable PSCI 1.0 by default and drop this > patch for the current merge window. I agree. Without planning to wait for the userspace changes, then I guess we don't need to wait for a decision on how to do them yet either. Thanks, drew