Received: by 10.223.176.5 with SMTP id f5csp444668wra; Fri, 9 Feb 2018 01:31:24 -0800 (PST) X-Google-Smtp-Source: AH8x226KaPLtLkPCzldefk+BJJq3K5Gs4doK2gitdPGOTj6jB5Wt/45EcMJpalP5o/Me+Agg958S X-Received: by 10.101.96.47 with SMTP id p15mr1805414pgu.390.1518168684758; Fri, 09 Feb 2018 01:31:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518168684; cv=none; d=google.com; s=arc-20160816; b=EJI2IYvZu/CaLyS7eKHG0eRSK/ljkBYI04qUgXzwGzh2rU35QWh4WwFwP9MVQGz6jr vjUH+tuGTmUy2iuVz4DX3cRIe9KISt4626Xca5IHkSOnu/8q2GRPNSpoQuBCGf/KCUXM 8MYeJ27AshCNAsnGMP6tDxcY5OKQvL3VA0BfQsRXIB4fE/iiWHEXWOBgCZ7VHVs8SsGe /tgLopKGRI09PACpJgj1/pT0CKTn7svj7wSfSydeTX8T/7VrY9KakRVd/78alsa66FER e6NHy9krLLo0EI+MLFSj5u5cjY3duH/YWMnEOiV6Y7muw9yU5VewNQJYbLrfshTlD4UY GS8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=NyZ/bKYb+r+vama/YYNLYx5J6CNsShzT/mdyoYmjbrU=; b=v84qB1NZY5JdfnX+k96lqz6v94WDipuF/odtvLYdAOWPAIFa9WN738aRvpyB/sw5mR xaFjZCcMewlEUNQPjJ2VwdZYXt/QH2OZLL7+qPiWh2uXYQCv3Bt+C3ZGHndjhV1vXo25 MOl01AwrkPhGENsdJL8YX5e5nMBprld1sHwhF9xAn3XxW8Ybmy7Iie5Y8O6qUPiBTJDK q/ESHf9QOLiZKlC+4Zjr05jKyC/dBBwTuj4E9hatEjClKW/1juycbYjoQInA4LgZ59Ud Ewh1uXEczNb4/F53zqFoq9BSFAePn0nkEaWgmqjmbjsG4laaqwg3AArGmeEgVzxBbqM2 tCyg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p128si1158243pga.15.2018.02.09.01.31.07; Fri, 09 Feb 2018 01:31:24 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751117AbeBIJ1l (ORCPT + 99 others); Fri, 9 Feb 2018 04:27:41 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:46072 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751034AbeBIJ1e (ORCPT ); Fri, 9 Feb 2018 04:27:34 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 541DF40265F6; Fri, 9 Feb 2018 09:27:34 +0000 (UTC) Received: from hawk.localdomain (unknown [10.40.205.76]) by smtp.corp.redhat.com (Postfix) with ESMTPS id A3C8310083A8; Fri, 9 Feb 2018 09:27:31 +0000 (UTC) Date: Fri, 9 Feb 2018 10:27:29 +0100 From: Andrew Jones To: Christoffer Dall Cc: Suzuki K Poulose , linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com, linux-kernel@vger.kernel.org, kristina.martsenko@arm.com, peter.maydell@linaro.org, pbonzini@redhat.com, rkrcmar@redhat.com, will.deacon@arm.com, ard.biesheuvel@linaro.org, mark.rutland@arm.com, catalin.marinas@arm.com, Christoffer Dall Subject: Re: [PATCH v1 15/16] kvm: arm64: Allow configuring physical address space size Message-ID: <20180209092729.o55kl2tjg5hplk52@hawk.localdomain> References: <20180109190414.4017-1-suzuki.poulose@arm.com> <20180109190414.4017-16-suzuki.poulose@arm.com> <20180208111414.GM29286@cbox> <31e5bf40-4fcb-934e-6036-ff2670f793df@arm.com> <20180209081606.GC7339@cbox> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180209081606.GC7339@cbox> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Feb 2018 09:27:34 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Feb 2018 09:27:34 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'drjones@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 09, 2018 at 09:16:06AM +0100, Christoffer Dall wrote: > On Thu, Feb 08, 2018 at 05:53:17PM +0000, Suzuki K Poulose wrote: > > On 08/02/18 11:14, Christoffer Dall wrote: > > >On Tue, Jan 09, 2018 at 07:04:10PM +0000, Suzuki K Poulose wrote: > > >>Allow the guests to choose a larger physical address space size. > > >>The default and minimum size is 40bits. A guest can change this > > >>right after the VM creation, but before the stage2 entry page > > >>tables are allocated (i.e, before it registers a memory range > > >>or maps a device address). The size is restricted to the maximum > > >>supported by the host. Also, the guest can only increase the PA size, > > >>from the existing value, as reducing it could break the devices which > > >>may have verified their physical address for validity and may do a > > >>lazy mapping(e.g, VGIC). > > >> > > >>Cc: Marc Zyngier > > >>Cc: Christoffer Dall > > >>Cc: Peter Maydell > > >>Signed-off-by: Suzuki K Poulose > > >>--- > > >> Documentation/virtual/kvm/api.txt | 27 ++++++++++++++++++++++++++ > > >> arch/arm/include/asm/kvm_host.h | 7 +++++++ > > >> arch/arm64/include/asm/kvm_host.h | 1 + > > >> arch/arm64/include/asm/kvm_mmu.h | 41 ++++++++++++++++++++++++++++++--------- > > >> arch/arm64/kvm/reset.c | 28 ++++++++++++++++++++++++++ > > >> include/uapi/linux/kvm.h | 4 ++++ > > >> virt/kvm/arm/arm.c | 2 +- > > >> 7 files changed, 100 insertions(+), 10 deletions(-) > > >> > > >>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > > >>index 57d3ee9e4bde..a203faf768c4 100644 > > >>--- a/Documentation/virtual/kvm/api.txt > > >>+++ b/Documentation/virtual/kvm/api.txt > > >>@@ -3403,6 +3403,33 @@ invalid, if invalid pages are written to (e.g. after the end of memory) > > >> or if no page table is present for the addresses (e.g. when using > > >> hugepages). > > >>+4.109 KVM_ARM_GET_PHYS_SHIFT > > >>+ > > >>+Capability: KVM_CAP_ARM_CONFIG_PHYS_SHIFT > > >>+Architectures: arm64 > > >>+Type: vm ioctl > > >>+Parameters: __u32 (out) > > >>+Returns: 0 on success, a negative value on error > > >>+ > > >>+This ioctl is used to get the current maximum physical address size for > > >>+the VM. The value is Log2(Maximum_Physical_Address). This is neither the > > >>+ amount of physical memory assigned to the VM nor the maximum physical address > > >>+that a real CPU on the host can handle. Rather, this is the upper limit of the > > >>+guest physical address that can be used by the VM. > > > > > >What is the point of this? Presumably if userspace has set the size, it > > >already knows the size? > > > > This can help the userspace know, what the "default" limit is. As such I am > > not particular about keeping this around. > > > > Userspace has to already know, since otherwise things don't work today, > so I think we can omit this. > > > > > > >>+ > > >>+4.109 KVM_ARM_SET_PHYS_SHIFT > > >>+ > > >>+Capability: KVM_CAP_ARM_CONFIG_PHYS_SHIFT > > >>+Architectures: arm64 > > >>+Type: vm ioctl > > >>+Parameters: __u32 (in) > > >>+Returns: 0 on success, a negative value on error > > >>+ > > >>+This ioctl is used to set the maximum physical address size for > > >>+the VM. The value is Log2(Maximum_Physical_Address). The value can only > > >>+be increased from the existing setting. The value cannot be changed > > >>+after the stage-2 page tables are allocated and will return an error. > > >>+ > > > > > >Is there a way for userspace to discover what the underlying hardware > > >can actually support, beyond trial-and-error on this ioctl? > > > > Unfortunately, there is none. We don't expose ID_AA64MMFR0 via mrs emulation. > > > > We should probably think about that. Perhaps it could be tied to the > return value of KVM_CAP_ARM_CONFIG_PHYS_SHIFT ? FWIW, that sounds good to me. > > > >>+static inline int kvm_reconfig_stage2(struct kvm *kvm, u32 phys_shift) > > >>+{ > > >>+ int rc = 0; > > >>+ unsigned int pa_max, parange; > > >>+ > > >>+ parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7; > > >>+ pa_max = id_aa64mmfr0_parange_to_phys_shift(parange); > > >>+ /* Raise it to 40bits for backward compatibility */ > > >>+ pa_max = (pa_max < 40) ? 40 : pa_max; > > >>+ /* Make sure the size is supported/available */ > > >>+ if (phys_shift > PHYS_MASK_SHIFT || phys_shift > pa_max) > > >>+ return -EINVAL; > > >>+ /* > > >>+ * The stage2 PGD is dependent on the settings we initialise here > > >>+ * and should be allocated only after this step. We cannot allow > > >>+ * down sizing the IPA size as there could be devices or memory > > >>+ * regions, that depend on the previous size. > > >>+ */ > > >>+ mutex_lock(&kvm->slots_lock); > > >>+ if (kvm->arch.pgd || phys_shift < kvm->arch.phys_shift) { > > >>+ rc = -EPERM; > > >>+ } else if (phys_shift > kvm->arch.phys_shift) { > > >>+ kvm->arch.phys_shift = phys_shift; > > >>+ kvm->arch.s2_levels = stage2_pt_levels(kvm->arch.phys_shift); > > >>+ kvm->arch.vtcr_private = VTCR_EL2_SL0(kvm->arch.s2_levels) | > > >>+ TCR_T0SZ(kvm->arch.phys_shift); > > >>+ } > > > > > >I think you can rework the above to make it more obvious what's going on > > >in this way: > > > > > > rc = -EPERM; > > > if (kvm->arch.pgd || phys_shift < kvm->arch.phys_shift) > > > goto out_unlock; > > > > > > rc = 0; > > > if (phys_shift == kvm->arch.phys_shift) > > > goto out_unlock; > > > > > > kvm->arch.phys_shift = phys_shift; > > > kvm->arch.s2_levels = stage2_pt_levels(kvm->arch.phys_shift); > > > kvm->arch.vtcr_private = VTCR_EL2_SL0(kvm->arch.s2_levels) | > > > TCR_T0SZ(kvm->arch.phys_shift); > > > > > >out_unlock: > > > > > > > Sure. > > > > > > > > >>--- a/virt/kvm/arm/arm.c > > >>+++ b/virt/kvm/arm/arm.c > > >>@@ -1136,7 +1136,7 @@ long kvm_arch_vm_ioctl(struct file *filp, > > >> return 0; > > >> } > > >> default: > > >>- return -EINVAL; > > >>+ return kvm_arch_dev_vm_ioctl(kvm, ioctl, arg); > > >> } > > >> } > > >>-- > > >>2.13.6 > > >> > > > > > >Have you considered making this capability a generic capability and > > >encoding this in the 'type' argument to KVM_CREATE_VM? This would > > >significantly simplify all the above and would allow you to drop patch 8 > > >and 9 I think. > > > > No. I will take a look. Btw, there were patches flying around to support > > "userspace" requesting specific values for ID feature registers. But even that > > doesn't help with the detection part. May be that is another way to configure > > the size, but not sure about the current status of that work. > > > > It's a bit stranded. Drew was driving this work (on cc). But the ID > register exposed to the guest should represent the actual limits > of the VM, so I don't think we need userspace to configure this, but we > can implement this in KVM based on the PA range configured for the VM. > I heard there were some patches being worked by someone at Arm, which haven't been posted yet (obviously), but maybe that was just a rumor? I was about to revisit this topic myself, at least to some degree, to attempt to address PMU hiding. We really need to put some thought into how best to generally give userspace control of the VM's ID registers, within the constraints of the host. Anyway, I guess that should be done in a separate thread, so I won't start brainstorming now, here. Thanks, drew