Message-ID: <559EC3FC.8050204@redhat.com>
Date: Thu, 09 Jul 2015 20:57:00 +0200
From: Laszlo Ersek
To: Bandan Das, Paolo Bonzini
CC: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] KVM: x86: Add host physical address width capability
References: <559E101A.7080601@redhat.com> <559E180E.8080308@redhat.com> <559E6BE5.4030000@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/09/15 20:32, Bandan Das wrote:
> Paolo Bonzini writes:
>
>> On 09/07/2015 08:43, Laszlo Ersek wrote:
>>> On 07/09/15 08:09, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 09/07/2015 00:36, Bandan Das wrote:
>>>>> Let userspace inquire the maximum physical address width
>>>>> of the host processors; this can be used to identify maximum
>>>>> memory that can be assigned to the guest.
>>>>>
>>>>> Reported-by: Laszlo Ersek
>>>>> Signed-off-by: Bandan Das
>>>>> ---
>>>>>  arch/x86/kvm/x86.c       | 3 +++
>>>>>  include/uapi/linux/kvm.h | 1 +
>>>>>  2 files changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index bbaf44e..97d6746 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>  	case KVM_CAP_NR_MEMSLOTS:
>>>>>  		r = KVM_USER_MEM_SLOTS;
>>>>>  		break;
>>>>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>>>>> +		r = boot_cpu_data.x86_phys_bits;
>>>>> +		break;
>>>>
>>>> Userspace can just use CPUID, can't it?
>>>
>>> I believe KVM's cooperation is necessary, for the following reason:
>>>
>>> The truncation only occurs when the guest-phys <-> host-phys translation
>>> is done in hardware, *and* the phys bits of the host processor are
>>> insufficient to represent the highest guest-phys address that the guest
>>> will ever face.
>>>
>>> The first condition (of course) means that the truncation depends on EPT
>>> being enabled. (I didn't test on AMD so I don't know if RVI has the same
>>> issue.) If EPT is disabled, either because the host processor lacks it,
>>> or because the respective kvm_intel module parameter is set so, then the
>>> issue cannot be experienced.
>>>
>>> Therefore I believe a KVM patch is necessary.
>>>
>>> However, this specific patch doesn't seem sufficient; it should also
>>> consider whether EPT is enabled. (And the ioctl should be perhaps
>>> renamed to reflect that -- what QEMU needs to know is not the raw
>>> physical address width of the host processor, but whether that width
>>> will cause EPT to silently truncate high guest-phys addresses.)
>>
>> Right; if you want to consider whether EPT is enabled (which is the
>> right thing to do, albeit it makes for a much bigger patch) a KVM patch
>> is necessary. In that case you also need to patch the API documentation.
>
> Note that this patch really doesn't do anything except for printing a
> message that something might potentially go wrong.

Yes.

> Without EPT, you don't
> hit the processor limitation with your setup, but the user should nevertheless
> still be notified.

I disagree.

> In fact, I think shadow paging code should also emulate
> this behavior if the gpa is out of range.

I disagree.

There is no "out of range" gpa. QEMU allocates enough memory, and it
should be completely transparent to the guest. The fact that it silently
breaks with nested paging if the host processor doesn't have enough
address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
sure, but I suspect it's a hardware bug).

In any case the guest shouldn't care at all. It is a *virtual* machine,
and the VMM should lie to it plausibly enough. How much RAM, and how
many phys address bits the host has, is a performance question, but it
should not be a correctness question. A 256 GB guest should run (slowly,
but correctly) on a laptop that has only 4 GB of RAM and only 36 phys
addr bits, but plenty of swap space.

Because otherwise your argument could be extrapolated as "TCG should
break too if the gpa is 'out of range'".

So, I disagree. Whatever memory you give to the guest should just work
(unless of course you want to emulate a small address width for the
*VCPU*, but that's absolutely not the use case here). What we have here
is a leaky abstraction: a PCPU limitation giving away a lie that the
guest should never notice. The guest should be able to use all memory
that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
vs. KVM-with-EPT. If the last case cannot work (due to hardware
limitations), that's fine, but then (and only then) a warning should be
printed.

... In any case, please understand that I'm not campaigning for this
warning :) IIRC the warning was your (very welcome!)
idea after I reported the problem; I'm just trying to ensure that the
warning match the exact issue I encountered.

Thanks!
Laszlo
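
The condition argued for in this message (warn only when the translation is
done in hardware *and* the host's physical address width cannot represent
the highest guest-phys address) can be sketched as below. This is only an
illustration under stated assumptions, not code from KVM or QEMU: the helper
names host_phys_bits(), gpa_fits_host_width() and should_warn() are
hypothetical, the caller is assumed to know already whether EPT is in use,
and the width is read from userspace via CPUID leaf 0x80000008 using the
GCC/Clang <cpuid.h> wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: physical address width of the host CPU as reported
 * by CPUID leaf 0x80000008 (EAX bits 7:0). Returns 0 when the leaf is not
 * available or the build target is not x86.
 */
static unsigned host_phys_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return eax & 0xffu;
#endif
    return 0;
}

/*
 * Hypothetical helper: true if highest_gpa can be represented in
 * phys_bits address bits, i.e. no bit at or above phys_bits is set.
 */
static bool gpa_fits_host_width(uint64_t highest_gpa, unsigned phys_bits)
{
    assert(phys_bits > 0);
    if (phys_bits >= 64)
        return true;
    return (highest_gpa >> phys_bits) == 0;
}

/*
 * Hypothetical policy: warn only if the translation is done in hardware
 * (EPT enabled) and the highest guest-phys address does not fit.
 * An unknown width (0) never triggers the warning.
 */
static bool should_warn(bool ept_enabled, uint64_t highest_gpa,
                        unsigned phys_bits)
{
    if (!ept_enabled || phys_bits == 0)
        return false;
    return !gpa_fits_host_width(highest_gpa, phys_bits);
}
```

Assuming, for simplicity, that guest RAM is laid out contiguously from
address 0, the 256 GB guest mentioned above has a highest guest-phys address
of (256ULL << 30) - 1. With 36 host address bits, should_warn() is true only
when ept_enabled is true; with EPT disabled (or under TCG) it stays false,
which matches the behavior described in this message.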