Message-ID: <559EC3FC.8050204@redhat.com>
Date: Thu, 09 Jul 2015 20:57:00 +0200
From: Laszlo Ersek <lersek@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Bandan Das <bsd@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
CC: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] KVM: x86: Add host physical address width capability
References: <jpg615ul1j8.fsf@linux.bootlegged.copy>	<559E101A.7080601@redhat.com> <559E180E.8080308@redhat.com>	<559E6BE5.4030000@redhat.com> <jpgy4ip6v1y.fsf@linux.bootlegged.copy>
In-Reply-To: <jpgy4ip6v1y.fsf@linux.bootlegged.copy>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4711
Lines: 111

On 07/09/15 20:32, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 09/07/2015 08:43, Laszlo Ersek wrote:
>>> On 07/09/15 08:09, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 09/07/2015 00:36, Bandan Das wrote:
>>>>> Let userspace inquire the maximum physical address width
>>>>> of the host processors; this can be used to identify maximum
>>>>> memory that can be assigned to the guest.
>>>>>
>>>>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>>>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>>>> ---
>>>>>  arch/x86/kvm/x86.c       | 3 +++
>>>>>  include/uapi/linux/kvm.h | 1 +
>>>>>  2 files changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index bbaf44e..97d6746 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>  	case KVM_CAP_NR_MEMSLOTS:
>>>>>  		r = KVM_USER_MEM_SLOTS;
>>>>>  		break;
>>>>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>>>>> +		r = boot_cpu_data.x86_phys_bits;
>>>>> +		break;
>>>>
>>>> Userspace can just use CPUID, can't it?
>>>
>>> I believe KVM's cooperation is necessary, for the following reason:
>>>
>>> The truncation only occurs when the guest-phys <-> host-phys translation
>>> is done in hardware, *and* the phys bits of the host processor are
>>> insufficient to represent the highest guest-phys address that the guest
>>> will ever face.
>>>
>>> The first condition (of course) means that the truncation depends on EPT
>>> being enabled. (I didn't test on AMD so I don't know if RVI has the same
>>> issue.) If EPT is disabled, either because the host processor lacks it,
>>> or because the respective kvm_intel module parameter is set so, then the
>>> issue cannot be experienced.
>>>
>>> Therefore I believe a KVM patch is necessary.
>>>
>>> However, this specific patch doesn't seem sufficient; it should also
>>> consider whether EPT is enabled. (And the ioctl should be perhaps
>>> renamed to reflect that -- what QEMU needs to know is not the raw
>>> physical address width of the host processor, but whether that width
>>> will cause EPT to silently truncate high guest-phys addresses.)
>>
>> Right; if you want to consider whether EPT is enabled (which is the
>> right thing to do, albeit it makes for a much bigger patch) a KVM patch
>> is necessary.  In that case you also need to patch the API documentation.
> 
> Note that this patch really doesn't do anything except for printing a
> message that something might potentially go wrong.

Yes.

> Without EPT, you don't
> hit the processor limitation with your setup, but the user should nevertheless
> still be notified.

I disagree.

> In fact, I think shadow paging code should also emulate
> this behavior if the gpa is out of range.

I disagree.

There is no "out of range" gpa. QEMU allocates enough memory, and it
should be completely transparent to the guest. The fact that it silently
breaks with nested paging if the host processor doesn't have enough
address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
sure, but I suspect it's a hardware bug). In any case the guest
shouldn't care at all. It is a *virtual* machine, and the VMM should lie
to it plausibly enough. How much RAM, and how many phys address bits the
host has, is a performance question, but it should not be a correctness
question. A 256 GB guest should run (slowly, but correctly) on a laptop
that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
space.

Because otherwise your argument could be extrapolated as "TCG should
break too if the gpa is 'out of range'".

So, I disagree. Whatever memory you give to the guest should just work
(unless of course you want to emulate a small address width for the
*VCPU*, but that's absolutely not the use case here). What we have here
is a leaky abstraction: a PCPU limitation giving away a lie that the
guest should never notice. The guest should be able to use all memory
that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
vs. KVM-with-EPT. If the last case cannot work (due to hardware
limitations), that's fine, but then (and only then) a warning should be
printed.
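
To make that condition concrete, here is a minimal sketch in C. The
helper names are hypothetical (this is not actual QEMU or KVM code), and
it assumes the caller already knows the highest guest-phys address, the
host's phys address width, and whether EPT is in use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if max_gpa (the highest guest-phys address
 * the guest will ever face) needs more bits than the host processor has.
 */
static bool gpa_exceeds_host_width(uint64_t max_gpa, unsigned host_phys_bits)
{
    assert(host_phys_bits > 0 && host_phys_bits <= 64);

    if (host_phys_bits == 64) {
        return false;           /* avoid an undefined 64-bit shift */
    }
    return (max_gpa >> host_phys_bits) != 0;
}

/*
 * Hypothetical helper: warn only when the translation is done in
 * hardware (EPT enabled) *and* the host width is insufficient.
 * TCG and KVM-without-EPT never trigger the warning.
 */
static bool should_warn(uint64_t max_gpa, unsigned host_phys_bits,
                        bool ept_enabled)
{
    return ept_enabled && gpa_exceeds_host_width(max_gpa, host_phys_bits);
}
```

With the 256 GB guest from above, the highest guest-phys address is at
least 2^38 - 1, so on a host with 36 phys address bits should_warn()
returns true with EPT enabled and false otherwise.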

... In any case, please understand that I'm not campaigning for this
warning :) IIRC the warning was your (very welcome!) idea after I
reported the problem; I'm just trying to ensure that the warning matches
the exact issue I encountered.

Thanks!
Laszlo