Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752504AbdGLNYy (ORCPT ); Wed, 12 Jul 2017 09:24:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54564 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750957AbdGLNYw (ORCPT ); Wed, 12 Jul 2017 09:24:52 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 0DC844E34E Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=rkrcmar@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 0DC844E34E Date: Wed, 12 Jul 2017 15:24:45 +0200 From: Radim =?utf-8?B?S3LEjW3DocWZ?= To: Bandan Das Cc: David Hildenbrand , kvm@vger.kernel.org, pbonzini@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH v4 3/3] KVM: nVMX: Emulate EPTP switching for the L1 hypervisor Message-ID: <20170712132445.GD3442@potion> References: <20170710204936.4001-4-bsd@redhat.com> <2d50ebc4-9328-ce08-b55b-6a331ee13cc3@redhat.com> <20170711193235.GE3326@potion> <20170711202118.GC28875@potion> <20170711204521.GF3326@potion> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Wed, 12 Jul 2017 13:24:52 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6912 Lines: 157 2017-07-11 17:08-0400, Bandan Das: > Radim Krčmář writes: > > > 2017-07-11 16:34-0400, Bandan Das: > >> Radim Krčmář writes: > >> > >> > 2017-07-11 15:50-0400, Bandan Das: > >> >> Radim Krčmář writes: > >> >> > 2017-07-11 14:24-0400, Bandan Das: > >> >> >> Bandan Das writes: > >> >> >> > If there's a triple fault, I think it's a good idea to inject it > >> >> >> > back. Basically, there's no need to take care of damage control > >> >> >> > that L1 is intentionally doing. > >> >> >> > > >> >> >> >>> + goto fail; > >> >> >> >>> + kvm_mmu_unload(vcpu); > >> >> >> >>> + vmcs12->ept_pointer = address; > >> >> >> >>> + kvm_mmu_reload(vcpu); > >> >> >> >> > >> >> >> >> I was thinking about something like this: > >> >> >> >> > >> >> >> >> kvm_mmu_unload(vcpu); > >> >> >> >> old = vmcs12->ept_pointer; > >> >> >> >> vmcs12->ept_pointer = address; > >> >> >> >> if (kvm_mmu_reload(vcpu)) { > >> >> >> >> /* pointer invalid, restore previous state */ > >> >> >> >> kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu); > >> >> >> >> vmcs12->ept_pointer = old; > >> >> >> >> kvm_mmu_reload(vcpu); > >> >> >> >> goto fail; > >> >> >> >> } > >> >> >> >> > >> >> >> >> The you can inherit the checks from mmu_check_root(). > >> >> >> > >> >> >> Actually, thinking about this a bit more, I agree with you. Any fault > >> >> >> with a vmfunc operation should end with a vmfunc vmexit, so this > >> >> >> is a good thing to have. Thank you for this idea! :) > >> >> > > >> >> > SDM says > >> >> > > >> >> > IF tent_EPTP is not a valid EPTP value (would cause VM entry to fail > >> >> > if in EPTP) THEN VMexit; > >> >> > >> >> This section here: > >> >> As noted in Section 25.5.5.2, an execution of the > >> >> EPTP-switching VM function that causes a VM exit (as specified > >> >> above), uses the basic exit reason 59, indicating “VMFUNC”. > >> >> The length of the VMFUNC instruction is saved into the > >> >> VM-exit instruction-length field. No additional VM-exit > >> >> information is provided. > >> >> > >> >> Although, it adds (as specified above), from testing, any vmexit that > >> >> happens as a result of the execution of the vmfunc instruction always > >> >> has exit reason 59. > >> >> > >> >> IMO, the case David pointed out comes under "as a result of the > >> >> execution of the vmfunc instruction", so I would prefer exiting > >> >> with reason 59. > >> > > >> > Right, the exit reason is 59 for reasons that trigger a VM exit > >> > (i.e. invalid EPTP value, the four below), but kvm_mmu_reload() checks > >> > unrelated stuff. > >> > > >> > If the EPTP value is correct, then the switch should succeed. > >> > If the EPTP is correct, but bogus, then the guest should get > >> > EPT_MISCONFIG VM exit on its first access (when reading the > >> > instruction). Source: I added > >> > >> My point is that we are using kvm_mmu_reload() to emulate eptp > >> switching. If that emulation of vmfunc fails, it should exit with reason > >> 59. >> >> Yeah, we just disagree on what is a vmfunc failure. >> >>> > vmcs_write64(EPT_POINTER, vmcs_read64(EPT_POINTER) | (1ULL << 40)); >>> > >>> > shortly before a VMLAUNCH on L0. :) >>> >>> What happens if this ept pointer is actually in the eptp list and the guest >>> switches to it using vmfunc ? I think it will exit with reason 59. >> >> I think otherwise, because it doesn't cause a VM entry failure on >> bare-metal (and SDM says that we get a VM exit only if there would be a >> VM entry failure). >> I expect the vmfunc to succeed and to get a EPT_MISCONFIG right after. >> (Experiment pending :]) >> >>> > I think that we might be emulating this case incorrectly and throwing >>> > triple faults when it should be VM exits in vcpu_run(). >>> >>> No, I agree with not throwing a triple fault. We should clear it out. >>> But we should emulate a vmfunc vmexit back to L1 when kvm_mmu_load fails. >> >> Here we disagree. I think that it's a bug do the VM exit, so we can > > Why do you think it's a bug ? SDM defines a different behavior and hardware doesn't do that either. There are only two reasons for a VMFUNC VM exit from EPTP switching: 1) ECX > 0 2) EPTP would cause VM entry to fail if in VMCS.EPT_POINTER KVM can fail for other reasons because of its bugs, but that should be notified to the guest in another way. Rebooting the guest is kind of acceptable in that case. > The eptp switching function really didn't > succeed as far as our emulation goes when kvm_mmu_reload() fails. > And as such, the generic vmexit failure event should be a vmfunc vmexit. I interpret it as two separate events -- at first, the vmfunc succeeds and when it later tries to access memory through the new EPTP (valid, but not pointing into backed memory), it results in a EPT_MISCONFIG VM exit. > We cannot strictly follow the spec here, the spec doesn't even mention a way > to emulate eptp switching. If setting up the switching succeeded and the > new root pointer is invalid or whatever, I really don't care what happens > next but this is not the case. We fail to get a new root pointer and without > that, we can't even make a switch! We just make it behave exactly how the spec says that it behaves. We do have a value (we read 'address') to put into VMCS.EPT_POINTER, which is all we need for the emulation. The function doesn't dereference that pointer, it just looks at its value to decide whether it is valid or not. (btw. we should check that properly, because we cannot depend on VM entry failure pass-through like the normal case.) The dereference done in kvm_mmu_reload() should happen after EPTP switching finishes, because the spec doesn't mention a VM exit for other reason than invalid EPT_POINTER value. >> just keep the original bug -- we want to eventually fix it and it's no >> worse till then. > > Anyway, can you please confirm again what is the behavior that you > are expecting if kvm_mmu_reload fails ? This would be a rarely used > branch and I am actually fine diverging from what I think is right if > I can get the reviewers to agree on a common thing. kvm_mmu_reload() fails when mmu_check_root() is false, which means that the pointed physical address is not backed. We've hit this corner-case in the past -- Jim said that the chipset returns all 1s if a read is not claimed. So in theory, KVM should not fail kvm_mmu_reload(), but behave as if the pointer pointed to a memory of all 1s, which would likely result in EPT_MISCONFIG when the guest does a memory access. It is a mishandled corner case, but turning it into VM exit would only confuse an OS that receives the impossible VM exit and potentially confuse reader of the KVM logic. I think that not using kvm_mmu_reload() directly in EPTP switching is best. The bug is not really something we care about.