Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757826AbcLANWn (ORCPT ); Thu, 1 Dec 2016 08:22:43 -0500 Received: from mx1.redhat.com ([209.132.183.28]:62188 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753786AbcLANWl (ORCPT ); Thu, 1 Dec 2016 08:22:41 -0500 Date: Thu, 1 Dec 2016 14:21:33 +0100 From: Radim =?utf-8?B?S3LEjW3DocWZ?= To: David Matlack Cc: Paolo Bonzini , kvm list , "linux-kernel@vger.kernel.org" , Jim Mattson Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation Message-ID: <20161201132132.GG1682@potion> References: <1480472050-58023-1-git-send-email-dmatlack@google.com> <1480472050-58023-4-git-send-email-dmatlack@google.com> <20161130215234.GA8372@potion> <2123679491.870979.1480545210780.JavaMail.zimbra@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 01 Dec 2016 13:21:36 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3657 Lines: 84 2016-11-30 15:53-0800, David Matlack: > On Wed, Nov 30, 2016 at 2:33 PM, Paolo Bonzini wrote: >> ----- Original Message ----- >>> From: "Radim Krčmář" >>> To: "David Matlack" >>> Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jmattson@google.com, pbonzini@redhat.com >>> Sent: Wednesday, November 30, 2016 10:52:35 PM >>> Subject: Re: [PATCH v3 3/5] KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation >>> >>> 2016-11-29 18:14-0800, David Matlack: >>> > KVM emulates MSR_IA32_VMX_CR{0,4}_FIXED1 with the value -1ULL, meaning >>> > all CR0 and CR4 bits are allowed to be 1 during VMX operation. >>> > >>> > This does not match real hardware, which disallows the high 32 bits of >>> > CR0 to be 1, and disallows reserved bits of CR4 to be 1 (including bits >>> > which are defined in the SDM but missing according to CPUID). A guest >>> > can induce a VM-entry failure by setting these bits in GUEST_CR0 and >>> > GUEST_CR4, despite MSR_IA32_VMX_CR{0,4}_FIXED1 indicating they are >>> > valid. >>> > >>> > Since KVM has allowed all bits to be 1 in CR0 and CR4, the existing >>> > checks on these registers do not verify must-be-0 bits. Fix these checks >>> > to identify must-be-0 bits according to MSR_IA32_VMX_CR{0,4}_FIXED1. >>> > >>> > This patch should introduce no change in behavior in KVM, since these >>> > MSRs are still -1ULL. >>> > >>> > Signed-off-by: David Matlack >>> > --- >>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>> > @@ -4104,6 +4110,40 @@ static void ept_save_pdptrs(struct kvm_vcpu *vcpu) >>> > +static bool nested_guest_cr0_valid(struct kvm_vcpu *vcpu, unsigned long >>> > val) >>> > +{ >>> > + u64 fixed0 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed0; >>> > + u64 fixed1 = to_vmx(vcpu)->nested.nested_vmx_cr0_fixed1; >>> > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); >>> > + >>> > + if (to_vmx(vcpu)->nested.nested_vmx_secondary_ctls_high & >>> > + SECONDARY_EXEC_UNRESTRICTED_GUEST && >>> > + nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST)) >>> > + fixed0 &= ~(X86_CR0_PE | X86_CR0_PG); >>> >>> These bits also seem to be guaranteed in fixed1 ... complicated >>> dependencies. >> >> Bits that are set in fixed0 must be set in fixed1 too. Since patch 4 >> always sets CR0_FIXED1 to all-ones (matching bare metal), this is okay. >> >>> There is another exception, SDM 26.3.1.1 (Checks on Guest Control >>> Registers, Debug Registers, and MSRs): >>> >>> Bit 29 (corresponding to CR0.NW) and bit 30 (CD) are never checked >>> because the values of these bits are not changed by VM entry; see >>> Section 26.3.2.1. >> >> Same here, we never check them anyway. True. >>> And another check: >>> >>> If bit 31 in the CR0 field (corresponding to PG) is 1, bit 0 in that >>> field (PE) must also be 1. >> >> This should not be a problem, a failed vmentry is reflected into L1 >> anyway. We only need to check insofar as we could have a more restrictive >> check than what the processor does. > > I had the same thought when I was first writing this patch, Radim. > Maybe we should add a comment here. E.g. > > /* > * CR0.PG && !CR0.PE is also invalid but caught by the CPU > * during VM-entry to L2. > */ Ah, no need, thanks. I expected that we want to kill L1 when VM entry failure happens, because it could have been L0's bug too ... we also pass failure while checking host state, which is definitely L0 bug and shouldn't ever get to L1. I'd like to assume that KVM is (going to be) sound, so both cases are reasonable (just hindering development).