Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758134AbcDEL3D (ORCPT ); Tue, 5 Apr 2016 07:29:03 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:33763 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757931AbcDEL27 (ORCPT ); Tue, 5 Apr 2016 07:28:59 -0400 Subject: Re: [PATCH] kvm: x86: do not leak guest xcr0 into host interrupt handlers To: David Matlack , kvm@vger.kernel.org References: <1459365887-146735-1-git-send-email-dmatlack@google.com> Cc: linux-kernel@vger.kernel.org, luto@amacapital.net, stable@vger.kernel.org From: Paolo Bonzini Message-ID: <5703A175.4000005@redhat.com> Date: Tue, 5 Apr 2016 13:28:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <1459365887-146735-1-git-send-email-dmatlack@google.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4619 Lines: 131 On 30/03/2016 21:24, David Matlack wrote: > An interrupt handler that uses the fpu can kill a KVM VM, if it runs > under the following conditions: > - the guest's xcr0 register is loaded on the cpu > - the guest's fpu context is not loaded > - the host is using eagerfpu > > Note that the guest's xcr0 register and fpu context are not loaded as > part of the atomic world switch into "guest mode". They are loaded by > KVM while the cpu is still in "host mode". > > Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The > interrupt handler will look something like this: > > if (irq_fpu_usable()) { > kernel_fpu_begin(); > > [... code that uses the fpu ...] > > kernel_fpu_end(); > } > > As long as the guest's fpu is not loaded and the host is using eager > fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle() > returns true). The interrupt handler proceeds to use the fpu with > the guest's xcr0 live. > > kernel_fpu_begin() saves the current fpu context. If this uses > XSAVE[OPT], it may leave the xsave area in an undesirable state. > According to the SDM, during XSAVE bit i of XSTATE_BV is not modified > if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and > xcr0[i] == 0 following an XSAVE. > > kernel_fpu_end() restores the fpu context. Now if any bit i in > XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The > fault is trapped and SIGSEGV is delivered to the current process. > > Only pre-4.2 kernels appear to be vulnerable to this sequence of > events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly") > from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts. > > This patch fixes the bug by keeping the host's xcr0 loaded outside > of the interrupts-disabled region where KVM switches into guest mode. > > Cc: stable@vger.kernel.org > Suggested-by: Andy Lutomirski > Signed-off-by: David Matlack > --- > arch/x86/kvm/x86.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) While running my acceptance tests, in one case I got one CPU whose xcr0 had leaked into the host. This showed up as a SIGILL in strncasecmp's AVX code, and a simple program confirmed it: $ cat xgetbv.c #include int main(void) { unsigned xcr0_h, xcr0_l; asm("xgetbv" : "=d"(xcr0_h), "=a"(xcr0_l) : "c"(0)); printf("%08x:%08x\n", xcr0_h, xcr0_l); } $ gcc xgetbv.c -O2 $ for i in `seq 0 55`; do echo $i `taskset -c $i ./a.out`; done|grep -v 007 19 00000000:00000003 I'm going to rerun the tests without this patch, as it seems the most likely culprit, and leave it out of the pull request if they pass. Paolo > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index e260ccb..8df1167 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -700,7 +700,6 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) > if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512) > return 1; > } > - kvm_put_guest_xcr0(vcpu); > vcpu->arch.xcr0 = xcr0; > > if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND) > @@ -6590,8 +6589,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > kvm_x86_ops->prepare_guest_switch(vcpu); > if (vcpu->fpu_active) > kvm_load_guest_fpu(vcpu); > - kvm_load_guest_xcr0(vcpu); > - > vcpu->mode = IN_GUEST_MODE; > > srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); > @@ -6607,6 +6604,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > > local_irq_disable(); > > + kvm_load_guest_xcr0(vcpu); > + > if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests > || need_resched() || signal_pending(current)) { > vcpu->mode = OUTSIDE_GUEST_MODE; > @@ -6667,6 +6666,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > vcpu->mode = OUTSIDE_GUEST_MODE; > smp_wmb(); > > + kvm_put_guest_xcr0(vcpu); > + > /* Interrupt is enabled by handle_external_intr() */ > kvm_x86_ops->handle_external_intr(vcpu); > > @@ -7314,7 +7315,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) > * and assume host would use all available bits. > * Guest xcr0 would be loaded later. > */ > - kvm_put_guest_xcr0(vcpu); > vcpu->guest_fpu_loaded = 1; > __kernel_fpu_begin(); > __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state); > @@ -7323,8 +7323,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) > > void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) > { > - kvm_put_guest_xcr0(vcpu); > - > if (!vcpu->guest_fpu_loaded) { > vcpu->fpu_counter = 0; > return; >