From: Wanpeng Li
Date: Wed, 6 Dec 2017 10:48:32 +0800
Subject: Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
To: Radim Krcmar
Cc: Rik van Riel, Paolo Bonzini, kvm, linux-kernel@vger.kernel.org, Thomas Gleixner, Christian Borntraeger

2017-12-06 1:09 GMT+08:00 Radim Krcmar:
> 2017-12-04 10:15+0800, Wanpeng Li:
>> 2017-11-14 13:12 GMT+08:00 Rik van Riel:
>> > Currently, every time a VCPU is scheduled out, the host kernel
>> > first saves the guest FPU/xstate context, then loads the qemu
>> > userspace FPU context, only to immediately save that userspace
>> > context back to memory. When scheduling in a VCPU, the same
>> > extraneous FPU loads and saves are done.
>> >
>> > This could be avoided by moving from a model where the guest FPU is
>> > loaded and stored with preemption disabled, to a model where the
>> > qemu userspace FPU is swapped out for the guest FPU context for
>> > the duration of the KVM_RUN ioctl.
>> >
>> > This is done under the VCPU mutex, which is also taken when other
>> > tasks inspect the VCPU FPU context, so the code should already be
>> > safe for this change.
>> > That should come as no surprise, given that s390 already has this
>> > optimization.
>> >
>> > No performance changes were detected in quick ping-pong tests on
>> > my 4-socket system, which is expected since an FPU+xstate load is
>> > on the order of 0.1us, while ping-ponging between CPUs is on the
>> > order of 20us, and somewhat noisy.
>> >
>> > There may be other tests where performance changes are noticeable.
>>
>> The kvm/queue branch produces the splat below:
>>
>> [ 650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
>> [kvm], reinitializing FPU registers.
>> [ 650.866232] ------------[ cut here ]------------
>> [ 650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
>> ex_handler_fprestore+0x5f/0x70
>> [ 650.866473] libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
>> [ 650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
>> [ 650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
>> 1.4.9 09/12/2016
>> [ 650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
>> [ 650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
>> [ 650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
>> [ 650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
>> [ 650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
>> [ 650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
>> [ 650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> [ 650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
>> [ 650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
>> knlGS:0000000000000000
>> [ 650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
>> [ 650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 650.866554] Call Trace:
>> [ 650.866559]  fixup_exception+0x32/0x40
>> [ 650.866567]  do_general_protection+0xa0/0x1b0
>> [ 650.866574]  general_protection+0x22/0x30
>> [ 650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
>> [ 650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
>> [ 650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
>> [ 650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
>> [ 650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
>> [ 650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
>> [ 650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
>> [ 650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]
>
> Looks like we're calling put when the FPU was not loaded. The simplest
> fix would be:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e0367f688547..064eba25c215 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7842,7 +7842,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>                  * To avoid have the INIT path from kvm_apic_has_events() that be
>                  * called with loaded FPU and does not let userspace fix the state.
>                  */
> -               kvm_put_guest_fpu(vcpu);
> +               if (init_event)
> +                       kvm_put_guest_fpu(vcpu);
>                 mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
>                                         XFEATURE_MASK_BNDREGS);
>                 if (mpx_state_buffer)
> @@ -7851,6 +7852,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>                                         XFEATURE_MASK_BNDCSR);
>                 if (mpx_state_buffer)
>                         memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
> +               if (init_event)
> +                       kvm_load_guest_fpu(vcpu);
>         }
>
>         if (!init_event) {
>
> I'll carry that until there is a nicer solution, thanks for the report.

NP, and thanks for the fix. :)

Regards,
Wanpeng Li
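[Editor's note] The swap-at-KVM_RUN model, and why an unbalanced put corrupts state, can be sketched in plain C. Everything below (the toy struct vcpu, its fields, and the load/put function names) is a simplified stand-in for illustration, not the real KVM internals:

```c
#include <assert.h>
#include <string.h>

/* Toy model: the "registers" and the two saved contexts. */
struct vcpu {
	int guest_fpu_loaded;   /* is the guest context currently live? */
	char active[16];        /* stand-in for the live FPU register file */
	char guest_fpu[16];     /* saved guest FPU/xstate context */
	char user_fpu[16];      /* saved qemu userspace FPU context */
};

/* KVM_RUN entry: stash the userspace context, install the guest's. */
static void load_guest_fpu(struct vcpu *v)
{
	memcpy(v->user_fpu, v->active, sizeof(v->active));
	memcpy(v->active, v->guest_fpu, sizeof(v->active));
	v->guest_fpu_loaded = 1;
}

/*
 * KVM_RUN exit: save the guest context, restore userspace's.
 * Calling this when the guest context was never loaded would save
 * whatever happens to be live into guest_fpu -- the kind of imbalance
 * behind the "Bad FPU state detected" splat above -- so the toy guards
 * on the flag, much as the init_event check in the fix guards the real
 * call site.
 */
static void put_guest_fpu(struct vcpu *v)
{
	assert(v->guest_fpu_loaded);
	memcpy(v->guest_fpu, v->active, sizeof(v->active));
	memcpy(v->active, v->user_fpu, sizeof(v->active));
	v->guest_fpu_loaded = 0;
}
```

The point of the design is visible even in the toy: load/put happen once per KVM_RUN rather than on every preemption, and every put must be balanced by a prior load.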