From: Wanpeng Li
Date: Wed, 13 Dec 2017 12:25:34 +0800
Subject: Re: [PATCH RESEND] KVM: X86: Fix load bad host fpu state
To: Paolo Bonzini
Cc: Peter Xu, David Hildenbrand, "linux-kernel@vger.kernel.org", kvm, Radim Krčmář, Wanpeng Li, Rik van Riel, "# v3 . 10+"

2017-12-13 0:16 GMT+08:00 Paolo Bonzini:
> On 12/12/2017 06:40, Wanpeng Li wrote:
>> 2017-12-12 11:36 GMT+08:00 Peter Xu:
>>> On Tue, Dec 12, 2017 at 05:51:26AM +0800, Wanpeng Li wrote:
>>>> 2017-12-12 4:48 GMT+08:00 David Hildenbrand:
>>>>> On 10.12.2017 22:44, Wanpeng Li wrote:
>>>>>> From: Wanpeng Li
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
>>>>>> WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
>>>>>> CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
>>>>>> RIP: 0010:ex_handler_fprestore+0x88/0x90
>>>>>> Call Trace:
>>>>>>  fixup_exception+0x4e/0x60
>>>>>>  do_general_protection+0xff/0x270
>>>>>>  general_protection+0x22/0x30
>>>>>> RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
>>>>>> RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
>>>>>>  kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
>>>>>>  kvm_apic_accept_events+0x1c0/0x240 [kvm]
>>>>>>  kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
>>>>>>  kvm_vcpu_ioctl+0x479/0x880 [kvm]
>>>>>>  do_vfs_ioctl+0x142/0x9a0
>>>>>>  SyS_ioctl+0x74/0x80
>>>>>>  do_syscall_64+0x15f/0x600
>>>>>>
>>>>>> This can be reproduced by running any testcase in kvm-unit-tests since
>>>>>> the qemu userspace FPU context is not initialized, which results in the
>>>>>> init path from kvm_apic_accept_events() will load/put qemu userspace
>>>>>> FPU context w/o initialized. In addition, w/o this splatting we still
>>>>>> should initialize vcpu->arch.user_fpu instead of current->thread.fpu.
>>>>>> This patch fixes it by initializing qemu user FPU context if it is
>>>>>> uninitialized before KVM_RUN.
>>>>>>
>>>>>> Cc: Paolo Bonzini
>>>>>> Cc: Radim Krčmář
>>>>>> Cc: Rik van Riel
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Fixes: f775b13eedee (x86,kvm: move qemu/guest FPU switching out to vcpu_run)
>>>>>> Signed-off-by: Wanpeng Li
>>>>>> ---
>>>>>>  arch/x86/kvm/x86.c | 7 +++++--
>>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index a92b22f..063a643 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -7273,10 +7273,13 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
>>>>>>
>>>>>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>>>  {
>>>>>> -    struct fpu *fpu = &current->thread.fpu;
>>>>>> +    struct fpu *fpu = &vcpu->arch.user_fpu;
>>>>>>      int r;
>>>>>>
>>>>>> -    fpu__initialize(fpu);
>>>>>> +    if (!fpu->initialized) {
>>>>>> +        fpstate_init(&fpu->state);
>>>>>> +        fpu->initialized = 1;
>>>>>> +    }
>>>>>
>>>>> Is there a chance of keeping using fpu__initialize() ? Duplicating the
>>>>> code is ugly.
>>>>
>>>> There is a warning in fpu__initialize() which results in just
>>>> current->thread.fpu can take advantage of.
>>>>
>>>>>
>>>>> E.g. can't we simply initialize that in kvm_load_guest_fpu?
>>>>
>>>> We still miss to initialize qemu user FPU context for the above calltrace.
>>>
>>> IMHO we should not really init the user FPU since we should always
>>> load FPU then put FPU. The problem now is that for vcpus that with
>>> vcpu_id>1 we'll first put the FPU before loading it. So, instead how
>>> about we make sure we load the FPU first even for non-bootstrap vcpus?
>>> And we can actually drop fpu__initialize() call, like:
>>
>> It will introduce extra overhead for all the cases which can't enter
>> into vcpu_run(), I think move
>> fpstate_init(&vcpu->arch.user_fpu.state); to fx_init() is better.
>
> Those cases with a sleeping AP are so rare that they don't matter. They
> will occur only a few times per boot.
> Peter's solution is right, I've queued it.

I added a trace_printk() here to test Peter's solution:

    if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
        if (kvm_run->immediate_exit) {
            r = -EINTR;
            goto out;
        }
        kvm_vcpu_block(vcpu);
        kvm_apic_accept_events(vcpu);
        kvm_clear_request(KVM_REQ_UNHALT, vcpu);
        r = -EAGAIN;
        if (signal_pending(current)) {
            r = -EINTR;
            vcpu->run->exit_reason = KVM_EXIT_INTR;
            ++vcpu->stat.signal_exits;
        }
        trace_printk("load/put make no sense\n");
        goto out;
    }

I observed "load/put make no sense" 92339 times in the log while booting a
32-vCPU guest on a 32-pCPU Xeon Skylake server. A sample of the log, showing
the print frequency:

<...>-207694 [016] .... 1021785.120346: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207704 [031] .... 1021785.120347: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207710 [005] .... 1021785.120349: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207687 [002] .... 1021785.120351: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207709 [018] .... 1021785.120353: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207687 [002] .... 1021785.120354: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207692 [004] .... 1021785.120354: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207698 [015] .... 1021785.120358: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207702 [029] .... 1021785.120358: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207715 [026] .... 1021785.120361: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207707 [013] .... 1021785.120362: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207712 [022] .... 1021785.120365: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207711 [006] .... 1021785.120365: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207695 [000] .... 1021785.120367: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207689 [027] .... 1021785.120367: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207703 [011] .... 1021785.120368: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207708 [007] .... 1021785.120370: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207692 [004] .... 1021785.120371: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207708 [007] .... 1021785.120372: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207708 [007] .... 1021785.120373: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207698 [015] .... 1021785.120374: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207713 [012] .... 1021785.120375: kvm_arch_vcpu_ioctl_run: load/put make no sense
<...>-207690 [028] .... 1021785.120376: kvm_arch_vcpu_ioctl_run: load/put make no sense

I will send out the fx_init() solution; you can pick it if it makes sense. :)

Regards,
Wanpeng Li
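
P.S. To make the fx_init() idea concrete, here is a rough userspace sketch, not kernel code. It assumes the approach is simply "initialize the qemu user FPU context once at vCPU creation", so that a later put on the INIT path always finds a valid context. Everything prefixed mock_ is a hypothetical stand-in that only borrows names from this thread; the real struct fpu, fpstate_init() and fx_init() look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the kernel's struct fpu / struct kvm_vcpu. */
struct mock_fpu {
    bool initialized;
    unsigned char state[64]; /* placeholder for the real register save area */
};

struct mock_vcpu {
    struct mock_fpu guest_fpu;
    struct mock_fpu user_fpu;
};

/* Models fpstate_init() plus setting the initialized flag. */
static void mock_fpstate_init(struct mock_fpu *fpu)
{
    assert(fpu != NULL);
    memset(fpu->state, 0, sizeof(fpu->state));
    fpu->initialized = true;
}

/* Assumed shape of the fx_init() approach: set up both contexts
 * once, when the vCPU is created, instead of on every KVM_RUN. */
static void mock_fx_init(struct mock_vcpu *vcpu)
{
    mock_fpstate_init(&vcpu->guest_fpu);
    mock_fpstate_init(&vcpu->user_fpu);
}

/* Models the put side: restoring the user context is only valid if that
 * context was initialized. Returns false in the case where the real
 * kernel would print "Bad FPU state detected". */
static bool mock_put_guest_fpu(const struct mock_vcpu *vcpu)
{
    return vcpu->user_fpu.initialized;
}
```

With a zeroed struct mock_vcpu, mock_put_guest_fpu() returns false until mock_fx_init() has run and true afterwards, without any check or load/put pair on the early-exit path of each KVM_RUN.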