Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753574AbdCBOTn (ORCPT ); Thu, 2 Mar 2017 09:19:43 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54484 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752873AbdCBOTR (ORCPT ); Thu, 2 Mar 2017 09:19:17 -0500 Date: Thu, 2 Mar 2017 15:18:43 +0100 From: Radim =?utf-8?B?S3LEjW3DocWZ?= To: Wanpeng Li Cc: Dmitry Vyukov , Paolo Bonzini , KVM list , LKML , Jim Mattson , Steve Rutherford , Haozhong Zhang , Xiao Guangrong , syzkaller Subject: Re: kvm: WARNING in nested_vmx_vmexit Message-ID: <20170302141843.GA2548@potion> References: <20170301183150.GB17506@potion> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Thu, 02 Mar 2017 14:18:54 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4850 Lines: 112 2017-03-02 20:28+0800, Wanpeng Li: > 2017-03-02 2:31 GMT+08:00 Radim Krčmář : >> 2017-03-01 10:44+0100, Dmitry Vyukov: >>> On Wed, Mar 1, 2017 at 7:13 AM, Wanpeng Li wrote: >>>> 2017-02-28 20:15 GMT+08:00 Dmitry Vyukov : >>>>> Hello, >>>>> >>>>> The following program triggers WARNING in nested_vmx_vmexit: >>>>> https://gist.githubusercontent.com/dvyukov/16b946d7dc703bb07b9b933f12fb8a6e/raw/dac60506feb8dd9dd22828c486e46ee8a5e30f13/gistfile1.txt >>>>> >>>>> >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029 >>>>> nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 >>>>> CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229 >>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >>>>> Call Trace: >>>>> __dump_stack lib/dump_stack.c:15 [inline] >>>>> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >>>>> panic+0x1fb/0x412 kernel/panic.c:179 >>>>> __warn+0x1c4/0x1e0 kernel/panic.c:540 >>>>> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 >>>>> nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 >>>>> vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline] >>>>> vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324 >>>>> kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099 >>>>> do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128 >>>>> __msr_io arch/x86/kvm/x86.c:2577 [inline] >>>>> msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614 >>>>> kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497 >>>>> kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721 >>>>> vfs_ioctl fs/ioctl.c:43 [inline] >>>>> do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683 >>>>> SYSC_ioctl fs/ioctl.c:698 [inline] >>>>> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689 >>>>> entry_SYSCALL_64_fastpath+0x1f/0xc2 >>>>> RIP: 0033:0x451229 >>>>> RSP: 002b:00007fc1e7ebec98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >>>>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000451229 >>>>> RDX: 0000000020aecfe8 RSI: 000000004008ae89 RDI: 0000000000000008 >>>>> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 >>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >>>>> R13: 0000000000000000 R14: 00007fc1e7ebf9c0 R15: 00007fc1e7ebf700 >>>>> >>>>> >>>>> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570 >>>> >>>> I git checkout to this commit for linus tree, however the line number >>>> 11029 doesn't match any warning, and there are several warnings in the >>>> function nested_vmx_vmexit(), could you point out which one? >>> >>> Ah sorry, I have some local diff. >>> So now I am on 86292b33d4b79ee03e2f43ea0381ef85f077c760 and here is my diff: >>> https://gist.githubusercontent.com/dvyukov/6bb88c6e3584c3fd989df01386efcb74/raw/1ae08a3a682ad9467de5eb7fb96d457db514f9d2/gistfile1.txt >>> >>> The warning is: >>> >>> /* trying to cancel vmlaunch/vmresume is a bug */ >>> WARN_ON_ONCE(vmx->nested.nested_run_pending); >> >> I can't reproduce this one, but it is there: >> >> The warning gets thrown while doing >> >> (*(uint32_t*)0x20aecfe8 = (uint32_t)0x1); >> (*(uint32_t*)0x20aecfec = (uint32_t)0x0); >> (*(uint32_t*)0x20aecff0 = (uint32_t)0x3a); >> (*(uint32_t*)0x20aecff4 = (uint32_t)0x0); >> (*(uint64_t*)0x20aecff8 = (uint64_t)0x0); >> r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul, >> 0x20aecfe8ul, 0, 0, 0, 0, 0, 0); >> >> i.e. KVM_SET_MSR ioctl with >> >> struct kvm_msrs { >> .nmsrs = 1, >> .pad = 0, >> .entries = { >> {.index = MSR_IA32_FEATURE_CONTROL, >> .reserved = 0, >> .data = 0} >> } >> } >> >> and the relevant setter is >> >> case MSR_IA32_FEATURE_CONTROL: >> if (!vmx_feature_control_msr_valid(vcpu, data) || >> (to_vmx(vcpu)->msr_ia32_feature_control & >> FEATURE_CONTROL_LOCKED && !msr_info->host_initiated)) >> return 1; >> vmx->msr_ia32_feature_control = data; >> if (msr_info->host_initiated && data == 0) >> vmx_leave_nested(vcpu); >> break; >> >> The condition is wrong as it is no longer only VMX bits in data, so VMX >> can get disabled while the data is nonzero. KVM might not need to force >> nested VM exit when userspace goes crazy, so the condition would just go >> away ... need to check. > > If the bits just influence the new VMXON execution instead of > currently running hypervisor? If nesting is exited some other way during a reset from userspace. There is nothing like that, so a disgusting vmx->nested.nested_run_pending = 0; nested_vmx_vmexit(vcpu, -1, 0, 0); in vmx_leave_nested() is the safest solution. The CPU is going to be reset hence there should be nothing pending.