Message-ID: <1476437522.7194.7.camel@redhat.com>
Subject: Re: kernel BUG at arch/x86/kernel/traps.c:643! when run Redhat7(v3.10) in kvm guest
From: Rik van Riel
To: Kefeng Wang, Linux Kernel Mailing List
Cc: Andy Lutomirski, "Xiexiuqi (Xie XiuQi)", Oleg Nesterov, Avi Kivity, Gleb Natapov
Date: Fri, 14 Oct 2016 05:32:02 -0400

On Fri, 2016-10-14 at 14:14 +0800, Kefeng Wang wrote:
> Hi all,
>
> We met BUG_ON in do_device_not_available(fpu exception handler) when
> run redhat7 in kvm guest,
> and there is no special test on this guest, only some network packet
> receipt and transmission.
>
> I checked the new kernel version, found this commit
> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
> Author: Andy Lutomirski
> Date:   Sun Jan 24 14:38:06 2016 -0800
>
>     x86/fpu: Fix math emulation in eager fpu mode
>
>     Systems without an FPU are generally old and therefore use lazy FPU
>     switching. Unsurprisingly, math emulation in eager FPU mode is a
>     bit buggy. Fix it.
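Whether a guest is one of the "systems without an FPU" that the quoted commit targets can be read from the `flags` line of /proc/cpuinfo inside the guest. Below is a minimal Python sketch; `cpu_flags` and `fpu_state` are hypothetical helpers written for illustration, not part of any existing tool. It assumes a kernel of this era that still exports the synthetic `eagerfpu` flag (newer kernels no longer do), so a missing `eagerfpu` on a recent kernel does not by itself mean lazy FPU switching.

```python
from __future__ import annotations


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first processor listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # cpuinfo lines look like "flags\t\t: fpu vme de ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def fpu_state(cpuinfo_text: str) -> tuple[bool, bool]:
    """Hypothetical helper: (CPU reports an FPU, kernel advertises eager FPU)."""
    flags = cpu_flags(cpuinfo_text)
    # "fpu" mirrors CPUID leaf 1, EDX bit 0, as seen by the guest kernel.
    # "eagerfpu" is a synthetic kernel flag, assumed present only on kernels
    # of roughly this vintage.
    return "fpu" in flags, "eagerfpu" in flags
```

Inside the guest, `fpu_state(open("/proc/cpuinfo").read())` returns a pair of booleans; for the flags line Kefeng posted, which contains both `fpu` and `eagerfpu`, it gives `(True, True)`.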

This patch is for "systems without an FPU".

Before we go on, I would like to know what kind of CPU you tell your
KVM virtual machine to present to the guest, and what kind of CPU your
host system has.

Are you by any chance configuring the CPU inside your virtual machine
without an FPU? It is possible to mask out feature bits in the CPUID
results presented to a KVM guest, so I suspect that may be what is
going on here...

>     There were two bugs involving kernel code trying to use the FPU
>     registers in eager mode even if they didn't exist and one BUG_ON()
>     that was incorrect.
>
>
> The BUG_ON() is incorrect, but I have no idea about eager fpu, why
> the BUG_ON is incorrect?
> Should we backport the patch to v3.10, or is there some bugs in the
> qemu-kvm?
> Any reply will be appreciated.
>
> Thanks,
> Kefeng
>
> [1] BUG_ON
> ----------------------------------------------------
> [347134.486436] ------------[ cut here ]------------
> [347134.487310] kernel BUG at arch/x86/kernel/traps.c:643!
> [347134.487398] invalid opcode: 0000 [#1] SMP
> [347134.500532] Modules linked in:loop binfmt_misc nf_log_ipv4 nf_log_common xt_LOG softdog ipmi_devintf ipmi_msghandler xfs libcrc32c tipc squashfs ipt_REJECT iptable_filter ip_tables dm_mod crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw ppdev gf128mul i2c_piix4 glue_helper ablk_helper i2c_core cryptd serio_raw parport_pc parport pcspkr ext3 mbcache jbd ata_generic pata_acpi virtio_console(OVE) virtio_blk(OVE) virtio_balloon(OVE) virtio_net(OVE) ata_piix libata floppy virtio_pci(OVE) virtio_ring(OVE) virtio(OVE)
> [347134.525182] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           OE ----V-------   3.10.0-229.20.1.x86_64 #1 SMP Mon Apr 18 11:26:55 UTC 2016
> [347134.525182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20160318_175052-HGH1000008214 04/01/2014
> [347134.525182] task: ffff8803fa2c5c00 ti: ffff8803fa2ec000 task.ti: ffff8803fa2ec000
> [347134.525182] RIP: 0010:[]  [] do_device_not_available+0x13/0x60
> [347134.525182] RSP: 0018:ffff8803fa2abc80  EFLAGS: 00010046
> [347134.525182] RAX: 000000008160ecec RBX: 0000000000000000 RCX: ffffffff8160ecec
> [347134.525182] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff8803fa2abc98
> [347134.525182] RBP: ffff8803fa2abc88 R08: 0000000000000000 R09: 0000000000000000
> [347134.525182] R10: 0000000000000001 R11: 0000000000000005 R12: ffff8803fa2c5c00
> [347134.525182] R13: ffff88040f550a40 R14: ffff8803fa2c6298 R15: 0000000000000005
> [347134.544262] FS:  0000000000000000(0000) GS:ffff88040f540000(0000) knlGS:0000000000000000
> [347134.544262] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [347134.544262] CR2: 00007f15a4ac0e30 CR3: 00000003f5081000 CR4: 00000000000407e0
> [347134.544262] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [347134.544262] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [347134.544262] Stack:
> [347134.544262]  0000000000000001 ffff8803fa2abd88 ffffffff81618d8e 0000000000000005
> [347134.544262]  ffff8803fa2c6298 ffff88040f550a40 ffff8803fa2c5c00 ffff8803fa2abd88
> [347134.544262]  ffff8803fa284500 0000000000000005 0000000000000001 0000000000000000
> [347134.544262] Call Trace:
> [347134.544262] Code: 81 a4 24 90 00 00 00 ff fe ff ff e9 df fe ff ff e8 c3 f5 a5 ff 0f 1f 00 66 66 66 66 90 55 48 89 e5 53 66 66 66 66 90 31 db 66 90 <0f> 0b 0f 1f 00 65 8b 1c 25 74 f1 00 00 e8 fb 86 b4 ff eb ea 66
> [347134.544262] RIP  [] do_device_not_available+0x13/0x60
> [347134.544262]  RSP
> [347134.580457] ---[ end trace 7c0ed2be7ded5c73 ]---
> [347134.580457] Kernel panic - not syncing: Fatal exception
> [347134.580457] Shutting down cpus with NMI
>
>
>
> [2] The /proc/cpuinfo shows below(show only the first cpu0),
> --------------------------------
> localhost:~ # cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
> stepping : 7
> microcode : 0x1
> cpu MHz : 2899.992
> cache size : 4096 KB
> physical id : 0
> siblings : 8
> core id : 0
> cpu cores : 8
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm xsaveopt
> bogomips : 5799.98
> clflush size : 64
> cache_alignment : 64
> address sizes : 42 bits physical, 48 bits virtual
> power management:
>

-- 
All Rights Reversed.
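The CR0 value in the register dump of the oops (`000000008005003b`) can be decoded against the architectural CR0 bit positions documented in the Intel SDM, Vol. 3A, "Control Registers". The minimal Python sketch below lists only the defined bits; `decode_cr0` is a hypothetical helper for illustration. For this dump it reports TS (bit 3) set and EM (bit 2) clear. For x87 instructions, #NM (the exception that do_device_not_available() handles) is raised when either CR0.EM or CR0.TS is set.

```python
from __future__ import annotations

# Architecturally defined CR0 bits (Intel SDM Vol. 3A, "Control Registers").
CR0_BITS = {
    0: "PE",   # protection enable
    1: "MP",   # monitor coprocessor
    2: "EM",   # emulation: x87 instructions raise #NM (no-FPU math emulation)
    3: "TS",   # task switched: FPU/SIMD use raises #NM (lazy FPU restore)
    4: "ET",   # extension type
    5: "NE",   # numeric error reporting
    16: "WP",  # write protect
    18: "AM",  # alignment mask
    29: "NW",  # not write-through
    30: "CD",  # cache disable
    31: "PG",  # paging
}


def decode_cr0(value: int) -> list[str]:
    """Return the names of the defined CR0 bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CR0_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # CR0 as printed in the oops: 000000008005003b
    print(decode_cr0(0x8005003B))
    # prints ['PE', 'MP', 'TS', 'ET', 'NE', 'WP', 'AM', 'PG']
```

EM is the bit the no-FPU math-emulation path depends on, while TS is the bit lazy FPU switching uses to trap the first FPU instruction after a context switch. As far as this register dump shows, only TS was set.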