2014-04-07 11:27:05

by Fengguang Wu

Subject: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

Hi Ken,

I got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 12e364b9f08aa335dc7716ce74113e834c993765
Author: Ken Cox <[email protected]>
AuthorDate: Tue Mar 4 07:58:07 2014 -0600
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Tue Mar 4 16:58:21 2014 -0800

staging: visorchipset driver to provide registration and other services

The visorchipset module receives device creation and destruction
events from the Command service partition of s-Par, as well as
controlling registration of shared device drivers with the s-Par
driver core. The events received are used to populate other s-Par
modules with their assigned shared devices. Visorchipset is required
for shared device drivers to function properly. Visorchipset also
stores information for handling dump disk device creation during
kdump.

In operation, the visorchipset module processes device creation and
destruction messages sent by s-Par's Command service partition through
a channel. These messages result in creation (or destruction) of each
virtual bus and virtual device. Each bus and device is also associated
with a communication channel, which is used to communicate with one or
more IO service partitions to perform device IO on behalf of the
guest.

Signed-off-by: Ken Cox <[email protected]>
Cc: Ben Romer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

+-------------------------------------------------+------------+---------------+
|                                                 | 12e364b9f0 | next-20140403 |
+-------------------------------------------------+------------+---------------+
| boot_successes                                  | 12         | 0             |
| boot_failures                                   | 18         | 3             |
| invalid_opcode                                  | 2          |               |
| RIP:visorchipset_init                           | 10         | 3             |
| Kernel_panic-not_syncing:Fatal_exception        | 10         | 3             |
| backtrace:visorchipset_init                     | 10         | 3             |
| backtrace:kernel_init_freeable                  | 10         | 3             |
| loadedinvalid_opcode                            | 8          |               |
| BUG:kernel_early_hang_without_any_printk_output | 8          |               |
| loadedinvalid_opcode:PREEMPT_SMP                | 0          | 2             |
| invalid_opcode:PREEMPT_SMP                      | 0          | 1             |
| early-boot-hang                                 | 0          | 4             |
+-------------------------------------------------+------------+---------------+

[ 24.135101] FPGA image file name: xlinx_fpga_firmware.bit
[ 24.137595] GPIO INIT FAIL!!
[ 24.141283] driver version 1.0.0.0 loaded
[ 24.142539] chipset driver version 1.0.0.0 loadedinvalid opcode: 0000 [#1] PREEMPT SMP
[ 24.144793] Modules linked in:
[ 24.145303] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc5-00621-g12e364b #1
[ 24.145303] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 24.145303] task: ffff88001157a010 ti: ffff88001157c000 task.ti: ffff88001157c000
[ 24.145303] RIP: 0010:[<ffffffff81e37115>] [<ffffffff81e37115>] visorchipset_init+0x7b/0x8c5
[ 24.145303] RSP: 0000:ffff88001157de58 EFLAGS: 00000286
[ 24.145303] RAX: 000000000000070b RBX: 0000000000000004 RCX: 4000000000000000
[ 24.145303] RDX: a70aba7500000000 RSI: ffff88001157de5c RDI: ffff88001157de58
[ 24.145303] RBP: ffff88001157de90 R08: 0000000000000002 R09: ffff88001157de60
[ 24.145303] R10: ffff88001157de64 R11: 0000000000000000 R12: ffff88001157de5c
[ 24.145303] R13: ffff88001157de60 R14: ffff88001157de64 R15: 0000000000000000
[ 24.145303] FS: 0000000000000000(0000) GS:ffff880012600000(0000) knlGS:0000000000000000
[ 24.145303] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 24.145303] CR2: ffff880002992000 CR3: 0000000001c07000 CR4: 00000000003006f0
[ 24.145303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 24.145303] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 24.145303] Stack:
[ 24.145303] 00000800000306c1 078bfbf982d82203 ffffffff81e3709a 0000000000000000
[ 24.145303] 00000000000001df 0000000000000000 0000000000000000 ffff88001157df08
[ 24.145303] ffffffff810002b2 ffffffff810b2600 ffff88001157df08 ffffffff810b27db
[ 24.145303] Call Trace:
[ 24.145303] [<ffffffff81e3709a>] ? visorchannel_init+0x1d/0x1d
[ 24.145303] [<ffffffff810002b2>] do_one_initcall+0x8e/0x138
[ 24.145303] [<ffffffff810b2600>] ? param_array_set+0xef/0xf5
[ 24.145303] [<ffffffff810b27db>] ? parse_args+0x180/0x248
[ 24.145303] [<ffffffff81dfbf86>] kernel_init_freeable+0x108/0x199
[ 24.145303] [<ffffffff81dfb73a>] ? do_early_param+0x8a/0x8a
[ 24.145303] [<ffffffff8173f08e>] ? rest_init+0xc2/0xc2
[ 24.145303] [<ffffffff8173f097>] kernel_init+0x9/0xda
[ 24.145303] [<ffffffff8176024c>] ret_from_fork+0x7c/0xb0
[ 24.145303] [<ffffffff8173f08e>] ? rest_init+0xc2/0xc2
[ 24.145303] Code: 8d 65 cc 4c 8d 6d d0 4c 8d 75 d4 79 21 48 ba 00 00 00 00 75 ba 0a a7 48 b9 00 00 00 00 00 00 00 40 bb 04 00 00 00 b8 0b 07 00 00 <0f> 01 c1 8b 35 c2 c4 b4 00 48 c7 c7 f5 93 b4 81 31 c0 e8 3b 21
[ 24.145303] RIP [<ffffffff81e37115>] visorchipset_init+0x7b/0x8c5
[ 24.145303] RSP <ffff88001157de58>
[ 24.187247] ---[ end trace 62b5721899a66a6c ]---
[ 24.188157] Kernel panic - not syncing: Fatal exception

git bisect start 4b22efdd5595f0acb48f02bf664a451ee98f9a2e v3.14 --
git bisect bad 850ba1df2c6aa754c9b2c8c23eac3161373d5492 # 16:46 0- 2 Merge remote-tracking branch 'samsung/for-next'
git bisect good 62ff577fa2fec87edbf26f53e87210ba726d4d44 # 16:51 30+ 1 Merge tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
git bisect bad e6d9bfc63813882c896bf7ea6f6b14ca7b50b755 # 16:54 0- 15 Merge branch 'powernv-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
git bisect bad b33ce442993865180292df2a314ee5251ba38b50 # 16:56 0- 1 Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block
git bisect bad c12e69c6aaf785fd307d05cb6f36ca0e7577ead7 # 16:59 1- 3 Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good c70929147a10fa4538886cb23b934b509c4c0e49 # 17:04 30+ 0 Merge tag 'sound-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 675c354a95d5375153b8bb80a0448cab916c7991 # 17:08 30+ 0 Merge tag 'char-misc-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good 158e0d3621683ee0cdfeeba56f0e5ddd97ae984f # 17:12 30+ 0 Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect bad 10933a9cfa258048813fd8a287be844d30711731 # 17:16 0- 11 staging: comedi: pcl818: exit interrupt quick when there is nothing to do
git bisect good b14ec4bd3f576be2bb7dd4c69764a422d782e06d # 17:19 30+ 3 staging: comedi: pcl816: remove 'last_int_sub' from private data
git bisect good f37e756033c918c198bc2df6dada27df25ce5a6f # 17:26 30+ 0 staging: rtl8187se: Convert Stats typedef into a struct
git bisect good 99a1b98b32211713d8e544e9737a5fa20b73dce3 # 17:37 30+ 1 staging: comedi: ke_counter: fix ke_counter_insn_write()
git bisect bad 9b073ac53eea902712f88388b42efcebce211bec # 17:40 0- 4 staging: dgap: Fix various previously missed checkpatch errors
git bisect bad 6c76aab5bdec769ac05bb81dc6bb46cd5a253b4b # 17:43 1- 4 drivers: staging: rtl8187se: refactor/clean signal smoothing
git bisect good 5c2f26def8f3bb252c32df4cbe0979140d8face6 # 17:50 30+ 0 staging: octeon-ethernet: add missing include
git bisect bad 7b2a2d8383d08793aac3f157fa3f38ea90c5d3c0 # 17:53 0- 2 staging: visorchannelstub driver to provide channel support routines
git bisect good 9d9baadd4069c77a97bf530abad9ddb74875fe76 # 18:03 30+ 0 staging: visorutil driver to provide common functionality to other s-Par drivers
git bisect bad 12e364b9f08aa335dc7716ce74113e834c993765 # 18:28 7- 10 staging: visorchipset driver to provide registration and other services
git bisect good e423812a9e430913e41c6565922142fe22f83ad7 # 18:34 43+ 0 staging: visorchannel module
# first bad commit: [12e364b9f08aa335dc7716ce74113e834c993765] staging: visorchipset driver to provide registration and other services
git bisect good e423812a9e430913e41c6565922142fe22f83ad7 # 18:39 129+ 0 staging: visorchannel module
git bisect bad 4b22efdd5595f0acb48f02bf664a451ee98f9a2e # 18:40 0- 7 Add linux-next specific files for 20140403
git bisect bad 18a1a7a1d862ae0794a0179473d08a414dd49234 # 18:45 0- 1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad abfcdfd63f0b68994bf3d0de84ddb7220d73a063 # 18:48 0- 16 Add linux-next specific files for 20140407


This script may reproduce the error.

-----------------------------------------------------------------------------
#!/bin/bash

kernel=$1

kvm=(
	qemu-system-x86_64 -cpu kvm64 -enable-kvm
	-kernel "$kernel"
	-smp 2
	-m 256M
	-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
	-net user,vlan=0
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-serial stdio
	-display none
	-monitor null
)

append=(
	debug
	sched_debug
	apic=debug
	ignore_loglevel
	sysrq_always_enabled
	panic=10
	prompt_ramdisk=0
	earlyprintk=ttyS0,115200
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
)

# Join the kernel command-line words into a single --append argument.
"${kvm[@]}" --append "${append[*]}"
-----------------------------------------------------------------------------

Thanks,
Fengguang


Attachments:
(No filename) (9.85 kB)
dmesg-quantal-f4-128:20140407182830:x86_64-randconfig-br0-04050702:3.14.0-rc5-00621-g12e364b:1 (59.84 kB)
x86_64-randconfig-br0-04050702-4b22efdd5595f0acb48f02bf664a451ee98f9a2e-RIP:----visorchipset_init+-x-59736.log (70.77 kB)
config-3.14.0-rc5-00621-g12e364b (96.71 kB)

2014-04-07 14:04:22

by Ken Cox

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP


On 04/07/2014 06:17 AM, Fengguang Wu wrote:
> Hi Ken,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit 12e364b9f08aa335dc7716ce74113e834c993765
> Author: Ken Cox <[email protected]>
> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> Commit: Greg Kroah-Hartman <[email protected]>
> CommitDate: Tue Mar 4 16:58:21 2014 -0800
>
--snip--
> [ 24.135101] FPGA image file name: xlinx_fpga_firmware.bit
> [ 24.137595] GPIO INIT FAIL!!
> [ 24.141283] driver version 1.0.0.0 loaded
> [ 24.142539] chipset driver version 1.0.0.0 loadedinvalid opcode: 0000 [#1] PREEMPT SMP
> [ 24.144793] Modules linked in:
> [ 24.145303] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc5-00621-g12e364b #1
> [ 24.145303] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 24.145303] task: ffff88001157a010 ti: ffff88001157c000 task.ti: ffff88001157c000
> [ 24.145303] RIP: 0010:[<ffffffff81e37115>] [<ffffffff81e37115>] visorchipset_init+0x7b/0x8c5
The problem is that the driver is trying to call firmware code that only
exists on Unisys s-Par hardware. I will add a check to make sure the
driver is running on the correct platform before trying to call into the
firmware.

2014-04-07 14:07:29

by Greg Kroah-Hartman

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> Hi Ken,
>
> I got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit 12e364b9f08aa335dc7716ce74113e834c993765
> Author: Ken Cox <[email protected]>
> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> Commit: Greg Kroah-Hartman <[email protected]>
> CommitDate: Tue Mar 4 16:58:21 2014 -0800
>
> staging: visorchipset driver to provide registration and other services

I think Sasha has already sent a fix to resolve this issue that I'll be
sending to Linus in a day or so.

Ken, is Sasha's patch going to resolve this issue as well? It looks
like people haven't tested what happens when the module is loaded
without the hardware present in the system :(

thanks,

greg k-h

2014-04-07 14:24:44

by Ken Cox

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP


On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
>> Hi Ken,
>>
>> I got the below dmesg and the first bad commit is
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> commit 12e364b9f08aa335dc7716ce74113e834c993765
>> Author: Ken Cox <[email protected]>
>> AuthorDate: Tue Mar 4 07:58:07 2014 -0600
>> Commit: Greg Kroah-Hartman <[email protected]>
>> CommitDate: Tue Mar 4 16:58:21 2014 -0800
>>
>> staging: visorchipset driver to provide registration and other services
> I think Sasha has already sent a fix to resolve this issue that I'll be
> sending to Linus in a day or so.
>
> Ken, is Sasha's patch going to resolve this issue as well? It looks
> like people haven't tested what happens when the module is loaded
> without the hardware present in the system :(
You are exactly right. The driver needs to check for hardware early on
before trying to use it. Unfortunately, Sasha's patch will not resolve
this one. I'll work with Ben Romer to get a patch out ASAP.

Thanks,
Ken Cox

2014-04-07 19:21:20

by Greg Kroah-Hartman

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
>
> On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> >On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> >>Hi Ken,
> >>
> >>I got the below dmesg and the first bad commit is
> >>
> >>git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >>
> >>commit 12e364b9f08aa335dc7716ce74113e834c993765
> >>Author: Ken Cox <[email protected]>
> >>AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> >>Commit: Greg Kroah-Hartman <[email protected]>
> >>CommitDate: Tue Mar 4 16:58:21 2014 -0800
> >>
> >> staging: visorchipset driver to provide registration and other services
> >I think Sasha has already sent a fix to resolve this issue that I'll be
> >sending to Linus in a day or so.
> >
> >Ken, is Sasha's patch going to resolve this issue as well? It looks
> >like people haven't tested what happens when the module is loaded
> >without the hardware present in the system :(
> You are exactly right. The driver needs to check for hardware early on
> before trying to use it. Unfortunately, Sasha's patch will not resolve this
> one. I'll work with Ben Romer to get a patch out ASAP.

Wait, in looking at this closer, I don't see any of the "normal"
hardware checks to determine that this really is a valid piece of
hardware present, before it starts to just go and initialize a whole
bunch of things (sysfs busses, proc files and directories, and other
things.)

That's not ok, and it's obvious it's starting to affect people's work
systems.

How about I just mark the whole thing BROKEN for now, disabling the
build, until "correct" hardware probing can be added to the driver, so
no one else gets hurt by this?

thanks,

greg k-h

2014-04-07 19:44:17

by Greg Kroah-Hartman

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Mon, Apr 07, 2014 at 12:23:47PM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 07, 2014 at 09:24:37AM -0500, Ken Cox wrote:
> >
> > On 04/07/2014 09:09 AM, Greg Kroah-Hartman wrote:
> > >On Mon, Apr 07, 2014 at 07:17:25PM +0800, Fengguang Wu wrote:
> > >>Hi Ken,
> > >>
> > >>I got the below dmesg and the first bad commit is
> > >>
> > >>git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >>
> > >>commit 12e364b9f08aa335dc7716ce74113e834c993765
> > >>Author: Ken Cox <[email protected]>
> > >>AuthorDate: Tue Mar 4 07:58:07 2014 -0600
> > >>Commit: Greg Kroah-Hartman <[email protected]>
> > >>CommitDate: Tue Mar 4 16:58:21 2014 -0800
> > >>
> > >> staging: visorchipset driver to provide registration and other services
> > >I think Sasha has already sent a fix to resolve this issue that I'll be
> > >sending to Linus in a day or so.
> > >
> > >Ken, is Sasha's patch going to resolve this issue as well? It looks
> > >like people haven't tested what happens when the module is loaded
> > >without the hardware present in the system :(
> > You are exactly right. The driver needs to check for hardware early on
> > before trying to use it. Unfortunately, Sasha's patch will not resolve this
> > one. I'll work with Ben Romer to get a patch out ASAP.
>
> Wait, in looking at this closer, I don't see any of the "normal"
> hardware checks to determine that this really is a valid piece of
> hardware present, before it starts to just go and initialize a whole
> bunch of things (sysfs busses, proc files and directories, and other
> things.)
>
> That's not ok, and it's obvious it's starting to affect people's work
> systems.
>
> How about I just mark the whole thing BROKEN for now, disabling the
> build, until "correct" hardware probing can be added to the driver, so
> no one else gets hurt by this?

In looking at it further, that seems like the best thing to do for now.
We can slowly re-enable the driver after things like proper device
probing are fixed up, so as to not break people's boxes.

thanks,

greg k-h

2014-04-08 02:57:45

by Fengguang Wu

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

Hi Benjamin,

> Fengguang,
>
> I ran your script against freshly-checked-out source from staging-next, and was not able to reproduce the error with it. My boot log is attached. I noticed that your log did not have "Hypervisor detected: KVM" in the trace. The KVM options in your script also differ substantially from the ones shown at the end of your trace...

> When I reran your script with the "-cpu Haswell,+smep,+smap" option I was able to get the same result as you. IMHO KVM should not be setting this bit if it's emulating bare metal.

Sorry, we tried to provide a simplified reproduction script, and in your
case it differs significantly from the real KVM options. We'll
fix it, thanks for pointing it out!

Thanks,
Fengguang

2014-04-08 15:46:07

by Ben Romer

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Tue, 2014-04-08 at 10:53 +0800, Fengguang Wu wrote:
> Hi Benjamin,
>
> > Fengguang,
> >
> > I ran your script against freshly-checked-out source from staging-next, and was not able to reproduce the error with it. My boot log is attached. I noticed that your log did not have "Hypervisor detected: KVM" in the trace. The KVM options in your script also differ substantially from the ones shown at the end of your trace...
>
> > When I reran your script with the "-cpu Haswell,+smep,+smap" option I was able to get the same result as you. IMHO KVM should not be setting this bit if it's emulating bare metal.
>
> Sorry, we tried to provide a simplified reproduction script, and in your
> case it differs significantly from the real KVM options. We'll
> fix it, thanks for pointing it out!
>
> Thanks,
> Fengguang

That will be helpful, and as I mentioned, I can reproduce your results,
but I'm still not sure why a virtualized processor is giving an invalid
opcode fault on a vmcall. The Intel documentation is pretty specific
about this - IF not in VMX operation THEN #UD; ELSIF in VMX non-root
operation THEN VM exit.

Either KVM should be saying "I'm a real processor and not a virtual CPU,
really!" - in which case, the hypervisor bit should be off and vmcalls
should cause an invalid opcode fault, or, KVM should be saying "I'm a
virtualized processor!" and setting the hypervisor bit, and doing a
vmexit on vmcall instead. This seems like a KVM bug to me.

-- Ben
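
A minimal sketch of the feature-bit test under discussion, written as user-space C rather than kernel code. It assumes the bit in question is bit 31 of ECX as returned by CPUID leaf 1; the function names are made up for illustration, and the live query relies on the <cpuid.h> helper shipped with GCC and Clang.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX bit 31: reserved as 0 on bare metal, set by
 * hypervisors that want to announce themselves to the guest. */
#define CPUID_1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Pure helper: test the bit in an ECX value that was already read. */
int hypervisor_bit_set(uint32_t leaf1_ecx)
{
	return (leaf1_ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read CPUID leaf 1 on the running CPU and report the bit.
 * Returns 0 when leaf 1 is not available. */
int running_with_hypervisor_bit(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return hypervisor_bit_set(ecx);
}
#endif
```

A set bit only says that some hypervisor announces itself; it does not identify which one.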

2014-04-09 23:03:07

by Fengguang Wu

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

CC the KVM people: it looks like a KVM problem that can be triggered by

qemu-system-x86_64 -cpu Haswell,+smep,+smap

On Thu, Apr 10, 2014 at 01:58:18AM +0800, Jet Chen wrote:
> On 04/09/2014 10:44 PM, Romer, Benjamin M wrote:
> > On Wed, 2014-04-09 at 02:38 +0800, Jet Chen wrote:
> >
> >> Hi Ben,
> >>
> >> I checked my <Intel 64 and IA-32 Architectures Software Developer's Manual>, which was published in Feb 2014.
> >> Volume 2: Instruction Set Reference, A-Z: CPUID--CPU Identification
> >>
> >
> > I agree completely, which is why I'm confused about KVM's behavior. If
> > bit 31 was off, the code in our drivers that uses the vmcall instruction
> > would not have been run, the kernel would not have tried to perform a
> > vmcall, and not crashed with invalid op.
> >
> > If you look in the definition for the VMCALL instruction (Intel 64 and
> > IA32 Architectures Software Developer's Manual, volume 3C pg.30-9)
> > You'll see that a processor in VMX non-root operation should perform a
> > vmexit.
> >
> >> Why does this document not match what you said? I am not experienced with VMs, so please correct me if I looked at the wrong document.
> >>
> >
> > According to VMWare's documentation (there is a page at
> > http://kb.vmware.com./selfservice/microsites/search.do?cmd=displayKC&externalId=1009458 ) , as well as Microsoft's hypervisor spec (at http://www.microsoft.com/en-us/download/details.aspx?id=39289 ), this bit is used to indicate the CPU is running under virtualization. KVM is also setting this bit to indicate virtualization. I believe Xen uses it as well.
> >
> >
> > My contention is, if KVM is going to set the ISVM bit, it needs to do a
> > vmexit, and if it's not going to set the bit, then doing an invalid op
> > is okay, but the current behavior is inconsistent.
> >
> > -- Ben
> >
>
> Ben,
>
> Thanks a lot for your explanation.
> Let me summarize; please correct me where I am wrong. If it really is a KVM bug, we will report it to the KVM developers.
> On a real CPU, bit 31 of ECX is always 0, as the Intel documentation states.
> However, KVM, as a hypervisor, should set this bit to 1 in the virtual ECX register to tell the guest OS that it is running in a virtualized environment.
> The problem is that KVM does set this bit to 1, but raises an invalid opcode fault on VMCALL instead of performing a VM exit. As a result, we get these dmesg error messages.
>
> Thanks,
> -Jet

2014-04-09 23:10:58

by H. Peter Anvin

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> CC the KVM people: it looks like a KVM problem that can be triggered by
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap

Is it a KVM problem or a Qemu bug? It sounds more like a Qemu JIT bug.

-hpa


> On Thu, Apr 10, 2014 at 01:58:18AM +0800, Jet Chen wrote:
>> On 04/09/2014 10:44 PM, Romer, Benjamin M wrote:
>>> On Wed, 2014-04-09 at 02:38 +0800, Jet Chen wrote:
>>>
>>>> Hi Ben,
>>>>
>>>> I checked my <Intel 64 and IA-32 Architectures Software Developer's Manual>, which was published in Feb 2014.
>>>> Volume 2: Instruction Set Reference, A-Z: CPUID--CPU Identification
>>>>
>>>
>>> I agree completely, which is why I'm confused about KVM's behavior. If
>>> bit 31 was off, the code in our drivers that uses the vmcall instruction
>>> would not have been run, the kernel would not have tried to perform a
>>> vmcall, and not crashed with invalid op.
>>>
>>> If you look in the definition for the VMCALL instruction (Intel 64 and
>>> IA32 Architectures Software Developer's Manual, volume 3C pg.30-9)
>>> You'll see that a processor in VMX non-root operation should perform a
>>> vmexit.
>>>
>>>> Why does this document not match what you said? I am not experienced with VMs, so please correct me if I looked at the wrong document.
>>>>
>>>
>>> According to VMWare's documentation (there is a page at
>>> http://kb.vmware.com./selfservice/microsites/search.do?cmd=displayKC&externalId=1009458 ) , as well as Microsoft's hypervisor spec (at http://www.microsoft.com/en-us/download/details.aspx?id=39289 ), this bit is used to indicate the CPU is running under virtualization. KVM is also setting this bit to indicate virtualization. I believe Xen uses it as well.
>>>
>>>
>>> My contention is, if KVM is going to set the ISVM bit, it needs to do a
>>> vmexit, and if it's not going to set the bit, then doing an invalid op
>>> is okay, but the current behavior is inconsistent.
>>>
>>> -- Ben
>>>
>>
>> Ben,
>>
>> Thanks a lot for your explanation.
>> Let me summarize; please correct me where I am wrong. If it really is a KVM bug, we will report it to the KVM developers.
>> On a real CPU, bit 31 of ECX is always 0, as the Intel documentation states.
>> However, KVM, as a hypervisor, should set this bit to 1 in the virtual ECX register to tell the guest OS that it is running in a virtualized environment.
>> The problem is that KVM does set this bit to 1, but raises an invalid opcode fault on VMCALL instead of performing a VM exit. As a result, we get these dmesg error messages.
>>
>> Thanks,
>> -Jet

2014-04-09 23:10:56

by H. Peter Anvin

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> CC the KVM people: it looks like a KVM problem that can be triggered by
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap

I'm really confused. First of all, is this a KVM problem or is it a
Qemu JIT problem?

Either seems really wonky. It is questionable at best whether or not
Qemu in JIT mode should set the hypervisor bit IMO. However, even so,
you *better* not call VMCALL *just* because the hypervisor bit is set.

The reason for it is that you have absolutely no idea what VMCALL is
going to do on any one hypervisor... different hypervisors even use
completely different conventions for VMCALL, and some might not accept
VMCALL at all and might just terminate your guest with extreme prejudice.

So what is actually going on here?

-hpa

2014-04-10 13:19:46

by Ben Romer

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Wed, 2014-04-09 at 16:10 -0700, H. Peter Anvin wrote:
> On 04/09/2014 04:01 PM, Fengguang Wu wrote:
> > CC the KVM people: it looks like a KVM problem that can be triggered by
> >
> > qemu-system-x86_64 -cpu Haswell,+smep,+smap
>
> I'm really confused. First of all, is this a KVM problem or is it a
> Qemu JIT problem?
>
> Either seems really wonky. It is questionable at best whether or not
> Qemu in JIT mode should set the hypervisor bit IMO. However, even so,
> you *better* not call VMCALL *just* because the hypervisor bit is set.
>
> The reason for it is that you have absolutely no idea what VMCALL is
> going to do on any one hypervisor... different hypervisors even use
> completely different conventions for VMCALL, and some might not accept
> VMCALL at all and might just terminate your guest with extreme prejudice.
>
> So what is actually going on here?
>
> -hpa
>

I'm confused by the intended behavior of KVM.. Is the intention of the
-cpu switch to fully emulate a particular CPU? If that's the case, the
Intel documentation says bit 31 should always be 0, so the value
returned by the cpuid instruction isn't correct. If the intention is to
present a VM with a specific CPU architecture, the CPU ought to behave
as described in Intel's virtualization documentation and just vmexit
instead of faulting with invalid op, IMHO.

I've already said the check in the code was insufficient, and I'm trying
to fix that part now. :)

2014-04-11 02:28:27

by H. Peter Anvin

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/10/2014 06:19 AM, Romer, Benjamin M wrote:
>
> I'm confused by the intended behavior of KVM.. Is the intention of the
> -cpu switch to fully emulate a particular CPU? If that's the case, the
> Intel documentation says bit 31 should always be 0, so the value
> returned by the cpuid instruction isn't correct. If the intention is to
> present a VM with a specific CPU architecture, the CPU ought to behave
> as described in Intel's virtualization documentation and just vmexit
> instead of faulting with invalid op, IMHO.
>
> I've already said the check in the code was insufficient, and I'm trying
> to fix that part now. :)
>

I'm still confused where KVM comes into the picture. Are you actually
using KVM (and thus talking about nested virtualization) or are you
using Qemu in JIT mode and running another hypervisor underneath?

The hypervisor bit is a complete red herring. If the guest CPU is
running in VT-x mode, then VMCALL should VMEXIT inside the guest
(invoking the guest root VT-x), but the fact still remains that you
should never, ever, invoke VMCALL unless you know what hypervisor you
have underneath.

-hpa

2014-04-11 13:51:51

by Ben Romer

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Thu, 2014-04-10 at 19:28 -0700, H. Peter Anvin wrote:
> On 04/10/2014 06:19 AM, Romer, Benjamin M wrote:
> >
> > I'm confused by the intended behavior of KVM.. Is the intention of the
> > -cpu switch to fully emulate a particular CPU? If that's the case, the
> > Intel documentation says bit 31 should always be 0, so the value
> > returned by the cpuid instruction isn't correct. If the intention is to
> > present a VM with a specific CPU architecture, the CPU ought to behave
> > as described in Intel's virtualization documentation and just vmexit
> > instead of faulting with invalid op, IMHO.
> >
> > I've already said the check in the code was insufficient, and I'm trying
> > to fix that part now. :)
> >
>
> I'm still confused where KVM comes into the picture. Are you actually
> using KVM (and thus talking about nested virtualization) or are you
> using Qemu in JIT mode and running another hypervisor underneath?

The test that Fengguang used to find the problem was running the linux
kernel directly using KVM. When the kernel was run with
"-cpu Haswell,+smep,+smap" set, the vmcall failed with invalid op, but
when the kernel is run with "-cpu qemu64", the vmcall causes a vmexit,
as it should.

My point is, the vmcall was made because the hypervisor bit was set. If
this bit had been turned off, as it would be on a real processor, the
vmcall wouldn't have happened.

> The hypervisor bit is a complete red herring. If the guest CPU is
> running in VT-x mode, then VMCALL should VMEXIT inside the guest
> (invoking the guest root VT-x),

The CPU is running in VT-X. That was my point, the kernel is running in
the KVM guest, and KVM is setting the CPU feature bits such that bit 31
is enabled.

I don't think it's a red herring because the kernel uses this bit
elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
features, and can be checked with the cpu_has_hypervisor macro (which
was not used by the original author of the code in the driver, but
should have been). VMWare and KVM support in the kernel also check for
this bit before checking their hypervisor leaves for an ID. If it's not
properly set it affects more than just the s-Par drivers.

> but the fact still remains that you
> should never, ever, invoke VMCALL unless you know what hypervisor you
> have underneath.

From the standpoint of the s-Par drivers, yes, I agree (as I already
said). However, VMCALL is not a privileged instruction, so anyone could
use it from user space and go right past the OS straight to the
hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
any user could hard-stop the guest with a couple of lines of C.

-- Ben

2014-04-11 16:34:41

by H. Peter Anvin

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
>
>> I'm still confused where KVM comes into the picture. Are you actually
>> using KVM (and thus talking about nested virtualization) or are you
>> using Qemu in JIT mode and running another hypervisor underneath?
>
> The test that Fengguang used to find the problem was running the linux
> kernel directly using KVM. When the kernel was run with
> "-cpu Haswell,+smep,+smap" set, the vmcall failed with invalid op, but
> when the kernel is run with "-cpu qemu64", the vmcall causes a vmexit,
> as it should.

As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
as a JIT. Completely different thing. In that case Qemu probably
should *not* set the hypervisor bit. However, the only thing that the
hypervisor bit means is that you can look for specific hypervisor APIs
in CPUID level 0x40000000+.

> My point is, the vmcall was made because the hypervisor bit was set. If
> this bit had been turned off, as it would be on a real processor, the
> vmcall wouldn't have happened.

And my point is that that is a bug. In the driver. A very serious one.
You cannot call VMCALL until you know *which* hypervisor API(s) you
have available, period.
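
A minimal sketch of the check being asked for here, assuming the caller has already read CPUID leaf 1 and leaf 0x40000000 (for example with the compiler's CPUID intrinsic). The function names and the idea of passing the expected signature as a parameter are illustrative assumptions, not code from the visorchipset driver; "KVMKVMKVM" is KVM's signature and is used only as an example value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID_1_ECX_HYPERVISOR (1u << 31)  /* CPUID.1:ECX bit 31 */

/* Unpack the 12-byte vendor signature that leaf 0x40000000 returns in
 * EBX, ECX, EDX (in that order, least significant byte first). */
void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int i = 0; i < 12; i++)
        out[i] = (char)((regs[i / 4] >> (8 * (i % 4))) & 0xff);
    out[12] = '\0';
}

/* Hypothetical helper: true only if the hypervisor bit is set AND leaf
 * 0x40000000 carries the signature the caller knows how to talk to. */
bool hypercall_allowed(uint32_t leaf1_ecx, uint32_t ebx, uint32_t ecx,
                       uint32_t edx, const char *expected_sig)
{
    char sig[13];

    if (!(leaf1_ecx & CPUID_1_ECX_HYPERVISOR))
        return false;
    hv_signature(ebx, ecx, edx, sig);
    return strncmp(sig, expected_sig, 12) == 0;
}
```

With this shape, a set hypervisor bit alone is never enough: a guest that reports bit 31 but an unrecognized signature gets `false`, and the driver would skip the VMCALL instead of faulting.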

>> The hypervisor bit is a complete red herring. If the guest CPU is
>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
>> (invoking the guest root VT-x),
>
> The CPU is running in VT-X. That was my point, the kernel is running in
> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
> is enabled.

Which it is because it wants to export the KVM hypercall interface.
However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.

> I don't think it's a red herring because the kernel uses this bit
> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
> features, and can be checked with the cpu_has_hypervisor macro (which
> was not used by the original author of the code in the driver, but
> should have been). VMWare and KVM support in the kernel also check for
> this bit before checking their hypervisor leaves for an ID. If it's not
> properly set it affects more than just the s-Par drivers.
>
>> but the fact still remains that you
>> should never, ever, invoke VMCALL unless you know what hypervisor you
>> have underneath.
>
> From the standpoint of the s-Par drivers, yes, I agree (as I already
> said). However, VMCALL is not a privileged instruction, so anyone could
> use it from user space and go right past the OS straight to the
> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
> any user could hard-stop the guest with a couple of lines of C.

Typically the hypervisor wants to generate a #UD inside of the guest for
that case. The guest OS will intercept it and SIGILL the user space
process.

-hpa

2014-04-11 17:36:29

by Jet Chen

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/12/2014 12:33 AM, H. Peter Anvin wrote:
> On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
>>
>>> I'm still confused where KVM comes into the picture. Are you actually
>>> using KVM (and thus talking about nested virtualization) or are you
>>> using Qemu in JIT mode and running another hypervisor underneath?
>>
>> The test that Fengguang used to find the problem was running the linux
>> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
>> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
>> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.
>
> As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
> as a JIT. Completely different thing. In that case Qemu probably
> should *not* set the hypervisor bit. However, the only thing that the
> hypervisor bit means is that you can look for specific hypervisor APIs
> in CPUID level 0x40000000+.
>
>> My point is, the vmcall was made because the hypervisor bit was set. If
>> this bit had been turned off, as it would be on a real processor, the
>> vmcall wouldn't have happened.
>
> And my point is that that is a bug. In the driver. A very serious one.
> You cannot call VMCALL until you know *which* hypervisor API(s) you
> have available, period.
>
>>> The hypervisor bit is a complete red herring. If the guest CPU is
>>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
>>> (invoking the guest root VT-x),
>>
>> The CPU is running in VT-X. That was my point, the kernel is running in
>> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
>> is enabled.
>
> Which it is because it wants to export the KVM hypercall interface.
> However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.
>
>> I don't think it's a red herring because the kernel uses this bit
>> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
>> features, and can be checked with the cpu_has_hypervisor macro (which
>> was not used by the original author of the code in the driver, but
>> should have been). VMWare and KVM support in the kernel also check for
>> this bit before checking their hypervisor leaves for an ID. If it's not
>> properly set it affects more than just the s-Par drivers.
>>
>>> but the fact still remains that you
>>> should never, ever, invoke VMCALL unless you know what hypervisor you
>>> have underneath.
>>
>> From the standpoint of the s-Par drivers, yes, I agree (as I already
>> said). However, VMCALL is not a privileged instruction, so anyone could
>> use it from user space and go right past the OS straight to the
>> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
>> any user could hard-stop the guest with a couple of lines of C.
>
> Typically the hypervisor wants to generate a #UD inside of the guest for
> that case. The guest OS will intercept it and SIGILL the user space
> process.
>
> -hpa
>

Hi Ben,

I re-tested this case with/without option -enable-kvm.

qemu-system-x86_64 -cpu Haswell,+smep,+smap invalid op
qemu-system-x86_64 -cpu kvm64 invalid op
qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm everything OK
qemu-system-x86_64 -cpu kvm64 -enable-kvm everything OK

I think this is probably a bug in QEMU.
Sorry for misleading you. I am not experienced with QEMU, and I did not realize I needed to try this case with different options until I read Peter's reply.

As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.

Thanks,
Jet

2014-04-11 17:41:10

by H. Peter Anvin

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 04/11/2014 10:35 AM, Jet Chen wrote:
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
>

Either way, unless there is a CPUID interface exposed in CPUID levels
0x40000000+, then relying on the hypervisor bit to do VMCALL is wrong in
the extreme.

-hpa

2014-04-11 17:49:41

by Ben Romer

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Sat, 2014-04-12 at 01:35 +0800, Jet Chen wrote:

> Hi Ben,
>
> I re-tested this case with/without option -enable-kvm.
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap invalid op
> qemu-system-x86_64 -cpu kvm64 invalid op
> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm everything OK
> qemu-system-x86_64 -cpu kvm64 -enable-kvm everything OK
>
> I think this is probably a bug in QEMU.
> Sorry for misleading you. I am not experienced in QEMU usage. I don't realize I need try this case with different options Until read Peter's reply.
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
>
> Thanks,
> Jet

Great, thanks! Sorry for the trouble. :)

-- Ben



2014-04-11 17:51:58

by Ben Romer

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On Fri, 2014-04-11 at 10:40 -0700, H. Peter Anvin wrote:
> On 04/11/2014 10:35 AM, Jet Chen wrote:
> >
> > As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
> >
>
> Either way, unless there is a CPUID interface exposed in CPUID levels
> 0x40000000+, then relying on the hypervisor bit to do VMCALL is wrong in
> the extreme.
>
> -hpa
>
>

I'll pass your feedback on to the people who wrote the bad code. Sorry
for the trouble. :)

-- Ben

2014-04-13 11:51:59

by Borislav Petkov

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

Should we perhaps CC qemu-devel here for an opinion?

Guys, this mail should explain the issue but in case there are
questions, the whole thread starts here:

http://lkml.kernel.org/r/20140407111725.GC25152@localhost

Thanks.

On Sat, Apr 12, 2014 at 01:35:49AM +0800, Jet Chen wrote:
> On 04/12/2014 12:33 AM, H. Peter Anvin wrote:
> > On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
> >>
> >>> I'm still confused where KVM comes into the picture. Are you actually
> >>> using KVM (and thus talking about nested virtualization) or are you
> >>> using Qemu in JIT mode and running another hypervisor underneath?
> >>
> >> The test that Fengguang used to find the problem was running the linux
> >> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
> >> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
> >> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.
> >
> > As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
> > as a JIT. Completely different thing. In that case Qemu probably
> > should *not* set the hypervisor bit. However, the only thing that the
> > hypervisor bit means is that you can look for specific hypervisor APIs
> > in CPUID level 0x40000000+.
> >
> >> My point is, the vmcall was made because the hypervisor bit was set. If
> >> this bit had been turned off, as it would be on a real processor, the
> >> vmcall wouldn't have happened.
> >
> > And my point is that that is a bug. In the driver. A very serious one.
> > You cannot call VMCALL until you know *which* hypervisor API(s) you
> > have available, period.
> >
> >>> The hypervisor bit is a complete red herring. If the guest CPU is
> >>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
> >>> (invoking the guest root VT-x),
> >>
> >> The CPU is running in VT-X. That was my point, the kernel is running in
> >> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
> >> is enabled.
> >
> > Which it is because it wants to export the KVM hypercall interface.
> > However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.
> >
> >> I don't think it's a red herring because the kernel uses this bit
> >> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
> >> features, and can be checked with the cpu_has_hypervisor macro (which
> >> was not used by the original author of the code in the driver, but
> >> should have been). VMWare and KVM support in the kernel also check for
> >> this bit before checking their hypervisor leaves for an ID. If it's not
> >> properly set it affects more than just the s-Par drivers.
> >>
> >>> but the fact still remains that you
> >>> should never, ever, invoke VMCALL unless you know what hypervisor you
> >>> have underneath.
> >>
> >> From the standpoint of the s-Par drivers, yes, I agree (as I already
> >> said). However, VMCALL is not a privileged instruction, so anyone could
> >> use it from user space and go right past the OS straight to the
> >> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
> >> any user could hard-stop the guest with a couple of lines of C.
> >
> > Typically the hypervisor wants to generate a #UD inside of the guest for
> > that case. The guest OS will intercept it and SIGILL the user space
> > process.
> >
> > -hpa
> >
>
> Hi Ben,
>
> I re-tested this case with/without option -enable-kvm.
>
> qemu-system-x86_64 -cpu Haswell,+smep,+smap invalid op
> qemu-system-x86_64 -cpu kvm64 invalid op
> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm everything OK
> qemu-system-x86_64 -cpu kvm64 -enable-kvm everything OK
>
> I think this is probably a bug in QEMU.
> Sorry for misleading you. I am not experienced in QEMU usage. I don't realize I need try this case with different options Until read Peter's reply.
>
> As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
>
> Thanks,
> Jet
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-13 12:21:10

by Jet Chen

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

Thanks Borislav.
As I have not tested this issue on the latest version of qemu, the qemu guys may want to reproduce it on their side.

Although every detail needed to reproduce it can be found in this mail thread, I would like to give a summary here.

- kernel code base:

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 12e364b9f08aa335dc7716ce74113e834c993765
Author: Ken Cox <[email protected]>
AuthorDate: Tue Mar 4 07:58:07 2014 -0600
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Tue Mar 4 16:58:21 2014 -0800

staging: visorchipset driver to provide registration and other services


- reproduce script (the original one in the first message of this thread fails to reproduce the problem; the -enable-kvm option has to be removed):
------------------------------------------------------------
#!/bin/bash

kernel=$1

kvm=(
qemu-system-x86_64 -cpu kvm64
-kernel "$kernel"
-smp 2
-m 256M
-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
-net user,vlan=0
-net nic,vlan=1,model=e1000
-net user,vlan=1
-boot order=nc
-no-reboot
-watchdog i6300esb
-serial stdio
-display none
-monitor null
)

append=(
debug
sched_debug
apic=debug
ignore_loglevel
sysrq_always_enabled
panic=10
prompt_ramdisk=0
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
)

"${kvm[@]}" --append "${append[*]}"
------------------------------------------------------------------

- dmesg log:

[ 24.135101] FPGA image file name: xlinx_fpga_firmware.bit
[ 24.137595] GPIO INIT FAIL!!
[ 24.141283] driver version 1.0.0.0 loaded
[ 24.142539] chipset driver version 1.0.0.0 loadedinvalid opcode: 0000 [#1] PREEMPT SMP
[ 24.144793] Modules linked in:
[ 24.145303] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc5-00621-g12e364b #1
[ 24.145303] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 24.145303] task: ffff88001157a010 ti: ffff88001157c000 task.ti: ffff88001157c000
[ 24.145303] RIP: 0010:[<ffffffff81e37115>] [<ffffffff81e37115>] visorchipset_init+0x7b/0x8c5
[ 24.145303] RSP: 0000:ffff88001157de58 EFLAGS: 00000286
[ 24.145303] RAX: 000000000000070b RBX: 0000000000000004 RCX: 4000000000000000
[ 24.145303] RDX: a70aba7500000000 RSI: ffff88001157de5c RDI: ffff88001157de58
[ 24.145303] RBP: ffff88001157de90 R08: 0000000000000002 R09: ffff88001157de60
[ 24.145303] R10: ffff88001157de64 R11: 0000000000000000 R12: ffff88001157de5c
[ 24.145303] R13: ffff88001157de60 R14: ffff88001157de64 R15: 0000000000000000
[ 24.145303] FS: 0000000000000000(0000) GS:ffff880012600000(0000) knlGS:0000000000000000
[ 24.145303] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 24.145303] CR2: ffff880002992000 CR3: 0000000001c07000 CR4: 00000000003006f0
[ 24.145303] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 24.145303] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 24.145303] Stack:
[ 24.145303] 00000800000306c1 078bfbf982d82203 ffffffff81e3709a 0000000000000000
[ 24.145303] 00000000000001df 0000000000000000 0000000000000000 ffff88001157df08
[ 24.145303] ffffffff810002b2 ffffffff810b2600 ffff88001157df08 ffffffff810b27db
[ 24.145303] Call Trace:
[ 24.145303] [<ffffffff81e3709a>] ? visorchannel_init+0x1d/0x1d
[ 24.145303] [<ffffffff810002b2>] do_one_initcall+0x8e/0x138
[ 24.145303] [<ffffffff810b2600>] ? param_array_set+0xef/0xf5
[ 24.145303] [<ffffffff810b27db>] ? parse_args+0x180/0x248
[ 24.145303] [<ffffffff81dfbf86>] kernel_init_freeable+0x108/0x199
[ 24.145303] [<ffffffff81dfb73a>] ? do_early_param+0x8a/0x8a
[ 24.145303] [<ffffffff8173f08e>] ? rest_init+0xc2/0xc2
[ 24.145303] [<ffffffff8173f097>] kernel_init+0x9/0xda
[ 24.145303] [<ffffffff8176024c>] ret_from_fork+0x7c/0xb0
[ 24.145303] [<ffffffff8173f08e>] ? rest_init+0xc2/0xc2
[ 24.145303] Code: 8d 65 cc 4c 8d 6d d0 4c 8d 75 d4 79 21 48 ba 00 00 00 00 75 ba 0a a7 48 b9 00 00 00 00 00 00 00 40 bb 04 00 00 00 b8 0b 07 00 00 <0f> 01 c1 8b 35 c2 c4 b4 00 48 c7 c7 f5 93 b4 81 31 c0 e8 3b 21
[ 24.145303] RIP [<ffffffff81e37115>] visorchipset_init+0x7b/0x8c5
[ 24.145303] RSP <ffff88001157de58>
[ 24.187247] ---[ end trace 62b5721899a66a6c ]---
[ 24.188157] Kernel panic - not syncing: Fatal exception
[ 24.188157] Kernel panic - not syncing: Fatal exception
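
For readers of the log above: the faulting bytes marked `<0f> 01 c1` in the Code: line are the encoding of VMCALL, which matches the discussion in this thread. The following is only an illustrative sketch of how one might check that mechanically; the function name `faulting_insn` is made up for this example and is not a kernel or QEMU API.

```c
#include <stdio.h>
#include <string.h>

/* Classify the instruction that starts at the <xx> marker of an oops
 * "Code:" line. Only the two hypercall encodings are recognized:
 *   0f 01 c1  VMCALL  (Intel VT-x)
 *   0f 01 d9  VMMCALL (AMD SVM)
 * Anything else, or a line without a marker, yields "other". */
const char *faulting_insn(const char *code_line)
{
    const char *p = strchr(code_line, '<');
    unsigned int b0, b1, b2;

    if (!p || sscanf(p, "<%2x> %2x %2x", &b0, &b1, &b2) != 3)
        return "other";
    if (b0 == 0x0f && b1 == 0x01 && b2 == 0xc1)
        return "VMCALL";
    if (b0 == 0x0f && b1 == 0x01 && b2 == 0xd9)
        return "VMMCALL";
    return "other";
}
```

Feeding it the Code: line from the dmesg above yields "VMCALL", i.e. the invalid opcode exception was raised on the hypercall instruction itself.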


For the kernel kconfig and the full dmesg log, please check the attachments in this mail.

Thanks,
Jet


On 04/13/2014 07:51 PM, Borislav Petkov wrote:
> Should we perhaps CC qemu-devel here for an opinion.
>
> Guys, this mail should explain the issue but in case there are
> questions, the whole thread starts here:
>
> http://lkml.kernel.org/r/20140407111725.GC25152@localhost
>
> Thanks.
>
> On Sat, Apr 12, 2014 at 01:35:49AM +0800, Jet Chen wrote:
>> On 04/12/2014 12:33 AM, H. Peter Anvin wrote:
>>> On 04/11/2014 06:51 AM, Romer, Benjamin M wrote:
>>>>
>>>>> I'm still confused where KVM comes into the picture. Are you actually
>>>>> using KVM (and thus talking about nested virtualization) or are you
>>>>> using Qemu in JIT mode and running another hypervisor underneath?
>>>>
>>>> The test that Fengguang used to find the problem was running the linux
>>>> kernel directly using KVM. When the kernel was run with "-cpu Haswell,
>>>> +smep,+smap" set, the vmcall failed with invalid op, but when the kernel
>>>> is run with "-cpu qemu64", the vmcall causes a vmexit, as it should.
>>>
>>> As far as I know, Fengguang's test doesn't use KVM at all, it runs Qemu
>>> as a JIT. Completely different thing. In that case Qemu probably
>>> should *not* set the hypervisor bit. However, the only thing that the
>>> hypervisor bit means is that you can look for specific hypervisor APIs
>>> in CPUID level 0x40000000+.
>>>
>>>> My point is, the vmcall was made because the hypervisor bit was set. If
>>>> this bit had been turned off, as it would be on a real processor, the
>>>> vmcall wouldn't have happened.
>>>
>>> And my point is that that is a bug. In the driver. A very serious one.
>>> You cannot call VMCALL until you know *which* hypervisor API(s) you
>>> have available, period.
>>>
>>>>> The hypervisor bit is a complete red herring. If the guest CPU is
>>>>> running in VT-x mode, then VMCALL should VMEXIT inside the guest
>>>>> (invoking the guest root VT-x),
>>>>
>>>> The CPU is running in VT-X. That was my point, the kernel is running in
>>>> the KVM guest, and KVM is setting the CPU feature bits such that bit 31
>>>> is enabled.
>>>
>>> Which it is because it wants to export the KVM hypercall interface.
>>> However, keying VMCALL *only* on the HYPERVISOR bit is wrong in the extreme.
>>>
>>>> I don't think it's a red herring because the kernel uses this bit
>>>> elsewhere - it is reported as X86_FEATURE_HYPERVISOR in the CPU
>>>> features, and can be checked with the cpu_has_hypervisor macro (which
>>>> was not used by the original author of the code in the driver, but
>>>> should have been). VMWare and KVM support in the kernel also check for
>>>> this bit before checking their hypervisor leaves for an ID. If it's not
>>>> properly set it affects more than just the s-Par drivers.
>>>>
>>>>> but the fact still remains that you
>>>>> should never, ever, invoke VMCALL unless you know what hypervisor you
>>>>> have underneath.
>>>>
>>>> From the standpoint of the s-Par drivers, yes, I agree (as I already
>>>> said). However, VMCALL is not a privileged instruction, so anyone could
>>>> use it from user space and go right past the OS straight to the
>>>> hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
>>>> any user could hard-stop the guest with a couple of lines of C.
>>>
>>> Typically the hypervisor wants to generate a #UD inside of the guest for
>>> that case. The guest OS will intercept it and SIGILL the user space
>>> process.
>>>
>>> -hpa
>>>
>>
>> Hi Ben,
>>
>> I re-tested this case with/without option -enable-kvm.
>>
>> qemu-system-x86_64 -cpu Haswell,+smep,+smap invalid op
>> qemu-system-x86_64 -cpu kvm64 invalid op
>> qemu-system-x86_64 -cpu Haswell,+smep,+smap -enable-kvm everything OK
>> qemu-system-x86_64 -cpu kvm64 -enable-kvm everything OK
>>
>> I think this is probably a bug in QEMU.
>> Sorry for misleading you. I am not experienced in QEMU usage. I don't realize I need try this case with different options Until read Peter's reply.
>>
>> As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
>>
>> Thanks,
>> Jet
>>
>


Attachments:
dmesg-quantal-f4-128:20140407182830:x86_64-randconfig-br0-04050702:3.14.0-rc5-00621-g12e364b:1 (59.84 kB)
config-3.14.0-rc5-00621-g12e364b (96.71 kB)

2014-04-30 10:02:59

by Paolo Bonzini

Subject: Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

On 11/04/2014 19:40, H. Peter Anvin wrote:
> On 04/11/2014 10:35 AM, Jet Chen wrote:
>>
>> As Peter said, QEMU probably should *not* set the hypervisor bit. But based on my testing, I think KVM works properly in this case.
>>
>
> Either way, unless there is a CPUID interface exposed in CPUID levels
> 0x40000000+, then relying on the hypervisor bit to do VMCALL is wrong in
> the extreme.

Sorry for the delay guys, I was on vacation.

Lack of a CPUID interface at 0x40000000 is indeed *the* good reason why
QEMU should not set the hypervisor bit. Of course, there is no
guarantee that QEMU will never expose a 0x40000000 interface, and at
that point the hypervisor bit may reappear in QEMU's JIT mode.

As to sending #UD to the guest at CPL>0, that is a choice of the
hypervisor. Hyper-V (and KVM in Hyper-V emulation mode) does that, and
does the same in real mode too. KVM instead sets EAX to -KVM_EPERM, and
accepts hypercalls in real mode (where CPL=0). Terminating the guest is
surely the wrong thing to do at CPL>0.
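
The behaviours described in the paragraph above can be summarized as a small decision table. This is only a model of that description, not hypervisor source code; the enum and function names are invented for the illustration.

```c
#include <stdbool.h>

enum hv_flavor { HV_KVM, HV_HYPERV };

enum hc_outcome {
    HC_HANDLE,        /* hypercall is accepted and processed */
    HC_INJECT_UD,     /* guest sees #UD; its OS can SIGILL the process */
    HC_RETURN_EPERM   /* guest continues with EAX = -KVM_EPERM */
};

/* Model of what happens when a guest executes the hypercall instruction
 * at privilege level `cpl`, optionally in real mode. */
enum hc_outcome hypercall_outcome(enum hv_flavor hv, int cpl, bool real_mode)
{
    if (hv == HV_HYPERV)
        return (cpl > 0 || real_mode) ? HC_INJECT_UD : HC_HANDLE;

    /* KVM: real mode is accepted (CPL is 0 there); CPL>0 gets an error. */
    return cpl > 0 ? HC_RETURN_EPERM : HC_HANDLE;
}
```

In neither row is the guest terminated: an unprivileged caller is either faulted individually or handed an error code.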

Thanks,

Paolo