2012-06-27 08:53:34

by Zhang Yanfei

Subject: [PATCH v3 0/5] Export offsets of VMCS fields as note information for kdump

This patch set exports offsets of VMCS fields as note information for
kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve the
runtime state of a guest machine image, such as registers, from the host
machine's crash dump, where it is stored in VMCS format. The problem is
that the internal layout of the VMCS is not disclosed in Intel's
specification. So, we solve this problem by reverse engineering, which
is implemented in this patch set. The VMCSINFO is exported via sysfs
(/sys/devices/cpu/vmcs/) to kexec-tools.

Here are two use cases for the two features that we want.

1) Create guest machine's crash dumpfile from host machine's crash dumpfile

In general, we want to use this feature for failure analysis on systems
where the processing depends on communication between host and guest
machines, so that we can look into the system from both machines'
viewpoints.

As a concrete situation, consider a heartbeat monitoring feature on the
guest machine's side, where we need to determine on which machine's side
the cause of a heartbeat stop lies. In our actual experiments, we
encountered such a situation and found that the cause of the bug was in
the host's process scheduler: the guest machine's vcpu stopped for a
long time, which then led to the heartbeat stop.

The module that judges heartbeat stop is on the guest machine, so we
need to debug the guest machine's data. But if the cause lies on the
host machine's side, we need to look into the host machine's crash dump.

Without this feature, we first create the guest machine's dump and then
create the host machine's, but even in the short time between the two
operations, the buggy situation is unlikely to remain.

So, we think the feature is useful for debugging both the guest
machine's and the host machine's sides at the same time, and we expect
it to make failure analysis more efficient.

Of course, we believe this feature is generally useful in situations
where the guest machine doesn't work well because of something on the
host machine.

2) Get offsets of VMCS information on the CPU running on the host machine

If kdump doesn't work well, we cannot use the kvm API to get the
register values of the guest machine, and they are still left in its
VMCS region. In that case, we use a crash dump mechanism running outside
of the Linux kernel, such as sadump, a firmware-based crash dump. VMCS
information is then necessary.

TODO:
1. In kexec-tools, get VMCSINFO via sysfs and dump it as note information
into vmcore.
2. Dump VMCS region of each guest vcpu and VMCSINFO into qemu-process
core file. To do this, we will modify kernel core dumper, gdb gcore
and crash gcore.
3. Dump guest image from the qemu-process core file into a vmcore.

Changelog from v2 to v3:
1. New VMCSINFO format.
Now the VMCSINFO is mainly made up of an array that contains all vmcs
fields' offsets. The offsets aren't encoded because we decode them in
the module itself. If some field doesn't exist or its offset cannot be
decoded correctly, the offset in the array is just set to zero.
2. New sysfs interface and Documentation/ABI entry.
We expose the actual fields in /sys/devices/cpu/vmcs instead of just
exporting the address of VMCSINFO in /sys/kernel/vmcsinfo.
For example, /sys/devices/cpu/vmcs/0800 contains the offset of
GUEST_ES_SELECTOR. 0800 is the encoding of GUEST_ES_SELECTOR.
Accordingly, the ABI entry in Documentation is changed from
sysfs-kernel-vmcsinfo to sysfs-devices-cpu-vmcs.
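
As an illustration of how a userspace tool might consume this interface,
here is a minimal sketch. It is not part of this patch set or of
kexec-tools; the helper name read_vmcs_field_offset() is made up for the
example. It assumes what patch 1/5 shows: each file is named after the
field encoding exactly as spelled in the BUILD_OFFSET_SHOW() calls (so
"482A" is uppercase while the others are lowercase), the value is printed
in decimal, and an offset of zero means the field does not exist or could
not be decoded.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one VMCS field offset from the sysfs group.
 * `dir` is the group directory (e.g. "/sys/devices/cpu/vmcs") and
 * `name` is the attribute file name (e.g. "0800").
 * Returns the offset, 0 if the kernel reports the field as absent or
 * undecodable, or -1 if the file cannot be opened or parsed.
 */
static long read_vmcs_field_offset(const char *dir, const char *name)
{
	char path[256];
	long offset;
	FILE *fp;
	int n;

	assert(dir && name);

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;

	/* The show routine in patch 1/5 prints the offset with "%d\n". */
	if (fscanf(fp, "%ld", &offset) != 1)
		offset = -1;
	fclose(fp);
	return offset;
}
```

A caller would treat a return value of 0 as "field not available on this
CPU" rather than as a valid offset.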

Changelog from v1 to v2:
1. The VMCSINFO now has a simple binary <field><encoded offset> format,
as below:
   +-------------+--------------------------+
   | Byte offset | Contents                 |
   +-------------+--------------------------+
   | 0           | VMCS revision identifier |
   +-------------+--------------------------+
   | 4           | <field><encoded offset>  |
   +-------------+--------------------------+
   | 16          | <field><encoded offset>  |
   +-------------+--------------------------+
   ......

The first 32 bits of VMCSINFO contain the VMCS revision identifier.
The remainder of VMCSINFO is used for <field><encoded offset> sets.
Each set takes 12 bytes: the field occupies 4 bytes and its
corresponding encoded offset occupies 8 bytes.

Encoded offsets are raw values read by vmcs_read{16, 64, 32, l}, and
they are all zero-extended to 8 bytes so that each
<field><encoded offset> set has the same size.
We do not decode offsets here. The decoding work is deferred to
userspace tools for more flexible handling.

And here are two examples of the new VMCSINFO:
Processor: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
VMCSINFO contains:
<0000000d> --> VMCS revision id = 0xd
<00004000><0000000001840180> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x01840180
<00004002><0000000001940190> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x01940190
<0000401e><000000000fe40fe0> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x0fe40fe0
<0000400c><0000000001e401e0> --> OFFSET(VM_EXIT_CONTROLS) = 0x01e401e0
......

Processor: Intel(R) Xeon(R) CPU E7540 @ 2.00GHz (24 cores)
VMCSINFO contains:
<0000000e> --> VMCS revision id = 0xe
<00004000><0000000005540550> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x05540550
<00004002><0000000005440540> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x05440540
<0000401e><00000000054c0548> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x054c0548
<0000400c><00000000057c0578> --> OFFSET(VM_EXIT_CONTROLS) = 0x057c0578
......

2. Add a new kernel module *vmcsinfo-intel* for filling VMCSINFO instead
of putting it in module kvm-intel. The new module is auto-loaded
when the vmx cpufeature is detected and it depends on module kvm-intel.
*Loading and unloading this module will have no side effect on the
running guests.*
3. The sysfs file vmcsinfo is split into 2 files:
/sys/kernel/vmcsinfo: shows physical address of VMCSINFO note information.
/sys/kernel/vmcsinfo_maxsize: shows max size of VMCSINFO.
4. A new Documentation/ABI entry is added for vmcsinfo and vmcsinfo_maxsize.
5. Do not update the VMCSINFO note when the kernel has panicked.
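
For reference, the v2 binary layout described in item 1 above (a 4-byte
revision identifier followed by 12-byte <field><encoded offset> sets)
could be walked in userspace roughly as follows. This is only a sketch:
the names vmcsinfo_v2_set and parse_vmcsinfo_v2() are made up for the
example, little-endian byte order is an assumption (the layout is
produced on x86), and the encoded offsets are left undecoded, as in v2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One <field><encoded offset> set of the v2 layout (12 bytes in the buffer). */
struct vmcsinfo_v2_set {
	uint32_t field;          /* VMCS field encoding, e.g. 0x4000 */
	uint64_t encoded_offset; /* raw value, not decoded here */
};

/* Read little-endian integers byte by byte (byte order is an assumption). */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const unsigned char *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/*
 * Hypothetical parser: stores the revision id in *rev and up to `max`
 * sets in `out`. Returns the number of sets stored, or -1 if the buffer
 * is too short to hold the revision identifier.
 */
static int parse_vmcsinfo_v2(const unsigned char *buf, size_t len,
			     uint32_t *rev, struct vmcsinfo_v2_set *out,
			     size_t max)
{
	size_t pos = 4, n = 0;

	assert(buf && rev && out);
	if (len < 4)
		return -1;
	*rev = get_le32(buf);

	/* Sets start at byte 4 and repeat every 12 bytes. */
	while (n < max && pos + 12 <= len) {
		out[n].field = get_le32(buf + pos);
		out[n].encoded_offset = get_le64(buf + pos + 4);
		pos += 12;
		n++;
	}
	return (int)n;
}
```

Fed the first set of the Core 2 Duo example above, such a parser would
report revision 0xd, field 0x4000 and encoded offset 0x01840180.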

zhangyanfei (5):
x86: Add helper variables and functions to hold VMCSINFO
KVM: Export symbols for module vmcsinfo-intel
KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO
Sysfs: Export VMCSINFO via sysfs
Documentation: Add ABI entry for vmcs sysfs interface.

Documentation/ABI/testing/sysfs-devices-cpu-vmcs | 11 +
arch/x86/include/asm/vmcsinfo.h | 219 +++++++++++++
arch/x86/include/asm/vmx.h | 231 +++++---------
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/vmcsinfo.c | 381 ++++++++++++++++++++++
arch/x86/kvm/Kconfig | 11 +
arch/x86/kvm/Makefile | 3 +
arch/x86/kvm/vmcsinfo.c | 198 +++++++++++
arch/x86/kvm/vmx.c | 81 +----
drivers/base/core.c | 13 +
include/linux/kvm_host.h | 3 +
virt/kvm/kvm_main.c | 8 +-
12 files changed, 932 insertions(+), 228 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-cpu-vmcs
create mode 100644 arch/x86/include/asm/vmcsinfo.h
create mode 100644 arch/x86/kernel/vmcsinfo.c
create mode 100644 arch/x86/kvm/vmcsinfo.c


2012-06-27 08:55:33

by Zhang Yanfei

Subject: [PATCH v3 1/5] x86: Add helper variables and functions to hold VMCSINFO

This patch provides a set of variables to hold the VMCSINFO and also
some helper functions to help fill the VMCSINFO.

Signed-off-by: zhangyanfei <[email protected]>
---
arch/x86/include/asm/vmcsinfo.h | 219 ++++++++++++++++++++++
arch/x86/include/asm/vmx.h | 158 +----------------
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/vmcsinfo.c | 381 +++++++++++++++++++++++++++++++++++++++
4 files changed, 603 insertions(+), 156 deletions(-)
create mode 100644 arch/x86/include/asm/vmcsinfo.h
create mode 100644 arch/x86/kernel/vmcsinfo.c

diff --git a/arch/x86/include/asm/vmcsinfo.h b/arch/x86/include/asm/vmcsinfo.h
new file mode 100644
index 0000000..4b9f56b
--- /dev/null
+++ b/arch/x86/include/asm/vmcsinfo.h
@@ -0,0 +1,219 @@
+#ifndef _ASM_X86_VMCSINFO_H
+#define _ASM_X86_VMCSINFO_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+#include <linux/elf.h>
+#include <linux/device.h>
+
+/* VMCS Encodings */
+enum vmcs_field {
+ VIRTUAL_PROCESSOR_ID = 0x00000000,
+ GUEST_ES_SELECTOR = 0x00000800,
+ GUEST_CS_SELECTOR = 0x00000802,
+ GUEST_SS_SELECTOR = 0x00000804,
+ GUEST_DS_SELECTOR = 0x00000806,
+ GUEST_FS_SELECTOR = 0x00000808,
+ GUEST_GS_SELECTOR = 0x0000080a,
+ GUEST_LDTR_SELECTOR = 0x0000080c,
+ GUEST_TR_SELECTOR = 0x0000080e,
+ HOST_ES_SELECTOR = 0x00000c00,
+ HOST_CS_SELECTOR = 0x00000c02,
+ HOST_SS_SELECTOR = 0x00000c04,
+ HOST_DS_SELECTOR = 0x00000c06,
+ HOST_FS_SELECTOR = 0x00000c08,
+ HOST_GS_SELECTOR = 0x00000c0a,
+ HOST_TR_SELECTOR = 0x00000c0c,
+ IO_BITMAP_A = 0x00002000,
+ IO_BITMAP_A_HIGH = 0x00002001,
+ IO_BITMAP_B = 0x00002002,
+ IO_BITMAP_B_HIGH = 0x00002003,
+ MSR_BITMAP = 0x00002004,
+ MSR_BITMAP_HIGH = 0x00002005,
+ VM_EXIT_MSR_STORE_ADDR = 0x00002006,
+ VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007,
+ VM_EXIT_MSR_LOAD_ADDR = 0x00002008,
+ VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
+ VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
+ VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,
+ TSC_OFFSET = 0x00002010,
+ TSC_OFFSET_HIGH = 0x00002011,
+ VIRTUAL_APIC_PAGE_ADDR = 0x00002012,
+ VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013,
+ APIC_ACCESS_ADDR = 0x00002014,
+ APIC_ACCESS_ADDR_HIGH = 0x00002015,
+ EPT_POINTER = 0x0000201a,
+ EPT_POINTER_HIGH = 0x0000201b,
+ GUEST_PHYSICAL_ADDRESS = 0x00002400,
+ GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
+ VMCS_LINK_POINTER = 0x00002800,
+ VMCS_LINK_POINTER_HIGH = 0x00002801,
+ GUEST_IA32_DEBUGCTL = 0x00002802,
+ GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
+ GUEST_IA32_PAT = 0x00002804,
+ GUEST_IA32_PAT_HIGH = 0x00002805,
+ GUEST_IA32_EFER = 0x00002806,
+ GUEST_IA32_EFER_HIGH = 0x00002807,
+ GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808,
+ GUEST_IA32_PERF_GLOBAL_CTRL_HIGH= 0x00002809,
+ GUEST_PDPTR0 = 0x0000280a,
+ GUEST_PDPTR0_HIGH = 0x0000280b,
+ GUEST_PDPTR1 = 0x0000280c,
+ GUEST_PDPTR1_HIGH = 0x0000280d,
+ GUEST_PDPTR2 = 0x0000280e,
+ GUEST_PDPTR2_HIGH = 0x0000280f,
+ GUEST_PDPTR3 = 0x00002810,
+ GUEST_PDPTR3_HIGH = 0x00002811,
+ HOST_IA32_PAT = 0x00002c00,
+ HOST_IA32_PAT_HIGH = 0x00002c01,
+ HOST_IA32_EFER = 0x00002c02,
+ HOST_IA32_EFER_HIGH = 0x00002c03,
+ HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04,
+ HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05,
+ PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
+ CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
+ EXCEPTION_BITMAP = 0x00004004,
+ PAGE_FAULT_ERROR_CODE_MASK = 0x00004006,
+ PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008,
+ CR3_TARGET_COUNT = 0x0000400a,
+ VM_EXIT_CONTROLS = 0x0000400c,
+ VM_EXIT_MSR_STORE_COUNT = 0x0000400e,
+ VM_EXIT_MSR_LOAD_COUNT = 0x00004010,
+ VM_ENTRY_CONTROLS = 0x00004012,
+ VM_ENTRY_MSR_LOAD_COUNT = 0x00004014,
+ VM_ENTRY_INTR_INFO_FIELD = 0x00004016,
+ VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018,
+ VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
+ TPR_THRESHOLD = 0x0000401c,
+ SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
+ PLE_GAP = 0x00004020,
+ PLE_WINDOW = 0x00004022,
+ VM_INSTRUCTION_ERROR = 0x00004400,
+ VM_EXIT_REASON = 0x00004402,
+ VM_EXIT_INTR_INFO = 0x00004404,
+ VM_EXIT_INTR_ERROR_CODE = 0x00004406,
+ IDT_VECTORING_INFO_FIELD = 0x00004408,
+ IDT_VECTORING_ERROR_CODE = 0x0000440a,
+ VM_EXIT_INSTRUCTION_LEN = 0x0000440c,
+ VMX_INSTRUCTION_INFO = 0x0000440e,
+ GUEST_ES_LIMIT = 0x00004800,
+ GUEST_CS_LIMIT = 0x00004802,
+ GUEST_SS_LIMIT = 0x00004804,
+ GUEST_DS_LIMIT = 0x00004806,
+ GUEST_FS_LIMIT = 0x00004808,
+ GUEST_GS_LIMIT = 0x0000480a,
+ GUEST_LDTR_LIMIT = 0x0000480c,
+ GUEST_TR_LIMIT = 0x0000480e,
+ GUEST_GDTR_LIMIT = 0x00004810,
+ GUEST_IDTR_LIMIT = 0x00004812,
+ GUEST_ES_AR_BYTES = 0x00004814,
+ GUEST_CS_AR_BYTES = 0x00004816,
+ GUEST_SS_AR_BYTES = 0x00004818,
+ GUEST_DS_AR_BYTES = 0x0000481a,
+ GUEST_FS_AR_BYTES = 0x0000481c,
+ GUEST_GS_AR_BYTES = 0x0000481e,
+ GUEST_LDTR_AR_BYTES = 0x00004820,
+ GUEST_TR_AR_BYTES = 0x00004822,
+ GUEST_INTERRUPTIBILITY_INFO = 0x00004824,
+ GUEST_ACTIVITY_STATE = 0X00004826,
+ GUEST_SYSENTER_CS = 0x0000482A,
+ HOST_IA32_SYSENTER_CS = 0x00004c00,
+ CR0_GUEST_HOST_MASK = 0x00006000,
+ CR4_GUEST_HOST_MASK = 0x00006002,
+ CR0_READ_SHADOW = 0x00006004,
+ CR4_READ_SHADOW = 0x00006006,
+ CR3_TARGET_VALUE0 = 0x00006008,
+ CR3_TARGET_VALUE1 = 0x0000600a,
+ CR3_TARGET_VALUE2 = 0x0000600c,
+ CR3_TARGET_VALUE3 = 0x0000600e,
+ EXIT_QUALIFICATION = 0x00006400,
+ GUEST_LINEAR_ADDRESS = 0x0000640a,
+ GUEST_CR0 = 0x00006800,
+ GUEST_CR3 = 0x00006802,
+ GUEST_CR4 = 0x00006804,
+ GUEST_ES_BASE = 0x00006806,
+ GUEST_CS_BASE = 0x00006808,
+ GUEST_SS_BASE = 0x0000680a,
+ GUEST_DS_BASE = 0x0000680c,
+ GUEST_FS_BASE = 0x0000680e,
+ GUEST_GS_BASE = 0x00006810,
+ GUEST_LDTR_BASE = 0x00006812,
+ GUEST_TR_BASE = 0x00006814,
+ GUEST_GDTR_BASE = 0x00006816,
+ GUEST_IDTR_BASE = 0x00006818,
+ GUEST_DR7 = 0x0000681a,
+ GUEST_RSP = 0x0000681c,
+ GUEST_RIP = 0x0000681e,
+ GUEST_RFLAGS = 0x00006820,
+ GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
+ GUEST_SYSENTER_ESP = 0x00006824,
+ GUEST_SYSENTER_EIP = 0x00006826,
+ HOST_CR0 = 0x00006c00,
+ HOST_CR3 = 0x00006c02,
+ HOST_CR4 = 0x00006c04,
+ HOST_FS_BASE = 0x00006c06,
+ HOST_GS_BASE = 0x00006c08,
+ HOST_TR_BASE = 0x00006c0a,
+ HOST_GDTR_BASE = 0x00006c0c,
+ HOST_IDTR_BASE = 0x00006c0e,
+ HOST_IA32_SYSENTER_ESP = 0x00006c10,
+ HOST_IA32_SYSENTER_EIP = 0x00006c12,
+ HOST_RSP = 0x00006c14,
+ HOST_RIP = 0x00006c16,
+};
+
+/*
+ * vmcs field offsets.
+ */
+struct vmcsinfo {
+ u32 vmcs_revision_id;
+ int filled;
+ u16 vmcs_field_to_offset_table[HOST_RIP + 1];
+};
+
+#define VMCSINFO_NOTE_NAME "VMCSINFO"
+#define VMCSINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCSINFO_NOTE_NAME), 4)
+#define VMCSINFO_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
+#define VMCSINFO_NOTE_SIZE (VMCSINFO_NOTE_HEAD_BYTES*2 \
+ + sizeof(struct vmcsinfo) \
+ + VMCSINFO_NOTE_NAME_BYTES)
+
+extern struct vmcsinfo vmcsinfo;
+#define VMCSINFO_MAX_FIELD \
+ ARRAY_SIZE(vmcsinfo.vmcs_field_to_offset_table)
+
+extern void update_vmcsinfo_note(void);
+extern int vmcs_sysfs_add(struct device *);
+extern void vmcs_sysfs_remove(struct device *);
+
+static inline void vmcsinfo_revision_id(u32 id)
+{
+ vmcsinfo.vmcs_revision_id = id;
+}
+
+static inline void vmcsinfo_field(unsigned long field, u16 offset)
+{
+ if (field < VMCSINFO_MAX_FIELD)
+ vmcsinfo.vmcs_field_to_offset_table[field] = offset;
+}
+
+static inline int vmcsinfo_is_filled(void)
+{
+ return vmcsinfo.filled;
+}
+
+static inline void vmcsinfo_filled(void)
+{
+ vmcsinfo.filled = 1;
+}
+
+static inline short get_vmcs_field_offset(unsigned long field)
+{
+ if (field >= VMCSINFO_MAX_FIELD ||
+ vmcsinfo.vmcs_field_to_offset_table[field] == 0)
+ return -1;
+ return vmcsinfo.vmcs_field_to_offset_table[field];
+}
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_VMCSINFO_H */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 31f180c..c364219 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -27,6 +27,8 @@

#include <linux/types.h>

+#include <asm/vmcsinfo.h>
+
/*
* Definitions of Primary Processor-Based VM-Execution Controls.
*/
@@ -84,162 +86,6 @@
#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000

-/* VMCS Encodings */
-enum vmcs_field {
- VIRTUAL_PROCESSOR_ID = 0x00000000,
- GUEST_ES_SELECTOR = 0x00000800,
- GUEST_CS_SELECTOR = 0x00000802,
- GUEST_SS_SELECTOR = 0x00000804,
- GUEST_DS_SELECTOR = 0x00000806,
- GUEST_FS_SELECTOR = 0x00000808,
- GUEST_GS_SELECTOR = 0x0000080a,
- GUEST_LDTR_SELECTOR = 0x0000080c,
- GUEST_TR_SELECTOR = 0x0000080e,
- HOST_ES_SELECTOR = 0x00000c00,
- HOST_CS_SELECTOR = 0x00000c02,
- HOST_SS_SELECTOR = 0x00000c04,
- HOST_DS_SELECTOR = 0x00000c06,
- HOST_FS_SELECTOR = 0x00000c08,
- HOST_GS_SELECTOR = 0x00000c0a,
- HOST_TR_SELECTOR = 0x00000c0c,
- IO_BITMAP_A = 0x00002000,
- IO_BITMAP_A_HIGH = 0x00002001,
- IO_BITMAP_B = 0x00002002,
- IO_BITMAP_B_HIGH = 0x00002003,
- MSR_BITMAP = 0x00002004,
- MSR_BITMAP_HIGH = 0x00002005,
- VM_EXIT_MSR_STORE_ADDR = 0x00002006,
- VM_EXIT_MSR_STORE_ADDR_HIGH = 0x00002007,
- VM_EXIT_MSR_LOAD_ADDR = 0x00002008,
- VM_EXIT_MSR_LOAD_ADDR_HIGH = 0x00002009,
- VM_ENTRY_MSR_LOAD_ADDR = 0x0000200a,
- VM_ENTRY_MSR_LOAD_ADDR_HIGH = 0x0000200b,
- TSC_OFFSET = 0x00002010,
- TSC_OFFSET_HIGH = 0x00002011,
- VIRTUAL_APIC_PAGE_ADDR = 0x00002012,
- VIRTUAL_APIC_PAGE_ADDR_HIGH = 0x00002013,
- APIC_ACCESS_ADDR = 0x00002014,
- APIC_ACCESS_ADDR_HIGH = 0x00002015,
- EPT_POINTER = 0x0000201a,
- EPT_POINTER_HIGH = 0x0000201b,
- GUEST_PHYSICAL_ADDRESS = 0x00002400,
- GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
- VMCS_LINK_POINTER = 0x00002800,
- VMCS_LINK_POINTER_HIGH = 0x00002801,
- GUEST_IA32_DEBUGCTL = 0x00002802,
- GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
- GUEST_IA32_PAT = 0x00002804,
- GUEST_IA32_PAT_HIGH = 0x00002805,
- GUEST_IA32_EFER = 0x00002806,
- GUEST_IA32_EFER_HIGH = 0x00002807,
- GUEST_IA32_PERF_GLOBAL_CTRL = 0x00002808,
- GUEST_IA32_PERF_GLOBAL_CTRL_HIGH= 0x00002809,
- GUEST_PDPTR0 = 0x0000280a,
- GUEST_PDPTR0_HIGH = 0x0000280b,
- GUEST_PDPTR1 = 0x0000280c,
- GUEST_PDPTR1_HIGH = 0x0000280d,
- GUEST_PDPTR2 = 0x0000280e,
- GUEST_PDPTR2_HIGH = 0x0000280f,
- GUEST_PDPTR3 = 0x00002810,
- GUEST_PDPTR3_HIGH = 0x00002811,
- HOST_IA32_PAT = 0x00002c00,
- HOST_IA32_PAT_HIGH = 0x00002c01,
- HOST_IA32_EFER = 0x00002c02,
- HOST_IA32_EFER_HIGH = 0x00002c03,
- HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04,
- HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05,
- PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
- CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
- EXCEPTION_BITMAP = 0x00004004,
- PAGE_FAULT_ERROR_CODE_MASK = 0x00004006,
- PAGE_FAULT_ERROR_CODE_MATCH = 0x00004008,
- CR3_TARGET_COUNT = 0x0000400a,
- VM_EXIT_CONTROLS = 0x0000400c,
- VM_EXIT_MSR_STORE_COUNT = 0x0000400e,
- VM_EXIT_MSR_LOAD_COUNT = 0x00004010,
- VM_ENTRY_CONTROLS = 0x00004012,
- VM_ENTRY_MSR_LOAD_COUNT = 0x00004014,
- VM_ENTRY_INTR_INFO_FIELD = 0x00004016,
- VM_ENTRY_EXCEPTION_ERROR_CODE = 0x00004018,
- VM_ENTRY_INSTRUCTION_LEN = 0x0000401a,
- TPR_THRESHOLD = 0x0000401c,
- SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
- PLE_GAP = 0x00004020,
- PLE_WINDOW = 0x00004022,
- VM_INSTRUCTION_ERROR = 0x00004400,
- VM_EXIT_REASON = 0x00004402,
- VM_EXIT_INTR_INFO = 0x00004404,
- VM_EXIT_INTR_ERROR_CODE = 0x00004406,
- IDT_VECTORING_INFO_FIELD = 0x00004408,
- IDT_VECTORING_ERROR_CODE = 0x0000440a,
- VM_EXIT_INSTRUCTION_LEN = 0x0000440c,
- VMX_INSTRUCTION_INFO = 0x0000440e,
- GUEST_ES_LIMIT = 0x00004800,
- GUEST_CS_LIMIT = 0x00004802,
- GUEST_SS_LIMIT = 0x00004804,
- GUEST_DS_LIMIT = 0x00004806,
- GUEST_FS_LIMIT = 0x00004808,
- GUEST_GS_LIMIT = 0x0000480a,
- GUEST_LDTR_LIMIT = 0x0000480c,
- GUEST_TR_LIMIT = 0x0000480e,
- GUEST_GDTR_LIMIT = 0x00004810,
- GUEST_IDTR_LIMIT = 0x00004812,
- GUEST_ES_AR_BYTES = 0x00004814,
- GUEST_CS_AR_BYTES = 0x00004816,
- GUEST_SS_AR_BYTES = 0x00004818,
- GUEST_DS_AR_BYTES = 0x0000481a,
- GUEST_FS_AR_BYTES = 0x0000481c,
- GUEST_GS_AR_BYTES = 0x0000481e,
- GUEST_LDTR_AR_BYTES = 0x00004820,
- GUEST_TR_AR_BYTES = 0x00004822,
- GUEST_INTERRUPTIBILITY_INFO = 0x00004824,
- GUEST_ACTIVITY_STATE = 0X00004826,
- GUEST_SYSENTER_CS = 0x0000482A,
- HOST_IA32_SYSENTER_CS = 0x00004c00,
- CR0_GUEST_HOST_MASK = 0x00006000,
- CR4_GUEST_HOST_MASK = 0x00006002,
- CR0_READ_SHADOW = 0x00006004,
- CR4_READ_SHADOW = 0x00006006,
- CR3_TARGET_VALUE0 = 0x00006008,
- CR3_TARGET_VALUE1 = 0x0000600a,
- CR3_TARGET_VALUE2 = 0x0000600c,
- CR3_TARGET_VALUE3 = 0x0000600e,
- EXIT_QUALIFICATION = 0x00006400,
- GUEST_LINEAR_ADDRESS = 0x0000640a,
- GUEST_CR0 = 0x00006800,
- GUEST_CR3 = 0x00006802,
- GUEST_CR4 = 0x00006804,
- GUEST_ES_BASE = 0x00006806,
- GUEST_CS_BASE = 0x00006808,
- GUEST_SS_BASE = 0x0000680a,
- GUEST_DS_BASE = 0x0000680c,
- GUEST_FS_BASE = 0x0000680e,
- GUEST_GS_BASE = 0x00006810,
- GUEST_LDTR_BASE = 0x00006812,
- GUEST_TR_BASE = 0x00006814,
- GUEST_GDTR_BASE = 0x00006816,
- GUEST_IDTR_BASE = 0x00006818,
- GUEST_DR7 = 0x0000681a,
- GUEST_RSP = 0x0000681c,
- GUEST_RIP = 0x0000681e,
- GUEST_RFLAGS = 0x00006820,
- GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
- GUEST_SYSENTER_ESP = 0x00006824,
- GUEST_SYSENTER_EIP = 0x00006826,
- HOST_CR0 = 0x00006c00,
- HOST_CR3 = 0x00006c02,
- HOST_CR4 = 0x00006c04,
- HOST_FS_BASE = 0x00006c06,
- HOST_GS_BASE = 0x00006c08,
- HOST_TR_BASE = 0x00006c0a,
- HOST_GDTR_BASE = 0x00006c0c,
- HOST_IDTR_BASE = 0x00006c0e,
- HOST_IA32_SYSENTER_ESP = 0x00006c10,
- HOST_IA32_SYSENTER_EIP = 0x00006c12,
- HOST_RSP = 0x00006c14,
- HOST_RIP = 0x00006c16,
-};
-
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000

#define EXIT_REASON_EXCEPTION_NMI 0
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8215e56..2c41f93 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o
obj-$(CONFIG_OF) += devicetree.o
obj-$(CONFIG_UPROBES) += uprobes.o
+obj-y += vmcsinfo.o

###
# 64 bit specific files
diff --git a/arch/x86/kernel/vmcsinfo.c b/arch/x86/kernel/vmcsinfo.c
new file mode 100644
index 0000000..25218ca
--- /dev/null
+++ b/arch/x86/kernel/vmcsinfo.c
@@ -0,0 +1,381 @@
+/*
+ * Architecture specific (i386/x86_64) functions for storing vmcs
+ * field information.
+ *
+ * Created by: Zhang Yanfei ([email protected])
+ *
+ * Copyright (C) Fujitsu Corporation, 2012. All rights reserved.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/elf.h>
+
+#include <asm/vmcsinfo.h>
+
+struct vmcsinfo vmcsinfo;
+EXPORT_SYMBOL_GPL(vmcsinfo);
+static u32 vmcsinfo_note[VMCSINFO_NOTE_SIZE/4];
+
+const char vmcs_group_name[] = "vmcs";
+
+void update_vmcsinfo_note(void)
+{
+ u32 *buf = vmcsinfo_note;
+ struct elf_note note;
+
+ if (!vmcsinfo_is_filled())
+ return;
+
+ note.n_namesz = strlen(VMCSINFO_NOTE_NAME) + 1;
+ note.n_descsz = sizeof(vmcsinfo);
+ note.n_type = 0;
+ memcpy(buf, &note, sizeof(note));
+ buf += (sizeof(note) + 3)/4;
+ memcpy(buf, VMCSINFO_NOTE_NAME, note.n_namesz);
+ buf += (note.n_namesz + 3)/4;
+ memcpy(buf, &vmcsinfo, note.n_descsz);
+ buf += (note.n_descsz + 3)/4;
+
+ note.n_namesz = 0;
+ note.n_descsz = 0;
+ note.n_type = 0;
+ memcpy(buf, &note, sizeof(note));
+}
+EXPORT_SYMBOL_GPL(update_vmcsinfo_note);
+
+#define BUILD_OFFSET_SHOW(field_code) \
+static ssize_t _##field_code##_show(struct device *dev, \
+ struct device_attribute *attr, \
+ char *buf) \
+{ \
+ return sprintf(buf, "%d\n", \
+ vmcsinfo.vmcs_field_to_offset_table[0x##field_code]); \
+} \
+static DEVICE_ATTR(field_code, 0444, _##field_code##_show, NULL); \
+
+BUILD_OFFSET_SHOW(0000); /* VIRTUAL_PROCESSOR_ID */
+BUILD_OFFSET_SHOW(0800); /* GUEST_ES_SELECTOR */
+BUILD_OFFSET_SHOW(0802); /* GUEST_CS_SELECTOR */
+BUILD_OFFSET_SHOW(0804); /* GUEST_SS_SELECTOR */
+BUILD_OFFSET_SHOW(0806); /* GUEST_DS_SELECTOR */
+BUILD_OFFSET_SHOW(0808); /* GUEST_FS_SELECTOR */
+BUILD_OFFSET_SHOW(080a); /* GUEST_GS_SELECTOR */
+BUILD_OFFSET_SHOW(080c); /* GUEST_LDTR_SELECTOR */
+BUILD_OFFSET_SHOW(080e); /* GUEST_TR_SELECTOR */
+BUILD_OFFSET_SHOW(0c00); /* HOST_ES_SELECTOR */
+BUILD_OFFSET_SHOW(0c02); /* HOST_CS_SELECTOR */
+BUILD_OFFSET_SHOW(0c04); /* HOST_SS_SELECTOR */
+BUILD_OFFSET_SHOW(0c06); /* HOST_DS_SELECTOR */
+BUILD_OFFSET_SHOW(0c08); /* HOST_FS_SELECTOR */
+BUILD_OFFSET_SHOW(0c0a); /* HOST_GS_SELECTOR */
+BUILD_OFFSET_SHOW(0c0c); /* HOST_TR_SELECTOR */
+BUILD_OFFSET_SHOW(2000); /* IO_BITMAP_A */
+BUILD_OFFSET_SHOW(2001); /* IO_BITMAP_A_HIGH */
+BUILD_OFFSET_SHOW(2002); /* IO_BITMAP_B */
+BUILD_OFFSET_SHOW(2003); /* IO_BITMAP_B_HIGH */
+BUILD_OFFSET_SHOW(2004); /* MSR_BITMAP */
+BUILD_OFFSET_SHOW(2005); /* MSR_BITMAP_HIGH */
+BUILD_OFFSET_SHOW(2006); /* VM_EXIT_MSR_STORE_ADDR */
+BUILD_OFFSET_SHOW(2007); /* VM_EXIT_MSR_STORE_ADDR_HIGH */
+BUILD_OFFSET_SHOW(2008); /* VM_EXIT_MSR_LOAD_ADDR */
+BUILD_OFFSET_SHOW(2009); /* VM_EXIT_MSR_LOAD_ADDR_HIGH */
+BUILD_OFFSET_SHOW(200a); /* VM_ENTRY_MSR_LOAD_ADDR */
+BUILD_OFFSET_SHOW(200b); /* VM_ENTRY_MSR_LOAD_ADDR_HIGH */
+BUILD_OFFSET_SHOW(2010); /* TSC_OFFSET */
+BUILD_OFFSET_SHOW(2011); /* TSC_OFFSET_HIGH */
+BUILD_OFFSET_SHOW(2012); /* VIRTUAL_APIC_PAGE_ADDR */
+BUILD_OFFSET_SHOW(2013); /* VIRTUAL_APIC_PAGE_ADDR_HIGH */
+BUILD_OFFSET_SHOW(2014); /* APIC_ACCESS_ADDR */
+BUILD_OFFSET_SHOW(2015); /* APIC_ACCESS_ADDR_HIGH */
+BUILD_OFFSET_SHOW(201a); /* EPT_POINTER */
+BUILD_OFFSET_SHOW(201b); /* EPT_POINTER_HIGH */
+BUILD_OFFSET_SHOW(2400); /* GUEST_PHYSICAL_ADDRESS */
+BUILD_OFFSET_SHOW(2401); /* GUEST_PHYSICAL_ADDRESS_HIGH */
+BUILD_OFFSET_SHOW(2800); /* VMCS_LINK_POINTER */
+BUILD_OFFSET_SHOW(2801); /* VMCS_LINK_POINTER_HIGH */
+BUILD_OFFSET_SHOW(2802); /* GUEST_IA32_DEBUGCTL */
+BUILD_OFFSET_SHOW(2803); /* GUEST_IA32_DEBUGCTL_HIGH */
+BUILD_OFFSET_SHOW(2804); /* GUEST_IA32_PAT */
+BUILD_OFFSET_SHOW(2805); /* GUEST_IA32_PAT_HIGH */
+BUILD_OFFSET_SHOW(2806); /* GUEST_IA32_EFER */
+BUILD_OFFSET_SHOW(2807); /* GUEST_IA32_EFER_HIGH */
+BUILD_OFFSET_SHOW(2808); /* GUEST_IA32_PERF_GLOBAL_CTRL */
+BUILD_OFFSET_SHOW(2809); /* GUEST_IA32_PERF_GLOBAL_CTRL_HIGH */
+BUILD_OFFSET_SHOW(280a); /* GUEST_PDPTR0 */
+BUILD_OFFSET_SHOW(280b); /* GUEST_PDPTR0_HIGH */
+BUILD_OFFSET_SHOW(280c); /* GUEST_PDPTR1 */
+BUILD_OFFSET_SHOW(280d); /* GUEST_PDPTR1_HIGH */
+BUILD_OFFSET_SHOW(280e); /* GUEST_PDPTR2 */
+BUILD_OFFSET_SHOW(280f); /* GUEST_PDPTR2_HIGH */
+BUILD_OFFSET_SHOW(2810); /* GUEST_PDPTR3 */
+BUILD_OFFSET_SHOW(2811); /* GUEST_PDPTR3_HIGH */
+BUILD_OFFSET_SHOW(2c00); /* HOST_IA32_PAT */
+BUILD_OFFSET_SHOW(2c01); /* HOST_IA32_PAT_HIGH */
+BUILD_OFFSET_SHOW(2c02); /* HOST_IA32_EFER */
+BUILD_OFFSET_SHOW(2c03); /* HOST_IA32_EFER_HIGH */
+BUILD_OFFSET_SHOW(2c04); /* HOST_IA32_PERF_GLOBAL_CTRL */
+BUILD_OFFSET_SHOW(2c05); /* HOST_IA32_PERF_GLOBAL_CTRL_HIGH */
+BUILD_OFFSET_SHOW(4000); /* PIN_BASED_VM_EXEC_CONTROL */
+BUILD_OFFSET_SHOW(4002); /* CPU_BASED_VM_EXEC_CONTROL */
+BUILD_OFFSET_SHOW(4004); /* EXCEPTION_BITMAP */
+BUILD_OFFSET_SHOW(4006); /* PAGE_FAULT_ERROR_CODE_MASK */
+BUILD_OFFSET_SHOW(4008); /* PAGE_FAULT_ERROR_CODE_MATCH */
+BUILD_OFFSET_SHOW(400a); /* CR3_TARGET_COUNT */
+BUILD_OFFSET_SHOW(400c); /* VM_EXIT_CONTROLS */
+BUILD_OFFSET_SHOW(400e); /* VM_EXIT_MSR_STORE_COUNT */
+BUILD_OFFSET_SHOW(4010); /* VM_EXIT_MSR_LOAD_COUNT */
+BUILD_OFFSET_SHOW(4012); /* VM_ENTRY_CONTROLS */
+BUILD_OFFSET_SHOW(4014); /* VM_ENTRY_MSR_LOAD_COUNT */
+BUILD_OFFSET_SHOW(4016); /* VM_ENTRY_INTR_INFO_FIELD */
+BUILD_OFFSET_SHOW(4018); /* VM_ENTRY_EXCEPTION_ERROR_CODE */
+BUILD_OFFSET_SHOW(401a); /* VM_ENTRY_INSTRUCTION_LEN */
+BUILD_OFFSET_SHOW(401c); /* TPR_THRESHOLD */
+BUILD_OFFSET_SHOW(401e); /* SECONDARY_VM_EXEC_CONTROL */
+BUILD_OFFSET_SHOW(4020); /* PLE_GAP */
+BUILD_OFFSET_SHOW(4022); /* PLE_WINDOW */
+BUILD_OFFSET_SHOW(4400); /* VM_INSTRUCTION_ERROR */
+BUILD_OFFSET_SHOW(4402); /* VM_EXIT_REASON */
+BUILD_OFFSET_SHOW(4404); /* VM_EXIT_INTR_INFO */
+BUILD_OFFSET_SHOW(4406); /* VM_EXIT_INTR_ERROR_CODE */
+BUILD_OFFSET_SHOW(4408); /* IDT_VECTORING_INFO_FIELD */
+BUILD_OFFSET_SHOW(440a); /* IDT_VECTORING_ERROR_CODE */
+BUILD_OFFSET_SHOW(440c); /* VM_EXIT_INSTRUCTION_LEN */
+BUILD_OFFSET_SHOW(440e); /* VMX_INSTRUCTION_INFO */
+BUILD_OFFSET_SHOW(4800); /* GUEST_ES_LIMIT */
+BUILD_OFFSET_SHOW(4802); /* GUEST_CS_LIMIT */
+BUILD_OFFSET_SHOW(4804); /* GUEST_SS_LIMIT */
+BUILD_OFFSET_SHOW(4806); /* GUEST_DS_LIMIT */
+BUILD_OFFSET_SHOW(4808); /* GUEST_FS_LIMIT */
+BUILD_OFFSET_SHOW(480a); /* GUEST_GS_LIMIT */
+BUILD_OFFSET_SHOW(480c); /* GUEST_LDTR_LIMIT */
+BUILD_OFFSET_SHOW(480e); /* GUEST_TR_LIMIT */
+BUILD_OFFSET_SHOW(4810); /* GUEST_GDTR_LIMIT */
+BUILD_OFFSET_SHOW(4812); /* GUEST_IDTR_LIMIT */
+BUILD_OFFSET_SHOW(4814); /* GUEST_ES_AR_BYTES */
+BUILD_OFFSET_SHOW(4816); /* GUEST_CS_AR_BYTES */
+BUILD_OFFSET_SHOW(4818); /* GUEST_SS_AR_BYTES */
+BUILD_OFFSET_SHOW(481a); /* GUEST_DS_AR_BYTES */
+BUILD_OFFSET_SHOW(481c); /* GUEST_FS_AR_BYTES */
+BUILD_OFFSET_SHOW(481e); /* GUEST_GS_AR_BYTES */
+BUILD_OFFSET_SHOW(4820); /* GUEST_LDTR_AR_BYTES */
+BUILD_OFFSET_SHOW(4822); /* GUEST_TR_AR_BYTES */
+BUILD_OFFSET_SHOW(4824); /* GUEST_INTERRUPTIBILITY_INFO */
+BUILD_OFFSET_SHOW(4826); /* GUEST_ACTIVITY_STATE */
+BUILD_OFFSET_SHOW(482A); /* GUEST_SYSENTER_CS */
+BUILD_OFFSET_SHOW(4c00); /* HOST_IA32_SYSENTER_CS */
+BUILD_OFFSET_SHOW(6000); /* CR0_GUEST_HOST_MASK */
+BUILD_OFFSET_SHOW(6002); /* CR4_GUEST_HOST_MASK */
+BUILD_OFFSET_SHOW(6004); /* CR0_READ_SHADOW */
+BUILD_OFFSET_SHOW(6006); /* CR4_READ_SHADOW */
+BUILD_OFFSET_SHOW(6008); /* CR3_TARGET_VALUE0 */
+BUILD_OFFSET_SHOW(600a); /* CR3_TARGET_VALUE1 */
+BUILD_OFFSET_SHOW(600c); /* CR3_TARGET_VALUE2 */
+BUILD_OFFSET_SHOW(600e); /* CR3_TARGET_VALUE3 */
+BUILD_OFFSET_SHOW(6400); /* EXIT_QUALIFICATION */
+BUILD_OFFSET_SHOW(640a); /* GUEST_LINEAR_ADDRESS */
+BUILD_OFFSET_SHOW(6800); /* GUEST_CR0 */
+BUILD_OFFSET_SHOW(6802); /* GUEST_CR3 */
+BUILD_OFFSET_SHOW(6804); /* GUEST_CR4 */
+BUILD_OFFSET_SHOW(6806); /* GUEST_ES_BASE */
+BUILD_OFFSET_SHOW(6808); /* GUEST_CS_BASE */
+BUILD_OFFSET_SHOW(680a); /* GUEST_SS_BASE */
+BUILD_OFFSET_SHOW(680c); /* GUEST_DS_BASE */
+BUILD_OFFSET_SHOW(680e); /* GUEST_FS_BASE */
+BUILD_OFFSET_SHOW(6810); /* GUEST_GS_BASE */
+BUILD_OFFSET_SHOW(6812); /* GUEST_LDTR_BASE */
+BUILD_OFFSET_SHOW(6814); /* GUEST_TR_BASE */
+BUILD_OFFSET_SHOW(6816); /* GUEST_GDTR_BASE */
+BUILD_OFFSET_SHOW(6818); /* GUEST_IDTR_BASE */
+BUILD_OFFSET_SHOW(681a); /* GUEST_DR7 */
+BUILD_OFFSET_SHOW(681c); /* GUEST_RSP */
+BUILD_OFFSET_SHOW(681e); /* GUEST_RIP */
+BUILD_OFFSET_SHOW(6820); /* GUEST_RFLAGS */
+BUILD_OFFSET_SHOW(6822); /* GUEST_PENDING_DBG_EXCEPTIONS */
+BUILD_OFFSET_SHOW(6824); /* GUEST_SYSENTER_ESP */
+BUILD_OFFSET_SHOW(6826); /* GUEST_SYSENTER_EIP */
+BUILD_OFFSET_SHOW(6c00); /* HOST_CR0 */
+BUILD_OFFSET_SHOW(6c02); /* HOST_CR3 */
+BUILD_OFFSET_SHOW(6c04); /* HOST_CR4 */
+BUILD_OFFSET_SHOW(6c06); /* HOST_FS_BASE */
+BUILD_OFFSET_SHOW(6c08); /* HOST_GS_BASE */
+BUILD_OFFSET_SHOW(6c0a); /* HOST_TR_BASE */
+BUILD_OFFSET_SHOW(6c0c); /* HOST_GDTR_BASE */
+BUILD_OFFSET_SHOW(6c0e); /* HOST_IDTR_BASE */
+BUILD_OFFSET_SHOW(6c10); /* HOST_IA32_SYSENTER_ESP */
+BUILD_OFFSET_SHOW(6c12); /* HOST_IA32_SYSENTER_EIP */
+BUILD_OFFSET_SHOW(6c14); /* HOST_RSP */
+BUILD_OFFSET_SHOW(6c16); /* HOST_RIP */
+
+static struct attribute *vmcs_attrs[] = {
+ &dev_attr_0000.attr,
+ &dev_attr_0800.attr,
+ &dev_attr_0802.attr,
+ &dev_attr_0804.attr,
+ &dev_attr_0806.attr,
+ &dev_attr_0808.attr,
+ &dev_attr_080a.attr,
+ &dev_attr_080c.attr,
+ &dev_attr_080e.attr,
+ &dev_attr_0c00.attr,
+ &dev_attr_0c02.attr,
+ &dev_attr_0c04.attr,
+ &dev_attr_0c06.attr,
+ &dev_attr_0c08.attr,
+ &dev_attr_0c0a.attr,
+ &dev_attr_0c0c.attr,
+ &dev_attr_2000.attr,
+ &dev_attr_2001.attr,
+ &dev_attr_2002.attr,
+ &dev_attr_2003.attr,
+ &dev_attr_2004.attr,
+ &dev_attr_2005.attr,
+ &dev_attr_2006.attr,
+ &dev_attr_2007.attr,
+ &dev_attr_2008.attr,
+ &dev_attr_2009.attr,
+ &dev_attr_200a.attr,
+ &dev_attr_200b.attr,
+ &dev_attr_2010.attr,
+ &dev_attr_2011.attr,
+ &dev_attr_2012.attr,
+ &dev_attr_2013.attr,
+ &dev_attr_2014.attr,
+ &dev_attr_2015.attr,
+ &dev_attr_201a.attr,
+ &dev_attr_201b.attr,
+ &dev_attr_2400.attr,
+ &dev_attr_2401.attr,
+ &dev_attr_2800.attr,
+ &dev_attr_2801.attr,
+ &dev_attr_2802.attr,
+ &dev_attr_2803.attr,
+ &dev_attr_2804.attr,
+ &dev_attr_2805.attr,
+ &dev_attr_2806.attr,
+ &dev_attr_2807.attr,
+ &dev_attr_2808.attr,
+ &dev_attr_2809.attr,
+ &dev_attr_280a.attr,
+ &dev_attr_280b.attr,
+ &dev_attr_280c.attr,
+ &dev_attr_280d.attr,
+ &dev_attr_280e.attr,
+ &dev_attr_280f.attr,
+ &dev_attr_2810.attr,
+ &dev_attr_2811.attr,
+ &dev_attr_2c00.attr,
+ &dev_attr_2c01.attr,
+ &dev_attr_2c02.attr,
+ &dev_attr_2c03.attr,
+ &dev_attr_2c04.attr,
+ &dev_attr_2c05.attr,
+ &dev_attr_4000.attr,
+ &dev_attr_4002.attr,
+ &dev_attr_4004.attr,
+ &dev_attr_4006.attr,
+ &dev_attr_4008.attr,
+ &dev_attr_400a.attr,
+ &dev_attr_400c.attr,
+ &dev_attr_400e.attr,
+ &dev_attr_4010.attr,
+ &dev_attr_4012.attr,
+ &dev_attr_4014.attr,
+ &dev_attr_4016.attr,
+ &dev_attr_4018.attr,
+ &dev_attr_401a.attr,
+ &dev_attr_401c.attr,
+ &dev_attr_401e.attr,
+ &dev_attr_4020.attr,
+ &dev_attr_4022.attr,
+ &dev_attr_4400.attr,
+ &dev_attr_4402.attr,
+ &dev_attr_4404.attr,
+ &dev_attr_4406.attr,
+ &dev_attr_4408.attr,
+ &dev_attr_440a.attr,
+ &dev_attr_440c.attr,
+ &dev_attr_440e.attr,
+ &dev_attr_4800.attr,
+ &dev_attr_4802.attr,
+ &dev_attr_4804.attr,
+ &dev_attr_4806.attr,
+ &dev_attr_4808.attr,
+ &dev_attr_480a.attr,
+ &dev_attr_480c.attr,
+ &dev_attr_480e.attr,
+ &dev_attr_4810.attr,
+ &dev_attr_4812.attr,
+ &dev_attr_4814.attr,
+ &dev_attr_4816.attr,
+ &dev_attr_4818.attr,
+ &dev_attr_481a.attr,
+ &dev_attr_481c.attr,
+ &dev_attr_481e.attr,
+ &dev_attr_4820.attr,
+ &dev_attr_4822.attr,
+ &dev_attr_4824.attr,
+ &dev_attr_4826.attr,
+ &dev_attr_482A.attr,
+ &dev_attr_4c00.attr,
+ &dev_attr_6000.attr,
+ &dev_attr_6002.attr,
+ &dev_attr_6004.attr,
+ &dev_attr_6006.attr,
+ &dev_attr_6008.attr,
+ &dev_attr_600a.attr,
+ &dev_attr_600c.attr,
+ &dev_attr_600e.attr,
+ &dev_attr_6400.attr,
+ &dev_attr_640a.attr,
+ &dev_attr_6800.attr,
+ &dev_attr_6802.attr,
+ &dev_attr_6804.attr,
+ &dev_attr_6806.attr,
+ &dev_attr_6808.attr,
+ &dev_attr_680a.attr,
+ &dev_attr_680c.attr,
+ &dev_attr_680e.attr,
+ &dev_attr_6810.attr,
+ &dev_attr_6812.attr,
+ &dev_attr_6814.attr,
+ &dev_attr_6816.attr,
+ &dev_attr_6818.attr,
+ &dev_attr_681a.attr,
+ &dev_attr_681c.attr,
+ &dev_attr_681e.attr,
+ &dev_attr_6820.attr,
+ &dev_attr_6822.attr,
+ &dev_attr_6824.attr,
+ &dev_attr_6826.attr,
+ &dev_attr_6c00.attr,
+ &dev_attr_6c02.attr,
+ &dev_attr_6c04.attr,
+ &dev_attr_6c06.attr,
+ &dev_attr_6c08.attr,
+ &dev_attr_6c0a.attr,
+ &dev_attr_6c0c.attr,
+ &dev_attr_6c0e.attr,
+ &dev_attr_6c10.attr,
+ &dev_attr_6c12.attr,
+ &dev_attr_6c14.attr,
+ &dev_attr_6c16.attr,
+ NULL,
+};
+
+static struct attribute_group vmcs_attr_group = {
+ .name = vmcs_group_name,
+ .attrs = vmcs_attrs,
+};
+
+int vmcs_sysfs_add(struct device *dev)
+{
+ return sysfs_create_group(&dev->kobj, &vmcs_attr_group);
+}
+
+void vmcs_sysfs_remove(struct device *dev)
+{
+ sysfs_remove_group(&dev->kobj, &vmcs_attr_group);
+}
--
1.7.1

2012-06-27 08:56:37

by Zhang Yanfei

Subject: [PATCH v3 2/5] KVM: Export symbols for module vmcsinfo-intel

A new module named vmcsinfo-intel is used to fill VMCSINFO. This
module depends on the kvm-intel and kvm modules, so export the
symbols of kvm-intel and kvm that vmcsinfo-intel needs.

Signed-off-by: zhangyanfei <[email protected]>
---
arch/x86/include/asm/vmx.h | 73 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 81 +++++++-------------------------------------
include/linux/kvm_host.h | 3 ++
virt/kvm/kvm_main.c | 8 ++--
4 files changed, 93 insertions(+), 72 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index c364219..03b7ae3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -26,6 +26,7 @@
*/

#include <linux/types.h>
+#include <linux/kvm_host.h>

#include <asm/vmcsinfo.h>

@@ -327,4 +328,76 @@ enum vm_instruction_error_number {
VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28,
};

+#define __ex(x) __kvm_handle_fault_on_reboot(x)
+#define __ex_clear(x, reg) \
+ ____kvm_handle_fault_on_reboot(x, "xor " reg " , " reg)
+
+struct vmcs {
+ u32 revision_id;
+ u32 abort;
+ char data[0];
+};
+
+struct vmcs_config {
+ int size;
+ int order;
+ u32 revision_id;
+ u32 pin_based_exec_ctrl;
+ u32 cpu_based_exec_ctrl;
+ u32 cpu_based_2nd_exec_ctrl;
+ u32 vmexit_ctrl;
+ u32 vmentry_ctrl;
+};
+
+extern struct vmcs_config vmcs_config;
+
+DECLARE_PER_CPU(struct vmcs *, current_vmcs);
+
+enum vmcs_field_type {
+ VMCS_FIELD_TYPE_U16 = 0,
+ VMCS_FIELD_TYPE_U64 = 1,
+ VMCS_FIELD_TYPE_U32 = 2,
+ VMCS_FIELD_TYPE_NATURAL_WIDTH = 3
+};
+
+static inline int vmcs_field_type(unsigned long field)
+{
+ if (0x1 & field) /* the *_HIGH fields are all 32 bit */
+ return VMCS_FIELD_TYPE_U32;
+ return (field >> 13) & 0x3 ;
+}
+
+static __always_inline unsigned long vmcs_readl(unsigned long field)
+{
+ unsigned long value;
+
+ asm volatile (__ex_clear(ASM_VMX_VMREAD_RDX_RAX, "%0")
+ : "=a"(value) : "d"(field) : "cc");
+ return value;
+}
+
+static __always_inline u16 vmcs_read16(unsigned long field)
+{
+ return vmcs_readl(field);
+}
+
+static __always_inline u32 vmcs_read32(unsigned long field)
+{
+ return vmcs_readl(field);
+}
+
+static __always_inline u64 vmcs_read64(unsigned long field)
+{
+#ifdef CONFIG_X86_64
+ return vmcs_readl(field);
+#else
+ return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
+#endif
+}
+
+struct vmcs *alloc_vmcs(void);
+void vmcs_load(struct vmcs *);
+void vmcs_clear(struct vmcs *);
+void free_vmcs(struct vmcs *);
+
#endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..43ceae7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -20,7 +20,6 @@
#include "mmu.h"
#include "cpuid.h"

-#include <linux/kvm_host.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/mm.h>
@@ -45,10 +44,6 @@

#include "trace.h"

-#define __ex(x) __kvm_handle_fault_on_reboot(x)
-#define __ex_clear(x, reg) \
- ____kvm_handle_fault_on_reboot(x, "xor " reg " , " reg)
-
MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");

@@ -127,12 +122,6 @@ module_param(ple_window, int, S_IRUGO);
#define NR_AUTOLOAD_MSRS 8
#define VMCS02_POOL_SIZE 1

-struct vmcs {
- u32 revision_id;
- u32 abort;
- char data[0];
-};
-
/*
* Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
* remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
@@ -617,7 +606,9 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
-static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+EXPORT_SYMBOL_GPL(current_vmcs);
+
/*
* We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
* when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
@@ -636,16 +627,8 @@ static bool cpu_has_load_perf_global_ctrl;
static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
static DEFINE_SPINLOCK(vmx_vpid_lock);

-static struct vmcs_config {
- int size;
- int order;
- u32 revision_id;
- u32 pin_based_exec_ctrl;
- u32 cpu_based_exec_ctrl;
- u32 cpu_based_2nd_exec_ctrl;
- u32 vmexit_ctrl;
- u32 vmentry_ctrl;
-} vmcs_config;
+struct vmcs_config vmcs_config;
+EXPORT_SYMBOL_GPL(vmcs_config);

static struct vmx_capability {
u32 ept;
@@ -940,7 +923,7 @@ static struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
return NULL;
}

-static void vmcs_clear(struct vmcs *vmcs)
+void vmcs_clear(struct vmcs *vmcs)
{
u64 phys_addr = __pa(vmcs);
u8 error;
@@ -952,6 +935,7 @@ static void vmcs_clear(struct vmcs *vmcs)
printk(KERN_ERR "kvm: vmclear fail: %p/%llx\n",
vmcs, phys_addr);
}
+EXPORT_SYMBOL_GPL(vmcs_clear);

static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
{
@@ -960,7 +944,7 @@ static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
loaded_vmcs->launched = 0;
}

-static void vmcs_load(struct vmcs *vmcs)
+void vmcs_load(struct vmcs *vmcs)
{
u64 phys_addr = __pa(vmcs);
u8 error;
@@ -972,6 +956,7 @@ static void vmcs_load(struct vmcs *vmcs)
printk(KERN_ERR "kvm: vmptrld %p/%llx failed\n",
vmcs, phys_addr);
}
+EXPORT_SYMBOL_GPL(vmcs_load);

static void __loaded_vmcs_clear(void *arg)
{
@@ -1043,34 +1028,6 @@ static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
}
}

-static __always_inline unsigned long vmcs_readl(unsigned long field)
-{
- unsigned long value;
-
- asm volatile (__ex_clear(ASM_VMX_VMREAD_RDX_RAX, "%0")
- : "=a"(value) : "d"(field) : "cc");
- return value;
-}
-
-static __always_inline u16 vmcs_read16(unsigned long field)
-{
- return vmcs_readl(field);
-}
-
-static __always_inline u32 vmcs_read32(unsigned long field)
-{
- return vmcs_readl(field);
-}
-
-static __always_inline u64 vmcs_read64(unsigned long field)
-{
-#ifdef CONFIG_X86_64
- return vmcs_readl(field);
-#else
- return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
-#endif
-}
-
static noinline void vmwrite_error(unsigned long field, unsigned long value)
{
printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n",
@@ -2580,15 +2537,17 @@ static struct vmcs *alloc_vmcs_cpu(int cpu)
return vmcs;
}

-static struct vmcs *alloc_vmcs(void)
+struct vmcs *alloc_vmcs(void)
{
return alloc_vmcs_cpu(raw_smp_processor_id());
}
+EXPORT_SYMBOL_GPL(alloc_vmcs);

-static void free_vmcs(struct vmcs *vmcs)
+void free_vmcs(struct vmcs *vmcs)
{
free_pages((unsigned long)vmcs, vmcs_config.order);
}
+EXPORT_SYMBOL_GPL(free_vmcs);

/*
* Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded
@@ -5314,20 +5273,6 @@ static int handle_vmresume(struct kvm_vcpu *vcpu)
return nested_vmx_run(vcpu, false);
}

-enum vmcs_field_type {
- VMCS_FIELD_TYPE_U16 = 0,
- VMCS_FIELD_TYPE_U64 = 1,
- VMCS_FIELD_TYPE_U32 = 2,
- VMCS_FIELD_TYPE_NATURAL_WIDTH = 3
-};
-
-static inline int vmcs_field_type(unsigned long field)
-{
- if (0x1 & field) /* the *_HIGH fields are all 32 bit */
- return VMCS_FIELD_TYPE_U32;
- return (field >> 13) & 0x3 ;
-}
-
static inline int vmcs_field_readonly(unsigned long field)
{
return (((field >> 10) & 0x3) == 1);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c446435..0930fd9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -95,6 +95,9 @@ enum kvm_bus {
KVM_NR_BUSES
};

+int hardware_enable_all(void);
+void hardware_disable_all(void);
+
int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, const void *val);
int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int len,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..26fd04d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -90,8 +90,6 @@ static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
static long kvm_vcpu_compat_ioctl(struct file *file, unsigned int ioctl,
unsigned long arg);
#endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);

@@ -2330,14 +2328,15 @@ static void hardware_disable_all_nolock(void)
on_each_cpu(hardware_disable_nolock, NULL, 1);
}

-static void hardware_disable_all(void)
+void hardware_disable_all(void)
{
raw_spin_lock(&kvm_lock);
hardware_disable_all_nolock();
raw_spin_unlock(&kvm_lock);
}
+EXPORT_SYMBOL_GPL(hardware_disable_all);

-static int hardware_enable_all(void)
+int hardware_enable_all(void)
{
int r = 0;

@@ -2358,6 +2357,7 @@ static int hardware_enable_all(void)

return r;
}
+EXPORT_SYMBOL_GPL(hardware_enable_all);

static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
void *v)
--
1.7.1
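
As a rough illustration of how the vmcs_field_type() helper moved into vmx.h by this patch classifies an encoding, the following user-space sketch copies its logic so it can be tried outside the kernel. The vmcs_field_type_name() wrapper and its reserved-bit assertion are assumptions added for the example only; they are not part of the patch.

```c
#include <assert.h>

enum vmcs_field_type {
	VMCS_FIELD_TYPE_U16 = 0,
	VMCS_FIELD_TYPE_U64 = 1,
	VMCS_FIELD_TYPE_U32 = 2,
	VMCS_FIELD_TYPE_NATURAL_WIDTH = 3
};

/* Same logic as the helper in the patch: bits 14:13 select the width. */
static int vmcs_field_type(unsigned long field)
{
	if (0x1 & field)	/* odd encodings are the *_HIGH halves, all 32 bit */
		return VMCS_FIELD_TYPE_U32;
	return (field >> 13) & 0x3;
}

/* Hypothetical helper, for this sketch only: a printable name per type. */
static const char *vmcs_field_type_name(unsigned long field)
{
	/* Assumption of the sketch: encodings passed in fit in 15 bits. */
	assert((field >> 15) == 0);

	switch (vmcs_field_type(field)) {
	case VMCS_FIELD_TYPE_U16:
		return "u16";
	case VMCS_FIELD_TYPE_U64:
		return "u64";
	case VMCS_FIELD_TYPE_U32:
		return "u32";
	default:
		return "natural width";
	}
}
```

With these definitions, 0x0800 classifies as 16 bit, 0x2000 as 64 bit, 0x2001 and 0x4000 as 32 bit, and 0x6c16 (HOST_RIP) as natural width.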

2012-06-27 08:57:35

by Zhang Yanfei

Subject: [PATCH v3 3/5] KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO

This patch implements a new module named vmcsinfo-intel. The
module fills VMCSINFO with the VMCS revision identifier and the
offsets of VMCS fields.

Note that offsets of the following fields are not filled into VMCSINFO:
1. fields defined in the Intel specification (Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume
3C) but not defined in *vmcs_field*.
2. fields that the processor does not support.

Signed-off-by: zhangyanfei <[email protected]>
---
arch/x86/kvm/Kconfig | 11 +++
arch/x86/kvm/Makefile | 3 +
arch/x86/kvm/vmcsinfo.c | 198 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 212 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kvm/vmcsinfo.c

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a28f338..1dd64b1 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -63,6 +63,17 @@ config KVM_INTEL
To compile this as a module, choose M here: the module
will be called kvm-intel.

+config VMCSINFO_INTEL
+ tristate "Export VMCSINFO for Intel processors"
+ depends on KVM_INTEL
+ ---help---
+ Provides support for exporting VMCSINFO on Intel processors equipped
+ with the VT extensions. The VMCSINFO contains a VMCS revision
+ identifier and offsets of VMCS fields.
+
+ To compile this as a module, choose M here: the module
+ will be called vmcsinfo-intel.
+
config KVM_AMD
tristate "KVM for AMD processors support"
depends on KVM
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 4f579e8..12a1ef6 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -4,6 +4,7 @@ ccflags-y += -Ivirt/kvm -Iarch/x86/kvm
CFLAGS_x86.o := -I.
CFLAGS_svm.o := -I.
CFLAGS_vmx.o := -I.
+CFLAGS_vmcsinfo.o := -I.

kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
@@ -15,7 +16,9 @@ kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
i8254.o timer.o cpuid.o pmu.o
kvm-intel-y += vmx.o
kvm-amd-y += svm.o
+vmcsinfo-intel-y += vmcsinfo.o

obj-$(CONFIG_KVM) += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+obj-$(CONFIG_VMCSINFO_INTEL) += vmcsinfo-intel.o
diff --git a/arch/x86/kvm/vmcsinfo.c b/arch/x86/kvm/vmcsinfo.c
new file mode 100644
index 0000000..7b1873c
--- /dev/null
+++ b/arch/x86/kvm/vmcsinfo.c
@@ -0,0 +1,198 @@
+/*
+ * Kernel-based Virtual Machine driver for Linux
+ *
+ * This module enables machines with Intel VT-x extensions to export
+ * offsets of VMCS fields for guest debugging.
+ *
+ * Copyright (C) 2012 Fujitsu, Inc.
+ *
+ * Authors:
+ * Zhang Yanfei <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/swab.h>
+
+#include <asm/vmx.h>
+#include <asm/vmcsinfo.h>
+
+MODULE_AUTHOR("Fujitsu");
+MODULE_LICENSE("GPL");
+
+static const struct x86_cpu_id vmcsinfo_cpu_id[] = {
+ X86_FEATURE_MATCH(X86_FEATURE_VMX),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, vmcsinfo_cpu_id);
+
+/*
+ * For calculating offsets of fields in VMCS data, we index every 16-bit
+ * field by this kind of format:
+ * | --------- 16 bits ---------- |
+ * +-------------+-+------------+-+
+ * | high 7 bits |1| low 7 bits |0|
+ * +-------------+-+------------+-+
+ * In high byte, the lowest bit must be 1; In low byte, the lowest bit
+ * must be 0. The two bits are set like this in case indexes in VMCS
+ * data are read as big endian mode.
+ * The remaining 14 bits of the index indicate the real offset of the
+ * field. Since the size of a VMCS region is at most 4 KBytes,
+ * 14 bits are enough to index the whole VMCS region.
+ *
+ * ENCODING_OFFSET: encode the offset into the index of this kind.
+ * DECODING_OFFSET: decode the index of this kind into real offset.
+ */
+#define OFFSET_HIGH_SHIFT (7)
+#define OFFSET_LOW_MASK ((1 << OFFSET_HIGH_SHIFT) - 1) /* 0x7f */
+#define OFFSET_HIGH_MASK (OFFSET_LOW_MASK << OFFSET_HIGH_SHIFT) /* 0x3f80 */
+#define ENCODING_OFFSET(offset) \
+ ((((offset) & OFFSET_LOW_MASK) << 1) + \
+ ((((offset) & OFFSET_HIGH_MASK) << 2) | 0x100))
+/*
+ * index here should be always read in little endian mode.
+ */
+#define DECODING_OFFSET_LE(index) \
+ ((((index) >> 1) & OFFSET_LOW_MASK) + \
+ (((index) >> 2) & OFFSET_HIGH_MASK))
+/*
+ * n indicates the bits of index. We first check if index
+ * is read in big endian mode.
+ */
+#define DECODING_OFFSET(index, n) \
+ ((index & 1) ? (DECODING_OFFSET_LE(__swab##n(index))) : \
+ (DECODING_OFFSET_LE(index)))
+
+#define FIELD_OFFSET16(field, offset) \
+ vmcsinfo_field(field, DECODING_OFFSET(offset, 16))
+#define FIELD_OFFSET64(field, offset) \
+ vmcsinfo_field(field, DECODING_OFFSET(offset, 64))
+#define FIELD_OFFSET32(field, offset) \
+ vmcsinfo_field(field, DECODING_OFFSET(offset, 32))
+#define FIELD_OFFSETNW(field, offset) \
+do { \
+ if (sizeof(offset) == 8) \
+ vmcsinfo_field(field, DECODING_OFFSET(offset, 64)); \
+ else \
+ vmcsinfo_field(field, DECODING_OFFSET(offset, 32)); \
+} while (0)
+
+#define VMCS_FIELD_CHECK(field, offset, type) \
+do { \
+ if (vmcs_read32(VM_INSTRUCTION_ERROR) != \
+ VMXERR_UNSUPPORTED_VMCS_COMPONENT) \
+ FIELD_OFFSET##type(field, offset); \
+} while (0)
+
+static inline void vmcs_read_checking(unsigned long field)
+{
+ u16 offset16;
+ u64 offset64;
+ u32 offset32;
+ unsigned long offsetnw;
+
+ switch (vmcs_field_type(field)) {
+ case VMCS_FIELD_TYPE_U16:
+ offset16 = vmcs_read16(field);
+ VMCS_FIELD_CHECK(field, offset16, 16);
+ break;
+ case VMCS_FIELD_TYPE_U64:
+ offset64 = vmcs_read64(field);
+ VMCS_FIELD_CHECK(field, offset64, 64);
+ break;
+ case VMCS_FIELD_TYPE_U32:
+ offset32 = vmcs_read32(field);
+ VMCS_FIELD_CHECK(field, offset32, 32);
+ break;
+ case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+ offsetnw = vmcs_readl(field);
+ VMCS_FIELD_CHECK(field, offsetnw, NW);
+ break;
+ }
+}
+
+/*
+ * Note, offsets of fields below will not be filled into
+ * VMCSINFO:
+ * 1. fields defined in Intel specification (Intel® 64 and
+ * IA-32 Architectures Software Developer’s Manual, Volume
+ * 3C) but not defined in *vmcs_field*.
+ * 2. fields unsupported.
+ */
+static int __init alloc_vmcsinfo_init(void)
+{
+/*
+ * The first 8 bytes in vmcs region are for
+ * VMCS revision identifier
+ * VMX-abort indicator
+ */
+#define FIELD_START (8)
+
+ int r, offset;
+ struct vmcs *vmcs;
+ int cpu;
+ unsigned long field;
+
+ if (vmcsinfo_is_filled())
+ return 0;
+
+ vmcs = alloc_vmcs();
+ if (!vmcs) {
+ return -ENOMEM;
+ }
+
+ r = hardware_enable_all();
+ if (r)
+ goto out_err;
+
+ /*
+ * Write encoded offsets into VMCS data for later vmcs_read.
+ */
+ for (offset = FIELD_START; offset < vmcs_config.size;
+ offset += sizeof(u16))
+ *(u16 *)((char *)vmcs + offset) = ENCODING_OFFSET(offset);
+
+ cpu = get_cpu();
+ vmcs_clear(vmcs);
+ per_cpu(current_vmcs, cpu) = vmcs;
+ vmcs_load(vmcs);
+
+ vmcsinfo_revision_id(vmcs->revision_id);
+ vmcs_read_checking(VM_INSTRUCTION_ERROR);
+ offset = get_vmcs_field_offset(VM_INSTRUCTION_ERROR);
+ if (offset == -1)
+ goto out_clear;
+
+ for (field = 0; field < VMCSINFO_MAX_FIELD; ++field) {
+ if (field == VM_INSTRUCTION_ERROR)
+ continue;
+ /*
+ * Zero the VM_INSTRUCTION_ERROR field before each read.
+ */
+ *(u32 *)((char *)vmcs + offset) = 0;
+ vmcs_read_checking(field);
+ }
+ vmcsinfo_filled();
+
+ update_vmcsinfo_note();
+
+out_clear:
+ vmcs_clear(vmcs);
+ put_cpu();
+
+out_err:
+ free_vmcs(vmcs);
+ return r;
+}
+
+static void __exit alloc_vmcsinfo_exit(void)
+{
+ hardware_disable_all();
+}
+
+module_init(alloc_vmcsinfo_init);
+module_exit(alloc_vmcsinfo_exit);
--
1.7.1
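
The index format described in the comment above ENCODING_OFFSET can be tried out in user space. The sketch below is only an approximation under stated assumptions: it re-implements the encode and decode macros as functions, and it replaces the real VMCS region and VMREAD with a plain byte buffer that is read back either in little-endian or in byte-swapped order. The names encode_offset(), decode_offset16() and probe_offset() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_HIGH_SHIFT	7
#define OFFSET_LOW_MASK		((1u << OFFSET_HIGH_SHIFT) - 1)		/* 0x7f */
#define OFFSET_HIGH_MASK	(OFFSET_LOW_MASK << OFFSET_HIGH_SHIFT)	/* 0x3f80 */
#define REGION_SIZE		4096	/* assumed size of the fake region */
#define FIELD_START		8	/* skip revision id and abort indicator */

/* Pack a 14-bit offset: bit 0 of the low byte is 0, bit 0 of the high byte is 1. */
static uint16_t encode_offset(unsigned int offset)
{
	assert(offset <= 0x3fff);
	return (uint16_t)(((offset & OFFSET_LOW_MASK) << 1) +
			  (((offset & OFFSET_HIGH_MASK) << 2) | 0x100));
}

/* Undo encode_offset() for an index that is in little-endian order. */
static unsigned int decode_offset_le(uint16_t index)
{
	return ((index >> 1) & OFFSET_LOW_MASK) +
	       ((index >> 2) & OFFSET_HIGH_MASK);
}

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* A set bit 0 means the two bytes arrived swapped, so swap them back first. */
static unsigned int decode_offset16(uint16_t index)
{
	return (index & 1) ? decode_offset_le(swab16(index))
			   : decode_offset_le(index);
}

/*
 * Fill a fake region with encoded offsets, read 16 bits back at 'offset'
 * in the requested byte order, and return the decoded offset.
 */
static unsigned int probe_offset(unsigned int offset, int read_big_endian)
{
	unsigned char region[REGION_SIZE] = { 0 };
	unsigned int off;
	uint16_t raw;

	assert(offset >= FIELD_START && offset + 2 <= REGION_SIZE);
	assert((offset % 2) == 0);

	for (off = FIELD_START; off < REGION_SIZE; off += 2) {
		uint16_t idx = encode_offset(off);

		region[off] = (unsigned char)(idx & 0xff);
		region[off + 1] = (unsigned char)(idx >> 8);
	}

	if (read_big_endian)
		raw = (uint16_t)((region[offset] << 8) | region[offset + 1]);
	else
		raw = (uint16_t)(region[offset] | (region[offset + 1] << 8));

	return decode_offset16(raw);
}
```

In this model probe_offset() returns the offset it was asked to read for both byte orders, which is the property the two marker bits are meant to provide. The 32-bit, 64-bit and natural-width cases of the patch are left out of the sketch.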

2012-06-27 08:58:12

by Zhang Yanfei

Subject: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
Individual offsets are contained in subfiles named by the field's
encoding, e.g.: /sys/devices/cpu/vmcs/0800

Signed-off-by: zhangyanfei <[email protected]>
---
drivers/base/core.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 346be8b..dd05ee7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -26,6 +26,7 @@
#include <linux/async.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
+#include <asm/vmcsinfo.h>

#include "base.h"
#include "power/power.h"
@@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
error = dpm_sysfs_add(dev);
if (error)
goto DPMError;
+#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
+ error = vmcs_sysfs_add(dev);
+ if (error)
+ goto VMCSError;
+#endif
device_pm_add(dev);

/* Notify clients of device addition. This call must come
@@ -1069,6 +1075,10 @@ int device_add(struct device *dev)
done:
put_device(dev);
return error;
+#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
+ VMCSError:
+ dpm_sysfs_remove(dev);
+#endif
DPMError:
bus_remove_device(dev);
BusError:
@@ -1171,6 +1181,9 @@ void device_del(struct device *dev)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_DEL_DEVICE, dev);
device_pm_remove(dev);
+#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
+ vmcs_sysfs_remove(dev);
+#endif
dpm_sysfs_remove(dev);
if (parent)
klist_del(&dev->p->knode_parent);
--
1.7.1

2012-06-27 08:59:32

by Zhang Yanfei

Subject: [PATCH v3 5/5] Documentation: Add ABI entry for vmcs sysfs interface

Signed-off-by: zhangyanfei <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-cpu-vmcs | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-cpu-vmcs

diff --git a/Documentation/ABI/testing/sysfs-devices-cpu-vmcs b/Documentation/ABI/testing/sysfs-devices-cpu-vmcs
new file mode 100644
index 0000000..0846b07
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-cpu-vmcs
@@ -0,0 +1,11 @@
+What: /sys/devices/cpu/vmcs/
+Date: June 2012
+KernelVersion: 3.5.0
+Contact: Zhang Yanfei <[email protected]>
+Description:
+ A collection of VMCS field offsets for Intel CPUs.
+
+ Individual offsets are contained in subfiles named by
+ the field's encoding, e.g.:
+
+ /sys/devices/cpu/vmcs/0800
--
1.7.1

2012-06-27 19:22:44

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
> This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
> Individual offsets are contained in subfiles named by the field's
> encoding, e.g.: /sys/devices/cpu/vmcs/0800
>
> Signed-off-by: zhangyanfei <[email protected]>
> ---
> drivers/base/core.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 346be8b..dd05ee7 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -26,6 +26,7 @@
> #include <linux/async.h>
> #include <linux/pm_runtime.h>
> #include <linux/netdevice.h>
> +#include <asm/vmcsinfo.h>

Did you just break the build on all other arches? Not nice.

> @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
> error = dpm_sysfs_add(dev);
> if (error)
> goto DPMError;
> +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
> + error = vmcs_sysfs_add(dev);
> + if (error)
> + goto VMCSError;
> +#endif

Oh my no, that's no way to ever do this, you know better than that,
please fix.

greg k-h

2012-06-28 09:57:58

by Zhang Yanfei

Subject: Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

On 2012-06-28 03:22, Greg KH wrote:
> On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
>> This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
>> Individual offsets are contained in subfiles named by the field's
>> encoding, e.g.: /sys/devices/cpu/vmcs/0800
>>
>> Signed-off-by: zhangyanfei <[email protected]>
>> ---
>> drivers/base/core.c | 13 +++++++++++++
>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index 346be8b..dd05ee7 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -26,6 +26,7 @@
>> #include <linux/async.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/netdevice.h>
>> +#include <asm/vmcsinfo.h>
>
> Did you just break the build on all other arches? Not nice.
>
>> @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
>> error = dpm_sysfs_add(dev);
>> if (error)
>> goto DPMError;
>> +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
>> + error = vmcs_sysfs_add(dev);
>> + if (error)
>> + goto VMCSError;
>> +#endif
>
> Oh my no, that's no way to ever do this, you know better than that,
> please fix.
>
> greg k-h
>

Sorry for my thoughtlessness. Here is the new patch.

---
drivers/base/core.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 346be8b..7b5266a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -30,6 +30,13 @@
#include "base.h"
#include "power/power.h"

+#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
+#include <asm/vmcsinfo.h>
+#else
+static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
+static inline void vmcs_sysfs_remove(struct device *dev) { }
+#endif
+
#ifdef CONFIG_SYSFS_DEPRECATED
#ifdef CONFIG_SYSFS_DEPRECATED_V2
long sysfs_deprecated = 1;
@@ -1038,6 +1045,9 @@ int device_add(struct device *dev)
error = dpm_sysfs_add(dev);
if (error)
goto DPMError;
+ error = vmcs_sysfs_add(dev);
+ if (error)
+ goto VMCSError;
device_pm_add(dev);

/* Notify clients of device addition. This call must come
@@ -1069,6 +1079,8 @@ int device_add(struct device *dev)
done:
put_device(dev);
return error;
+ VMCSError:
+ dpm_sysfs_remove(dev);
DPMError:
bus_remove_device(dev);
BusError:
@@ -1171,6 +1183,7 @@ void device_del(struct device *dev)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_DEL_DEVICE, dev);
device_pm_remove(dev);
+ vmcs_sysfs_remove(dev);
dpm_sysfs_remove(dev);
if (parent)
klist_del(&dev->p->knode_parent);
--
1.7.1

2012-06-28 11:37:52

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote:
> On 2012-06-28 03:22, Greg KH wrote:
> > On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
> >> This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
> >> Individual offsets are contained in subfiles named by the field's
> >> encoding, e.g.: /sys/devices/cpu/vmcs/0800
> >>
> >> Signed-off-by: zhangyanfei <[email protected]>
> >> ---
> >> drivers/base/core.c | 13 +++++++++++++
> >> 1 files changed, 13 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 346be8b..dd05ee7 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/async.h>
> >> #include <linux/pm_runtime.h>
> >> #include <linux/netdevice.h>
> >> +#include <asm/vmcsinfo.h>
> >
> > Did you just break the build on all other arches? Not nice.
> >
> >> @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
> >> error = dpm_sysfs_add(dev);
> >> if (error)
> >> goto DPMError;
> >> +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
> >> + error = vmcs_sysfs_add(dev);
> >> + if (error)
> >> + goto VMCSError;
> >> +#endif
> >
> > Oh my no, that's no way to ever do this, you know better than that,
> > please fix.
> >
> > greg k-h
> >
>
> Sorry for my thoughtlessness. Here is the new patch.
>
> ---
> drivers/base/core.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 346be8b..7b5266a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -30,6 +30,13 @@
> #include "base.h"
> #include "power/power.h"
>
> +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
> +#include <asm/vmcsinfo.h>
> +#else
> +static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
> +static inline void vmcs_sysfs_remove(struct device *dev) { }
> +#endif

{sigh} No, again, you know better, don't do this.

greg k-h

2012-06-29 01:55:13

by Hatayama, Daisuke

Subject: Re: [PATCH v3 1/5] x86: Add helper variables and functions to hold VMCSINFO

From: Yanfei Zhang <[email protected]>
Subject: [PATCH v3 1/5] x86: Add helper variables and functions to hold VMCSINFO
Date: Wed, 27 Jun 2012 16:51:58 +0800

> This patch provides a set of variables to hold the VMCSINFO and also
> some helper functions to help fill the VMCSINFO.
>
> Signed-off-by: zhangyanfei <[email protected]>
> ---
> arch/x86/include/asm/vmcsinfo.h | 219 ++++++++++++++++++++++
> arch/x86/include/asm/vmx.h | 158 +----------------
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/vmcsinfo.c | 381 +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 603 insertions(+), 156 deletions(-)
> create mode 100644 arch/x86/include/asm/vmcsinfo.h
> create mode 100644 arch/x86/kernel/vmcsinfo.c
>
> diff --git a/arch/x86/include/asm/vmcsinfo.h b/arch/x86/include/asm/vmcsinfo.h
> new file mode 100644
> index 0000000..4b9f56b
> --- /dev/null
> +++ b/arch/x86/include/asm/vmcsinfo.h
> @@ -0,0 +1,219 @@
> +#ifndef _ASM_X86_VMCSINFO_H
> +#define _ASM_X86_VMCSINFO_H
> +
> +#ifndef __ASSEMBLY__
> +#include <linux/types.h>
> +#include <linux/elf.h>
> +#include <linux/device.h>
> +
> +/* VMCS Encodings */
> +enum vmcs_field {
> + VIRTUAL_PROCESSOR_ID = 0x00000000,

<cut>

> + HOST_RIP = 0x00006c16,
> +};
> +
> +/*
> + * vmcs field offsets.
> + */
> +struct vmcsinfo {
> + u32 vmcs_revision_id;
> + int filled;
> + u16 vmcs_field_to_offset_table[HOST_RIP + 1];

HOST_RIP is 0x6c16, so this array becomes large. Most of its
elements are also unused, because the field encodings are not
contiguous.

Instead, how about defining the number of VMCS fields (152?) as a
specific constant, indexing each field with a small integer, and
adding a new index_to_field_table[]?
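
A minimal user-space sketch of that layout follows, with several assumptions: the table lists only a handful of encodings instead of every member of enum vmcs_field, the helper names field_to_index(), set_field_offset() and get_field_offset() are made up for the example, and an offset of 0 is used as the "not filled" marker because real field offsets start after the 8-byte VMCS header. For comparison, a u16 array with HOST_RIP + 1 = 0x6c17 entries takes about 54 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical subset of field encodings, sorted in ascending order.
 * A full table would list every member of enum vmcs_field.
 */
static const uint16_t index_to_field_table[] = {
	0x0000, 0x0800, 0x0802, 0x2000, 0x4000, 0x6c14, 0x6c16,
};

#define VMCSINFO_NR_FIELDS \
	(sizeof(index_to_field_table) / sizeof(index_to_field_table[0]))

/* One slot per known field; 0 means "offset not filled". */
static uint16_t offset_table[VMCSINFO_NR_FIELDS];

/* Binary search over the sorted table; returns -1 for unknown encodings. */
static int field_to_index(uint16_t field)
{
	size_t lo = 0, hi = VMCSINFO_NR_FIELDS;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (index_to_field_table[mid] == field)
			return (int)mid;
		if (index_to_field_table[mid] < field)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/* Record an offset; returns 0 on success, -1 if the field is not in the table. */
static int set_field_offset(uint16_t field, uint16_t offset)
{
	int idx = field_to_index(field);

	assert(offset != 0);	/* 0 is reserved as the "not filled" marker */
	if (idx < 0)
		return -1;
	offset_table[idx] = offset;
	return 0;
}

/* Returns the recorded offset, or -1 if the field is unknown or not filled. */
static int get_field_offset(uint16_t field)
{
	int idx = field_to_index(field);

	if (idx < 0 || offset_table[idx] == 0)
		return -1;
	return offset_table[idx];
}
```

The lookup costs a binary search instead of a direct array index, which seems an acceptable trade for a table that is filled once and read rarely.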

Thanks.
HATAYAMA, Daisuke

2012-06-29 03:09:53

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 4/5] Sysfs: Export VMCSINFO via sysfs

On Thu, Jun 28, 2012 at 04:37:38AM -0700, Greg KH wrote:
> On Thu, Jun 28, 2012 at 05:54:30PM +0800, Yanfei Zhang wrote:
> > On 2012-06-28 03:22, Greg KH wrote:
> > > On Wed, Jun 27, 2012 at 04:54:54PM +0800, Yanfei Zhang wrote:
> > >> This patch exports offsets of fields via /sys/devices/cpu/vmcs/.
> > >> Individual offsets are contained in subfiles named by the field's
> > >> encoding, e.g.: /sys/devices/cpu/vmcs/0800
> > >>
> > >> Signed-off-by: zhangyanfei <[email protected]>
> > >> ---
> > >> drivers/base/core.c | 13 +++++++++++++
> > >> 1 files changed, 13 insertions(+), 0 deletions(-)
> > >>
> > >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> > >> index 346be8b..dd05ee7 100644
> > >> --- a/drivers/base/core.c
> > >> +++ b/drivers/base/core.c
> > >> @@ -26,6 +26,7 @@
> > >> #include <linux/async.h>
> > >> #include <linux/pm_runtime.h>
> > >> #include <linux/netdevice.h>
> > >> +#include <asm/vmcsinfo.h>
> > >
> > > Did you just break the build on all other arches? Not nice.
> > >
> > >> @@ -1038,6 +1039,11 @@ int device_add(struct device *dev)
> > >> error = dpm_sysfs_add(dev);
> > >> if (error)
> > >> goto DPMError;
> > >> +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
> > >> + error = vmcs_sysfs_add(dev);
> > >> + if (error)
> > >> + goto VMCSError;
> > >> +#endif
> > >
> > > Oh my no, that's no way to ever do this, you know better than that,
> > > please fix.
> > >
> > > greg k-h
> > >
> >
> > Sorry for my thoughtlessness. Here is the new patch.
> >
> > ---
> > drivers/base/core.c | 13 +++++++++++++
> > 1 files changed, 13 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 346be8b..7b5266a 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -30,6 +30,13 @@
> > #include "base.h"
> > #include "power/power.h"
> >
> > +#if defined(CONFIG_KVM_INTEL) || defined(CONFIG_KVM_INTEL_MODULE)
> > +#include <asm/vmcsinfo.h>
> > +#else
> > +static inline int vmcs_sysfs_add(struct device *dev) { return 0; }
> > +static inline void vmcs_sysfs_remove(struct device *dev) { }
> > +#endif
>
> {sigh} No, again, you know better, don't do this.

Ok, as others have rightly pointed out, this wasn't the most helpful
review comment, sorry about that.

In Linux, we don't put ifdefs in .c files, we put them in .h files. See
the many examples of this all over the tree. That was my main complaint
about the past two versions of this patch.

But, for this, I would question why you even want / need to do this in
drivers/base/core.c in the first place. Shouldn't it be in some
arch or cpu specific file instead that already handles the cpu files?
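
The header convention described above can be sketched roughly as follows. This is a single-file, user-space approximation and not a proposed patch: the config symbol CONFIG_VMCSINFO_SKETCH, the header name mentioned in the comment, and the device_add_sketch() / device_del_sketch() callers are all hypothetical. It only shows the shape of the stub pattern; it does not address the separate question of whether the calls belong in the driver core at all.

```c
#include <assert.h>
#include <stddef.h>

struct device;	/* opaque here; the kernel defines it in <linux/device.h> */

/* ---- would live in a header (hypothetical name: vmcsinfo_sysfs.h) ---- */
#ifdef CONFIG_VMCSINFO_SKETCH	/* hypothetical config symbol */
int vmcs_sysfs_add(struct device *dev);
void vmcs_sysfs_remove(struct device *dev);
#else
/* Feature disabled: empty stubs that the compiler can optimize away. */
static inline int vmcs_sysfs_add(struct device *dev)
{
	(void)dev;
	return 0;
}

static inline void vmcs_sysfs_remove(struct device *dev)
{
	(void)dev;
}
#endif

/* ---- would live in the .c file: no #ifdef at the call sites ---- */
static int device_add_sketch(struct device *dev)
{
	int error = vmcs_sysfs_add(dev);

	if (error)
		return error;
	return 0;
}

static void device_del_sketch(struct device *dev)
{
	vmcs_sysfs_remove(dev);
}

/* With the symbol undefined, the stubs make both calls no-ops. */
static void sketch_selfcheck(void)
{
	assert(device_add_sketch(NULL) == 0);
	device_del_sketch(NULL);
}
```

Because the stubs are static inline functions in the header, the callers compile the same way whether or not the feature is configured, and the conditional lives in exactly one place.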

thanks,

greg k-h