2017-12-11 10:15:18

by Luwei Kang

Subject: [PATCH V4 00/11] Intel Processor Trace virtualization enabling

Hi All,

Here is a patch series that adds Intel Processor Trace enabling for KVM guests. The relevant specification can be found at:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
See Chapter 5, INTEL PROCESSOR TRACE: VMX IMPROVEMENTS.

Introduction:
Intel Processor Trace (Intel PT) is an extension of Intel Architecture that captures information about software execution using dedicated hardware facilities that cause only minimal performance perturbation to the software being traced. Details on the Intel PT infrastructure and trace capabilities can be found in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C.

This suite of architecture changes serves to simplify virtualizing Intel PT for use by guest software. The VMX improvements made for Intel PT consist of two primary elements:
1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS.
— This serves to speed and simplify the process of disabling trace on VM exit, and restoring it on VM entry.
2. Enabling use of EPT to redirect PT output.
— This enables the VMM to elect to virtualize the PT output buffer using EPT. In this mode, the CPU will treat PT output addresses as Guest Physical Addresses (GPAs) and translate them using EPT. This means that Intel PT output reads (of the ToPA table) and writes (of trace output) can cause EPT violations, and other output events.

Processor Trace virtualization can work in one of 3 possible modes, selected with the new module parameter "pt_mode". The default is system-wide mode.
a. system-wide: trace both host and guest, and output to the host buffer;
b. host-only: trace only the host, and output to the host buffer;
c. host-guest: trace host and guest simultaneously, each outputting to its own buffer.
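The three modes above can be summarized as a small lookup. This is a sketch for illustration only: the enum mirrors enum pt_mode from patch 03 (so "pt_mode" takes the integer values 0, 1 and 2), while struct pt_mode_desc and pt_mode_describe() are hypothetical names that just restate the mode descriptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerator order as enum pt_mode added in patch 03. */
enum pt_mode {
	PT_MODE_SYSTEM = 0,
	PT_MODE_HOST,
	PT_MODE_HOST_GUEST,
};

/* Hypothetical summary of one mode, restating the cover letter. */
struct pt_mode_desc {
	bool traces_host;   /* host execution is traced */
	bool traces_guest;  /* guest execution is traced */
	bool guest_buffer;  /* guest trace lands in a guest-owned buffer */
};

static struct pt_mode_desc pt_mode_describe(enum pt_mode mode)
{
	struct pt_mode_desc d = { .traces_host = true };

	switch (mode) {
	case PT_MODE_SYSTEM:     /* host and guest, all into the host buffer */
		d.traces_guest = true;
		break;
	case PT_MODE_HOST:       /* host only, into the host buffer */
		break;
	case PT_MODE_HOST_GUEST: /* host and guest, each into its own buffer */
		d.traces_guest = true;
		d.guest_buffer = true;
		break;
	}
	return d;
}
```

Only host-guest mode gives the guest its own output buffer, which is why it is the only mode in which PT is exposed to the guest.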

From v3:
- change the default mode to SYSTEM mode;
- add a new patch to move PT out of scattered features;
- add a new function kvm_get_pt_addr_cnt() to get the number of address ranges;
- add a new function vmx_set_rtit_ctl() to set the guest RTIT_CTL value, the GUEST_IA32_RTIT_CTL VMCS field and the MSR intercepts.

From v2:
- rename *_PT_SUPPRESS_PIP to *_PT_CONCEAL_PIP;
- clear SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL and VM_ENTRY_LOAD_IA32_RTIT_CTL in SYSTEM mode. These bits must be either all set or all cleared;
- move processor tracing out of scattered features;
- add a new function to enable/disable interception of MSR reads/writes;
- emulate reads/writes of all Intel PT MSRs, and disable their interception when PT is enabled in the guest;
- disable Intel PT and enable MSR interception when the L1 guest executes VMXON;
- performance optimization:
  In HOST mode, we only need to save the host RTIT_CTL before VM entry and restore it after VM exit;
  In HOST_GUEST mode, we need to save and restore all MSRs only when PT is enabled in the guest.
- use XSAVES/XRSTORS to implement the context switch.
  Not implemented in this version and still being debugged; this will be done in a separate patch.

From v1:
- remove guest-only mode, because it is covered by host-guest mode;
- always set "use GPA for processor tracing" in the secondary execution controls when supported;
- trap RTIT_CTL reads/writes; forbid writes to this MSR after VMXON in the L1 hypervisor.

Chao Peng (7):
perf/x86/intel/pt: Move Intel-PT MSR bit definitions to a public
header
perf/x86/intel/pt: Change pt_cap_get() to a public function
KVM: x86: Add Intel Processor Trace virtualization mode
KVM: x86: Add Intel Processor Trace cpuid emulation
KVM: x86: Add Intel processor trace context for each vcpu
KVM: x86: Implement Intel Processor Trace MSRs read/write
KVM: x86: Implement Intel Processor Trace context switch

Luwei Kang (3):
KVM: x86: Add a function to get the number of address ranges
KVM: x86: Add a function to disable/enable Intel PT MSRs intercept
KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

Paolo Bonzini (1):
x86: cpufeature: move processor tracing out of scattered features

arch/x86/events/intel/pt.c | 3 +-
arch/x86/events/intel/pt.h | 55 -------
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/intel_pt.h | 26 ++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/msr-index.h | 35 +++++
arch/x86/include/asm/vmx.h | 8 +
arch/x86/kernel/cpu/scattered.c | 1 -
arch/x86/kvm/cpuid.c | 22 ++-
arch/x86/kvm/svm.c | 6 +
arch/x86/kvm/vmx.c | 297 ++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 33 +++++
12 files changed, 426 insertions(+), 64 deletions(-)

--
1.8.3.1


2017-12-11 10:15:35

by Luwei Kang

Subject: [PATCH V4 01/11] perf/x86/intel/pt: Move Intel-PT MSR bit definitions to a public header

From: Chao Peng <[email protected]>

Enabling Intel Processor Trace virtualization in the guest requires
these MSR bit definitions, so move them to the public header msr-index.h.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.h | 37 -------------------------------------
arch/x86/include/asm/msr-index.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 0eb41d0..0050ca1 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -20,43 +20,6 @@
#define __INTEL_PT_H__

/*
- * PT MSR bit definitions
- */
-#define RTIT_CTL_TRACEEN BIT(0)
-#define RTIT_CTL_CYCLEACC BIT(1)
-#define RTIT_CTL_OS BIT(2)
-#define RTIT_CTL_USR BIT(3)
-#define RTIT_CTL_PWR_EVT_EN BIT(4)
-#define RTIT_CTL_FUP_ON_PTW BIT(5)
-#define RTIT_CTL_CR3EN BIT(7)
-#define RTIT_CTL_TOPA BIT(8)
-#define RTIT_CTL_MTC_EN BIT(9)
-#define RTIT_CTL_TSC_EN BIT(10)
-#define RTIT_CTL_DISRETC BIT(11)
-#define RTIT_CTL_PTW_EN BIT(12)
-#define RTIT_CTL_BRANCH_EN BIT(13)
-#define RTIT_CTL_MTC_RANGE_OFFSET 14
-#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
-#define RTIT_CTL_CYC_THRESH_OFFSET 19
-#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
-#define RTIT_CTL_PSB_FREQ_OFFSET 24
-#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
-#define RTIT_CTL_ADDR0_OFFSET 32
-#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
-#define RTIT_CTL_ADDR1_OFFSET 36
-#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
-#define RTIT_CTL_ADDR2_OFFSET 40
-#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
-#define RTIT_CTL_ADDR3_OFFSET 44
-#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
-#define RTIT_STATUS_FILTEREN BIT(0)
-#define RTIT_STATUS_CONTEXTEN BIT(1)
-#define RTIT_STATUS_TRIGGEREN BIT(2)
-#define RTIT_STATUS_BUFFOVF BIT(3)
-#define RTIT_STATUS_ERROR BIT(4)
-#define RTIT_STATUS_STOPPED BIT(5)
-
-/*
* Single-entry ToPA: when this close to region boundary, switch
* buffers to avoid losing data.
*/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 34c4922..d76bd22 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -94,7 +94,40 @@
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6

#define MSR_IA32_RTIT_CTL 0x00000570
+#define RTIT_CTL_TRACEEN BIT(0)
+#define RTIT_CTL_CYCLEACC BIT(1)
+#define RTIT_CTL_OS BIT(2)
+#define RTIT_CTL_USR BIT(3)
+#define RTIT_CTL_PWR_EVT_EN BIT(4)
+#define RTIT_CTL_FUP_ON_PTW BIT(5)
+#define RTIT_CTL_CR3EN BIT(7)
+#define RTIT_CTL_TOPA BIT(8)
+#define RTIT_CTL_MTC_EN BIT(9)
+#define RTIT_CTL_TSC_EN BIT(10)
+#define RTIT_CTL_DISRETC BIT(11)
+#define RTIT_CTL_PTW_EN BIT(12)
+#define RTIT_CTL_BRANCH_EN BIT(13)
+#define RTIT_CTL_MTC_RANGE_OFFSET 14
+#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET 19
+#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET 24
+#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
+#define RTIT_CTL_ADDR0_OFFSET 32
+#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
+#define RTIT_CTL_ADDR1_OFFSET 36
+#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
+#define RTIT_CTL_ADDR2_OFFSET 40
+#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
+#define RTIT_CTL_ADDR3_OFFSET 44
+#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
+#define RTIT_STATUS_FILTEREN BIT(0)
+#define RTIT_STATUS_CONTEXTEN BIT(1)
+#define RTIT_STATUS_TRIGGEREN BIT(2)
+#define RTIT_STATUS_BUFFOVF BIT(3)
+#define RTIT_STATUS_ERROR BIT(4)
+#define RTIT_STATUS_STOPPED BIT(5)
#define MSR_IA32_RTIT_ADDR0_A 0x00000580
#define MSR_IA32_RTIT_ADDR0_B 0x00000581
#define MSR_IA32_RTIT_ADDR1_A 0x00000582
--
1.8.3.1
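To illustrate what the moved definitions are used for, here is a sketch that composes an IA32_RTIT_CTL value and extracts the 4-bit ADDR0 field. The RTIT_CTL_* values are copied from the hunk above; the BIT() macro is a userspace stand-in for the kernel's, and make_rtit_ctl()/rtit_ctl_addr0_cfg() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel BIT() macro (assumption). */
#define BIT(nr) (1ULL << (nr))

/* Subset of the definitions now in msr-index.h. */
#define RTIT_CTL_TRACEEN       BIT(0)
#define RTIT_CTL_OS            BIT(2)
#define RTIT_CTL_USR           BIT(3)
#define RTIT_CTL_TOPA          BIT(8)
#define RTIT_CTL_BRANCH_EN     BIT(13)
#define RTIT_CTL_ADDR0_OFFSET  32
#define RTIT_CTL_ADDR0         (0x0full << RTIT_CTL_ADDR0_OFFSET)

/* Enable tracing of kernel and user branches into a ToPA buffer, with
 * the ADDR0 configuration field set to addr0_cfg (masked to 4 bits). */
static uint64_t make_rtit_ctl(unsigned int addr0_cfg)
{
	uint64_t ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS | RTIT_CTL_USR |
		       RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN;

	ctl |= ((uint64_t)addr0_cfg << RTIT_CTL_ADDR0_OFFSET) & RTIT_CTL_ADDR0;
	return ctl;
}

/* Read the ADDR0 configuration field back out of a control value. */
static unsigned int rtit_ctl_addr0_cfg(uint64_t ctl)
{
	return (unsigned int)((ctl & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET);
}
```

For example, make_rtit_ctl(1) yields 0x10000210D: bits 0, 2, 3, 8 and 13 in the low word plus 1 in the ADDR0 field at bit 32.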

2017-12-11 10:15:43

by Luwei Kang

Subject: [PATCH V4 02/11] perf/x86/intel/pt: Change pt_cap_get() to a public function

From: Chao Peng <[email protected]>

Change pt_cap_get() to a public function so that KVM can access it.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.c | 3 ++-
arch/x86/events/intel/pt.h | 18 ------------------
arch/x86/include/asm/intel_pt.h | 20 ++++++++++++++++++++
3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 81fd41d..a5a7e44 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -75,7 +75,7 @@
PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
};

-static u32 pt_cap_get(enum pt_capabilities cap)
+u32 pt_cap_get(enum pt_capabilities cap)
{
struct pt_cap_desc *cd = &pt_caps[cap];
u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
@@ -83,6 +83,7 @@ static u32 pt_cap_get(enum pt_capabilities cap)

return (c & cd->mask) >> shift;
}
+EXPORT_SYMBOL_GPL(pt_cap_get);

static ssize_t pt_cap_show(struct device *cdev,
struct device_attribute *attr,
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 0050ca1..d75c9f3 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -51,24 +51,6 @@ struct topa_entry {
/* TSC to Core Crystal Clock Ratio */
#define CPUID_TSC_LEAF 0x15

-enum pt_capabilities {
- PT_CAP_max_subleaf = 0,
- PT_CAP_cr3_filtering,
- PT_CAP_psb_cyc,
- PT_CAP_ip_filtering,
- PT_CAP_mtc,
- PT_CAP_ptwrite,
- PT_CAP_power_event_trace,
- PT_CAP_topa_output,
- PT_CAP_topa_multiple_entries,
- PT_CAP_single_range_output,
- PT_CAP_payloads_lip,
- PT_CAP_num_address_ranges,
- PT_CAP_mtc_periods,
- PT_CAP_cycle_thresholds,
- PT_CAP_psb_periods,
-};
-
struct pt_pmu {
struct pmu pmu;
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index b523f51..1b301e7 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -2,10 +2,30 @@
#ifndef _ASM_X86_INTEL_PT_H
#define _ASM_X86_INTEL_PT_H

+enum pt_capabilities {
+ PT_CAP_max_subleaf = 0,
+ PT_CAP_cr3_filtering,
+ PT_CAP_psb_cyc,
+ PT_CAP_ip_filtering,
+ PT_CAP_mtc,
+ PT_CAP_ptwrite,
+ PT_CAP_power_event_trace,
+ PT_CAP_topa_output,
+ PT_CAP_topa_multiple_entries,
+ PT_CAP_single_range_output,
+ PT_CAP_payloads_lip,
+ PT_CAP_num_address_ranges,
+ PT_CAP_mtc_periods,
+ PT_CAP_cycle_thresholds,
+ PT_CAP_psb_periods,
+};
+
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
void cpu_emergency_stop_pt(void);
+extern u32 pt_cap_get(enum pt_capabilities cap);
#else
static inline void cpu_emergency_stop_pt(void) {}
+static inline u32 pt_cap_get(enum pt_capabilities cap) { return 0; }
#endif

#endif /* _ASM_X86_INTEL_PT_H */
--
1.8.3.1
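pt_cap_get() looks up one capability by (CPUID.14H sub-leaf, register, mask) and shifts the masked bits down. The sketch below reproduces that extraction in userspace, assuming PT_CPUID_LEAVES is 2 and PT_CPUID_REGS_NUM is 4 as in pt.h, and using __builtin_ctz() (GCC/Clang) in place of the kernel's __ffs(). The psb_periods descriptor matches the PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000) entry visible in the hunk above; cap_get() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

enum cpuid_regs_idx { CPUID_EAX = 0, CPUID_EBX, CPUID_ECX, CPUID_EDX };

#define PT_CPUID_LEAVES   2  /* sub-leaves 0 and 1 of CPUID.14H (assumed) */
#define PT_CPUID_REGS_NUM 4  /* eax, ebx, ecx, edx */

struct pt_cap_desc {
	unsigned int leaf;  /* CPUID.14H sub-leaf */
	unsigned int reg;   /* enum cpuid_regs_idx */
	uint32_t mask;      /* bits holding this capability */
};

/* Raw CPUID.14H registers, laid out like pt_pmu.caps[]. */
static uint32_t caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];

static const struct pt_cap_desc psb_periods = { 1, CPUID_EBX, 0xffff0000 };

static uint32_t cap_get(const struct pt_cap_desc *cd)
{
	uint32_t c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
	/* Lowest set bit of the mask, like __ffs(); mask must be non-zero. */
	unsigned int shift = (unsigned int)__builtin_ctz(cd->mask);

	return (c & cd->mask) >> shift;
}
```

Because the enum now lives in asm/intel_pt.h and the function is exported, KVM can query the same capabilities, e.g. pt_cap_get(PT_CAP_topa_output) in hardware_setup() in the next patch.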

2017-12-11 10:15:47

by Luwei Kang

Subject: [PATCH V4 03/11] KVM: x86: Add Intel Processor Trace virtualization mode

From: Chao Peng <[email protected]>

Intel PT virtualization can work in one of 3 possible modes:
a. system-wide: trace both host and guest, and output to the host buffer;
b. host-only: trace only the host, and output to the host buffer;
c. host-guest: trace host and guest simultaneously, each outputting
   to its own buffer.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/intel_pt.h | 6 ++++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 6 ++++
arch/x86/kvm/vmx.c | 68 +++++++++++++++++++++++++++++++++++++---
4 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 1b301e7..c19b2de 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -2,6 +2,12 @@
#ifndef _ASM_X86_INTEL_PT_H
#define _ASM_X86_INTEL_PT_H

+enum pt_mode {
+ PT_MODE_SYSTEM = 0,
+ PT_MODE_HOST,
+ PT_MODE_HOST_GUEST,
+};
+
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d76bd22..fd98ef0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -755,6 +755,7 @@
#define VMX_BASIC_INOUT 0x0040000000000000LLU

/* MSR_IA32_VMX_MISC bits */
+#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
/* AMD-V MSRs */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 8b67807..27d5d37 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -76,7 +76,9 @@
#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000
#define SECONDARY_EXEC_RDSEED_EXITING 0x00010000
#define SECONDARY_EXEC_ENABLE_PML 0x00020000
+#define SECONDARY_EXEC_PT_CONCEAL_VMX 0x00080000
#define SECONDARY_EXEC_XSAVES 0x00100000
+#define SECONDARY_EXEC_PT_USE_GPA 0x01000000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000

#define PIN_BASED_EXT_INTR_MASK 0x00000001
@@ -97,6 +99,8 @@
#define VM_EXIT_LOAD_IA32_EFER 0x00200000
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
+#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
+#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000

#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff

@@ -108,6 +112,8 @@
#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
+#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
+#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000

#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8eba631..f14c245 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,7 @@
#include <asm/apic.h>
#include <asm/irq_remapping.h>
#include <asm/mmu_context.h>
+#include <asm/intel_pt.h>

#include "trace.h"
#include "pmu.h"
@@ -181,6 +182,10 @@
static int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
module_param(ple_window_max, int, S_IRUGO);

+/* Default is SYSTEM mode. */
+static int __read_mostly pt_mode = PT_MODE_SYSTEM;
+module_param(pt_mode, int, S_IRUGO);
+
extern const ulong vmx_return;

#define NR_AUTOLOAD_MSRS 8
@@ -1338,6 +1343,19 @@ static inline bool cpu_has_vmx_vmfunc(void)
SECONDARY_EXEC_ENABLE_VMFUNC;
}

+static inline bool cpu_has_vmx_intel_pt(void)
+{
+ u64 vmx_msr;
+
+ rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+ return vmx_msr & MSR_IA32_VMX_MISC_INTEL_PT;
+}
+
+static inline bool cpu_has_vmx_pt_use_gpa(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_PT_USE_GPA;
+}
+
static inline bool report_flexpriority(void)
{
return flexpriority_enabled;
@@ -3672,6 +3690,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
SECONDARY_EXEC_TSC_SCALING |
+ SECONDARY_EXEC_PT_USE_GPA |
+ SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -3716,7 +3736,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
#endif
opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
- VM_EXIT_CLEAR_BNDCFGS;
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_PT_CONCEAL_PIP |
+ VM_EXIT_CLEAR_IA32_RTIT_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
&_vmexit_control) < 0)
return -EIO;
@@ -3735,11 +3756,20 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;

min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
- opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+ opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_PT_CONCEAL_PIP | VM_ENTRY_LOAD_IA32_RTIT_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
&_vmentry_control) < 0)
return -EIO;

+ if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
+ !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
+ !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
+ _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
+ _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
+ _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
+ }
+
rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);

/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -5291,6 +5321,28 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
return exec_control;
}

+static u32 vmx_vmexit_control(struct vcpu_vmx *vmx)
+{
+ u32 vmexit_control = vmcs_config.vmexit_ctrl;
+
+ if (pt_mode == PT_MODE_SYSTEM)
+ vmexit_control &= ~(VM_EXIT_CLEAR_IA32_RTIT_CTL |
+ VM_EXIT_PT_CONCEAL_PIP);
+
+ return vmexit_control;
+}
+
+static u32 vmx_vmentry_control(struct vcpu_vmx *vmx)
+{
+ u32 vmentry_control = vmcs_config.vmentry_ctrl;
+
+ if (pt_mode == PT_MODE_SYSTEM)
+ vmentry_control &= ~(VM_ENTRY_PT_CONCEAL_PIP |
+ VM_ENTRY_LOAD_IA32_RTIT_CTL);
+
+ return vmentry_control;
+}
+
static bool vmx_rdrand_supported(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -5421,6 +5473,10 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
}
}

+ if (pt_mode == PT_MODE_SYSTEM)
+ exec_control &= ~(SECONDARY_EXEC_PT_USE_GPA |
+ SECONDARY_EXEC_PT_CONCEAL_VMX);
+
vmx->secondary_exec_control = exec_control;
}

@@ -5533,10 +5589,10 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
}


- vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
+ vm_exit_controls_init(vmx, vmx_vmexit_control(vmx));

/* 22.2.1, 20.8.1 */
- vm_entry_controls_init(vmx, vmcs_config.vmentry_ctrl);
+ vm_entry_controls_init(vmx, vmx_vmentry_control(vmx));

vmx->vcpu.arch.cr0_guest_owned_bits = X86_CR0_TS;
vmcs_writel(CR0_GUEST_HOST_MASK, ~X86_CR0_TS);
@@ -6901,6 +6957,10 @@ static __init int hardware_setup(void)

kvm_mce_cap_supported |= MCG_LMCE_P;

+ if (!enable_ept || !pt_cap_get(PT_CAP_topa_output) ||
+ !cpu_has_vmx_intel_pt() || !cpu_has_vmx_pt_use_gpa())
+ pt_mode = PT_MODE_SYSTEM;
+
return alloc_kvm_area();

out:
--
1.8.3.1
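The patch applies two fallbacks: setup_vmcs_config() keeps the three PT-related VMX controls only when all of them are available, and hardware_setup() forces SYSTEM mode if any prerequisite is missing. The sketch below restates both checks as pure functions so they can be exercised in userspace; the constant values are copied from vmx.h above, while struct vmx_ctls, pt_ctls_all_or_nothing() and pt_effective_mode() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECONDARY_EXEC_PT_USE_GPA    0x01000000u
#define VM_EXIT_CLEAR_IA32_RTIT_CTL  0x02000000u
#define VM_ENTRY_LOAD_IA32_RTIT_CTL  0x00040000u

enum pt_mode { PT_MODE_SYSTEM = 0, PT_MODE_HOST, PT_MODE_HOST_GUEST };

/* Hypothetical bundle of the three control words being adjusted. */
struct vmx_ctls {
	uint32_t exec2;  /* secondary processor-based controls */
	uint32_t exit;   /* VM-exit controls */
	uint32_t entry;  /* VM-entry controls */
};

/* Same condition as setup_vmcs_config(): all three bits or none. */
static void pt_ctls_all_or_nothing(struct vmx_ctls *c)
{
	if (!(c->exec2 & SECONDARY_EXEC_PT_USE_GPA) ||
	    !(c->exit & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
	    !(c->entry & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
		c->exec2 &= ~SECONDARY_EXEC_PT_USE_GPA;
		c->exit &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
		c->entry &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
	}
}

/* Same condition as hardware_setup(): any missing prerequisite
 * silently downgrades the requested mode to SYSTEM. */
static enum pt_mode pt_effective_mode(enum pt_mode requested, bool enable_ept,
				      bool topa_output, bool vmx_intel_pt,
				      bool pt_use_gpa)
{
	if (!enable_ept || !topa_output || !vmx_intel_pt || !pt_use_gpa)
		return PT_MODE_SYSTEM;
	return requested;
}
```

Since cpu_has_vmx_pt_use_gpa() reads vmcs_config after the all-or-nothing step, losing any one of the three controls is enough to push pt_mode back to SYSTEM.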

2017-12-11 10:15:53

by Luwei Kang

Subject: [PATCH V4 04/11] x86: cpufeature: move processor tracing out of scattered features

From: Paolo Bonzini <[email protected]>

Processor tracing is already enumerated in word 9 (CPUID[7,0].EBX),
so do not duplicate it in the scattered features word.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/kernel/cpu/scattered.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index c0b0e9e..3e03cbe 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -203,7 +203,6 @@
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */

#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
-#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */

@@ -242,6 +241,7 @@
#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */
#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
+#define X86_FEATURE_INTEL_PT ( 9*32+25) /* Intel Processor Trace */
#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 05459ad..d0e6976 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -21,7 +21,6 @@ struct cpuid_bit {
static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
- { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
{ X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
{ X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
{ X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
--
1.8.3.1
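Feature numbers in cpufeatures.h encode (word * 32 + bit), and word 9 holds CPUID.(EAX=7,ECX=0).EBX, so the new definition points directly at EBX bit 25. The sketch decodes that encoding and tests the bit in a raw EBX value; feature_word(), feature_bit() and has_intel_pt() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>

/* Same encoding as the new definition in cpufeatures.h. */
#define X86_FEATURE_INTEL_PT (9*32+25)

static unsigned int feature_word(unsigned int f) { return f / 32; }
static unsigned int feature_bit(unsigned int f)  { return f % 32; }

/* Test the Intel PT bit in a raw CPUID.(7,0).EBX value. */
static int has_intel_pt(unsigned int ebx)
{
	return (ebx >> feature_bit(X86_FEATURE_INTEL_PT)) & 1;
}
```

With the bit in word 9, KVM's CPUID code can include it through F(INTEL_PT) in the CPUID.7.0.EBX mask, as patch 05 does.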

2017-12-11 10:15:59

by Luwei Kang

Subject: [PATCH V4 05/11] KVM: x86: Add Intel Processor Trace cpuid emulation

From: Chao Peng <[email protected]>

Expose Intel Processor Trace to the guest only when PT works in
HOST_GUEST mode.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 22 ++++++++++++++++++++--
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 6 ++++++
4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5167984..6480faa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1017,6 +1017,7 @@ struct kvm_x86_ops {
void (*handle_external_intr)(struct kvm_vcpu *vcpu);
bool (*mpx_supported)(void);
bool (*xsaves_supported)(void);
+ bool (*pt_supported)(void);

int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..fcbc029 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -327,6 +327,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
unsigned f_mpx = kvm_mpx_supported() ? F(MPX) : 0;
unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
+ unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;

/* cpuid 1.edx */
const u32 kvm_cpuid_1_edx_x86_features =
@@ -379,7 +380,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
- F(SHA_NI) | F(AVX512BW) | F(AVX512VL);
+ F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | f_intel_pt;

/* cpuid 0xD.1.eax */
const u32 kvm_cpuid_D_1_eax_x86_features =
@@ -407,7 +408,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

switch (function) {
case 0:
- entry->eax = min(entry->eax, (u32)0xd);
+ entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
break;
case 1:
entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -578,6 +579,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
break;
}
+ /* Intel PT */
+ case 0x14: {
+ int t, times = entry->eax;
+
+ if (!f_intel_pt)
+ break;
+
+ entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ for (t = 1; t <= times; ++t) {
+ if (*nent >= maxnent)
+ goto out;
+ do_cpuid_1_ent(&entry[t], function, t);
+ entry[t].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ ++*nent;
+ }
+ break;
+ }
case KVM_CPUID_SIGNATURE: {
static const char signature[12] = "KVMKVMKVM\0\0";
const u32 *sigptr = (const u32 *)signature;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index eb714f1..9727e8d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5204,6 +5204,11 @@ static bool svm_xsaves_supported(void)
return false;
}

+static bool svm_pt_supported(void)
+{
+ return false;
+}
+
static bool svm_has_wbinvd_exit(void)
{
return true;
@@ -5597,6 +5602,7 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
.invpcid_supported = svm_invpcid_supported,
.mpx_supported = svm_mpx_supported,
.xsaves_supported = svm_xsaves_supported,
+ .pt_supported = svm_pt_supported,

.set_supported_cpuid = svm_set_supported_cpuid,

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f14c245..de9e958 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9215,6 +9215,11 @@ static bool vmx_xsaves_supported(void)
SECONDARY_EXEC_XSAVES;
}

+static bool vmx_pt_supported(void)
+{
+ return (pt_mode == PT_MODE_HOST_GUEST);
+}
+
static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
{
u32 exit_intr_info;
@@ -12230,6 +12235,7 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
.handle_external_intr = vmx_handle_external_intr,
.mpx_supported = vmx_mpx_supported,
.xsaves_supported = vmx_xsaves_supported,
+ .pt_supported = vmx_pt_supported,

.check_nested_events = vmx_check_nested_events,

--
1.8.3.1
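The CPUID change does two things: it raises the clamp on the maximum basic leaf from 0xd to 0x14 when PT is exposed, and it emits sub-leaves 1..N of leaf 0x14, where N comes from CPUID.(14H,0).EAX, stopping when the output array is full. A simplified userspace sketch of that loop follows; struct cpuid_entry and FLAG_SIGNIFICANT_INDEX are hypothetical stand-ins for struct kvm_cpuid_entry2 and KVM_CPUID_FLAG_SIGNIFCANT_INDEX, and the sketch fills in only function/index where the real code calls do_cpuid_1_ent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry { uint32_t function, index, eax, flags; };
#define FLAG_SIGNIFICANT_INDEX 1u

/* Max basic leaf reported to the guest, as in the patched "case 0". */
static uint32_t clamp_max_leaf(uint32_t host_max, bool intel_pt)
{
	uint32_t cap = intel_pt ? 0x14 : 0xd;

	return host_max < cap ? host_max : cap;
}

/* entry[0] is sub-leaf 0 and is already counted in *nent. Returns false
 * when the array fills up (where the kernel code does "goto out"). */
static bool fill_leaf_14(struct cpuid_entry *entry, int *nent, int maxnent)
{
	int t, times = (int)entry[0].eax;  /* max sub-leaf from sub-leaf 0 */

	entry[0].flags |= FLAG_SIGNIFICANT_INDEX;
	for (t = 1; t <= times; ++t) {
		if (*nent >= maxnent)
			return false;
		entry[t].function = 0x14;
		entry[t].index = (uint32_t)t;
		entry[t].flags |= FLAG_SIGNIFICANT_INDEX;
		++*nent;
	}
	return true;
}
```

Marking every 0x14 entry with the significant-index flag is what tells userspace that the sub-leaf (ECX) selects between the entries.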

2017-12-11 10:16:06

by Luwei Kang

Subject: [PATCH V4 06/11] KVM: x86: Add a function to get the number of address ranges

CPUID.(EAX=14H,ECX=1H).EAX[2:0] enumerates the number of
Intel Processor Trace configurable Address Ranges for filtering.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 13 +++++++++++++
2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6480faa..c120202 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1217,6 +1217,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
bool kvm_rdpmc(struct kvm_vcpu *vcpu);
+unsigned int kvm_get_pt_addr_cnt(void);

void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index faf843c..b19a749 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -991,6 +991,19 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_rdpmc);

+unsigned int kvm_get_pt_addr_cnt(void)
+{
+ unsigned int eax, ebx, ecx, edx;
+ /*
+ * - CPUID function 14H, sub-function 1:
+ * EAX[2:0] enumerates the number of Intel Processor
+ * Trace configurable Address Ranges for filtering.
+ */
+ cpuid_count(0x14, 1, &eax, &ebx, &ecx, &edx);
+ return (eax & 0x7);
+}
+EXPORT_SYMBOL_GPL(kvm_get_pt_addr_cnt);
+
/*
* List of msr numbers which we expose to userspace through KVM_GET_MSRS
* and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
--
1.8.3.1
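kvm_get_pt_addr_cnt() executes CPUID.(14H,1) on the host and keeps EAX[2:0]. The sketch applies the same decoding to a caller-supplied EAX value so it runs without CPUID. The second helper is a hypothetical addition: each range uses an ADDRn_A/ADDRn_B MSR pair, and msr-index.h only defines ADDR0..ADDR3 (MSR_IA32_RTIT_ADDR_COUNT is 8 in patch 08 of this series), so it caps the MSR count accordingly.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_RTIT_ADDR_COUNT 8  /* ADDR0_A .. ADDR3_B */

/* Same decoding as kvm_get_pt_addr_cnt(), minus the CPUID call. */
static unsigned int pt_addr_cnt_from_eax(uint32_t eax)
{
	return eax & 0x7;  /* EAX[2:0]: configurable address ranges */
}

/* Hypothetical: number of ADDRn_A/B MSRs backing those ranges, capped
 * at the eight MSRs that msr-index.h defines. */
static unsigned int pt_addr_msr_count(uint32_t eax)
{
	unsigned int n = 2 * pt_addr_cnt_from_eax(eax);

	return n < MSR_IA32_RTIT_ADDR_COUNT ? n : MSR_IA32_RTIT_ADDR_COUNT;
}
```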

2017-12-11 10:16:15

by Luwei Kang

Subject: [PATCH V4 08/11] KVM: x86: Add Intel processor trace context for each vcpu

From: Chao Peng <[email protected]>

Add a data structure to save the Intel Processor Trace context.
It mainly includes the MSRs related to Intel Processor Trace.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx.c | 17 +++++++++++++++++
3 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd98ef0..03ffde6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -136,6 +136,7 @@
#define MSR_IA32_RTIT_ADDR2_B 0x00000585
#define MSR_IA32_RTIT_ADDR3_A 0x00000586
#define MSR_IA32_RTIT_ADDR3_B 0x00000587
+#define MSR_IA32_RTIT_ADDR_COUNT 8
#define MSR_IA32_RTIT_CR3_MATCH 0x00000572
#define MSR_IA32_RTIT_OUTPUT_BASE 0x00000560
#define MSR_IA32_RTIT_OUTPUT_MASK 0x00000561
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 27d5d37..9e828d4 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -240,6 +240,8 @@ enum vmcs_field {
GUEST_PDPTR3_HIGH = 0x00002811,
GUEST_BNDCFGS = 0x00002812,
GUEST_BNDCFGS_HIGH = 0x00002813,
+ GUEST_IA32_RTIT_CTL = 0x00002814,
+ GUEST_IA32_RTIT_CTL_HIGH = 0x00002815,
HOST_IA32_PAT = 0x00002c00,
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e2de089..7761c25 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -581,6 +581,21 @@ static inline int pi_test_sn(struct pi_desc *pi_desc)
(unsigned long *)&pi_desc->control);
}

+struct pt_ctx {
+ u64 ctl;
+ u64 status;
+ u64 output_base;
+ u64 output_mask;
+ u64 cr3_match;
+ u64 addrs[MSR_IA32_RTIT_ADDR_COUNT];
+};
+
+struct pt_desc {
+ unsigned int addr_num;
+ struct pt_ctx host;
+ struct pt_ctx guest;
+};
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
unsigned long host_rsp;
@@ -670,6 +685,8 @@ struct vcpu_vmx {
*/
u64 msr_ia32_feature_control;
u64 msr_ia32_feature_control_valid_bits;
+
+ struct pt_desc pt_desc;
};

enum segment_cache_field {
--
1.8.3.1
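The addrs[] array holds the ADDRn_A/ADDRn_B MSRs in order, so an MSR number maps to a slot by subtracting MSR_IA32_RTIT_ADDR0_A (ADDRn_A is slot 2n, ADDRn_B is slot 2n+1); the read/write handlers in patch 10 index it the same way. The struct layout below mirrors this patch, while pt_guest_addr_slot() is a hypothetical helper that additionally rejects MSRs beyond addr_num ranges, a check the patch itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_RTIT_ADDR0_A     0x00000580
#define MSR_IA32_RTIT_ADDR3_B     0x00000587
#define MSR_IA32_RTIT_ADDR_COUNT  8

struct pt_ctx {
	uint64_t ctl;
	uint64_t status;
	uint64_t output_base;
	uint64_t output_mask;
	uint64_t cr3_match;
	uint64_t addrs[MSR_IA32_RTIT_ADDR_COUNT];
};

struct pt_desc {
	unsigned int addr_num;  /* address ranges supported */
	struct pt_ctx host;
	struct pt_ctx guest;
};

/* Hypothetical: slot backing a guest ADDRn_A/B MSR, or NULL if the MSR
 * is outside the block or belongs to an unsupported range. */
static uint64_t *pt_guest_addr_slot(struct pt_desc *pt, uint32_t msr)
{
	uint32_t idx;

	if (msr < MSR_IA32_RTIT_ADDR0_A || msr > MSR_IA32_RTIT_ADDR3_B)
		return NULL;
	idx = msr - MSR_IA32_RTIT_ADDR0_A;
	if (idx >= 2 * pt->addr_num)
		return NULL;
	return &pt->guest.addrs[idx];
}
```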

2017-12-11 10:16:20

by Luwei Kang

Subject: [PATCH V4 09/11] KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

Currently, Intel Processor Trace does not support tracing in L1 guest
VMX operation (IA32_VMX_MISC[bit 14] is 0). As the SDM states, on
such processors execution of the VMXON instruction clears
IA32_RTIT_CTL.TraceEn, and any attempt to write IA32_RTIT_CTL
causes a general-protection exception (#GP).

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7761c25..d2e64bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -948,6 +948,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
+static void pt_disable_intercept_for_msr(bool flag);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -2469,6 +2470,15 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
}

+static void vmx_set_rtit_ctl(struct kvm_vcpu *vcpu, u64 data)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ pt_disable_intercept_for_msr(data & RTIT_CTL_TRACEEN);
+ vmcs_write64(GUEST_IA32_RTIT_CTL, data);
+ vmx->pt_desc.guest.ctl = data;
+}
+
static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
unsigned long rip;
@@ -7500,6 +7510,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
if (ret)
return ret;

+ if (pt_mode == PT_MODE_HOST_GUEST)
+ vmx_set_rtit_ctl(vcpu, 0);
+
nested_vmx_succeed(vcpu);
return kvm_skip_emulated_instruction(vcpu);
}
--
1.8.3.1
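The combined effect of vmx_set_rtit_ctl(), the VMXON hook above and the RTIT_CTL write check in patch 10 can be modelled with a small state machine: PT MSRs are passed through only while the guest has TraceEn set, VMXON clears RTIT_CTL (which re-enables interception), and later RTIT_CTL writes fail. This is a simplified userspace model; struct mock_vcpu and all three functions are hypothetical, and the return value 1 stands for the #GP that vmx_set_msr() triggers by returning 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN (1ULL << 0)

/* Hypothetical stand-in for the vcpu state this patch touches. */
struct mock_vcpu {
	uint64_t guest_ctl;        /* pt_desc.guest.ctl */
	uint64_t vmcs_rtit_ctl;    /* GUEST_IA32_RTIT_CTL field */
	bool pt_msrs_intercepted;  /* PT MSR accesses cause VM exits */
	bool vmxon;                /* guest has executed VMXON */
};

/* Mirrors vmx_set_rtit_ctl(): interception is disabled only while
 * TraceEn is set. */
static void set_rtit_ctl(struct mock_vcpu *v, uint64_t data)
{
	v->pt_msrs_intercepted = !(data & RTIT_CTL_TRACEEN);
	v->vmcs_rtit_ctl = data;
	v->guest_ctl = data;
}

/* Guest WRMSR to IA32_RTIT_CTL; 1 means "inject #GP". */
static int guest_wrmsr_rtit_ctl(struct mock_vcpu *v, uint64_t data)
{
	if (v->vmxon)
		return 1;
	set_rtit_ctl(v, data);
	return 0;
}

/* Emulated VMXON in HOST_GUEST mode: tracing is switched off. */
static void emulate_vmxon(struct mock_vcpu *v)
{
	v->vmxon = true;
	set_rtit_ctl(v, 0);
}
```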

2017-12-11 10:16:27

by Luwei Kang

Subject: [PATCH V4 10/11] KVM: x86: Implement Intel Processor Trace MSRs read/write

From: Chao Peng <[email protected]>

Reads and writes of the Intel PT MSRs are not intercepted while the
guest has Intel PT enabled. Reads and writes of IA32_RTIT_CTL always
cause a VM exit.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 20 +++++++++++++++++
2 files changed, 84 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d2e64bf..f948231 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -949,6 +949,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
u16 error_code);
static void pt_disable_intercept_for_msr(bool flag);
+static bool vmx_pt_supported(void);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3349,6 +3350,38 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
+ case MSR_IA32_RTIT_CTL:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data = to_vmx(vcpu)->pt_desc.guest.ctl;
+ break;
+ case MSR_IA32_RTIT_STATUS:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data = to_vmx(vcpu)->pt_desc.guest.status;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data = to_vmx(vcpu)->pt_desc.guest.cr3_match;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data = to_vmx(vcpu)->pt_desc.guest.output_base;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data = to_vmx(vcpu)->pt_desc.guest.output_mask;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ if (!vmx_pt_supported())
+ return 1;
+ msr_info->data =
+ to_vmx(vcpu)->pt_desc.guest.addrs[msr_info->index -
+ MSR_IA32_RTIT_ADDR0_A];
+ break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -3473,6 +3506,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
break;
+ case MSR_IA32_RTIT_CTL:
+ if (!vmx_pt_supported() || to_vmx(vcpu)->nested.vmxon)
+ return 1;
+ vmx_set_rtit_ctl(vcpu, data);
+ break;
+ case MSR_IA32_RTIT_STATUS:
+ if (!vmx_pt_supported())
+ return 1;
+ vmx->pt_desc.guest.status = data;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if (!vmx_pt_supported())
+ return 1;
+ vmx->pt_desc.guest.cr3_match = data;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ if (!vmx_pt_supported())
+ return 1;
+ vmx->pt_desc.guest.output_base = data;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!vmx_pt_supported())
+ return 1;
+ vmx->pt_desc.guest.output_mask = data;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ if (!vmx_pt_supported())
+ return 1;
+ vmx->pt_desc.guest.addrs[msr_info->index -
+ MSR_IA32_RTIT_ADDR0_A] = data;
+ break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b19a749..3f16626 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1022,6 +1022,12 @@ unsigned int kvm_get_pt_addr_cnt(void)
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
};

static unsigned num_msrs_to_save;
@@ -4337,6 +4343,20 @@ static void kvm_init_msr_list(void)
if (!kvm_x86_ops->rdtscp_supported())
continue;
break;
+ case MSR_IA32_RTIT_CTL:
+ case MSR_IA32_RTIT_STATUS:
+ case MSR_IA32_RTIT_CR3_MATCH:
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!kvm_x86_ops->pt_supported())
+ continue;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: {
+ if (!kvm_x86_ops->pt_supported() || msrs_to_save[i] -
+ MSR_IA32_RTIT_ADDR0_A >= kvm_get_pt_addr_cnt())
+ continue;
+ break;
+ }
default:
break;
}
--
1.8.3.1
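The ADDRn_A/B cases in vmx_get_msr()/vmx_set_msr() and the kvm_init_msr_list() filter share one piece of index arithmetic. The sketch below shows it in isolation. The MSR numbers 0x580 (IA32_RTIT_ADDR0_A) through 0x587 (IA32_RTIT_ADDR3_B) are architectural. The helper names are hypothetical. The sketch also assumes that kvm_get_pt_addr_cnt() counts individual ADDR MSRs (two per address range), because every loop in the series indexes them linearly from MSR_IA32_RTIT_ADDR0_A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR numbers (as in arch/x86/include/asm/msr-index.h). */
#define MSR_IA32_RTIT_ADDR0_A 0x00000580u
#define MSR_IA32_RTIT_ADDR3_B 0x00000587u

/* Size of a hypothetical addrs[] array with one slot per ADDR MSR. */
#define PT_MAX_ADDR_MSRS (MSR_IA32_RTIT_ADDR3_B - MSR_IA32_RTIT_ADDR0_A + 1)

static bool is_rtit_addr_msr(uint32_t msr)
{
    return msr >= MSR_IA32_RTIT_ADDR0_A && msr <= MSR_IA32_RTIT_ADDR3_B;
}

/* Same arithmetic as "msr_info->index - MSR_IA32_RTIT_ADDR0_A":
 * ADDR0_A -> 0, ADDR0_B -> 1, ADDR1_A -> 2, ... ADDR3_B -> 7. */
static unsigned int rtit_addr_slot(uint32_t msr)
{
    assert(is_rtit_addr_msr(msr));
    return msr - MSR_IA32_RTIT_ADDR0_A;
}

/* Mirrors the kvm_init_msr_list() filter: an ADDRn MSR is reported only
 * if PT is supported and its slot is below addr_cnt (assumed to be a
 * count of MSRs, not of address ranges). */
static bool rtit_addr_msr_exposed(uint32_t msr, bool pt_supported,
                                  unsigned int addr_cnt)
{
    if (!pt_supported)
        return false;
    return rtit_addr_slot(msr) < addr_cnt;
}
```

With a count of 4, for example, ADDR0_A through ADDR1_B are reported and ADDR2_A through ADDR3_B are skipped.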

2017-12-11 10:16:30

by Luwei Kang

[permalink] [raw]
Subject: [PATCH V4 11/11] KVM: x86: Implement Intel Processor Trace context switch

From: Chao Peng <[email protected]>

Load/store the Intel Processor Trace registers on context switch.
MSR IA32_RTIT_CTL is loaded/stored automatically through the VMCS.
In HOST mode, we just need to restore the host value of IA32_RTIT_CTL.
In HOST_GUEST mode, we need to load/restore the PT MSRs only when PT
is enabled in the guest.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f948231..d45cc76 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2144,6 +2144,55 @@ static unsigned long segment_base(u16 selector)
}
#endif

+static inline void pt_load_msr(struct pt_ctx *ctx, unsigned int addr_num)
+{
+ u32 i;
+
+ wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
+ wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+ wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+ wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+ for (i = 0; i < addr_num; i++)
+ wrmsrl(MSR_IA32_RTIT_ADDR0_A + i, ctx->addrs[i]);
+}
+
+static inline void pt_save_msr(struct pt_ctx *ctx, unsigned int addr_num)
+{
+ u32 i;
+
+ rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
+ rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+ rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+ rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+ for (i = 0; i < addr_num; i++)
+ rdmsrl(MSR_IA32_RTIT_ADDR0_A + i, ctx->addrs[i]);
+}
+
+static void pt_guest_enter(struct vcpu_vmx *vmx)
+{
+ if (pt_mode == PT_MODE_HOST || PT_MODE_HOST_GUEST)
+ rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+
+ if (pt_mode == PT_MODE_HOST_GUEST &&
+ vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
+ wrmsrl(MSR_IA32_RTIT_CTL, 0);
+ pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_num);
+ pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_num);
+ }
+}
+
+static void pt_guest_exit(struct vcpu_vmx *vmx)
+{
+ if (pt_mode == PT_MODE_HOST_GUEST &&
+ vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
+ pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_num);
+ pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_num);
+ }
+
+ if (pt_mode == PT_MODE_HOST || pt_mode == PT_MODE_HOST_GUEST)
+ wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+}
+
static void vmx_save_host_state(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -5766,6 +5815,14 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
}
+
+ if (pt_mode == PT_MODE_HOST_GUEST) {
+ memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
+ vmx->pt_desc.addr_num = kvm_get_pt_addr_cnt();
+ /* Bit[6~0] are forced to 1, writes are ignored. */
+ vmx->pt_desc.guest.output_mask = 0x7F;
+ vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
+ }
}

static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -9589,6 +9646,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.pkru != vmx->host_pkru)
__write_pkru(vcpu->arch.pkru);

+ pt_guest_enter(vmx);
+
atomic_switch_perf_msrs(vmx);
debugctlmsr = get_debugctlmsr();

@@ -9724,6 +9783,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
| (1 << VCPU_EXREG_CR3));
vcpu->arch.regs_dirty = 0;

+ pt_guest_exit(vmx);
+
/*
* eager fpu is enabled if PKEY is supported and CR4 is switched
* back on host, so it is safe to read guest PKRU from current
--
1.8.3.1
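The following user-space model shows the save/restore ordering in pt_guest_enter()/pt_guest_exit() above. It uses a fake MSR file in place of rdmsrl()/wrmsrl(). All names are hypothetical, and the mode values are assumptions. The sketch uses the intended form of the first test, `mode == PT_MODE_HOST || mode == PT_MODE_HOST_GUEST`. As written, `pt_mode == PT_MODE_HOST || PT_MODE_HOST_GUEST` is true in every mode whenever PT_MODE_HOST_GUEST is a nonzero constant, so the host RTIT_CTL is also read in system mode, where pt_guest_exit() never writes it back. The hardware side is not modeled: loading GUEST_IA32_RTIT_CTL on VM entry and clearing RTIT_CTL on VM exit are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN (1ULL << 0)
#define SIM_PT_ADDR_MSRS 8

/* Hypothetical mode values; only SIM_PT_MODE_HOST_GUEST != 0 matters. */
enum sim_pt_mode {
    SIM_PT_MODE_SYSTEM = 0,
    SIM_PT_MODE_HOST = 1,
    SIM_PT_MODE_HOST_GUEST = 2,
};

/* Simplified struct pt_ctx. */
struct sim_pt_ctx {
    uint64_t ctl, status, output_base, output_mask, cr3_match;
    uint64_t addrs[SIM_PT_ADDR_MSRS];
};

struct sim_vmx {
    enum sim_pt_mode mode;
    unsigned int addr_num;            /* must be <= SIM_PT_ADDR_MSRS */
    struct sim_pt_ctx host, guest;
};

/* Fake per-CPU MSR file standing in for rdmsrl()/wrmsrl(). */
static struct sim_pt_ctx cpu;

static void sim_pt_load(const struct sim_pt_ctx *ctx, unsigned int addr_num)
{
    cpu.status = ctx->status;
    cpu.output_base = ctx->output_base;
    cpu.output_mask = ctx->output_mask;
    cpu.cr3_match = ctx->cr3_match;
    for (unsigned int i = 0; i < addr_num; i++)
        cpu.addrs[i] = ctx->addrs[i];
}

static void sim_pt_save(struct sim_pt_ctx *ctx, unsigned int addr_num)
{
    ctx->status = cpu.status;
    ctx->output_base = cpu.output_base;
    ctx->output_mask = cpu.output_mask;
    ctx->cr3_match = cpu.cr3_match;
    for (unsigned int i = 0; i < addr_num; i++)
        ctx->addrs[i] = cpu.addrs[i];
}

/* Intended mode test: "mode ==" is needed on both sides of "||". */
static bool sim_switches_host_ctl(enum sim_pt_mode mode)
{
    return mode == SIM_PT_MODE_HOST || mode == SIM_PT_MODE_HOST_GUEST;
}

static bool sim_guest_tracing(const struct sim_vmx *vmx)
{
    return vmx->mode == SIM_PT_MODE_HOST_GUEST &&
           (vmx->guest.ctl & RTIT_CTL_TRACEEN);
}

static void sim_guest_enter(struct sim_vmx *vmx)
{
    assert(vmx->addr_num <= SIM_PT_ADDR_MSRS);
    if (sim_switches_host_ctl(vmx->mode))
        vmx->host.ctl = cpu.ctl;
    if (sim_guest_tracing(vmx)) {
        /* Stop host tracing first; the other PT MSRs are generally not
         * writable while TraceEn is set. */
        cpu.ctl = 0;
        sim_pt_save(&vmx->host, vmx->addr_num);
        sim_pt_load(&vmx->guest, vmx->addr_num);
    }
}

static void sim_guest_exit(struct sim_vmx *vmx)
{
    if (sim_guest_tracing(vmx)) {
        sim_pt_save(&vmx->guest, vmx->addr_num);
        sim_pt_load(&vmx->host, vmx->addr_num);
    }
    /* RTIT_CTL is written last, so host tracing resumes only after the
     * rest of the host state is back in place. */
    if (sim_switches_host_ctl(vmx->mode))
        cpu.ctl = vmx->host.ctl;
}
```

In HOST_GUEST mode with guest TraceEn set, the host STATUS/OUTPUT/ADDR values survive the round trip and the guest's updates are captured in `vmx.guest`. With TraceEn clear, only RTIT_CTL is touched.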

2017-12-11 10:17:46

by Luwei Kang

[permalink] [raw]
Subject: [PATCH V4 07/11] KVM: x86: Add a function to disable/enable Intel PT MSRs intercept

The Intel Processor Trace MSRs (except IA32_RTIT_CTL) are passed through
to the guest when Intel PT is enabled in the guest, so we need this
function to disable/enable interception of these MSRs.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index de9e958..e2de089 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4978,6 +4978,41 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
}
}

+static void __vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
+ u32 msr, int type)
+{
+ int f = sizeof(unsigned long);
+
+ if (!cpu_has_vmx_msr_bitmap())
+ return;
+
+ /*
+ * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
+ * have the write-low and read-high bitmap offsets the wrong way round.
+ * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
+ */
+ if (msr <= 0x1fff) {
+ if (type & MSR_TYPE_R)
+ /* read-low */
+ __set_bit(msr, msr_bitmap + 0x000 / f);
+
+ if (type & MSR_TYPE_W)
+ /* write-low */
+ __set_bit(msr, msr_bitmap + 0x800 / f);
+
+ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+ msr &= 0x1fff;
+ if (type & MSR_TYPE_R)
+ /* read-high */
+ __set_bit(msr, msr_bitmap + 0x400 / f);
+
+ if (type & MSR_TYPE_W)
+ /* write-high */
+ __set_bit(msr, msr_bitmap + 0xc00 / f);
+
+ }
+}
+
/*
* If a msr is allowed by L0, we should check whether it is allowed by L1.
* The corresponding bit will be cleared unless both of L0 and L1 allow it.
@@ -5033,6 +5068,39 @@ static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
msr, MSR_TYPE_R | MSR_TYPE_W);
}

+static void vmx_enable_intercept_for_msr(u32 msr, bool longmode_only)
+{
+ if (!longmode_only)
+ __vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy,
+ msr, MSR_TYPE_R | MSR_TYPE_W);
+ __vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode,
+ msr, MSR_TYPE_R | MSR_TYPE_W);
+}
+
+static void pt_disable_intercept_for_msr(bool flag)
+{
+ unsigned int i;
+ unsigned int addr_num = kvm_get_pt_addr_cnt();
+
+ if (flag) {
+ vmx_disable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
+ vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
+ vmx_disable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
+ vmx_disable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
+ for (i = 0; i < addr_num; i++)
+ vmx_disable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
+ false);
+ } else {
+ vmx_enable_intercept_for_msr(MSR_IA32_RTIT_STATUS, false);
+ vmx_enable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_BASE, false);
+ vmx_enable_intercept_for_msr(MSR_IA32_RTIT_OUTPUT_MASK, false);
+ vmx_enable_intercept_for_msr(MSR_IA32_RTIT_CR3_MATCH, false);
+ for (i = 0; i < addr_num; i++)
+ vmx_enable_intercept_for_msr(MSR_IA32_RTIT_ADDR0_A + i,
+ false);
+ }
+}
+
static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active)
{
if (apicv_active) {
--
1.8.3.1
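The offsets in __vmx_enable_intercept_for_msr() split the 4 KiB MSR bitmap into four 1 KiB regions: 0x000 read-low, 0x400 read-high, 0x800 write-low and 0xc00 write-high. Setting a bit makes that access cause a VM exit, and clearing it passes the access through. The sketch below computes the byte and bit that would be set. The helper names are hypothetical, and a byte array stands in for the kernel's unsigned long bitmap; the two layouts are equivalent on little-endian x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_TYPE_R 1
#define MSR_TYPE_W 2

/* Byte offsets of the four 1 KiB regions inside the 4 KiB MSR bitmap. */
#define BITMAP_READ_LOW   0x000
#define BITMAP_READ_HIGH  0x400
#define BITMAP_WRITE_LOW  0x800
#define BITMAP_WRITE_HIGH 0xc00

/* Computes the byte offset and bit number controlling one access type.
 * Returns false for MSRs outside the two ranges the bitmap covers; with
 * MSR bitmaps in use, accesses to those always cause VM exits. */
static bool msr_bitmap_pos(uint32_t msr, int type, unsigned int *byte,
                           unsigned int *bit)
{
    unsigned int base;

    assert(type == MSR_TYPE_R || type == MSR_TYPE_W);
    if (msr <= 0x1fff) {
        base = (type == MSR_TYPE_R) ? BITMAP_READ_LOW : BITMAP_WRITE_LOW;
    } else if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
        base = (type == MSR_TYPE_R) ? BITMAP_READ_HIGH : BITMAP_WRITE_HIGH;
        msr &= 0x1fff;
    } else {
        return false;
    }
    *byte = base + msr / 8;
    *bit = msr % 8;
    return true;
}

/* Byte-granular equivalent of __set_bit(msr, bitmap + base / sizeof(long)):
 * enables interception of one access type. */
static void msr_bitmap_set(uint8_t bitmap[4096], uint32_t msr, int type)
{
    unsigned int byte, bit;

    if (msr_bitmap_pos(msr, type, &byte, &bit))
        bitmap[byte] |= (uint8_t)(1u << bit);
}
```

For instance, write interception for IA32_RTIT_STATUS (0x571) is controlled by byte 0x8ae, bit 1.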

2017-12-12 17:06:22

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 11/11] KVM: x86: Implement Intel Processor Trace context switch

On 10/12/2017 21:30, Luwei Kang wrote:
> +static void pt_guest_enter(struct vcpu_vmx *vmx)
> +{
> + if (pt_mode == PT_MODE_HOST || PT_MODE_HOST_GUEST)

Small mistake here (missing "pt_mode == ").

Thanks,

Paolo

2017-12-12 17:09:57

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 00/11] Intel Processor Trace virtualization enabling

On 10/12/2017 21:30, Luwei Kang wrote:
> Hi All,
>
> Here is a patch series which adds Processor Trace enabling to KVM guests. You can get its software developer's manual from:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> In Chapter 5 INTEL PROCESSOR TRACE: VMX IMPROVEMENTS.
>
> Introduction:
> Intel Processor Trace (Intel PT) is an extension of Intel Architecture that captures information about software execution using dedicated hardware facilities that cause only minimal performance perturbation to the software being traced. Details on the Intel PT infrastructure and trace capabilities can be found in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C.
>
> The suite of architecture changes serves to simplify the process of virtualizing Intel PT for use by guest software. There are two primary elements to these VMX improvements for Intel PT.
> 1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS.
>    - This serves to speed up and simplify the process of disabling trace on VM exit, and restoring it on VM entry.
> 2. Enabling use of EPT to redirect PT output.
>    - This enables the VMM to elect to virtualize the PT output buffer using EPT. In this mode, the CPU will treat PT output addresses as Guest Physical Addresses (GPAs) and translate them using EPT. This means that Intel PT output reads (of the ToPA table) and writes (of trace output) can cause EPT violations and other output events.
>
> Processor Trace virtualization can work in one of 3 possible modes, selected by the new option "pt_mode". The default is system mode.
> a. system-wide: trace both host/guest and output to host buffer;
> b. host-only: only trace host and output to host buffer;
> c. host-guest: trace host and guest simultaneously and output to their respective buffers.
>
> From V3:
> - change default mode to SYSTEM mode;
> - add a new patch to move PT out of scattered features;
> - add a new function kvm_get_pt_addr_cnt() to get the number of address ranges;
> - add a new function vmx_set_rtit_ctl() to set the value of guest RTIT_CTL, GUEST_IA32_RTIT_CTL and MSRs intercept.
>
> From v2:
> - replace *_PT_SUPPRESS_PIP to *_PT_CONCEAL_PIP;
> - clear SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL and VM_ENTRY_LOAD_IA32_RTIT_CTL in SYSTEM mode. These bits must be all set or all clear;
> - move processor tracing out of scattered features;
> - add a new function to enable/disable intercept MSRs read/write;
> - add all Intel PT MSRs read/write and disable intercept when PT is enabled in guest;
> - disable Intel PT and enable MSR interception when the L1 guest executes VMXON;
> - performance optimization.
> In host-only mode, we just need to save the host RTIT_CTL before VM entry and restore it after VM exit;
> In HOST_GUEST mode, we need to save and restore all MSRs only when PT is enabled in the guest.
> - use XSAVES/XRSTORS to implement the context switch.
> Not implemented in this version and still being debugged; this will be done in a separate patch.
>
> From v1:
> - remove guest-only mode because guest-only mode can be covered by host-guest mode;
> - always set "use GPA for processor tracing" in the secondary execution controls if supported;
> - trap RTIT_CTL read/write. Forbid writes to this MSR when the L1 hypervisor has executed VMXON.
>
> Chao Peng (7):
> perf/x86/intel/pt: Move Intel-PT MSR bit definitions to a public
> header
> perf/x86/intel/pt: Change pt_cap_get() to a public function
> KVM: x86: Add Intel Processor Trace virtualization mode
> KVM: x86: Add Intel Processor Trace cpuid emulation
> KVM: x86: Add Intel processor trace context for each vcpu
> KVM: x86: Implement Intel Processor Trace MSRs read/write
> KVM: x86: Implement Intel Processor Trace context switch
>
> Luwei Kang (3):
> KVM: x86: Add a function to get the number of address ranges
> KVM: x86: Add a function to disable/enable Intel PT MSRs intercept
> KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest
>
> Paolo Bonzini (1):
> x86: cpufeature: move processor tracing out of scattered features
>
> arch/x86/events/intel/pt.c | 3 +-
> arch/x86/events/intel/pt.h | 55 -------
> arch/x86/include/asm/cpufeatures.h | 2 +-
> arch/x86/include/asm/intel_pt.h | 26 ++++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/include/asm/msr-index.h | 35 +++++
> arch/x86/include/asm/vmx.h | 8 +
> arch/x86/kernel/cpu/scattered.c | 1 -
> arch/x86/kvm/cpuid.c | 22 ++-
> arch/x86/kvm/svm.c | 6 +
> arch/x86/kvm/vmx.c | 297 ++++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c | 33 +++++
> 12 files changed, 426 insertions(+), 64 deletions(-)
>

Queued, thanks.

Paolo

2018-01-09 21:40:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 09/11] KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

On 09/01/2018 21:16, Jim Mattson wrote:
> This doesn't look right to me. pt_disable_intercept_for_msr calls
> either vmx_disable_intercept_for_msr or vmx_enable_intercept_for_msr,
> both of which only change vmx_msr_bitmap_legacy
> and vmx_msr_bitmap_longmode. Neither of these MSR permission bitmaps is
> likely to be the one in use by L1 (which is more likely to be using the
> mode-appropriate x2apic or x2apic_apicv bitmap). Moreover, these changes
> affect all of the VMs that are using those MSR permission bitmaps.

Yeah, I replied this morning with something like this, on the thread
about removing the longmode bitmaps.

Paolo