2018-05-22 02:58:35

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 00/12] Intel Processor Trace virtualization enabling

Hi All,

Here is a patch-series which adding Processor Trace enabling in KVM guest. You can get It's software developer manuals from:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
In Chapter 4 INTEL PROCESSOR TRACE: VMX IMPROVEMENTS.

Introduction:
Intel Processor Trace (Intel PT) is an extension of Intel Architecture that captures information about software execution using dedicated hardware facilities that cause only minimal performance perturbation to the software being traced. Details on the Intel PT infrastructure and trace capabilities can be found in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C.

The suite of architecture changes serve to simplify the process of virtualizing Intel PT for use by a guest software. There are two primary elements to this new architecture support for VMX support improvements made for Intel PT.
1. Addition of a new guest IA32_RTIT_CTL value field to the VMCS.
— This serves to speed and simplify the process of disabling trace on VM exit, and restoring it on VM entry.
2. Enabling use of EPT to redirect PT output.
— This enables the VMM to elect to virtualize the PT output buffer using EPT. In this mode, the CPU will treat PT output addresses as Guest Physical Addresses (GPAs) and translate them using EPT. This means that Intel PT output reads (of the ToPA table) and writes (of trace output) can cause EPT violations, and other output events.

Processor Trace virtualization can be work in one of 2 possible modes by set new option "pt_mode". Default value is system mode.
a. system-wide: trace both host/guest and output to host buffer;
b. host-guest: trace host/guest simultaneous and output to their respective buffer.

>From V8:
- move macro definition MSR_IA32_RTIT_ADDR_RANGE from msr-index.h to intel_pt.h;
- initialize the RTIT_CTL bitmask to ~0ULL.

>From V7:
- remove host only mode since it can be emulated by perf code;
- merge patch 8 and 9 to make code and data in the same patch;
- rename __pt_cap_get() to pt_cap_decode();
- other minor change.

>From V6:
- split pathes 1~2 to four separate patches (these patches do 2 things) and add more descriptions.

>From V5:
- rename the function from pt_cap_get_ex() to __pt_cap_get();
- replace the most of function from vmx_pt_supported() to "pt_mode == PT_MODE_HOST_GUEST"(or !=).

>From V4:
- add data check when setting the value of MSR_IA32_RTIT_CTL;
- Invoke new interface to set the intercept of MSRs read/write after "MSR bitmap per-vcpu" patches.

>From V3:
- change default mode to SYSTEM mode;
- add a new patch to move PT out of scattered features;
- add a new fucntion kvm_get_pt_addr_cnt() to get the number of address ranges;
- add a new function vmx_set_rtit_ctl() to set the value of guest RTIT_CTL, GUEST_IA32_RTIT_CTL and MSRs intercept.

>From v2:
- replace *_PT_SUPPRESS_PIP to *_PT_CONCEAL_PIP;
- clean SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL and VM_ENTRY_LOAD_IA32_RTIT_CTL in SYSTEM mode. These bits must be all set or all clean;
- move processor tracing out of scattered features;
- add a new function to enable/disable intercept MSRs read/write;
- add all Intel PT MSRs read/write and disable intercept when PT is enabled in guest;
- disable Intel PT and enable intercept MSRs when L1 guest VMXON;
- performance optimization.
In Host only mode. we just need to save host RTIT_CTL before vm-entry and restore host RTIT_CTL after vm-exit;
In HOST_GUEST mode. we need to save and restore all MSRs only when PT has enabled in guest.
- use XSAVES/XRESTORES implement context switch.
Haven't implementation in this version and still in debuging. will make a separate patch work on this.

>From v1:
- remove guest-only mode because guest-only mode can be covered by host-guest mode;
- always set "use GPA for processor tracing" in secondary execution control if it can be;
- trap RTIT_CTL read/write. Forbid write this msr when VMXON in L1 hypervisor.

Chao Peng (7):
perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public
header
perf/x86/intel/pt: Change pt_cap_get() to a public function
KVM: x86: Add Intel Processor Trace virtualization mode
KVM: x86: Add Intel Processor Trace cpuid emulation
KVM: x86: Add Intel Processor Trace context switch for each vcpu
KVM: x86: Implement Intel Processor Trace MSRs read/write emulation
KVM: x86: Set intercept for Intel PT MSRs read/write

Luwei Kang (5):
perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs
perf/x86/intel/pt: add new capability for Intel PT
perf/x86/intel/pt: Introduce a new function to get capability of Intel
PT
KVM: x86: Introduce a function to initialize the PT configuration
KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

arch/x86/events/intel/pt.c | 12 +-
arch/x86/events/intel/pt.h | 58 ------
arch/x86/include/asm/intel_pt.h | 39 ++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/msr-index.h | 37 ++++
arch/x86/include/asm/vmx.h | 8 +
arch/x86/kvm/cpuid.c | 22 +-
arch/x86/kvm/svm.c | 6 +
arch/x86/kvm/vmx.c | 439 ++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 33 ++-
10 files changed, 588 insertions(+), 67 deletions(-)

--
1.8.3.1



2018-05-22 02:58:42

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header

From: Chao Peng <[email protected]>

Intel Processor Trace virtualization enabling in KVM guest
need to access these MSRs bit definitions, so move them to
public header file msr-index.h.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.h | 37 -------------------------------------
arch/x86/include/asm/msr-index.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 0eb41d0..0050ca1 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -20,43 +20,6 @@
#define __INTEL_PT_H__

/*
- * PT MSR bit definitions
- */
-#define RTIT_CTL_TRACEEN BIT(0)
-#define RTIT_CTL_CYCLEACC BIT(1)
-#define RTIT_CTL_OS BIT(2)
-#define RTIT_CTL_USR BIT(3)
-#define RTIT_CTL_PWR_EVT_EN BIT(4)
-#define RTIT_CTL_FUP_ON_PTW BIT(5)
-#define RTIT_CTL_CR3EN BIT(7)
-#define RTIT_CTL_TOPA BIT(8)
-#define RTIT_CTL_MTC_EN BIT(9)
-#define RTIT_CTL_TSC_EN BIT(10)
-#define RTIT_CTL_DISRETC BIT(11)
-#define RTIT_CTL_PTW_EN BIT(12)
-#define RTIT_CTL_BRANCH_EN BIT(13)
-#define RTIT_CTL_MTC_RANGE_OFFSET 14
-#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
-#define RTIT_CTL_CYC_THRESH_OFFSET 19
-#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
-#define RTIT_CTL_PSB_FREQ_OFFSET 24
-#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
-#define RTIT_CTL_ADDR0_OFFSET 32
-#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
-#define RTIT_CTL_ADDR1_OFFSET 36
-#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
-#define RTIT_CTL_ADDR2_OFFSET 40
-#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
-#define RTIT_CTL_ADDR3_OFFSET 44
-#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
-#define RTIT_STATUS_FILTEREN BIT(0)
-#define RTIT_STATUS_CONTEXTEN BIT(1)
-#define RTIT_STATUS_TRIGGEREN BIT(2)
-#define RTIT_STATUS_BUFFOVF BIT(3)
-#define RTIT_STATUS_ERROR BIT(4)
-#define RTIT_STATUS_STOPPED BIT(5)
-
-/*
* Single-entry ToPA: when this close to region boundary, switch
* buffers to avoid losing data.
*/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 53d5b1b..afe4e13 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -106,7 +106,40 @@
#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6

#define MSR_IA32_RTIT_CTL 0x00000570
+#define RTIT_CTL_TRACEEN BIT(0)
+#define RTIT_CTL_CYCLEACC BIT(1)
+#define RTIT_CTL_OS BIT(2)
+#define RTIT_CTL_USR BIT(3)
+#define RTIT_CTL_PWR_EVT_EN BIT(4)
+#define RTIT_CTL_FUP_ON_PTW BIT(5)
+#define RTIT_CTL_CR3EN BIT(7)
+#define RTIT_CTL_TOPA BIT(8)
+#define RTIT_CTL_MTC_EN BIT(9)
+#define RTIT_CTL_TSC_EN BIT(10)
+#define RTIT_CTL_DISRETC BIT(11)
+#define RTIT_CTL_PTW_EN BIT(12)
+#define RTIT_CTL_BRANCH_EN BIT(13)
+#define RTIT_CTL_MTC_RANGE_OFFSET 14
+#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET 19
+#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET 24
+#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
+#define RTIT_CTL_ADDR0_OFFSET 32
+#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
+#define RTIT_CTL_ADDR1_OFFSET 36
+#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
+#define RTIT_CTL_ADDR2_OFFSET 40
+#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
+#define RTIT_CTL_ADDR3_OFFSET 44
+#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
+#define RTIT_STATUS_FILTEREN BIT(0)
+#define RTIT_STATUS_CONTEXTEN BIT(1)
+#define RTIT_STATUS_TRIGGEREN BIT(2)
+#define RTIT_STATUS_BUFFOVF BIT(3)
+#define RTIT_STATUS_ERROR BIT(4)
+#define RTIT_STATUS_STOPPED BIT(5)
#define MSR_IA32_RTIT_ADDR0_A 0x00000580
#define MSR_IA32_RTIT_ADDR0_B 0x00000581
#define MSR_IA32_RTIT_ADDR1_A 0x00000582
--
1.8.3.1


2018-05-22 02:59:16

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 03/12] perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs

These bit definitions are use for emulate MSRs read/write
for KVM. For example, IA32_RTIT_CTL.FabricEn[bit 6] is available
only when CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 1. If KVM guest
try to set this bit with CPUID.(EAX=14H, ECX=0):ECX[bit3] = 0
a #GP would be injected to KVM guest.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index afe4e13..6ae2462 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -112,6 +112,7 @@
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_PWR_EVT_EN BIT(4)
#define RTIT_CTL_FUP_ON_PTW BIT(5)
+#define RTIT_CTL_FABRIC_EN BIT(6)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
@@ -140,6 +141,8 @@
#define RTIT_STATUS_BUFFOVF BIT(3)
#define RTIT_STATUS_ERROR BIT(4)
#define RTIT_STATUS_STOPPED BIT(5)
+#define RTIT_STATUS_BYTECNT_OFFSET 32
+#define RTIT_STATUS_BYTECNT (0x1ffffull << RTIT_STATUS_BYTECNT_OFFSET)
#define MSR_IA32_RTIT_ADDR0_A 0x00000580
#define MSR_IA32_RTIT_ADDR0_B 0x00000581
#define MSR_IA32_RTIT_ADDR1_A 0x00000582
--
1.8.3.1


2018-05-22 02:59:35

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 05/12] perf/x86/intel/pt: Introduce a new function to get capability of Intel PT

New function pt_cap_decode() will be invoked in KVM to check
if a specific capability is available in KVM guest.
Another function pt_cap_get() can only check the hardware
capabilities but this may different with KVM guest because
some features may not be exposed to guest.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.c | 10 ++++++++--
arch/x86/include/asm/intel_pt.h | 2 ++
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index f65f97a..18a2e80 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -76,14 +76,20 @@
PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
};

-u32 pt_cap_get(enum pt_capabilities cap)
+u32 pt_cap_decode(u32 *caps, enum pt_capabilities cap)
{
struct pt_cap_desc *cd = &pt_caps[cap];
- u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
+ u32 c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
unsigned int shift = __ffs(cd->mask);

return (c & cd->mask) >> shift;
}
+EXPORT_SYMBOL_GPL(pt_cap_decode);
+
+u32 pt_cap_get(enum pt_capabilities cap)
+{
+ return pt_cap_decode(pt_pmu.caps, cap);
+}
EXPORT_SYMBOL_GPL(pt_cap_get);

static ssize_t pt_cap_show(struct device *cdev,
diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 2de4db0..9c71453 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -27,9 +27,11 @@ enum pt_capabilities {
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
void cpu_emergency_stop_pt(void);
extern u32 pt_cap_get(enum pt_capabilities cap);
+extern u32 pt_cap_decode(u32 *caps, enum pt_capabilities cap);
#else
static inline void cpu_emergency_stop_pt(void) {}
static inline u32 pt_cap_get(enum pt_capabilities cap) { return 0; }
+static u32 pt_cap_decode(u32 *caps, enum pt_capabilities cap) { return 0; }
#endif

#endif /* _ASM_X86_INTEL_PT_H */
--
1.8.3.1


2018-05-22 02:59:39

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 06/12] KVM: x86: Add Intel Processor Trace virtualization mode

From: Chao Peng <[email protected]>

Intel PT virtualization can be work in one of 2 possible modes:
a. system-wide: trace both host and guest and output to host buffer;
b. host-guest: trace host/guest simultaneous and output to their
respective buffer.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/intel_pt.h | 3 ++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 8 +++++
arch/x86/kvm/vmx.c | 68 +++++++++++++++++++++++++++++++++++++---
4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 9c71453..5748205 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -5,6 +5,9 @@
#define PT_CPUID_LEAVES 2
#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */

+#define PT_MODE_SYSTEM 0
+#define PT_MODE_HOST_GUEST 1
+
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6ae2462..6b14325 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -789,6 +789,7 @@
#define VMX_BASIC_INOUT 0x0040000000000000LLU

/* MSR_IA32_VMX_MISC bits */
+#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
#define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F
/* AMD-V MSRs */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 5db8b0b..5936d72 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -76,7 +76,9 @@
#define SECONDARY_EXEC_SHADOW_VMCS 0x00004000
#define SECONDARY_EXEC_RDSEED_EXITING 0x00010000
#define SECONDARY_EXEC_ENABLE_PML 0x00020000
+#define SECONDARY_EXEC_PT_CONCEAL_VMX 0x00080000
#define SECONDARY_EXEC_XSAVES 0x00100000
+#define SECONDARY_EXEC_PT_USE_GPA 0x01000000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000

#define PIN_BASED_EXT_INTR_MASK 0x00000001
@@ -97,6 +99,8 @@
#define VM_EXIT_LOAD_IA32_EFER 0x00200000
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
+#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
+#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000

#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff

@@ -108,6 +112,8 @@
#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
+#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
+#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000

#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff

@@ -234,6 +240,8 @@ enum vmcs_field {
GUEST_PDPTR3_HIGH = 0x00002811,
GUEST_BNDCFGS = 0x00002812,
GUEST_BNDCFGS_HIGH = 0x00002813,
+ GUEST_IA32_RTIT_CTL = 0x00002814,
+ GUEST_IA32_RTIT_CTL_HIGH = 0x00002815,
HOST_IA32_PAT = 0x00002c00,
HOST_IA32_PAT_HIGH = 0x00002c01,
HOST_IA32_EFER = 0x00002c02,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ea09813..bb96396 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -53,6 +53,7 @@
#include <asm/mmu_context.h>
#include <asm/nospec-branch.h>
#include <asm/mshyperv.h>
+#include <asm/intel_pt.h>

#include "trace.h"
#include "pmu.h"
@@ -186,6 +187,10 @@
static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
module_param(ple_window_max, uint, 0444);

+/* Default is SYSTEM mode. */
+static int __read_mostly pt_mode = PT_MODE_SYSTEM;
+module_param(pt_mode, int, S_IRUGO);
+
extern const ulong vmx_return;

struct kvm_vmx {
@@ -1512,6 +1517,20 @@ static bool vmx_umip_emulated(void)
SECONDARY_EXEC_DESC;
}

+static inline bool cpu_has_vmx_intel_pt(void)
+{
+ u64 vmx_msr;
+
+ rdmsrl(MSR_IA32_VMX_MISC, vmx_msr);
+ return !!(vmx_msr & MSR_IA32_VMX_MISC_INTEL_PT);
+}
+
+static inline bool cpu_has_vmx_pt_use_gpa(void)
+{
+ return !!(vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_PT_USE_GPA);
+}
+
static inline bool report_flexpriority(void)
{
return flexpriority_enabled;
@@ -4026,6 +4045,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
SECONDARY_EXEC_TSC_SCALING |
+ SECONDARY_EXEC_PT_USE_GPA |
+ SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
@@ -4070,7 +4091,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
#endif
opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
- VM_EXIT_CLEAR_BNDCFGS;
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_PT_CONCEAL_PIP |
+ VM_EXIT_CLEAR_IA32_RTIT_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
&_vmexit_control) < 0)
return -EIO;
@@ -4089,11 +4111,20 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;

min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
- opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+ opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_PT_CONCEAL_PIP | VM_ENTRY_LOAD_IA32_RTIT_CTL;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
&_vmentry_control) < 0)
return -EIO;

+ if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_PT_USE_GPA) ||
+ !(_vmexit_control & VM_EXIT_CLEAR_IA32_RTIT_CTL) ||
+ !(_vmentry_control & VM_ENTRY_LOAD_IA32_RTIT_CTL)) {
+ _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_PT_USE_GPA;
+ _vmexit_control &= ~VM_EXIT_CLEAR_IA32_RTIT_CTL;
+ _vmentry_control &= ~VM_ENTRY_LOAD_IA32_RTIT_CTL;
+ }
+
rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);

/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -5814,6 +5845,28 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
return exec_control;
}

+static u32 vmx_vmexit_control(struct vcpu_vmx *vmx)
+{
+ u32 vmexit_control = vmcs_config.vmexit_ctrl;
+
+ if (pt_mode == PT_MODE_SYSTEM)
+ vmexit_control &= ~(VM_EXIT_CLEAR_IA32_RTIT_CTL |
+ VM_EXIT_PT_CONCEAL_PIP);
+
+ return vmexit_control;
+}
+
+static u32 vmx_vmentry_control(struct vcpu_vmx *vmx)
+{
+ u32 vmentry_control = vmcs_config.vmentry_ctrl;
+
+ if (pt_mode == PT_MODE_SYSTEM)
+ vmentry_control &= ~(VM_ENTRY_PT_CONCEAL_PIP |
+ VM_ENTRY_LOAD_IA32_RTIT_CTL);
+
+ return vmentry_control;
+}
+
static bool vmx_rdrand_supported(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -5950,6 +6003,10 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
}
}

+ if (pt_mode == PT_MODE_SYSTEM)
+ exec_control &= ~(SECONDARY_EXEC_PT_USE_GPA |
+ SECONDARY_EXEC_PT_CONCEAL_VMX);
+
vmx->secondary_exec_control = exec_control;
}

@@ -6060,10 +6117,10 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);

- vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
+ vm_exit_controls_init(vmx, vmx_vmexit_control(vmx));

/* 22.2.1, 20.8.1 */
- vm_entry_controls_init(vmx, vmcs_config.vmentry_ctrl);
+ vm_entry_controls_init(vmx, vmx_vmentry_control(vmx));

vmx->vcpu.arch.cr0_guest_owned_bits = X86_CR0_TS;
vmcs_writel(CR0_GUEST_HOST_MASK, ~X86_CR0_TS);
@@ -7384,6 +7441,9 @@ static __init int hardware_setup(void)

kvm_mce_cap_supported |= MCG_LMCE_P;

+ if (!enable_ept || !cpu_has_vmx_intel_pt() || !cpu_has_vmx_pt_use_gpa())
+ pt_mode = PT_MODE_SYSTEM;
+
return alloc_kvm_area();

out:
--
1.8.3.1


2018-05-22 02:59:45

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 08/12] KVM: x86: Add Intel Processor Trace context switch for each vcpu

From: Chao Peng <[email protected]>

Load/Store Intel processor trace register in context switch.
MSR IA32_RTIT_CTL is loaded/stored automatically from VMCS.
In HOST_GUEST mode, we need load/resore PT MSRs only when PT
is enabled in guest.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/intel_pt.h | 2 +
arch/x86/kvm/vmx.c | 94 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 5748205..70f4139 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -8,6 +8,8 @@
#define PT_MODE_SYSTEM 0
#define PT_MODE_HOST_GUEST 1

+#define RTIT_ADDR_RANGE 4
+
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 24aded4..11fb90a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -597,6 +597,24 @@ static inline int pi_test_sn(struct pi_desc *pi_desc)
(unsigned long *)&pi_desc->control);
}

+struct pt_ctx {
+ u64 ctl;
+ u64 status;
+ u64 output_base;
+ u64 output_mask;
+ u64 cr3_match;
+ u64 addr_a[RTIT_ADDR_RANGE];
+ u64 addr_b[RTIT_ADDR_RANGE];
+};
+
+struct pt_desc {
+ u64 ctl_bitmask;
+ u32 addr_range;
+ u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
+ struct pt_ctx host;
+ struct pt_ctx guest;
+};
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
unsigned long host_rsp;
@@ -693,6 +711,8 @@ struct vcpu_vmx {
*/
u64 msr_ia32_feature_control;
u64 msr_ia32_feature_control_valid_bits;
+
+ struct pt_desc pt_desc;
};

enum segment_cache_field {
@@ -2391,6 +2411,69 @@ static unsigned long segment_base(u16 selector)
}
#endif

+static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
+{
+ u32 i;
+
+ wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
+ wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+ wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+ wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+ for (i = 0; i < addr_range; i++) {
+ wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
+ wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
+ }
+}
+
+static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
+{
+ u32 i;
+
+ rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
+ rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+ rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+ rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+ for (i = 0; i < addr_range; i++) {
+ rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
+ rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
+ }
+}
+
+static void pt_guest_enter(struct vcpu_vmx *vmx)
+{
+ if (pt_mode == PT_MODE_SYSTEM)
+ return;
+
+ /* Save host state before VM entry */
+ rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+
+ /*
+ * Set guest state of MSR_IA32_RTIT_CTL MSR (PT will be disabled
+ * on VM entry when it has been disabled in guest before).
+ */
+ vmcs_write64(GUEST_IA32_RTIT_CTL, vmx->pt_desc.guest.ctl);
+
+ if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
+ wrmsrl(MSR_IA32_RTIT_CTL, 0);
+ pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
+ pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
+ }
+}
+
+static void pt_guest_exit(struct vcpu_vmx *vmx)
+{
+ if (pt_mode == PT_MODE_SYSTEM)
+ return;
+
+ if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
+ pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
+ pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
+ }
+
+ /* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
+ wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+}
+
static void vmx_save_host_state(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6135,6 +6218,13 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
}
+
+ if (pt_mode == PT_MODE_HOST_GUEST) {
+ memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
+ /* Bit[6~0] are forced to 1, writes are ignored. */
+ vmx->pt_desc.guest.output_mask = 0x7F;
+ vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
+ }
}

static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -9800,6 +9890,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.pkru != vmx->host_pkru)
__write_pkru(vcpu->arch.pkru);

+ pt_guest_enter(vmx);
+
atomic_switch_perf_msrs(vmx);

vmx_arm_hv_timer(vcpu);
@@ -9994,6 +10086,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
| (1 << VCPU_EXREG_CR3));
vcpu->arch.regs_dirty = 0;

+ pt_guest_exit(vmx);
+
/*
* eager fpu is enabled if PKEY is supported and CR4 is switched
* back on host, so it is safe to read guest PKRU from current
--
1.8.3.1


2018-05-22 02:59:53

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 11/12] KVM: x86: Set intercept for Intel PT MSRs read/write

From: Chao Peng <[email protected]>

Disable intercept Intel PT MSRs only when Intel PT is
enabled in guest. But MSR_IA32_RTIT_CTL will alway be
intercept.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 770cb7c..a09157c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -948,6 +948,7 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
u32 msr, int type);
+static void pt_set_intercept_for_msr(struct vcpu_vmx *vmx, bool flag);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3999,6 +4000,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_rtit_ctl_check(vcpu, data))
return 1;
vmcs_write64(GUEST_IA32_RTIT_CTL, data);
+ pt_set_intercept_for_msr(vmx, !(data & RTIT_CTL_TRACEEN));
vmx->pt_desc.guest.ctl = data;
break;
case MSR_IA32_RTIT_STATUS:
@@ -5820,6 +5822,27 @@ static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
vmx->msr_bitmap_mode = mode;
}

+static void pt_set_intercept_for_msr(struct vcpu_vmx *vmx, bool flag)
+{
+ unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
+ u32 i;
+
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_STATUS,
+ MSR_TYPE_RW, flag);
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_BASE,
+ MSR_TYPE_RW, flag);
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_TYPE_RW, flag);
+ vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_TYPE_RW, flag);
+ for (i = 0; i < vmx->pt_desc.addr_range; i++) {
+ vmx_set_intercept_for_msr(msr_bitmap,
+ MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
+ vmx_set_intercept_for_msr(msr_bitmap,
+ MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
+ }
+}
+
static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
{
return enable_apicv;
--
1.8.3.1


2018-05-22 03:00:06

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 12/12] KVM: x86: Disable Intel Processor Trace when VMXON in L1 guest

Currently, Intel Processor Trace do not support tracing in L1 guest
VMX operation(IA32_VMX_MISC[bit 14] is 0). As mentioned in SDM,
on these type of processors, execution of the VMXON instruction will
clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL
causes a general-protection exception (#GP).

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a09157c..093c1f7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3997,7 +3997,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_RTIT_CTL:
if ((pt_mode != PT_MODE_HOST_GUEST) ||
- vmx_rtit_ctl_check(vcpu, data))
+ vmx_rtit_ctl_check(vcpu, data) ||
+ vmx->nested.vmxon)
return 1;
vmcs_write64(GUEST_IA32_RTIT_CTL, data);
pt_set_intercept_for_msr(vmx, !(data & RTIT_CTL_TRACEEN));
@@ -8090,6 +8091,11 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
if (ret)
return ret;

+ if (pt_mode == PT_MODE_HOST_GUEST) {
+ vmx->pt_desc.guest.ctl = 0;
+ pt_set_intercept_for_msr(vmx, 1);
+ }
+
nested_vmx_succeed(vcpu);
return kvm_skip_emulated_instruction(vcpu);
}
--
1.8.3.1


2018-05-22 03:00:43

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 07/12] KVM: x86: Add Intel Processor Trace cpuid emulation

From: Chao Peng <[email protected]>

Expose Intel Processor Trace to guest only when PT work in
HOST_GUEST mode.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 22 ++++++++++++++++++++--
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 6 ++++++
4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b27de80..8f3c7ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1026,6 +1026,7 @@ struct kvm_x86_ops {
bool (*mpx_supported)(void);
bool (*xsaves_supported)(void);
bool (*umip_emulated)(void);
+ bool (*pt_supported)(void);

int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 82055b9..e04bf67 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -336,6 +336,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
unsigned f_mpx = kvm_mpx_supported() ? F(MPX) : 0;
unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
unsigned f_umip = kvm_x86_ops->umip_emulated() ? F(UMIP) : 0;
+ unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;

/* cpuid 1.edx */
const u32 kvm_cpuid_1_edx_x86_features =
@@ -393,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) |
F(ADX) | F(SMAP) | F(AVX512IFMA) | F(AVX512F) | F(AVX512PF) |
F(AVX512ER) | F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) |
- F(SHA_NI) | F(AVX512BW) | F(AVX512VL);
+ F(SHA_NI) | F(AVX512BW) | F(AVX512VL) | f_intel_pt;

/* cpuid 0xD.1.eax */
const u32 kvm_cpuid_D_1_eax_x86_features =
@@ -423,7 +424,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,

switch (function) {
case 0:
- entry->eax = min(entry->eax, (u32)0xd);
+ entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
break;
case 1:
entry->edx &= kvm_cpuid_1_edx_x86_features;
@@ -595,6 +596,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
break;
}
+ /* Intel PT */
+ case 0x14: {
+ int t, times = entry->eax;
+
+ if (!f_intel_pt)
+ break;
+
+ entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ for (t = 1; t <= times; ++t) {
+ if (*nent >= maxnent)
+ goto out;
+ do_cpuid_1_ent(&entry[t], function, t);
+ entry[t].flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+ ++*nent;
+ }
+ break;
+ }
case KVM_CPUID_SIGNATURE: {
static const char signature[12] = "KVMKVMKVM\0\0";
const u32 *sigptr = (const u32 *)signature;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 220e5a89..6df8075 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5876,6 +5876,11 @@ static bool svm_umip_emulated(void)
return false;
}

+static bool svm_pt_supported(void)
+{
+ return false;
+}
+
static bool svm_has_wbinvd_exit(void)
{
return true;
@@ -7101,6 +7106,7 @@ static int svm_unregister_enc_region(struct kvm *kvm,
.mpx_supported = svm_mpx_supported,
.xsaves_supported = svm_xsaves_supported,
.umip_emulated = svm_umip_emulated,
+ .pt_supported = svm_pt_supported,

.set_supported_cpuid = svm_set_supported_cpuid,

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bb96396..24aded4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9588,6 +9588,11 @@ static bool vmx_xsaves_supported(void)
SECONDARY_EXEC_XSAVES;
}

+static bool vmx_pt_supported(void)
+{
+ return (pt_mode == PT_MODE_HOST_GUEST);
+}
+
static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
{
u32 exit_intr_info;
@@ -12809,6 +12814,7 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
.mpx_supported = vmx_mpx_supported,
.xsaves_supported = vmx_xsaves_supported,
.umip_emulated = vmx_umip_emulated,
+ .pt_supported = vmx_pt_supported,

.check_nested_events = vmx_check_nested_events,

--
1.8.3.1


2018-05-22 03:01:07

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of
output to Trace Transport subsystem.
MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if
CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0.
This is use for emulate IA32_RTIT_CTL MSR read/write
in KVM. KVM guest write IA32_RTIT_CTL will trap to
root mode and a #GP would be injected to guest if set
IA32_RTIT_CTL.FabricEn with
CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.c | 1 +
arch/x86/include/asm/intel_pt.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c80e2f5..f65f97a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -68,6 +68,7 @@
PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
PT_CAP(topa_multiple_entries, 0, CPUID_ECX, BIT(1)),
PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+ PT_CAP(output_subsys, 0, CPUID_ECX, BIT(3)),
PT_CAP(payloads_lip, 0, CPUID_ECX, BIT(31)),
PT_CAP(num_address_ranges, 1, CPUID_EAX, 0x3),
PT_CAP(mtc_periods, 1, CPUID_EAX, 0xffff0000),
diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 4270421..2de4db0 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -16,6 +16,7 @@ enum pt_capabilities {
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_single_range_output,
+ PT_CAP_output_subsys,
PT_CAP_payloads_lip,
PT_CAP_num_address_ranges,
PT_CAP_mtc_periods,
--
1.8.3.1


2018-05-22 03:01:14

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 10/12] KVM: x86: Implement Intel Processor Trace MSRs read/write emulation

From: Chao Peng <[email protected]>

This patch implement Intel Processor Trace MSRs read/write
emulation.
Intel PT MSRs read/write need to be emulated when Intel PT
MSRs is intercepted in guest and during live migration.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/include/asm/intel_pt.h | 8 ++
arch/x86/kvm/vmx.c | 172 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 33 +++++++-
3 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index 70f4139..3a25dc1 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -10,6 +10,14 @@

#define RTIT_ADDR_RANGE 4

+#define MSR_IA32_RTIT_STATUS_MASK (~(RTIT_STATUS_FILTEREN | \
+ RTIT_STATUS_CONTEXTEN | RTIT_STATUS_TRIGGEREN | \
+ RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
+ RTIT_STATUS_BYTECNT))
+
+#define MSR_IA32_RTIT_OUTPUT_BASE_MASK \
+ (~((1UL << cpuid_query_maxphyaddr(vcpu)) - 1) | 0x7f)
+
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 952ddf4..770cb7c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2809,6 +2809,77 @@ static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
}

+static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ unsigned long value;
+
+ /*
+ * Any MSR write that attempts to change bits marked reserved will
+ * case a #GP fault.
+ */
+ if (data & vmx->pt_desc.ctl_bitmask)
+ return 1;
+
+ /*
+ * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
+ * result in a #GP unless the same write also clears TraceEn.
+ */
+ if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
+ ((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
+ return 1;
+
+ /*
+ * WRMSR to IA32_RTIT_CTL that sets TraceEn but clears this bit
+ * and FabricEn would cause #GP, if
+ * CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0
+ */
+ if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
+ !(data & RTIT_CTL_FABRIC_EN) &&
+ !pt_cap_decode(vmx->pt_desc.caps, PT_CAP_single_range_output))
+ return 1;
+
+ /*
+ * MTCFreq, CycThresh and PSBFreq encodings check, any MSR write that
+ * utilize encodings marked reserved will casue a #GP fault.
+ */
+ value = pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc_periods);
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc) &&
+ !test_bit((data & RTIT_CTL_MTC_RANGE) >>
+ RTIT_CTL_MTC_RANGE_OFFSET, &value))
+ return 1;
+ value = pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cycle_thresholds);
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
+ !test_bit((data & RTIT_CTL_CYC_THRESH) >>
+ RTIT_CTL_CYC_THRESH_OFFSET, &value))
+ return 1;
+ value = pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_periods);
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
+ !test_bit((data & RTIT_CTL_PSB_FREQ) >>
+ RTIT_CTL_PSB_FREQ_OFFSET, &value))
+ return 1;
+
+ /*
+ * If ADDRx_CFG is reserved or the encodings is >2 will
+ * cause a #GP fault.
+ */
+ value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET;
+ if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2))
+ return 1;
+ value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET;
+ if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2))
+ return 1;
+ value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET;
+ if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2))
+ return 1;
+ value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET;
+ if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2))
+ return 1;
+
+ return 0;
+}
+
+
static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
unsigned long rip;
@@ -3625,6 +3696,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct shared_msr_entry *msr;
+ u32 index;

switch (msr_info->index) {
#ifdef CONFIG_X86_64
@@ -3691,6 +3763,51 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
+ case MSR_IA32_RTIT_CTL:
+ if (pt_mode != PT_MODE_HOST_GUEST)
+ return 1;
+ msr_info->data = vmx->pt_desc.guest.ctl;
+ break;
+ case MSR_IA32_RTIT_STATUS:
+ if (pt_mode != PT_MODE_HOST_GUEST)
+ return 1;
+ msr_info->data = vmx->pt_desc.guest.status;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ !pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
+ return 1;
+ msr_info->data = vmx->pt_desc.guest.cr3_match;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (!pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_topa_output) &&
+ !pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_single_range_output)))
+ return 1;
+ msr_info->data = vmx->pt_desc.guest.output_base;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (!pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_topa_output) &&
+ !pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_single_range_output)))
+ return 1;
+ msr_info->data = vmx->pt_desc.guest.output_mask;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (index >= 2 * pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_num_address_ranges)))
+ return 1;
+ if (index % 2)
+ msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
+ else
+ msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
+ break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -3722,6 +3839,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
int ret = 0;
u32 msr_index = msr_info->index;
u64 data = msr_info->data;
+ u32 index;

switch (msr_index) {
case MSR_EFER:
@@ -3876,6 +3994,60 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
break;
+ case MSR_IA32_RTIT_CTL:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ vmx_rtit_ctl_check(vcpu, data))
+ return 1;
+ vmcs_write64(GUEST_IA32_RTIT_CTL, data);
+ vmx->pt_desc.guest.ctl = data;
+ break;
+ case MSR_IA32_RTIT_STATUS:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+ (data & MSR_IA32_RTIT_STATUS_MASK))
+ return 1;
+ vmx->pt_desc.guest.status = data;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+ !pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
+ return 1;
+ vmx->pt_desc.guest.cr3_match = data;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+ (!pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_topa_output) &&
+ !pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_single_range_output)) ||
+ (data & MSR_IA32_RTIT_OUTPUT_BASE_MASK))
+ return 1;
+ vmx->pt_desc.guest.output_base = data;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+ (!pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_topa_output) &&
+ !pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_single_range_output)))
+ return 1;
+ vmx->pt_desc.guest.output_mask = data;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
+ if ((pt_mode != PT_MODE_HOST_GUEST) ||
+ (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+ (index >= 2 * pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_num_address_ranges)))
+ return 1;
+ if (index % 2)
+ vmx->pt_desc.guest.addr_b[index / 2] = data;
+ else
+ vmx->pt_desc.guest.addr_a[index / 2] = data;
+ break;
case MSR_TSC_AUX:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 182693f..8f3d465 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -69,6 +69,7 @@
#include <asm/irq_remapping.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
+#include <asm/intel_pt.h>

#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -1030,7 +1031,13 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
#endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
+ MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
};

static unsigned num_msrs_to_save;
@@ -4599,6 +4606,30 @@ static void kvm_init_msr_list(void)
if (!kvm_x86_ops->rdtscp_supported())
continue;
break;
+ case MSR_IA32_RTIT_CTL:
+ case MSR_IA32_RTIT_STATUS:
+ if (!kvm_x86_ops->pt_supported())
+ continue;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if (!kvm_x86_ops->pt_supported() ||
+ !pt_cap_get(PT_CAP_cr3_filtering))
+ continue;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!kvm_x86_ops->pt_supported() ||
+ (!pt_cap_get(PT_CAP_topa_output) &&
+ !pt_cap_get(PT_CAP_single_range_output)))
+ continue;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: {
+ if (!kvm_x86_ops->pt_supported() ||
+ msrs_to_save[i] - MSR_IA32_RTIT_ADDR0_A >=
+ 2 * pt_cap_get(PT_CAP_num_address_ranges))
+ continue;
+ break;
+ }
default:
break;
}
--
1.8.3.1


2018-05-22 03:01:24

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

Initialize the Intel PT configuration when cpuid update.
Include cpuid inforamtion, rtit_ctl bit mask and the number of
address ranges.

Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/kvm/vmx.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 11fb90a..952ddf4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10411,6 +10411,72 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
#undef cr4_fixed1_update
}

+static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_cpuid_entry2 *best = NULL;
+ int i;
+
+ for (i = 0; i < PT_CPUID_LEAVES; i++) {
+ best = kvm_find_cpuid_entry(vcpu, 0x14, i);
+ if (!best)
+ return;
+ vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax;
+ vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx;
+ vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx;
+ vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx;
+ }
+
+ /* Get the number of configurable Address Ranges for filtering */
+ vmx->pt_desc.addr_range = pt_cap_decode(vmx->pt_desc.caps,
+ PT_CAP_num_address_ranges);
+
+ /* Initialize and clear the no dependency bits */
+ vmx->pt_desc.ctl_bitmask = ~0ULL;
+ vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+ RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);
+
+ /* If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
+ vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;
+
+ /*
+ * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
+ * PSBFreq can be set
+ */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc))
+ vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
+ RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
+
+ /*
+ * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
+ * MTCFreq can be set
+ */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc))
+ vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
+ RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
+
+ /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_ptwrite))
+ vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
+ RTIT_CTL_PTW_EN);
+
+ /* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_power_event_trace))
+ vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
+
+ /* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_topa_output))
+ vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;
+ /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabircEn can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
+ vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
+
+ /* unmask address range configure area */
+ for (i = 0; i < vmx->pt_desc.addr_range; i++)
+ vmx->pt_desc.ctl_bitmask &= ~(0xf << (32 + i * 4));
+}
+
static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -10429,6 +10495,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)

if (nested_vmx_allowed(vcpu))
nested_vmx_cr_fixed1_bits_update(vcpu);
+
+ if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
+ update_intel_pt_cfg(vcpu);
}

static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
1.8.3.1


2018-05-22 03:02:27

by Luwei Kang

[permalink] [raw]
Subject: [PATCH v9 02/12] perf/x86/intel/pt: Change pt_cap_get() to a public function

From: Chao Peng <[email protected]>

Change pt_cap_get() to a public function that KVM
can access this function to check if specific
feature is supported on hardware.

Signed-off-by: Chao Peng <[email protected]>
Signed-off-by: Luwei Kang <[email protected]>
---
arch/x86/events/intel/pt.c | 3 ++-
arch/x86/events/intel/pt.h | 21 ---------------------
arch/x86/include/asm/intel_pt.h | 23 +++++++++++++++++++++++
3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 3b99394..c80e2f5 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -75,7 +75,7 @@
PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
};

-static u32 pt_cap_get(enum pt_capabilities cap)
+u32 pt_cap_get(enum pt_capabilities cap)
{
struct pt_cap_desc *cd = &pt_caps[cap];
u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
@@ -83,6 +83,7 @@ static u32 pt_cap_get(enum pt_capabilities cap)

return (c & cd->mask) >> shift;
}
+EXPORT_SYMBOL_GPL(pt_cap_get);

static ssize_t pt_cap_show(struct device *cdev,
struct device_attribute *attr,
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 0050ca1..269e15a 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -45,30 +45,9 @@ struct topa_entry {
u64 rsvd4 : 16;
};

-#define PT_CPUID_LEAVES 2
-#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */
-
/* TSC to Core Crystal Clock Ratio */
#define CPUID_TSC_LEAF 0x15

-enum pt_capabilities {
- PT_CAP_max_subleaf = 0,
- PT_CAP_cr3_filtering,
- PT_CAP_psb_cyc,
- PT_CAP_ip_filtering,
- PT_CAP_mtc,
- PT_CAP_ptwrite,
- PT_CAP_power_event_trace,
- PT_CAP_topa_output,
- PT_CAP_topa_multiple_entries,
- PT_CAP_single_range_output,
- PT_CAP_payloads_lip,
- PT_CAP_num_address_ranges,
- PT_CAP_mtc_periods,
- PT_CAP_cycle_thresholds,
- PT_CAP_psb_periods,
-};
-
struct pt_pmu {
struct pmu pmu;
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
index b523f51..4270421 100644
--- a/arch/x86/include/asm/intel_pt.h
+++ b/arch/x86/include/asm/intel_pt.h
@@ -2,10 +2,33 @@
#ifndef _ASM_X86_INTEL_PT_H
#define _ASM_X86_INTEL_PT_H

+#define PT_CPUID_LEAVES 2
+#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */
+
+enum pt_capabilities {
+ PT_CAP_max_subleaf = 0,
+ PT_CAP_cr3_filtering,
+ PT_CAP_psb_cyc,
+ PT_CAP_ip_filtering,
+ PT_CAP_mtc,
+ PT_CAP_ptwrite,
+ PT_CAP_power_event_trace,
+ PT_CAP_topa_output,
+ PT_CAP_topa_multiple_entries,
+ PT_CAP_single_range_output,
+ PT_CAP_payloads_lip,
+ PT_CAP_num_address_ranges,
+ PT_CAP_mtc_periods,
+ PT_CAP_cycle_thresholds,
+ PT_CAP_psb_periods,
+};
+
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
void cpu_emergency_stop_pt(void);
+extern u32 pt_cap_get(enum pt_capabilities cap);
#else
static inline void cpu_emergency_stop_pt(void) {}
+static inline u32 pt_cap_get(enum pt_capabilities cap) { return 0; }
#endif

#endif /* _ASM_X86_INTEL_PT_H */
--
1.8.3.1


2018-06-07 07:12:48

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header

> -----Original Message-----
> From: Kang, Luwei
> Sent: Tuesday, May 22, 2018 12:52 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Liang, Kan <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; Kang, Luwei
> <[email protected]>
> Subject: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header
>
> From: Chao Peng <[email protected]>
>
> Intel Processor Trace virtualization enabling in KVM guest need to access these MSRs bit definitions, so move them to public header
> file msr-index.h.
>
> Signed-off-by: Chao Peng <[email protected]>
> Signed-off-by: Luwei Kang <[email protected]>
> ---
> arch/x86/events/intel/pt.h | 37 -------------------------------------
> arch/x86/include/asm/msr-index.h | 33 +++++++++++++++++++++++++++++++++
> 2 files changed, 33 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h index 0eb41d0..0050ca1 100644
> --- a/arch/x86/events/intel/pt.h
> +++ b/arch/x86/events/intel/pt.h
> @@ -20,43 +20,6 @@
> #define __INTEL_PT_H__
>
> /*
> - * PT MSR bit definitions
> - */
> -#define RTIT_CTL_TRACEEN BIT(0)
> -#define RTIT_CTL_CYCLEACC BIT(1)
> -#define RTIT_CTL_OS BIT(2)
> -#define RTIT_CTL_USR BIT(3)
> -#define RTIT_CTL_PWR_EVT_EN BIT(4)
> -#define RTIT_CTL_FUP_ON_PTW BIT(5)
> -#define RTIT_CTL_CR3EN BIT(7)
> -#define RTIT_CTL_TOPA BIT(8)
> -#define RTIT_CTL_MTC_EN BIT(9)
> -#define RTIT_CTL_TSC_EN BIT(10)
> -#define RTIT_CTL_DISRETC BIT(11)
> -#define RTIT_CTL_PTW_EN BIT(12)
> -#define RTIT_CTL_BRANCH_EN BIT(13)
> -#define RTIT_CTL_MTC_RANGE_OFFSET 14
> -#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
> -#define RTIT_CTL_CYC_THRESH_OFFSET 19
> -#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
> -#define RTIT_CTL_PSB_FREQ_OFFSET 24
> -#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
> -#define RTIT_CTL_ADDR0_OFFSET 32
> -#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
> -#define RTIT_CTL_ADDR1_OFFSET 36
> -#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
> -#define RTIT_CTL_ADDR2_OFFSET 40
> -#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
> -#define RTIT_CTL_ADDR3_OFFSET 44
> -#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
> -#define RTIT_STATUS_FILTEREN BIT(0)
> -#define RTIT_STATUS_CONTEXTEN BIT(1)
> -#define RTIT_STATUS_TRIGGEREN BIT(2)
> -#define RTIT_STATUS_BUFFOVF BIT(3)
> -#define RTIT_STATUS_ERROR BIT(4)
> -#define RTIT_STATUS_STOPPED BIT(5)
> -
> -/*
> * Single-entry ToPA: when this close to region boundary, switch
> * buffers to avoid losing data.
> */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 53d5b1b..afe4e13 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -106,7 +106,40 @@
> #define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6
>
> #define MSR_IA32_RTIT_CTL 0x00000570
> +#define RTIT_CTL_TRACEEN BIT(0)
> +#define RTIT_CTL_CYCLEACC BIT(1)
> +#define RTIT_CTL_OS BIT(2)
> +#define RTIT_CTL_USR BIT(3)
> +#define RTIT_CTL_PWR_EVT_EN BIT(4)
> +#define RTIT_CTL_FUP_ON_PTW BIT(5)
> +#define RTIT_CTL_CR3EN BIT(7)
> +#define RTIT_CTL_TOPA BIT(8)
> +#define RTIT_CTL_MTC_EN BIT(9)
> +#define RTIT_CTL_TSC_EN BIT(10)
> +#define RTIT_CTL_DISRETC BIT(11)
> +#define RTIT_CTL_PTW_EN BIT(12)
> +#define RTIT_CTL_BRANCH_EN BIT(13)
> +#define RTIT_CTL_MTC_RANGE_OFFSET 14
> +#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
> +#define RTIT_CTL_CYC_THRESH_OFFSET 19
> +#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
> +#define RTIT_CTL_PSB_FREQ_OFFSET 24
> +#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
> +#define RTIT_CTL_ADDR0_OFFSET 32
> +#define RTIT_CTL_ADDR0 (0x0full << RTIT_CTL_ADDR0_OFFSET)
> +#define RTIT_CTL_ADDR1_OFFSET 36
> +#define RTIT_CTL_ADDR1 (0x0full << RTIT_CTL_ADDR1_OFFSET)
> +#define RTIT_CTL_ADDR2_OFFSET 40
> +#define RTIT_CTL_ADDR2 (0x0full << RTIT_CTL_ADDR2_OFFSET)
> +#define RTIT_CTL_ADDR3_OFFSET 44
> +#define RTIT_CTL_ADDR3 (0x0full << RTIT_CTL_ADDR3_OFFSET)
> #define MSR_IA32_RTIT_STATUS 0x00000571
> +#define RTIT_STATUS_FILTEREN BIT(0)
> +#define RTIT_STATUS_CONTEXTEN BIT(1)
> +#define RTIT_STATUS_TRIGGEREN BIT(2)
> +#define RTIT_STATUS_BUFFOVF BIT(3)
> +#define RTIT_STATUS_ERROR BIT(4)
> +#define RTIT_STATUS_STOPPED BIT(5)
> #define MSR_IA32_RTIT_ADDR0_A 0x00000580
> #define MSR_IA32_RTIT_ADDR0_B 0x00000581
> #define MSR_IA32_RTIT_ADDR1_A 0x00000582

Hi,
Patch 1~5 have some code changes in x86 native for Intel Processor Trace virtualization enabling in KVM guest.
I have sent patch set v9 which include some minor changes from old version.
Do you have any comments?

Thanks,
Luwei Kang

2018-06-07 11:23:36

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v9 01/12] perf/x86/intel/pt: Move Intel-PT MSRs bit definitions to a public header

On 07/06/2018 08:50, Kang, Luwei wrote:
> Patch 1~5 have some code changes in x86 native for Intel Processor Trace virtualization enabling in KVM guest.
> I have sent patch set v9 which include some minor changes from old version.
> Do you have any comments?

x86 folks, looks like this series is going to miss 4.18. Perf folks,
can you review this and hopefully provide a topic branch asap after the
end of the merge window?

Thanks,

Paolo

2018-06-07 13:40:59

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 06/12] KVM: x86: Add Intel Processor Trace virtualization mode

On Tue, May 22, 2018 at 12:52:09PM +0800, Luwei Kang wrote:
> From: Chao Peng <[email protected]>
>
> Intel PT virtualization can be work in one of 2 possible modes:
> a. system-wide: trace both host and guest and output to host buffer;
> b. host-guest: trace host/guest simultaneous and output to their
> respective buffer.

I think we discussed this before. That's not the choice that the user
needs to make. The only choice that I see is what happens if the guest
wants to use PT while it's already in use by the host. Otherwise, each
of them gets to use PT as they would.

Regards,
--
Alex


2018-06-07 13:42:45

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 03/12] perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs

On Tue, May 22, 2018 at 12:52:06PM +0800, Luwei Kang wrote:
> These bit definitions are use for emulate MSRs read/write
> for KVM. For example, IA32_RTIT_CTL.FabricEn[bit 6] is available
> only when CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 1. If KVM guest
> try to set this bit with CPUID.(EAX=14H, ECX=0):ECX[bit3] = 0
> a #GP would be injected to KVM guest.

Do we have anything in the guest that this feature will work with?

Regards,
--
Alex


2018-06-07 13:44:26

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

On Tue, May 22, 2018 at 12:52:07PM +0800, Luwei Kang wrote:
> CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of
> output to Trace Transport subsystem.
> MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if
> CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0.

This should instead say:

This adds support for "output to Trace Transport subsystem" capability
of Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that
PT can output its trace to an MMIO address range rather than system
memory buffer.

> This is use for emulate IA32_RTIT_CTL MSR read/write
> in KVM. KVM guest write IA32_RTIT_CTL will trap to
> root mode and a #GP would be injected to guest if set
> IA32_RTIT_CTL.FabricEn with
> CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0.

I'm not sure what this means, this patch has nothing to do with KVM
as far as I can tell.

Aside from the commit message, this is a valid patch.

Thanks,
--
Alex


2018-06-07 13:52:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

On Thu, Jun 07, 2018 at 04:40:51PM +0300, Alexander Shishkin wrote:
> On Tue, May 22, 2018 at 12:52:07PM +0800, Luwei Kang wrote:
> > CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of
> > output to Trace Transport subsystem.
> > MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if
> > CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0.
>
> This should instead say:
>
> This adds support for "output to Trace Transport subsystem" capability
> of Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that
> PT can output its trace to an MMIO address range rather than system
> memory buffer.

The $subject also needs attention.

2018-06-07 13:57:52

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

On Tue, May 22, 2018 at 12:52:12PM +0800, Luwei Kang wrote:
> Initialize the Intel PT configuration when cpuid update.

Is it the CPUID configuration? Is it the MSR configuration? Is it both?
Kind of looks like both. Not sure what is the cpuid update, though.

> Include cpuid inforamtion, rtit_ctl bit mask and the number of
> address ranges.
>
> Signed-off-by: Luwei Kang <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 11fb90a..952ddf4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10411,6 +10411,72 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
> #undef cr4_fixed1_update
> }
>
> +static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> + struct kvm_cpuid_entry2 *best = NULL;
> + int i;
> +
> + for (i = 0; i < PT_CPUID_LEAVES; i++) {
> + best = kvm_find_cpuid_entry(vcpu, 0x14, i);
> + if (!best)
> + return;
> + vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax;
> + vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx;
> + vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx;
> + vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx;
> + }
> +
> + /* Get the number of configurable Address Ranges for filtering */
> + vmx->pt_desc.addr_range = pt_cap_decode(vmx->pt_desc.caps,
> + PT_CAP_num_address_ranges);
> +
> + /* Initialize and clear the no dependency bits */
> + vmx->pt_desc.ctl_bitmask = ~0ULL;

This looks redundant, doesn't it?

> + vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> + RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);
> +
> + /* If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set */

This comment makes it less clear than it would have been otherwise.

> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
> + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;
> +
> + /*
> + * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
> + * PSBFreq can be set
> + */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc))
> + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
> + RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
> +
> + /*
> + * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
> + * MTCFreq can be set
> + */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc))
> + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
> + RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
> +
> + /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_ptwrite))
> + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
> + RTIT_CTL_PTW_EN);
> +
> + /* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_power_event_trace))
> + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
> +
> + /* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_topa_output))
> + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;

If you want to be thorough, there's also PT_CAP_single_range_output, which
tells us if RTIT_CTL_TOPA can be *unset*. Otherwise it's required.

> + /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabircEn can be set */
> + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
> + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;

Are we sure we want to virtualize this and that it's safe?

> +
> + /* unmask address range configure area */
> + for (i = 0; i < vmx->pt_desc.addr_range; i++)
> + vmx->pt_desc.ctl_bitmask &= ~(0xf << (32 + i * 4));

So, the ctl_bitmask is all the bits that are not allowed?

Regards,
--
Alex

2018-06-07 14:23:31

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 11/12] KVM: x86: Set intercept for Intel PT MSRs read/write

On Tue, May 22, 2018 at 12:52:14PM +0800, Luwei Kang wrote:
> From: Chao Peng <[email protected]>
>
> Disable intercept Intel PT MSRs only when Intel PT is
> enabled in guest. But MSR_IA32_RTIT_CTL will alway be
> intercept.

"I'd like to offer some suggestions as to how to make the commit message
friendlier for reviewing.

Generally, for every patch, we want to explain the following: what we want,
why we want it and how we want to go about getting it. We also would prefer
to do it in english rather than in C, because for the latter we can just
look at the code." [1]

I apologize for quoting myself or if I'm stating the obvious, but it looks
appropriate here.

[1] https://marc.info/?l=linux-kernel&m=152233031020263

Regards,
--
Alex

2018-06-07 16:34:10

by Alexander Shishkin

[permalink] [raw]
Subject: Re: [PATCH v9 10/12] KVM: x86: Implement Intel Processor Trace MSRs read/write emulation

On Tue, May 22, 2018 at 12:52:13PM +0800, Luwei Kang wrote:
> + /*
> + * Any MSR write that attempts to change bits marked reserved will
> + * case a #GP fault.
> + */
> + if (data & vmx->pt_desc.ctl_bitmask)
> + return 1;
> +
> + /*
> + * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
> + * result in a #GP unless the same write also clears TraceEn.
> + */
> + if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
> + ((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
> + return 1;
> +
> + /*
> + * WRMSR to IA32_RTIT_CTL that sets TraceEn but clears this bit
> + * and FabricEn would cause #GP, if
> + * CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0
> + */
> + if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
> + !(data & RTIT_CTL_FABRIC_EN) &&
> + !pt_cap_decode(vmx->pt_desc.caps, PT_CAP_single_range_output))
> + return 1;

Ah, I see. But afaict this is still wrong: PT_CAP_single_range_output is
only about allowing !RTIT_CTL_TOPA, FABRIC_EN should be checked separately
against the new capability that you added in 4/12.

Regards,
--
Alex


2018-06-08 14:19:54

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 06/12] KVM: x86: Add Intel Processor Trace virtualization mode

> > From: Chao Peng <[email protected]>
> >
> > Intel PT virtualization can be work in one of 2 possible modes:
> > a. system-wide: trace both host and guest and output to host buffer;
> > b. host-guest: trace host/guest simultaneous and output to their
> > respective buffer.
>
> I think we discussed this before. That's not the choice that the user needs to
> make. The only choice that I see is what happens if the guest wants to use PT
> while it's already in use by the host. Otherwise, each of them gets to use PT
> as they would.
>

Hi,
Thanks for your reply first. If guest want to use PT that we need to expose this feature to guest and guest will save these trace packets to guest buffer. Host will can't get the trace which are made by KVM guest. This is Host_Guest mode.
On the other hand, host may also want to trace hypervisor and KVM guest execution trace and output these trace in native. This is SYSTEM mode. PT can't be exposed to guest in this mode or KVM guest trace will be lost in native buffer.
These are two different use case and can't coexist.

Thanks,
Luwei Kang



2018-06-08 14:27:02

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 03/12] perf/x86/intel/pt: Add new bit definitions for Intel PT MSRs

> > These bit definitions are use for emulate MSRs read/write for KVM. For
> > example, IA32_RTIT_CTL.FabricEn[bit 6] is available only when
> > CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 1. If KVM guest try to set this
> > bit with CPUID.(EAX=14H, ECX=0):ECX[bit3] = 0 a #GP would be injected
> > to KVM guest.
>
> Do we have anything in the guest that this feature will work with?
>

It depend on PT driver. KVM need to do some security check if kvm guest (maybe linux or other os) try to set any bits of these MSRs.

Thanks,
Luwei Kang

2018-06-08 14:35:47

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

> > CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of output to
> > Trace Transport subsystem.
> > MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if CPUID.(EAX=14H,
> > ECX=0):ECX[bit 3] = 0.
>
> This should instead say:
>
> This adds support for "output to Trace Transport subsystem" capability of
> Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that PT can
> output its trace to an MMIO address range rather than system memory
> buffer.

Yes, you are right. In KVM point of view this bit use for MSR access security check.
KVM will track MSRs access in guest, an #GP will be injected to guest if guest try to write IA32_RTIT_CTL.FabricEn[bit 6] when "output_subsys" (CPUID.(EAX=14H, ECX=0):ECX[bit 3] = 0) is not supported.

Thanks,
Luwei Kang

>
> > This is use for emulate IA32_RTIT_CTL MSR read/write in KVM. KVM guest
> > write IA32_RTIT_CTL will trap to root mode and a #GP would be injected
> > to guest if set IA32_RTIT_CTL.FabricEn with CPUID.(EAX=14H,
> > ECX=0):ECX[bit 3] = 0.
>
> I'm not sure what this means, this patch has nothing to do with KVM as far as
> I can tell.
>
> Aside from the commit message, this is a valid patch.
>
> Thanks,
> --
> Alex


2018-06-08 14:38:20

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 04/12] perf/x86/intel/pt: add new capability for Intel PT

> > On Tue, May 22, 2018 at 12:52:07PM +0800, Luwei Kang wrote:
> > > CPUID(EAX=14H,ECX=0):EBX[bit 3] = 1 indicates support of output to
> > > Trace Transport subsystem.
> > > MSR IA32_RTIT_CTL.FabricEn[bit 6] is reserved if CPUID.(EAX=14H,
> > > ECX=0):ECX[bit 3] = 0.
> >
> > This should instead say:
> >
> > This adds support for "output to Trace Transport subsystem" capability
> > of Intel PT, as documented in IA SDM Chapter 36.x.y.z. It means that
> > PT can output its trace to an MMIO address range rather than system
> > memory buffer.
>
> The $subject also needs attention.

A little don't understand. What is "$subject" mean?

Thanks,
Luwei Kang

2018-06-08 14:58:09

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

> On Tue, May 22, 2018 at 12:52:12PM +0800, Luwei Kang wrote:
> > Initialize the Intel PT configuration when cpuid update.
>
> Is it the CPUID configuration? Is it the MSR configuration? Is it both?
> Kind of looks like both. Not sure what is the cpuid update, though.
>
> > Include cpuid inforamtion, rtit_ctl bit mask and the number of address
> > ranges.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 70
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 70 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > 11fb90a..952ddf4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -10411,6 +10411,72 @@ static void
> > nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) #undef
> > cr4_fixed1_update }
> >
> > +static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) {
> > + struct vcpu_vmx *vmx = to_vmx(vcpu);
> > + struct kvm_cpuid_entry2 *best = NULL;
> > + int i;
> > +
> > + for (i = 0; i < PT_CPUID_LEAVES; i++) {
> > + best = kvm_find_cpuid_entry(vcpu, 0x14, i);
> > + if (!best)
> > + return;
> > + vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM]
> = best->eax;
> > + vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM]
> = best->ebx;
> > + vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM]
> = best->ecx;
> > + vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]
> = best->edx;
> > + }
> > +
> > + /* Get the number of configurable Address Ranges for filtering */
> > + vmx->pt_desc.addr_range = pt_cap_decode(vmx->pt_desc.caps,
> > +
> PT_CAP_num_address_ranges);
> > +
> > + /* Initialize and clear the no dependency bits */
> > + vmx->pt_desc.ctl_bitmask = ~0ULL;
>
> This looks redundant, doesn't it?

This is a bit mask for RTIT_CTL MSR and it will make & with the value of RTIT_CLT from guest.
The reserved bits will be 1 in this bit mask and the setable bits are 0.

>
> > + vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> > + RTIT_CTL_USR | RTIT_CTL_TSC_EN |
> RTIT_CTL_DISRETC);
> > +
> > + /* If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set */
>
> This comment makes it less clear than it would have been otherwise.

What about like this?
/* If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set otherwise will inject an #GP */

>
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
> > + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;
> > +
> > + /*
> > + * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
> > + * PSBFreq can be set
> > + */
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_psb_cyc))
> > + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
> > + RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
> > +
> > + /*
> > + * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
> > + * MTCFreq can be set
> > + */
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_mtc))
> > + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
> > + RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
> > +
> > + /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can
> be set */
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_ptwrite))
> > + vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
> > + RTIT_CTL_PTW_EN);
> > +
> > + /* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
> > + if (pt_cap_decode(vmx->pt_desc.caps,
> PT_CAP_power_event_trace))
> > + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
> > +
> > + /* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_topa_output))
> > + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;
>
> If you want to be thorough, there's also PT_CAP_single_range_output,
> which tells us if RTIT_CTL_TOPA can be *unset*. Otherwise it's required.

Following the description in SDM, the default value of this bit (RTIT_CTL.ToPA) is 0. So I think we don't need to touch this bit when TOPA is not supported.

>
> > + /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabircEn can be set */
> > + if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
> > + vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
>
> Are we sure we want to virtualize this and that it's safe?

It depend on the CPUID information virtualization in guest. I think it can works (e.g. we can pass through the PCI card to guest and stream the trace to a MMIO address).

>
> > +
> > + /* unmask address range configure area */
> > + for (i = 0; i < vmx->pt_desc.addr_range; i++)
> > + vmx->pt_desc.ctl_bitmask &= ~(0xf << (32 + i * 4));
>
> So, the ctl_bitmask is all the bits that are not allowed?

Yes, you are right.

Thanks,
Luwei Kang

2018-06-08 15:13:28

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 10/12] KVM: x86: Implement Intel Processor Trace MSRs read/write emulation

> > + /*
> > + * Any MSR write that attempts to change bits marked reserved will
> > + * case a #GP fault.
> > + */
> > + if (data & vmx->pt_desc.ctl_bitmask)
> > + return 1;
> > +
> > + /*
> > + * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
> > + * result in a #GP unless the same write also clears TraceEn.
> > + */
> > + if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
> > + ((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
> > + return 1;
> > +
> > + /*
> > + * WRMSR to IA32_RTIT_CTL that sets TraceEn but clears this bit
> > + * and FabricEn would cause #GP, if
> > + * CPUID.(EAX=14H, ECX=0):ECX.SNGLRGNOUT[bit 2] = 0
> > + */
> > + if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
> > + !(data & RTIT_CTL_FABRIC_EN) &&
> > + !pt_cap_decode(vmx->pt_desc.caps,
> PT_CAP_single_range_output))
> > + return 1;
>
> Ah, I see. But afaict this is still wrong: PT_CAP_single_range_output is only about allowing !RTIT_CTL_TOPA,

This is follow the description in SDM (35.2.7.2 IA32_RTIT_CTL MSR -> ToPA -> Note:...)

> FABRIC_EN should be checked separately against the new capability that you added in 4/12.

Yes, it have has been handled in patch 9.
+ /* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabircEn can be set */
+ if (pt_cap_decode(vmx->pt_desc.caps, PT_CAP_output_subsys))
+ vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;

Thanks,
Luwei Kang

2018-06-08 15:17:09

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 11/12] KVM: x86: Set intercept for Intel PT MSRs read/write

> > From: Chao Peng <[email protected]>
> >
> > Disable intercept Intel PT MSRs only when Intel PT is enabled in
> > guest. But MSR_IA32_RTIT_CTL will alway be intercept.
>
> "I'd like to offer some suggestions as to how to make the commit message
> friendlier for reviewing.
>
> Generally, for every patch, we want to explain the following: what we want,
> why we want it and how we want to go about getting it. We also would
> prefer to do it in english rather than in C, because for the latter we can just
> look at the code." [1]
>
> I apologize for quoting myself or if I'm stating the obvious, but it looks
> appropriate here.
>
> [1] https://marc.info/?l=linux-kernel&m=152233031020263
>

Thank you very much. My English is not good and lack some experience of how to make description more clearly.
Will improve it.

Thanks,
Luwei Kang


2018-06-11 11:41:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v9 06/12] KVM: x86: Add Intel Processor Trace virtualization mode

On 07/06/2018 15:26, Alexander Shishkin wrote:
>> Intel PT virtualization can be work in one of 2 possible modes:
>> a. system-wide: trace both host and guest and output to host buffer;
>> b. host-guest: trace host/guest simultaneous and output to their
>> respective buffer.
>
> I think we discussed this before. That's not the choice that the user
> needs to make. The only choice that I see is what happens if the guest
> wants to use PT while it's already in use by the host. Otherwise, each
> of them gets to use PT as they would.

That is a superset of this choice:

a. hide PT from guests completely (the "a" above)

b. host-guest with guest preference: same as "b" above

c. host-guest with host preference: trace host/guest simultaneously, but
drop guest events if the host has tracing enabled.

so I don't think this choice is an issue.

Paolo

2018-06-11 11:43:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

On 08/06/2018 16:56, Kang, Luwei wrote:
>>> + /* Initialize and clear the no dependency bits */
>>> + vmx->pt_desc.ctl_bitmask = ~0ULL;
>> This looks redundant, doesn't it?
> This is a bit mask for RTIT_CTL MSR and it will make & with the value of RTIT_CLT from guest.
> The reserved bits will be 1 in this bit mask and the setable bits are 0.
>
>>> + vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
>>> + RTIT_CTL_USR | RTIT_CTL_TSC_EN |
>> RTIT_CTL_DISRETC);

But it's set twice, so the first assignment is superfluous.

Paolo

2018-06-12 01:46:17

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v9 09/12] KVM: x86: Introduce a function to initialize the PT configuration

> >>> + /* Initialize and clear the no dependency bits */
> >>> + vmx->pt_desc.ctl_bitmask = ~0ULL;
> >> This looks redundant, doesn't it?
> > This is a bit mask for RTIT_CTL MSR and it will make & with the value of RTIT_CLT from guest.
> > The reserved bits will be 1 in this bit mask and the setable bits are 0.
> >
> >>> + vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
> >>> + RTIT_CTL_USR | RTIT_CTL_TSC_EN |
> >> RTIT_CTL_DISRETC);
>
> But it's set twice, so the first assignment is superfluous.

Oh, I see. Will remove it.

Thanks,
Luwei Kang