2022-11-25 06:39:10

by Yang, Weijiang

Subject: [PATCH v2 00/15] Introduce Architectural LBR for vPMU

Intel's CPU model-specific LBR (legacy LBR) has evolved into Architectural
LBR (Arch LBR [0]), which replaces legacy LBR on new platforms. The native
support patches were merged into the 5.9 kernel tree, and this patch series
enables Arch LBR in the vPMU so that guests can benefit from the feature.

The main advantages of Arch LBR are [1]:
- Faster context switching due to XSAVES support, and faster reset of
LBR MSRs via the new DEPTH MSR.
- Faster LBR reads for non-PEBS events due to XSAVES support, which
lowers the overhead of the NMI handler.
- The Linux kernel can support the LBR features without knowing the model
number of the current CPU.

From the end user's point of view, Arch LBR is used in the same way as
the legacy LBR support that has already been merged into the mainline.

Note: in this series there is one restriction for guest Arch LBR, i.e.,
the guest can only set its LBR record depth to the same value as the
host's. This is due to the special behavior of MSR_ARCH_LBR_DEPTH:
1) Any write to the MSR resets all Arch LBR record MSRs to 0.
2) XRSTORS resets all record MSRs to 0 if the saved depth does not match
MSR_ARCH_LBR_DEPTH.
Enforcing this restriction keeps the KVM Arch LBR vPMU flow simple and
straightforward; a sketch of the resulting depth check follows below.
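
As a minimal sketch (not the exact patch code; the helper name and the way
the host depth is obtained are simplified here), the restriction amounts to
a validity check along these lines in KVM's WRMSR handling:

	/*
	 * Illustrative only: reject any guest depth that differs from
	 * the host's, because a write to MSR_ARCH_LBR_DEPTH zeroes all
	 * LBR record MSRs, and XRSTORS likewise zeroes them when the
	 * saved depth mismatches the current MSR_ARCH_LBR_DEPTH.
	 */
	static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
	{
		struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);

		/* records->nr reflects the host-configured LBR depth. */
		return depth == records->nr;
	}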

Paolo refactored the old series and the resulting patches became the
base of this new series; he is therefore the author of some patches.

[0] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
[1] https://lore.kernel.org/lkml/[email protected]/

v1:
https://lore.kernel.org/all/[email protected]/

Changes in v2:
1. Removed Paolo's SOBs from some patches. [Sean]
2. Modified some patches due to KVM changes, e.g., the SMM/vPMU refactor.
3. Rebased onto the queue branch of https://git.kernel.org/pub/scm/virt/kvm/kvm.git.


Like Xu (3):
perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
KVM: vmx/pmu: Emulate MSR_ARCH_LBR_DEPTH for guest Arch LBR
KVM: x86: Add XSAVE Support for Architectural LBR

Paolo Bonzini (4):
KVM: PMU: disable LBR handling if architectural LBR is available
KVM: vmx/pmu: Emulate MSR_ARCH_LBR_CTL for guest Arch LBR
KVM: VMX: Support passthrough of architectural LBRs
KVM: x86: Refine the matching and clearing logic for supported_xss

Sean Christopherson (1):
KVM: x86: Report XSS as an MSR to be saved if there are supported
features

Yang Weijiang (7):
KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS
KVM: x86: Add Arch LBR MSRs to msrs_to_save_all list
KVM: x86/vmx: Check Arch LBR config when return perf capabilities
KVM: x86/vmx: Disable Arch LBREn bit in #DB and warm reset
KVM: x86/vmx: Save/Restore guest Arch LBR Ctrl msr at SMM entry/exit
KVM: x86: Add Arch LBR data MSR access interface
KVM: x86/cpuid: Advertise Arch LBR feature in CPUID

arch/x86/events/intel/lbr.c | 6 +-
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 4 +
arch/x86/kvm/cpuid.c | 52 +++++++++-
arch/x86/kvm/smm.c | 1 +
arch/x86/kvm/smm.h | 3 +-
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/nested.c | 8 ++
arch/x86/kvm/vmx/pmu_intel.c | 161 +++++++++++++++++++++++++++----
arch/x86/kvm/vmx/vmx.c | 74 +++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 +-
arch/x86/kvm/x86.c | 27 +++++-
13 files changed, 316 insertions(+), 35 deletions(-)


base-commit: da5f28e10aa7df1a925dbc10656cc89d9c061358
--
2.27.0


2022-11-25 06:39:36

by Yang, Weijiang

Subject: [PATCH v2 04/15] KVM: PMU: disable LBR handling if architectural LBR is available

From: Paolo Bonzini <[email protected]>

Traditional LBR is absent on CPU models that have architectural LBR, so
disable all processing of traditional LBR MSRs if they are not there.

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e5cec07ca8d9..905673228932 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -170,19 +170,23 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
{
struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
- bool ret = false;

if (!intel_pmu_lbr_is_enabled(vcpu))
- return ret;
+ return false;

- ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
- (index >= records->from && index < records->from + records->nr) ||
- (index >= records->to && index < records->to + records->nr);
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
+ (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
+ return true;

- if (!ret && records->info)
- ret = (index >= records->info && index < records->info + records->nr);
+ if ((index >= records->from && index < records->from + records->nr) ||
+ (index >= records->to && index < records->to + records->nr))
+ return true;

- return ret;
+ if (records->info && index >= records->info &&
+ index < records->info + records->nr)
+ return true;
+
+ return false;
}

static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
@@ -702,6 +706,9 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
}

+ if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+ return;
+
vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
}
@@ -742,10 +749,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+ bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
+ (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);

if (!lbr_desc->event) {
vmx_disable_lbr_msrs_passthrough(vcpu);
- if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+ if (lbr_enable)
goto warn;
if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
goto warn;
@@ -768,7 +777,10 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)

static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
{
- if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
+ (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
+
+ if (!lbr_enable)
intel_pmu_release_guest_lbr_event(vcpu);
}

--
2.27.0

2022-11-25 07:10:58

by Yang, Weijiang

Subject: [PATCH v2 09/15] KVM: x86: Refine the matching and clearing logic for supported_xss

From: Paolo Bonzini <[email protected]>

Refine the existing handling of supported_xss: initialize supported_xss
from host_xss filtered through the KVM_SUPPORTED_XSS mask, and update its
value by clearing bits (rather than setting them).

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 5 +++--
arch/x86/kvm/x86.c | 6 +++++-
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9bd52ad3bbf4..2ab4c33b5008 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7738,9 +7738,10 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_UMIP);

/* CPUID 0xD.1 */
- kvm_caps.supported_xss = 0;
- if (!cpu_has_vmx_xsaves())
+ if (!cpu_has_vmx_xsaves()) {
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+ kvm_caps.supported_xss = 0;
+ }

/* CPUID 0x80000001 and 0x7 (RDPID) */
if (!cpu_has_vmx_rdtscp()) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74c858eaa1ea..889be0c9176d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -217,6 +217,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

+#define KVM_SUPPORTED_XSS 0
+
u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);

@@ -11999,8 +12001,10 @@ int kvm_arch_hardware_setup(void *opaque)

rdmsrl_safe(MSR_EFER, &host_efer);

- if (boot_cpu_has(X86_FEATURE_XSAVES))
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
rdmsrl(MSR_IA32_XSS, host_xss);
+ kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+ }

kvm_init_pmu_capability();

--
2.27.0

2023-01-12 02:24:49

by Yang, Weijiang

Subject: Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU


Hi Sean,

Sorry to bother you, but do you have time to review this series? The
feature has been pending for a long time, and I want to move it forward.

Thanks!


On 11/25/2022 12:05 PM, Yang, Weijiang wrote:
> Intel CPU model-specific LBR(Legacy LBR) has evolved to Architectural
> LBR(Arch LBR [0]), it's the replacement of legacy LBR on new platforms.
> The native support patches were merged into 5.9 kernel tree, and this
> patch series is to enable Arch LBR in vPMU so that guest can benefit
> from the feature.
>
> The main advantages of Arch LBR are [1]:
> - Faster context switching due to XSAVES support and faster reset of
> LBR MSRs via the new DEPTH MSR
> - Faster LBR read for a non-PEBS event due to XSAVES support, which
> lowers the overhead of the NMI handler.
> - Linux kernel can support the LBR features without knowing the model
> number of the current CPU.
>
> From end user's point of view, the usage of Arch LBR is the same as
> the Legacy LBR that has been merged in the mainline.
>
> Note, in this series, there's one restriction for guest Arch LBR, i.e.,
> guest can only set its LBR record depth the same as host's. This is due
> to the special behavior of MSR_ARCH_LBR_DEPTH:
> 1) On write to the MSR, it'll reset all Arch LBR recording MSRs to 0s.
> 2) XRSTORS resets all record MSRs to 0s if the saved depth mismatches
> MSR_ARCH_LBR_DEPTH.
> Enforcing the restriction keeps KVM Arch LBR vPMU working flow simple
> and straightforward.
>
> Paolo refactored the old series and the resulting patches became the
> base of this new series, therefore he's the author of some patches.
>
> [0] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> v1:
> https://lore.kernel.org/all/[email protected]/
>
> Changes v2:
> 1. Removed Paolo's SOBs from some patches. [Sean]
> 2. Modified some patches due to KVM changes, e.g., SMM/vPMU refactor.
> 3. Rebased to https://git.kernel.org/pub/scm/virt/kvm/kvm.git : queue branch.
>
>
> Like Xu (3):
> perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
> KVM: vmx/pmu: Emulate MSR_ARCH_LBR_DEPTH for guest Arch LBR
> KVM: x86: Add XSAVE Support for Architectural LBR
>
> Paolo Bonzini (4):
> KVM: PMU: disable LBR handling if architectural LBR is available
> KVM: vmx/pmu: Emulate MSR_ARCH_LBR_CTL for guest Arch LBR
> KVM: VMX: Support passthrough of architectural LBRs
> KVM: x86: Refine the matching and clearing logic for supported_xss
>
> Sean Christopherson (1):
> KVM: x86: Report XSS as an MSR to be saved if there are supported
> features
>
> Yang Weijiang (7):
> KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS
> KVM: x86: Add Arch LBR MSRs to msrs_to_save_all list
> KVM: x86/vmx: Check Arch LBR config when return perf capabilities
> KVM: x86/vmx: Disable Arch LBREn bit in #DB and warm reset
> KVM: x86/vmx: Save/Restore guest Arch LBR Ctrl msr at SMM entry/exit
> KVM: x86: Add Arch LBR data MSR access interface
> KVM: x86/cpuid: Advertise Arch LBR feature in CPUID
>
> arch/x86/events/intel/lbr.c | 6 +-
> arch/x86/include/asm/kvm_host.h | 3 +
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/include/asm/vmx.h | 4 +
> arch/x86/kvm/cpuid.c | 52 +++++++++-
> arch/x86/kvm/smm.c | 1 +
> arch/x86/kvm/smm.h | 3 +-
> arch/x86/kvm/vmx/capabilities.h | 5 +
> arch/x86/kvm/vmx/nested.c | 8 ++
> arch/x86/kvm/vmx/pmu_intel.c | 161 +++++++++++++++++++++++++++----
> arch/x86/kvm/vmx/vmx.c | 74 +++++++++++++-
> arch/x86/kvm/vmx/vmx.h | 6 +-
> arch/x86/kvm/x86.c | 27 +++++-
> 13 files changed, 316 insertions(+), 35 deletions(-)
>
>
> base-commit: da5f28e10aa7df1a925dbc10656cc89d9c061358

2023-01-27 20:11:29

by Sean Christopherson

Subject: Re: [PATCH v2 04/15] KVM: PMU: disable LBR handling if architectural LBR is available

On Thu, Nov 24, 2022, Yang Weijiang wrote:
> From: Paolo Bonzini <[email protected]>
>
> Traditional LBR is absent on CPU models that have architectural LBR, so
> disable all processing of traditional LBR MSRs if they are not there.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index e5cec07ca8d9..905673228932 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -170,19 +170,23 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
> static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
> {
> struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
> - bool ret = false;
>
> if (!intel_pmu_lbr_is_enabled(vcpu))
> - return ret;
> + return false;
>
> - ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
> - (index >= records->from && index < records->from + records->nr) ||
> - (index >= records->to && index < records->to + records->nr);
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&

IIUC, the MSRs flat out don't exist _and_ KVM expects to pass MSRs through
to the guest, i.e. KVM should check host support, not guest support. Probably
a moot point from a functionality perspective since KVM shouldn't allow legacy
LBRs to be enabled for the guest, but from a performance perspective, checking
guest CPUID is slooow.
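
E.g. something like this (illustrative) avoids the guest CPUID lookup
entirely and matches the hardware reality:

	/* Legacy LBR MSRs flat out don't exist on arch LBR hardware. */
	if (!boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
	    (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
		return true;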

That brings me to point #2, which is that KVM needs to disallow enabling legacy
LBRs on CPUs that support arch LBRs. Again, IIUC, because KVM doesn't have the
option to fall back to legacy LBRs, that restriction needs to be treated as a bug
fix. I'll post a separate patch unless my understanding is wrong.

> + (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
> + return true;
>
> - if (!ret && records->info)
> - ret = (index >= records->info && index < records->info + records->nr);
> + if ((index >= records->from && index < records->from + records->nr) ||
> + (index >= records->to && index < records->to + records->nr))
> + return true;
>
> - return ret;
> + if (records->info && index >= records->info &&
> + index < records->info + records->nr)
> + return true;
> +
> + return false;
> }
>
> static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> @@ -702,6 +706,9 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
> vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
> }
>
> + if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))

Similar to above, I really don't want to query guest CPUID in the VM-Enter path.
If we establish the rule that LBRs can be enabled if and only if the correct type
is enabled (traditional/legacy vs. arch), then this can simply check host support.

> + return;
> +
> vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
> vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
> }
> @@ -742,10 +749,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
> + bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
> + (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);

Unnecessary guest CPUID lookup and VMCS read, i.e. this can be deferred to the
!lbr_desc->event path.
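
I.e. restructure it roughly like so (sketch, with the host-support check
from above swapped in for the guest CPUID query):

	if (!lbr_desc->event) {
		vmx_disable_lbr_msrs_passthrough(vcpu);
		/* Only read DEBUGCTL on the path that actually needs it. */
		if (!boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
		    (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
			goto warn;
		if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
			goto warn;
		/* remainder unchanged */
	}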

>
> if (!lbr_desc->event) {
> vmx_disable_lbr_msrs_passthrough(vcpu);
> - if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
> + if (lbr_enable)
> goto warn;
> if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
> goto warn;
> @@ -768,7 +777,10 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
>
> static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
> {
> - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
> + bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
> + (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
> +
> + if (!lbr_enable)
> intel_pmu_release_guest_lbr_event(vcpu);
> }
>
> --
> 2.27.0
>

2023-01-27 21:46:13

by Sean Christopherson

Subject: Re: [PATCH v2 09/15] KVM: x86: Refine the matching and clearing logic for supported_xss

On Thu, Nov 24, 2022, Yang Weijiang wrote:
> From: Paolo Bonzini <[email protected]>
>
> Refine the code path of the existing clearing of supported_xss in this way:
> initialize the supported_xss with the filter of KVM_SUPPORTED_XSS mask and
> update its value in a bit clear manner (rather than bit setting).
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 5 +++--
> arch/x86/kvm/x86.c | 6 +++++-
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9bd52ad3bbf4..2ab4c33b5008 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7738,9 +7738,10 @@ static __init void vmx_set_cpu_caps(void)
> kvm_cpu_cap_set(X86_FEATURE_UMIP);
>
> /* CPUID 0xD.1 */
> - kvm_caps.supported_xss = 0;

This needs to stay until VMX actually supports something.

> - if (!cpu_has_vmx_xsaves())
> + if (!cpu_has_vmx_xsaves()) {
> kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
> + kvm_caps.supported_xss = 0;

This is already handled in common KVM.

> + }
>
> /* CPUID 0x80000001 and 0x7 (RDPID) */
> if (!cpu_has_vmx_rdtscp()) {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 74c858eaa1ea..889be0c9176d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -217,6 +217,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>
> +#define KVM_SUPPORTED_XSS 0
> +
> u64 __read_mostly host_efer;
> EXPORT_SYMBOL_GPL(host_efer);
>
> @@ -11999,8 +12001,10 @@ int kvm_arch_hardware_setup(void *opaque)
>
> rdmsrl_safe(MSR_EFER, &host_efer);
>
> - if (boot_cpu_has(X86_FEATURE_XSAVES))
> + if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> rdmsrl(MSR_IA32_XSS, host_xss);
> + kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> + }
>
> kvm_init_pmu_capability();
>
> --
> 2.27.0
>

2023-01-27 22:46:28

by Sean Christopherson

Subject: Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU

On Thu, Nov 24, 2022, Yang Weijiang wrote:
> Intel CPU model-specific LBR(Legacy LBR) has evolved to Architectural
> LBR(Arch LBR [0]), it's the replacement of legacy LBR on new platforms.
> The native support patches were merged into 5.9 kernel tree, and this
> patch series is to enable Arch LBR in vPMU so that guest can benefit
> from the feature.
>
> The main advantages of Arch LBR are [1]:
> - Faster context switching due to XSAVES support and faster reset of
> LBR MSRs via the new DEPTH MSR
> - Faster LBR read for a non-PEBS event due to XSAVES support, which
> lowers the overhead of the NMI handler.
> - Linux kernel can support the LBR features without knowing the model
> number of the current CPU.
>
> From end user's point of view, the usage of Arch LBR is the same as
> the Legacy LBR that has been merged in the mainline.
>
> Note, in this series, there's one restriction for guest Arch LBR, i.e.,
> guest can only set its LBR record depth the same as host's. This is due
> to the special behavior of MSR_ARCH_LBR_DEPTH:
> 1) On write to the MSR, it'll reset all Arch LBR recording MSRs to 0s.
> 2) XRSTORS resets all record MSRs to 0s if the saved depth mismatches
> MSR_ARCH_LBR_DEPTH.
> Enforcing the restriction keeps KVM Arch LBR vPMU working flow simple
> and straightforward.
>
> Paolo refactored the old series and the resulting patches became the
> base of this new series, therefore he's the author of some patches.

To be very blunt, this series is a mess. I don't want to point fingers as there
is plenty of blame to go around. The existing LBR support is a confusing mess,
vPMU as a whole has been neglected for too long, review feedback has been relatively
non-existent, and I'm sure some of the mess is due to Paolo trying to hastily fix
things up back when this was temporarily queued.

However, for arch LBR support to be merged, things need to change.

First and foremost, the existing LBR support needs to be documented. Someone,
I don't care who, needs to provide a detailed writeup of the contract between KVM
and perf. Specifically, I want to know:

1. When exactly is perf allowed to take control of LBR MSRs? Task switch? IRQ?
NMI?

2. What is the expected behavior when perf is using LBRs? Is the guest supposed
to be traced?

3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disable IRQs when
accessing LBR MSRs?

It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
documentation, but I want to have a very clear understanding of how LBR support
is _intended_ to function and how it all _actually_ functions without having to
make guesses.

And depending on the answers, I want to revisit KVM's LBR implementation before
tackling arch LBRs. Letting perf usurp LBRs while KVM has the vCPU loaded is
frankly ridiculous. Just have perf set a flag telling KVM that it needs to take
control of LBRs and have KVM service the flag as a request or something. Stealing
the LBRs back in IRQ context adds a stupid amount of complexity without much value,
e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
change the traces. If that can't actually happen, then why on earth does KVM need
to disable IRQs to read MSRs?
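
Concretely, I'm thinking of something with this shape (the request name and
perf-side hook are made up, this is just the idea, not an existing KVM API):

	/* perf side: ask KVM to relinquish the LBRs at the next safe point. */
	static void perf_reclaim_lbrs_from_kvm(struct kvm_vcpu *vcpu)
	{
		kvm_make_request(KVM_REQ_RECLAIM_LBRS, vcpu);	/* hypothetical request */
		kvm_vcpu_kick(vcpu);
	}

	/* KVM side, serviced in vcpu_enter_guest() before VM-Enter: */
	if (kvm_check_request(KVM_REQ_RECLAIM_LBRS, vcpu))
		intel_pmu_release_guest_lbr_event(vcpu);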

And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
guest branches show up in the LBRs when the host is tracing is completely up to
the whims of the guest. If that's correct, then again, what's the point of the
dance between KVM and perf?

Beyond the "how does this work" issues, there needs to be tests. At the absolute
minimum, there needs to be selftests showing that this stuff actually works, that
save/restore (migration) works, that the MSRs can/can't be accessed when guest
CPUID is (in)correctly configured, etc. And I would really, really like to have
tests that force contention between host and guests, e.g. to make sure that KVM
isn't leaking host state or outright exploding, but I can understand that those
types of tests would be very difficult to write.

I've pushed a heavily reworked, but definitely broken, version to

[email protected]:sean-jc/linux.git x86/arch_lbrs

It compiles, but it's otherwise untested and there are known gaps. E.g. I omitted
toggling load+clear of ARCH_LBR_CTL because I couldn't figure out the intended
behavior.

2023-01-30 08:12:59

by Yang, Weijiang

Subject: Re: [PATCH v2 04/15] KVM: PMU: disable LBR handling if architectural LBR is available


On 1/28/2023 4:10 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> From: Paolo Bonzini <[email protected]>
>>
>> Traditional LBR is absent on CPU models that have architectural LBR, so
>> disable all processing of traditional LBR MSRs if they are not there.
>>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> Signed-off-by: Yang Weijiang <[email protected]>
>> ---
>> arch/x86/kvm/vmx/pmu_intel.c | 32 ++++++++++++++++++++++----------
>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index e5cec07ca8d9..905673228932 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -170,19 +170,23 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
>> static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
>> {
>> struct x86_pmu_lbr *records = vcpu_to_lbr_records(vcpu);
>> - bool ret = false;
>>
>> if (!intel_pmu_lbr_is_enabled(vcpu))
>> - return ret;
>> + return false;
>>
>> - ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
>> - (index >= records->from && index < records->from + records->nr) ||
>> - (index >= records->to && index < records->to + records->nr);
>> + if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
> IIUC, the MSRs flat out don't exist _and_ KVM expects to passthrough MSRs to the
> guest, i.e. KVM should check host support, not guest support. Probably a moot
> point from a functionality perspective since KVM shouldn't allow LBRs to shouldn't
> be enabled for the guest, but from a performance perspective, checking guest CPUID
> is slooow.


OK, I'll change the check.


>
> That brings me to point #2, which is that KVM needs to disallow enabling legacy
> LBRs on CPUs that support arch LBRs. Again, IIUC, because KVM doesn't have the
> option to fallback to legacy LBRs,


Legacy LBR and Arch-lbr are exclusive on any platforms, on old
platforms, legacy LBR is available,

on new platforms, e.g., SPR, arch-lbr is present, so we don't have
fallback logic.


> that restriction needs to be treated as a bug
> fix. I'll post a separate patch unless my understanding is wrong.
>
>> + (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
>> + return true;
>>
>> - if (!ret && records->info)
>> - ret = (index >= records->info && index < records->info + records->nr);
>> + if ((index >= records->from && index < records->from + records->nr) ||
>> + (index >= records->to && index < records->to + records->nr))
>> + return true;
>>
>> - return ret;
>> + if (records->info && index >= records->info &&
>> + index < records->info + records->nr)
>> + return true;
>> +
>> + return false;
>> }
>>
>> static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>> @@ -702,6 +706,9 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
>> vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
>> }
>>
>> + if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
> Similar to above, I really don't want to query guest CPUID in the VM-Enter path.
> If we establish the rule that LBRs can be enabled if and only if the correct type
> is enabled (traditional/legacy vs. arch), then this can simply check host support.

I understand your concerns, I'll try a more efficient way to check for
guest Arch LBR support.


>
>> + return;
>> +
>> vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
>> vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
>> }
>> @@ -742,10 +749,12 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
>> + bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
>> + (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
> Unnecessary guest CPUID lookup and VMCS read, i.e. this can be deferred to the
> !lbr_desc->event path.

OK


>
>>
>> if (!lbr_desc->event) {
>> vmx_disable_lbr_msrs_passthrough(vcpu);
>> - if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
>> + if (lbr_enable)
>> goto warn;
>> if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
>> goto warn;
>> @@ -768,7 +777,10 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
>>
>> static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
>> {
>> - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
>> + bool lbr_enable = !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) &&
>> + (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
>> +
>> + if (!lbr_enable)
>> intel_pmu_release_guest_lbr_event(vcpu);
>> }
>>
>> --
>> 2.27.0
>>

2023-01-30 12:38:08

by Yang, Weijiang

Subject: Re: [PATCH v2 09/15] KVM: x86: Refine the matching and clearing logic for supported_xss


On 1/28/2023 5:46 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> From: Paolo Bonzini <[email protected]>
>>
>> Refine the code path of the existing clearing of supported_xss in this way:
>> initialize the supported_xss with the filter of KVM_SUPPORTED_XSS mask and
>> update its value in a bit clear manner (rather than bit setting).
>>
>> Suggested-by: Sean Christopherson <[email protected]>
>> Signed-off-by: Like Xu <[email protected]>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> Signed-off-by: Yang Weijiang <[email protected]>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 5 +++--
>> arch/x86/kvm/x86.c | 6 +++++-
>> 2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 9bd52ad3bbf4..2ab4c33b5008 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7738,9 +7738,10 @@ static __init void vmx_set_cpu_caps(void)
>> kvm_cpu_cap_set(X86_FEATURE_UMIP);
>>
>> /* CPUID 0xD.1 */
>> - kvm_caps.supported_xss = 0;
> This needs to stay until VMX actually supports something.

Will modify this patch.


>
>> - if (!cpu_has_vmx_xsaves())
>> + if (!cpu_has_vmx_xsaves()) {
>> kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>> + kvm_caps.supported_xss = 0;
> This is already handled in common KVM.
>
>> + }
>>
>> /* CPUID 0x80000001 and 0x7 (RDPID) */
>> if (!cpu_has_vmx_rdtscp()) {
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 74c858eaa1ea..889be0c9176d 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -217,6 +217,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>> | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>>
>> +#define KVM_SUPPORTED_XSS 0
>> +
>> u64 __read_mostly host_efer;
>> EXPORT_SYMBOL_GPL(host_efer);
>>
>> @@ -11999,8 +12001,10 @@ int kvm_arch_hardware_setup(void *opaque)
>>
>> rdmsrl_safe(MSR_EFER, &host_efer);
>>
>> - if (boot_cpu_has(X86_FEATURE_XSAVES))
>> + if (boot_cpu_has(X86_FEATURE_XSAVES)) {
>> rdmsrl(MSR_IA32_XSS, host_xss);
>> + kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
>> + }
>>
>> kvm_init_pmu_capability();
>>
>> --
>> 2.27.0
>>

2023-01-30 13:39:18

by Yang, Weijiang

Subject: Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU


On 1/28/2023 6:46 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> Intel CPU model-specific LBR(Legacy LBR) has evolved to Architectural
>> LBR(Arch LBR [0]), it's the replacement of legacy LBR on new platforms.
>> The native support patches were merged into 5.9 kernel tree, and this
>> patch series is to enable Arch LBR in vPMU so that guest can benefit
>> from the feature.
>>
>> The main advantages of Arch LBR are [1]:
>> - Faster context switching due to XSAVES support and faster reset of
>> LBR MSRs via the new DEPTH MSR
>> - Faster LBR read for a non-PEBS event due to XSAVES support, which
>> lowers the overhead of the NMI handler.
>> - Linux kernel can support the LBR features without knowing the model
>> number of the current CPU.
>>
>> From end user's point of view, the usage of Arch LBR is the same as
>> the Legacy LBR that has been merged in the mainline.
>>
>> Note, in this series, there's one restriction for guest Arch LBR, i.e.,
>> guest can only set its LBR record depth the same as host's. This is due
>> to the special behavior of MSR_ARCH_LBR_DEPTH:
>> 1) On write to the MSR, it'll reset all Arch LBR recording MSRs to 0s.
>> 2) XRSTORS resets all record MSRs to 0s if the saved depth mismatches
>> MSR_ARCH_LBR_DEPTH.
>> Enforcing the restriction keeps KVM Arch LBR vPMU working flow simple
>> and straightforward.
>>
>> Paolo refactored the old series and the resulting patches became the
>> base of this new series, therefore he's the author of some patches.
> To be very blunt, this series is a mess. I don't want to point fingers as there
> is plenty of blame to go around. The existing LBR support is a confusing mess,
> vPMU as a whole has been neglected for too long, review feedback has been relatively
> non-existent, and I'm sure some of the mess is due to Paolo trying to hastily fix
> things up back when this was temporarily queued.
>
> However, for arch LBR support to be merged, things need to change.
>
> First and foremost, the existing LBR support needs to be documented. Someone,
> I don't care who, needs to provide a detailed writeup of the contract between KVM
> and perf. Specifically, I want to know:
>
> 1. When exactly is perf allowed to take control of LBR MRS. Task switch? IRQ?
> NMI?
>
> 2. What is the expected behavior when perf is using LBRs? Is the guest supposed
> to be traced?
>
> 3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
> accessing LBR MSRs?
>
> It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
> documentation, but I want to have a very clear understanding of how LBR support
> is _intended_ to function and how it all _actually_ functions without having to
> make guesses.
>
> And depending on the answers, I want to revisit KVM's LBR implementation before
> tackling arch LBRs. Letting perf usurp LBRs while KVM has the vCPU loaded is
> frankly ridiculous. Just have perf set a flag telling KVM that it needs to take
> control of LBRs and have KVM service the flag as a request or something. Stealing
> the LBRs back in IRQ context adds a stupid amount of complexity without much value,
> e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
> change the traces. If that can't actually happen, then why on earth does KVM need
> to disable IRQs to read MSRs?
>
> And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
> guest branches show up in the LBRs when the host is tracing is completely up to
> the whims of the guest. If that's correct, then again, what's the point of the
> dance between KVM and perf?
>
> Beyond the "how does this work" issues, there needs to be tests. At the absolute
> minimum, there needs to be selftests showing that this stuff actually works, that
> save/restore (migration) works, that the MSRs can/can't be accessed when guest
> CPUID is (in)correctly configured, etc. And I would really, really like to have
> tests that force contention between host and guests, e.g. to make sure that KVM
> isn't leaking host state or outright exploding, but I can understand that those
> types of tests would be very difficult to write.
>
> I've pushed a heavily reworked, but definitely broken, version to
>
> [email protected]:sean-jc/linux.git x86/arch_lbrs
>
> It compiles, but it's otherwise untested and there are known gaps. E.g. I omitted
> toggling load+clear of ARCH_LBR_CTL because I couldn't figure out the intended
> behavior.

Thanks for your thorough review and comments!

I'll check your reworked version and discuss with the stakeholders how to
move this work forward.



2023-06-05 10:07:50

by Like Xu

Subject: Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU

+xiongzha to follow up.

On 28/1/2023 6:46 am, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> Intel CPU model-specific LBR(Legacy LBR) has evolved to Architectural
>> LBR(Arch LBR [0]), it's the replacement of legacy LBR on new platforms.
>> The native support patches were merged into 5.9 kernel tree, and this
>> patch series is to enable Arch LBR in vPMU so that guest can benefit
>> from the feature.
>>
>> The main advantages of Arch LBR are [1]:
>> - Faster context switching due to XSAVES support and faster reset of
>> LBR MSRs via the new DEPTH MSR
>> - Faster LBR read for a non-PEBS event due to XSAVES support, which
>> lowers the overhead of the NMI handler.
>> - Linux kernel can support the LBR features without knowing the model
>> number of the current CPU.
>>
>> From end user's point of view, the usage of Arch LBR is the same as
>> the Legacy LBR that has been merged in the mainline.
>>
>> Note, in this series, there's one restriction for guest Arch LBR, i.e.,
>> guest can only set its LBR record depth the same as host's. This is due
>> to the special behavior of MSR_ARCH_LBR_DEPTH:
>> 1) On write to the MSR, it'll reset all Arch LBR recording MSRs to 0s.
>> 2) XRSTORS resets all record MSRs to 0s if the saved depth mismatches
>> MSR_ARCH_LBR_DEPTH.
>> Enforcing the restriction keeps KVM Arch LBR vPMU working flow simple
>> and straightforward.
>>
>> Paolo refactored the old series and the resulting patches became the
>> base of this new series, therefore he's the author of some patches.
>
> To be very blunt, this series is a mess. I don't want to point fingers as there
> is plenty of blame to go around. The existing LBR support is a confusing mess,
> vPMU as a whole has been neglected for too long, review feedback has been relatively
> non-existent, and I'm sure some of the mess is due to Paolo trying to hastily fix
> things up back when this was temporarily queued.
>
> However, for arch LBR support to be merged, things need to change.
>
> First and foremost, the existing LBR support needs to be documented. Someone,
> I don't care who, needs to provide a detailed writeup of the contract between KVM
> and perf. Specifically, I want to know:
>
> 1. When exactly is perf allowed to take control of LBR MRS. Task switch? IRQ?
> NMI?
>
> 2. What is the expected behavior when perf is using LBRs? Is the guest supposed
> to be traced?
>
> 3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
> accessing LBR MSRs?
>
> It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
> documentation, but I want to have a very clear understanding of how LBR support
> is _intended_ to function and how it all _actually_ functions without having to
> make guesses.

This is a very good topic for the LPC KVM Microconference.

Many thanks to Sean for ranting about something that only I had been
thinking about before. Letting the host and guest use the PMU at the same
time in peace (a hybrid profiling mode) is a goal well worth pursuing,
rather than introducing exclusivity, and it's clear that KVM+perf lacks
reasonable, well-documented support for when a host perf user interferes.

Ref: https://lpc.events/event/17/page/200-proposed-microconferences#kvm

>
> And depending on the answers, I want to revisit KVM's LBR implementation before
> tackling arch LBRs. Letting perf usurp LBRs while KVM has the vCPU loaded is
> frankly ridiculous. Just have perf set a flag telling KVM that it needs to take
> control of LBRs and have KVM service the flag as a request or something. Stealing
> the LBRs back in IRQ context adds a stupid amount of complexity without much value,
> e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
> change the traces. If that can't actually happen, then why on earth does KVM need
> to disable IRQs to read MSRs?
>
> And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
> guest branches show up in the LBRs when the host is tracing is completely up to
> the whims of the guest. If that's correct, then again, what's the point of the
> dance between KVM and perf?
>
> Beyond the "how does this work" issues, there needs to be tests. At the absolute
> minimum, there needs to be selftests showing that this stuff actually works, that
> save/restore (migration) works, that the MSRs can/can't be accessed when guest
> CPUID is (in)correctly configured, etc. And I would really, really like to have
> tests that force contention between host and guests, e.g. to make sure that KVM
> isn't leaking host state or outright exploding, but I can understand that those
> types of tests would be very difficult to write.
>
> I've pushed a heavily reworked, but definitely broken, version to
>
> [email protected]:sean-jc/linux.git x86/arch_lbrs
>
> It compiles, but it's otherwise untested and there are known gaps. E.g. I omitted
> toggling load+clear of ARCH_LBR_CTL because I couldn't figure out the intended
> behavior.