From: Kan Liang <[email protected]>
For some Alder Lake N machine, the below unchecked MSR access error may be
triggered.
[ 0.088017] rcu: Hierarchical SRCU implementation.
[ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
[ 0.088017] Call Trace:
[ 0.088017] <TASK>
[ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is
not set. The perf cannot retrieve the correct CPU type via
get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is
hardcode to p-core. The wrong CPU type is given to the PMU of the
Alder Lake N.
Add a model check to return the e-core CPU type for Alder Lake N.
Factor out x86_pmu_get_this_hybrid_cpu_type().
Fixes: c2a960f7c574 ("perf/x86: Add new Alder Lake and Raptor Lake support")
Reported-by: Jianfeng Gao <[email protected]>
Tested-by: Jianfeng Gao <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/core.c | 15 +++++++++++----
arch/x86/events/intel/core.c | 8 ++++----
arch/x86/events/perf_event.h | 2 ++
3 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 0b19ffaa2dee..94cdf7e76b86 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2076,6 +2076,16 @@ void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
cpuctx->ctx.pmu = pmu;
}
+u8 x86_pmu_get_this_hybrid_cpu_type(void)
+{
+ u8 cpu_type = get_this_hybrid_cpu_type();
+
+ if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
+ return x86_pmu.get_hybrid_cpu_type();
+
+ return cpu_type;
+}
+
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
@@ -2175,13 +2185,10 @@ static int __init init_hw_perf_events(void)
if (err)
goto out2;
} else {
- u8 cpu_type = get_this_hybrid_cpu_type();
+ u8 cpu_type = x86_pmu_get_this_hybrid_cpu_type();
struct x86_hybrid_pmu *hybrid_pmu;
int i, j;
- if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
- cpu_type = x86_pmu.get_hybrid_cpu_type();
-
for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
hybrid_pmu = &x86_pmu.hybrid_pmu[i];
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 217c5695cbb0..1d57cf0be806 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4267,6 +4267,9 @@ static int adl_hw_config(struct perf_event *event)
static u8 adl_get_hybrid_cpu_type(void)
{
+ if (boot_cpu_data.x86_model == INTEL_FAM6_ALDERLAKE_N)
+ return hybrid_small;
+
return hybrid_big;
}
@@ -4430,13 +4433,10 @@ static void flip_smm_bit(void *data)
static bool init_hybrid_pmu(int cpu)
{
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
- u8 cpu_type = get_this_hybrid_cpu_type();
+ u8 cpu_type = x86_pmu_get_this_hybrid_cpu_type();
struct x86_hybrid_pmu *pmu = NULL;
int i;
- if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
- cpu_type = x86_pmu.get_hybrid_cpu_type();
-
for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
if (x86_pmu.hybrid_pmu[i].cpu_type == cpu_type) {
pmu = &x86_pmu.hybrid_pmu[i];
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ba30f24bec41..c1bf7e6af6a0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1176,6 +1176,8 @@ void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu);
+u8 x86_pmu_get_this_hybrid_cpu_type(void);
+
extern struct event_constraint emptyconstraint;
extern struct event_constraint unconstrained;
--
2.35.1
On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
> The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is
> not set. The perf cannot retrieve the correct CPU type via
> get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is
> hardcode to p-core. The wrong CPU type is given to the PMU of the
> Alder Lake N.
If ADL-N isn't in fact a hybrid CPU, then *WHY* are we running
init_hybrid_pmu() and setting up all that nonsense?
That is, wouldn't the right thing be to remove ALDERLAKE_N from the rest
of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup for it?
On Fri, Aug 19, 2022 at 10:05:40AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
>
> > The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is
> > not set. The perf cannot retrieve the correct CPU type via
> > get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is
> > hardcode to p-core. The wrong CPU type is given to the PMU of the
> > Alder Lake N.
>
> If ADL-N isn't in fact a hybrid CPU, then *WHY* are we running
> init_hybrid_pmu() and setting up all that nonsense?
>
> That is, wouldn't the right thing be to remove ALDERLAKE_N from the rest
> of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup for it?
Something like the *completely* untested below.. which adds it like a
regular atom chip (which it is).
(I basically did copy/paste of tremont and added bits from the cpu_atom
thing from alderlake -- but might well have missed something)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2db93498ff71..e509f1033a2d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5974,6 +5974,38 @@ __init int intel_pmu_init(void)
name = "Tremont";
break;
+ case INTEL_FAM6_ALDERLAKE_N:
+ x86_pmu.mid_ack = true;
+ memcpy(hw_cache_event_ids, glp_hw_cache_event_ids,
+ sizeof(hw_cache_event_ids));
+ memcpy(hw_cache_extra_regs, tnt_hw_cache_extra_regs,
+ sizeof(hw_cache_extra_regs));
+ hw_cache_event_ids[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = -1;
+
+ intel_pmu_lbr_init_skl();
+
+ x86_pmu.event_constraints = intel_slm_event_constraints;
+ x86_pmu.pebs_constraints = intel_grt_pebs_event_constraints;
+ x86_pmu.extra_regs = intel_grt_extra_regs;
+ /*
+ * It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS
+ * for precise cycles.
+ */
+ x86_pmu.pebs_aliases = NULL;
+ x86_pmu.pebs_prec_dist = true;
+ x86_pmu.lbr_pt_coexist = true;
+ x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+ x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
+ x86_pmu.flags |= PMU_FL_PEBS_ALL;
+ x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
+ x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
+ x86_pmu.get_event_constraints = tnt_get_event_constraints;
+ td_attr = tnt_events_attrs;
+ extra_attr = slm_format_attr;
+ pr_cont("Gracemont events, ");
+ name = "Gracemont";
+ break;
+
case INTEL_FAM6_WESTMERE:
case INTEL_FAM6_WESTMERE_EP:
case INTEL_FAM6_WESTMERE_EX:
@@ -6318,7 +6350,6 @@ __init int intel_pmu_init(void)
case INTEL_FAM6_ALDERLAKE:
case INTEL_FAM6_ALDERLAKE_L:
- case INTEL_FAM6_ALDERLAKE_N:
case INTEL_FAM6_RAPTORLAKE:
case INTEL_FAM6_RAPTORLAKE_P:
/*
On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> For some Alder Lake N machine, the below unchecked MSR access error may be
> triggered.
>
> [ 0.088017] rcu: Hierarchical SRCU implementation.
> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
> [ 0.088017] Call Trace:
> [ 0.088017] <TASK>
> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
FWIW, I seem to get the same error when booting KVM on my ADL. I'm
fairly sure the whole CPUID vs vCPU thing is a trainwreck.
On 2022-08-19 4:33 a.m., Peter Zijlstra wrote:
> On Fri, Aug 19, 2022 at 10:05:40AM +0200, Peter Zijlstra wrote:
>> On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
>>
>>> The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is
>>> not set. The perf cannot retrieve the correct CPU type via
>>> get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is
>>> hardcode to p-core. The wrong CPU type is given to the PMU of the
>>> Alder Lake N.
>>
>> If ADL-N isn't in fact a hybrid CPU, then *WHY* are we running
>> init_hybrid_pmu() and setting up all that nonsense?
>>
>> That is, wouldn't the right thing be to remove ALDERLAKE_N from the rest
>> of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup for it?
>
I think the only issue should be the PMU name. The non-hybrid PMU name
is "cpu". The hybrid PMU name is "cpu_$coretype". If we move the
ALDERLAKE_N to the non-hybrid PMU, the PMU name will be changed from
"cpu_atom" to "cpu". It will be different from the rest of
{ALDER,RAPTOP}LAKE.
Also, I think we have to update the perf tool for the events because of
the PMU name change.
But I guess it should be OK, since the ALDERLAKE_N was just added and we
know its an Atom-only system.
> Something like the *completely* untested below.. which adds it like a
> regular atom chip (which it is).
I will do more tests and send out a V2.
Thanks,
Kan
>
> (I basically did copy/paste of tremont and added bits from the cpu_atom
> thing from alderlake -- but might well have missed something)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2db93498ff71..e509f1033a2d 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5974,6 +5974,38 @@ __init int intel_pmu_init(void)
> name = "Tremont";
> break;
>
> + case INTEL_FAM6_ALDERLAKE_N:
> + x86_pmu.mid_ack = true;
> + memcpy(hw_cache_event_ids, glp_hw_cache_event_ids,
> + sizeof(hw_cache_event_ids));
> + memcpy(hw_cache_extra_regs, tnt_hw_cache_extra_regs,
> + sizeof(hw_cache_extra_regs));
> + hw_cache_event_ids[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = -1;
> +
> + intel_pmu_lbr_init_skl();
> +
> + x86_pmu.event_constraints = intel_slm_event_constraints;
> + x86_pmu.pebs_constraints = intel_grt_pebs_event_constraints;
> + x86_pmu.extra_regs = intel_grt_extra_regs;
> + /*
> + * It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS
> + * for precise cycles.
> + */
> + x86_pmu.pebs_aliases = NULL;
> + x86_pmu.pebs_prec_dist = true;
> + x86_pmu.lbr_pt_coexist = true;
> + x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> + x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> + x86_pmu.flags |= PMU_FL_PEBS_ALL;
> + x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
> + x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
> + x86_pmu.get_event_constraints = tnt_get_event_constraints;
> + td_attr = tnt_events_attrs;
> + extra_attr = slm_format_attr;
> + pr_cont("Gracemont events, ");
> + name = "Gracemont";
> + break;
> +
> case INTEL_FAM6_WESTMERE:
> case INTEL_FAM6_WESTMERE_EP:
> case INTEL_FAM6_WESTMERE_EX:
> @@ -6318,7 +6350,6 @@ __init int intel_pmu_init(void)
>
> case INTEL_FAM6_ALDERLAKE:
> case INTEL_FAM6_ALDERLAKE_L:
> - case INTEL_FAM6_ALDERLAKE_N:
> case INTEL_FAM6_RAPTORLAKE:
> case INTEL_FAM6_RAPTORLAKE_P:
> /*
On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
> On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> For some Alder Lake N machine, the below unchecked MSR access error may be
>> triggered.
>>
>> [ 0.088017] rcu: Hierarchical SRCU implementation.
>> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
>> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
>> [ 0.088017] Call Trace:
>> [ 0.088017] <TASK>
>> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
>
> FWIW, I seem to get the same error when booting KVM on my ADL. I'm
> fairly sure the whole CPUID vs vCPU thing is a trainwreck.
We will enhance the CPUID to address the issues. Hopefully, we can have
them supported in the next generation.
Thanks,
Kan
On Mon, Aug 22, 2022 at 09:24:57AM -0400, Liang, Kan wrote:
> I think the only issue should be the PMU name. The non-hybrid PMU name
> is "cpu". The hybrid PMU name is "cpu_$coretype". If we move the
> ALDERLAKE_N to the non-hybrid PMU, the PMU name will be changed from
> "cpu_atom" to "cpu". It will be different from the rest of
> {ALDER,RAPTOP}LAKE.
>
> Also, I think we have to update the perf tool for the events because of
> the PMU name change.
>
> But I guess it should be OK, since the ALDERLAKE_N was just added and we
> know its an Atom-only system.
cpu/caps/pmu_name should be 'Gracemont', which is exactly like all the
other !hybrid setups. Surely perf-tools already knows about this
pattern.
IOW, if you need to change perf-tools for this, someone did something
wrong somewhere.
(also, I just noticed, 'Tremont' is the *only* PMU that has a
capitalized name, perhaps we don't want Gracemont to follow but instead
fix tremont if that is still possible)
On Mon, Aug 22, 2022 at 09:28:31AM -0400, Liang, Kan wrote:
>
>
> On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
> > On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
> >> From: Kan Liang <[email protected]>
> >>
> >> For some Alder Lake N machine, the below unchecked MSR access error may be
> >> triggered.
> >>
> >> [ 0.088017] rcu: Hierarchical SRCU implementation.
> >> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
> >> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
> >> [ 0.088017] Call Trace:
> >> [ 0.088017] <TASK>
> >> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
> >
> > FWIW, I seem to get the same error when booting KVM on my ADL. I'm
> > fairly sure the whole CPUID vs vCPU thing is a trainwreck.
>
> We will enhance the CPUID to address the issues. Hopefully, we can have
> them supported in the next generation.
How!? A vCPU can readily migrate between a big and small CPU. There is
no way the guest can sanely program the (v)MSRs and expect it to work.
On 8/22/2022 3:48 PM, Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 09:28:31AM -0400, Liang, Kan wrote:
>>
>> On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
>>> On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
>>>> From: Kan Liang <[email protected]>
>>>>
>>>> For some Alder Lake N machine, the below unchecked MSR access error may be
>>>> triggered.
>>>>
>>>> [ 0.088017] rcu: Hierarchical SRCU implementation.
>>>> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
>>>> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
>>>> [ 0.088017] Call Trace:
>>>> [ 0.088017] <TASK>
>>>> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
>>> FWIW, I seem to get the same error when booting KVM on my ADL. I'm
>>> fairly sure the whole CPUID vs vCPU thing is a trainwreck.
>> We will enhance the CPUID to address the issues. Hopefully, we can have
>> them supported in the next generation.
> How!? A vCPU can readily migrate between a big and small CPU. There is
> no way the guest can sanely program the (v)MSRs and expect it to work.
In principle this can be fixed by affinitizing the vcpus to their
respective type and reporting the right type, and I thought qemu was
supported to support this. But it would be certainly a much more complex
command line.
If you don't do this, architectural events should work, but yes any non
architectural will not count correctly.
I guess one way to detect this situation would be if the CPUID is
Alderlake, but there is no hybrid support reported in CPUID. Then it's
likely a situation like this and it could be special cased in the perf
tools and only show a limited event list.
-Andi
On Mon, Aug 22, 2022 at 09:28:31AM -0400, Liang, Kan wrote:
>
>
> On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
> > On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
> >> From: Kan Liang <[email protected]>
> >>
> >> For some Alder Lake N machine, the below unchecked MSR access error may be
> >> triggered.
> >>
> >> [ 0.088017] rcu: Hierarchical SRCU implementation.
> >> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
> >> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
> >> [ 0.088017] Call Trace:
> >> [ 0.088017] <TASK>
> >> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
> >
> > FWIW, I seem to get the same error when booting KVM on my ADL. I'm
> > fairly sure the whole CPUID vs vCPU thing is a trainwreck.
>
> We will enhance the CPUID to address the issues. Hopefully, we can have
> them supported in the next generation.
>
How about this?
---
[ 0.170231] Performance Events: Alderlake Hybrid events, full-width counters, Intel PMU driver.
[ 0.171420] core: hybrid PMU and virt are incompatible
arch/x86/events/intel/core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2db93498ff71..232e24324fd7 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4473,6 +4473,11 @@ static bool init_hybrid_pmu(int cpu)
struct x86_hybrid_pmu *pmu = NULL;
int i;
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ pr_warn_once("hybrid PMU and virt are incompatible\n");
+ return false;
+ }
+
if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
cpu_type = x86_pmu.get_hybrid_cpu_type();
On 2022-08-22 10:39 a.m., Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 09:28:31AM -0400, Liang, Kan wrote:
>>
>>
>> On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
>>> On Thu, Aug 18, 2022 at 11:15:30AM -0700, [email protected] wrote:
>>>> From: Kan Liang <[email protected]>
>>>>
>>>> For some Alder Lake N machine, the below unchecked MSR access error may be
>>>> triggered.
>>>>
>>>> [ 0.088017] rcu: Hierarchical SRCU implementation.
>>>> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
>>>> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
>>>> [ 0.088017] Call Trace:
>>>> [ 0.088017] <TASK>
>>>> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
>>>
>>> FWIW, I seem to get the same error when booting KVM on my ADL. I'm
>>> fairly sure the whole CPUID vs vCPU thing is a trainwreck.
>>
>> We will enhance the CPUID to address the issues. Hopefully, we can have
>> them supported in the next generation.
>>
>
> How about this?
>
> ---
> [ 0.170231] Performance Events: Alderlake Hybrid events, full-width counters, Intel PMU driver.
> [ 0.171420] core: hybrid PMU and virt are incompatible
>
>
> arch/x86/events/intel/core.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2db93498ff71..232e24324fd7 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4473,6 +4473,11 @@ static bool init_hybrid_pmu(int cpu)
> struct x86_hybrid_pmu *pmu = NULL;
> int i;
>
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + pr_warn_once("hybrid PMU and virt are incompatible\n");
> + return false;
> + }
I would expect that KVM exposes the enhanced CPUID to the guest. The
guest vCPU should knows its specific CPU type. The KVM should bind the
vCPU to the same type of CPUs.
Before KVM provides such support, I guess it may be OK to have the
warning. Maybe more specifically, only architecture events work.
Thanks,
Kan
> +
> if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
> cpu_type = x86_pmu.get_hybrid_cpu_type();
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2db93498ff71..232e24324fd7 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4473,6 +4473,11 @@ static bool init_hybrid_pmu(int cpu)
> struct x86_hybrid_pmu *pmu = NULL;
> int i;
>
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + pr_warn_once("hybrid PMU and virt are incompatible\n");
> + return false;
> + }
It's totally possible to virtualize hybrid correctly, so I don't think
this is justified
-Andi
On 2022-08-22 9:55 a.m., Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 09:24:57AM -0400, Liang, Kan wrote:
>> I think the only issue should be the PMU name. The non-hybrid PMU name
>> is "cpu". The hybrid PMU name is "cpu_$coretype". If we move the
>> ALDERLAKE_N to the non-hybrid PMU, the PMU name will be changed from
>> "cpu_atom" to "cpu". It will be different from the rest of
>> {ALDER,RAPTOP}LAKE.
>>
>> Also, I think we have to update the perf tool for the events because of
>> the PMU name change.
>>
>> But I guess it should be OK, since the ALDERLAKE_N was just added and we
>> know its an Atom-only system.
>
> cpu/caps/pmu_name should be 'Gracemont', which is exactly like all the
> other !hybrid setups. Surely perf-tools already knows about this
> pattern.
>
> IOW, if you need to change perf-tools for this, someone did something
> wrong somewhere.
The event list for ADL and RPL is different from the non-hybrid
platforms. We combine the events from big core and small core into a
single file and use the PMU name to distinguish from them. The PMU name
is either cpu_core or cpu_atom.
If we change the ADL-N to non-hybrid, the simplest way is to create a
dedicate gracemont event list. Or we have to specially handle the ADL-N
in the parsing codes. We have to update the tool for either way.
>
> (also, I just noticed, 'Tremont' is the *only* PMU that has a
> capitalized name, perhaps we don't want Gracemont to follow but instead
> fix tremont if that is still possible)
I don't think the tool rely on the name under cpu/caps/pmu_name.
The event list rely on the CPU model number.
It should be OK to fix the tremont name.
Thanks,
Kan
On Mon, Aug 22, 2022 at 05:08:55PM +0200, Andi Kleen wrote:
>
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 2db93498ff71..232e24324fd7 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -4473,6 +4473,11 @@ static bool init_hybrid_pmu(int cpu)
> > struct x86_hybrid_pmu *pmu = NULL;
> > int i;
> > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> > + pr_warn_once("hybrid PMU and virt are incompatible\n");
> > + return false;
> > + }
>
> It's totally possible to virtualize hybrid correctly, so I don't think this
> is justified
With a magic incantation and a sacrificial chicken sure, but the typical
guest will not have it set up right and we'll get the kernel doing
*splat*.
So I'm going to keep this and then some virt person can provide us a new
flag for when they think the hypervisor did the right thing and exclude
themselves from this ban.
On Mon, Aug 22, 2022 at 10:59:13AM -0400, Liang, Kan wrote:
> > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> > + pr_warn_once("hybrid PMU and virt are incompatible\n");
> > + return false;
> > + }
>
> I would expect that KVM exposes the enhanced CPUID to the guest. The
> guest vCPU should knows its specific CPU type. The KVM should bind the
> vCPU to the same type of CPUs.
>
> Before KVM provides such support, I guess it may be OK to have the
> warning. Maybe more specifically, only architecture events work.
Well, as is I randomly get #GPs when I boot a guest kernel.
The QEMU thing is passing in Core CPUID to all vCPUs (because per luck
it starts on a biggie), but if the vCPU lands on a small then the
PERF_GLOBAL_CTRL write will #GP because biggie has more bits set than
small knows how to deal with:
0001000f000000ff vs 000000070000003f, or something like that.
So I'm thinking we should entirely kill the thing by default and allow
KVM to set some magical bit when it knows the CPUID and affinity crud is
just right to have it maybe work. But that's for some virt person to
sort out...
On Mon, Aug 22, 2022, Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 05:08:55PM +0200, Andi Kleen wrote:
> >
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 2db93498ff71..232e24324fd7 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -4473,6 +4473,11 @@ static bool init_hybrid_pmu(int cpu)
> > > struct x86_hybrid_pmu *pmu = NULL;
> > > int i;
> > > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> > > + pr_warn_once("hybrid PMU and virt are incompatible\n");
> > > + return false;
> > > + }
> >
> > It's totally possible to virtualize hybrid correctly, so I don't think this
> > is justified
>
> With a magic incantation and a sacrificial chicken sure,
Pretty sure this one requires at least a goat.
> but the typical guest will not have it set up right and we'll get the kernel
> doing *splat*.
I 100% agree that typical VMMs will not get this right, but at the same time I
think this is firmly a host _kernel_ bug.
Checking X86_FEATURE_HYPERVISOR in the guest won't handle things like trying to
run a non-hyrbid vCPU model on a hybrid CPU, because IIUC, the "is_hybrid()" is
purely based on FMS, i.e. will be false if someone enumerates a big core vCPU on
a hybrid CPU.
So until KVM gets sane handling, shouldn't this be?
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f969410d0c90..0a8accfc3018 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2999,12 +2999,8 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
}
cap->version = x86_pmu.version;
- /*
- * KVM doesn't support the hybrid PMU yet.
- * Return the common value in global x86_pmu,
- * which available for all cores.
- */
- cap->num_counters_gp = x86_pmu.num_counters;
+ /* KVM doesn't support the hybrid PMU yet. */
+ cap->num_counters_gp = is_hybrid() ? 0 : x86_pmu.num_counters;
cap->num_counters_fixed = x86_pmu.num_counters_fixed;
cap->bit_width_gp = x86_pmu.cntval_bits;
cap->bit_width_fixed = x86_pmu.cntval_bits;
>
> Checking X86_FEATURE_HYPERVISOR in the guest won't handle things like trying to
> run a non-hyrbid vCPU model on a hybrid CPU, because IIUC, the "is_hybrid()" is
> purely based on FMS, i.e. will be false if someone enumerates a big core vCPU on
> a hybrid CPU.
>
> So until KVM gets sane handling, shouldn't this be?
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index f969410d0c90..0a8accfc3018 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2999,12 +2999,8 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
> }
>
> cap->version = x86_pmu.version;
> - /*
> - * KVM doesn't support the hybrid PMU yet.
> - * Return the common value in global x86_pmu,
> - * which available for all cores.
> - */
> - cap->num_counters_gp = x86_pmu.num_counters;
> + /* KVM doesn't support the hybrid PMU yet. */
> + cap->num_counters_gp = is_hybrid() ? 0 : x86_pmu.num_counters;
That's just the PMU. Arguably if you don't handle hybrid affinity you
shouldn't report the hybrid bit ever to the guest. So need more than that.
But I guess Peter is concerned about the case when an old KVM is the
host. I think a short term workaround for that is fine, but I don't
think it's a good idea to completely disable it since that will break
future setups with correct hybrid hypervisor too. We already probe the
PMU MSRs, can't we detect this case there too?
-Andi