When running the pmu test on Intel Sapphire Rapids, we encountered
several failures, such as the "llc misses" failure, the "all counters"
failure and the "fixed counter 3" failure.
Intel Sapphire Rapids introduces a new fixed counter 3, raises the total
number of PMU counters (GP plus fixed) to 12, and also optimizes the
cache subsystem. All these changes invalidate the original assumptions
of the pmu test on Sapphire Rapids. Patches 1, 3 and 4 fix these
failures, patch 2 removes duplicate code and patch 5 adds asserts to
ensure the predefined fixed events match the HW fixed counters.
Dapeng Mi (4):
x86: pmu: Change the minimum value of llc_misses event to 0
x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
x86: pmu: Support validation for Intel PMU fixed counter 3
x86: pmu: Add asserts to warn inconsistent fixed events and counters
Xiong Zhang (1):
x86: pmu: Remove duplicate code in pmu_init()
lib/x86/pmu.c | 5 -----
x86/pmu.c | 17 ++++++++++++-----
2 files changed, 12 insertions(+), 10 deletions(-)
base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
--
2.34.1
As CPU hardware is upgraded and optimized, the count of the LLC misses
event for the loop() helper can be 0, as already seen on Sapphire
Rapids.
So lower the bottom of the possible count range for the LLC misses
event to 0 to avoid LLC misses event test failures on Sapphire Rapids.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 0def28695c70..7443fdab5c8a 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -35,7 +35,7 @@ struct pmu_event {
{"instructions", 0x00c0, 10*N, 10.2*N},
{"ref cycles", 0x013c, 1*N, 30*N},
{"llc references", 0x4f2e, 1, 2*N},
- {"llc misses", 0x412e, 1, 1*N},
+ {"llc misses", 0x412e, 0, 1*N},
{"branches", 0x00c4, 1*N, 1.1*N},
{"branch misses", 0x00c5, 0, 0.1*N},
}, amd_gp_events[] = {
--
2.34.1
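[For context, each sub-test compares the measured count against the
min/max bounds in the table above. A simplified sketch of that check,
modeled on the verify_event() logic in x86/pmu.c (field names per the
struct in the diff; treat this as illustrative, not the exact upstream
code), shows why a lower bound of 0 lets the sub-test pass even when
the count is zero:

/* Simplified sketch of the per-event bounds check in x86/pmu.c. */
static bool verify_event(uint64_t count, struct pmu_event *e)
{
	return count >= e->min && count <= e->max;
}

With min == 0, a count of 0 satisfies the check, which is exactly the
concern raised in the review below.]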
From: Xiong Zhang <[email protected]>
There is an exactly duplicated code block in the pmu_init() helper;
remove the duplicate code.
Signed-off-by: Xiong Zhang <[email protected]>
Signed-off-by: Dapeng Mi <[email protected]>
---
lib/x86/pmu.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 0f2afd650bc9..d06e94553024 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -16,11 +16,6 @@ void pmu_init(void)
pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
}
- if (pmu.version > 1) {
- pmu.nr_fixed_counters = cpuid_10.d & 0x1f;
- pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
- }
-
pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
pmu.gp_counter_width = (cpuid_10.a >> 16) & 0xff;
pmu.gp_counter_mask_length = (cpuid_10.a >> 24) & 0xff;
--
2.34.1
Considering there are already 8 GP counters and 4 fixed counters on the
latest Intel CPUs, such as Sapphire Rapids, the original cnt array
length of 10 is definitely not enough to cover all supported PMU
counters on these new CPUs and would cause PMU counter validation
failures.
Since more GP and fixed counters will probably be introduced in the
future, directly extend the cnt array length to 64.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 7443fdab5c8a..1bebf493d4a4 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -254,7 +254,7 @@ static void check_fixed_counters(void)
static void check_counters_many(void)
{
- pmu_counter_t cnt[10];
+ pmu_counter_t cnt[64];
int i, n;
for (i = 0, n = 0; n < pmu.nr_gp_counters; i++) {
--
2.34.1
Newer Intel CPUs, such as Sapphire Rapids, introduce a new fixed
counter (fixed counter 3) to count/sample the topdown.slots event, but
the current code still doesn't cover this new fixed counter.
So add code to validate this new fixed counter.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 1bebf493d4a4..41165e168d8e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -46,7 +46,8 @@ struct pmu_event {
}, fixed_events[] = {
{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
- {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
+ {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N},
+ {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 100*N}
};
char *buf;
--
2.34.1
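[For background, exercising a fixed counter only requires programming
two control MSRs. A hedged sketch of how fixed counter i can be enabled
and read (the helper name is illustrative, not the test's exact code;
MSR names as in the patch above; per the SDM, IA32_FIXED_CTR_CTRL holds
a 4-bit control field per fixed counter and IA32_PERF_GLOBAL_CTRL
enables fixed counter i via bit (32 + i)):

/* Illustrative sketch: count in OS and USR mode on fixed counter 'i',
 * run the measured workload, then read the count back. */
static uint64_t measure_fixed(int i)
{
	wrmsr(MSR_CORE_PERF_FIXED_CTR0 + i, 0);
	wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, 0x3ull << (4 * i));
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 1ull << (32 + i));
	loop();	/* the workload being measured */
	wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
	return rdmsr(MSR_CORE_PERF_FIXED_CTR0 + i);
}

The new fixed_events[] entry simply defines the expected count range
for this sequence when i == 3.]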
The current PMU code doesn't check whether the number of PMU fixed
counters is larger than the number of pre-defined fixed events. If it
is, memory accesses go out of range.
So add asserts to warn about this invalid case.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 41165e168d8e..a1d615fbab3d 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -112,8 +112,12 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
for (i = 0; i < gp_events_size; i++)
if (gp_events[i].unit_sel == (cnt->config & 0xffff))
return &gp_events[i];
- } else
- return &fixed_events[cnt->ctr - MSR_CORE_PERF_FIXED_CTR0];
+ } else {
+ int idx = cnt->ctr - MSR_CORE_PERF_FIXED_CTR0;
+
+ assert(idx < ARRAY_SIZE(fixed_events));
+ return &fixed_events[idx];
+ }
return (void*)0;
}
@@ -246,6 +250,7 @@ static void check_fixed_counters(void)
};
int i;
+ assert(pmu.nr_fixed_counters <= ARRAY_SIZE(fixed_events));
for (i = 0; i < pmu.nr_fixed_counters; i++) {
cnt.ctr = fixed_events[i].unit_sel;
measure_one(&cnt);
@@ -267,6 +272,7 @@ static void check_counters_many(void)
gp_events[i % gp_events_size].unit_sel;
n++;
}
+ assert(pmu.nr_fixed_counters <= ARRAY_SIZE(fixed_events));
for (i = 0; i < pmu.nr_fixed_counters; i++) {
cnt[n].ctr = fixed_events[i].unit_sel;
cnt[n].config = EVNTSEL_OS | EVNTSEL_USR;
--
2.34.1
On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>
> Along with the CPU HW's upgrade and optimization, the count of LLC
> misses event for running loop() helper could be 0 just like seen on
> Sapphire Rapids.
>
> So modify the lower limit of possible count range for LLC misses
> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
I'm not convinced that these tests are really indicative of whether or
not the PMU is working properly. If 0 is allowed for llc misses, for
instance, doesn't this sub-test pass even when the PMU is disabled?
Surely, we can do better.
> Signed-off-by: Dapeng Mi <[email protected]>
> ---
> x86/pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/x86/pmu.c b/x86/pmu.c
> index 0def28695c70..7443fdab5c8a 100644
> --- a/x86/pmu.c
> +++ b/x86/pmu.c
> @@ -35,7 +35,7 @@ struct pmu_event {
> {"instructions", 0x00c0, 10*N, 10.2*N},
> {"ref cycles", 0x013c, 1*N, 30*N},
> {"llc references", 0x4f2e, 1, 2*N},
> - {"llc misses", 0x412e, 1, 1*N},
> + {"llc misses", 0x412e, 0, 1*N},
> {"branches", 0x00c4, 1*N, 1.1*N},
> {"branch misses", 0x00c5, 0, 0.1*N},
> }, amd_gp_events[] = {
> --
> 2.34.1
>
On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>
> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> (fixed counter 3) to counter/sample topdown.slots event, but current
> code still doesn't cover this new fixed counter.
>
> So add code to validate this new fixed counter.
Can you explain how this "validates" anything?
> Signed-off-by: Dapeng Mi <[email protected]>
> ---
> x86/pmu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/x86/pmu.c b/x86/pmu.c
> index 1bebf493d4a4..41165e168d8e 100644
> --- a/x86/pmu.c
> +++ b/x86/pmu.c
> @@ -46,7 +46,8 @@ struct pmu_event {
> }, fixed_events[] = {
> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
> - {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
> + {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N},
> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 100*N}
> };
>
> char *buf;
> --
> 2.34.1
>
On 10/24/2023 9:03 PM, Jim Mattson wrote:
> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>> Along with the CPU HW's upgrade and optimization, the count of LLC
>> misses event for running loop() helper could be 0 just like seen on
>> Sapphire Rapids.
>>
>> So modify the lower limit of possible count range for LLC misses
>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> I'm not convinced that these tests are really indicative of whether or
> not the PMU is working properly. If 0 is allowed for llc misses, for
> instance, doesn't this sub-test pass even when the PMU is disabled?
>
> Surely, we can do better.
Considering the testing workload is just a simple adding loop, it's
reasonable and possible that it gets a 0 count for the LLC misses and
branch misses events. Yeah, I agree the 0 count makes the results less
credible. If we want to avoid these 0 counts, we may have to complicate
the workload, such as adding cache-flush instructions or something like
that (I'm not sure if there are instructions which can force branch
misses). What's your idea about this?
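[To make the idea concrete, a hedged sketch of such a complicated
workload, assuming 64-byte cache lines; 'buf', 'size' and the helper
name are hypothetical, not existing test code:

/* Illustrative only: guarantee LLC misses by flushing each line and
 * then reloading it. */
static void flush_and_reload(volatile char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i += 64) {	/* assume 64-byte lines */
		asm volatile("clflush (%0)" : : "r"(&buf[i]) : "memory");
		asm volatile("mfence" : : : "memory");
		(void)buf[i];	/* this load must miss */
	}
}

The mfence orders the flush before the reload so the load cannot hit a
still-resident line.]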
>
>> Signed-off-by: Dapeng Mi <[email protected]>
>> ---
>> x86/pmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/x86/pmu.c b/x86/pmu.c
>> index 0def28695c70..7443fdab5c8a 100644
>> --- a/x86/pmu.c
>> +++ b/x86/pmu.c
>> @@ -35,7 +35,7 @@ struct pmu_event {
>> {"instructions", 0x00c0, 10*N, 10.2*N},
>> {"ref cycles", 0x013c, 1*N, 30*N},
>> {"llc references", 0x4f2e, 1, 2*N},
>> - {"llc misses", 0x412e, 1, 1*N},
>> + {"llc misses", 0x412e, 0, 1*N},
>> {"branches", 0x00c4, 1*N, 1.1*N},
>> {"branch misses", 0x00c5, 0, 0.1*N},
>> }, amd_gp_events[] = {
>> --
>> 2.34.1
>>
On 10/25/2023 3:05 AM, Jim Mattson wrote:
> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>> (fixed counter 3) to counter/sample topdown.slots event, but current
>> code still doesn't cover this new fixed counter.
>>
>> So add code to validate this new fixed counter.
> Can you explain how this "validates" anything?
I may not have described it clearly. This would validate that fixed
counter 3 can count the slots event and produce a count in a
reasonable range. Thanks.
>
>> Signed-off-by: Dapeng Mi <[email protected]>
>> ---
>> x86/pmu.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/x86/pmu.c b/x86/pmu.c
>> index 1bebf493d4a4..41165e168d8e 100644
>> --- a/x86/pmu.c
>> +++ b/x86/pmu.c
>> @@ -46,7 +46,8 @@ struct pmu_event {
>> }, fixed_events[] = {
>> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
>> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
>> - {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
>> + {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N},
>> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 100*N}
>> };
>>
>> char *buf;
>> --
>> 2.34.1
>>
On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <[email protected]> wrote:
>
>
> On 10/24/2023 9:03 PM, Jim Mattson wrote:
> > On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
> >> Along with the CPU HW's upgrade and optimization, the count of LLC
> >> misses event for running loop() helper could be 0 just like seen on
> >> Sapphire Rapids.
> >>
> >> So modify the lower limit of possible count range for LLC misses
> >> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> > I'm not convinced that these tests are really indicative of whether or
> > not the PMU is working properly. If 0 is allowed for llc misses, for
> > instance, doesn't this sub-test pass even when the PMU is disabled?
> >
> > Surely, we can do better.
>
>
> Considering the testing workload is just a simple adding loop, it's
> reasonable and possible that it gets a 0 result for LLC misses and
> branch misses events. Yeah, I agree the 0 count makes the results not so
> credible. If we want to avoid these 0 count values, we may have to
> complicate the workload, such as adding flush cache instructions, or
> something like that (I'm not sure if there are instructions which can
> force branch misses). How's your idea about this?
CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
good way to ensure branch mispredictions, or IBRS on parts without
eIBRS.
>
> >
> >> Signed-off-by: Dapeng Mi <[email protected]>
> >> ---
> >> x86/pmu.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/x86/pmu.c b/x86/pmu.c
> >> index 0def28695c70..7443fdab5c8a 100644
> >> --- a/x86/pmu.c
> >> +++ b/x86/pmu.c
> >> @@ -35,7 +35,7 @@ struct pmu_event {
> >> {"instructions", 0x00c0, 10*N, 10.2*N},
> >> {"ref cycles", 0x013c, 1*N, 30*N},
> >> {"llc references", 0x4f2e, 1, 2*N},
> >> - {"llc misses", 0x412e, 1, 1*N},
> >> + {"llc misses", 0x412e, 0, 1*N},
> >> {"branches", 0x00c4, 1*N, 1.1*N},
> >> {"branch misses", 0x00c5, 0, 0.1*N},
> >> }, amd_gp_events[] = {
> >> --
> >> 2.34.1
> >>
On Wed, Oct 25, 2023 at 4:26 AM Mi, Dapeng <[email protected]> wrote:
>
>
> On 10/25/2023 3:05 AM, Jim Mattson wrote:
> > On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
> >> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> >> (fixed counter 3) to counter/sample topdown.slots event, but current
> >> code still doesn't cover this new fixed counter.
> >>
> >> So add code to validate this new fixed counter.
> > Can you explain how this "validates" anything?
>
>
> I may not describe the sentence clearly. This would validate the fixed
> counter 3 can count the slots event and get a valid count in a
> reasonable range. Thanks.
I thought the current vPMU implementation did not actually support
top-down slots. If it doesn't work, how can it be validated?
>
> >
> >> Signed-off-by: Dapeng Mi <[email protected]>
> >> ---
> >> x86/pmu.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/x86/pmu.c b/x86/pmu.c
> >> index 1bebf493d4a4..41165e168d8e 100644
> >> --- a/x86/pmu.c
> >> +++ b/x86/pmu.c
> >> @@ -46,7 +46,8 @@ struct pmu_event {
> >> }, fixed_events[] = {
> >> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
> >> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
> >> - {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
> >> + {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N},
> >> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 100*N}
> >> };
> >>
> >> char *buf;
> >> --
> >> 2.34.1
> >>
On Tue, Oct 24, 2023, Dapeng Mi wrote:
> When running pmu test on Intel Sapphire Rapids, we found several
> failures are encountered, such as "llc misses" failure, "all counters"
> failure and "fixed counter 3" failure.
hmm, I have tested your series on an SPR machine. It looks like all
"llc misses" tests already pass on my side. "all counters" always fails
with/without your patches. "fixed counter 3" never exists... I have
"fixed cntr-{0,1,2}" and "fixed-{0,1,2}".
You may want to double-check the requirements of your series rather
than just testing under your own setup without explaining those
settings in detail.
Maybe what I am missing is your topdown series? So, before your topdown
series is checked in, I don't see value in this series.
Thanks.
-Mingwei
>
> Intel Sapphire Rapids introduces new fixed counter 3, total PMU counters
> including GP and fixed counters increase to 12 and also optimizes cache
> subsystem. All these changes make the original assumptions in pmu test
> unavailable any more on Sapphire Rapids. Patches 2-4 fixes these
> failures, patch 0 remove the duplicate code and patch 5 adds assert to
> ensure predefine fixed events are matched with HW fixed counters.
>
> Dapeng Mi (4):
> x86: pmu: Change the minimum value of llc_misses event to 0
> x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
> x86: pmu: Support validation for Intel PMU fixed counter 3
> x86: pmu: Add asserts to warn inconsistent fixed events and counters
>
> Xiong Zhang (1):
> x86: pmu: Remove duplicate code in pmu_init()
>
> lib/x86/pmu.c | 5 -----
> x86/pmu.c | 17 ++++++++++++-----
> 2 files changed, 12 insertions(+), 10 deletions(-)
>
>
> base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
> --
> 2.34.1
>
On 10/25/2023 8:35 PM, Jim Mattson wrote:
> On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <[email protected]> wrote:
>>
>> On 10/24/2023 9:03 PM, Jim Mattson wrote:
>>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>>>> Along with the CPU HW's upgrade and optimization, the count of LLC
>>>> misses event for running loop() helper could be 0 just like seen on
>>>> Sapphire Rapids.
>>>>
>>>> So modify the lower limit of possible count range for LLC misses
>>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
>>> I'm not convinced that these tests are really indicative of whether or
>>> not the PMU is working properly. If 0 is allowed for llc misses, for
>>> instance, doesn't this sub-test pass even when the PMU is disabled?
>>>
>>> Surely, we can do better.
>>
>> Considering the testing workload is just a simple adding loop, it's
>> reasonable and possible that it gets a 0 result for LLC misses and
>> branch misses events. Yeah, I agree the 0 count makes the results not so
>> credible. If we want to avoid these 0 count values, we may have to
>> complicate the workload, such as adding flush cache instructions, or
>> something like that (I'm not sure if there are instructions which can
>> force branch misses). How's your idea about this?
> CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
> good way to ensure branch mispredictions, or IBRS on parts without
> eIBRS.
Thanks Jim for the information. I'm not familiar with IBPB/IBRS, but at
a glance these two look like some kind of advanced features. Not all
Intel CPUs support them, and I'm not sure if AMD has similar
capabilities. It would be better if there were a more generic
instruction to trigger branch misses. Anyway, I will look at the
details and come back again.
>>>> Signed-off-by: Dapeng Mi <[email protected]>
>>>> ---
>>>> x86/pmu.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>>> index 0def28695c70..7443fdab5c8a 100644
>>>> --- a/x86/pmu.c
>>>> +++ b/x86/pmu.c
>>>> @@ -35,7 +35,7 @@ struct pmu_event {
>>>> {"instructions", 0x00c0, 10*N, 10.2*N},
>>>> {"ref cycles", 0x013c, 1*N, 30*N},
>>>> {"llc references", 0x4f2e, 1, 2*N},
>>>> - {"llc misses", 0x412e, 1, 1*N},
>>>> + {"llc misses", 0x412e, 0, 1*N},
>>>> {"branches", 0x00c4, 1*N, 1.1*N},
>>>> {"branch misses", 0x00c5, 0, 0.1*N},
>>>> }, amd_gp_events[] = {
>>>> --
>>>> 2.34.1
>>>>
On 10/25/2023 8:38 PM, Jim Mattson wrote:
> On Wed, Oct 25, 2023 at 4:26 AM Mi, Dapeng <[email protected]> wrote:
>>
>> On 10/25/2023 3:05 AM, Jim Mattson wrote:
>>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>>>> (fixed counter 3) to counter/sample topdown.slots event, but current
>>>> code still doesn't cover this new fixed counter.
>>>>
>>>> So add code to validate this new fixed counter.
>>> Can you explain how this "validates" anything?
>>
>> I may not describe the sentence clearly. This would validate the fixed
>> counter 3 can count the slots event and get a valid count in a
>> reasonable range. Thanks.
> I thought the current vPMU implementation did not actually support
> top-down slots. If it doesn't work, how can it be validated?
Oops, you remind me that I made a mistake: the kernel I used includes
the vtopdown supporting patches, so the topdown slots event is
supported. Since there were big arguments over the original vtopdown
RFC patches, the topdown metrics feature is probably not going to be
supported in the current vPMU emulation framework, but the slots event
support patches (the former two patches
https://lore.kernel.org/all/[email protected]/T/#m53883e39177eb9a0d8e23e4c382ddc6190c7f0f4
and
https://lore.kernel.org/all/[email protected]/T/#m1d9c433eb6ce83b32e50f6d976fbfeee2b731fb9)
are still valuable; they are just a small piece of work and don't touch
any perf code. I'd like to split these two patches into an independent
patchset and resend it to LKML.
>
>>>> Signed-off-by: Dapeng Mi <[email protected]>
>>>> ---
>>>> x86/pmu.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>>> index 1bebf493d4a4..41165e168d8e 100644
>>>> --- a/x86/pmu.c
>>>> +++ b/x86/pmu.c
>>>> @@ -46,7 +46,8 @@ struct pmu_event {
>>>> }, fixed_events[] = {
>>>> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
>>>> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
>>>> - {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
>>>> + {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N},
>>>> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 100*N}
>>>> };
>>>>
>>>> char *buf;
>>>> --
>>>> 2.34.1
>>>>
On 10/26/2023 7:47 AM, Mingwei Zhang wrote:
> On Tue, Oct 24, 2023, Dapeng Mi wrote:
>> When running pmu test on Intel Sapphire Rapids, we found several
>> failures are encountered, such as "llc misses" failure, "all counters"
>> failure and "fixed counter 3" failure.
> hmm, I have tested your series on a SPR machine. It looks like, all "llc
> misses" already pass on my side. "all counters" always fail with/without
> your patches. "fixed counter 3" never exists... I have "fixed
> cntr-{0,1,2}" and "fixed-{0,1,2}"
1. "LLC misses" failure
Yeah, the "LLC misses" failure is not always seen. I can see the "LLC
misses" 2 ~3 times out of 10 runs of PMU standalone test and you could
see the failure with higher possibility if you run the full
kvm-unit-tests. I think whether you can see the "LLC misses" failure it
really depends on current cache status on your system, how much cache
memory are consumed by other programs. If there are lots of free cache
lines on system when running the pmu test, you may have higher
possibility to see the LLC misses failures just like what I see below.
PASS: Intel: llc references-7
*FAIL*: Intel: llc misses-0
PASS: Intel: llc misses-1
PASS: Intel: llc misses-2
2. "all counters" failure
Actually the "all counters" failure are not always seen, but it doesn't
mean current code is correct. In current code, the length of "cnt[10]"
array in check_counters_many() is defined as 10, but there are at least
11 counters supported (8 GP counters + 3 fixed counters) on SPR even
though fixed counter 3 is not supported in current upstream code.
Obviously there would be out of range memory access in
check_counters_many().
>
> You may want to double check the requirements of your series. Not just
> under your setting without explainning those setting in detail.
>
> Maybe what I am missing is your topdown series? So, before your topdown
> series checked in. I don't see value in this series.
3. "fixed counter 3" failure
Yeah, I just realized I used the kernel which includes the vtopdown
supporting patches after Jim's reminding. As the reply for Jim's
comments says, the patches for support slots event are still valuable
for current emulation framework and I would split them from the original
vtopdown patchset and resend them as an independent patchset. Anyway,
even though there is not slots event support in Kernel, it only impacts
the patch 4/5, other patches are still valuable.
>
> Thanks.
> -Mingwei
>> Intel Sapphire Rapids introduces new fixed counter 3, total PMU counters
>> including GP and fixed counters increase to 12 and also optimizes cache
>> subsystem. All these changes make the original assumptions in pmu test
>> unavailable any more on Sapphire Rapids. Patches 2-4 fixes these
>> failures, patch 0 remove the duplicate code and patch 5 adds assert to
>> ensure predefine fixed events are matched with HW fixed counters.
>>
>> Dapeng Mi (4):
>> x86: pmu: Change the minimum value of llc_misses event to 0
>> x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
>> x86: pmu: Support validation for Intel PMU fixed counter 3
>> x86: pmu: Add asserts to warn inconsistent fixed events and counters
>>
>> Xiong Zhang (1):
>> x86: pmu: Remove duplicate code in pmu_init()
>>
>> lib/x86/pmu.c | 5 -----
>> x86/pmu.c | 17 ++++++++++++-----
>> 2 files changed, 12 insertions(+), 10 deletions(-)
>>
>>
>> base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
>> --
>> 2.34.1
>>
On Wed, Oct 25, 2023 at 7:14 PM Mi, Dapeng <[email protected]> wrote:
>
>
> On 10/25/2023 8:35 PM, Jim Mattson wrote:
> > On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <[email protected]> wrote:
> >>
> >> On 10/24/2023 9:03 PM, Jim Mattson wrote:
> >>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
> >>>> Along with the CPU HW's upgrade and optimization, the count of LLC
> >>>> misses event for running loop() helper could be 0 just like seen on
> >>>> Sapphire Rapids.
> >>>>
> >>>> So modify the lower limit of possible count range for LLC misses
> >>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
> >>> I'm not convinced that these tests are really indicative of whether or
> >>> not the PMU is working properly. If 0 is allowed for llc misses, for
> >>> instance, doesn't this sub-test pass even when the PMU is disabled?
> >>>
> >>> Surely, we can do better.
> >>
> >> Considering the testing workload is just a simple adding loop, it's
> >> reasonable and possible that it gets a 0 result for LLC misses and
> >> branch misses events. Yeah, I agree the 0 count makes the results not so
> >> credible. If we want to avoid these 0 count values, we may have to
> >> complicate the workload, such as adding flush cache instructions, or
> >> something like that (I'm not sure if there are instructions which can
> >> force branch misses). How's your idea about this?
> > CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
> > good way to ensure branch mispredictions, or IBRS on parts without
> > eIBRS.
>
>
> Thanks Jim for the information. I'm not familiar with IBPB/IBRS
> instructions, but just a glance, it looks there two instructions are
> some kind of advanced instructions, Not all Intel CPUs support these
> instructions and not sure if AMD has similar instructions. It would be
> better if there are more generic instruction to trigger branch miss.
> Anyway I would look at the details and come back again.
IBPB and IBRS are not instructions. IBPB (indirect branch predictor
barrier) is triggered by setting bit 0 of the IA32_PRED_CMD MSR. IBRS
(indirect branch restricted speculation) is triggered by setting bit 0
of the IA32_SPEC_CTRL MSR. It is true that the desired behavior of
IBRS (causing branch mispredictions) is only exhibited by certain
older parts. However, IBPB is now universally available, as it is
necessary to mitigate many speculative execution attacks. For Intel
documentation, see
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html.
If you don't want to use these, you could train a branch to go one way
prior to measurement, and then arrange for the branch under test to go
the other way.
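[A hedged sketch of what an IBPB-based approach could look like in the
test (MSR index and bit per the SDM; this_cpu_has() and
X86_FEATURE_SPEC_CTRL are assumed library helpers):

/* Illustrative: issue an indirect branch predictor barrier by writing
 * bit 0 of IA32_PRED_CMD (MSR 0x49), guarded by the CPUID enumeration
 * CPUID.(EAX=7,ECX=0):EDX[26]. */
#define MSR_IA32_PRED_CMD	0x00000049
#define PRED_CMD_IBPB		(1ull << 0)

static void flush_branch_predictors(void)
{
	if (this_cpu_has(X86_FEATURE_SPEC_CTRL))
		wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}

Calling this between training and measurement would discard the learned
branch history, forcing mispredictions on the next indirect branches.]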
> >>>> Signed-off-by: Dapeng Mi <[email protected]>
> >>>> ---
> >>>> x86/pmu.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/x86/pmu.c b/x86/pmu.c
> >>>> index 0def28695c70..7443fdab5c8a 100644
> >>>> --- a/x86/pmu.c
> >>>> +++ b/x86/pmu.c
> >>>> @@ -35,7 +35,7 @@ struct pmu_event {
> >>>> {"instructions", 0x00c0, 10*N, 10.2*N},
> >>>> {"ref cycles", 0x013c, 1*N, 30*N},
> >>>> {"llc references", 0x4f2e, 1, 2*N},
> >>>> - {"llc misses", 0x412e, 1, 1*N},
> >>>> + {"llc misses", 0x412e, 0, 1*N},
> >>>> {"branches", 0x00c4, 1*N, 1.1*N},
> >>>> {"branch misses", 0x00c5, 0, 0.1*N},
> >>>> }, amd_gp_events[] = {
> >>>> --
> >>>> 2.34.1
> >>>>
On 10/26/2023 8:19 PM, Jim Mattson wrote:
> On Wed, Oct 25, 2023 at 7:14 PM Mi, Dapeng <[email protected]> wrote:
>>
>> On 10/25/2023 8:35 PM, Jim Mattson wrote:
>>> On Wed, Oct 25, 2023 at 4:23 AM Mi, Dapeng <[email protected]> wrote:
>>>> On 10/24/2023 9:03 PM, Jim Mattson wrote:
>>>>> On Tue, Oct 24, 2023 at 12:51 AM Dapeng Mi <[email protected]> wrote:
>>>>>> Along with the CPU HW's upgrade and optimization, the count of LLC
>>>>>> misses event for running loop() helper could be 0 just like seen on
>>>>>> Sapphire Rapids.
>>>>>>
>>>>>> So modify the lower limit of possible count range for LLC misses
>>>>>> events to 0 to avoid LLC misses event test failure on Sapphire Rapids.
>>>>> I'm not convinced that these tests are really indicative of whether or
>>>>> not the PMU is working properly. If 0 is allowed for llc misses, for
>>>>> instance, doesn't this sub-test pass even when the PMU is disabled?
>>>>>
>>>>> Surely, we can do better.
>>>> Considering the testing workload is just a simple adding loop, it's
>>>> reasonable and possible that it gets a 0 result for LLC misses and
>>>> branch misses events. Yeah, I agree the 0 count makes the results not so
>>>> credible. If we want to avoid these 0 count values, we may have to
>>>> complicate the workload, such as adding flush cache instructions, or
>>>> something like that (I'm not sure if there are instructions which can
>>>> force branch misses). How's your idea about this?
>>> CLFLUSH is probably a good way to ensure cache misses. IBPB may be a
>>> good way to ensure branch mispredictions, or IBRS on parts without
>>> eIBRS.
>>
>> Thanks Jim for the information. I'm not familiar with IBPB/IBRS
>> instructions, but just a glance, it looks there two instructions are
>> some kind of advanced instructions, Not all Intel CPUs support these
>> instructions and not sure if AMD has similar instructions. It would be
>> better if there are more generic instruction to trigger branch miss.
>> Anyway I would look at the details and come back again.
> IBPB and IBRS are not instructions. IBPB (indirect branch predictor
> barrier) is triggered by setting bit 0 of the IA32_PRED_CMD MSR. IBRS
> (indirect branch restricted speculation) is triggered by setting bit 0
> of the IA32_SPEC_CTRL MSR. It is true that the desired behavior of
> IBRS (causing branch mispredictions) is only exhibited by certain
> older parts. However, IBPB is now universally available, as it is
> necessary to mitigate many speculative execution attacks. For Intel
> documentation, see
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/cpuid-enumeration-and-architectural-msrs.html.
>
> If you don't want to use these, you could train a branch to go one way
> prior to measurement, and then arrange for the branch under test go
> the other way.
Thanks Jim. From my point of view, IBPB is still some kind of extended
feature which may not be supported on some older platforms. Considering
kvm-unit-tests can still be run on those old platforms, IBPB seems not
the best choice. I'm thinking an alternative is to use the 'rdrand'
instruction to get a random value and then take a jmp based on the
random result. That would definitely cause branch misses and looks more
generic.
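[A hedged sketch of that rdrand idea (illustrative; assumes RDRAND is
enumerated in CPUID.1:ECX[30] and ignores RDRAND's CF retry protocol
for brevity):

/* Illustrative: branch on a hardware random bit so the predictor
 * cannot learn the direction; roughly half the iterations should
 * mispredict. */
static int random_branches(int iterations)
{
	unsigned long r;
	int i, taken = 0;

	for (i = 0; i < iterations; i++) {
		asm volatile("rdrand %0" : "=r"(r));
		if (r & 1)	/* unpredictable direction */
			taken++;
	}
	return taken;
}
]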
>
>>>>>> Signed-off-by: Dapeng Mi <[email protected]>
>>>>>> ---
>>>>>> x86/pmu.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/x86/pmu.c b/x86/pmu.c
>>>>>> index 0def28695c70..7443fdab5c8a 100644
>>>>>> --- a/x86/pmu.c
>>>>>> +++ b/x86/pmu.c
>>>>>> @@ -35,7 +35,7 @@ struct pmu_event {
>>>>>> {"instructions", 0x00c0, 10*N, 10.2*N},
>>>>>> {"ref cycles", 0x013c, 1*N, 30*N},
>>>>>> {"llc references", 0x4f2e, 1, 2*N},
>>>>>> - {"llc misses", 0x412e, 1, 1*N},
>>>>>> + {"llc misses", 0x412e, 0, 1*N},
>>>>>> {"branches", 0x00c4, 1*N, 1.1*N},
>>>>>> {"branch misses", 0x00c5, 0, 0.1*N},
>>>>>> }, amd_gp_events[] = {
>>>>>> --
>>>>>> 2.34.1
>>>>>>
On Thu, Oct 26, 2023, Mi, Dapeng wrote:
> On 10/26/2023 7:47 AM, Mingwei Zhang wrote:
> > On Tue, Oct 24, 2023, Dapeng Mi wrote:
> > > When running pmu test on Intel Sapphire Rapids, we found several
> > > failures are encountered, such as "llc misses" failure, "all counters"
> > > failure and "fixed counter 3" failure.
> > hmm, I have tested your series on a SPR machine. It looks like, all "llc
> > misses" already pass on my side. "all counters" always fail with/without
> > your patches. "fixed counter 3" never exists... I have "fixed
> > cntr-{0,1,2}" and "fixed-{0,1,2}"
>
> 1. "LLC misses" failure
>
> Yeah, the "LLC misses" failure is not always seen. I can see the "LLC?
> misses" 2 ~3 times out of 10 runs of PMU standalone test and you could see
> the failure with higher possibility if you run the full kvm-unit-tests. I
> think whether you can see the "LLC misses" failure it really depends on
> current cache status on your system, how much cache memory are consumed by
> other programs. If there are lots of free cache lines on system when running
> the pmu test, you may have higher possibility to see the LLC misses failures
> just like what I see below.
>
> PASS: Intel: llc references-7
> *FAIL*: Intel: llc misses-0
> PASS: Intel: llc misses-1
> PASS: Intel: llc misses-2
>
> 2. "all counters" failure
>
> Actually the "all counters" failure are not always seen, but it doesn't mean
> current code is correct. In current code, the length of "cnt[10]" array in
> check_counters_many() is defined as 10, but there are at least 11 counters
> supported (8 GP counters + 3 fixed counters) on SPR even though fixed
> counter 3 is not supported in current upstream code. Obviously there would
> be out of range memory access in check_counters_many().
>
ok, I will double check on these. Thanks.
> >
> > You may want to double check the requirements of your series. Not just
> > under your setting without explainning those setting in detail.
> >
> > Maybe what I am missing is your topdown series? So, before your topdown
> > series checked in. I don't see value in this series.
>
> 3. "fixed counter 3" failure
>
> Yeah, I just realized I used the kernel which includes the vtopdown
> supporting patches after Jim's reminding. As the reply for Jim's comments
> says, the patches for support slots event are still valuable for current
> emulation framework and I would split them from the original vtopdown
> patchset and resend them as an independent patchset. Anyway, even though
> there is not slots event support in Kernel, it only impacts the patch 4/5,
> other patches are still valuable.
>
>
> >
> > Thanks.
> > -Mingwei
> > > Intel Sapphire Rapids introduces new fixed counter 3, total PMU counters
> > > including GP and fixed counters increase to 12 and also optimizes cache
> > > subsystem. All these changes make the original assumptions in pmu test
> > > unavailable any more on Sapphire Rapids. Patches 2-4 fixes these
> > > failures, patch 0 remove the duplicate code and patch 5 adds assert to
> > > ensure predefine fixed events are matched with HW fixed counters.
> > >
> > > Dapeng Mi (4):
> > > x86: pmu: Change the minimum value of llc_misses event to 0
> > > x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
> > > x86: pmu: Support validation for Intel PMU fixed counter 3
> > > x86: pmu: Add asserts to warn inconsistent fixed events and counters
> > >
> > > Xiong Zhang (1):
> > > x86: pmu: Remove duplicate code in pmu_init()
> > >
> > > lib/x86/pmu.c | 5 -----
> > > x86/pmu.c | 17 ++++++++++++-----
> > > 2 files changed, 12 insertions(+), 10 deletions(-)
> > >
> > >
> > > base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
> > > --
> > > 2.34.1
> > >