When running pmu test on Intel Sapphire Rapids, we found several
failures are encountered, such as "llc misses" failure, "all counters"
failure and "fixed counter 3" failure.
Intel Sapphire Rapids introduces new fixed counter 3, total PMU counters
including GP and fixed counters increase to 12 and also optimizes cache
subsystem. All these changes make the original assumptions in pmu test
be unavailable any more on Sapphire Rapids. Patches 2-4 fixes these
failures, especially patch 2 improves current loop() function and ensure
the LLC/branch misses are always be triggered and don't depend on the
possibility like before, patch 1 removes the duplicate code and patch 5
adds asserts to ensure pre-defined fixed events are matched with HW fixed
counters.
Plese note 1) this patchset depends on the Kernel patches "Enable topdown
slots event in vPMU" 2) this patchset is only tested on Intel Sapphire
rapids platform, the tests on other platforms are welcomed.
Dapeng Mi (4):
x86: pmu: Improve loop() to force to generate llc/branch misses
x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
x86: pmu: Support validation for Intel PMU fixed counter 3
x86: pmu: Add asserts to warn inconsistent fixed events and counters
Xiong Zhang (1):
x86: pmu: Remove duplicate code in pmu_init()
lib/x86/pmu.c | 5 -----
x86/pmu.c | 54 +++++++++++++++++++++++++++++++--------------------
2 files changed, 33 insertions(+), 26 deletions(-)
base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
--
2.34.1
From: Xiong Zhang <[email protected]>
There are totally same code in pmu_init() helper, remove the duplicate
code.
Signed-off-by: Xiong Zhang <[email protected]>
Signed-off-by: Dapeng Mi <[email protected]>
---
lib/x86/pmu.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/lib/x86/pmu.c b/lib/x86/pmu.c
index 0f2afd650bc9..d06e94553024 100644
--- a/lib/x86/pmu.c
+++ b/lib/x86/pmu.c
@@ -16,11 +16,6 @@ void pmu_init(void)
pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
}
- if (pmu.version > 1) {
- pmu.nr_fixed_counters = cpuid_10.d & 0x1f;
- pmu.fixed_counter_width = (cpuid_10.d >> 5) & 0xff;
- }
-
pmu.nr_gp_counters = (cpuid_10.a >> 8) & 0xff;
pmu.gp_counter_width = (cpuid_10.a >> 16) & 0xff;
pmu.gp_counter_mask_length = (cpuid_10.a >> 24) & 0xff;
--
2.34.1
Current loop() helper is a very simple adding loop function, it can't
garantee that LLC misses and branch misses would be always triggered in
the loop() running, especailly along with the larger and larger LLC size
and better and better branch predictor in new CPUs. In this situation
0 LLC/branch misses count would be seen more and more easily just like
what we see on Sapphire Rapids.
It's ambiguous to take 0 as a valid result in tests since we can't
confirm if the PMU function works correctly or it's just disabled.
So this patch improves current loop() function and introduces random
jump and clflush instructions to force to generate LLC and branch
misses. Since random jump and clflush instructions are involved, all
pre-defined valid count ranges are also update accordingly.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 41 +++++++++++++++++++++++------------------
1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 0def28695c70..1df5794b7ef8 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -11,7 +11,7 @@
#include "libcflat.h"
#include <stdint.h>
-#define N 1000000
+#define N 1000000ULL
// These values match the number of instructions and branches in the
// assembly block in check_emulated_instr().
@@ -28,25 +28,25 @@ typedef struct {
struct pmu_event {
const char *name;
uint32_t unit_sel;
- int min;
- int max;
+ uint64_t min;
+ uint64_t max;
} intel_gp_events[] = {
- {"core cycles", 0x003c, 1*N, 50*N},
+ {"core cycles", 0x003c, 1*N, 500*N},
{"instructions", 0x00c0, 10*N, 10.2*N},
- {"ref cycles", 0x013c, 1*N, 30*N},
- {"llc references", 0x4f2e, 1, 2*N},
- {"llc misses", 0x412e, 1, 1*N},
- {"branches", 0x00c4, 1*N, 1.1*N},
- {"branch misses", 0x00c5, 0, 0.1*N},
+ {"ref cycles", 0x013c, 1*N, 300*N},
+ {"llc references", 0x4f2e, 0.1*N, 2*N},
+ {"llc misses", 0x412e, 0.1*N, 1*N},
+ {"branches", 0x00c4, 3*N, 3.3*N},
+ {"branch misses", 0x00c5, 0.1*N, 0.3*N},
}, amd_gp_events[] = {
- {"core cycles", 0x0076, 1*N, 50*N},
+ {"core cycles", 0x0076, 1*N, 500*N},
{"instructions", 0x00c0, 10*N, 10.2*N},
- {"branches", 0x00c2, 1*N, 1.1*N},
- {"branch misses", 0x00c3, 0, 0.1*N},
+ {"branches", 0x00c2, 3*N, 3.3*N},
+ {"branch misses", 0x00c3, 0.1*N, 0.3*N},
}, fixed_events[] = {
{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
- {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
- {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
+ {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 500*N},
+ {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 300*N},
};
char *buf;
@@ -56,10 +56,15 @@ static unsigned int gp_events_size;
static inline void loop(void)
{
- unsigned long tmp, tmp2, tmp3;
+ unsigned long tmp, tmp2, tmp3, tmp4;
- asm volatile("1: mov (%1), %2; add $64, %1; nop; nop; nop; nop; nop; nop; nop; loop 1b"
- : "=c"(tmp), "=r"(tmp2), "=r"(tmp3): "0"(N), "1"(buf));
+ asm volatile("1: dec %0; jz 3f; mov (%1), %2; add $64, %1; nop; \n"
+ " rdrand %3; and $7, %3; jnz 2f; clflush (%1); jmp 1b\n"
+ "2: nop; jmp 1b;"
+ "3: nop"
+ : "=c"(tmp), "=r"(tmp2), "=r"(tmp3), "=r"(tmp4)
+ : "0"(N), "1"(buf)
+ : "memory");
}
@@ -202,7 +207,7 @@ static noinline void __measure(pmu_counter_t *evt, uint64_t count)
static bool verify_event(uint64_t count, struct pmu_event *e)
{
- // printf("%d <= %ld <= %d\n", e->min, count, e->max);
+ // printf("%ld <= %ld <= %ld\n", e->min, count, e->max);
return count >= e->min && count <= e->max;
}
--
2.34.1
Considering there are already 8 GP counters and 4 fixed counters on
latest Intel CPUs, like Sapphire Rapids. The original cnt array length
10 is definitely not enough to cover all supported PMU counters on these
new CPUs and it would cause PMU counter validation failures.
It's probably more and more GP and fixed counters are introduced in the
future and then directly extends the cnt array length to 64.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 1df5794b7ef8..6bd8f6d53f55 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -259,7 +259,7 @@ static void check_fixed_counters(void)
static void check_counters_many(void)
{
- pmu_counter_t cnt[10];
+ pmu_counter_t cnt[64];
int i, n;
for (i = 0, n = 0; n < pmu.nr_gp_counters; i++) {
--
2.34.1
Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
(fixed counter 3) to counter/sample topdown.slots event, but current
code still doesn't cover this new fixed counter.
So this patch adds code to validate this new fixed counter can count
slots event correctly.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/x86/pmu.c b/x86/pmu.c
index 6bd8f6d53f55..404dc7b62ac2 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -47,6 +47,7 @@ struct pmu_event {
{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 500*N},
{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 300*N},
+ {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 5000*N},
};
char *buf;
--
2.34.1
Current PMU code doesn't check whether the number of fixed counters is
larger than pre-defined fixed events. If so, it would cause out of range
memory access.
So add asserts to warn this invalid case.
Signed-off-by: Dapeng Mi <[email protected]>
---
x86/pmu.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/x86/pmu.c b/x86/pmu.c
index 404dc7b62ac2..3ce05f0a1d38 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -117,8 +117,12 @@ static struct pmu_event* get_counter_event(pmu_counter_t *cnt)
for (i = 0; i < gp_events_size; i++)
if (gp_events[i].unit_sel == (cnt->config & 0xffff))
return &gp_events[i];
- } else
- return &fixed_events[cnt->ctr - MSR_CORE_PERF_FIXED_CTR0];
+ } else {
+ int idx = cnt->ctr - MSR_CORE_PERF_FIXED_CTR0;
+
+ assert(idx < ARRAY_SIZE(fixed_events));
+ return &fixed_events[idx];
+ }
return (void*)0;
}
@@ -251,6 +255,7 @@ static void check_fixed_counters(void)
};
int i;
+ assert(pmu.nr_fixed_counters <= ARRAY_SIZE(fixed_events));
for (i = 0; i < pmu.nr_fixed_counters; i++) {
cnt.ctr = fixed_events[i].unit_sel;
measure_one(&cnt);
@@ -272,6 +277,7 @@ static void check_counters_many(void)
gp_events[i % gp_events_size].unit_sel;
n++;
}
+ assert(pmu.nr_fixed_counters <= ARRAY_SIZE(fixed_events));
for (i = 0; i < pmu.nr_fixed_counters; i++) {
cnt[n].ctr = fixed_events[i].unit_sel;
cnt[n].config = EVNTSEL_OS | EVNTSEL_USR;
--
2.34.1
On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
>
> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> (fixed counter 3) to counter/sample topdown.slots event, but current
> code still doesn't cover this new fixed counter.
>
> So this patch adds code to validate this new fixed counter can count
> slots event correctly.
I'm not convinced that this actually validates anything.
Suppose, for example, that KVM used fixed counter 1 when the guest
asked for fixed counter 3. Wouldn't this test still pass?
> Signed-off-by: Dapeng Mi <[email protected]>
> ---
> x86/pmu.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/x86/pmu.c b/x86/pmu.c
> index 6bd8f6d53f55..404dc7b62ac2 100644
> --- a/x86/pmu.c
> +++ b/x86/pmu.c
> @@ -47,6 +47,7 @@ struct pmu_event {
> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 500*N},
> {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 300*N},
> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 5000*N},
> };
>
> char *buf;
> --
> 2.34.1
>
On 11/1/2023 2:47 AM, Jim Mattson wrote:
> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>> (fixed counter 3) to counter/sample topdown.slots event, but current
>> code still doesn't cover this new fixed counter.
>>
>> So this patch adds code to validate this new fixed counter can count
>> slots event correctly.
> I'm not convinced that this actually validates anything.
>
> Suppose, for example, that KVM used fixed counter 1 when the guest
> asked for fixed counter 3. Wouldn't this test still pass?
Per my understanding, as long as the KVM returns a valid count in the
reasonable count range, we can think KVM works correctly. We don't need
to entangle on how KVM really uses the HW, it could be impossible and
unnecessary.
Yeah, currently the predefined valid count range may be some kind of
loose since I want to cover as much as hardwares and avoid to cause
regression. Especially after introducing the random jump and clflush
instructions, the cycles and slots become much more hard to predict.
Maybe we can have a comparable restricted count range in the initial
change, and we can loosen the restriction then if we encounter a failure
on some specific hardware. do you think it's better? Thanks.
>
>> Signed-off-by: Dapeng Mi <[email protected]>
>> ---
>> x86/pmu.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/x86/pmu.c b/x86/pmu.c
>> index 6bd8f6d53f55..404dc7b62ac2 100644
>> --- a/x86/pmu.c
>> +++ b/x86/pmu.c
>> @@ -47,6 +47,7 @@ struct pmu_event {
>> {"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
>> {"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 500*N},
>> {"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 300*N},
>> + {"fixed 4", MSR_CORE_PERF_FIXED_CTR0 + 3, 1*N, 5000*N},
>> };
>>
>> char *buf;
>> --
>> 2.34.1
>>
On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <[email protected]> wrote:
>
>
> On 11/1/2023 2:47 AM, Jim Mattson wrote:
> > On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
> >> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> >> (fixed counter 3) to counter/sample topdown.slots event, but current
> >> code still doesn't cover this new fixed counter.
> >>
> >> So this patch adds code to validate this new fixed counter can count
> >> slots event correctly.
> > I'm not convinced that this actually validates anything.
> >
> > Suppose, for example, that KVM used fixed counter 1 when the guest
> > asked for fixed counter 3. Wouldn't this test still pass?
>
>
> Per my understanding, as long as the KVM returns a valid count in the
> reasonable count range, we can think KVM works correctly. We don't need
> to entangle on how KVM really uses the HW, it could be impossible and
> unnecessary.
Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
are in a reasonable range. What's everyone upset about?
> Yeah, currently the predefined valid count range may be some kind of
> loose since I want to cover as much as hardwares and avoid to cause
> regression. Especially after introducing the random jump and clflush
> instructions, the cycles and slots become much more hard to predict.
> Maybe we can have a comparable restricted count range in the initial
> change, and we can loosen the restriction then if we encounter a failure
> on some specific hardware. do you think it's better? Thanks.
I think the test is essentially useless, and should probably just be
deleted, so that it doesn't give a false sense of confidence.
On 11/1/2023 10:47 AM, Jim Mattson wrote:
> On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <[email protected]> wrote:
>>
>> On 11/1/2023 2:47 AM, Jim Mattson wrote:
>>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
>>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>>>> (fixed counter 3) to counter/sample topdown.slots event, but current
>>>> code still doesn't cover this new fixed counter.
>>>>
>>>> So this patch adds code to validate this new fixed counter can count
>>>> slots event correctly.
>>> I'm not convinced that this actually validates anything.
>>>
>>> Suppose, for example, that KVM used fixed counter 1 when the guest
>>> asked for fixed counter 3. Wouldn't this test still pass?
>>
>> Per my understanding, as long as the KVM returns a valid count in the
>> reasonable count range, we can think KVM works correctly. We don't need
>> to entangle on how KVM really uses the HW, it could be impossible and
>> unnecessary.
> Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
> are in a reasonable range. What's everyone upset about?
>
>> Yeah, currently the predefined valid count range may be some kind of
>> loose since I want to cover as much as hardwares and avoid to cause
>> regression. Especially after introducing the random jump and clflush
>> instructions, the cycles and slots become much more hard to predict.
>> Maybe we can have a comparable restricted count range in the initial
>> change, and we can loosen the restriction then if we encounter a failure
>> on some specific hardware. do you think it's better? Thanks.
> I think the test is essentially useless, and should probably just be
> deleted, so that it doesn't give a false sense of confidence.
IMO, I can't say the tests are totally useless. Yes, passing the tests
doesn't mean the KVM vPMU must work correctly, but we can say there is
something probably wrong if it fails to pass these tests. Considering
the hardware differences, it's impossible to set an exact value for
these events in advance and it seems there is no better method to verify
the PMC count as well. I still prefer to keep these tests until we have
a better method to verify the accuracy of the PMC count.
On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng <[email protected]> wrote:
>
>
> On 11/1/2023 10:47 AM, Jim Mattson wrote:
> > On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <[email protected]> wrote:
> >>
> >> On 11/1/2023 2:47 AM, Jim Mattson wrote:
> >>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
> >>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
> >>>> (fixed counter 3) to counter/sample topdown.slots event, but current
> >>>> code still doesn't cover this new fixed counter.
> >>>>
> >>>> So this patch adds code to validate this new fixed counter can count
> >>>> slots event correctly.
> >>> I'm not convinced that this actually validates anything.
> >>>
> >>> Suppose, for example, that KVM used fixed counter 1 when the guest
> >>> asked for fixed counter 3. Wouldn't this test still pass?
> >>
> >> Per my understanding, as long as the KVM returns a valid count in the
> >> reasonable count range, we can think KVM works correctly. We don't need
> >> to entangle on how KVM really uses the HW, it could be impossible and
> >> unnecessary.
> > Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
> > are in a reasonable range. What's everyone upset about?
> >
> >> Yeah, currently the predefined valid count range may be some kind of
> >> loose since I want to cover as much as hardwares and avoid to cause
> >> regression. Especially after introducing the random jump and clflush
> >> instructions, the cycles and slots become much more hard to predict.
> >> Maybe we can have a comparable restricted count range in the initial
> >> change, and we can loosen the restriction then if we encounter a failure
> >> on some specific hardware. do you think it's better? Thanks.
> > I think the test is essentially useless, and should probably just be
> > deleted, so that it doesn't give a false sense of confidence.
>
> IMO, I can't say the tests are totally useless. Yes, passing the tests
> doesn't mean the KVM vPMU must work correctly, but we can say there is
> something probably wrong if it fails to pass these tests. Considering
> the hardware differences, it's impossible to set an exact value for
> these events in advance and it seems there is no better method to verify
> the PMC count as well. I still prefer to keep these tests until we have
> a better method to verify the accuracy of the PMC count.
If it's impossible to set an exact value for these events in advance,
how does Intel validate the hardware PMU?
On 11/1/2023 11:24 AM, Jim Mattson wrote:
> On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng <[email protected]> wrote:
>>
>> On 11/1/2023 10:47 AM, Jim Mattson wrote:
>>> On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng <[email protected]> wrote:
>>>> On 11/1/2023 2:47 AM, Jim Mattson wrote:
>>>>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi <[email protected]> wrote:
>>>>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>>>>>> (fixed counter 3) to counter/sample topdown.slots event, but current
>>>>>> code still doesn't cover this new fixed counter.
>>>>>>
>>>>>> So this patch adds code to validate this new fixed counter can count
>>>>>> slots event correctly.
>>>>> I'm not convinced that this actually validates anything.
>>>>>
>>>>> Suppose, for example, that KVM used fixed counter 1 when the guest
>>>>> asked for fixed counter 3. Wouldn't this test still pass?
>>>> Per my understanding, as long as the KVM returns a valid count in the
>>>> reasonable count range, we can think KVM works correctly. We don't need
>>>> to entangle on how KVM really uses the HW, it could be impossible and
>>>> unnecessary.
>>> Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
>>> are in a reasonable range. What's everyone upset about?
>>>
>>>> Yeah, currently the predefined valid count range may be some kind of
>>>> loose since I want to cover as much as hardwares and avoid to cause
>>>> regression. Especially after introducing the random jump and clflush
>>>> instructions, the cycles and slots become much more hard to predict.
>>>> Maybe we can have a comparable restricted count range in the initial
>>>> change, and we can loosen the restriction then if we encounter a failure
>>>> on some specific hardware. do you think it's better? Thanks.
>>> I think the test is essentially useless, and should probably just be
>>> deleted, so that it doesn't give a false sense of confidence.
>> IMO, I can't say the tests are totally useless. Yes, passing the tests
>> doesn't mean the KVM vPMU must work correctly, but we can say there is
>> something probably wrong if it fails to pass these tests. Considering
>> the hardware differences, it's impossible to set an exact value for
>> these events in advance and it seems there is no better method to verify
>> the PMC count as well. I still prefer to keep these tests until we have
>> a better method to verify the accuracy of the PMC count.
> If it's impossible to set an exact value for these events in advance,
> how does Intel validate the hardware PMU?
I have no much idea how HW team validates the PMU functionality. But per
my gotten information, they could have some very tiny benchmarks with a
fixed pattern and run them on a certain scenario, so they can expect an
very accurate count value. But this is different with our case, a real
program is executed on a real system (probably shared with other
programs), the events count is impacted by too much hardware/software
factors, such as cache contention, it's hard to predict a single
accurate count in advance.
Anyway, it's only my guess about the ways of hardware validation, still
add Kan to get more information.
Hi Kan,
Do you have more information about how HW team to validate the PMC count
accuracy? Thanks.
On Tue, Oct 31, 2023 at 2:21 AM Dapeng Mi <[email protected]> wrote:
>
> From: Xiong Zhang <[email protected]>
>
> There are totally same code in pmu_init() helper, remove the duplicate
> code.
>
> Signed-off-by: Xiong Zhang <[email protected]>
> Signed-off-by: Dapeng Mi <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
On 2023-10-31 11:57 p.m., Mi, Dapeng wrote:
>
> On 11/1/2023 11:24 AM, Jim Mattson wrote:
>> On Tue, Oct 31, 2023 at 8:16 PM Mi, Dapeng
>> <[email protected]> wrote:
>>>
>>> On 11/1/2023 10:47 AM, Jim Mattson wrote:
>>>> On Tue, Oct 31, 2023 at 7:33 PM Mi, Dapeng
>>>> <[email protected]> wrote:
>>>>> On 11/1/2023 2:47 AM, Jim Mattson wrote:
>>>>>> On Tue, Oct 31, 2023 at 2:22 AM Dapeng Mi
>>>>>> <[email protected]> wrote:
>>>>>>> Intel CPUs, like Sapphire Rapids, introduces a new fixed counter
>>>>>>> (fixed counter 3) to counter/sample topdown.slots event, but current
>>>>>>> code still doesn't cover this new fixed counter.
>>>>>>>
>>>>>>> So this patch adds code to validate this new fixed counter can count
>>>>>>> slots event correctly.
>>>>>> I'm not convinced that this actually validates anything.
>>>>>>
>>>>>> Suppose, for example, that KVM used fixed counter 1 when the guest
>>>>>> asked for fixed counter 3. Wouldn't this test still pass?
>>>>> Per my understanding, as long as the KVM returns a valid count in the
>>>>> reasonable count range, we can think KVM works correctly. We don't
>>>>> need
>>>>> to entangle on how KVM really uses the HW, it could be impossible and
>>>>> unnecessary.
>>>> Now, I see how the Pentium FDIV bug escaped notice. Hey, the numbers
>>>> are in a reasonable range. What's everyone upset about?
>>>>
>>>>> Yeah, currently the predefined valid count range may be some kind of
>>>>> loose since I want to cover as much as hardwares and avoid to cause
>>>>> regression. Especially after introducing the random jump and clflush
>>>>> instructions, the cycles and slots become much more hard to predict.
>>>>> Maybe we can have a comparable restricted count range in the initial
>>>>> change, and we can loosen the restriction then if we encounter a
>>>>> failure
>>>>> on some specific hardware. do you think it's better? Thanks.
>>>> I think the test is essentially useless, and should probably just be
>>>> deleted, so that it doesn't give a false sense of confidence.
>>> IMO, I can't say the tests are totally useless. Yes, passing the tests
>>> doesn't mean the KVM vPMU must work correctly, but we can say there is
>>> something probably wrong if it fails to pass these tests. Considering
>>> the hardware differences, it's impossible to set an exact value for
>>> these events in advance and it seems there is no better method to verify
>>> the PMC count as well. I still prefer to keep these tests until we have
>>> a better method to verify the accuracy of the PMC count.
>> If it's impossible to set an exact value for these events in advance,
>> how does Intel validate the hardware PMU?
>
>
> I have no much idea how HW team validates the PMU functionality. But per
> my gotten information, they could have some very tiny benchmarks with a
> fixed pattern and run them on a certain scenario, so they can expect an
> very accurate count value. But this is different with our case, a real
> program is executed on a real system (probably shared with other
> programs), the events count is impacted by too much hardware/software
> factors, such as cache contention, it's hard to predict a single
> accurate count in advance.
>
Yes, there are many factors could impact the value of the
microbenchmarks. I don't think there is a universal benchmark for all
generations and all configurations.
Thanks,
Kan
> Anyway, it's only my guess about the ways of hardware validation, still
> add Kan to get more information.
>
> Hi Kan,
>
> Do you have more information about how HW team to validate the PMC count
> accuracy? Thanks.
>
>