This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
and fw_read_hi() functions.
SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
to provide counter information (i.e. values/overflow status) via a shared
memory between the SBI implementation and supervisor OS. This allows to minimize
the number of traps in when perf being used inside a kvm guest as it relies on
SBI PMU + trap/emulation of the counters.
The current set of ratified RISC-V specification also doesn't allow scountovf
to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
in ISA as well and enables perf sampling in the guest. However, LCOFI in the
guest only works via IRQ filtering in AIA specification. That's why, AIA
has to be enabled in the hardware (at least the Ssaia extension) in order to
use the sampling support in the perf.
Here are the patch wise implementation details.
PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
PATCH 2,3,14 : FW_READ_HI function implementation
PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
PATCH 16-17: Generic improvements for kvm selftests
PATCH 18-22: KVM selftests for SBI PMU extension
The series is based on v6.9-rc4 and is available at:
https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v8
The kvmtool patch is also available at:
https://github.com/atishp04/kvmtool/tree/sscofpmf
It also requires Ssaia ISA extension to be present in the hardware in order to
get perf sampling support in the guest. In Qemu virt machine, it can be done
by the following config.
```
-cpu rv64,sscofpmf=true,x-ssaia=true
```
There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
for the guest if AIA patches are not available. Here is the example command.
```
/lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
```
The series has been tested only in Qemu.
Here is the snippet of the perf running inside a kvm guest.
===================================================
$ perf record -e cycles -e instructions perf bench sched messaging -g 5
..
$ Running 'sched/messaging' benchmark:
..
[ 45.928723] perf_duration_warn: 2 callbacks suppressed
[ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
$ 20 sender and receiver processes per group
$ 5 groups == 200 processes run
Total time: 14.220 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
$ perf report --stdio
$ To display the perf.data header info, please use --header/--header-only optio>
$
$
$ Total Lost Samples: 0
$
$ Samples: 943 of event 'cycles'
$ Event count (approx.): 5128976844
$
$ Overhead Command Shared Object Symbol >
$ ........ ............... ........................... .....................>
$
7.59% sched-messaging [kernel.kallsyms] [k] memcpy
5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
3.16% sched-messaging [kernel.kallsyms] [k] clear_page
3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
===================================================
[1] https://github.com/riscv-non-isa/riscv-sbi-doc
Changes from v7->v8:
1. Updated event states so that shared memory is updated only during stop
operations.
2. Avoid clobbering lower XLEN counter/overflow values in shared memory
by maintaining a temporary copy for RV32.
3. Improved overflow handling in snapshot case by supporting all 64 values.
4. Minor cleanups based on suggestions on v7.
Changes from v6->v7:
1. Used SBI_SHMEM_DISABLE in the driver.
2. Added RB Tags.
3. Improved the sbi_pmu_test commandline to allow disabling multiple
tests.
Changes from v5->v6:
1. Added a patch for command line option for the sbi pmu tests.
2. Removed redundant prints and restructure the code little bit.
3. Added a patch for computing the sbi minor version correctly.
4. Addressed all other comments on v5.
Changes from v4->v5:
1. Moved sbi related definitions to its own header file from processor.h
2. Added few helper functions for selftests.
3. Improved firmware counter read and RV32 start/stop functions.
4. Converted all the shifting operations to use BIT macro
5. Addressed all other comments on v4.
Changes from v3->v4:
1. Added selftests.
2. Fixed an issue to clear the interrupt pending bits.
3. Fixed the counter index in snapshot memory start function.
Changes from v2->v3:
1. Fixed a patchwork warning on patch6.
2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
Changes from v1->v2:
1. Fixed warning/errors from patchwork CI.
2. Rebased on top of kvm-next.
3. Added Acked-by tags.
Changes from RFC->v1:
1. Addressed all the comments on RFC series.
2. Removed PATCH2 and merged into later patches.
3. Added 2 more patches for minor fixes.
4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
Ssaia in the host.
Atish Patra (24):
RISC-V: Fix the typo in Scountovf CSR name
RISC-V: Add FIRMWARE_READ_HI definition
drivers/perf: riscv: Read upper bits of a firmware counter
drivers/perf: riscv: Use BIT macro for shifting operations
RISC-V: Add SBI PMU snapshot definitions
RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
RISC-V: Use the minor version mask while computing sbi version
drivers/perf: riscv: Fix counter mask iteration for RV32
drivers/perf: riscv: Implement SBI PMU snapshot function
RISC-V: KVM: Fix the initial sample period value
RISC-V: KVM: No need to update the counter value during reset
RISC-V: KVM: No need to exit to the user space if perf event failed
RISC-V: KVM: Implement SBI PMU Snapshot feature
RISC-V: KVM: Add perf sampling support for guests
RISC-V: KVM: Support 64 bit firmware counters on RV32
RISC-V: KVM: Improve firmware counter read function
KVM: riscv: selftests: Move sbi definitions to its own header file
KVM: riscv: selftests: Add helper functions for extension checks
KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
KVM: riscv: selftests: Add SBI PMU extension definitions
KVM: riscv: selftests: Add SBI PMU selftest
KVM: riscv: selftests: Add a test for PMU snapshot functionality
KVM: riscv: selftests: Add a test for counter overflow
KVM: riscv: selftests: Add commandline option for SBI PMU test
arch/riscv/include/asm/csr.h | 5 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
arch/riscv/include/asm/sbi.h | 38 +-
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/paravirt.c | 6 +-
arch/riscv/kvm/aia.c | 5 +
arch/riscv/kvm/vcpu.c | 15 +-
arch/riscv/kvm/vcpu_onereg.c | 6 +
arch/riscv/kvm/vcpu_pmu.c | 260 ++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 309 +++++++-
include/linux/perf/riscv_pmu.h | 6 +
tools/testing/selftests/kvm/Makefile | 1 +
../selftests/kvm/include/riscv/processor.h | 49 +-
../testing/selftests/kvm/include/riscv/sbi.h | 141 ++++
../selftests/kvm/include/riscv/ucall.h | 1 +
../selftests/kvm/lib/riscv/processor.c | 12 +
../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
../selftests/kvm/riscv/get-reg-list.c | 4 +
../selftests/kvm/riscv/sbi_pmu_test.c | 681 ++++++++++++++++++
tools/testing/selftests/kvm/steal_time.c | 4 +-
23 files changed, 1467 insertions(+), 117 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
--
2.34.1
The counter overflow CSR name is "scountovf" not "sscountovf".
Fix the csr name.
Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support")
Reviewed-by: Clément Léger <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/csr.h | 2 +-
drivers/perf/riscv_pmu_sbi.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..9d1b07932794 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -281,7 +281,7 @@
#define CSR_HPMCOUNTER30H 0xc9e
#define CSR_HPMCOUNTER31H 0xc9f
-#define CSR_SSCOUNTOVF 0xda0
+#define CSR_SCOUNTOVF 0xda0
#define CSR_SSTATUS 0x100
#define CSR_SIE 0x104
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 8cbe6e5f9c39..3e44d2fb8bf8 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -27,7 +27,7 @@
#define ALT_SBI_PMU_OVERFLOW(__ovl) \
asm volatile(ALTERNATIVE_2( \
- "csrr %0, " __stringify(CSR_SSCOUNTOVF), \
+ "csrr %0, " __stringify(CSR_SCOUNTOVF), \
"csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF), \
THEAD_VENDOR_ID, ERRATA_THEAD_PMU, \
CONFIG_ERRATA_THEAD_PMU, \
--
2.34.1
SBI v2.0 added another function to SBI PMU extension to read
the upper bits of a counter with width larger than XLEN.
Add the definition for that function.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Clément Léger <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6e68f8dff76b..ef8311dafb91 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -131,6 +131,7 @@ enum sbi_ext_pmu_fid {
SBI_EXT_PMU_COUNTER_START,
SBI_EXT_PMU_COUNTER_STOP,
SBI_EXT_PMU_COUNTER_FW_READ,
+ SBI_EXT_PMU_COUNTER_FW_READ_HI,
};
union sbi_pmu_ctr_info {
--
2.34.1
SBI v2.0 introduced a explicit function to read the upper 32 bits
for any firmware counter width that is longer than 32bits.
This is only applicable for RV32 where firmware counter can be
64 bit.
Reviewed-by: Andrew Jones <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 3e44d2fb8bf8..1823ffb25d35 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -57,6 +57,8 @@ asm volatile(ALTERNATIVE( \
PMU_FORMAT_ATTR(event, "config:0-47");
PMU_FORMAT_ATTR(firmware, "config:63");
+static bool sbi_v2_available;
+
static struct attribute *riscv_arch_formats_attr[] = {
&format_attr_event.attr,
&format_attr_firmware.attr,
@@ -511,19 +513,29 @@ static u64 pmu_sbi_ctr_read(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
struct sbiret ret;
- union sbi_pmu_ctr_info info;
u64 val = 0;
+ union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
if (pmu_sbi_is_fw_event(event)) {
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
hwc->idx, 0, 0, 0, 0, 0);
- if (!ret.error)
- val = ret.value;
+ if (ret.error)
+ return 0;
+
+ val = ret.value;
+ if (IS_ENABLED(CONFIG_32BIT) && sbi_v2_available && info.width >= 32) {
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ_HI,
+ hwc->idx, 0, 0, 0, 0, 0);
+ if (!ret.error)
+ val |= ((u64)ret.value << 32);
+ else
+ WARN_ONCE(1, "Unable to read upper 32 bits of firmware counter error: %ld\n",
+ ret.error);
+ }
} else {
- info = pmu_ctr_list[idx];
val = riscv_pmu_ctr_read_csr(info.csr);
if (IS_ENABLED(CONFIG_32BIT))
- val = ((u64)riscv_pmu_ctr_read_csr(info.csr + 0x80)) << 31 | val;
+ val |= ((u64)riscv_pmu_ctr_read_csr(info.csr + 0x80)) << 32;
}
return val;
@@ -1135,6 +1147,9 @@ static int __init pmu_sbi_devinit(void)
return 0;
}
+ if (sbi_spec_version >= sbi_mk_version(2, 0))
+ sbi_v2_available = true;
+
ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_RISCV_STARTING,
"perf/riscv/pmu:starting",
pmu_sbi_starting_cpu, pmu_sbi_dying_cpu);
--
2.34.1
It is a good practice to use BIT() instead of (1 << x).
Replace the current usages with BIT().
Take this opportunity to replace few (1UL << x) with BIT() as well
for consistency.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 20 ++++++++++----------
drivers/perf/riscv_pmu_sbi.c | 2 +-
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index ef8311dafb91..4afa2cd01bae 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -233,20 +233,20 @@ enum sbi_pmu_ctr_type {
#define SBI_PMU_EVENT_IDX_INVALID 0xFFFFFFFF
/* Flags defined for config matching function */
-#define SBI_PMU_CFG_FLAG_SKIP_MATCH (1 << 0)
-#define SBI_PMU_CFG_FLAG_CLEAR_VALUE (1 << 1)
-#define SBI_PMU_CFG_FLAG_AUTO_START (1 << 2)
-#define SBI_PMU_CFG_FLAG_SET_VUINH (1 << 3)
-#define SBI_PMU_CFG_FLAG_SET_VSINH (1 << 4)
-#define SBI_PMU_CFG_FLAG_SET_UINH (1 << 5)
-#define SBI_PMU_CFG_FLAG_SET_SINH (1 << 6)
-#define SBI_PMU_CFG_FLAG_SET_MINH (1 << 7)
+#define SBI_PMU_CFG_FLAG_SKIP_MATCH BIT(0)
+#define SBI_PMU_CFG_FLAG_CLEAR_VALUE BIT(1)
+#define SBI_PMU_CFG_FLAG_AUTO_START BIT(2)
+#define SBI_PMU_CFG_FLAG_SET_VUINH BIT(3)
+#define SBI_PMU_CFG_FLAG_SET_VSINH BIT(4)
+#define SBI_PMU_CFG_FLAG_SET_UINH BIT(5)
+#define SBI_PMU_CFG_FLAG_SET_SINH BIT(6)
+#define SBI_PMU_CFG_FLAG_SET_MINH BIT(7)
/* Flags defined for counter start function */
-#define SBI_PMU_START_FLAG_SET_INIT_VALUE (1 << 0)
+#define SBI_PMU_START_FLAG_SET_INIT_VALUE BIT(0)
/* Flags defined for counter stop function */
-#define SBI_PMU_STOP_FLAG_RESET (1 << 0)
+#define SBI_PMU_STOP_FLAG_RESET BIT(0)
enum sbi_ext_dbcn_fid {
SBI_EXT_DBCN_CONSOLE_WRITE = 0,
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 1823ffb25d35..f23501898657 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -386,7 +386,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
cmask = 1;
} else if (event->attr.config == PERF_COUNT_HW_INSTRUCTIONS) {
cflags |= SBI_PMU_CFG_FLAG_SKIP_MATCH;
- cmask = 1UL << (CSR_INSTRET - CSR_CYCLE);
+ cmask = BIT(CSR_INSTRET - CSR_CYCLE);
}
}
--
2.34.1
SBI PMU Snapshot function optimizes the number of traps to
higher privilege mode by leveraging a shared memory between the S/VS-mode
and the M/HS mode. Add the definitions for that extension and new error
codes.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 4afa2cd01bae..9aada4b9f7b5 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -132,6 +132,7 @@ enum sbi_ext_pmu_fid {
SBI_EXT_PMU_COUNTER_STOP,
SBI_EXT_PMU_COUNTER_FW_READ,
SBI_EXT_PMU_COUNTER_FW_READ_HI,
+ SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
};
union sbi_pmu_ctr_info {
@@ -148,6 +149,13 @@ union sbi_pmu_ctr_info {
};
};
+/* Data structure to contain the pmu snapshot data */
+struct riscv_pmu_snapshot_data {
+ u64 ctr_overflow_mask;
+ u64 ctr_values[64];
+ u64 reserved[447];
+};
+
#define RISCV_PMU_RAW_EVENT_MASK GENMASK_ULL(47, 0)
#define RISCV_PMU_RAW_EVENT_IDX 0x20000
@@ -244,9 +252,11 @@ enum sbi_pmu_ctr_type {
/* Flags defined for counter start function */
#define SBI_PMU_START_FLAG_SET_INIT_VALUE BIT(0)
+#define SBI_PMU_START_FLAG_INIT_SNAPSHOT BIT(1)
/* Flags defined for counter stop function */
#define SBI_PMU_STOP_FLAG_RESET BIT(0)
+#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
enum sbi_ext_dbcn_fid {
SBI_EXT_DBCN_CONSOLE_WRITE = 0,
@@ -285,6 +295,7 @@ struct sbi_sta_struct {
#define SBI_ERR_ALREADY_AVAILABLE -6
#define SBI_ERR_ALREADY_STARTED -7
#define SBI_ERR_ALREADY_STOPPED -8
+#define SBI_ERR_NO_SHMEM -9
extern unsigned long sbi_spec_version;
struct sbiret {
--
2.34.1
SBI_STA_SHMEM_DISABLE is a macro to invoke disable shared memory
commands. As this can be invoked from other SBI extension context
as well, rename it to more generic name as SBI_SHMEM_DISABLE.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 2 +-
arch/riscv/kernel/paravirt.c | 6 +++---
arch/riscv/kvm/vcpu_sbi_sta.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 9aada4b9f7b5..f31650b10899 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -277,7 +277,7 @@ struct sbi_sta_struct {
u8 pad[47];
} __packed;
-#define SBI_STA_SHMEM_DISABLE -1
+#define SBI_SHMEM_DISABLE -1
/* SBI spec version fields */
#define SBI_SPEC_VERSION_DEFAULT 0x1
diff --git a/arch/riscv/kernel/paravirt.c b/arch/riscv/kernel/paravirt.c
index 0d6225fd3194..fa6b0339a65d 100644
--- a/arch/riscv/kernel/paravirt.c
+++ b/arch/riscv/kernel/paravirt.c
@@ -62,7 +62,7 @@ static int sbi_sta_steal_time_set_shmem(unsigned long lo, unsigned long hi,
ret = sbi_ecall(SBI_EXT_STA, SBI_EXT_STA_STEAL_TIME_SET_SHMEM,
lo, hi, flags, 0, 0, 0);
if (ret.error) {
- if (lo == SBI_STA_SHMEM_DISABLE && hi == SBI_STA_SHMEM_DISABLE)
+ if (lo == SBI_SHMEM_DISABLE && hi == SBI_SHMEM_DISABLE)
pr_warn("Failed to disable steal-time shmem");
else
pr_warn("Failed to set steal-time shmem");
@@ -84,8 +84,8 @@ static int pv_time_cpu_online(unsigned int cpu)
static int pv_time_cpu_down_prepare(unsigned int cpu)
{
- return sbi_sta_steal_time_set_shmem(SBI_STA_SHMEM_DISABLE,
- SBI_STA_SHMEM_DISABLE, 0);
+ return sbi_sta_steal_time_set_shmem(SBI_SHMEM_DISABLE,
+ SBI_SHMEM_DISABLE, 0);
}
static u64 pv_time_steal_clock(int cpu)
diff --git a/arch/riscv/kvm/vcpu_sbi_sta.c b/arch/riscv/kvm/vcpu_sbi_sta.c
index d8cf9ca28c61..5f35427114c1 100644
--- a/arch/riscv/kvm/vcpu_sbi_sta.c
+++ b/arch/riscv/kvm/vcpu_sbi_sta.c
@@ -93,8 +93,8 @@ static int kvm_sbi_sta_steal_time_set_shmem(struct kvm_vcpu *vcpu)
if (flags != 0)
return SBI_ERR_INVALID_PARAM;
- if (shmem_phys_lo == SBI_STA_SHMEM_DISABLE &&
- shmem_phys_hi == SBI_STA_SHMEM_DISABLE) {
+ if (shmem_phys_lo == SBI_SHMEM_DISABLE &&
+ shmem_phys_hi == SBI_SHMEM_DISABLE) {
vcpu->arch.sta.shmem = INVALID_GPA;
return 0;
}
--
2.34.1
As per the SBI specification, minor version is encoded in the
lower 24 bits only. Make sure that the SBI version is computed
with the appropriate mask.
Currently, there is no minor version in use. Thus, it doesn't
change anything functionality but it is good to be compliant with
the specification.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index f31650b10899..112a0a0d9f46 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -367,8 +367,8 @@ static inline unsigned long sbi_minor_version(void)
static inline unsigned long sbi_mk_version(unsigned long major,
unsigned long minor)
{
- return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
- SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
+ return ((major & SBI_SPEC_VERSION_MAJOR_MASK) << SBI_SPEC_VERSION_MAJOR_SHIFT)
+ | (minor & SBI_SPEC_VERSION_MINOR_MASK);
}
int sbi_err_map_linux_errno(int err);
--
2.34.1
For RV32, used_hw_ctrs can have more than 1 word if the firmware chooses
to interleave firmware/hardware counters indicies. Even though it's a
unlikely scenario, handle that case by iterating over all the words
instead of just using the first word.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index f23501898657..4eacd89141a9 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -652,10 +652,12 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
{
struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+ int i;
- /* No need to check the error here as we can't do anything about the error */
- sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
- cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
+ for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
+ /* No need to check the error here as we can't do anything about the error */
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
+ cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
}
/*
@@ -667,7 +669,7 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
unsigned long ctr_ovf_mask)
{
- int idx = 0;
+ int idx = 0, i;
struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
struct perf_event *event;
unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
@@ -676,11 +678,12 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
struct hw_perf_event *hwc;
u64 init_val = 0;
- ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0] & ~ctr_ovf_mask;
-
- /* Start all the counters that did not overflow in a single shot */
- sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
- 0, 0, 0, 0);
+ for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
+ ctr_start_mask = cpu_hw_evt->used_hw_ctrs[i] & ~ctr_ovf_mask;
+ /* Start all the counters that did not overflow in a single shot */
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG, ctr_start_mask,
+ 0, 0, 0, 0);
+ }
/* Reinitialize and start all the counter that overflowed */
while (ctr_ovf_mask) {
--
2.34.1
The initial sample period value when counter value is not assigned
should be set to maximum value supported by the counter width.
Otherwise, it may result in spurious interrupts.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 86391a5061dd..cee1b9ca4ec4 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -39,7 +39,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
u64 sample_period;
if (!pmc->counter_val)
- sample_period = counter_val_mask + 1;
+ sample_period = counter_val_mask;
else
sample_period = (-pmc->counter_val) & counter_val_mask;
--
2.34.1
SBI v2.0 SBI introduced PMU snapshot feature which adds the following
features.
1. Read counter values directly from the shared memory instead of
csr read.
2. Start multiple counters with initial values with one SBI call.
These functionalities optimizes the number of traps to the higher
privilege mode. If the kernel is in VS mode while the hypervisor
deploy trap & emulate method, this would minimize all the hpmcounter
CSR read traps. If the kernel is running in S-mode, the benefits
reduced to CSR latency vs DRAM/cache latency as there is no trap
involved while accessing the hpmcounter CSRs.
In both modes, it does saves the number of ecalls while starting
multiple counter together with an initial values. This is a likely
scenario if multiple counters overflow at the same time.
Acked-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu.c | 1 +
drivers/perf/riscv_pmu_sbi.c | 265 ++++++++++++++++++++++++++++++---
include/linux/perf/riscv_pmu.h | 6 +
3 files changed, 255 insertions(+), 17 deletions(-)
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index b4efdddb2ad9..36d348753d05 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -408,6 +408,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
cpuc->n_events = 0;
for (i = 0; i < RISCV_MAX_COUNTERS; i++)
cpuc->events[i] = NULL;
+ cpuc->snapshot_addr = NULL;
}
pmu->pmu = (struct pmu) {
.event_init = riscv_pmu_event_init,
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 4eacd89141a9..2694110f1cff 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -58,6 +58,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
PMU_FORMAT_ATTR(firmware, "config:63");
static bool sbi_v2_available;
+static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
+#define sbi_pmu_snapshot_available() \
+ static_branch_unlikely(&sbi_pmu_snapshot_available)
static struct attribute *riscv_arch_formats_attr[] = {
&format_attr_event.attr,
@@ -508,14 +511,105 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
return ret;
}
+static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+
+ if (!cpu_hw_evt->snapshot_addr)
+ continue;
+
+ free_page((unsigned long)cpu_hw_evt->snapshot_addr);
+ cpu_hw_evt->snapshot_addr = NULL;
+ cpu_hw_evt->snapshot_addr_phys = 0;
+ }
+}
+
+static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
+{
+ int cpu;
+ struct page *snapshot_page;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+
+ snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+ if (!snapshot_page) {
+ pmu_sbi_snapshot_free(pmu);
+ return -ENOMEM;
+ }
+ cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
+ cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
+ }
+
+ return 0;
+}
+
+static int pmu_sbi_snapshot_disable(void)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, SBI_SHMEM_DISABLE,
+ SBI_SHMEM_DISABLE, 0, 0, 0, 0);
+ if (ret.error) {
+ pr_warn("failed to disable snapshot shared memory\n");
+ return sbi_err_map_linux_errno(ret.error);
+ }
+
+ return 0;
+}
+
+static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
+{
+ struct cpu_hw_events *cpu_hw_evt;
+ struct sbiret ret = {0};
+
+ cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
+ if (!cpu_hw_evt->snapshot_addr_phys)
+ return -EINVAL;
+
+ if (cpu_hw_evt->snapshot_set_done)
+ return 0;
+
+ if (IS_ENABLED(CONFIG_32BIT))
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+ cpu_hw_evt->snapshot_addr_phys,
+ (u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
+ else
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+ cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
+
+ /* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
+ if (ret.error) {
+ if (ret.error != SBI_ERR_NOT_SUPPORTED)
+ pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
+ return sbi_err_map_linux_errno(ret.error);
+ }
+
+ cpu_hw_evt->snapshot_set_done = true;
+
+ return 0;
+}
+
static u64 pmu_sbi_ctr_read(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
struct sbiret ret;
u64 val = 0;
+ struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
+ struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+ struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
+ /* Read the value from the shared memory directly only if counter is stopped */
+ if (sbi_pmu_snapshot_available() & (hwc->state & PERF_HES_STOPPED)) {
+ val = sdata->ctr_values[idx];
+ return val;
+ }
+
if (pmu_sbi_is_fw_event(event)) {
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
hwc->idx, 0, 0, 0, 0, 0);
@@ -565,6 +659,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
struct hw_perf_event *hwc = &event->hw;
unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
+ /* There is no benefit setting SNAPSHOT FLAG for a single counter */
#if defined(CONFIG_32BIT)
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
1, flag, ival, ival >> 32, 0);
@@ -585,16 +680,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
{
struct sbiret ret;
struct hw_perf_event *hwc = &event->hw;
+ struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
+ struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+ struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
(hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
pmu_sbi_reset_scounteren((void *)event);
+ if (sbi_pmu_snapshot_available())
+ flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
+
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
- if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
- flag != SBI_PMU_STOP_FLAG_RESET)
+ if (!ret.error && sbi_pmu_snapshot_available()) {
+ /*
+ * The counter snapshot is based on the index base specified by hwc->idx.
+ * The actual counter value is updated in shared memory at index 0 when counter
+ * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
+ * the counter value to shared memory. However, if hwc->idx is zero, the counter
+ * value is already correctly updated in shared memory, requiring no further
+ * adjustment.
+ */
+ if (hwc->idx > 0) {
+ sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
+ sdata->ctr_values[0] = 0;
+ }
+ } else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
+ flag != SBI_PMU_STOP_FLAG_RESET) {
pr_err("Stopping counter idx %d failed with error %d\n",
hwc->idx, sbi_err_map_linux_errno(ret.error));
+ }
}
static int pmu_sbi_find_num_ctrs(void)
@@ -652,12 +767,39 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
{
struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+ struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
+ unsigned long flag = 0;
int i;
+ struct sbiret ret;
+ unsigned long temp_ctr_values[64] = {0};
+ unsigned long ctr_val, temp_ctr_overflow_mask = 0;
- for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
+ if (sbi_pmu_snapshot_available())
+ flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
+
+ for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
/* No need to check the error here as we can't do anything about the error */
- sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
- cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
+ cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
+ if (!ret.error && sbi_pmu_snapshot_available()) {
+ /* Save the counter values to avoid clobbering */
+ temp_ctr_values[i * BITS_PER_LONG + i] = sdata->ctr_values[i];
+ /* Save the overflow mask to avoid clobbering */
+ if (BIT(i) & sdata->ctr_overflow_mask)
+ temp_ctr_overflow_mask |= BIT(i + i * BITS_PER_LONG);
+ }
+ }
+
+ /* Restore the counter values to the shared memory */
+ if (sbi_pmu_snapshot_available()) {
+ for (i = 0; i < 64; i++) {
+ ctr_val = temp_ctr_values[i];
+ if (ctr_val)
+ sdata->ctr_values[i] = ctr_val;
+ if (temp_ctr_overflow_mask)
+ sdata->ctr_overflow_mask = temp_ctr_overflow_mask;
+ }
+ }
}
/*
@@ -666,11 +808,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
* while the overflowed counters need to be started with updated initialization
* value.
*/
-static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
- unsigned long ctr_ovf_mask)
+static inline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
+ u64 ctr_ovf_mask)
{
int idx = 0, i;
- struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
struct perf_event *event;
unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
unsigned long ctr_start_mask = 0;
@@ -706,6 +847,48 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
}
}
+static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
+ u64 ctr_ovf_mask)
+{
+ int idx = 0;
+ struct perf_event *event;
+ unsigned long flag = SBI_PMU_START_FLAG_INIT_SNAPSHOT;
+ u64 max_period, init_val = 0;
+ struct hw_perf_event *hwc;
+ struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
+
+ for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
+ if (ctr_ovf_mask & BIT(idx)) {
+ event = cpu_hw_evt->events[idx];
+ hwc = &event->hw;
+ max_period = riscv_pmu_ctr_get_width_mask(event);
+ init_val = local64_read(&hwc->prev_count) & max_period;
+ sdata->ctr_values[idx] = init_val;
+ }
+ /*
+ * We do not need to update the non-overflow counters the previous
+ * value should have been there already.
+ */
+ }
+
+ for (idx = 0; idx < BITS_TO_LONGS(RISCV_MAX_COUNTERS); idx++) {
+ /* Start all the counters in a single shot */
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
+ cpu_hw_evt->used_hw_ctrs[idx], flag, 0, 0, 0);
+ }
+}
+
+static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
+ u64 ctr_ovf_mask)
+{
+ struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
+
+ if (sbi_pmu_snapshot_available())
+ pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
+ else
+ pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
+}
+
static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
{
struct perf_sample_data data;
@@ -716,9 +899,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
struct riscv_pmu *pmu;
struct perf_event *event;
unsigned long overflow;
- unsigned long overflowed_ctrs = 0;
+ u64 overflowed_ctrs = 0;
struct cpu_hw_events *cpu_hw_evt = dev;
u64 start_clock = sched_clock();
+ struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
if (WARN_ON_ONCE(!cpu_hw_evt))
return IRQ_NONE;
@@ -740,7 +924,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
pmu_sbi_stop_hw_ctrs(pmu);
/* Overflow status register should only be read after counter are stopped */
- ALT_SBI_PMU_OVERFLOW(overflow);
+ if (sbi_pmu_snapshot_available())
+ overflow = sdata->ctr_overflow_mask;
+ else
+ ALT_SBI_PMU_OVERFLOW(overflow);
/*
* Overflow interrupt pending bit should only be cleared after stopping
@@ -766,9 +953,14 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
if (!info || info->type != SBI_PMU_CTR_TYPE_HW)
continue;
- /* compute hardware counter index */
- hidx = info->csr - CSR_CYCLE;
- /* check if the corresponding bit is set in sscountovf */
+ if (sbi_pmu_snapshot_available())
+ /* SBI implementation already updated the logical indicies */
+ hidx = lidx;
+ else
+ /* compute hardware counter index */
+ hidx = info->csr - CSR_CYCLE;
+
+ /* check if the corresponding bit is set in sscountovf or overflow mask in shmem */
if (!(overflow & BIT(hidx)))
continue;
@@ -778,7 +970,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
*/
overflowed_ctrs |= BIT(lidx);
hw_evt = &event->hw;
+ /* Update the event states here so that we know the state while reading */
+ hw_evt->state |= PERF_HES_STOPPED;
riscv_pmu_event_update(event);
+ hw_evt->state |= PERF_HES_UPTODATE;
perf_sample_data_init(&data, 0, hw_evt->last_period);
if (riscv_pmu_event_set_period(event)) {
/*
@@ -791,6 +986,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
*/
perf_event_overflow(event, &data, regs);
}
+ /* Reset the state as we are going to start the counter after the loop */
+ hw_evt->state = 0;
}
pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
@@ -822,6 +1019,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
}
+ if (sbi_pmu_snapshot_available())
+ return pmu_sbi_snapshot_setup(pmu, cpu);
+
return 0;
}
@@ -834,6 +1034,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
/* Disable all counters access for user mode now */
csr_write(CSR_SCOUNTEREN, 0x0);
+ if (sbi_pmu_snapshot_available())
+ return pmu_sbi_snapshot_disable();
+
return 0;
}
@@ -942,6 +1145,12 @@ static inline void riscv_pm_pmu_unregister(struct riscv_pmu *pmu) { }
static void riscv_pmu_destroy(struct riscv_pmu *pmu)
{
+ if (sbi_v2_available) {
+ if (sbi_pmu_snapshot_available()) {
+ pmu_sbi_snapshot_disable();
+ pmu_sbi_snapshot_free(pmu);
+ }
+ }
riscv_pm_pmu_unregister(pmu);
cpuhp_state_remove_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
}
@@ -1109,10 +1318,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
pmu->event_unmapped = pmu_sbi_event_unmapped;
pmu->csr_index = pmu_sbi_csr_index;
- ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
- if (ret)
- return ret;
-
ret = riscv_pm_pmu_register(pmu);
if (ret)
goto out_unregister;
@@ -1121,8 +1326,34 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
if (ret)
goto out_unregister;
+ /* SBI PMU Snapsphot is only available in SBI v2.0 */
+ if (sbi_v2_available) {
+ ret = pmu_sbi_snapshot_alloc(pmu);
+ if (ret)
+ goto out_unregister;
+
+ ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
+ if (ret) {
+ /* Snapshot is an optional feature. Continue if not available */
+ pmu_sbi_snapshot_free(pmu);
+ } else {
+ pr_info("SBI PMU snapshot detected\n");
+ /*
+ * We enable it once here for the boot cpu. If snapshot shmem setup
+ * fails during cpu hotplug process, it will fail to start the cpu
+ * as we can not handle hetergenous PMUs with different snapshot
+ * capability.
+ */
+ static_branch_enable(&sbi_pmu_snapshot_available);
+ }
+ }
+
register_sysctl("kernel", sbi_pmu_sysctl_table);
+ ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
+ if (ret)
+ goto out_unregister;
+
return 0;
out_unregister:
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 43282e22ebe1..c3fa90970042 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -39,6 +39,12 @@ struct cpu_hw_events {
DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
/* currently enabled firmware counters */
DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
+ /* The virtual address of the shared memory where counter snapshot will be taken */
+ void *snapshot_addr;
+ /* The physical address of the shared memory where counter snapshot will be taken */
+ phys_addr_t snapshot_addr_phys;
+ /* Boolean flag to indicate setup is already done */
+ bool snapshot_set_done;
};
struct riscv_pmu {
--
2.34.1
The virtual counter value is updated during pmu_ctr_read. There is no need
to update it in reset case. Otherwise, it will be counted twice which is
incorrect.
Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index cee1b9ca4ec4..b5159ce4592d 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -397,7 +397,6 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
int i, pmc_index, sbiret = 0;
- u64 enabled, running;
struct kvm_pmc *pmc;
int fevent_code;
@@ -432,12 +431,9 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
sbiret = SBI_ERR_ALREADY_STOPPED;
}
- if (flags & SBI_PMU_STOP_FLAG_RESET) {
- /* Relase the counter if this is a reset request */
- pmc->counter_val += perf_event_read_value(pmc->perf_event,
- &enabled, &running);
+ if (flags & SBI_PMU_STOP_FLAG_RESET)
+ /* Release the counter if this is a reset request */
kvm_pmu_release_perf_event(pmc);
- }
} else {
sbiret = SBI_ERR_INVALID_PARAM;
}
--
2.34.1
Currently, we return a linux error code if creating a perf event failed
in kvm. That shouldn't be necessary as guest can continue to operate
without perf profiling or profiling with firmware counters.
Return appropriate SBI error code to indicate that PMU configuration
failed. An error message in kvm already describes the reason for failure.
Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling")
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 14 +++++++++-----
arch/riscv/kvm/vcpu_sbi_pmu.c | 6 +++---
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index b5159ce4592d..2d9929bbc2c8 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -229,8 +229,9 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
return 0;
}
-static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
- unsigned long flags, unsigned long eidx, unsigned long evtdata)
+static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
+ unsigned long flags, unsigned long eidx,
+ unsigned long evtdata)
{
struct perf_event *event;
@@ -454,7 +455,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
unsigned long eidx, u64 evtdata,
struct kvm_vcpu_sbi_return *retdata)
{
- int ctr_idx, ret, sbiret = 0;
+ int ctr_idx, sbiret = 0;
+ long ret;
bool is_fevent;
unsigned long event_code;
u32 etype = kvm_pmu_get_perf_event_type(eidx);
@@ -513,8 +515,10 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
kvpmu->fw_event[event_code].started = true;
} else {
ret = kvm_pmu_create_perf_event(pmc, &attr, flags, eidx, evtdata);
- if (ret)
- return ret;
+ if (ret) {
+ sbiret = SBI_ERR_NOT_SUPPORTED;
+ goto out;
+ }
}
set_bit(ctr_idx, kvpmu->pmc_in_use);
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index 7eca72df2cbd..e1633606c98b 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -42,9 +42,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
#endif
/*
* This can fail if perf core framework fails to create an event.
- * Forward the error to userspace because it's an error which
- * happened within the host kernel. The other option would be
- * to convert to an SBI error and forward to the guest.
+ * No need to forward the error to userspace and exit the guest.
+ * The operation can continue without profiling. Forward the
+ * appropriate SBI error to the guest.
*/
ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
cp->a2, cp->a3, temp, retdata);
--
2.34.1
PMU Snapshot function allows to minimize the number of traps when the
guest access configures/access the hpmcounters. If the snapshot feature
is enabled, the hypervisor updates the shared memory with counter
data and state of overflown counters. The guest can just read the
shared memory instead of trap & emulate done by the hypervisor.
This patch doesn't implement the counter overflow yet.
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_pmu.h | 7 ++
arch/riscv/kvm/vcpu_pmu.c | 121 +++++++++++++++++++++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 3 +
3 files changed, 130 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 395518a1664e..77a1fc4d203d 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -50,6 +50,10 @@ struct kvm_pmu {
bool init_done;
/* Bit map of all the virtual counter used */
DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+ /* The address of the counter snapshot area (guest physical address) */
+ gpa_t snapshot_addr;
+ /* The actual data of the snapshot */
+ struct riscv_pmu_snapshot_data *sdata;
};
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu_context)
@@ -85,6 +89,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
+ unsigned long saddr_high, unsigned long flags,
+ struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 2d9929bbc2c8..2ebccd73680f 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -14,6 +14,7 @@
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_pmu.h>
+#include <asm/sbi.h>
#include <linux/bitops.h>
#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
@@ -311,6 +312,80 @@ int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
return ret;
}
+static void kvm_pmu_clear_snapshot_area(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
+
+ if (kvpmu->sdata) {
+ if (kvpmu->snapshot_addr != INVALID_GPA) {
+ memset(kvpmu->sdata, 0, snapshot_area_size);
+ kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr,
+ kvpmu->sdata, snapshot_area_size);
+ } else {
+ pr_warn("snapshot address invalid\n");
+ }
+ kfree(kvpmu->sdata);
+ kvpmu->sdata = NULL;
+ }
+ kvpmu->snapshot_addr = INVALID_GPA;
+}
+
+int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
+ unsigned long saddr_high, unsigned long flags,
+ struct kvm_vcpu_sbi_return *retdata)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int snapshot_area_size = sizeof(struct riscv_pmu_snapshot_data);
+ int sbiret = 0;
+ gpa_t saddr;
+ unsigned long hva;
+ bool writable;
+
+ if (!kvpmu || flags) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ if (saddr_low == SBI_SHMEM_DISABLE && saddr_high == SBI_SHMEM_DISABLE) {
+ kvm_pmu_clear_snapshot_area(vcpu);
+ return 0;
+ }
+
+ saddr = saddr_low;
+
+ if (saddr_high != 0) {
+ if (IS_ENABLED(CONFIG_32BIT))
+ saddr |= ((gpa_t)saddr_high << 32);
+ else
+ sbiret = SBI_ERR_INVALID_ADDRESS;
+ goto out;
+ }
+
+ hva = kvm_vcpu_gfn_to_hva_prot(vcpu, saddr >> PAGE_SHIFT, &writable);
+ if (kvm_is_error_hva(hva) || !writable) {
+ sbiret = SBI_ERR_INVALID_ADDRESS;
+ goto out;
+ }
+
+ kvpmu->sdata = kzalloc(snapshot_area_size, GFP_ATOMIC);
+ if (!kvpmu->sdata)
+ return -ENOMEM;
+
+ if (kvm_vcpu_write_guest(vcpu, saddr, kvpmu->sdata, snapshot_area_size)) {
+ kfree(kvpmu->sdata);
+ sbiret = SBI_ERR_FAILURE;
+ goto out;
+ }
+
+ kvpmu->snapshot_addr = saddr;
+
+out:
+ retdata->err_val = sbiret;
+
+ return 0;
+}
+
int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu,
struct kvm_vcpu_sbi_return *retdata)
{
@@ -344,20 +419,38 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
int i, pmc_index, sbiret = 0;
struct kvm_pmc *pmc;
int fevent_code;
+ bool snap_flag_set = flags & SBI_PMU_START_FLAG_INIT_SNAPSHOT;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
goto out;
}
+ if (snap_flag_set) {
+ if (kvpmu->snapshot_addr == INVALID_GPA) {
+ sbiret = SBI_ERR_NO_SHMEM;
+ goto out;
+ }
+ if (kvm_vcpu_read_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
+ sizeof(struct riscv_pmu_snapshot_data))) {
+ pr_warn("Unable to read snapshot shared memory while starting counters\n");
+ sbiret = SBI_ERR_FAILURE;
+ goto out;
+ }
+ }
/* Start the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
pmc_index = i + ctr_base;
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
pmc = &kvpmu->pmc[pmc_index];
- if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE)
+ if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
pmc->counter_val = ival;
+ } else if (snap_flag_set) {
+ /* The counter index in the snapshot are relative to the counter base */
+ pmc->counter_val = kvpmu->sdata->ctr_values[i];
+ }
+
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
fevent_code = get_event_code(pmc->event_idx);
if (fevent_code >= SBI_PMU_FW_MAX) {
@@ -398,14 +491,22 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
int i, pmc_index, sbiret = 0;
+ u64 enabled, running;
struct kvm_pmc *pmc;
int fevent_code;
+ bool snap_flag_set = flags & SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
+ bool shmem_needs_update = false;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
goto out;
}
+ if (snap_flag_set && kvpmu->snapshot_addr == INVALID_GPA) {
+ sbiret = SBI_ERR_NO_SHMEM;
+ goto out;
+ }
+
/* Stop the counters that have been configured and requested by the guest */
for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
pmc_index = i + ctr_base;
@@ -438,12 +539,28 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
} else {
sbiret = SBI_ERR_INVALID_PARAM;
}
+
+ if (snap_flag_set && !sbiret) {
+ if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW)
+ pmc->counter_val = kvpmu->fw_event[fevent_code].value;
+ else if (pmc->perf_event)
+ pmc->counter_val += perf_event_read_value(pmc->perf_event,
+ &enabled, &running);
+ /* TODO: Add counter overflow support when sscofpmf support is added */
+ kvpmu->sdata->ctr_values[i] = pmc->counter_val;
+ shmem_needs_update = true;
+ }
+
if (flags & SBI_PMU_STOP_FLAG_RESET) {
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
clear_bit(pmc_index, kvpmu->pmc_in_use);
}
}
+ if (shmem_needs_update)
+ kvm_vcpu_write_guest(vcpu, kvpmu->snapshot_addr, kvpmu->sdata,
+ sizeof(struct riscv_pmu_snapshot_data));
+
out:
retdata->err_val = sbiret;
@@ -566,6 +683,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
kvpmu->num_hw_ctrs = num_hw_ctrs + 1;
kvpmu->num_fw_ctrs = SBI_PMU_FW_MAX;
memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
+ kvpmu->snapshot_addr = INVALID_GPA;
if (kvpmu->num_hw_ctrs > RISCV_KVM_MAX_HW_CTRS) {
pr_warn_once("Limiting the hardware counters to 32 as specified by the ISA");
@@ -625,6 +743,7 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
}
bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
+ kvm_pmu_clear_snapshot_area(vcpu);
}
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index e1633606c98b..d3e7625fb2d2 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -64,6 +64,9 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
case SBI_EXT_PMU_COUNTER_FW_READ:
ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
break;
+ case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
+ ret = kvm_riscv_vcpu_pmu_snapshot_set_shmem(vcpu, cp->a0, cp->a1, cp->a2, retdata);
+ break;
default:
retdata->err_val = SBI_ERR_NOT_SUPPORTED;
}
--
2.34.1
KVM enables perf for guest via counter virtualization. However, the
sampling can not be supported as there is no mechanism to enabled
trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot
to provide the counter overflow data via the shared memory.
In case of sampling event, the host first sets the guest's LCOFI
interrupt and injects to the guest via irq filtering mechanism defined
in AIA specification. Thus, ssaia must be enabled in the host in order
to use perf sampling in the guest. No other AIA dependency w.r.t kernel
is required.
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/csr.h | 3 +-
arch/riscv/include/asm/kvm_vcpu_pmu.h | 3 ++
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kvm/aia.c | 5 ++
arch/riscv/kvm/vcpu.c | 15 ++++--
arch/riscv/kvm/vcpu_onereg.c | 6 +++
arch/riscv/kvm/vcpu_pmu.c | 68 +++++++++++++++++++++++++--
7 files changed, 93 insertions(+), 8 deletions(-)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 9d1b07932794..25966995da04 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -168,7 +168,8 @@
#define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT)
#define VSIP_VALID_MASK ((_AC(1, UL) << IRQ_S_SOFT) | \
(_AC(1, UL) << IRQ_S_TIMER) | \
- (_AC(1, UL) << IRQ_S_EXT))
+ (_AC(1, UL) << IRQ_S_EXT) | \
+ (_AC(1, UL) << IRQ_PMU_OVF))
/* AIA CSR bits */
#define TOPI_IID_SHIFT 16
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 77a1fc4d203d..257f17641e00 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -36,6 +36,7 @@ struct kvm_pmc {
bool started;
/* Monitoring event ID */
unsigned long event_idx;
+ struct kvm_vcpu *vcpu;
};
/* PMU data structure per vcpu */
@@ -50,6 +51,8 @@ struct kvm_pmu {
bool init_done;
/* Bit map of all the virtual counter used */
DECLARE_BITMAP(pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+ /* Bit map of all the virtual counter overflown */
+ DECLARE_BITMAP(pmc_overflown, RISCV_KVM_MAX_COUNTERS);
/* The address of the counter snapshot area (guest physical address) */
gpa_t snapshot_addr;
/* The actual data of the snapshot */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index b1c503c2959c..e878e7cc3978 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -167,6 +167,7 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZFA,
KVM_RISCV_ISA_EXT_ZTSO,
KVM_RISCV_ISA_EXT_ZACAS,
+ KVM_RISCV_ISA_EXT_SSCOFPMF,
KVM_RISCV_ISA_EXT_MAX,
};
diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c
index a944294f6f23..0f0a9d11bb5f 100644
--- a/arch/riscv/kvm/aia.c
+++ b/arch/riscv/kvm/aia.c
@@ -545,6 +545,9 @@ void kvm_riscv_aia_enable(void)
enable_percpu_irq(hgei_parent_irq,
irq_get_trigger_type(hgei_parent_irq));
csr_set(CSR_HIE, BIT(IRQ_S_GEXT));
+ /* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */
+ if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
+ csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF));
}
void kvm_riscv_aia_disable(void)
@@ -558,6 +561,8 @@ void kvm_riscv_aia_disable(void)
return;
hgctrl = get_cpu_ptr(&aia_hgei);
+ if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF))
+ csr_clear(CSR_HVIEN, BIT(IRQ_PMU_OVF));
/* Disable per-CPU SGEI interrupt */
csr_clear(CSR_HIE, BIT(IRQ_S_GEXT));
disable_percpu_irq(hgei_parent_irq);
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index b5ca9f2e98ac..bb10771b2b18 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -365,6 +365,13 @@ void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
}
}
+ /* Sync up the HVIP.LCOFIP bit changes (only clear) by the guest */
+ if ((csr->hvip ^ hvip) & (1UL << IRQ_PMU_OVF)) {
+ if (!(hvip & (1UL << IRQ_PMU_OVF)) &&
+ !test_and_set_bit(IRQ_PMU_OVF, v->irqs_pending_mask))
+ clear_bit(IRQ_PMU_OVF, v->irqs_pending);
+ }
+
/* Sync-up AIA high interrupts */
kvm_riscv_vcpu_aia_sync_interrupts(vcpu);
@@ -382,7 +389,8 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
if (irq < IRQ_LOCAL_MAX &&
irq != IRQ_VS_SOFT &&
irq != IRQ_VS_TIMER &&
- irq != IRQ_VS_EXT)
+ irq != IRQ_VS_EXT &&
+ irq != IRQ_PMU_OVF)
return -EINVAL;
set_bit(irq, vcpu->arch.irqs_pending);
@@ -397,14 +405,15 @@ int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
/*
- * We only allow VS-mode software, timer, and external
+ * We only allow VS-mode software, timer, counter overflow and external
* interrupts when irq is one of the local interrupts
* defined by RISC-V privilege specification.
*/
if (irq < IRQ_LOCAL_MAX &&
irq != IRQ_VS_SOFT &&
irq != IRQ_VS_TIMER &&
- irq != IRQ_VS_EXT)
+ irq != IRQ_VS_EXT &&
+ irq != IRQ_PMU_OVF)
return -EINVAL;
clear_bit(irq, vcpu->arch.irqs_pending);
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
index 994adc26db4b..c676275ea0a0 100644
--- a/arch/riscv/kvm/vcpu_onereg.c
+++ b/arch/riscv/kvm/vcpu_onereg.c
@@ -36,6 +36,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
/* Multi letter extensions (alphabetically sorted) */
KVM_ISA_EXT_ARR(SMSTATEEN),
KVM_ISA_EXT_ARR(SSAIA),
+ KVM_ISA_EXT_ARR(SSCOFPMF),
KVM_ISA_EXT_ARR(SSTC),
KVM_ISA_EXT_ARR(SVINVAL),
KVM_ISA_EXT_ARR(SVNAPOT),
@@ -99,6 +100,9 @@ static bool kvm_riscv_vcpu_isa_enable_allowed(unsigned long ext)
switch (ext) {
case KVM_RISCV_ISA_EXT_H:
return false;
+ case KVM_RISCV_ISA_EXT_SSCOFPMF:
+ /* Sscofpmf depends on interrupt filtering defined in ssaia */
+ return __riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSAIA);
case KVM_RISCV_ISA_EXT_V:
return riscv_v_vstate_ctrl_user_allowed();
default:
@@ -116,6 +120,8 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_C:
case KVM_RISCV_ISA_EXT_I:
case KVM_RISCV_ISA_EXT_M:
+ /* There is not architectural config bit to disable sscofpmf completely */
+ case KVM_RISCV_ISA_EXT_SSCOFPMF:
case KVM_RISCV_ISA_EXT_SSTC:
case KVM_RISCV_ISA_EXT_SVINVAL:
case KVM_RISCV_ISA_EXT_SVNAPOT:
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 2ebccd73680f..a801ed52dc9b 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -230,6 +230,47 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
return 0;
}
+static void kvm_riscv_pmu_overflow(struct perf_event *perf_event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct kvm_pmc *pmc = perf_event->overflow_handler_context;
+ struct kvm_vcpu *vcpu = pmc->vcpu;
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct riscv_pmu *rpmu = to_riscv_pmu(perf_event->pmu);
+ u64 period;
+
+ /*
+ * Stop the event counting by directly accessing the perf_event.
+ * Otherwise, this needs to deferred via a workqueue.
+ * That will introduce skew in the counter value because the actual
+ * physical counter would start after returning from this function.
+ * It will be stopped again once the workqueue is scheduled
+ */
+ rpmu->pmu.stop(perf_event, PERF_EF_UPDATE);
+
+ /*
+ * The hw counter would start automatically when this function returns.
+ * Thus, the host may continue to interrupt and inject it to the guest
+ * even without the guest configuring the next event. Depending on the hardware
+ * the host may have some sluggishness only if privilege mode filtering is not
+ * available. In an ideal world, where qemu is not the only capable hardware,
+ * this can be removed.
+ * FYI: ARM64 does this way while x86 doesn't do anything as such.
+ * TODO: Should we keep it for RISC-V ?
+ */
+ period = -(local64_read(&perf_event->count));
+
+ local64_set(&perf_event->hw.period_left, 0);
+ perf_event->attr.sample_period = period;
+ perf_event->hw.sample_period = period;
+
+ set_bit(pmc->idx, kvpmu->pmc_overflown);
+ kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_PMU_OVF);
+
+ rpmu->pmu.start(perf_event, PERF_EF_RELOAD);
+}
+
static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_attr *attr,
unsigned long flags, unsigned long eidx,
unsigned long evtdata)
@@ -249,7 +290,7 @@ static long kvm_pmu_create_perf_event(struct kvm_pmc *pmc, struct perf_event_att
*/
attr->sample_period = kvm_pmu_get_sample_period(pmc);
- event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
+ event = perf_event_create_kernel_counter(attr, -1, current, kvm_riscv_pmu_overflow, pmc);
if (IS_ERR(event)) {
pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
return PTR_ERR(event);
@@ -443,6 +484,8 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
pmc_index = i + ctr_base;
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
+ /* The guest started the counter again. Reset the overflow status */
+ clear_bit(pmc_index, kvpmu->pmc_overflown);
pmc = &kvpmu->pmc[pmc_index];
if (flags & SBI_PMU_START_FLAG_SET_INIT_VALUE) {
pmc->counter_val = ival;
@@ -546,7 +589,13 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
else if (pmc->perf_event)
pmc->counter_val += perf_event_read_value(pmc->perf_event,
&enabled, &running);
- /* TODO: Add counter overflow support when sscofpmf support is added */
+ /*
+ * The counter and overflow indicies in the snapshot region are w.r.to
+ * cbase. Modify the set bit in the counter mask instead of the pmc_index
+ * which indicates the absolute counter index.
+ */
+ if (test_bit(pmc_index, kvpmu->pmc_overflown))
+ kvpmu->sdata->ctr_overflow_mask |= BIT(i);
kvpmu->sdata->ctr_values[i] = pmc->counter_val;
shmem_needs_update = true;
}
@@ -554,6 +603,15 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
if (flags & SBI_PMU_STOP_FLAG_RESET) {
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
clear_bit(pmc_index, kvpmu->pmc_in_use);
+ clear_bit(pmc_index, kvpmu->pmc_overflown);
+ if (snap_flag_set) {
+ /*
+ * Only clear the given counter as the caller is responsible to
+ * validate both the overflow mask and configured counters.
+ */
+ kvpmu->sdata->ctr_overflow_mask &= ~BIT(i);
+ shmem_needs_update = true;
+ }
}
}
@@ -703,6 +761,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
pmc = &kvpmu->pmc[i];
pmc->idx = i;
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
+ pmc->vcpu = vcpu;
if (i < kvpmu->num_hw_ctrs) {
pmc->cinfo.type = SBI_PMU_CTR_TYPE_HW;
if (i < 3)
@@ -735,13 +794,14 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
if (!kvpmu)
return;
- for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
+ for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS) {
pmc = &kvpmu->pmc[i];
pmc->counter_val = 0;
kvm_pmu_release_perf_event(pmc);
pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
}
- bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
+ bitmap_zero(kvpmu->pmc_in_use, RISCV_KVM_MAX_COUNTERS);
+ bitmap_zero(kvpmu->pmc_overflown, RISCV_KVM_MAX_COUNTERS);
memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
kvm_pmu_clear_snapshot_area(vcpu);
}
--
2.34.1
The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware
counters for RV32 based systems.
Add infrastructure to support that.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_pmu.h | 4 ++-
arch/riscv/kvm/vcpu_pmu.c | 44 ++++++++++++++++++++++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 6 ++++
3 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 257f17641e00..55861b5d3382 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -20,7 +20,7 @@ static_assert(RISCV_KVM_MAX_COUNTERS <= 64);
struct kvm_fw_event {
/* Current value of the event */
- unsigned long value;
+ u64 value;
/* Event monitoring status */
bool started;
@@ -91,6 +91,8 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
struct kvm_vcpu_sbi_return *retdata);
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
+int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_return *retdata);
void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
int kvm_riscv_vcpu_pmu_snapshot_set_shmem(struct kvm_vcpu *vcpu, unsigned long saddr_low,
unsigned long saddr_high, unsigned long flags,
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index a801ed52dc9b..e1409ec9afc0 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -197,6 +197,36 @@ static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
}
+static int pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+ unsigned long *out_val)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ int fevent_code;
+
+ if (!IS_ENABLED(CONFIG_32BIT)) {
+ pr_warn("%s: should be invoked for only RV32\n", __func__);
+ return -EINVAL;
+ }
+
+ if (cidx >= kvm_pmu_num_counters(kvpmu) || cidx == 1) {
+ pr_warn("Invalid counter id [%ld]during read\n", cidx);
+ return -EINVAL;
+ }
+
+ pmc = &kvpmu->pmc[cidx];
+
+ if (pmc->cinfo.type != SBI_PMU_CTR_TYPE_FW)
+ return -EINVAL;
+
+ fevent_code = get_event_code(pmc->event_idx);
+ pmc->counter_val = kvpmu->fw_event[fevent_code].value;
+
+ *out_val = pmc->counter_val >> 32;
+
+ return 0;
+}
+
static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
unsigned long *out_val)
{
@@ -705,6 +735,18 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
return 0;
}
+int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_return *retdata)
+{
+ int ret;
+
+ ret = pmu_fw_ctr_read_hi(vcpu, cidx, &retdata->out_val);
+ if (ret == -EINVAL)
+ retdata->err_val = SBI_ERR_INVALID_PARAM;
+
+ return 0;
+}
+
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata)
{
@@ -778,7 +820,7 @@ void kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
pmc->cinfo.csr = CSR_CYCLE + i;
} else {
pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
- pmc->cinfo.width = BITS_PER_LONG - 1;
+ pmc->cinfo.width = 63;
}
}
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index d3e7625fb2d2..cf111de51bdb 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -64,6 +64,12 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
case SBI_EXT_PMU_COUNTER_FW_READ:
ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
break;
+ case SBI_EXT_PMU_COUNTER_FW_READ_HI:
+ if (IS_ENABLED(CONFIG_32BIT))
+ ret = kvm_riscv_vcpu_pmu_fw_ctr_read_hi(vcpu, cp->a0, retdata);
+ else
+ retdata->out_val = 0;
+ break;
case SBI_EXT_PMU_SNAPSHOT_SET_SHMEM:
ret = kvm_riscv_vcpu_pmu_snapshot_set_shmem(vcpu, cp->a0, cp->a1, cp->a2, retdata);
break;
--
2.34.1
Rename the function to indicate that it is meant for firmware
counter read. While at it, add a range sanity check for it as
well.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_pmu.h | 2 +-
arch/riscv/kvm/vcpu_pmu.c | 7 ++++++-
arch/riscv/kvm/vcpu_sbi_pmu.c | 2 +-
3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 55861b5d3382..fa0f535bbbf0 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -89,7 +89,7 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
unsigned long ctr_mask, unsigned long flags,
unsigned long eidx, u64 evtdata,
struct kvm_vcpu_sbi_return *retdata);
-int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata);
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index e1409ec9afc0..04db1f993c47 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -235,6 +235,11 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
u64 enabled, running;
int fevent_code;
+ if (cidx >= kvm_pmu_num_counters(kvpmu) || cidx == 1) {
+ pr_warn("Invalid counter id [%ld] during read\n", cidx);
+ return -EINVAL;
+ }
+
pmc = &kvpmu->pmc[cidx];
if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
@@ -747,7 +752,7 @@ int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
return 0;
}
-int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_return *retdata)
{
int ret;
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
index cf111de51bdb..e4be34e03e83 100644
--- a/arch/riscv/kvm/vcpu_sbi_pmu.c
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -62,7 +62,7 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, retdata);
break;
case SBI_EXT_PMU_COUNTER_FW_READ:
- ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
+ ret = kvm_riscv_vcpu_pmu_fw_ctr_read(vcpu, cp->a0, retdata);
break;
case SBI_EXT_PMU_COUNTER_FW_READ_HI:
if (IS_ENABLED(CONFIG_32BIT))
--
2.34.1
The SBI definitions will continue to grow. Move the sbi related
definitions to its own header file from processor.h
Suggested-by: Andrew Jones <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
.../selftests/kvm/include/riscv/processor.h | 39 ---------------
.../testing/selftests/kvm/include/riscv/sbi.h | 50 +++++++++++++++++++
.../selftests/kvm/include/riscv/ucall.h | 1 +
tools/testing/selftests/kvm/steal_time.c | 4 +-
4 files changed, 54 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
index ce473fe251dd..3b9cb39327ff 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -154,45 +154,6 @@ void vm_install_interrupt_handler(struct kvm_vm *vm, exception_handler_fn handle
#define PGTBL_PAGE_SIZE PGTBL_L0_BLOCK_SIZE
#define PGTBL_PAGE_SIZE_SHIFT PGTBL_L0_BLOCK_SHIFT
-/* SBI return error codes */
-#define SBI_SUCCESS 0
-#define SBI_ERR_FAILURE -1
-#define SBI_ERR_NOT_SUPPORTED -2
-#define SBI_ERR_INVALID_PARAM -3
-#define SBI_ERR_DENIED -4
-#define SBI_ERR_INVALID_ADDRESS -5
-#define SBI_ERR_ALREADY_AVAILABLE -6
-#define SBI_ERR_ALREADY_STARTED -7
-#define SBI_ERR_ALREADY_STOPPED -8
-
-#define SBI_EXT_EXPERIMENTAL_START 0x08000000
-#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
-
-#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
-#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
-#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
-
-enum sbi_ext_id {
- SBI_EXT_BASE = 0x10,
- SBI_EXT_STA = 0x535441,
-};
-
-enum sbi_ext_base_fid {
- SBI_EXT_BASE_PROBE_EXT = 3,
-};
-
-struct sbiret {
- long error;
- long value;
-};
-
-struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
- unsigned long arg1, unsigned long arg2,
- unsigned long arg3, unsigned long arg4,
- unsigned long arg5);
-
-bool guest_sbi_probe_extension(int extid, long *out_val);
-
static inline void local_irq_enable(void)
{
csr_set(CSR_SSTATUS, SR_SIE);
diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
new file mode 100644
index 000000000000..ba04f2dec7b5
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * RISC-V SBI specific definitions
+ *
+ * Copyright (C) 2024 Rivos Inc.
+ */
+
+#ifndef SELFTEST_KVM_SBI_H
+#define SELFTEST_KVM_SBI_H
+
+/* SBI return error codes */
+#define SBI_SUCCESS 0
+#define SBI_ERR_FAILURE -1
+#define SBI_ERR_NOT_SUPPORTED -2
+#define SBI_ERR_INVALID_PARAM -3
+#define SBI_ERR_DENIED -4
+#define SBI_ERR_INVALID_ADDRESS -5
+#define SBI_ERR_ALREADY_AVAILABLE -6
+#define SBI_ERR_ALREADY_STARTED -7
+#define SBI_ERR_ALREADY_STOPPED -8
+
+#define SBI_EXT_EXPERIMENTAL_START 0x08000000
+#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
+
+#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
+#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
+#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
+
+enum sbi_ext_id {
+ SBI_EXT_BASE = 0x10,
+ SBI_EXT_STA = 0x535441,
+};
+
+enum sbi_ext_base_fid {
+ SBI_EXT_BASE_PROBE_EXT = 3,
+};
+
+struct sbiret {
+ long error;
+ long value;
+};
+
+struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
+ unsigned long arg1, unsigned long arg2,
+ unsigned long arg3, unsigned long arg4,
+ unsigned long arg5);
+
+bool guest_sbi_probe_extension(int extid, long *out_val);
+
+#endif /* SELFTEST_KVM_SBI_H */
diff --git a/tools/testing/selftests/kvm/include/riscv/ucall.h b/tools/testing/selftests/kvm/include/riscv/ucall.h
index be46eb32ec27..a695ae36f3e0 100644
--- a/tools/testing/selftests/kvm/include/riscv/ucall.h
+++ b/tools/testing/selftests/kvm/include/riscv/ucall.h
@@ -3,6 +3,7 @@
#define SELFTEST_KVM_UCALL_H
#include "processor.h"
+#include "sbi.h"
#define UCALL_EXIT_REASON KVM_EXIT_RISCV_SBI
diff --git a/tools/testing/selftests/kvm/steal_time.c b/tools/testing/selftests/kvm/steal_time.c
index bae0c5026f82..2ff82c7fd926 100644
--- a/tools/testing/selftests/kvm/steal_time.c
+++ b/tools/testing/selftests/kvm/steal_time.c
@@ -11,7 +11,9 @@
#include <pthread.h>
#include <linux/kernel.h>
#include <asm/kvm.h>
-#ifndef __riscv
+#ifdef __riscv
+#include "sbi.h"
+#else
#include <asm/kvm_para.h>
#endif
--
2.34.1
__vcpu_has_ext can check both SBI and ISA extensions when the first
argument is properly converted to SBI/ISA extension IDs. Introduce
two helper functions to make life easier for developers so they
don't have to worry about the conversions.
Replace the current usages as well with new helpers.
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
tools/testing/selftests/kvm/include/riscv/processor.h | 10 ++++++++++
tools/testing/selftests/kvm/riscv/arch_timer.c | 2 +-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
index 3b9cb39327ff..5f389166338c 100644
--- a/tools/testing/selftests/kvm/include/riscv/processor.h
+++ b/tools/testing/selftests/kvm/include/riscv/processor.h
@@ -50,6 +50,16 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t subtype,
bool __vcpu_has_ext(struct kvm_vcpu *vcpu, uint64_t ext);
+static inline bool __vcpu_has_isa_ext(struct kvm_vcpu *vcpu, uint64_t isa_ext)
+{
+ return __vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(isa_ext));
+}
+
+static inline bool __vcpu_has_sbi_ext(struct kvm_vcpu *vcpu, uint64_t sbi_ext)
+{
+ return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
+}
+
struct ex_regs {
unsigned long ra;
unsigned long sp;
diff --git a/tools/testing/selftests/kvm/riscv/arch_timer.c b/tools/testing/selftests/kvm/riscv/arch_timer.c
index 0f9cabd99fd4..735b78569021 100644
--- a/tools/testing/selftests/kvm/riscv/arch_timer.c
+++ b/tools/testing/selftests/kvm/riscv/arch_timer.c
@@ -85,7 +85,7 @@ struct kvm_vm *test_vm_create(void)
int nr_vcpus = test_args.nr_vcpus;
vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
- __TEST_REQUIRE(__vcpu_has_ext(vcpus[0], RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSTC)),
+ __TEST_REQUIRE(__vcpu_has_isa_ext(vcpus[0], KVM_RISCV_ISA_EXT_SSTC),
"SSTC not available, skipping test\n");
vm_init_vector_tables(vm);
--
2.34.1
The KVM RISC-V allows Sscofpmf extension for Guest/VM so let us
add this extension to get-reg-list test.
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index b882b7b9b785..222198dd6d04 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -43,6 +43,7 @@ bool filter_reg(__u64 reg)
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SMSTATEEN:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
@@ -408,6 +409,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
KVM_ISA_EXT_ARR(V),
KVM_ISA_EXT_ARR(SMSTATEEN),
KVM_ISA_EXT_ARR(SSAIA),
+ KVM_ISA_EXT_ARR(SSCOFPMF),
KVM_ISA_EXT_ARR(SSTC),
KVM_ISA_EXT_ARR(SVINVAL),
KVM_ISA_EXT_ARR(SVNAPOT),
@@ -931,6 +933,7 @@ KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F);
KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D);
KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
+KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
@@ -986,6 +989,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
&config_fp_d,
&config_h,
&config_smstateen,
+ &config_sscofpmf,
&config_sstc,
&config_svinval,
&config_svnapot,
--
2.34.1
The SBI PMU extension definition is required for upcoming SBI PMU
selftests.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
.../testing/selftests/kvm/include/riscv/sbi.h | 66 +++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
index ba04f2dec7b5..6675ca673c77 100644
--- a/tools/testing/selftests/kvm/include/riscv/sbi.h
+++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
@@ -29,17 +29,83 @@
enum sbi_ext_id {
SBI_EXT_BASE = 0x10,
SBI_EXT_STA = 0x535441,
+ SBI_EXT_PMU = 0x504D55,
};
enum sbi_ext_base_fid {
SBI_EXT_BASE_PROBE_EXT = 3,
};
+enum sbi_ext_pmu_fid {
+ SBI_EXT_PMU_NUM_COUNTERS = 0,
+ SBI_EXT_PMU_COUNTER_GET_INFO,
+ SBI_EXT_PMU_COUNTER_CFG_MATCH,
+ SBI_EXT_PMU_COUNTER_START,
+ SBI_EXT_PMU_COUNTER_STOP,
+ SBI_EXT_PMU_COUNTER_FW_READ,
+ SBI_EXT_PMU_COUNTER_FW_READ_HI,
+ SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+};
+
+union sbi_pmu_ctr_info {
+ unsigned long value;
+ struct {
+ unsigned long csr:12;
+ unsigned long width:6;
+#if __riscv_xlen == 32
+ unsigned long reserved:13;
+#else
+ unsigned long reserved:45;
+#endif
+ unsigned long type:1;
+ };
+};
struct sbiret {
long error;
long value;
};
+/** General pmu event codes specified in SBI PMU extension */
+enum sbi_pmu_hw_generic_events_t {
+ SBI_PMU_HW_NO_EVENT = 0,
+ SBI_PMU_HW_CPU_CYCLES = 1,
+ SBI_PMU_HW_INSTRUCTIONS = 2,
+ SBI_PMU_HW_CACHE_REFERENCES = 3,
+ SBI_PMU_HW_CACHE_MISSES = 4,
+ SBI_PMU_HW_BRANCH_INSTRUCTIONS = 5,
+ SBI_PMU_HW_BRANCH_MISSES = 6,
+ SBI_PMU_HW_BUS_CYCLES = 7,
+ SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 8,
+ SBI_PMU_HW_STALLED_CYCLES_BACKEND = 9,
+ SBI_PMU_HW_REF_CPU_CYCLES = 10,
+
+ SBI_PMU_HW_GENERAL_MAX,
+};
+
+/* SBI PMU counter types */
+enum sbi_pmu_ctr_type {
+ SBI_PMU_CTR_TYPE_HW = 0x0,
+ SBI_PMU_CTR_TYPE_FW,
+};
+
+/* Flags defined for config matching function */
+#define SBI_PMU_CFG_FLAG_SKIP_MATCH BIT(0)
+#define SBI_PMU_CFG_FLAG_CLEAR_VALUE BIT(1)
+#define SBI_PMU_CFG_FLAG_AUTO_START BIT(2)
+#define SBI_PMU_CFG_FLAG_SET_VUINH BIT(3)
+#define SBI_PMU_CFG_FLAG_SET_VSINH BIT(4)
+#define SBI_PMU_CFG_FLAG_SET_UINH BIT(5)
+#define SBI_PMU_CFG_FLAG_SET_SINH BIT(6)
+#define SBI_PMU_CFG_FLAG_SET_MINH BIT(7)
+
+/* Flags defined for counter start function */
+#define SBI_PMU_START_FLAG_SET_INIT_VALUE BIT(0)
+#define SBI_PMU_START_FLAG_INIT_SNAPSHOT BIT(1)
+
+/* Flags defined for counter stop function */
+#define SBI_PMU_STOP_FLAG_RESET BIT(0)
+#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
+
struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
unsigned long arg1, unsigned long arg2,
unsigned long arg3, unsigned long arg4,
--
2.34.1
This test implements basic sanity test and cycle/instret event
counting tests.
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/riscv/sbi_pmu_test.c | 369 ++++++++++++++++++
2 files changed, 370 insertions(+)
create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 741c7dc16afc..1cfcd2797ee4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -189,6 +189,7 @@ TEST_GEN_PROGS_s390x += rseq_test
TEST_GEN_PROGS_s390x += set_memory_region_test
TEST_GEN_PROGS_s390x += kvm_binary_stats_test
+TEST_GEN_PROGS_riscv += riscv/sbi_pmu_test
TEST_GEN_PROGS_riscv += arch_timer
TEST_GEN_PROGS_riscv += demand_paging_test
TEST_GEN_PROGS_riscv += dirty_log_test
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
new file mode 100644
index 000000000000..7c81691e39c5
--- /dev/null
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
@@ -0,0 +1,369 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * sbi_pmu_test.c - Tests the riscv64 SBI PMU functionality.
+ *
+ * Copyright (c) 2024, Rivos Inc.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "kvm_util.h"
+#include "test_util.h"
+#include "processor.h"
+#include "sbi.h"
+
+/* Maximum counters(firmware + hardware) */
+#define RISCV_MAX_PMU_COUNTERS 64
+union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
+
+/* Cache the available counters in a bitmask */
+static unsigned long counter_mask_available;
+
+static bool illegal_handler_invoked;
+
+unsigned long pmu_csr_read_num(int csr_num)
+{
+#define switchcase_csr_read(__csr_num, __val) {\
+ case __csr_num: \
+ __val = csr_read(__csr_num); \
+ break; }
+#define switchcase_csr_read_2(__csr_num, __val) {\
+ switchcase_csr_read(__csr_num + 0, __val) \
+ switchcase_csr_read(__csr_num + 1, __val)}
+#define switchcase_csr_read_4(__csr_num, __val) {\
+ switchcase_csr_read_2(__csr_num + 0, __val) \
+ switchcase_csr_read_2(__csr_num + 2, __val)}
+#define switchcase_csr_read_8(__csr_num, __val) {\
+ switchcase_csr_read_4(__csr_num + 0, __val) \
+ switchcase_csr_read_4(__csr_num + 4, __val)}
+#define switchcase_csr_read_16(__csr_num, __val) {\
+ switchcase_csr_read_8(__csr_num + 0, __val) \
+ switchcase_csr_read_8(__csr_num + 8, __val)}
+#define switchcase_csr_read_32(__csr_num, __val) {\
+ switchcase_csr_read_16(__csr_num + 0, __val) \
+ switchcase_csr_read_16(__csr_num + 16, __val)}
+
+ unsigned long ret = 0;
+
+ switch (csr_num) {
+ switchcase_csr_read_32(CSR_CYCLE, ret)
+ switchcase_csr_read_32(CSR_CYCLEH, ret)
+ default :
+ break;
+ }
+
+ return ret;
+#undef switchcase_csr_read_32
+#undef switchcase_csr_read_16
+#undef switchcase_csr_read_8
+#undef switchcase_csr_read_4
+#undef switchcase_csr_read_2
+#undef switchcase_csr_read
+}
+
+static inline void dummy_func_loop(uint64_t iter)
+{
+ int i = 0;
+
+ while (i < iter) {
+ asm volatile("nop");
+ i++;
+ }
+}
+
+static void start_counter(unsigned long counter, unsigned long start_flags,
+ unsigned long ival)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
+ ival, 0, 0);
+ __GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
+}
+
+/* This should be invoked only for reset counter use case */
+static void stop_reset_counter(unsigned long counter, unsigned long stop_flags)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1,
+ stop_flags | SBI_PMU_STOP_FLAG_RESET, 0, 0, 0);
+ __GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
+ "Unable to stop counter %ld\n", counter);
+}
+
+static void stop_counter(unsigned long counter, unsigned long stop_flags)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
+ 0, 0, 0);
+ __GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
+ counter, ret.error);
+}
+
+static void guest_illegal_exception_handler(struct ex_regs *regs)
+{
+ __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
+ "Unexpected exception handler %lx\n", regs->cause);
+
+ illegal_handler_invoked = true;
+ /* skip the trapping instruction */
+ regs->epc += 4;
+}
+
+static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
+ unsigned long cflags,
+ unsigned long event)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
+ cflags, event, 0, 0);
+ __GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
+ GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS);
+ GUEST_ASSERT(BIT(ret.value) & counter_mask_available);
+
+ return ret.value;
+}
+
+static unsigned long get_num_counters(void)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
+
+ __GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
+ __GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
+ "Invalid number of counters %ld\n", ret.value);
+
+ return ret.value;
+}
+
+static void update_counter_info(int num_counters)
+{
+ int i = 0;
+ struct sbiret ret;
+
+ for (i = 0; i < num_counters; i++) {
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
+
+ /* There can be gaps in logical counter indicies*/
+ if (ret.error)
+ continue;
+ GUEST_ASSERT_NE(ret.value, 0);
+
+ ctrinfo_arr[i].value = ret.value;
+ counter_mask_available |= BIT(i);
+ }
+
+ GUEST_ASSERT(counter_mask_available > 0);
+}
+
+static unsigned long read_fw_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
+ GUEST_ASSERT(ret.error == 0);
+ return ret.value;
+}
+
+static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
+{
+ unsigned long counter_val = 0;
+
+ __GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
+
+ if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW)
+ counter_val = pmu_csr_read_num(ctrinfo.csr);
+ else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW)
+ counter_val = read_fw_counter(idx, ctrinfo);
+
+ return counter_val;
+}
+
+static void test_pmu_event(unsigned long event)
+{
+ unsigned long counter;
+ unsigned long counter_value_pre, counter_value_post;
+ unsigned long counter_init_value = 100;
+
+ counter = get_counter_index(0, counter_mask_available, 0, event);
+ counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
+
+ /* Do not set the initial value */
+ start_counter(counter, 0, 0);
+ dummy_func_loop(10000);
+ stop_counter(counter, 0);
+
+ counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
+ __GUEST_ASSERT(counter_value_post > counter_value_pre,
+ "Event update verification failed: post [%lx] pre [%lx]\n",
+ counter_value_post, counter_value_pre);
+
+ /*
+ * We can't just update the counter without starting it.
+ * Do start/stop twice to simulate that by first initializing to a very
+ * high value and a low value after that.
+ */
+ start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, ULONG_MAX/2);
+ stop_counter(counter, 0);
+ counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
+
+ start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
+ stop_counter(counter, 0);
+ counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
+ __GUEST_ASSERT(counter_value_pre > counter_value_post,
+ "Counter reinitialization verification failed : post [%lx] pre [%lx]\n",
+ counter_value_post, counter_value_pre);
+
+ /* Now set the initial value and compare */
+ start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
+ dummy_func_loop(10000);
+ stop_counter(counter, 0);
+
+ counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
+ __GUEST_ASSERT(counter_value_post > counter_init_value,
+ "Event update verification failed: post [%lx] pre [%lx]\n",
+ counter_value_post, counter_init_value);
+
+ stop_reset_counter(counter, 0);
+}
+
+static void test_invalid_event(void)
+{
+ struct sbiret ret;
+ unsigned long event = 0x1234; /* A random event */
+
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
+ counter_mask_available, 0, event, 0, 0);
+ GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
+}
+
+static void test_pmu_events(void)
+{
+ int num_counters = 0;
+
+ /* Get the counter details */
+ num_counters = get_num_counters();
+ update_counter_info(num_counters);
+
+ /* Sanity testing for any random invalid event */
+ test_invalid_event();
+
+ /* Only these two events are guaranteed to be present */
+ test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
+ test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
+
+ GUEST_DONE();
+}
+
+static void test_pmu_basic_sanity(void)
+{
+ long out_val = 0;
+ bool probe;
+ struct sbiret ret;
+ int num_counters = 0, i;
+ union sbi_pmu_ctr_info ctrinfo;
+
+ probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
+ GUEST_ASSERT(probe && out_val == 1);
+
+ num_counters = get_num_counters();
+
+ for (i = 0; i < num_counters; i++) {
+ ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
+ 0, 0, 0, 0, 0);
+
+ /* There can be gaps in logical counter indicies*/
+ if (ret.error)
+ continue;
+ GUEST_ASSERT_NE(ret.value, 0);
+
+ ctrinfo.value = ret.value;
+
+ /**
+ * Accessibility check of hardware and read capability of firmware counters.
+ * The spec doesn't mandate any initial value. No need to check any value.
+ */
+ if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
+ pmu_csr_read_num(ctrinfo.csr);
+ GUEST_ASSERT(illegal_handler_invoked);
+ } else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
+ read_fw_counter(i, ctrinfo);
+ }
+ }
+
+ GUEST_DONE();
+}
+
+static void run_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+
+ vcpu_run(vcpu);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_DONE:
+ case UCALL_SYNC:
+ break;
+ default:
+ TEST_FAIL("Unknown ucall %lu", uc.cmd);
+ break;
+ }
+}
+
+void test_vm_destroy(struct kvm_vm *vm)
+{
+ memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
+ counter_mask_available = 0;
+ kvm_vm_free(vm);
+}
+
+static void test_vm_basic_test(void *guest_code)
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+ __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
+ "SBI PMU not available, skipping test");
+ vm_init_vector_tables(vm);
+ /* Illegal instruction handler is required to verify read access without configuration */
+ vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
+
+ vcpu_init_vector_tables(vcpu);
+ run_vcpu(vcpu);
+
+ test_vm_destroy(vm);
+}
+
+static void test_vm_events_test(void *guest_code)
+{
+ struct kvm_vm *vm = NULL;
+ struct kvm_vcpu *vcpu = NULL;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+ __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
+ "SBI PMU not available, skipping test");
+ run_vcpu(vcpu);
+
+ test_vm_destroy(vm);
+}
+
+int main(void)
+{
+ test_vm_basic_test(test_pmu_basic_sanity);
+ pr_info("SBI PMU basic test : PASS\n");
+
+ test_vm_events_test(test_pmu_events);
+ pr_info("SBI PMU event verification test : PASS\n");
+
+ return 0;
+}
--
2.34.1
Verify PMU snapshot functionality by setting up the shared memory
correctly and reading the counter values from the shared memory
instead of the CSR.
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
.../testing/selftests/kvm/include/riscv/sbi.h | 25 +++
.../selftests/kvm/lib/riscv/processor.c | 12 ++
.../selftests/kvm/riscv/sbi_pmu_test.c | 144 ++++++++++++++++++
3 files changed, 181 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
index 6675ca673c77..046b432ae896 100644
--- a/tools/testing/selftests/kvm/include/riscv/sbi.h
+++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
@@ -8,6 +8,12 @@
#ifndef SELFTEST_KVM_SBI_H
#define SELFTEST_KVM_SBI_H
+/* SBI spec version fields */
+#define SBI_SPEC_VERSION_DEFAULT 0x1
+#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
+#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
+#define SBI_SPEC_VERSION_MINOR_MASK 0xffffff
+
/* SBI return error codes */
#define SBI_SUCCESS 0
#define SBI_ERR_FAILURE -1
@@ -33,6 +39,9 @@ enum sbi_ext_id {
};
enum sbi_ext_base_fid {
+ SBI_EXT_BASE_GET_SPEC_VERSION = 0,
+ SBI_EXT_BASE_GET_IMP_ID,
+ SBI_EXT_BASE_GET_IMP_VERSION,
SBI_EXT_BASE_PROBE_EXT = 3,
};
enum sbi_ext_pmu_fid {
@@ -60,6 +69,12 @@ union sbi_pmu_ctr_info {
};
};
+struct riscv_pmu_snapshot_data {
+ u64 ctr_overflow_mask;
+ u64 ctr_values[64];
+ u64 reserved[447];
+};
+
struct sbiret {
long error;
long value;
@@ -113,4 +128,14 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
bool guest_sbi_probe_extension(int extid, long *out_val);
+/* Make SBI version */
+static inline unsigned long sbi_mk_version(unsigned long major,
+ unsigned long minor)
+{
+ return ((major & SBI_SPEC_VERSION_MAJOR_MASK) << SBI_SPEC_VERSION_MAJOR_SHIFT)
+ | (minor & SBI_SPEC_VERSION_MINOR_MASK);
+}
+
+unsigned long get_host_sbi_spec_version(void);
+
#endif /* SELFTEST_KVM_SBI_H */
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index e8211f5d6863..ccb35573749c 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -502,3 +502,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
return true;
}
+
+unsigned long get_host_sbi_spec_version(void)
+{
+ struct sbiret ret;
+
+ ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_SPEC_VERSION, 0,
+ 0, 0, 0, 0, 0);
+
+ GUEST_ASSERT(!ret.error);
+
+ return ret.value;
+}
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
index 7c81691e39c5..9002ff451abf 100644
--- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
@@ -19,6 +19,11 @@
#define RISCV_MAX_PMU_COUNTERS 64
union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
+/* Snapshot shared memory data */
+#define PMU_SNAPSHOT_GPA_BASE BIT(30)
+static void *snapshot_gva;
+static vm_paddr_t snapshot_gpa;
+
/* Cache the available counters in a bitmask */
static unsigned long counter_mask_available;
@@ -186,6 +191,32 @@ static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
return counter_val;
}
+static inline void verify_sbi_requirement_assert(void)
+{
+ long out_val = 0;
+ bool probe;
+
+ probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
+ GUEST_ASSERT(probe && out_val == 1);
+
+ if (get_host_sbi_spec_version() < sbi_mk_version(2, 0))
+ __GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
+}
+
+static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
+{
+ unsigned long lo = (unsigned long)gpa;
+#if __riscv_xlen == 32
+ unsigned long hi = (unsigned long)(gpa >> 32);
+#else
+ unsigned long hi = gpa == -1 ? -1 : 0;
+#endif
+ struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
+ lo, hi, flags, 0, 0, 0);
+
+ GUEST_ASSERT(ret.value == 0 && ret.error == 0);
+}
+
static void test_pmu_event(unsigned long event)
{
unsigned long counter;
@@ -234,6 +265,59 @@ static void test_pmu_event(unsigned long event)
stop_reset_counter(counter, 0);
}
+static void test_pmu_event_snapshot(unsigned long event)
+{
+ unsigned long counter;
+ unsigned long counter_value_pre, counter_value_post;
+ unsigned long counter_init_value = 100;
+ struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+
+ counter = get_counter_index(0, counter_mask_available, 0, event);
+ counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
+
+ /* Do not set the initial value */
+ start_counter(counter, 0, 0);
+ dummy_func_loop(10000);
+ stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+ /* The counter value is updated w.r.t relative index of cbase */
+ counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+ __GUEST_ASSERT(counter_value_post > counter_value_pre,
+ "Event update verification failed: post [%lx] pre [%lx]\n",
+ counter_value_post, counter_value_pre);
+
+ /*
+ * We can't just update the counter without starting it.
+ * Do start/stop twice to simulate that by first initializing to a very
+ * high value and a low value after that.
+ */
+ WRITE_ONCE(snapshot_data->ctr_values[0], ULONG_MAX/2);
+ start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
+ stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+ counter_value_pre = READ_ONCE(snapshot_data->ctr_values[0]);
+
+ WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
+ start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
+ stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+ counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+ __GUEST_ASSERT(counter_value_pre > counter_value_post,
+ "Counter reinitialization verification failed : post [%lx] pre [%lx]\n",
+ counter_value_post, counter_value_pre);
+
+ /* Now set the initial value and compare */
+ WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
+ start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
+ dummy_func_loop(10000);
+ stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+ counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+ __GUEST_ASSERT(counter_value_post > counter_init_value,
+ "Event update verification failed: post [%lx] pre [%lx]\n",
+ counter_value_post, counter_init_value);
+
+ stop_reset_counter(counter, 0);
+}
+
static void test_invalid_event(void)
{
struct sbiret ret;
@@ -301,6 +385,34 @@ static void test_pmu_basic_sanity(void)
GUEST_DONE();
}
+static void test_pmu_events_snaphost(void)
+{
+ int num_counters = 0;
+ struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+ int i;
+
+ /* Verify presence of SBI PMU and minimum requrired SBI version */
+ verify_sbi_requirement_assert();
+
+ snapshot_set_shmem(snapshot_gpa, 0);
+
+ /* Get the counter details */
+ num_counters = get_num_counters();
+ update_counter_info(num_counters);
+
+ /* Validate shared memory access */
+ GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
+ for (i = 0; i < num_counters; i++) {
+ if (counter_mask_available & (BIT(i)))
+ GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
+ }
+ /* Only these two events are guranteed to be present */
+ test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
+ test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
+
+ GUEST_DONE();
+}
+
static void run_vcpu(struct kvm_vcpu *vcpu)
{
struct ucall uc;
@@ -357,6 +469,35 @@ static void test_vm_events_test(void *guest_code)
test_vm_destroy(vm);
}
+static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+ /* PMU Snapshot requires single page only */
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
+ /* PMU_SNAPSHOT_GPA_BASE is identity mapped */
+ virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
+
+ snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
+ snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
+ sync_global_to_guest(vcpu->vm, snapshot_gva);
+ sync_global_to_guest(vcpu->vm, snapshot_gpa);
+}
+
+static void test_vm_events_snapshot_test(void *guest_code)
+{
+ struct kvm_vm *vm = NULL;
+ struct kvm_vcpu *vcpu;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+ __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
+ "SBI PMU not available, skipping test");
+
+ test_vm_setup_snapshot_mem(vm, vcpu);
+
+ run_vcpu(vcpu);
+
+ test_vm_destroy(vm);
+}
+
int main(void)
{
test_vm_basic_test(test_pmu_basic_sanity);
@@ -365,5 +506,8 @@ int main(void)
test_vm_events_test(test_pmu_events);
pr_info("SBI PMU event verification test : PASS\n");
+ test_vm_events_snapshot_test(test_pmu_events_snaphost);
+ pr_info("SBI PMU event verification with snapshot test : PASS\n");
+
return 0;
}
--
2.34.1
Add a test for verifying overflow interrupt. Currently, it relies on
overflow support on cycle/instret events. This test works for cycle/
instret events which support sampling via hpmcounters on the platform.
There are no ISA extensions to detect if a platform supports that. Thus,
this test will fail on platform with virtualization but doesn't
support overflow on these two events.
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
.../selftests/kvm/riscv/sbi_pmu_test.c | 113 ++++++++++++++++++
1 file changed, 113 insertions(+)
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
index 9002ff451abf..0fd9b76ae838 100644
--- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
@@ -14,6 +14,7 @@
#include "test_util.h"
#include "processor.h"
#include "sbi.h"
+#include "arch_timer.h"
/* Maximum counters(firmware + hardware) */
#define RISCV_MAX_PMU_COUNTERS 64
@@ -24,6 +25,9 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
static void *snapshot_gva;
static vm_paddr_t snapshot_gpa;
+static int vcpu_shared_irq_count;
+static int counter_in_use;
+
/* Cache the available counters in a bitmask */
static unsigned long counter_mask_available;
@@ -120,6 +124,31 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
regs->epc += 4;
}
+static void guest_irq_handler(struct ex_regs *regs)
+{
+ unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
+ struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+ unsigned long overflown_mask;
+ unsigned long counter_val = 0;
+
+ /* Validate that we are in the correct irq handler */
+ GUEST_ASSERT_EQ(irq_num, IRQ_PMU_OVF);
+
+ /* Stop all counters first to avoid further interrupts */
+ stop_counter(counter_in_use, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+ csr_clear(CSR_SIP, BIT(IRQ_PMU_OVF));
+
+ overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
+ GUEST_ASSERT(overflown_mask & 0x01);
+
+ WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count+1);
+
+ counter_val = READ_ONCE(snapshot_data->ctr_values[0]);
+ /* Now start the counter to mimick the real driver behavior */
+ start_counter(counter_in_use, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_val);
+}
+
static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
unsigned long cflags,
unsigned long event)
@@ -318,6 +347,33 @@ static void test_pmu_event_snapshot(unsigned long event)
stop_reset_counter(counter, 0);
}
+static void test_pmu_event_overflow(unsigned long event)
+{
+ unsigned long counter;
+ unsigned long counter_value_post;
+ unsigned long counter_init_value = ULONG_MAX - 10000;
+ struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
+
+ counter = get_counter_index(0, counter_mask_available, 0, event);
+ counter_in_use = counter;
+
+ /* The counter value is updated w.r.t relative index of cbase passed to start/stop */
+ WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
+ start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
+ dummy_func_loop(10000);
+ udelay(msecs_to_usecs(2000));
+ /* irq handler should have stopped the counter */
+ stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
+
+ counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
+ /* The counter value after stopping should be less the init value due to overflow */
+ __GUEST_ASSERT(counter_value_post < counter_init_value,
+ "counter_value_post %lx counter_init_value %lx for counter\n",
+ counter_value_post, counter_init_value);
+
+ stop_reset_counter(counter, 0);
+}
+
static void test_invalid_event(void)
{
struct sbiret ret;
@@ -413,6 +469,34 @@ static void test_pmu_events_snaphost(void)
GUEST_DONE();
}
+static void test_pmu_events_overflow(void)
+{
+ int num_counters = 0;
+
+ /* Verify presence of SBI PMU and minimum requrired SBI version */
+ verify_sbi_requirement_assert();
+
+ snapshot_set_shmem(snapshot_gpa, 0);
+ csr_set(CSR_IE, BIT(IRQ_PMU_OVF));
+ local_irq_enable();
+
+ /* Get the counter details */
+ num_counters = get_num_counters();
+ update_counter_info(num_counters);
+
+ /*
+ * Qemu supports overflow for cycle/instruction.
+ * This test may fail on any platform that do not support overflow for these two events.
+ */
+ test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
+ GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
+
+ test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
+ GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
+
+ GUEST_DONE();
+}
+
static void run_vcpu(struct kvm_vcpu *vcpu)
{
struct ucall uc;
@@ -498,6 +582,32 @@ static void test_vm_events_snapshot_test(void *guest_code)
test_vm_destroy(vm);
}
+static void test_vm_events_overflow(void *guest_code)
+{
+ struct kvm_vm *vm = NULL;
+ struct kvm_vcpu *vcpu;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+ __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
+ "SBI PMU not available, skipping test");
+
+ __TEST_REQUIRE(__vcpu_has_isa_ext(vcpu, KVM_RISCV_ISA_EXT_SSCOFPMF),
+ "Sscofpmf is not available, skipping overflow test");
+
+ test_vm_setup_snapshot_mem(vm, vcpu);
+ vm_init_vector_tables(vm);
+ vm_install_interrupt_handler(vm, guest_irq_handler);
+
+ vcpu_init_vector_tables(vcpu);
+ /* Initialize guest timer frequency. */
+ vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
+ sync_global_to_guest(vm, timer_freq);
+
+ run_vcpu(vcpu);
+
+ test_vm_destroy(vm);
+}
+
int main(void)
{
test_vm_basic_test(test_pmu_basic_sanity);
@@ -509,5 +619,8 @@ int main(void)
test_vm_events_snapshot_test(test_pmu_events_snaphost);
pr_info("SBI PMU event verification with snapshot test : PASS\n");
+ test_vm_events_overflow(test_pmu_events_overflow);
+ pr_info("SBI PMU event verification with overflow test : PASS\n");
+
return 0;
}
--
2.34.1
SBI PMU test comprises of multiple tests and user may want to run
only a subset depending on the platform. The most common case would
be to run all to validate all the tests. However, some platform may
not support all events or all ISA extensions.
The commandline option allows user to disable any set of tests if
they want to.
Suggested-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
.../selftests/kvm/riscv/sbi_pmu_test.c | 73 ++++++++++++++++---
1 file changed, 64 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
index 0fd9b76ae838..69bb94e6b227 100644
--- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
+++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
@@ -33,6 +33,13 @@ static unsigned long counter_mask_available;
static bool illegal_handler_invoked;
+#define SBI_PMU_TEST_BASIC BIT(0)
+#define SBI_PMU_TEST_EVENTS BIT(1)
+#define SBI_PMU_TEST_SNAPSHOT BIT(2)
+#define SBI_PMU_TEST_OVERFLOW BIT(3)
+
+static int disabled_tests;
+
unsigned long pmu_csr_read_num(int csr_num)
{
#define switchcase_csr_read(__csr_num, __val) {\
@@ -608,19 +615,67 @@ static void test_vm_events_overflow(void *guest_code)
test_vm_destroy(vm);
}
-int main(void)
+static void test_print_help(char *name)
+{
+ pr_info("Usage: %s [-h] [-d <test name>]\n", name);
+ pr_info("\t-d: Test to disable. Available tests are 'basic', 'events', 'snapshot', 'overflow'\n");
+ pr_info("\t-h: print this help screen\n");
+}
+
+static bool parse_args(int argc, char *argv[])
+{
+ int opt;
+
+ while ((opt = getopt(argc, argv, "hd:")) != -1) {
+ switch (opt) {
+ case 'd':
+ if (!strncmp("basic", optarg, 5))
+ disabled_tests |= SBI_PMU_TEST_BASIC;
+ else if (!strncmp("events", optarg, 6))
+ disabled_tests |= SBI_PMU_TEST_EVENTS;
+ else if (!strncmp("snapshot", optarg, 8))
+ disabled_tests |= SBI_PMU_TEST_SNAPSHOT;
+ else if (!strncmp("overflow", optarg, 8))
+ disabled_tests |= SBI_PMU_TEST_OVERFLOW;
+ else
+ goto done;
+ break;
+ case 'h':
+ default:
+ goto done;
+ }
+ }
+
+ return true;
+done:
+ test_print_help(argv[0]);
+ return false;
+}
+
+int main(int argc, char *argv[])
{
- test_vm_basic_test(test_pmu_basic_sanity);
- pr_info("SBI PMU basic test : PASS\n");
+ if (!parse_args(argc, argv))
+ exit(KSFT_SKIP);
+
+ if (!(disabled_tests & SBI_PMU_TEST_BASIC)) {
+ test_vm_basic_test(test_pmu_basic_sanity);
+ pr_info("SBI PMU basic test : PASS\n");
+ }
- test_vm_events_test(test_pmu_events);
- pr_info("SBI PMU event verification test : PASS\n");
+ if (!(disabled_tests & SBI_PMU_TEST_EVENTS)) {
+ test_vm_events_test(test_pmu_events);
+ pr_info("SBI PMU event verification test : PASS\n");
+ }
- test_vm_events_snapshot_test(test_pmu_events_snaphost);
- pr_info("SBI PMU event verification with snapshot test : PASS\n");
+ if (!(disabled_tests & SBI_PMU_TEST_SNAPSHOT)) {
+ test_vm_events_snapshot_test(test_pmu_events_snaphost);
+ pr_info("SBI PMU event verification with snapshot test : PASS\n");
+ }
- test_vm_events_overflow(test_pmu_events_overflow);
- pr_info("SBI PMU event verification with overflow test : PASS\n");
+ if (!(disabled_tests & SBI_PMU_TEST_OVERFLOW)) {
+ test_vm_events_overflow(test_pmu_events_overflow);
+ pr_info("SBI PMU event verification with overflow test : PASS\n");
+ }
return 0;
}
--
2.34.1
Hi Atish,
On 2024-04-20 10:17 AM, Atish Patra wrote:
> For RV32, used_hw_ctrs can have more than 1 word if the firmware chooses
> to interleave firmware/hardware counters indicies. Even though it's a
> unlikely scenario, handle that case by iterating over all the words
> instead of just using the first word.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> drivers/perf/riscv_pmu_sbi.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index f23501898657..4eacd89141a9 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -652,10 +652,12 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
> static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> {
> struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> + int i;
>
> - /* No need to check the error here as we can't do anything about the error */
> - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
> - cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
> + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
> + /* No need to check the error here as we can't do anything about the error */
> + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> + cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
> }
>
> /*
> @@ -667,7 +669,7 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> unsigned long ctr_ovf_mask)
> {
> - int idx = 0;
> + int idx = 0, i;
> struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> struct perf_event *event;
> unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> @@ -676,11 +678,12 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> struct hw_perf_event *hwc;
> u64 init_val = 0;
>
> - ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0] & ~ctr_ovf_mask;
> -
> - /* Start all the counters that did not overflow in a single shot */
> - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
> - 0, 0, 0, 0);
> + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> + ctr_start_mask = cpu_hw_evt->used_hw_ctrs[i] & ~ctr_ovf_mask;
This is applying the mask for the first 32 logical counters to the both sets of
32 logical counters. ctr_ovf_mask needs to be 64 bits wide here, so each loop
iteration can apply the correct half of the mask.
Regards,
Samuel
> + /* Start all the counters that did not overflow in a single shot */
> + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG, ctr_start_mask,
> + 0, 0, 0, 0);
> + }
>
> /* Reinitialize and start all the counter that overflowed */
> while (ctr_ovf_mask) {
On Fri, Apr 19, 2024 at 5:37 PM Samuel Holland
<[email protected]> wrote:
>
> Hi Atish,
>
> On 2024-04-20 10:17 AM, Atish Patra wrote:
> > For RV32, used_hw_ctrs can have more than 1 word if the firmware chooses
> > to interleave firmware/hardware counters indicies. Even though it's a
> > unlikely scenario, handle that case by iterating over all the words
> > instead of just using the first word.
> >
> > Reviewed-by: Andrew Jones <[email protected]>
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > drivers/perf/riscv_pmu_sbi.c | 21 ++++++++++++---------
> > 1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index f23501898657..4eacd89141a9 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -652,10 +652,12 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
> > static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> > {
> > struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > + int i;
> >
> > - /* No need to check the error here as we can't do anything about the error */
> > - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 0,
> > - cpu_hw_evt->used_hw_ctrs[0], 0, 0, 0, 0);
> > + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
> > + /* No need to check the error here as we can't do anything about the error */
> > + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> > + cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
> > }
> >
> > /*
> > @@ -667,7 +669,7 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> > static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > unsigned long ctr_ovf_mask)
> > {
> > - int idx = 0;
> > + int idx = 0, i;
> > struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > struct perf_event *event;
> > unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> > @@ -676,11 +678,12 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > struct hw_perf_event *hwc;
> > u64 init_val = 0;
> >
> > - ctr_start_mask = cpu_hw_evt->used_hw_ctrs[0] & ~ctr_ovf_mask;
> > -
> > - /* Start all the counters that did not overflow in a single shot */
> > - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 0, ctr_start_mask,
> > - 0, 0, 0, 0);
> > + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> > + ctr_start_mask = cpu_hw_evt->used_hw_ctrs[i] & ~ctr_ovf_mask;
>
> This is applying the mask for the first 32 logical counters to the both sets of
> 32 logical counters. ctr_ovf_mask needs to be 64 bits wide here, so each loop
> iteration can apply the correct half of the mask.
>
The 64bit wide support for ctr_ovf_mask is added in the next patch.
> Regards,
> Samuel
>
> > + /* Start all the counters that did not overflow in a single shot */
> > + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG, ctr_start_mask,
> > + 0, 0, 0, 0);
> > + }
> >
> > /* Reinitialize and start all the counter that overflowed */
> > while (ctr_ovf_mask) {
>
On Sat, Apr 20, 2024 at 5:18 AM Atish Patra <[email protected]> wrote:
>
> The initial sample period value when counter value is not assigned
> should be set to maximum value supported by the counter width.
> Otherwise, it may result in spurious interrupts.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/vcpu_pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 86391a5061dd..cee1b9ca4ec4 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -39,7 +39,7 @@ static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
> u64 sample_period;
>
> if (!pmc->counter_val)
> - sample_period = counter_val_mask + 1;
> + sample_period = counter_val_mask;
> else
> sample_period = (-pmc->counter_val) & counter_val_mask;
>
> --
> 2.34.1
>
On Sat, Apr 20, 2024 at 5:18 AM Atish Patra <[email protected]> wrote:
>
> Rename the function to indicate that it is meant for firmware
> counter read. While at it, add a range sanity check for it as
> well.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 2 +-
> arch/riscv/kvm/vcpu_pmu.c | 7 ++++++-
> arch/riscv/kvm/vcpu_sbi_pmu.c | 2 +-
> 3 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 55861b5d3382..fa0f535bbbf0 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -89,7 +89,7 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> unsigned long ctr_mask, unsigned long flags,
> unsigned long eidx, u64 evtdata,
> struct kvm_vcpu_sbi_return *retdata);
> -int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> +int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_vcpu_sbi_return *retdata);
> int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_vcpu_sbi_return *retdata);
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index e1409ec9afc0..04db1f993c47 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -235,6 +235,11 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> u64 enabled, running;
> int fevent_code;
>
> + if (cidx >= kvm_pmu_num_counters(kvpmu) || cidx == 1) {
> + pr_warn("Invalid counter id [%ld] during read\n", cidx);
> + return -EINVAL;
> + }
> +
> pmc = &kvpmu->pmc[cidx];
>
> if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
> @@ -747,7 +752,7 @@ int kvm_riscv_vcpu_pmu_fw_ctr_read_hi(struct kvm_vcpu *vcpu, unsigned long cidx,
> return 0;
> }
>
> -int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> +int kvm_riscv_vcpu_pmu_fw_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_vcpu_sbi_return *retdata)
> {
> int ret;
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> index cf111de51bdb..e4be34e03e83 100644
> --- a/arch/riscv/kvm/vcpu_sbi_pmu.c
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -62,7 +62,7 @@ static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, retdata);
> break;
> case SBI_EXT_PMU_COUNTER_FW_READ:
> - ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, retdata);
> + ret = kvm_riscv_vcpu_pmu_fw_ctr_read(vcpu, cp->a0, retdata);
> break;
> case SBI_EXT_PMU_COUNTER_FW_READ_HI:
> if (IS_ENABLED(CONFIG_32BIT))
> --
> 2.34.1
>
On Sat, Apr 20, 2024 at 5:18 AM Atish Patra <[email protected]> wrote:
>
> __vcpu_has_ext can check both SBI and ISA extensions when the first
> argument is properly converted to SBI/ISA extension IDs. Introduce
> two helper functions to make life easier for developers so they
> don't have to worry about the conversions.
>
> Replace the current usages as well with new helpers.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> tools/testing/selftests/kvm/include/riscv/processor.h | 10 ++++++++++
> tools/testing/selftests/kvm/riscv/arch_timer.c | 2 +-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index 3b9cb39327ff..5f389166338c 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -50,6 +50,16 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t subtype,
>
> bool __vcpu_has_ext(struct kvm_vcpu *vcpu, uint64_t ext);
>
> +static inline bool __vcpu_has_isa_ext(struct kvm_vcpu *vcpu, uint64_t isa_ext)
> +{
> + return __vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(isa_ext));
> +}
> +
> +static inline bool __vcpu_has_sbi_ext(struct kvm_vcpu *vcpu, uint64_t sbi_ext)
> +{
> + return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
> +}
> +
> struct ex_regs {
> unsigned long ra;
> unsigned long sp;
> diff --git a/tools/testing/selftests/kvm/riscv/arch_timer.c b/tools/testing/selftests/kvm/riscv/arch_timer.c
> index 0f9cabd99fd4..735b78569021 100644
> --- a/tools/testing/selftests/kvm/riscv/arch_timer.c
> +++ b/tools/testing/selftests/kvm/riscv/arch_timer.c
> @@ -85,7 +85,7 @@ struct kvm_vm *test_vm_create(void)
> int nr_vcpus = test_args.nr_vcpus;
>
> vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
> - __TEST_REQUIRE(__vcpu_has_ext(vcpus[0], RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSTC)),
> + __TEST_REQUIRE(__vcpu_has_isa_ext(vcpus[0], KVM_RISCV_ISA_EXT_SSTC),
> "SSTC not available, skipping test\n");
>
> vm_init_vector_tables(vm);
> --
> 2.34.1
>
On Sat, Apr 20, 2024 at 5:17 AM Atish Patra <[email protected]> wrote:
>
> This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
> and fw_read_hi() functions.
>
> SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
> to provide counter information (i.e. values/overflow status) via a shared
> memory between the SBI implementation and supervisor OS. This allows to minimize
> the number of traps in when perf being used inside a kvm guest as it relies on
> SBI PMU + trap/emulation of the counters.
>
> The current set of ratified RISC-V specification also doesn't allow scountovf
> to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
> in ISA as well and enables perf sampling in the guest. However, LCOFI in the
> guest only works via IRQ filtering in AIA specification. That's why, AIA
> has to be enabled in the hardware (at least the Ssaia extension) in order to
> use the sampling support in the perf.
>
> Here are the patch wise implementation details.
>
> PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
> PATCH 2,3,14 : FW_READ_HI function implementation
> PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
> PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
> PATCH 16-17: Generic improvements for kvm selftests
> PATCH 18-22: KVM selftests for SBI PMU extension
>
> The series is based on v6.9-rc4 and is available at:
>
> https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v8
>
> The kvmtool patch is also available at:
> https://github.com/atishp04/kvmtool/tree/sscofpmf
>
> It also requires Ssaia ISA extension to be present in the hardware in order to
> get perf sampling support in the guest. In Qemu virt machine, it can be done
> by the following config.
>
> ```
> -cpu rv64,sscofpmf=true,x-ssaia=true
> ```
>
> There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
> for the guest if AIA patches are not available. Here is the example command.
>
> ```
> ./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
> ```
>
> The series has been tested only in Qemu.
> Here is the snippet of the perf running inside a kvm guest.
>
> ===================================================
> $ perf record -e cycles -e instructions perf bench sched messaging -g 5
> ...
> $ Running 'sched/messaging' benchmark:
> ...
> [ 45.928723] perf_duration_warn: 2 callbacks suppressed
> [ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
> $ 20 sender and receiver processes per group
> $ 5 groups == 200 processes run
>
> Total time: 14.220 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
> $ perf report --stdio
> $ To display the perf.data header info, please use --header/--header-only optio>
> $
> $
> $ Total Lost Samples: 0
> $
> $ Samples: 943 of event 'cycles'
> $ Event count (approx.): 5128976844
> $
> $ Overhead Command Shared Object Symbol >
> $ ........ ............... ........................... ....................>
> $
> 7.59% sched-messaging [kernel.kallsyms] [k] memcpy
> 5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
> 5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
> 4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
> 3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
> 3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
> 3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
> 3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
> 3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
> 3.16% sched-messaging [kernel.kallsyms] [k] clear_page
> 3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
> 2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
>
> ===================================================
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc
>
> Changes from v7->v8:
> 1. Updated event states so that shared memory is updated only during stop
> operations.
> 2. Avoid clobbering lower XLEN counter/overflow values in shared memory
> by maintaining a temporary copy for RV32.
> 3. Improved overflow handling in snapshot case by supporting all 64 values.
> 4. Minor cleanups based on suggestions on v7.
>
> Changes from v6->v7:
> 1. Used SBI_SHMEM_DISABLE in the driver.
> 2. Added RB Tags.
> 3. Improved the sbi_pmu_test commandline to allow disabling multiple
> tests.
>
> Changes from v5->v6:
> 1. Added a patch for command line option for the sbi pmu tests.
> 2. Removed redundant prints and restructure the code little bit.
> 3. Added a patch for computing the sbi minor version correctly.
> 4. Addressed all other comments on v5.
>
> Changes from v4->v5:
> 1. Moved sbi related definitions to its own header file from processor.h
> 2. Added few helper functions for selftests.
> 3. Improved firmware counter read and RV32 start/stop functions.
> 4. Converted all the shifting operations to use BIT macro
> 5. Addressed all other comments on v4.
>
> Changes from v3->v4:
> 1. Added selftests.
> 2. Fixed an issue to clear the interrupt pending bits.
> 3. Fixed the counter index in snapshot memory start function.
>
> Changes from v2->v3:
> 1. Fixed a patchwork warning on patch6.
> 2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
> 3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
>
> Changes from v1->v2:
> 1. Fixed warning/errors from patchwork CI.
> 2. Rebased on top of kvm-next.
> 3. Added Acked-by tags.
>
> Changes from RFC->v1:
> 1. Addressed all the comments on RFC series.
> 2. Removed PATCH2 and merged into later patches.
> 3. Added 2 more patches for minor fixes.
> 4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
> Ssaia in the host.
>
> Atish Patra (24):
> RISC-V: Fix the typo in Scountovf CSR name
> RISC-V: Add FIRMWARE_READ_HI definition
> drivers/perf: riscv: Read upper bits of a firmware counter
> drivers/perf: riscv: Use BIT macro for shifting operations
> RISC-V: Add SBI PMU snapshot definitions
> RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
> RISC-V: Use the minor version mask while computing sbi version
> drivers/perf: riscv: Fix counter mask iteration for RV32
> drivers/perf: riscv: Implement SBI PMU snapshot function
> RISC-V: KVM: Fix the initial sample period value
> RISC-V: KVM: No need to update the counter value during reset
> RISC-V: KVM: No need to exit to the user space if perf event failed
> RISC-V: KVM: Implement SBI PMU Snapshot feature
> RISC-V: KVM: Add perf sampling support for guests
> RISC-V: KVM: Support 64 bit firmware counters on RV32
> RISC-V: KVM: Improve firmware counter read function
> KVM: riscv: selftests: Move sbi definitions to its own header file
> KVM: riscv: selftests: Add helper functions for extension checks
> KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
> KVM: riscv: selftests: Add SBI PMU extension definitions
> KVM: riscv: selftests: Add SBI PMU selftest
> KVM: riscv: selftests: Add a test for PMU snapshot functionality
> KVM: riscv: selftests: Add a test for counter overflow
> KVM: riscv: selftests: Add commandline option for SBI PMU test
Queued this series for Linux-6.10
If new issues are discovered then send patches based on
the KVM riscv queue.
Thanks,
Anup
>
> arch/riscv/include/asm/csr.h | 5 +-
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
> arch/riscv/include/asm/sbi.h | 38 +-
> arch/riscv/include/uapi/asm/kvm.h | 1 +
> arch/riscv/kernel/paravirt.c | 6 +-
> arch/riscv/kvm/aia.c | 5 +
> arch/riscv/kvm/vcpu.c | 15 +-
> arch/riscv/kvm/vcpu_onereg.c | 6 +
> arch/riscv/kvm/vcpu_pmu.c | 260 ++++++-
> arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
> arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
> drivers/perf/riscv_pmu.c | 1 +
> drivers/perf/riscv_pmu_sbi.c | 309 +++++++-
> include/linux/perf/riscv_pmu.h | 6 +
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/include/riscv/processor.h | 49 +-
> .../testing/selftests/kvm/include/riscv/sbi.h | 141 ++++
> .../selftests/kvm/include/riscv/ucall.h | 1 +
> .../selftests/kvm/lib/riscv/processor.c | 12 +
> .../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
> .../selftests/kvm/riscv/get-reg-list.c | 4 +
> .../selftests/kvm/riscv/sbi_pmu_test.c | 681 ++++++++++++++++++
> tools/testing/selftests/kvm/steal_time.c | 4 +-
> 23 files changed, 1467 insertions(+), 117 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
> create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
>
> --
> 2.34.1
>
On Sat, Apr 20, 2024 at 5:18 AM Atish Patra <[email protected]> wrote:
>
> The SBI definitions will continue to grow. Move the sbi related
> definitions to its own header file from processor.h
>
> Suggested-by: Andrew Jones <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> .../selftests/kvm/include/riscv/processor.h | 39 ---------------
> .../testing/selftests/kvm/include/riscv/sbi.h | 50 +++++++++++++++++++
> .../selftests/kvm/include/riscv/ucall.h | 1 +
> tools/testing/selftests/kvm/steal_time.c | 4 +-
> 4 files changed, 54 insertions(+), 40 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index ce473fe251dd..3b9cb39327ff 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -154,45 +154,6 @@ void vm_install_interrupt_handler(struct kvm_vm *vm, exception_handler_fn handle
> #define PGTBL_PAGE_SIZE PGTBL_L0_BLOCK_SIZE
> #define PGTBL_PAGE_SIZE_SHIFT PGTBL_L0_BLOCK_SHIFT
>
> -/* SBI return error codes */
> -#define SBI_SUCCESS 0
> -#define SBI_ERR_FAILURE -1
> -#define SBI_ERR_NOT_SUPPORTED -2
> -#define SBI_ERR_INVALID_PARAM -3
> -#define SBI_ERR_DENIED -4
> -#define SBI_ERR_INVALID_ADDRESS -5
> -#define SBI_ERR_ALREADY_AVAILABLE -6
> -#define SBI_ERR_ALREADY_STARTED -7
> -#define SBI_ERR_ALREADY_STOPPED -8
> -
> -#define SBI_EXT_EXPERIMENTAL_START 0x08000000
> -#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
> -
> -#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
> -#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
> -#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
> -
> -enum sbi_ext_id {
> - SBI_EXT_BASE = 0x10,
> - SBI_EXT_STA = 0x535441,
> -};
> -
> -enum sbi_ext_base_fid {
> - SBI_EXT_BASE_PROBE_EXT = 3,
> -};
> -
> -struct sbiret {
> - long error;
> - long value;
> -};
> -
> -struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> - unsigned long arg1, unsigned long arg2,
> - unsigned long arg3, unsigned long arg4,
> - unsigned long arg5);
> -
> -bool guest_sbi_probe_extension(int extid, long *out_val);
> -
> static inline void local_irq_enable(void)
> {
> csr_set(CSR_SSTATUS, SR_SIE);
> diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
> new file mode 100644
> index 000000000000..ba04f2dec7b5
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * RISC-V SBI specific definitions
> + *
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#ifndef SELFTEST_KVM_SBI_H
> +#define SELFTEST_KVM_SBI_H
> +
> +/* SBI return error codes */
> +#define SBI_SUCCESS 0
> +#define SBI_ERR_FAILURE -1
> +#define SBI_ERR_NOT_SUPPORTED -2
> +#define SBI_ERR_INVALID_PARAM -3
> +#define SBI_ERR_DENIED -4
> +#define SBI_ERR_INVALID_ADDRESS -5
> +#define SBI_ERR_ALREADY_AVAILABLE -6
> +#define SBI_ERR_ALREADY_STARTED -7
> +#define SBI_ERR_ALREADY_STOPPED -8
> +
> +#define SBI_EXT_EXPERIMENTAL_START 0x08000000
> +#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
> +
> +#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
> +#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
> +#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
> +
> +enum sbi_ext_id {
> + SBI_EXT_BASE = 0x10,
> + SBI_EXT_STA = 0x535441,
> +};
> +
> +enum sbi_ext_base_fid {
> + SBI_EXT_BASE_PROBE_EXT = 3,
> +};
> +
> +struct sbiret {
> + long error;
> + long value;
> +};
> +
> +struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> + unsigned long arg1, unsigned long arg2,
> + unsigned long arg3, unsigned long arg4,
> + unsigned long arg5);
> +
> +bool guest_sbi_probe_extension(int extid, long *out_val);
> +
> +#endif /* SELFTEST_KVM_SBI_H */
> diff --git a/tools/testing/selftests/kvm/include/riscv/ucall.h b/tools/testing/selftests/kvm/include/riscv/ucall.h
> index be46eb32ec27..a695ae36f3e0 100644
> --- a/tools/testing/selftests/kvm/include/riscv/ucall.h
> +++ b/tools/testing/selftests/kvm/include/riscv/ucall.h
> @@ -3,6 +3,7 @@
> #define SELFTEST_KVM_UCALL_H
>
> #include "processor.h"
> +#include "sbi.h"
>
> #define UCALL_EXIT_REASON KVM_EXIT_RISCV_SBI
>
> diff --git a/tools/testing/selftests/kvm/steal_time.c b/tools/testing/selftests/kvm/steal_time.c
> index bae0c5026f82..2ff82c7fd926 100644
> --- a/tools/testing/selftests/kvm/steal_time.c
> +++ b/tools/testing/selftests/kvm/steal_time.c
> @@ -11,7 +11,9 @@
> #include <pthread.h>
> #include <linux/kernel.h>
> #include <asm/kvm.h>
> -#ifndef __riscv
> +#ifdef __riscv
> +#include "sbi.h"
> +#else
> #include <asm/kvm_para.h>
> #endif
>
> --
> 2.34.1
>
On Sat, Apr 20, 2024 at 5:18 AM Atish Patra <[email protected]> wrote:
>
> SBI PMU test comprises of multiple tests and user may want to run
> only a subset depending on the platform. The most common case would
> be to run all to validate all the tests. However, some platform may
> not support all events or all ISA extensions.
>
> The commandline option allows user to disable any set of tests if
> they want to.
>
> Suggested-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> .../selftests/kvm/riscv/sbi_pmu_test.c | 73 ++++++++++++++++---
> 1 file changed, 64 insertions(+), 9 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> index 0fd9b76ae838..69bb94e6b227 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> @@ -33,6 +33,13 @@ static unsigned long counter_mask_available;
>
> static bool illegal_handler_invoked;
>
> +#define SBI_PMU_TEST_BASIC BIT(0)
> +#define SBI_PMU_TEST_EVENTS BIT(1)
> +#define SBI_PMU_TEST_SNAPSHOT BIT(2)
> +#define SBI_PMU_TEST_OVERFLOW BIT(3)
> +
> +static int disabled_tests;
> +
> unsigned long pmu_csr_read_num(int csr_num)
> {
> #define switchcase_csr_read(__csr_num, __val) {\
> @@ -608,19 +615,67 @@ static void test_vm_events_overflow(void *guest_code)
> test_vm_destroy(vm);
> }
>
> -int main(void)
> +static void test_print_help(char *name)
> +{
> + pr_info("Usage: %s [-h] [-d <test name>]\n", name);
> + pr_info("\t-d: Test to disable. Available tests are 'basic', 'events', 'snapshot', 'overflow'\n");
> + pr_info("\t-h: print this help screen\n");
> +}
> +
> +static bool parse_args(int argc, char *argv[])
> +{
> + int opt;
> +
> + while ((opt = getopt(argc, argv, "hd:")) != -1) {
> + switch (opt) {
> + case 'd':
> + if (!strncmp("basic", optarg, 5))
> + disabled_tests |= SBI_PMU_TEST_BASIC;
> + else if (!strncmp("events", optarg, 6))
> + disabled_tests |= SBI_PMU_TEST_EVENTS;
> + else if (!strncmp("snapshot", optarg, 8))
> + disabled_tests |= SBI_PMU_TEST_SNAPSHOT;
> + else if (!strncmp("overflow", optarg, 8))
> + disabled_tests |= SBI_PMU_TEST_OVERFLOW;
> + else
> + goto done;
> + break;
> + case 'h':
> + default:
> + goto done;
> + }
> + }
> +
> + return true;
> +done:
> + test_print_help(argv[0]);
> + return false;
> +}
> +
> +int main(int argc, char *argv[])
> {
> - test_vm_basic_test(test_pmu_basic_sanity);
> - pr_info("SBI PMU basic test : PASS\n");
> + if (!parse_args(argc, argv))
> + exit(KSFT_SKIP);
> +
> + if (!(disabled_tests & SBI_PMU_TEST_BASIC)) {
> + test_vm_basic_test(test_pmu_basic_sanity);
> + pr_info("SBI PMU basic test : PASS\n");
> + }
>
> - test_vm_events_test(test_pmu_events);
> - pr_info("SBI PMU event verification test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_EVENTS)) {
> + test_vm_events_test(test_pmu_events);
> + pr_info("SBI PMU event verification test : PASS\n");
> + }
>
> - test_vm_events_snapshot_test(test_pmu_events_snaphost);
> - pr_info("SBI PMU event verification with snapshot test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_SNAPSHOT)) {
> + test_vm_events_snapshot_test(test_pmu_events_snaphost);
> + pr_info("SBI PMU event verification with snapshot test : PASS\n");
> + }
>
> - test_vm_events_overflow(test_pmu_events_overflow);
> - pr_info("SBI PMU event verification with overflow test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_OVERFLOW)) {
> + test_vm_events_overflow(test_pmu_events_overflow);
> + pr_info("SBI PMU event verification with overflow test : PASS\n");
> + }
>
> return 0;
> }
> --
> 2.34.1
>
Hi Atish,
On 2024-04-20 10:17 AM, Atish Patra wrote:
> SBI v2.0 SBI introduced PMU snapshot feature which adds the following
> features.
>
> 1. Read counter values directly from the shared memory instead of
> csr read.
> 2. Start multiple counters with initial values with one SBI call.
>
> These functionalities optimizes the number of traps to the higher
> privilege mode. If the kernel is in VS mode while the hypervisor
> deploy trap & emulate method, this would minimize all the hpmcounter
> CSR read traps. If the kernel is running in S-mode, the benefits
> reduced to CSR latency vs DRAM/cache latency as there is no trap
> involved while accessing the hpmcounter CSRs.
>
> In both modes, it does saves the number of ecalls while starting
> multiple counter together with an initial values. This is a likely
> scenario if multiple counters overflow at the same time.
>
> Acked-by: Palmer Dabbelt <[email protected]>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Conor Dooley <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> drivers/perf/riscv_pmu.c | 1 +
> drivers/perf/riscv_pmu_sbi.c | 265 ++++++++++++++++++++++++++++++---
> include/linux/perf/riscv_pmu.h | 6 +
> 3 files changed, 255 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index b4efdddb2ad9..36d348753d05 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -408,6 +408,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
> cpuc->n_events = 0;
> for (i = 0; i < RISCV_MAX_COUNTERS; i++)
> cpuc->events[i] = NULL;
> + cpuc->snapshot_addr = NULL;
> }
> pmu->pmu = (struct pmu) {
> .event_init = riscv_pmu_event_init,
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 4eacd89141a9..2694110f1cff 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -58,6 +58,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
> PMU_FORMAT_ATTR(firmware, "config:63");
>
> static bool sbi_v2_available;
> +static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
> +#define sbi_pmu_snapshot_available() \
> + static_branch_unlikely(&sbi_pmu_snapshot_available)
>
> static struct attribute *riscv_arch_formats_attr[] = {
> &format_attr_event.attr,
> @@ -508,14 +511,105 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
> return ret;
> }
>
> +static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> +
> + if (!cpu_hw_evt->snapshot_addr)
> + continue;
> +
> + free_page((unsigned long)cpu_hw_evt->snapshot_addr);
> + cpu_hw_evt->snapshot_addr = NULL;
> + cpu_hw_evt->snapshot_addr_phys = 0;
> + }
> +}
> +
> +static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
> +{
> + int cpu;
> + struct page *snapshot_page;
> +
> + for_each_possible_cpu(cpu) {
> + struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> +
> + snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> + if (!snapshot_page) {
> + pmu_sbi_snapshot_free(pmu);
> + return -ENOMEM;
> + }
> + cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
> + cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
> + }
> +
> + return 0;
> +}
> +
> +static int pmu_sbi_snapshot_disable(void)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, SBI_SHMEM_DISABLE,
> + SBI_SHMEM_DISABLE, 0, 0, 0, 0);
> + if (ret.error) {
> + pr_warn("failed to disable snapshot shared memory\n");
> + return sbi_err_map_linux_errno(ret.error);
> + }
> +
> + return 0;
> +}
> +
> +static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
> +{
> + struct cpu_hw_events *cpu_hw_evt;
> + struct sbiret ret = {0};
> +
> + cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> + if (!cpu_hw_evt->snapshot_addr_phys)
> + return -EINVAL;
> +
> + if (cpu_hw_evt->snapshot_set_done)
> + return 0;
> +
> + if (IS_ENABLED(CONFIG_32BIT))
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> + cpu_hw_evt->snapshot_addr_phys,
> + (u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
> + else
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> + cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
> +
> + /* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
> + if (ret.error) {
> + if (ret.error != SBI_ERR_NOT_SUPPORTED)
> + pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
> + return sbi_err_map_linux_errno(ret.error);
> + }
> +
> + cpu_hw_evt->snapshot_set_done = true;
> +
> + return 0;
> +}
> +
> static u64 pmu_sbi_ctr_read(struct perf_event *event)
> {
> struct hw_perf_event *hwc = &event->hw;
> int idx = hwc->idx;
> struct sbiret ret;
> u64 val = 0;
> + struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
>
> + /* Read the value from the shared memory directly only if counter is stopped */
> + if (sbi_pmu_snapshot_available() & (hwc->state & PERF_HES_STOPPED)) {
nit: logical && between the two conditions
> + val = sdata->ctr_values[idx];
> + return val;
> + }
> +
> if (pmu_sbi_is_fw_event(event)) {
> ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
> hwc->idx, 0, 0, 0, 0, 0);
> @@ -565,6 +659,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
> struct hw_perf_event *hwc = &event->hw;
> unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
>
> + /* There is no benefit setting SNAPSHOT FLAG for a single counter */
> #if defined(CONFIG_32BIT)
> ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
> 1, flag, ival, ival >> 32, 0);
> @@ -585,16 +680,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
> {
> struct sbiret ret;
> struct hw_perf_event *hwc = &event->hw;
> + struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>
> if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
> (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
> pmu_sbi_reset_scounteren((void *)event);
>
> + if (sbi_pmu_snapshot_available())
> + flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> +
> ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
> - if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> - flag != SBI_PMU_STOP_FLAG_RESET)
> + if (!ret.error && sbi_pmu_snapshot_available()) {
> + /*
> + * The counter snapshot is based on the index base specified by hwc->idx.
> + * The actual counter value is updated in shared memory at index 0 when counter
> + * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
> + * the counter value to shared memory. However, if hwc->idx is zero, the counter
> + * value is already correctly updated in shared memory, requiring no further
> + * adjustment.
> + */
> + if (hwc->idx > 0) {
> + sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
> + sdata->ctr_values[0] = 0;
> + }
> + } else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> + flag != SBI_PMU_STOP_FLAG_RESET) {
> pr_err("Stopping counter idx %d failed with error %d\n",
> hwc->idx, sbi_err_map_linux_errno(ret.error));
> + }
> }
>
> static int pmu_sbi_find_num_ctrs(void)
> @@ -652,12 +767,39 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
> static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> {
> struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> + unsigned long flag = 0;
> int i;
> + struct sbiret ret;
> + unsigned long temp_ctr_values[64] = {0};
> + unsigned long ctr_val, temp_ctr_overflow_mask = 0;
>
> - for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
> + if (sbi_pmu_snapshot_available())
> + flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> +
> + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> /* No need to check the error here as we can't do anything about the error */
> - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> - cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> + cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
> + if (!ret.error && sbi_pmu_snapshot_available()) {
> + /* Save the counter values to avoid clobbering */
> + temp_ctr_values[i * BITS_PER_LONG + i] = sdata->ctr_values[i];
> + /* Save the overflow mask to avoid clobbering */
> + if (BIT(i) & sdata->ctr_overflow_mask)
> + temp_ctr_overflow_mask |= BIT(i + i * BITS_PER_LONG);
`i` is iterating over longs here, not individual counters. Using `i` twice in
the same expression like this makes no sense.
And again, since you made temp_ctr_overflow_mask an `unsigned long`, this BIT
expression overflows during the second iteration of the loop on riscv32.
> + }
> + }
> +
> + /* Restore the counter values to the shared memory */
> + if (sbi_pmu_snapshot_available()) {
> + for (i = 0; i < 64; i++) {
> + ctr_val = temp_ctr_values[i];
> + if (ctr_val)
> + sdata->ctr_values[i] = ctr_val;
It is entirely possible for the correct value of a counter to be zero when some
other counter overflows. But here if a counter with value zero is clobbered by a
nonzero value, the correct value will not be restored.
> + if (temp_ctr_overflow_mask)
> + sdata->ctr_overflow_mask = temp_ctr_overflow_mask;
This doesn't need to be inside the loop. And if temp_ctr_overflow_mask is the
64-bit combination of the two 32-bit values from the SBI calls (which seems to
be the intention), then we already know there is at least one set bit, since
this is called from the overflow IRQ handler.
> + }
> + }
> }
>
> /*
> @@ -666,11 +808,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> * while the overflowed counters need to be started with updated initialization
> * value.
> */
> -static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> - unsigned long ctr_ovf_mask)
> +static inline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
> + u64 ctr_ovf_mask)
> {
> int idx = 0, i;
> - struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> struct perf_event *event;
> unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> unsigned long ctr_start_mask = 0;
> @@ -706,6 +847,48 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> }
> }
>
> +static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
> + u64 ctr_ovf_mask)
> +{
> + int idx = 0;
> + struct perf_event *event;
> + unsigned long flag = SBI_PMU_START_FLAG_INIT_SNAPSHOT;
> + u64 max_period, init_val = 0;
> + struct hw_perf_event *hwc;
> + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> +
> + for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> + if (ctr_ovf_mask & BIT(idx)) {
> + event = cpu_hw_evt->events[idx];
> + hwc = &event->hw;
> + max_period = riscv_pmu_ctr_get_width_mask(event);
> + init_val = local64_read(&hwc->prev_count) & max_period;
> + sdata->ctr_values[idx] = init_val;
> + }
> + /*
> + * We do not need to update the non-overflow counters the previous
> + * value should have been there already.
> + */
> + }
> +
> + for (idx = 0; idx < BITS_TO_LONGS(RISCV_MAX_COUNTERS); idx++) {
> + /* Start all the counters in a single shot */
> + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
> + cpu_hw_evt->used_hw_ctrs[idx], flag, 0, 0, 0);
This doesn't work either... for riscv32 you have to copy the initial values for
counters 32-63 to ctr_values indexes 0-31, since that is where the SBI
implementation expects to find them. And since this clobbers counters 0-31,
those need to be preserved somewhere. That's why I recommended a shadow copy of
ctr_values -- globally, not local to pmu_sbi_stop_hw_ctrs().
Also, you need SBI_PMU_START_FLAG_INIT_SNAPSHOT here to actually load the values
from shared memory, so this is broken on riscv64 as well.
> + }
> +}
> +
> +static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> + u64 ctr_ovf_mask)
> +{
> + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> +
> + if (sbi_pmu_snapshot_available())
> + pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
> + else
> + pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
> +}
> +
> static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> {
> struct perf_sample_data data;
> @@ -716,9 +899,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> struct riscv_pmu *pmu;
> struct perf_event *event;
> unsigned long overflow;
> - unsigned long overflowed_ctrs = 0;
> + u64 overflowed_ctrs = 0;
> struct cpu_hw_events *cpu_hw_evt = dev;
> u64 start_clock = sched_clock();
> + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
>
> if (WARN_ON_ONCE(!cpu_hw_evt))
> return IRQ_NONE;
> @@ -740,7 +924,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> pmu_sbi_stop_hw_ctrs(pmu);
>
> /* Overflow status register should only be read after counter are stopped */
> - ALT_SBI_PMU_OVERFLOW(overflow);
> + if (sbi_pmu_snapshot_available())
> + overflow = sdata->ctr_overflow_mask;
Overflow is `unsigned long`, so this assignment truncates ctr_overflow_mask on
riscv32.
Was this series tested on riscv32?
Regards,
Samuel
> + else
> + ALT_SBI_PMU_OVERFLOW(overflow);
>
> /*
> * Overflow interrupt pending bit should only be cleared after stopping
> @@ -766,9 +953,14 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> if (!info || info->type != SBI_PMU_CTR_TYPE_HW)
> continue;
>
> - /* compute hardware counter index */
> - hidx = info->csr - CSR_CYCLE;
> - /* check if the corresponding bit is set in sscountovf */
> + if (sbi_pmu_snapshot_available())
> + /* SBI implementation already updated the logical indicies */
> + hidx = lidx;
> + else
> + /* compute hardware counter index */
> + hidx = info->csr - CSR_CYCLE;
> +
> + /* check if the corresponding bit is set in sscountovf or overflow mask in shmem */
> if (!(overflow & BIT(hidx)))
> continue;
>
> @@ -778,7 +970,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> */
> overflowed_ctrs |= BIT(lidx);
> hw_evt = &event->hw;
> + /* Update the event states here so that we know the state while reading */
> + hw_evt->state |= PERF_HES_STOPPED;
> riscv_pmu_event_update(event);
> + hw_evt->state |= PERF_HES_UPTODATE;
> perf_sample_data_init(&data, 0, hw_evt->last_period);
> if (riscv_pmu_event_set_period(event)) {
> /*
> @@ -791,6 +986,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> */
> perf_event_overflow(event, &data, regs);
> }
> + /* Reset the state as we are going to start the counter after the loop */
> + hw_evt->state = 0;
> }
>
> pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
> @@ -822,6 +1019,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
> enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
> }
>
> + if (sbi_pmu_snapshot_available())
> + return pmu_sbi_snapshot_setup(pmu, cpu);
> +
> return 0;
> }
>
> @@ -834,6 +1034,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
> /* Disable all counters access for user mode now */
> csr_write(CSR_SCOUNTEREN, 0x0);
>
> + if (sbi_pmu_snapshot_available())
> + return pmu_sbi_snapshot_disable();
> +
> return 0;
> }
>
> @@ -942,6 +1145,12 @@ static inline void riscv_pm_pmu_unregister(struct riscv_pmu *pmu) { }
>
> static void riscv_pmu_destroy(struct riscv_pmu *pmu)
> {
> + if (sbi_v2_available) {
> + if (sbi_pmu_snapshot_available()) {
> + pmu_sbi_snapshot_disable();
> + pmu_sbi_snapshot_free(pmu);
> + }
> + }
> riscv_pm_pmu_unregister(pmu);
> cpuhp_state_remove_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> }
> @@ -1109,10 +1318,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> pmu->event_unmapped = pmu_sbi_event_unmapped;
> pmu->csr_index = pmu_sbi_csr_index;
>
> - ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> - if (ret)
> - return ret;
> -
> ret = riscv_pm_pmu_register(pmu);
> if (ret)
> goto out_unregister;
> @@ -1121,8 +1326,34 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> if (ret)
> goto out_unregister;
>
> + /* SBI PMU Snapsphot is only available in SBI v2.0 */
> + if (sbi_v2_available) {
> + ret = pmu_sbi_snapshot_alloc(pmu);
> + if (ret)
> + goto out_unregister;
> +
> + ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
> + if (ret) {
> + /* Snapshot is an optional feature. Continue if not available */
> + pmu_sbi_snapshot_free(pmu);
> + } else {
> + pr_info("SBI PMU snapshot detected\n");
> + /*
> + * We enable it once here for the boot cpu. If snapshot shmem setup
> + * fails during cpu hotplug process, it will fail to start the cpu
> + * as we can not handle hetergenous PMUs with different snapshot
> + * capability.
> + */
> + static_branch_enable(&sbi_pmu_snapshot_available);
> + }
> + }
> +
> register_sysctl("kernel", sbi_pmu_sysctl_table);
>
> + ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> + if (ret)
> + goto out_unregister;
> +
> return 0;
>
> out_unregister:
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index 43282e22ebe1..c3fa90970042 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -39,6 +39,12 @@ struct cpu_hw_events {
> DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
> /* currently enabled firmware counters */
> DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
> + /* The virtual address of the shared memory where counter snapshot will be taken */
> + void *snapshot_addr;
> + /* The physical address of the shared memory where counter snapshot will be taken */
> + phys_addr_t snapshot_addr_phys;
> + /* Boolean flag to indicate setup is already done */
> + bool snapshot_set_done;
> };
>
> struct riscv_pmu {
On 4/20/24 8:17 PM, Atish Patra wrote:
> The SBI PMU extension definition is required for upcoming SBI PMU
> selftests.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Anup Patel <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> .../testing/selftests/kvm/include/riscv/sbi.h | 66 +++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
> index ba04f2dec7b5..6675ca673c77 100644
> --- a/tools/testing/selftests/kvm/include/riscv/sbi.h
> +++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
> @@ -29,17 +29,83 @@
> enum sbi_ext_id {
> SBI_EXT_BASE = 0x10,
> SBI_EXT_STA = 0x535441,
> + SBI_EXT_PMU = 0x504D55,
> };
>
> enum sbi_ext_base_fid {
> SBI_EXT_BASE_PROBE_EXT = 3,
> };
> +enum sbi_ext_pmu_fid {
> + SBI_EXT_PMU_NUM_COUNTERS = 0,
> + SBI_EXT_PMU_COUNTER_GET_INFO,
> + SBI_EXT_PMU_COUNTER_CFG_MATCH,
> + SBI_EXT_PMU_COUNTER_START,
> + SBI_EXT_PMU_COUNTER_STOP,
> + SBI_EXT_PMU_COUNTER_FW_READ,
> + SBI_EXT_PMU_COUNTER_FW_READ_HI,
> + SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> +};
> +
> +union sbi_pmu_ctr_info {
> + unsigned long value;
> + struct {
> + unsigned long csr:12;
> + unsigned long width:6;
> +#if __riscv_xlen == 32
> + unsigned long reserved:13;
> +#else
> + unsigned long reserved:45;
> +#endif
> + unsigned long type:1;
> + };
> +};
>
> struct sbiret {
> long error;
> long value;
> };
>
> +/** General pmu event codes specified in SBI PMU extension */
> +enum sbi_pmu_hw_generic_events_t {
> + SBI_PMU_HW_NO_EVENT = 0,
> + SBI_PMU_HW_CPU_CYCLES = 1,
> + SBI_PMU_HW_INSTRUCTIONS = 2,
> + SBI_PMU_HW_CACHE_REFERENCES = 3,
> + SBI_PMU_HW_CACHE_MISSES = 4,
> + SBI_PMU_HW_BRANCH_INSTRUCTIONS = 5,
> + SBI_PMU_HW_BRANCH_MISSES = 6,
> + SBI_PMU_HW_BUS_CYCLES = 7,
> + SBI_PMU_HW_STALLED_CYCLES_FRONTEND = 8,
> + SBI_PMU_HW_STALLED_CYCLES_BACKEND = 9,
> + SBI_PMU_HW_REF_CPU_CYCLES = 10,
> +
> + SBI_PMU_HW_GENERAL_MAX,
> +};
> +
> +/* SBI PMU counter types */
> +enum sbi_pmu_ctr_type {
> + SBI_PMU_CTR_TYPE_HW = 0x0,
> + SBI_PMU_CTR_TYPE_FW,
> +};
> +
> +/* Flags defined for config matching function */
> +#define SBI_PMU_CFG_FLAG_SKIP_MATCH BIT(0)
> +#define SBI_PMU_CFG_FLAG_CLEAR_VALUE BIT(1)
> +#define SBI_PMU_CFG_FLAG_AUTO_START BIT(2)
> +#define SBI_PMU_CFG_FLAG_SET_VUINH BIT(3)
> +#define SBI_PMU_CFG_FLAG_SET_VSINH BIT(4)
> +#define SBI_PMU_CFG_FLAG_SET_UINH BIT(5)
> +#define SBI_PMU_CFG_FLAG_SET_SINH BIT(6)
> +#define SBI_PMU_CFG_FLAG_SET_MINH BIT(7)
> +
> +/* Flags defined for counter start function */
> +#define SBI_PMU_START_FLAG_SET_INIT_VALUE BIT(0)
> +#define SBI_PMU_START_FLAG_INIT_SNAPSHOT BIT(1)
> +
> +/* Flags defined for counter stop function */
> +#define SBI_PMU_STOP_FLAG_RESET BIT(0)
> +#define SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT BIT(1)
> +
> struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> unsigned long arg1, unsigned long arg2,
> unsigned long arg3, unsigned long arg4,
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> This test implements basic sanity test and cycle/instret event
> counting tests.
>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/riscv/sbi_pmu_test.c | 369 ++++++++++++++++++
> 2 files changed, 370 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 741c7dc16afc..1cfcd2797ee4 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -189,6 +189,7 @@ TEST_GEN_PROGS_s390x += rseq_test
> TEST_GEN_PROGS_s390x += set_memory_region_test
> TEST_GEN_PROGS_s390x += kvm_binary_stats_test
>
> +TEST_GEN_PROGS_riscv += riscv/sbi_pmu_test
> TEST_GEN_PROGS_riscv += arch_timer
> TEST_GEN_PROGS_riscv += demand_paging_test
> TEST_GEN_PROGS_riscv += dirty_log_test
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> new file mode 100644
> index 000000000000..7c81691e39c5
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> @@ -0,0 +1,369 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * sbi_pmu_test.c - Tests the riscv64 SBI PMU functionality.
> + *
> + * Copyright (c) 2024, Rivos Inc.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "processor.h"
> +#include "sbi.h"
> +
> +/* Maximum counters(firmware + hardware) */
> +#define RISCV_MAX_PMU_COUNTERS 64
> +union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> +
> +/* Cache the available counters in a bitmask */
> +static unsigned long counter_mask_available;
> +
> +static bool illegal_handler_invoked;
> +
> +unsigned long pmu_csr_read_num(int csr_num)
> +{
> +#define switchcase_csr_read(__csr_num, __val) {\
> + case __csr_num: \
> + __val = csr_read(__csr_num); \
> + break; }
> +#define switchcase_csr_read_2(__csr_num, __val) {\
> + switchcase_csr_read(__csr_num + 0, __val) \
> + switchcase_csr_read(__csr_num + 1, __val)}
> +#define switchcase_csr_read_4(__csr_num, __val) {\
> + switchcase_csr_read_2(__csr_num + 0, __val) \
> + switchcase_csr_read_2(__csr_num + 2, __val)}
> +#define switchcase_csr_read_8(__csr_num, __val) {\
> + switchcase_csr_read_4(__csr_num + 0, __val) \
> + switchcase_csr_read_4(__csr_num + 4, __val)}
> +#define switchcase_csr_read_16(__csr_num, __val) {\
> + switchcase_csr_read_8(__csr_num + 0, __val) \
> + switchcase_csr_read_8(__csr_num + 8, __val)}
> +#define switchcase_csr_read_32(__csr_num, __val) {\
> + switchcase_csr_read_16(__csr_num + 0, __val) \
> + switchcase_csr_read_16(__csr_num + 16, __val)}
> +
> + unsigned long ret = 0;
> +
> + switch (csr_num) {
> + switchcase_csr_read_32(CSR_CYCLE, ret)
> + switchcase_csr_read_32(CSR_CYCLEH, ret)
> + default :
> + break;
> + }
> +
> + return ret;
> +#undef switchcase_csr_read_32
> +#undef switchcase_csr_read_16
> +#undef switchcase_csr_read_8
> +#undef switchcase_csr_read_4
> +#undef switchcase_csr_read_2
> +#undef switchcase_csr_read
> +}
> +
> +static inline void dummy_func_loop(uint64_t iter)
> +{
> + int i = 0;
> +
> + while (i < iter) {
> + asm volatile("nop");
> + i++;
> + }
> +}
> +
> +static void start_counter(unsigned long counter, unsigned long start_flags,
> + unsigned long ival)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, counter, 1, start_flags,
> + ival, 0, 0);
> + __GUEST_ASSERT(ret.error == 0, "Unable to start counter %ld\n", counter);
> +}
> +
> +/* This should be invoked only for reset counter use case */
> +static void stop_reset_counter(unsigned long counter, unsigned long stop_flags)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1,
> + stop_flags | SBI_PMU_STOP_FLAG_RESET, 0, 0, 0);
> + __GUEST_ASSERT(ret.error == SBI_ERR_ALREADY_STOPPED,
> + "Unable to stop counter %ld\n", counter);
> +}
> +
> +static void stop_counter(unsigned long counter, unsigned long stop_flags)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, counter, 1, stop_flags,
> + 0, 0, 0);
> + __GUEST_ASSERT(ret.error == 0, "Unable to stop counter %ld error %ld\n",
> + counter, ret.error);
> +}
> +
> +static void guest_illegal_exception_handler(struct ex_regs *regs)
> +{
> + __GUEST_ASSERT(regs->cause == EXC_INST_ILLEGAL,
> + "Unexpected exception handler %lx\n", regs->cause);
> +
> + illegal_handler_invoked = true;
> + /* skip the trapping instruction */
> + regs->epc += 4;
> +}
> +
> +static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> + unsigned long cflags,
> + unsigned long event)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase, cmask,
> + cflags, event, 0, 0);
> + __GUEST_ASSERT(ret.error == 0, "config matching failed %ld\n", ret.error);
> + GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS);
> + GUEST_ASSERT(BIT(ret.value) & counter_mask_available);
> +
> + return ret.value;
> +}
> +
> +static unsigned long get_num_counters(void)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
> +
> + __GUEST_ASSERT(ret.error == 0, "Unable to retrieve number of counters from SBI PMU");
> + __GUEST_ASSERT(ret.value < RISCV_MAX_PMU_COUNTERS,
> + "Invalid number of counters %ld\n", ret.value);
> +
> + return ret.value;
> +}
> +
> +static void update_counter_info(int num_counters)
> +{
> + int i = 0;
> + struct sbiret ret;
> +
> + for (i = 0; i < num_counters; i++) {
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i, 0, 0, 0, 0, 0);
> +
> + /* There can be gaps in logical counter indicies*/
> + if (ret.error)
> + continue;
> + GUEST_ASSERT_NE(ret.value, 0);
> +
> + ctrinfo_arr[i].value = ret.value;
> + counter_mask_available |= BIT(i);
> + }
> +
> + GUEST_ASSERT(counter_mask_available > 0);
> +}
> +
> +static unsigned long read_fw_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ, idx, 0, 0, 0, 0, 0);
> + GUEST_ASSERT(ret.error == 0);
> + return ret.value;
> +}
> +
> +static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> +{
> + unsigned long counter_val = 0;
> +
> + __GUEST_ASSERT(ctrinfo.type < 2, "Invalid counter type %d", ctrinfo.type);
> +
> + if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW)
> + counter_val = pmu_csr_read_num(ctrinfo.csr);
> + else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW)
> + counter_val = read_fw_counter(idx, ctrinfo);
> +
> + return counter_val;
> +}
> +
> +static void test_pmu_event(unsigned long event)
> +{
> + unsigned long counter;
> + unsigned long counter_value_pre, counter_value_post;
> + unsigned long counter_init_value = 100;
> +
> + counter = get_counter_index(0, counter_mask_available, 0, event);
> + counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> + /* Do not set the initial value */
> + start_counter(counter, 0, 0);
> + dummy_func_loop(10000);
> + stop_counter(counter, 0);
> +
> + counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> + __GUEST_ASSERT(counter_value_post > counter_value_pre,
> + "Event update verification failed: post [%lx] pre [%lx]\n",
> + counter_value_post, counter_value_pre);
> +
> + /*
> + * We can't just update the counter without starting it.
> + * Do start/stop twice to simulate that by first initializing to a very
> + * high value and a low value after that.
> + */
> + start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, ULONG_MAX/2);
> + stop_counter(counter, 0);
> + counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> + start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
> + stop_counter(counter, 0);
> + counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> + __GUEST_ASSERT(counter_value_pre > counter_value_post,
> + "Counter reinitialization verification failed : post [%lx] pre [%lx]\n",
> + counter_value_post, counter_value_pre);
> +
> + /* Now set the initial value and compare */
> + start_counter(counter, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_init_value);
> + dummy_func_loop(10000);
> + stop_counter(counter, 0);
> +
> + counter_value_post = read_counter(counter, ctrinfo_arr[counter]);
> + __GUEST_ASSERT(counter_value_post > counter_init_value,
> + "Event update verification failed: post [%lx] pre [%lx]\n",
> + counter_value_post, counter_init_value);
> +
> + stop_reset_counter(counter, 0);
> +}
> +
> +static void test_invalid_event(void)
> +{
> + struct sbiret ret;
> + unsigned long event = 0x1234; /* A random event */
> +
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, 0,
> + counter_mask_available, 0, event, 0, 0);
> + GUEST_ASSERT_EQ(ret.error, SBI_ERR_NOT_SUPPORTED);
> +}
> +
> +static void test_pmu_events(void)
> +{
> + int num_counters = 0;
> +
> + /* Get the counter details */
> + num_counters = get_num_counters();
> + update_counter_info(num_counters);
> +
> + /* Sanity testing for any random invalid event */
> + test_invalid_event();
> +
> + /* Only these two events are guaranteed to be present */
> + test_pmu_event(SBI_PMU_HW_CPU_CYCLES);
> + test_pmu_event(SBI_PMU_HW_INSTRUCTIONS);
> +
> + GUEST_DONE();
> +}
> +
> +static void test_pmu_basic_sanity(void)
> +{
> + long out_val = 0;
> + bool probe;
> + struct sbiret ret;
> + int num_counters = 0, i;
> + union sbi_pmu_ctr_info ctrinfo;
> +
> + probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> + GUEST_ASSERT(probe && out_val == 1);
> +
> + num_counters = get_num_counters();
> +
> + for (i = 0; i < num_counters; i++) {
> + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_GET_INFO, i,
> + 0, 0, 0, 0, 0);
> +
> + /* There can be gaps in logical counter indicies*/
> + if (ret.error)
> + continue;
> + GUEST_ASSERT_NE(ret.value, 0);
> +
> + ctrinfo.value = ret.value;
> +
> + /**
> + * Accessibility check of hardware and read capability of firmware counters.
> + * The spec doesn't mandate any initial value. No need to check any value.
> + */
> + if (ctrinfo.type == SBI_PMU_CTR_TYPE_HW) {
> + pmu_csr_read_num(ctrinfo.csr);
> + GUEST_ASSERT(illegal_handler_invoked);
> + } else if (ctrinfo.type == SBI_PMU_CTR_TYPE_FW) {
> + read_fw_counter(i, ctrinfo);
> + }
> + }
> +
> + GUEST_DONE();
> +}
> +
> +static void run_vcpu(struct kvm_vcpu *vcpu)
> +{
> + struct ucall uc;
> +
> + vcpu_run(vcpu);
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_DONE:
> + case UCALL_SYNC:
> + break;
> + default:
> + TEST_FAIL("Unknown ucall %lu", uc.cmd);
> + break;
> + }
> +}
> +
> +void test_vm_destroy(struct kvm_vm *vm)
> +{
> + memset(ctrinfo_arr, 0, sizeof(union sbi_pmu_ctr_info) * RISCV_MAX_PMU_COUNTERS);
> + counter_mask_available = 0;
> + kvm_vm_free(vm);
> +}
> +
> +static void test_vm_basic_test(void *guest_code)
> +{
> + struct kvm_vm *vm;
> + struct kvm_vcpu *vcpu;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> + __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
> + "SBI PMU not available, skipping test");
> + vm_init_vector_tables(vm);
> + /* Illegal instruction handler is required to verify read access without configuration */
> + vm_install_exception_handler(vm, EXC_INST_ILLEGAL, guest_illegal_exception_handler);
> +
> + vcpu_init_vector_tables(vcpu);
> + run_vcpu(vcpu);
> +
> + test_vm_destroy(vm);
> +}
> +
> +static void test_vm_events_test(void *guest_code)
> +{
> + struct kvm_vm *vm = NULL;
> + struct kvm_vcpu *vcpu = NULL;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> + __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
> + "SBI PMU not available, skipping test");
> + run_vcpu(vcpu);
> +
> + test_vm_destroy(vm);
> +}
> +
> +int main(void)
> +{
> + test_vm_basic_test(test_pmu_basic_sanity);
> + pr_info("SBI PMU basic test : PASS\n");
> +
> + test_vm_events_test(test_pmu_events);
> + pr_info("SBI PMU event verification test : PASS\n");
> +
> + return 0;
> +}
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> Verify PMU snapshot functionality by setting up the shared memory
> correctly and reading the counter values from the shared memory
> instead of the CSR.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Anup Patel <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> .../testing/selftests/kvm/include/riscv/sbi.h | 25 +++
> .../selftests/kvm/lib/riscv/processor.c | 12 ++
> .../selftests/kvm/riscv/sbi_pmu_test.c | 144 ++++++++++++++++++
> 3 files changed, 181 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
> index 6675ca673c77..046b432ae896 100644
> --- a/tools/testing/selftests/kvm/include/riscv/sbi.h
> +++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
> @@ -8,6 +8,12 @@
> #ifndef SELFTEST_KVM_SBI_H
> #define SELFTEST_KVM_SBI_H
>
> +/* SBI spec version fields */
> +#define SBI_SPEC_VERSION_DEFAULT 0x1
> +#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
> +#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f
> +#define SBI_SPEC_VERSION_MINOR_MASK 0xffffff
> +
> /* SBI return error codes */
> #define SBI_SUCCESS 0
> #define SBI_ERR_FAILURE -1
> @@ -33,6 +39,9 @@ enum sbi_ext_id {
> };
>
> enum sbi_ext_base_fid {
> + SBI_EXT_BASE_GET_SPEC_VERSION = 0,
> + SBI_EXT_BASE_GET_IMP_ID,
> + SBI_EXT_BASE_GET_IMP_VERSION,
> SBI_EXT_BASE_PROBE_EXT = 3,
> };
> enum sbi_ext_pmu_fid {
> @@ -60,6 +69,12 @@ union sbi_pmu_ctr_info {
> };
> };
>
> +struct riscv_pmu_snapshot_data {
> + u64 ctr_overflow_mask;
> + u64 ctr_values[64];
> + u64 reserved[447];
> +};
> +
> struct sbiret {
> long error;
> long value;
> @@ -113,4 +128,14 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>
> bool guest_sbi_probe_extension(int extid, long *out_val);
>
> +/* Make SBI version */
> +static inline unsigned long sbi_mk_version(unsigned long major,
> + unsigned long minor)
> +{
> + return ((major & SBI_SPEC_VERSION_MAJOR_MASK) << SBI_SPEC_VERSION_MAJOR_SHIFT)
> + | (minor & SBI_SPEC_VERSION_MINOR_MASK);
> +}
> +
> +unsigned long get_host_sbi_spec_version(void);
> +
> #endif /* SELFTEST_KVM_SBI_H */
> diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
> index e8211f5d6863..ccb35573749c 100644
> --- a/tools/testing/selftests/kvm/lib/riscv/processor.c
> +++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
> @@ -502,3 +502,15 @@ bool guest_sbi_probe_extension(int extid, long *out_val)
>
> return true;
> }
> +
> +unsigned long get_host_sbi_spec_version(void)
> +{
> + struct sbiret ret;
> +
> + ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_GET_SPEC_VERSION, 0,
> + 0, 0, 0, 0, 0);
> +
> + GUEST_ASSERT(!ret.error);
> +
> + return ret.value;
> +}
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> index 7c81691e39c5..9002ff451abf 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> @@ -19,6 +19,11 @@
> #define RISCV_MAX_PMU_COUNTERS 64
> union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
>
> +/* Snapshot shared memory data */
> +#define PMU_SNAPSHOT_GPA_BASE BIT(30)
> +static void *snapshot_gva;
> +static vm_paddr_t snapshot_gpa;
> +
> /* Cache the available counters in a bitmask */
> static unsigned long counter_mask_available;
>
> @@ -186,6 +191,32 @@ static unsigned long read_counter(int idx, union sbi_pmu_ctr_info ctrinfo)
> return counter_val;
> }
>
> +static inline void verify_sbi_requirement_assert(void)
> +{
> + long out_val = 0;
> + bool probe;
> +
> + probe = guest_sbi_probe_extension(SBI_EXT_PMU, &out_val);
> + GUEST_ASSERT(probe && out_val == 1);
> +
> + if (get_host_sbi_spec_version() < sbi_mk_version(2, 0))
> + __GUEST_ASSERT(0, "SBI implementation version doesn't support PMU Snapshot");
> +}
> +
> +static void snapshot_set_shmem(vm_paddr_t gpa, unsigned long flags)
> +{
> + unsigned long lo = (unsigned long)gpa;
> +#if __riscv_xlen == 32
> + unsigned long hi = (unsigned long)(gpa >> 32);
> +#else
> + unsigned long hi = gpa == -1 ? -1 : 0;
> +#endif
> + struct sbiret ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> + lo, hi, flags, 0, 0, 0);
> +
> + GUEST_ASSERT(ret.value == 0 && ret.error == 0);
> +}
> +
> static void test_pmu_event(unsigned long event)
> {
> unsigned long counter;
> @@ -234,6 +265,59 @@ static void test_pmu_event(unsigned long event)
> stop_reset_counter(counter, 0);
> }
>
> +static void test_pmu_event_snapshot(unsigned long event)
> +{
> + unsigned long counter;
> + unsigned long counter_value_pre, counter_value_post;
> + unsigned long counter_init_value = 100;
> + struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> + counter = get_counter_index(0, counter_mask_available, 0, event);
> + counter_value_pre = read_counter(counter, ctrinfo_arr[counter]);
> +
> + /* Do not set the initial value */
> + start_counter(counter, 0, 0);
> + dummy_func_loop(10000);
> + stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> + /* The counter value is updated w.r.t relative index of cbase */
> + counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> + __GUEST_ASSERT(counter_value_post > counter_value_pre,
> + "Event update verification failed: post [%lx] pre [%lx]\n",
> + counter_value_post, counter_value_pre);
> +
> + /*
> + * We can't just update the counter without starting it.
> + * Do start/stop twice to simulate that by first initializing to a very
> + * high value and a low value after that.
> + */
> + WRITE_ONCE(snapshot_data->ctr_values[0], ULONG_MAX/2);
> + start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
> + stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> + counter_value_pre = READ_ONCE(snapshot_data->ctr_values[0]);
> +
> + WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> + start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
> + stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> + counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> + __GUEST_ASSERT(counter_value_pre > counter_value_post,
> + "Counter reinitialization verification failed : post [%lx] pre [%lx]\n",
> + counter_value_post, counter_value_pre);
> +
> + /* Now set the initial value and compare */
> + WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> + start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
> + dummy_func_loop(10000);
> + stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> + counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> + __GUEST_ASSERT(counter_value_post > counter_init_value,
> + "Event update verification failed: post [%lx] pre [%lx]\n",
> + counter_value_post, counter_init_value);
> +
> + stop_reset_counter(counter, 0);
> +}
> +
> static void test_invalid_event(void)
> {
> struct sbiret ret;
> @@ -301,6 +385,34 @@ static void test_pmu_basic_sanity(void)
> GUEST_DONE();
> }
>
> +static void test_pmu_events_snaphost(void)
> +{
> + int num_counters = 0;
> + struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> + int i;
> +
> + /* Verify presence of SBI PMU and minimum requrired SBI version */
> + verify_sbi_requirement_assert();
> +
> + snapshot_set_shmem(snapshot_gpa, 0);
> +
> + /* Get the counter details */
> + num_counters = get_num_counters();
> + update_counter_info(num_counters);
> +
> + /* Validate shared memory access */
> + GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_overflow_mask), 0);
> + for (i = 0; i < num_counters; i++) {
> + if (counter_mask_available & (BIT(i)))
> + GUEST_ASSERT_EQ(READ_ONCE(snapshot_data->ctr_values[i]), 0);
> + }
> + /* Only these two events are guranteed to be present */
> + test_pmu_event_snapshot(SBI_PMU_HW_CPU_CYCLES);
> + test_pmu_event_snapshot(SBI_PMU_HW_INSTRUCTIONS);
> +
> + GUEST_DONE();
> +}
> +
> static void run_vcpu(struct kvm_vcpu *vcpu)
> {
> struct ucall uc;
> @@ -357,6 +469,35 @@ static void test_vm_events_test(void *guest_code)
> test_vm_destroy(vm);
> }
>
> +static void test_vm_setup_snapshot_mem(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> +{
> + /* PMU Snapshot requires single page only */
> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, PMU_SNAPSHOT_GPA_BASE, 1, 1, 0);
> + /* PMU_SNAPSHOT_GPA_BASE is identity mapped */
> + virt_map(vm, PMU_SNAPSHOT_GPA_BASE, PMU_SNAPSHOT_GPA_BASE, 1);
> +
> + snapshot_gva = (void *)(PMU_SNAPSHOT_GPA_BASE);
> + snapshot_gpa = addr_gva2gpa(vcpu->vm, (vm_vaddr_t)snapshot_gva);
> + sync_global_to_guest(vcpu->vm, snapshot_gva);
> + sync_global_to_guest(vcpu->vm, snapshot_gpa);
> +}
> +
> +static void test_vm_events_snapshot_test(void *guest_code)
> +{
> + struct kvm_vm *vm = NULL;
> + struct kvm_vcpu *vcpu;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> + __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
> + "SBI PMU not available, skipping test");
> +
> + test_vm_setup_snapshot_mem(vm, vcpu);
> +
> + run_vcpu(vcpu);
> +
> + test_vm_destroy(vm);
> +}
> +
> int main(void)
> {
> test_vm_basic_test(test_pmu_basic_sanity);
> @@ -365,5 +506,8 @@ int main(void)
> test_vm_events_test(test_pmu_events);
> pr_info("SBI PMU event verification test : PASS\n");
>
> + test_vm_events_snapshot_test(test_pmu_events_snaphost);
> + pr_info("SBI PMU event verification with snapshot test : PASS\n");
> +
> return 0;
> }
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> Add a test for verifying overflow interrupt. Currently, it relies on
> overflow support on cycle/instret events. This test works for cycle/
> instret events which support sampling via hpmcounters on the platform.
> There are no ISA extensions to detect if a platform supports that. Thus,
> this test will fail on platform with virtualization but doesn't
> support overflow on these two events.
>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> .../selftests/kvm/riscv/sbi_pmu_test.c | 113 ++++++++++++++++++
> 1 file changed, 113 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> index 9002ff451abf..0fd9b76ae838 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> @@ -14,6 +14,7 @@
> #include "test_util.h"
> #include "processor.h"
> #include "sbi.h"
> +#include "arch_timer.h"
>
> /* Maximum counters(firmware + hardware) */
> #define RISCV_MAX_PMU_COUNTERS 64
> @@ -24,6 +25,9 @@ union sbi_pmu_ctr_info ctrinfo_arr[RISCV_MAX_PMU_COUNTERS];
> static void *snapshot_gva;
> static vm_paddr_t snapshot_gpa;
>
> +static int vcpu_shared_irq_count;
> +static int counter_in_use;
> +
> /* Cache the available counters in a bitmask */
> static unsigned long counter_mask_available;
>
> @@ -120,6 +124,31 @@ static void guest_illegal_exception_handler(struct ex_regs *regs)
> regs->epc += 4;
> }
>
> +static void guest_irq_handler(struct ex_regs *regs)
> +{
> + unsigned int irq_num = regs->cause & ~CAUSE_IRQ_FLAG;
> + struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> + unsigned long overflown_mask;
> + unsigned long counter_val = 0;
> +
> + /* Validate that we are in the correct irq handler */
> + GUEST_ASSERT_EQ(irq_num, IRQ_PMU_OVF);
> +
> + /* Stop all counters first to avoid further interrupts */
> + stop_counter(counter_in_use, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> + csr_clear(CSR_SIP, BIT(IRQ_PMU_OVF));
> +
> + overflown_mask = READ_ONCE(snapshot_data->ctr_overflow_mask);
> + GUEST_ASSERT(overflown_mask & 0x01);
> +
> + WRITE_ONCE(vcpu_shared_irq_count, vcpu_shared_irq_count+1);
> +
> + counter_val = READ_ONCE(snapshot_data->ctr_values[0]);
> + /* Now start the counter to mimick the real driver behavior */
> + start_counter(counter_in_use, SBI_PMU_START_FLAG_SET_INIT_VALUE, counter_val);
> +}
> +
> static unsigned long get_counter_index(unsigned long cbase, unsigned long cmask,
> unsigned long cflags,
> unsigned long event)
> @@ -318,6 +347,33 @@ static void test_pmu_event_snapshot(unsigned long event)
> stop_reset_counter(counter, 0);
> }
>
> +static void test_pmu_event_overflow(unsigned long event)
> +{
> + unsigned long counter;
> + unsigned long counter_value_post;
> + unsigned long counter_init_value = ULONG_MAX - 10000;
> + struct riscv_pmu_snapshot_data *snapshot_data = snapshot_gva;
> +
> + counter = get_counter_index(0, counter_mask_available, 0, event);
> + counter_in_use = counter;
> +
> + /* The counter value is updated w.r.t relative index of cbase passed to start/stop */
> + WRITE_ONCE(snapshot_data->ctr_values[0], counter_init_value);
> + start_counter(counter, SBI_PMU_START_FLAG_INIT_SNAPSHOT, 0);
> + dummy_func_loop(10000);
> + udelay(msecs_to_usecs(2000));
> + /* irq handler should have stopped the counter */
> + stop_counter(counter, SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT);
> +
> + counter_value_post = READ_ONCE(snapshot_data->ctr_values[0]);
> + /* The counter value after stopping should be less the init value due to overflow */
> + __GUEST_ASSERT(counter_value_post < counter_init_value,
> + "counter_value_post %lx counter_init_value %lx for counter\n",
> + counter_value_post, counter_init_value);
> +
> + stop_reset_counter(counter, 0);
> +}
> +
> static void test_invalid_event(void)
> {
> struct sbiret ret;
> @@ -413,6 +469,34 @@ static void test_pmu_events_snaphost(void)
> GUEST_DONE();
> }
>
> +static void test_pmu_events_overflow(void)
> +{
> + int num_counters = 0;
> +
> + /* Verify presence of SBI PMU and minimum requrired SBI version */
> + verify_sbi_requirement_assert();
> +
> + snapshot_set_shmem(snapshot_gpa, 0);
> + csr_set(CSR_IE, BIT(IRQ_PMU_OVF));
> + local_irq_enable();
> +
> + /* Get the counter details */
> + num_counters = get_num_counters();
> + update_counter_info(num_counters);
> +
> + /*
> + * Qemu supports overflow for cycle/instruction.
> + * This test may fail on any platform that do not support overflow for these two events.
> + */
> + test_pmu_event_overflow(SBI_PMU_HW_CPU_CYCLES);
> + GUEST_ASSERT_EQ(vcpu_shared_irq_count, 1);
> +
> + test_pmu_event_overflow(SBI_PMU_HW_INSTRUCTIONS);
> + GUEST_ASSERT_EQ(vcpu_shared_irq_count, 2);
> +
> + GUEST_DONE();
> +}
> +
> static void run_vcpu(struct kvm_vcpu *vcpu)
> {
> struct ucall uc;
> @@ -498,6 +582,32 @@ static void test_vm_events_snapshot_test(void *guest_code)
> test_vm_destroy(vm);
> }
>
> +static void test_vm_events_overflow(void *guest_code)
> +{
> + struct kvm_vm *vm = NULL;
> + struct kvm_vcpu *vcpu;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> + __TEST_REQUIRE(__vcpu_has_sbi_ext(vcpu, KVM_RISCV_SBI_EXT_PMU),
> + "SBI PMU not available, skipping test");
> +
> + __TEST_REQUIRE(__vcpu_has_isa_ext(vcpu, KVM_RISCV_ISA_EXT_SSCOFPMF),
> + "Sscofpmf is not available, skipping overflow test");
> +
> + test_vm_setup_snapshot_mem(vm, vcpu);
> + vm_init_vector_tables(vm);
> + vm_install_interrupt_handler(vm, guest_irq_handler);
> +
> + vcpu_init_vector_tables(vcpu);
> + /* Initialize guest timer frequency. */
> + vcpu_get_reg(vcpu, RISCV_TIMER_REG(frequency), &timer_freq);
> + sync_global_to_guest(vm, timer_freq);
> +
> + run_vcpu(vcpu);
> +
> + test_vm_destroy(vm);
> +}
> +
> int main(void)
> {
> test_vm_basic_test(test_pmu_basic_sanity);
> @@ -509,5 +619,8 @@ int main(void)
> test_vm_events_snapshot_test(test_pmu_events_snaphost);
> pr_info("SBI PMU event verification with snapshot test : PASS\n");
>
> + test_vm_events_overflow(test_pmu_events_overflow);
> + pr_info("SBI PMU event verification with overflow test : PASS\n");
> +
> return 0;
> }
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> SBI PMU test comprises of multiple tests and user may want to run
> only a subset depending on the platform. The most common case would
> be to run all to validate all the tests. However, some platform may
> not support all events or all ISA extensions.
>
> The commandline option allows user to disable any set of tests if
> they want to.
>
> Suggested-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> .../selftests/kvm/riscv/sbi_pmu_test.c | 73 ++++++++++++++++---
> 1 file changed, 64 insertions(+), 9 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> index 0fd9b76ae838..69bb94e6b227 100644
> --- a/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> +++ b/tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> @@ -33,6 +33,13 @@ static unsigned long counter_mask_available;
>
> static bool illegal_handler_invoked;
>
> +#define SBI_PMU_TEST_BASIC BIT(0)
> +#define SBI_PMU_TEST_EVENTS BIT(1)
> +#define SBI_PMU_TEST_SNAPSHOT BIT(2)
> +#define SBI_PMU_TEST_OVERFLOW BIT(3)
> +
> +static int disabled_tests;
> +
> unsigned long pmu_csr_read_num(int csr_num)
> {
> #define switchcase_csr_read(__csr_num, __val) {\
> @@ -608,19 +615,67 @@ static void test_vm_events_overflow(void *guest_code)
> test_vm_destroy(vm);
> }
>
> -int main(void)
> +static void test_print_help(char *name)
> +{
> + pr_info("Usage: %s [-h] [-d <test name>]\n", name);
A little weird that we have pr_info named helper to print logs when it is a
userspace application, not kernel code. I'll check it in the source who
added it to the KVM tests.
> + pr_info("\t-d: Test to disable. Available tests are 'basic', 'events', 'snapshot', 'overflow'\n");
> + pr_info("\t-h: print this help screen\n");
> +}
> +
> +static bool parse_args(int argc, char *argv[])
> +{
> + int opt;
> +
> + while ((opt = getopt(argc, argv, "hd:")) != -1) {
> + switch (opt) {
> + case 'd':
> + if (!strncmp("basic", optarg, 5))
> + disabled_tests |= SBI_PMU_TEST_BASIC;
> + else if (!strncmp("events", optarg, 6))
> + disabled_tests |= SBI_PMU_TEST_EVENTS;
> + else if (!strncmp("snapshot", optarg, 8))
> + disabled_tests |= SBI_PMU_TEST_SNAPSHOT;
> + else if (!strncmp("overflow", optarg, 8))
> + disabled_tests |= SBI_PMU_TEST_OVERFLOW;
> + else
> + goto done;
> + break;
> + case 'h':
> + default:
> + goto done;
> + }
> + }
> +
> + return true;
> +done:
> + test_print_help(argv[0]);
> + return false;
> +}
> +
> +int main(int argc, char *argv[])
> {
> - test_vm_basic_test(test_pmu_basic_sanity);
> - pr_info("SBI PMU basic test : PASS\n");
> + if (!parse_args(argc, argv))
> + exit(KSFT_SKIP);
> +
> + if (!(disabled_tests & SBI_PMU_TEST_BASIC)) {
> + test_vm_basic_test(test_pmu_basic_sanity);
> + pr_info("SBI PMU basic test : PASS\n");
> + }
>
> - test_vm_events_test(test_pmu_events);
> - pr_info("SBI PMU event verification test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_EVENTS)) {
> + test_vm_events_test(test_pmu_events);
> + pr_info("SBI PMU event verification test : PASS\n");
> + }
>
> - test_vm_events_snapshot_test(test_pmu_events_snaphost);
> - pr_info("SBI PMU event verification with snapshot test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_SNAPSHOT)) {
> + test_vm_events_snapshot_test(test_pmu_events_snaphost);
> + pr_info("SBI PMU event verification with snapshot test : PASS\n");
> + }
>
> - test_vm_events_overflow(test_pmu_events_overflow);
> - pr_info("SBI PMU event verification with overflow test : PASS\n");
> + if (!(disabled_tests & SBI_PMU_TEST_OVERFLOW)) {
> + test_vm_events_overflow(test_pmu_events_overflow);
> + pr_info("SBI PMU event verification with overflow test : PASS\n");
> + }
>
> return 0;
> }
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> The SBI definitions will continue to grow. Move the sbi related
> definitions to its own header file from processor.h
>
> Suggested-by: Andrew Jones <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> .../selftests/kvm/include/riscv/processor.h | 39 ---------------
> .../testing/selftests/kvm/include/riscv/sbi.h | 50 +++++++++++++++++++
> .../selftests/kvm/include/riscv/ucall.h | 1 +
> tools/testing/selftests/kvm/steal_time.c | 4 +-
> 4 files changed, 54 insertions(+), 40 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index ce473fe251dd..3b9cb39327ff 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -154,45 +154,6 @@ void vm_install_interrupt_handler(struct kvm_vm *vm, exception_handler_fn handle
> #define PGTBL_PAGE_SIZE PGTBL_L0_BLOCK_SIZE
> #define PGTBL_PAGE_SIZE_SHIFT PGTBL_L0_BLOCK_SHIFT
>
> -/* SBI return error codes */
> -#define SBI_SUCCESS 0
> -#define SBI_ERR_FAILURE -1
> -#define SBI_ERR_NOT_SUPPORTED -2
> -#define SBI_ERR_INVALID_PARAM -3
> -#define SBI_ERR_DENIED -4
> -#define SBI_ERR_INVALID_ADDRESS -5
> -#define SBI_ERR_ALREADY_AVAILABLE -6
> -#define SBI_ERR_ALREADY_STARTED -7
> -#define SBI_ERR_ALREADY_STOPPED -8
> -
> -#define SBI_EXT_EXPERIMENTAL_START 0x08000000
> -#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
> -
> -#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
> -#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
> -#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
> -
> -enum sbi_ext_id {
> - SBI_EXT_BASE = 0x10,
> - SBI_EXT_STA = 0x535441,
> -};
> -
> -enum sbi_ext_base_fid {
> - SBI_EXT_BASE_PROBE_EXT = 3,
> -};
> -
> -struct sbiret {
> - long error;
> - long value;
> -};
> -
> -struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> - unsigned long arg1, unsigned long arg2,
> - unsigned long arg3, unsigned long arg4,
> - unsigned long arg5);
> -
> -bool guest_sbi_probe_extension(int extid, long *out_val);
> -
> static inline void local_irq_enable(void)
> {
> csr_set(CSR_SSTATUS, SR_SIE);
> diff --git a/tools/testing/selftests/kvm/include/riscv/sbi.h b/tools/testing/selftests/kvm/include/riscv/sbi.h
> new file mode 100644
> index 000000000000..ba04f2dec7b5
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/include/riscv/sbi.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * RISC-V SBI specific definitions
> + *
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#ifndef SELFTEST_KVM_SBI_H
> +#define SELFTEST_KVM_SBI_H
> +
> +/* SBI return error codes */
> +#define SBI_SUCCESS 0
> +#define SBI_ERR_FAILURE -1
> +#define SBI_ERR_NOT_SUPPORTED -2
> +#define SBI_ERR_INVALID_PARAM -3
> +#define SBI_ERR_DENIED -4
> +#define SBI_ERR_INVALID_ADDRESS -5
> +#define SBI_ERR_ALREADY_AVAILABLE -6
> +#define SBI_ERR_ALREADY_STARTED -7
> +#define SBI_ERR_ALREADY_STOPPED -8
> +
> +#define SBI_EXT_EXPERIMENTAL_START 0x08000000
> +#define SBI_EXT_EXPERIMENTAL_END 0x08FFFFFF
> +
> +#define KVM_RISCV_SELFTESTS_SBI_EXT SBI_EXT_EXPERIMENTAL_END
> +#define KVM_RISCV_SELFTESTS_SBI_UCALL 0
> +#define KVM_RISCV_SELFTESTS_SBI_UNEXP 1
> +
> +enum sbi_ext_id {
> + SBI_EXT_BASE = 0x10,
> + SBI_EXT_STA = 0x535441,
> +};
> +
> +enum sbi_ext_base_fid {
> + SBI_EXT_BASE_PROBE_EXT = 3,
> +};
> +
> +struct sbiret {
> + long error;
> + long value;
> +};
> +
> +struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> + unsigned long arg1, unsigned long arg2,
> + unsigned long arg3, unsigned long arg4,
> + unsigned long arg5);
> +
> +bool guest_sbi_probe_extension(int extid, long *out_val);
> +
> +#endif /* SELFTEST_KVM_SBI_H */
> diff --git a/tools/testing/selftests/kvm/include/riscv/ucall.h b/tools/testing/selftests/kvm/include/riscv/ucall.h
> index be46eb32ec27..a695ae36f3e0 100644
> --- a/tools/testing/selftests/kvm/include/riscv/ucall.h
> +++ b/tools/testing/selftests/kvm/include/riscv/ucall.h
> @@ -3,6 +3,7 @@
> #define SELFTEST_KVM_UCALL_H
>
> #include "processor.h"
> +#include "sbi.h"
>
> #define UCALL_EXIT_REASON KVM_EXIT_RISCV_SBI
>
> diff --git a/tools/testing/selftests/kvm/steal_time.c b/tools/testing/selftests/kvm/steal_time.c
> index bae0c5026f82..2ff82c7fd926 100644
> --- a/tools/testing/selftests/kvm/steal_time.c
> +++ b/tools/testing/selftests/kvm/steal_time.c
> @@ -11,7 +11,9 @@
> #include <pthread.h>
> #include <linux/kernel.h>
> #include <asm/kvm.h>
> -#ifndef __riscv
> +#ifdef __riscv
> +#include "sbi.h"
> +#else
> #include <asm/kvm_para.h>
> #endif
>
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> __vcpu_has_ext can check both SBI and ISA extensions when the first
> argument is properly converted to SBI/ISA extension IDs. Introduce
> two helper functions to make life easier for developers so they
> don't have to worry about the conversions.
>
> Replace the current usages as well with new helpers.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> tools/testing/selftests/kvm/include/riscv/processor.h | 10 ++++++++++
> tools/testing/selftests/kvm/riscv/arch_timer.c | 2 +-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/include/riscv/processor.h b/tools/testing/selftests/kvm/include/riscv/processor.h
> index 3b9cb39327ff..5f389166338c 100644
> --- a/tools/testing/selftests/kvm/include/riscv/processor.h
> +++ b/tools/testing/selftests/kvm/include/riscv/processor.h
> @@ -50,6 +50,16 @@ static inline uint64_t __kvm_reg_id(uint64_t type, uint64_t subtype,
>
> bool __vcpu_has_ext(struct kvm_vcpu *vcpu, uint64_t ext);
>
> +static inline bool __vcpu_has_isa_ext(struct kvm_vcpu *vcpu, uint64_t isa_ext)
> +{
> + return __vcpu_has_ext(vcpu, RISCV_ISA_EXT_REG(isa_ext));
> +}
> +
> +static inline bool __vcpu_has_sbi_ext(struct kvm_vcpu *vcpu, uint64_t sbi_ext)
> +{
> + return __vcpu_has_ext(vcpu, RISCV_SBI_EXT_REG(sbi_ext));
> +}
> +
> struct ex_regs {
> unsigned long ra;
> unsigned long sp;
> diff --git a/tools/testing/selftests/kvm/riscv/arch_timer.c b/tools/testing/selftests/kvm/riscv/arch_timer.c
> index 0f9cabd99fd4..735b78569021 100644
> --- a/tools/testing/selftests/kvm/riscv/arch_timer.c
> +++ b/tools/testing/selftests/kvm/riscv/arch_timer.c
> @@ -85,7 +85,7 @@ struct kvm_vm *test_vm_create(void)
> int nr_vcpus = test_args.nr_vcpus;
>
> vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
> - __TEST_REQUIRE(__vcpu_has_ext(vcpus[0], RISCV_ISA_EXT_REG(KVM_RISCV_ISA_EXT_SSTC)),
> + __TEST_REQUIRE(__vcpu_has_isa_ext(vcpus[0], KVM_RISCV_ISA_EXT_SSTC),
> "SSTC not available, skipping test\n");
>
> vm_init_vector_tables(vm);
--
BR,
Muhammad Usama Anjum
On 4/20/24 8:17 PM, Atish Patra wrote:
> The KVM RISC-V allows Sscofpmf extension for Guest/VM so let us
> add this extension to get-reg-list test.
>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
LGTM
Reviewed-by: Muhammad Usama Anjum <[email protected]>
> ---
> tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> index b882b7b9b785..222198dd6d04 100644
> --- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
> +++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
> @@ -43,6 +43,7 @@ bool filter_reg(__u64 reg)
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_V:
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SMSTATEEN:
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSAIA:
> + case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSCOFPMF:
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SSTC:
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVINVAL:
> case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVNAPOT:
> @@ -408,6 +409,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
> KVM_ISA_EXT_ARR(V),
> KVM_ISA_EXT_ARR(SMSTATEEN),
> KVM_ISA_EXT_ARR(SSAIA),
> + KVM_ISA_EXT_ARR(SSCOFPMF),
> KVM_ISA_EXT_ARR(SSTC),
> KVM_ISA_EXT_ARR(SVINVAL),
> KVM_ISA_EXT_ARR(SVNAPOT),
> @@ -931,6 +933,7 @@ KVM_ISA_EXT_SUBLIST_CONFIG(fp_f, FP_F);
> KVM_ISA_EXT_SUBLIST_CONFIG(fp_d, FP_D);
> KVM_ISA_EXT_SIMPLE_CONFIG(h, H);
> KVM_ISA_EXT_SUBLIST_CONFIG(smstateen, SMSTATEEN);
> +KVM_ISA_EXT_SIMPLE_CONFIG(sscofpmf, SSCOFPMF);
> KVM_ISA_EXT_SIMPLE_CONFIG(sstc, SSTC);
> KVM_ISA_EXT_SIMPLE_CONFIG(svinval, SVINVAL);
> KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
> @@ -986,6 +989,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
> &config_fp_d,
> &config_h,
> &config_smstateen,
> + &config_sscofpmf,
> &config_sstc,
> &config_svinval,
> &config_svnapot,
--
BR,
Muhammad Usama Anjum
On Tue, Apr 23, 2024 at 02:03:29PM +0500, Muhammad Usama Anjum wrote:
..
> > + pr_info("Usage: %s [-h] [-d <test name>]\n", name);
> A little weird that we have pr_info named helper to print logs when it is a
> userspace application, not kernel code. I'll check it in the source who
> added it to the KVM tests.
>
It was me, as git-blame will easily show you. Why is it "weird"?
Applications have needs for pr_info-like functions too, and the pr_info
name isn't reserved for the kernel. The only thing weird I see is that
I didn't differentiate pr_debug and pr_info messages. I probably should
have at least given pr_debug output a 'debug:' prefix.
Thanks,
drew
On Mon, Apr 22, 2024 at 9:06 AM Samuel Holland
<[email protected]> wrote:
>
> Hi Atish,
>
> On 2024-04-20 10:17 AM, Atish Patra wrote:
> > SBI v2.0 SBI introduced PMU snapshot feature which adds the following
> > features.
> >
> > 1. Read counter values directly from the shared memory instead of
> > csr read.
> > 2. Start multiple counters with initial values with one SBI call.
> >
> > These functionalities optimizes the number of traps to the higher
> > privilege mode. If the kernel is in VS mode while the hypervisor
> > deploy trap & emulate method, this would minimize all the hpmcounter
> > CSR read traps. If the kernel is running in S-mode, the benefits
> > reduced to CSR latency vs DRAM/cache latency as there is no trap
> > involved while accessing the hpmcounter CSRs.
> >
> > In both modes, it does saves the number of ecalls while starting
> > multiple counter together with an initial values. This is a likely
> > scenario if multiple counters overflow at the same time.
> >
> > Acked-by: Palmer Dabbelt <[email protected]>
> > Reviewed-by: Anup Patel <[email protected]>
> > Reviewed-by: Conor Dooley <[email protected]>
> > Reviewed-by: Andrew Jones <[email protected]>
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > drivers/perf/riscv_pmu.c | 1 +
> > drivers/perf/riscv_pmu_sbi.c | 265 ++++++++++++++++++++++++++++++---
> > include/linux/perf/riscv_pmu.h | 6 +
> > 3 files changed, 255 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> > index b4efdddb2ad9..36d348753d05 100644
> > --- a/drivers/perf/riscv_pmu.c
> > +++ b/drivers/perf/riscv_pmu.c
> > @@ -408,6 +408,7 @@ struct riscv_pmu *riscv_pmu_alloc(void)
> > cpuc->n_events = 0;
> > for (i = 0; i < RISCV_MAX_COUNTERS; i++)
> > cpuc->events[i] = NULL;
> > + cpuc->snapshot_addr = NULL;
> > }
> > pmu->pmu = (struct pmu) {
> > .event_init = riscv_pmu_event_init,
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 4eacd89141a9..2694110f1cff 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -58,6 +58,9 @@ PMU_FORMAT_ATTR(event, "config:0-47");
> > PMU_FORMAT_ATTR(firmware, "config:63");
> >
> > static bool sbi_v2_available;
> > +static DEFINE_STATIC_KEY_FALSE(sbi_pmu_snapshot_available);
> > +#define sbi_pmu_snapshot_available() \
> > + static_branch_unlikely(&sbi_pmu_snapshot_available)
> >
> > static struct attribute *riscv_arch_formats_attr[] = {
> > &format_attr_event.attr,
> > @@ -508,14 +511,105 @@ static int pmu_sbi_event_map(struct perf_event *event, u64 *econfig)
> > return ret;
> > }
> >
> > +static void pmu_sbi_snapshot_free(struct riscv_pmu *pmu)
> > +{
> > + int cpu;
> > +
> > + for_each_possible_cpu(cpu) {
> > + struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > +
> > + if (!cpu_hw_evt->snapshot_addr)
> > + continue;
> > +
> > + free_page((unsigned long)cpu_hw_evt->snapshot_addr);
> > + cpu_hw_evt->snapshot_addr = NULL;
> > + cpu_hw_evt->snapshot_addr_phys = 0;
> > + }
> > +}
> > +
> > +static int pmu_sbi_snapshot_alloc(struct riscv_pmu *pmu)
> > +{
> > + int cpu;
> > + struct page *snapshot_page;
> > +
> > + for_each_possible_cpu(cpu) {
> > + struct cpu_hw_events *cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > +
> > + snapshot_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> > + if (!snapshot_page) {
> > + pmu_sbi_snapshot_free(pmu);
> > + return -ENOMEM;
> > + }
> > + cpu_hw_evt->snapshot_addr = page_to_virt(snapshot_page);
> > + cpu_hw_evt->snapshot_addr_phys = page_to_phys(snapshot_page);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int pmu_sbi_snapshot_disable(void)
> > +{
> > + struct sbiret ret;
> > +
> > + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM, SBI_SHMEM_DISABLE,
> > + SBI_SHMEM_DISABLE, 0, 0, 0, 0);
> > + if (ret.error) {
> > + pr_warn("failed to disable snapshot shared memory\n");
> > + return sbi_err_map_linux_errno(ret.error);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int pmu_sbi_snapshot_setup(struct riscv_pmu *pmu, int cpu)
> > +{
> > + struct cpu_hw_events *cpu_hw_evt;
> > + struct sbiret ret = {0};
> > +
> > + cpu_hw_evt = per_cpu_ptr(pmu->hw_events, cpu);
> > + if (!cpu_hw_evt->snapshot_addr_phys)
> > + return -EINVAL;
> > +
> > + if (cpu_hw_evt->snapshot_set_done)
> > + return 0;
> > +
> > + if (IS_ENABLED(CONFIG_32BIT))
> > + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> > + cpu_hw_evt->snapshot_addr_phys,
> > + (u64)(cpu_hw_evt->snapshot_addr_phys) >> 32, 0, 0, 0, 0);
> > + else
> > + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_SNAPSHOT_SET_SHMEM,
> > + cpu_hw_evt->snapshot_addr_phys, 0, 0, 0, 0, 0);
> > +
> > + /* Free up the snapshot area memory and fall back to SBI PMU calls without snapshot */
> > + if (ret.error) {
> > + if (ret.error != SBI_ERR_NOT_SUPPORTED)
> > + pr_warn("pmu snapshot setup failed with error %ld\n", ret.error);
> > + return sbi_err_map_linux_errno(ret.error);
> > + }
> > +
> > + cpu_hw_evt->snapshot_set_done = true;
> > +
> > + return 0;
> > +}
> > +
> > static u64 pmu_sbi_ctr_read(struct perf_event *event)
> > {
> > struct hw_perf_event *hwc = &event->hw;
> > int idx = hwc->idx;
> > struct sbiret ret;
> > u64 val = 0;
> > + struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> > + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> > union sbi_pmu_ctr_info info = pmu_ctr_list[idx];
> >
> > + /* Read the value from the shared memory directly only if counter is stopped */
> > + if (sbi_pmu_snapshot_available() & (hwc->state & PERF_HES_STOPPED)) {
>
> nit: logical && between the two conditions
>
Fixed.
> > + val = sdata->ctr_values[idx];
> > + return val;
> > + }
> > +
> > if (pmu_sbi_is_fw_event(event)) {
> > ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_FW_READ,
> > hwc->idx, 0, 0, 0, 0, 0);
> > @@ -565,6 +659,7 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
> > struct hw_perf_event *hwc = &event->hw;
> > unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> >
> > + /* There is no benefit setting SNAPSHOT FLAG for a single counter */
> > #if defined(CONFIG_32BIT)
> > ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, hwc->idx,
> > 1, flag, ival, ival >> 32, 0);
> > @@ -585,16 +680,36 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
> > {
> > struct sbiret ret;
> > struct hw_perf_event *hwc = &event->hw;
> > + struct riscv_pmu *pmu = to_riscv_pmu(event->pmu);
> > + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >
> > if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
> > (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
> > pmu_sbi_reset_scounteren((void *)event);
> >
> > + if (sbi_pmu_snapshot_available())
> > + flag |= SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> > +
> > ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
> > - if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> > - flag != SBI_PMU_STOP_FLAG_RESET)
> > + if (!ret.error && sbi_pmu_snapshot_available()) {
> > + /*
> > + * The counter snapshot is based on the index base specified by hwc->idx.
> > + * The actual counter value is updated in shared memory at index 0 when counter
> > + * mask is 0x01. To ensure accurate counter values, it's necessary to transfer
> > + * the counter value to shared memory. However, if hwc->idx is zero, the counter
> > + * value is already correctly updated in shared memory, requiring no further
> > + * adjustment.
> > + */
> > + if (hwc->idx > 0) {
> > + sdata->ctr_values[hwc->idx] = sdata->ctr_values[0];
> > + sdata->ctr_values[0] = 0;
> > + }
> > + } else if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
> > + flag != SBI_PMU_STOP_FLAG_RESET) {
> > pr_err("Stopping counter idx %d failed with error %d\n",
> > hwc->idx, sbi_err_map_linux_errno(ret.error));
> > + }
> > }
> >
> > static int pmu_sbi_find_num_ctrs(void)
> > @@ -652,12 +767,39 @@ static inline void pmu_sbi_stop_all(struct riscv_pmu *pmu)
> > static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> > {
> > struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> > + unsigned long flag = 0;
> > int i;
> > + struct sbiret ret;
> > + unsigned long temp_ctr_values[64] = {0};
> > + unsigned long ctr_val, temp_ctr_overflow_mask = 0;
> >
> > - for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++)
> > + if (sbi_pmu_snapshot_available())
> > + flag = SBI_PMU_STOP_FLAG_TAKE_SNAPSHOT;
> > +
> > + for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> > /* No need to check the error here as we can't do anything about the error */
> > - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> > - cpu_hw_evt->used_hw_ctrs[i], 0, 0, 0, 0);
> > + ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, i * BITS_PER_LONG,
> > + cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
> > + if (!ret.error && sbi_pmu_snapshot_available()) {
> > + /* Save the counter values to avoid clobbering */
> > + temp_ctr_values[i * BITS_PER_LONG + i] = sdata->ctr_values[i];
> > + /* Save the overflow mask to avoid clobbering */
> > + if (BIT(i) & sdata->ctr_overflow_mask)
> > + temp_ctr_overflow_mask |= BIT(i + i * BITS_PER_LONG);
>
> `i` is iterating over longs here, not individual counters. Using `i` twice in
> the same expression like this makes no sense.
>
> And again, since you made temp_ctr_overflow_mask an `unsigned long`, this BIT
> expression overflows during the second iteration of the loop on riscv32.
>
Arrgh. I don't know what I was thinking. That was the result of me
sending the patch in haste.
> > + }
> > + }
> > +
> > + /* Restore the counter values to the shared memory */
> > + if (sbi_pmu_snapshot_available()) {
> > + for (i = 0; i < 64; i++) {
> > + ctr_val = temp_ctr_values[i];
> > + if (ctr_val)
> > + sdata->ctr_values[i] = ctr_val;
>
> It is entirely possible for the correct value of a counter to be zero when some
> other counter overflows. But here if a counter with value zero is clobbered by a
> nonzero value, the correct value will not be restored.
>
Very unlikely but possible. Anyways, it doesn't save much by only
restoring the values
with non-zero. So I will remove it.
> > + if (temp_ctr_overflow_mask)
> > + sdata->ctr_overflow_mask = temp_ctr_overflow_mask;
>
> This doesn't need to be inside the loop. And if temp_ctr_overflow_mask is the
Sure. I will move it outside.
> 64-bit combination of the two 32-bit values from the SBI calls (which seems to
> be the intention), then we already know there is at least one set bit, since
> this is called from the overflow IRQ handler.
The check is there as a sanity check incase there is a spurious interrupt.
>
> > + }
> > + }
> > }
> >
> > /*
> > @@ -666,11 +808,10 @@ static inline void pmu_sbi_stop_hw_ctrs(struct riscv_pmu *pmu)
> > * while the overflowed counters need to be started with updated initialization
> > * value.
> > */
> > -static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > - unsigned long ctr_ovf_mask)
> > +static inline void pmu_sbi_start_ovf_ctrs_sbi(struct cpu_hw_events *cpu_hw_evt,
> > + u64 ctr_ovf_mask)
> > {
> > int idx = 0, i;
> > - struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > struct perf_event *event;
> > unsigned long flag = SBI_PMU_START_FLAG_SET_INIT_VALUE;
> > unsigned long ctr_start_mask = 0;
> > @@ -706,6 +847,48 @@ static inline void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > }
> > }
> >
> > +static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_evt,
> > + u64 ctr_ovf_mask)
> > +{
> > + int idx = 0;
> > + struct perf_event *event;
> > + unsigned long flag = SBI_PMU_START_FLAG_INIT_SNAPSHOT;
> > + u64 max_period, init_val = 0;
> > + struct hw_perf_event *hwc;
> > + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> > +
> > + for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> > + if (ctr_ovf_mask & BIT(idx)) {
> > + event = cpu_hw_evt->events[idx];
> > + hwc = &event->hw;
> > + max_period = riscv_pmu_ctr_get_width_mask(event);
> > + init_val = local64_read(&hwc->prev_count) & max_period;
> > + sdata->ctr_values[idx] = init_val;
> > + }
> > + /*
> > + * We do not need to update the non-overflow counters the previous
> > + * value should have been there already.
> > + */
> > + }
> > +
> > + for (idx = 0; idx < BITS_TO_LONGS(RISCV_MAX_COUNTERS); idx++) {
> > + /* Start all the counters in a single shot */
> > + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
> > + cpu_hw_evt->used_hw_ctrs[idx], flag, 0, 0, 0);
>
> This doesn't work either... for riscv32 you have to copy the initial values for
> counters 32-63 to ctr_values indexes 0-31, since that is where the SBI
> implementation expects to find them. And since this clobbers counters 0-31,
> those need to be preserved somewhere. That's why I recommended a shadow copy of
> ctr_values -- globally, not local to pmu_sbi_stop_hw_ctrs().
>
Yeah. Again, it's the result of haste to send the patch on Friday
evening before the trip. My bad!
> Also, you need SBI_PMU_START_FLAG_INIT_SNAPSHOT here to actually load the values
> from shared memory, so this is broken on riscv64 as well.
The "flag" declared above is initialized with SBI_PMU_START_FLAG_INIT_SNAPSHOT.
>
> > + }
> > +}
> > +
> > +static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
> > + u64 ctr_ovf_mask)
> > +{
> > + struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
> > +
> > + if (sbi_pmu_snapshot_available())
> > + pmu_sbi_start_ovf_ctrs_snapshot(cpu_hw_evt, ctr_ovf_mask);
> > + else
> > + pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
> > +}
> > +
> > static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > {
> > struct perf_sample_data data;
> > @@ -716,9 +899,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > struct riscv_pmu *pmu;
> > struct perf_event *event;
> > unsigned long overflow;
> > - unsigned long overflowed_ctrs = 0;
> > + u64 overflowed_ctrs = 0;
> > struct cpu_hw_events *cpu_hw_evt = dev;
> > u64 start_clock = sched_clock();
> > + struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
> >
> > if (WARN_ON_ONCE(!cpu_hw_evt))
> > return IRQ_NONE;
> > @@ -740,7 +924,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > pmu_sbi_stop_hw_ctrs(pmu);
> >
> > /* Overflow status register should only be read after counter are stopped */
> > - ALT_SBI_PMU_OVERFLOW(overflow);
> > + if (sbi_pmu_snapshot_available())
> > + overflow = sdata->ctr_overflow_mask;
>
> Overflow is `unsigned long`, so this assignment truncates ctr_overflow_mask on
> riscv32.
>
Will fix it.
> Was this series tested on riscv32?
>
This particular code path is very unlikely in normal testing scenarios
as it requires more than 32 counters for rv32 configuration which is
not the common case. Moreover, the physical counter indices are less
than 32 to match the hardware CSR number. You can only configure more
than 32 counters with firmware counters but kvm currently doesn't
support overflow.
Thus, we need to construct an artificial case to create logical
counters with indexes more than 32 for hardware counters with CSR
remapping or support firmware counter overflow.
Let me know if you have any better ideas for quicker testing.
Regarding the whole series, I had tested some earlier versions but
perf tool seems to be broken for riscv32 now. I will fix it and give
it a spin. If you have a working setup of rv32 perf testing, let me
know as well.
Regards,
Atish
> Regards,
> Samuel
>
> > + else
> > + ALT_SBI_PMU_OVERFLOW(overflow);
> >
> > /*
> > * Overflow interrupt pending bit should only be cleared after stopping
> > @@ -766,9 +953,14 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > if (!info || info->type != SBI_PMU_CTR_TYPE_HW)
> > continue;
> >
> > - /* compute hardware counter index */
> > - hidx = info->csr - CSR_CYCLE;
> > - /* check if the corresponding bit is set in sscountovf */
> > + if (sbi_pmu_snapshot_available())
> > + /* SBI implementation already updated the logical indicies */
> > + hidx = lidx;
> > + else
> > + /* compute hardware counter index */
> > + hidx = info->csr - CSR_CYCLE;
> > +
> > + /* check if the corresponding bit is set in sscountovf or overflow mask in shmem */
> > if (!(overflow & BIT(hidx)))
> > continue;
> >
> > @@ -778,7 +970,10 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > */
> > overflowed_ctrs |= BIT(lidx);
> > hw_evt = &event->hw;
> > + /* Update the event states here so that we know the state while reading */
> > + hw_evt->state |= PERF_HES_STOPPED;
> > riscv_pmu_event_update(event);
> > + hw_evt->state |= PERF_HES_UPTODATE;
> > perf_sample_data_init(&data, 0, hw_evt->last_period);
> > if (riscv_pmu_event_set_period(event)) {
> > /*
> > @@ -791,6 +986,8 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
> > */
> > perf_event_overflow(event, &data, regs);
> > }
> > + /* Reset the state as we are going to start the counter after the loop */
> > + hw_evt->state = 0;
> > }
> >
> > pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
> > @@ -822,6 +1019,9 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
> > enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE);
> > }
> >
> > + if (sbi_pmu_snapshot_available())
> > + return pmu_sbi_snapshot_setup(pmu, cpu);
> > +
> > return 0;
> > }
> >
> > @@ -834,6 +1034,9 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
> > /* Disable all counters access for user mode now */
> > csr_write(CSR_SCOUNTEREN, 0x0);
> >
> > + if (sbi_pmu_snapshot_available())
> > + return pmu_sbi_snapshot_disable();
> > +
> > return 0;
> > }
> >
> > @@ -942,6 +1145,12 @@ static inline void riscv_pm_pmu_unregister(struct riscv_pmu *pmu) { }
> >
> > static void riscv_pmu_destroy(struct riscv_pmu *pmu)
> > {
> > + if (sbi_v2_available) {
> > + if (sbi_pmu_snapshot_available()) {
> > + pmu_sbi_snapshot_disable();
> > + pmu_sbi_snapshot_free(pmu);
> > + }
> > + }
> > riscv_pm_pmu_unregister(pmu);
> > cpuhp_state_remove_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> > }
> > @@ -1109,10 +1318,6 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> > pmu->event_unmapped = pmu_sbi_event_unmapped;
> > pmu->csr_index = pmu_sbi_csr_index;
> >
> > - ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> > - if (ret)
> > - return ret;
> > -
> > ret = riscv_pm_pmu_register(pmu);
> > if (ret)
> > goto out_unregister;
> > @@ -1121,8 +1326,34 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
> > if (ret)
> > goto out_unregister;
> >
> > + /* SBI PMU Snapsphot is only available in SBI v2.0 */
> > + if (sbi_v2_available) {
> > + ret = pmu_sbi_snapshot_alloc(pmu);
> > + if (ret)
> > + goto out_unregister;
> > +
> > + ret = pmu_sbi_snapshot_setup(pmu, smp_processor_id());
> > + if (ret) {
> > + /* Snapshot is an optional feature. Continue if not available */
> > + pmu_sbi_snapshot_free(pmu);
> > + } else {
> > + pr_info("SBI PMU snapshot detected\n");
> > + /*
> > + * We enable it once here for the boot cpu. If snapshot shmem setup
> > + * fails during cpu hotplug process, it will fail to start the cpu
> > + * as we can not handle hetergenous PMUs with different snapshot
> > + * capability.
> > + */
> > + static_branch_enable(&sbi_pmu_snapshot_available);
> > + }
> > + }
> > +
> > register_sysctl("kernel", sbi_pmu_sysctl_table);
> >
> > + ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
> > + if (ret)
> > + goto out_unregister;
> > +
> > return 0;
> >
> > out_unregister:
> > diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> > index 43282e22ebe1..c3fa90970042 100644
> > --- a/include/linux/perf/riscv_pmu.h
> > +++ b/include/linux/perf/riscv_pmu.h
> > @@ -39,6 +39,12 @@ struct cpu_hw_events {
> > DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS);
> > /* currently enabled firmware counters */
> > DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS);
> > + /* The virtual address of the shared memory where counter snapshot will be taken */
> > + void *snapshot_addr;
> > + /* The physical address of the shared memory where counter snapshot will be taken */
> > + phys_addr_t snapshot_addr_phys;
> > + /* Boolean flag to indicate setup is already done */
> > + bool snapshot_set_done;
> > };
> >
> > struct riscv_pmu {
>
Hi Palmer,
On Mon, Apr 22, 2024 at 3:29 PM Anup Patel <[email protected]> wrote:
>
> On Sat, Apr 20, 2024 at 5:17 AM Atish Patra <[email protected]> wrote:
> >
> > This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot
> > and fw_read_hi() functions.
> >
> > SBI v2.0 introduced PMU snapshot feature which allows the SBI implementation
> > to provide counter information (i.e. values/overflow status) via a shared
> > memory between the SBI implementation and supervisor OS. This allows to minimize
> > the number of traps in when perf being used inside a kvm guest as it relies on
> > SBI PMU + trap/emulation of the counters.
> >
> > The current set of ratified RISC-V specification also doesn't allow scountovf
> > to be trap/emulated by the hypervisor. The SBI PMU snapshot bridges the gap
> > in ISA as well and enables perf sampling in the guest. However, LCOFI in the
> > guest only works via IRQ filtering in AIA specification. That's why, AIA
> > has to be enabled in the hardware (at least the Ssaia extension) in order to
> > use the sampling support in the perf.
> >
> > Here are the patch wise implementation details.
> >
> > PATCH 1,4,7,8,9,10,11,15 : Generic cleanups/improvements.
> > PATCH 2,3,14 : FW_READ_HI function implementation
> > PATCH 5-6: Add PMU snapshot feature in sbi pmu driver
> > PATCH 12-13: KVM implementation for snapshot and sampling in kvm guests
> > PATCH 16-17: Generic improvements for kvm selftests
> > PATCH 18-22: KVM selftests for SBI PMU extension
> >
> > The series is based on v6.9-rc4 and is available at:
> >
> > https://github.com/atishp04/linux/tree/kvm_pmu_snapshot_v8
> >
> > The kvmtool patch is also available at:
> > https://github.com/atishp04/kvmtool/tree/sscofpmf
> >
> > It also requires Ssaia ISA extension to be present in the hardware in order to
> > get perf sampling support in the guest. In Qemu virt machine, it can be done
> > by the following config.
> >
> > ```
> > -cpu rv64,sscofpmf=true,x-ssaia=true
> > ```
> >
> > There is no other dependencies on AIA apart from that. Thus, Ssaia must be disabled
> > for the guest if AIA patches are not available. Here is the example command.
> >
> > ```
> > ./lkvm-static run -m 256 -c2 --console serial -p "console=ttyS0 earlycon" --disable-ssaia -k ./Image --debug
> > ```
> >
> > The series has been tested only in Qemu.
> > Here is the snippet of the perf running inside a kvm guest.
> >
> > ===================================================
> > $ perf record -e cycles -e instructions perf bench sched messaging -g 5
> > ...
> > $ Running 'sched/messaging' benchmark:
> > ...
> > [ 45.928723] perf_duration_warn: 2 callbacks suppressed
> > [ 45.929000] perf: interrupt took too long (484426 > 483186), lowering kernel.perf_event_max_sample_rate to 250
> > $ 20 sender and receiver processes per group
> > $ 5 groups == 200 processes run
> >
> > Total time: 14.220 [sec]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.117 MB perf.data (1942 samples) ]
> > $ perf report --stdio
> > $ To display the perf.data header info, please use --header/--header-only optio>
> > $
> > $
> > $ Total Lost Samples: 0
> > $
> > $ Samples: 943 of event 'cycles'
> > $ Event count (approx.): 5128976844
> > $
> > $ Overhead Command Shared Object Symbol >
> > $ ........ ............... ........................... ....................>
> > $
> > 7.59% sched-messaging [kernel.kallsyms] [k] memcpy
> > 5.48% sched-messaging [kernel.kallsyms] [k] percpu_counter_ad>
> > 5.24% sched-messaging [kernel.kallsyms] [k] __sbi_rfence_v02_>
> > 4.00% sched-messaging [kernel.kallsyms] [k] _raw_spin_unlock_>
> > 3.79% sched-messaging [kernel.kallsyms] [k] set_pte_range
> > 3.72% sched-messaging [kernel.kallsyms] [k] next_uptodate_fol>
> > 3.46% sched-messaging [kernel.kallsyms] [k] filemap_map_pages
> > 3.31% sched-messaging [kernel.kallsyms] [k] handle_mm_fault
> > 3.20% sched-messaging [kernel.kallsyms] [k] finish_task_switc>
> > 3.16% sched-messaging [kernel.kallsyms] [k] clear_page
> > 3.03% sched-messaging [kernel.kallsyms] [k] mtree_range_walk
> > 2.42% sched-messaging [kernel.kallsyms] [k] flush_icache_pte
> >
> > ===================================================
> >
> > [1] https://github.com/riscv-non-isa/riscv-sbi-doc
> >
> > Changes from v7->v8:
> > 1. Updated event states so that shared memory is updated only during stop
> > operations.
> > 2. Avoid clobbering lower XLEN counter/overflow values in shared memory
> > by maintaining a temporary copy for RV32.
> > 3. Improved overflow handling in snapshot case by supporting all 64 values.
> > 4. Minor cleanups based on suggestions on v7.
> >
> > Changes from v6->v7:
> > 1. Used SBI_SHMEM_DISABLE in the driver.
> > 2. Added RB Tags.
> > 3. Improved the sbi_pmu_test commandline to allow disabling multiple
> > tests.
> >
> > Changes from v5->v6:
> > 1. Added a patch for command line option for the sbi pmu tests.
> > 2. Removed redundant prints and restructure the code little bit.
> > 3. Added a patch for computing the sbi minor version correctly.
> > 4. Addressed all other comments on v5.
> >
> > Changes from v4->v5:
> > 1. Moved sbi related definitions to its own header file from processor.h
> > 2. Added few helper functions for selftests.
> > 3. Improved firmware counter read and RV32 start/stop functions.
> > 4. Converted all the shifting operations to use BIT macro
> > 5. Addressed all other comments on v4.
> >
> > Changes from v3->v4:
> > 1. Added selftests.
> > 2. Fixed an issue to clear the interrupt pending bits.
> > 3. Fixed the counter index in snapshot memory start function.
> >
> > Changes from v2->v3:
> > 1. Fixed a patchwork warning on patch6.
> > 2. Fixed a comment formatting & nit fix in PATCH 3 & 5.
> > 3. Moved the hvien update and sscofpmf enabling to PATCH 9 from PATCH 8.
> >
> > Changes from v1->v2:
> > 1. Fixed warning/errors from patchwork CI.
> > 2. Rebased on top of kvm-next.
> > 3. Added Acked-by tags.
> >
> > Changes from RFC->v1:
> > 1. Addressed all the comments on RFC series.
> > 2. Removed PATCH2 and merged into later patches.
> > 3. Added 2 more patches for minor fixes.
> > 4. Fixed KVM boot issue without Ssaia and made sscofpmf in guest dependent on
> > Ssaia in the host.
> >
> > Atish Patra (24):
> > RISC-V: Fix the typo in Scountovf CSR name
> > RISC-V: Add FIRMWARE_READ_HI definition
> > drivers/perf: riscv: Read upper bits of a firmware counter
> > drivers/perf: riscv: Use BIT macro for shifting operations
> > RISC-V: Add SBI PMU snapshot definitions
> > RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic name
> > RISC-V: Use the minor version mask while computing sbi version
> > drivers/perf: riscv: Fix counter mask iteration for RV32
> > drivers/perf: riscv: Implement SBI PMU snapshot function
> > RISC-V: KVM: Fix the initial sample period value
> > RISC-V: KVM: No need to update the counter value during reset
> > RISC-V: KVM: No need to exit to the user space if perf event failed
> > RISC-V: KVM: Implement SBI PMU Snapshot feature
> > RISC-V: KVM: Add perf sampling support for guests
> > RISC-V: KVM: Support 64 bit firmware counters on RV32
> > RISC-V: KVM: Improve firmware counter read function
> > KVM: riscv: selftests: Move sbi definitions to its own header file
> > KVM: riscv: selftests: Add helper functions for extension checks
> > KVM: riscv: selftests: Add Sscofpmf to get-reg-list test
> > KVM: riscv: selftests: Add SBI PMU extension definitions
> > KVM: riscv: selftests: Add SBI PMU selftest
> > KVM: riscv: selftests: Add a test for PMU snapshot functionality
> > KVM: riscv: selftests: Add a test for counter overflow
> > KVM: riscv: selftests: Add commandline option for SBI PMU test
>
> Queued this series for Linux-6.10
>
> If new issues are discovered then send patches based on
> the KVM riscv queue.
Please use the kvm-riscv-for-palmer-6.10 tag in the KVM RISC-V
repo (https://github.com/kvm-riscv/linux.git) as a shared tag for
the upcoming Linux-6.10 merge window.
This tag is based on Linux-6.9-rc3.
Regards,
Anup
>
> Thanks,
> Anup
>
> >
> > arch/riscv/include/asm/csr.h | 5 +-
> > arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +-
> > arch/riscv/include/asm/sbi.h | 38 +-
> > arch/riscv/include/uapi/asm/kvm.h | 1 +
> > arch/riscv/kernel/paravirt.c | 6 +-
> > arch/riscv/kvm/aia.c | 5 +
> > arch/riscv/kvm/vcpu.c | 15 +-
> > arch/riscv/kvm/vcpu_onereg.c | 6 +
> > arch/riscv/kvm/vcpu_pmu.c | 260 ++++++-
> > arch/riscv/kvm/vcpu_sbi_pmu.c | 17 +-
> > arch/riscv/kvm/vcpu_sbi_sta.c | 4 +-
> > drivers/perf/riscv_pmu.c | 1 +
> > drivers/perf/riscv_pmu_sbi.c | 309 +++++++-
> > include/linux/perf/riscv_pmu.h | 6 +
> > tools/testing/selftests/kvm/Makefile | 1 +
> > .../selftests/kvm/include/riscv/processor.h | 49 +-
> > .../testing/selftests/kvm/include/riscv/sbi.h | 141 ++++
> > .../selftests/kvm/include/riscv/ucall.h | 1 +
> > .../selftests/kvm/lib/riscv/processor.c | 12 +
> > .../testing/selftests/kvm/riscv/arch_timer.c | 2 +-
> > .../selftests/kvm/riscv/get-reg-list.c | 4 +
> > .../selftests/kvm/riscv/sbi_pmu_test.c | 681 ++++++++++++++++++
> > tools/testing/selftests/kvm/steal_time.c | 4 +-
> > 23 files changed, 1467 insertions(+), 117 deletions(-)
> > create mode 100644 tools/testing/selftests/kvm/include/riscv/sbi.h
> > create mode 100644 tools/testing/selftests/kvm/riscv/sbi_pmu_test.c
> >
> > --
> > 2.34.1
> >