This series extends perf support for KVM. The KVM implementation relies
on the SBI PMU extension and on trap-and-emulate of the hpmcounter CSRs.
It exposes virtual counters to the guest and internally manages them
using kernel perf counters.

This series doesn't support counter overflow, as the Sscofpmf extension
doesn't yet allow trap-and-emulate of the scountovf CSR. The changes
required to allow that are still under discussion. Supporting the
overflow interrupt also requires AIA interrupt filtering support.

1. PATCH 1-5 are generic KVM/PMU driver improvements.
2. PATCH 10 disables hpmcounter access for now. It will be re-enabled to
   maintain the ABI requirement once the ONE_REG interface is settled.

perf stat works in KVM guests with this series. Here is an example of
running perf stat in a guest running under KVM.
===========================================================================
/ # /host/apps/perf stat -e instructions -e cycles -e r8000000000000005 \
> -e r8000000000000006 -e r8000000000000007 -e r8000000000000008 \
> -e r800000000000000a perf bench sched messaging -g 10 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 7.769 [sec]

 Performance counter stats for 'perf bench sched messaging -g 10 -l 10':

       73556259604      cycles
       73387266056      instructions              #    1.00  insn per cycle
                 0      dTLB-store-misses
                 0      iTLB-load-misses
                 0      r8000000000000005
              2595      r8000000000000006
              2272      r8000000000000007
                10      r8000000000000008
                 0      r800000000000000a

      12.173720400 seconds time elapsed

       1.002716000 seconds user
      21.931047000 seconds sys
Note: The SBI_PMU_FW_SET_TIMER event (r8000000000000005) counts zero
because the KVM guest now supports Sstc and programs its timer directly
instead of making the SBI set_timer call.
This series can be found here as well.
https://github.com/atishp04/linux/tree/kvm_perf_v3
TODO:
1. Add Sscofpmf support.
2. Add a ONE_REG interface for the following operations:
   1. Enable/Disable PMU (should it be at the VM level rather than per vCPU?)
   2. Number of hpmcounters and width of the counters
   3. Init PMU
   4. Allow guest user to access cycle & instret without trapping
Changes v2->v3:
1. Changed the exported functions to GPL-only exports.
2. Addressed all the nit comments on v2.
3. Split non-KVM-related changes into separate patches.
4. Reorganized PATCH 10 and 11 based on Drew's suggestions.
Changes from v1->v2:
1. Addressed comments from Andrew.
2. Removed kvpmu sanity check.
3. Added a kvm pmu init flag and the sanity check to probe function.
4. Improved the Linux vs. SBI error code handling.
Atish Patra (14):
perf: RISC-V: Define helper functions to expose hpm counter width and
count
perf: RISC-V: Improve privilege mode filtering for perf
RISC-V: Improve SBI PMU extension related definitions
RISC-V: KVM: Define a probe function for SBI extension data structures
RISC-V: KVM: Return correct code for hsm stop function
RISC-V: KVM: Modify SBI extension handler to return SBI error code
RISC-V: KVM: Add skeleton support for perf
RISC-V: KVM: Add SBI PMU extension support
RISC-V: KVM: Make PMU functionality depend on Sscofpmf
RISC-V: KVM: Disable all hpmcounter access for VS/VU mode
RISC-V: KVM: Implement trap & emulate for hpmcounters
RISC-V: KVM: Implement perf support without sampling
RISC-V: KVM: Support firmware events
RISC-V: KVM: Increment firmware pmu events
arch/riscv/include/asm/kvm_host.h | 3 +
arch/riscv/include/asm/kvm_vcpu_pmu.h | 108 +++++
arch/riscv/include/asm/kvm_vcpu_sbi.h | 13 +-
arch/riscv/include/asm/sbi.h | 5 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/main.c | 3 +-
arch/riscv/kvm/tlb.c | 4 +
arch/riscv/kvm/vcpu.c | 5 +
arch/riscv/kvm/vcpu_insn.c | 4 +-
arch/riscv/kvm/vcpu_pmu.c | 622 ++++++++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi.c | 57 ++-
arch/riscv/kvm/vcpu_sbi_base.c | 45 +-
arch/riscv/kvm/vcpu_sbi_hsm.c | 29 +-
arch/riscv/kvm/vcpu_sbi_pmu.c | 86 ++++
arch/riscv/kvm/vcpu_sbi_replace.c | 54 ++-
arch/riscv/kvm/vcpu_sbi_v01.c | 11 +-
drivers/perf/riscv_pmu_sbi.c | 64 ++-
include/linux/perf/riscv_pmu.h | 5 +
18 files changed, 1013 insertions(+), 106 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
create mode 100644 arch/riscv/kvm/vcpu_pmu.c
create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c
--
2.25.1
The KVM module needs to know how many hardware counters the platform
supports and how wide they are. Otherwise, it cannot expose the optimal
number of virtual counters to the guest. For simplicity, the virtual
hardware counters also need to have the same width as the logical
hardware counters. However, there shouldn't be a fixed mapping between
virtual hardware counters and logical hardware counters. As heterogeneous
harts and counters with different widths are not supported yet, the
implementation relies on the counter width of the first available
programmable counter.
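For illustration, the counting logic described above can be sketched in user-space C. The types, CSR constants and mock counter layout are simplified, hypothetical stand-ins for `union sbi_pmu_ctr_info`, `pmu_ctr_list` and `cmask`; this is not the driver code itself. One deliberate deviation: the sketch only takes the width from a counter whose type is hardware, so a firmware counter that happens to precede the first programmable counter in the mask cannot supply the width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel definitions. */
#define CSR_CYCLE    0xc00
#define CSR_INSTRET  0xc02
#define MAX_COUNTERS 64

enum ctr_type { CTR_TYPE_HW, CTR_TYPE_FW };

struct ctr_info {
	unsigned int csr;   /* CSR number backing the counter */
	unsigned int width; /* width field as reported by the SBI implementation */
	enum ctr_type type; /* hardware or firmware counter */
};

/*
 * Count the hardware counters present in cmask and report the width of
 * the first programmable hardware counter (anything other than cycle or
 * instret). Returns -1 if no counters were discovered.
 */
static int get_hpm_info(const struct ctr_info *list, uint64_t cmask,
			uint32_t *hw_ctr_width, uint32_t *num_hw_ctr)
{
	uint32_t hpm_width = 0, hpm_count = 0;

	assert(list != NULL && hw_ctr_width != NULL && num_hw_ctr != NULL);
	if (!cmask)
		return -1;

	for (int i = 0; i < MAX_COUNTERS; i++) {
		const struct ctr_info *info = &list[i];

		if (!(cmask & (UINT64_C(1) << i)))
			continue;
		if (info->type != CTR_TYPE_HW)
			continue; /* firmware counters have no hardware width */
		if (!hpm_width && info->csr != CSR_CYCLE && info->csr != CSR_INSTRET)
			hpm_width = info->width;
		hpm_count++;
	}

	*hw_ctr_width = hpm_width;
	*num_hw_ctr = hpm_count;
	return 0;
}
```

Note that the patch's `if (!info)` check can never be true, since `&pmu_ctr_list[i]` is never NULL, so the sketch omits it.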
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 37 ++++++++++++++++++++++++++++++++--
include/linux/perf/riscv_pmu.h | 3 +++
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index f6507ef..6b53adc 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -44,7 +44,7 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
};
/*
- * RISC-V doesn't have hetergenous harts yet. This need to be part of
+ * RISC-V doesn't have heterogeneous harts yet. This needs to be part of
* per_cpu in case of harts with different pmu counters
*/
static union sbi_pmu_ctr_info *pmu_ctr_list;
@@ -52,6 +52,9 @@ static bool riscv_pmu_use_irq;
static unsigned int riscv_pmu_irq_num;
static unsigned int riscv_pmu_irq;
+/* Cache the available counters in a bitmask */
+static unsigned long cmask;
+
struct sbi_pmu_event_data {
union {
union {
@@ -267,6 +270,37 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
return (info->type == SBI_PMU_CTR_TYPE_FW) ? true : false;
}
+/*
+ * Returns the counter width of a programmable counter and number of hardware
+ * counters. As we don't support heterogeneous CPUs yet, it is okay to just
+ * return the counter width of the first programmable counter.
+ */
+int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
+{
+ int i;
+ union sbi_pmu_ctr_info *info;
+ u32 hpm_width = 0, hpm_count = 0;
+
+ if (!cmask)
+ return -EINVAL;
+
+ for_each_set_bit(i, &cmask, RISCV_MAX_COUNTERS) {
+ info = &pmu_ctr_list[i];
+ if (!info)
+ continue;
+ if (!hpm_width && info->csr != CSR_CYCLE && info->csr != CSR_INSTRET)
+ hpm_width = info->width;
+ if (info->type == SBI_PMU_CTR_TYPE_HW)
+ hpm_count++;
+ }
+
+ *hw_ctr_width = hpm_width;
+ *num_hw_ctr = hpm_count;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);
+
static int pmu_sbi_ctr_get_idx(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -812,7 +846,6 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
static int pmu_sbi_device_probe(struct platform_device *pdev)
{
struct riscv_pmu *pmu = NULL;
- unsigned long cmask = 0;
int ret = -ENODEV;
int num_counters;
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index e17e86a..a1c3f77 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -73,6 +73,9 @@ void riscv_pmu_legacy_skip_init(void);
static inline void riscv_pmu_legacy_skip_init(void) {};
#endif
struct riscv_pmu *riscv_pmu_alloc(void);
+#ifdef CONFIG_RISCV_PMU_SBI
+int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
+#endif
#endif /* CONFIG_RISCV_PMU */
--
2.25.1
Currently, the host driver has no way to identify whether the requested
perf event is from KVM or from bare metal. As KVM runs in HS mode, there
is no separate hypervisor privilege mode to distinguish between guest
and host attributes.
Improve the privilege mode filtering by using the event-specific
config1 field.
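A user-space sketch of the same mapping, assuming a trimmed-down, hypothetical `struct event_attr` in place of `struct perf_event_attr`. The flag bit positions are the ones we understand the SBI PMU specification to assign to the counter config-matching call (VUINH = bit 3, VSINH = bit 4, UINH = bit 5, SINH = bit 6); treat them as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inhibit flags for the SBI PMU config-matching call (assumed bit positions). */
#define SBI_PMU_CFG_FLAG_SET_VUINH (1UL << 3)
#define SBI_PMU_CFG_FLAG_SET_VSINH (1UL << 4)
#define SBI_PMU_CFG_FLAG_SET_UINH  (1UL << 5)
#define SBI_PMU_CFG_FLAG_SET_SINH  (1UL << 6)

#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1

/* Hypothetical subset of struct perf_event_attr used by the filter. */
struct event_attr {
	unsigned long config1;
	bool exclude_kernel, exclude_user, exclude_hv;
	bool exclude_host, exclude_guest;
};

static unsigned long get_filter_flags(const struct event_attr *attr)
{
	unsigned long cflags = 0;
	bool guest;

	assert(attr != NULL);
	guest = attr->config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS;

	/* For guest events, "kernel" means VS-mode and "user" means VU-mode. */
	if (attr->exclude_kernel)
		cflags |= guest ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
	if (attr->exclude_user)
		cflags |= guest ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
	/* The hypervisor (KVM) runs in HS-mode, which the SINH bit covers. */
	if (guest && attr->exclude_hv)
		cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
	if (attr->exclude_host)
		cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
	if (attr->exclude_guest)
		cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;

	return cflags;
}
```

With the guest-events bit set in config1, `exclude_kernel` inhibits VS-mode rather than HS-mode counting, and it is `exclude_hv` that inhibits HS-mode, where KVM itself runs.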
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 27 ++++++++++++++++++++++-----
include/linux/perf/riscv_pmu.h | 2 ++
2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 6b53adc..e862b13 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -301,6 +301,27 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
}
EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);
+static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
+{
+ unsigned long cflags = 0;
+ bool guest_events = false;
+
+ if (event->attr.config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS)
+ guest_events = true;
+ if (event->attr.exclude_kernel)
+ cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_user)
+ cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
+ if (guest_events && event->attr.exclude_hv)
+ cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_host)
+ cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_guest)
+ cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;
+
+ return cflags;
+}
+
static int pmu_sbi_ctr_get_idx(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -311,11 +332,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
uint64_t cbase = 0;
unsigned long cflags = 0;
- if (event->attr.exclude_kernel)
- cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
- if (event->attr.exclude_user)
- cflags |= SBI_PMU_CFG_FLAG_SET_UINH;
-
+ cflags = pmu_sbi_get_filter_flags(event);
/* retrieve the available counter index */
#if defined(CONFIG_32BIT)
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index a1c3f77..1c42146 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -26,6 +26,8 @@
#define RISCV_PMU_STOP_FLAG_RESET 1
+#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1
+
struct cpu_hw_events {
/* currently enabled events */
int n_events;
--
2.25.1
This patch fixes/improves a few minor things in the SBI PMU extension
definitions.
1. Align the firmware event names (FENCE_I_RECVD -> FENCE_I_RCVD).
2. Add macros for the bit positions of the cache event ID & ops.
The changes were small enough to combine them together instead of
creating one-liner patches.
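The new shift macros make it possible to pack and unpack a cache event code without open-coded constants. A minimal sketch, assuming `SBI_PMU_EVENT_CACHE_ID_CODE_MASK` is 0xFFF8 (its value is not shown in this hunk) so that the three fields occupy bits [15:3], [2:1] and [0] as the masks imply; `struct cache_event` and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SBI_PMU_EVENT_CACHE_ID_CODE_MASK        0xFFF8 /* assumed value */
#define SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK     0x06
#define SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK 0x01
#define SBI_PMU_EVENT_CACHE_ID_SHIFT 3
#define SBI_PMU_EVENT_CACHE_OP_SHIFT 1

/* Hypothetical unpacked form of a cache event code. */
struct cache_event {
	unsigned int cache_id;  /* 13-bit cache identifier */
	unsigned int op_id;     /* 2-bit operation (read, write, prefetch) */
	unsigned int result_id; /* 1-bit result (access or miss) */
};

static uint16_t cache_event_encode(struct cache_event ev)
{
	/* Reject field values that would not fit their bit ranges. */
	assert(ev.cache_id < (1u << 13) && ev.op_id <= 3 && ev.result_id <= 1);
	return (uint16_t)(((ev.cache_id << SBI_PMU_EVENT_CACHE_ID_SHIFT) &
			   SBI_PMU_EVENT_CACHE_ID_CODE_MASK) |
			  ((ev.op_id << SBI_PMU_EVENT_CACHE_OP_SHIFT) &
			   SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) |
			  (ev.result_id & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK));
}

static struct cache_event cache_event_decode(uint16_t code)
{
	struct cache_event ev = {
		.cache_id  = (code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >>
			     SBI_PMU_EVENT_CACHE_ID_SHIFT,
		.op_id     = (code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >>
			     SBI_PMU_EVENT_CACHE_OP_SHIFT,
		.result_id = code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK,
	};
	return ev;
}
```

Using perf's generic cache numbering (DTLB = 3, write = 1, miss = 1), `cache_event_encode()` yields 0x1b for a dTLB store miss.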
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/sbi.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 4ca7fba..f21c026 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -171,7 +171,7 @@ enum sbi_pmu_fw_generic_events_t {
SBI_PMU_FW_IPI_SENT = 6,
SBI_PMU_FW_IPI_RECVD = 7,
SBI_PMU_FW_FENCE_I_SENT = 8,
- SBI_PMU_FW_FENCE_I_RECVD = 9,
+ SBI_PMU_FW_FENCE_I_RCVD = 9,
SBI_PMU_FW_SFENCE_VMA_SENT = 10,
SBI_PMU_FW_SFENCE_VMA_RCVD = 11,
SBI_PMU_FW_SFENCE_VMA_ASID_SENT = 12,
@@ -215,6 +215,9 @@ enum sbi_pmu_ctr_type {
#define SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK 0x06
#define SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK 0x01
+#define SBI_PMU_EVENT_CACHE_ID_SHIFT 3
+#define SBI_PMU_EVENT_CACHE_OP_SHIFT 1
+
#define SBI_PMU_EVENT_IDX_INVALID 0xFFFFFFFF
/* Flags defined for config matching function */
--
2.25.1
Currently, the probe function just checks whether an SBI extension is
registered or not. However, an extension may not want to advertise
itself depending on some other condition.
An additional extension-specific probe function allows extensions to
decide whether they want to be advertised to the caller or not. Any
extension that does not require additional dependency checks can leave
this function unimplemented.
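The resulting SBI_EXT_BASE_PROBE_EXT decision has three outcomes: the extension is not registered, it is registered with no extra checks, or it is registered and decides for itself. A hypothetical, self-contained sketch of that logic; `struct sbi_ext`, `find_ext()`, `probe_ext()` and `pmu_probe_unavailable()` are illustrative and not the KVM code.

```c
#include <stddef.h>

struct vcpu; /* opaque in this sketch */

/* Hypothetical, trimmed-down version of struct kvm_vcpu_sbi_extension. */
struct sbi_ext {
	unsigned long extid_start;
	unsigned long extid_end;
	/* Optional: return 0 to hide the extension, non-zero to advertise it. */
	unsigned long (*probe)(struct vcpu *vcpu);
};

/* Linear lookup of the extension that covers extid, or NULL. */
static const struct sbi_ext *find_ext(const struct sbi_ext *table, size_t n,
				      unsigned long extid)
{
	for (size_t i = 0; i < n; i++)
		if (extid >= table[i].extid_start && extid <= table[i].extid_end)
			return &table[i];
	return NULL;
}

/* Value returned to the guest for an SBI probe_extension call. */
static unsigned long probe_ext(struct vcpu *vcpu, const struct sbi_ext *ext)
{
	if (!ext)
		return 0;                /* not registered */
	if (ext->probe)
		return ext->probe(vcpu); /* extension-specific decision */
	return 1;                        /* registered, no extra checks */
}

/* Hypothetical probe for an extension whose dependency is not met. */
static unsigned long pmu_probe_unavailable(struct vcpu *vcpu)
{
	(void)vcpu;
	return 0;
}
```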
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 +++
arch/riscv/kvm/vcpu_sbi_base.c | 13 +++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
index f79478a..45ba341 100644
--- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
+++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
@@ -29,6 +29,9 @@ struct kvm_vcpu_sbi_extension {
int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long *out_val, struct kvm_cpu_trap *utrap,
bool *exit);
+
+ /* Extension specific probe function */
+ unsigned long (*probe)(struct kvm_vcpu *vcpu);
};
void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
index 5d65c63..846d518 100644
--- a/arch/riscv/kvm/vcpu_sbi_base.c
+++ b/arch/riscv/kvm/vcpu_sbi_base.c
@@ -19,6 +19,7 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
{
int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ const struct kvm_vcpu_sbi_extension *sbi_ext;
switch (cp->a6) {
case SBI_EXT_BASE_GET_SPEC_VERSION:
@@ -43,8 +44,16 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
*/
kvm_riscv_vcpu_sbi_forward(vcpu, run);
*exit = true;
- } else
- *out_val = kvm_vcpu_sbi_find_ext(cp->a0) ? 1 : 0;
+ } else {
+ sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
+ if (sbi_ext) {
+ if (sbi_ext->probe)
+ *out_val = sbi_ext->probe(vcpu);
+ else
+ *out_val = 1;
+ } else
+ *out_val = 0;
+ }
break;
case SBI_EXT_BASE_GET_MVENDORID:
*out_val = vcpu->arch.mvendorid;
--
2.25.1
According to the SBI specification, the stop function can only
return the error code SBI_ERR_FAILED. However, it currently returns
-EINVAL, which is mapped to SBI_ERR_INVALID_PARAM.
Return a Linux error code that maps to SBI_ERR_FAILED, i.e. one that
doesn't map to any other SBI error code. While EACCES is not the best
error code to describe the situation, it is close enough and will be
replaced with SBI error codes directly anyway.
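To see why -EACCES ends up as SBI_ERR_FAILED, here is a user-space copy of the `kvm_linux_err_map_sbi()` switch that a later patch in this series removes. The SBI error values are written out as we understand them from the SBI specification (Linux spells the failure code `SBI_ERR_FAILURE`); the function name is shortened for the sketch.

```c
#include <errno.h>

/* SBI error codes (values as assumed from the SBI specification). */
#define SBI_SUCCESS                 0
#define SBI_ERR_FAILURE           (-1)
#define SBI_ERR_NOT_SUPPORTED     (-2)
#define SBI_ERR_INVALID_PARAM     (-3)
#define SBI_ERR_DENIED            (-4)
#define SBI_ERR_INVALID_ADDRESS   (-5)
#define SBI_ERR_ALREADY_AVAILABLE (-6)

/* Any Linux error without a dedicated case falls through to SBI_ERR_FAILURE. */
static int linux_err_map_sbi(int err)
{
	switch (err) {
	case 0:
		return SBI_SUCCESS;
	case -EPERM:
		return SBI_ERR_DENIED;
	case -EINVAL:
		return SBI_ERR_INVALID_PARAM;
	case -EFAULT:
		return SBI_ERR_INVALID_ADDRESS;
	case -EOPNOTSUPP:
		return SBI_ERR_NOT_SUPPORTED;
	case -EALREADY:
		return SBI_ERR_ALREADY_AVAILABLE;
	default:
		return SBI_ERR_FAILURE; /* e.g. -EACCES from hsm stop */
	}
}
```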
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_sbi_hsm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
index 2e915ca..619ac0f 100644
--- a/arch/riscv/kvm/vcpu_sbi_hsm.c
+++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
@@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.power_off)
- return -EINVAL;
+ return -EACCES;
kvm_riscv_vcpu_power_off(vcpu);
--
2.25.1
Currently, the SBI extension handler is expected to return a Linux error
code. The top SBI layer converts the Linux error code to an SBI-specific
error code that can be returned to the guest invoking the SBI call. This
model works only as long as there is a 1-to-1 mapping between the two
sets of error codes, which may not always be true. This patch decouples
them by allowing the SBI extension implementation to return SBI-specific
error codes as well.
The extension will continue to return a Linux error code to indicate a
problem *with* the extension emulation itself, while the SBI-specific
error code reports the outcome *of* the emulation to the guest.
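A simplified, hypothetical model of the new contract: the handler's return value reports problems *with* the emulation (the vCPU exits to userspace without advancing sepc), while `err_val` and `out_val` carry the outcome *of* the emulation back to the guest in a0/a1. `struct guest_regs`, `demo_handler()` and its function IDs are invented for illustration, and the v0.1 special case and trap redirection are omitted.

```c
#include <errno.h>
#include <stdbool.h>

#define SBI_ERR_NOT_SUPPORTED (-2)

/* Mirrors struct kvm_vcpu_sbi_ext_data from this patch. */
struct sbi_ext_data {
	unsigned long out_val;
	unsigned long err_val;
	bool uexit;
};

/* Hypothetical subset of the guest context. */
struct guest_regs {
	unsigned long a0, a1, sepc;
};

/* Hypothetical handler showing the three kinds of outcome. */
static int demo_handler(unsigned long funcid, struct sbi_ext_data *edata)
{
	switch (funcid) {
	case 0:
		edata->out_val = 42;     /* success, value for the guest */
		return 0;
	case 1:
		return -ENOMEM;          /* problem *with* the emulation */
	case 2:
		edata->uexit = true;     /* forward the call to userspace */
		return 0;
	default:
		edata->err_val = SBI_ERR_NOT_SUPPORTED; /* outcome *of* the emulation */
		return 0;
	}
}

/* Returns 1 to resume the guest, 0 to exit to userspace, <0 on error. */
static int sbi_ecall(struct guest_regs *regs, unsigned long funcid)
{
	struct sbi_ext_data edata = { 0 };
	int ret = demo_handler(funcid, &edata);

	if (ret < 0)
		return ret;              /* sepc is not advanced */
	if (edata.uexit)
		return 0;                /* userspace completes the call */
	regs->a0 = edata.err_val;
	regs->a1 = edata.out_val;
	regs->sepc += 4;                 /* skip the ecall instruction */
	return 1;
}
```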
Suggested-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_sbi.h | 10 ++++--
arch/riscv/kvm/vcpu_sbi.c | 46 ++++++++------------------
arch/riscv/kvm/vcpu_sbi_base.c | 38 ++++++++++------------
arch/riscv/kvm/vcpu_sbi_hsm.c | 29 +++++++++--------
arch/riscv/kvm/vcpu_sbi_replace.c | 47 ++++++++++++++-------------
arch/riscv/kvm/vcpu_sbi_v01.c | 11 +++----
6 files changed, 84 insertions(+), 97 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
index 45ba341..38407b3 100644
--- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
+++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
@@ -18,6 +18,12 @@ struct kvm_vcpu_sbi_context {
int return_handled;
};
+struct kvm_vcpu_sbi_ext_data {
+ unsigned long out_val;
+ unsigned long err_val;
+ bool uexit;
+};
+
struct kvm_vcpu_sbi_extension {
unsigned long extid_start;
unsigned long extid_end;
@@ -27,8 +33,8 @@ struct kvm_vcpu_sbi_extension {
* specific error codes.
*/
int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val, struct kvm_cpu_trap *utrap,
- bool *exit);
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap);
/* Extension specific probe function */
unsigned long (*probe)(struct kvm_vcpu *vcpu);
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index f96991d..aa42da6 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -12,26 +12,6 @@
#include <asm/sbi.h>
#include <asm/kvm_vcpu_sbi.h>
-static int kvm_linux_err_map_sbi(int err)
-{
- switch (err) {
- case 0:
- return SBI_SUCCESS;
- case -EPERM:
- return SBI_ERR_DENIED;
- case -EINVAL:
- return SBI_ERR_INVALID_PARAM;
- case -EFAULT:
- return SBI_ERR_INVALID_ADDRESS;
- case -EOPNOTSUPP:
- return SBI_ERR_NOT_SUPPORTED;
- case -EALREADY:
- return SBI_ERR_ALREADY_AVAILABLE;
- default:
- return SBI_ERR_FAILURE;
- };
-}
-
#ifndef CONFIG_RISCV_SBI_V01
static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
.extid_start = -1UL,
@@ -125,11 +105,10 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
int ret = 1;
bool next_sepc = true;
- bool userspace_exit = false;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
const struct kvm_vcpu_sbi_extension *sbi_ext;
struct kvm_cpu_trap utrap = { 0 };
- unsigned long out_val = 0;
+ struct kvm_vcpu_sbi_ext_data edata_out = { 0 };
bool ext_is_v01 = false;
sbi_ext = kvm_vcpu_sbi_find_ext(cp->a7);
@@ -139,13 +118,22 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
cp->a7 <= SBI_EXT_0_1_SHUTDOWN)
ext_is_v01 = true;
#endif
- ret = sbi_ext->handler(vcpu, run, &out_val, &utrap, &userspace_exit);
+ ret = sbi_ext->handler(vcpu, run, &edata_out, &utrap);
} else {
/* Return error for unsupported SBI calls */
cp->a0 = SBI_ERR_NOT_SUPPORTED;
goto ecall_done;
}
+ /*
+ * When the SBI extension returns a Linux error code, it exits the ioctl
+ * loop and forwards the error to userspace.
+ */
+ if (ret < 0) {
+ next_sepc = false;
+ goto ecall_done;
+ }
+
/* Handle special error cases i.e trap, exit or userspace forward */
if (utrap.scause) {
/* No need to increment sepc or exit ioctl loop */
@@ -157,24 +145,18 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
/* Exit ioctl loop or Propagate the error code the guest */
- if (userspace_exit) {
+ if (edata_out.uexit) {
next_sepc = false;
ret = 0;
} else {
- /**
- * SBI extension handler always returns an Linux error code. Convert
- * it to the SBI specific error code that can be propagated the SBI
- * caller.
- */
- ret = kvm_linux_err_map_sbi(ret);
- cp->a0 = ret;
+ cp->a0 = edata_out.err_val;
ret = 1;
}
ecall_done:
if (next_sepc)
cp->sepc += 4;
if (!ext_is_v01)
- cp->a1 = out_val;
+ cp->a1 = edata_out.out_val;
return ret;
}
diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
index 846d518..84885e5 100644
--- a/arch/riscv/kvm/vcpu_sbi_base.c
+++ b/arch/riscv/kvm/vcpu_sbi_base.c
@@ -14,24 +14,23 @@
#include <asm/kvm_vcpu_sbi.h>
static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *trap, bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *trap)
{
- int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
const struct kvm_vcpu_sbi_extension *sbi_ext;
switch (cp->a6) {
case SBI_EXT_BASE_GET_SPEC_VERSION:
- *out_val = (KVM_SBI_VERSION_MAJOR <<
+ edata->out_val = (KVM_SBI_VERSION_MAJOR <<
SBI_SPEC_VERSION_MAJOR_SHIFT) |
KVM_SBI_VERSION_MINOR;
break;
case SBI_EXT_BASE_GET_IMP_ID:
- *out_val = KVM_SBI_IMPID;
+ edata->out_val = KVM_SBI_IMPID;
break;
case SBI_EXT_BASE_GET_IMP_VERSION:
- *out_val = LINUX_VERSION_CODE;
+ edata->out_val = LINUX_VERSION_CODE;
break;
case SBI_EXT_BASE_PROBE_EXT:
if ((cp->a0 >= SBI_EXT_EXPERIMENTAL_START &&
@@ -43,33 +42,33 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
* forward it to the userspace
*/
kvm_riscv_vcpu_sbi_forward(vcpu, run);
- *exit = true;
+ edata->uexit = true;
} else {
sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
if (sbi_ext) {
if (sbi_ext->probe)
- *out_val = sbi_ext->probe(vcpu);
+ edata->out_val = sbi_ext->probe(vcpu);
else
- *out_val = 1;
+ edata->out_val = 1;
} else
- *out_val = 0;
+ edata->out_val = 0;
}
break;
case SBI_EXT_BASE_GET_MVENDORID:
- *out_val = vcpu->arch.mvendorid;
+ edata->out_val = vcpu->arch.mvendorid;
break;
case SBI_EXT_BASE_GET_MARCHID:
- *out_val = vcpu->arch.marchid;
+ edata->out_val = vcpu->arch.marchid;
break;
case SBI_EXT_BASE_GET_MIMPID:
- *out_val = vcpu->arch.mimpid;
+ edata->out_val = vcpu->arch.mimpid;
break;
default:
- ret = -EOPNOTSUPP;
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
break;
}
- return ret;
+ return 0;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
@@ -79,17 +78,16 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
};
static int kvm_sbi_ext_forward_handler(struct kvm_vcpu *vcpu,
- struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap,
- bool *exit)
+ struct kvm_run *run,
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
/*
* Both SBI experimental and vendor extensions are
* unconditionally forwarded to userspace.
*/
kvm_riscv_vcpu_sbi_forward(vcpu, run);
- *exit = true;
+ edata->uexit = true;
return 0;
}
diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
index 619ac0f..5fb526c 100644
--- a/arch/riscv/kvm/vcpu_sbi_hsm.c
+++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
@@ -21,9 +21,9 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
if (!target_vcpu)
- return -EINVAL;
+ return SBI_ERR_INVALID_PARAM;
if (!target_vcpu->arch.power_off)
- return -EALREADY;
+ return SBI_ERR_ALREADY_AVAILABLE;
reset_cntx = &target_vcpu->arch.guest_reset_context;
/* start address */
@@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.power_off)
- return -EACCES;
+ return SBI_ERR_FAILURE;
kvm_riscv_vcpu_power_off(vcpu);
@@ -57,7 +57,7 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
if (!target_vcpu)
- return -EINVAL;
+ return SBI_ERR_INVALID_PARAM;
if (!target_vcpu->arch.power_off)
return SBI_HSM_STATE_STARTED;
else if (vcpu->stat.generic.blocking)
@@ -67,9 +67,8 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
}
static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap,
- bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
@@ -88,27 +87,29 @@ static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
case SBI_EXT_HSM_HART_STATUS:
ret = kvm_sbi_hsm_vcpu_get_status(vcpu);
if (ret >= 0) {
- *out_val = ret;
- ret = 0;
+ edata->out_val = ret;
+ edata->err_val = 0;
}
- break;
+ return 0;
case SBI_EXT_HSM_HART_SUSPEND:
switch (cp->a0) {
case SBI_HSM_SUSPEND_RET_DEFAULT:
kvm_riscv_vcpu_wfi(vcpu);
break;
case SBI_HSM_SUSPEND_NON_RET_DEFAULT:
- ret = -EOPNOTSUPP;
+ ret = SBI_ERR_NOT_SUPPORTED;
break;
default:
- ret = -EINVAL;
+ ret = SBI_ERR_INVALID_PARAM;
}
break;
default:
- ret = -EOPNOTSUPP;
+ ret = SBI_ERR_NOT_SUPPORTED;
}
- return ret;
+ edata->err_val = ret;
+
+ return 0;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm = {
diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
index 03a0198..abeb55f 100644
--- a/arch/riscv/kvm/vcpu_sbi_replace.c
+++ b/arch/riscv/kvm/vcpu_sbi_replace.c
@@ -14,15 +14,16 @@
#include <asm/kvm_vcpu_sbi.h>
static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap, bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
- int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
u64 next_cycle;
- if (cp->a6 != SBI_EXT_TIME_SET_TIMER)
- return -EINVAL;
+ if (cp->a6 != SBI_EXT_TIME_SET_TIMER) {
+ edata->err_val = SBI_ERR_INVALID_PARAM;
+ return 0;
+ }
#if __riscv_xlen == 32
next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
@@ -31,7 +32,7 @@ static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
#endif
kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
- return ret;
+ return 0;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
@@ -41,8 +42,8 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
};
static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap, bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
int ret = 0;
unsigned long i;
@@ -51,8 +52,10 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long hmask = cp->a0;
unsigned long hbase = cp->a1;
- if (cp->a6 != SBI_EXT_IPI_SEND_IPI)
- return -EINVAL;
+ if (cp->a6 != SBI_EXT_IPI_SEND_IPI) {
+ edata->err_val = SBI_ERR_INVALID_PARAM;
+ return 0;
+ }
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
if (hbase != -1UL) {
@@ -76,10 +79,9 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_ipi = {
};
static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap, bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
- int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
unsigned long hmask = cp->a0;
unsigned long hbase = cp->a1;
@@ -116,10 +118,10 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
*/
break;
default:
- ret = -EOPNOTSUPP;
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
}
- return ret;
+ return 0;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
@@ -130,14 +132,13 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap, bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
unsigned long funcid = cp->a6;
u32 reason = cp->a1;
u32 type = cp->a0;
- int ret = 0;
switch (funcid) {
case SBI_EXT_SRST_RESET:
@@ -146,24 +147,24 @@ static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_SHUTDOWN,
reason);
- *exit = true;
+ edata->uexit = true;
break;
case SBI_SRST_RESET_TYPE_COLD_REBOOT:
case SBI_SRST_RESET_TYPE_WARM_REBOOT:
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_RESET,
reason);
- *exit = true;
+ edata->uexit = true;
break;
default:
- ret = -EOPNOTSUPP;
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
}
break;
default:
- ret = -EOPNOTSUPP;
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
}
- return ret;
+ return 0;
}
const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_srst = {
diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
index 489f225..c0ccc58 100644
--- a/arch/riscv/kvm/vcpu_sbi_v01.c
+++ b/arch/riscv/kvm/vcpu_sbi_v01.c
@@ -14,9 +14,8 @@
#include <asm/kvm_vcpu_sbi.h>
static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
- unsigned long *out_val,
- struct kvm_cpu_trap *utrap,
- bool *exit)
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
{
ulong hmask;
int i, ret = 0;
@@ -33,7 +32,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
* handled in kernel so we forward these to user-space
*/
kvm_riscv_vcpu_sbi_forward(vcpu, run);
- *exit = true;
+ edata->uexit = true;
break;
case SBI_EXT_0_1_SET_TIMER:
#if __riscv_xlen == 32
@@ -65,7 +64,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
case SBI_EXT_0_1_SHUTDOWN:
kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
KVM_SYSTEM_EVENT_SHUTDOWN, 0);
- *exit = true;
+ edata->uexit = true;
break;
case SBI_EXT_0_1_REMOTE_FENCE_I:
case SBI_EXT_0_1_REMOTE_SFENCE_VMA:
@@ -103,7 +102,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
}
break;
default:
- ret = -EINVAL;
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
break;
}
--
2.25.1
This patch only adds the bare-bones structure of the perf implementation.
Most of the functions return zero at this point and will be fully
implemented in the future.
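As a rough picture of how the skeleton's `pmc_in_use` bitmap and `kvm_pmu_num_counters()` might be used once the stubs are filled in, here is a hypothetical user-space sketch. `vpmu_claim_counter()` is invented for illustration and is not part of this series; a plain `uint64_t` stands in for `DECLARE_BITMAP`, which is enough for `RISCV_MAX_COUNTERS` = 64.

```c
#include <stdbool.h>
#include <stdint.h>

#define RISCV_MAX_COUNTERS 64

/* Simplified per-counter state (no perf_event or SBI counter info). */
struct vpmc {
	uint8_t idx;
	uint64_t counter_val;
	bool started;
};

/* Simplified per-vCPU PMU state modelled on struct kvm_pmu. */
struct vpmu {
	struct vpmc pmc[RISCV_MAX_COUNTERS];
	int num_fw_ctrs;
	int num_hw_ctrs;
	bool init_done;
	uint64_t pmc_in_use; /* one bit per virtual counter */
};

#define vpmu_num_counters(p) ((p)->num_hw_ctrs + (p)->num_fw_ctrs)

/*
 * Hypothetical helper: claim the first unused counter in
 * [base, base + count), clamped to the counters that exist.
 * Returns the counter index, or -1 if none is free.
 */
static int vpmu_claim_counter(struct vpmu *p, int base, int count)
{
	int end = base + count;

	if (base < 0 || count <= 0)
		return -1;
	if (end > vpmu_num_counters(p))
		end = vpmu_num_counters(p);
	if (end > RISCV_MAX_COUNTERS)
		end = RISCV_MAX_COUNTERS;

	for (int i = base; i < end; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(p->pmc_in_use & bit)) {
			p->pmc_in_use |= bit;
			p->pmc[i].idx = (uint8_t)i;
			p->pmc[i].started = false;
			return i;
		}
	}
	return -1;
}
```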
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 3 +
arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 5 +
arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
5 files changed, 230 insertions(+)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
create mode 100644 arch/riscv/kvm/vcpu_pmu.c
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 93f43a3..f9874b4 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -18,6 +18,7 @@
#include <asm/kvm_vcpu_insn.h>
#include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_timer.h>
+#include <asm/kvm_vcpu_pmu.h>
#define KVM_MAX_VCPUS 1024
@@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
/* Don't run the VCPU (blocked) */
bool pause;
+
+ struct kvm_pmu pmu;
};
static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
new file mode 100644
index 0000000..3f43a43
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __KVM_VCPU_RISCV_PMU_H
+#define __KVM_VCPU_RISCV_PMU_H
+
+#include <linux/perf/riscv_pmu.h>
+#include <asm/kvm_vcpu_sbi.h>
+#include <asm/sbi.h>
+
+#ifdef CONFIG_RISCV_PMU_SBI
+#define RISCV_KVM_MAX_FW_CTRS 32
+#define RISCV_MAX_COUNTERS 64
+
+/* Per virtual pmu counter data */
+struct kvm_pmc {
+ u8 idx;
+ struct perf_event *perf_event;
+ uint64_t counter_val;
+ union sbi_pmu_ctr_info cinfo;
+ /* Event monitoring status */
+ bool started;
+};
+
+/* PMU data structure per vcpu */
+struct kvm_pmu {
+ struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
+ /* Number of the virtual firmware counters available */
+ int num_fw_ctrs;
+ /* Number of the virtual hardware counters available */
+ int num_hw_ctrs;
+ /* A flag to indicate that pmu initialization is done */
+ bool init_done;
+ /* Bitmap of all the virtual counters in use */
+ DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
+};
+
+#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
+#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
+
+int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag, uint64_t ival,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ unsigned long eidx, uint64_t evtdata,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
+
+#else
+struct kvm_pmu {
+};
+
+static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
+#endif /* CONFIG_RISCV_PMU_SBI */
+#endif /* !__KVM_VCPU_RISCV_PMU_H */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 019df920..5de1053 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
kvm-y += vcpu_timer.o
+kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 7c08567..b746f21 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
WRITE_ONCE(vcpu->arch.irqs_pending, 0);
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
+ kvm_riscv_vcpu_pmu_reset(vcpu);
vcpu->arch.hfence_head = 0;
vcpu->arch.hfence_tail = 0;
@@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Setup VCPU timer */
kvm_riscv_vcpu_timer_init(vcpu);
+ /* setup performance monitoring */
+ kvm_riscv_vcpu_pmu_init(vcpu);
+
/* Reset VCPU */
kvm_riscv_reset_vcpu(vcpu);
@@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
/* Cleanup VCPU timer */
kvm_riscv_vcpu_timer_deinit(vcpu);
+ kvm_riscv_vcpu_pmu_deinit(vcpu);
/* Free unused pages pre-allocated for G-stage page table mappings */
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
}
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
new file mode 100644
index 0000000..d3fd551
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/perf/riscv_pmu.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_sbi.h>
+#include <asm/kvm_vcpu_pmu.h>
+#include <linux/kvm_host.h>
+
+#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
+
+int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ edata->out_val = kvm_pmu_num_counters(kvpmu);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ if (cidx >= RISCV_MAX_COUNTERS || cidx == 1) {
+ edata->err_val = SBI_ERR_INVALID_PARAM;
+ return 0;
+ }
+
+ edata->out_val = kvpmu->pmc[cidx].cinfo.value;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag, uint64_t ival,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ unsigned long eidx, uint64_t evtdata,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
+{
+ int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+
+ ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
+ if (ret < 0)
+ return ret;
+
+ if (!hpm_width || !num_hw_ctrs) {
+ pr_err("Cannot initialize VCPU with zero hpmcounter width or number of counters\n");
+ return -EINVAL;
+ }
+
+ if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
+ pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
+ num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
+ } else
+ num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
+
+ kvpmu->num_hw_ctrs = num_hw_ctrs;
+ kvpmu->num_fw_ctrs = num_fw_ctrs;
+
+ /*
+ * There is no correlation between the logical hardware counters and virtual counters.
+ * However, we need to encode a hpmcounter CSR in the counter info field so that
+ * KVM can trap and emulate the read. This works well in the migration use case as
+ * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
+ */
+ for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
+ /* TIME CSR shouldn't be read from perf interface */
+ if (i == 1)
+ continue;
+ pmc = &kvpmu->pmc[i];
+ pmc->idx = i;
+ if (i < kvpmu->num_hw_ctrs) {
+ kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
+ if (i < 3)
+ /* CY, IR counters */
+ kvpmu->pmc[i].cinfo.width = 63;
+ else
+ kvpmu->pmc[i].cinfo.width = hpm_width;
+ /*
+ * The CSR number doesn't have any relation with the logical
+ * hardware counters. The CSR numbers are encoded sequentially
+ * to avoid maintaining a map between the virtual counter
+ * and CSR number.
+ */
+ pmc->cinfo.csr = CSR_CYCLE + i;
+ } else {
+ pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
+ pmc->cinfo.width = BITS_PER_LONG - 1;
+ }
+ }
+
+ kvpmu->init_done = true;
+
+ return 0;
+}
+
+void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
+{
+ /* TODO */
+}
+
+void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
+{
+ kvm_riscv_vcpu_pmu_deinit(vcpu);
+}
--
2.25.1
The SBI PMU extension allows KVM guests to configure, start, stop, and
query the PMU counters in a virtualized environment as well.
To allow that, KVM implements the entire SBI PMU extension.
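For reference, a minimal userspace sketch of the calling convention the
handler relies on: the extension ID arrives in a7, the function ID in a6
and the arguments in a0-a5, and on RV32 a 64-bit argument such as the
initial value of COUNTER_START is split across a3 (low) and a4 (high).
The extension and function IDs are the SBI specification values;
struct guest_regs, pmu_arg_u64() and start_initial_value() are
hypothetical names, not KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* SBI PMU extension ID ("PMU" in ASCII) and function IDs from the SBI spec */
#define SBI_EXT_PMU                   0x504D55UL
#define SBI_EXT_PMU_NUM_COUNTERS      0
#define SBI_EXT_PMU_COUNTER_GET_INFO  1
#define SBI_EXT_PMU_COUNTER_CFG_MATCH 2
#define SBI_EXT_PMU_COUNTER_START     3
#define SBI_EXT_PMU_COUNTER_STOP      4
#define SBI_EXT_PMU_COUNTER_FW_READ   5

/* Hypothetical model of the trapped RV32 guest registers a0..a7 */
struct guest_regs {
	uint32_t a[8];
};

/* Rebuild a 64-bit argument that an RV32 guest passes as a (lo, hi) pair */
static uint64_t pmu_arg_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * COUNTER_START on RV32: a0 = counter base, a1 = counter mask,
 * a2 = flags, a3/a4 = initial value (low/high halves).
 */
static uint64_t start_initial_value(const struct guest_regs *r)
{
	assert(r->a[7] == SBI_EXT_PMU);
	assert(r->a[6] == SBI_EXT_PMU_COUNTER_START);
	return pmu_arg_u64(r->a[3], r->a[4]);
}
```

For example, an RV32 guest passing a3 = 0xdeadbeef and a4 = 0x1 requests
an initial counter value of 0x1deadbeef.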
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu_sbi.c | 11 +++++
arch/riscv/kvm/vcpu_sbi_pmu.c | 86 +++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 5de1053..278e97c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -25,4 +25,4 @@ kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
kvm-y += vcpu_timer.o
-kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
+kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index aa42da6..04a3b4b 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -20,6 +20,16 @@ static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
};
#endif
+#ifdef CONFIG_RISCV_PMU_SBI
+extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
+#else
+static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
+ .extid_start = -1UL,
+ .extid_end = -1UL,
+ .handler = NULL,
+};
+#endif
+
static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
&vcpu_sbi_ext_v01,
&vcpu_sbi_ext_base,
@@ -28,6 +38,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
&vcpu_sbi_ext_rfence,
&vcpu_sbi_ext_srst,
&vcpu_sbi_ext_hsm,
+ &vcpu_sbi_ext_pmu,
&vcpu_sbi_ext_experimental,
&vcpu_sbi_ext_vendor,
};
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
new file mode 100644
index 0000000..73aab30
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/sbi.h>
+#include <asm/kvm_vcpu_sbi.h>
+
+static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
+{
+ int ret = 0;
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ unsigned long funcid = cp->a6;
+ uint64_t temp;
+
+ /* Return not supported if PMU is not initialized */
+ if (!kvpmu->init_done)
+ return -EINVAL;
+
+ switch (funcid) {
+ case SBI_EXT_PMU_NUM_COUNTERS:
+ ret = kvm_riscv_vcpu_pmu_num_ctrs(vcpu, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_GET_INFO:
+ ret = kvm_riscv_vcpu_pmu_ctr_info(vcpu, cp->a0, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_CFG_MATCH:
+#if defined(CONFIG_32BIT)
+ temp = ((uint64_t)cp->a5 << 32) | cp->a4;
+#else
+ temp = cp->a4;
+#endif
+ /*
+ * This can fail if the perf core framework fails to create an event.
+ * Forward the error to user space because it is an error that
+ * happened within the host kernel. The other option would be to
+ * convert this to an SBI error and forward it to the guest.
+ */
+ ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
+ cp->a2, cp->a3, temp, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_START:
+#if defined(CONFIG_32BIT)
+ temp = ((uint64_t)cp->a4 << 32) | cp->a3;
+#else
+ temp = cp->a3;
+#endif
+ ret = kvm_riscv_vcpu_pmu_ctr_start(vcpu, cp->a0, cp->a1, cp->a2,
+ temp, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_STOP:
+ ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_FW_READ:
+ ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, edata);
+ break;
+ default:
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
+ }
+
+ return ret;
+}
+
+static unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ return kvpmu->init_done;
+}
+
+const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
+ .extid_start = SBI_EXT_PMU,
+ .extid_end = SBI_EXT_PMU,
+ .handler = kvm_sbi_ext_pmu_handler,
+ .probe = kvm_sbi_ext_pmu_probe,
+};
--
2.25.1
The privilege mode filtering feature (Sscofpmf) must be available in the
host so that the host can inhibit the counters while the execution is in
HS mode. Otherwise, the guests may gain access to critical host information.
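As an illustration only, Sscofpmf provides per-mode inhibit bits in
mhpmevent (MINH at bit 62 down to VUINH at bit 58). The sketch below
models which bits a guest-only counter would need: host modes (M, HS, U)
are always inhibited, and the guest may additionally exclude VU or VS
mode. On a real system these bits are programmed in M-mode (for example
by SBI firmware on behalf of the host perf driver); the macro names and
guest_event_inhibit() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sscofpmf mode-inhibit bits in mhpmevent (RV64 layout) */
#define MHPMEVENT_MINH  (1ULL << 62)
#define MHPMEVENT_SINH  (1ULL << 61)
#define MHPMEVENT_UINH  (1ULL << 60)
#define MHPMEVENT_VSINH (1ULL << 59)
#define MHPMEVENT_VUINH (1ULL << 58)

/*
 * Hypothetical helper: inhibit bits for a counter that should only count
 * while the guest runs. Without these bits the counter would also count
 * host (HS-mode) execution and leak it to the guest.
 */
static uint64_t guest_event_inhibit(bool exclude_guest_user,
				    bool exclude_guest_kernel)
{
	uint64_t inh = MHPMEVENT_MINH | MHPMEVENT_SINH | MHPMEVENT_UINH;

	if (exclude_guest_user)
		inh |= MHPMEVENT_VUINH;
	if (exclude_guest_kernel)
		inh |= MHPMEVENT_VSINH;
	return inh;
}
```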
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index d3fd551..7713927 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -79,6 +79,14 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
+ /*
+ * PMU functionality should be only available to guests if privilege mode
+ * filtering is available in the host. Otherwise, guest will always count
+ * events while the execution is in hypervisor mode.
+ */
+ if (!riscv_isa_extension_available(NULL, SSCOFPMF))
+ return 0;
+
ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
if (ret < 0)
return ret;
--
2.25.1
Guests must not get unchecked access to any hpmcounter, including
cycle/instret. We achieve that by clearing all bits except the TM bit
in hcounteren.
However, to maintain the ABI requirement in the future, instret and cycle
access for guest user space can be enabled upon explicit request (via ONE
REG) or on the first trap from VU mode. This patch doesn't support that,
as the ONE REG interface is not settled yet.
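A small sketch of the resulting hcounteren value, assuming the standard
counter-enable layout (bit 0 CY, bit 1 TM, bit 2 IR, bits 3-31 HPM3 to
HPM31): only TM is set, so VS/VU reads of any other counter CSR trap to
HS mode. hcounteren_with_cy_ir() is a hypothetical illustration of what
exposing cycle and instret might look like later; this patch does not
implement it.

```c
/* Bit positions in [hm]counteren, matching cycle/time/instret/hpmcounterN */
#define COUNTEREN_CY     (1UL << 0)
#define COUNTEREN_TM     (1UL << 1)
#define COUNTEREN_IR     (1UL << 2)
#define COUNTEREN_HPM(n) (1UL << (n)) /* n = 3..31 */

/* Only the time counter is directly readable by VS/VU; all others trap */
static unsigned long hcounteren_default(void)
{
	return COUNTEREN_TM;
}

/* Hypothetical future policy: additionally expose cycle and instret */
static unsigned long hcounteren_with_cy_ir(void)
{
	return COUNTEREN_TM | COUNTEREN_CY | COUNTEREN_IR;
}
```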
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 58c5489..c5d400f 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -49,7 +49,8 @@ int kvm_arch_hardware_enable(void)
hideleg |= (1UL << IRQ_VS_EXT);
csr_write(CSR_HIDELEG, hideleg);
- csr_write(CSR_HCOUNTEREN, -1UL);
+ /* VS should access only the time counter directly. Everything else should trap */
+ csr_write(CSR_HCOUNTEREN, 0x02);
csr_write(CSR_HVIP, 0);
--
2.25.1
As KVM guests only see the virtual PMU counters, all hpmcounter
accesses should trap so that KVM can emulate the reads on behalf of guests.
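The following userspace sketch models the CSR-to-counter mapping used by
the emulation. It assumes the standard CSR numbers (cycle at 0xc00
through hpmcounter31 at 0xc1f, and the RV32 high halves cycleh through
hpmcounter31h at 0xc80-0xc9f); hpm_csr_to_cidx() and hpm_half() are
hypothetical helpers. Unlike the patch, which computes
csr_num - CSR_CYCLE, the sketch also shows one way to route the RV32
high-half CSRs to the same counter index.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSR_CYCLE         0xC00
#define CSR_HPMCOUNTER31  0xC1F
#define CSR_CYCLEH        0xC80
#define CSR_HPMCOUNTER31H 0xC9F

/*
 * Map a trapped counter CSR to a virtual counter index. Sets *high when
 * the guest read an upper half (RV32 *H CSRs). Returns -1 otherwise.
 */
static int hpm_csr_to_cidx(unsigned int csr_num, bool *high)
{
	if (csr_num >= CSR_CYCLE && csr_num <= CSR_HPMCOUNTER31) {
		*high = false;
		return (int)(csr_num - CSR_CYCLE);
	}
	if (csr_num >= CSR_CYCLEH && csr_num <= CSR_HPMCOUNTER31H) {
		*high = true;
		return (int)(csr_num - CSR_CYCLEH);
	}
	return -1;
}

/* Select the requested 32-bit half of a 64-bit virtual counter value */
static uint32_t hpm_half(uint64_t counter_val, bool high)
{
	return high ? (uint32_t)(counter_val >> 32) : (uint32_t)counter_val;
}
```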
Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 ++++++++++
arch/riscv/kvm/vcpu_insn.c | 4 ++-
arch/riscv/kvm/vcpu_pmu.c | 45 ++++++++++++++++++++++++++-
3 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 3f43a43..022d45d 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -43,6 +43,19 @@ struct kvm_pmu {
#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
+#if defined(CONFIG_32BIT)
+#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
+{ .base = CSR_CYCLEH, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm }, \
+{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
+#else
+#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
+{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
+#endif
+
+int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
+ unsigned long *val, unsigned long new_val,
+ unsigned long wr_mask);
+
int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_ext_data *edata);
@@ -65,6 +78,9 @@ void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
#else
struct kvm_pmu {
};
+#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
+{ .base = 0, .count = 0, .func = NULL },
+
static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
{
diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
index 0bb5276..f689337 100644
--- a/arch/riscv/kvm/vcpu_insn.c
+++ b/arch/riscv/kvm/vcpu_insn.c
@@ -213,7 +213,9 @@ struct csr_func {
unsigned long wr_mask);
};
-static const struct csr_func csr_funcs[] = { };
+static const struct csr_func csr_funcs[] = {
+ KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS
+};
/**
* kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 7713927..894053a 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -17,6 +17,44 @@
#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
+static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+ unsigned long *out_val)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ u64 enabled, running;
+
+ pmc = &kvpmu->pmc[cidx];
+ if (!pmc->perf_event)
+ return -EINVAL;
+
+ pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
+ *out_val = pmc->counter_val;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
+ unsigned long *val, unsigned long new_val,
+ unsigned long wr_mask)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int cidx, ret = KVM_INSN_CONTINUE_NEXT_SEPC;
+
+ if (!kvpmu || !kvpmu->init_done)
+ return KVM_INSN_EXIT_TO_USER_SPACE;
+
+ if (wr_mask)
+ return KVM_INSN_ILLEGAL_TRAP;
+
+ cidx = csr_num - CSR_CYCLE;
+
+ if (pmu_ctr_read(vcpu, cidx, val) < 0)
+ return KVM_INSN_EXIT_TO_USER_SPACE;
+
+ return ret;
+}
+
int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
@@ -69,7 +107,12 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ int ret;
+
+ ret = pmu_ctr_read(vcpu, cidx, &edata->out_val);
+ if (ret == -EINVAL)
+ edata->err_val = SBI_ERR_INVALID_PARAM;
+
return 0;
}
--
2.25.1
The RISC-V SBI PMU and Sscofpmf ISA extensions allow perf to be
supported in a virtualized environment as well. The KVM implementation
relies on the SBI PMU extension for the most part, while trapping
and emulating CSR reads for counter access.
This patch doesn't support event sampling yet.
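To illustrate the event translation, here is a userspace sketch of how an
SBI cache event index maps to a PERF_TYPE_HW_CACHE config, mirroring
kvm_pmu_get_perf_event_cache_config(). It assumes the SBI encoding (event
type in bits [19:16]; for cache events, cache_id in bits [15:3], op_id in
[2:1] and result_id in [0]) and perf's generic cache limits; the function
and macro names are hypothetical.

```c
#include <stdint.h>

#define SBI_EVENT_TYPE_MASK  0xF0000UL
#define SBI_EVENT_CODE_MASK  0x0FFFFUL
#define SBI_EVENT_TYPE_CACHE 1

/* Limits of perf's generic cache event space */
#define CACHE_TYPE_MAX   7 /* L1D, L1I, LL, DTLB, ITLB, BPU, NODE */
#define CACHE_OP_MAX     3 /* READ, WRITE, PREFETCH */
#define CACHE_RESULT_MAX 2 /* ACCESS, MISS */

static unsigned int sbi_event_type(unsigned long eidx)
{
	return (eidx & SBI_EVENT_TYPE_MASK) >> 16;
}

/*
 * Translate an SBI cache event index into a PERF_TYPE_HW_CACHE config:
 * type | (op << 8) | (result << 16). Returns UINT64_MAX if invalid.
 */
static uint64_t sbi_cache_eidx_to_perf_config(unsigned long eidx)
{
	uint32_t code = eidx & SBI_EVENT_CODE_MASK;
	uint32_t type = code >> 3;       /* cache_id: bits [15:3] */
	uint32_t op = (code >> 1) & 0x3; /* op_id: bits [2:1] */
	uint32_t result = code & 0x1;    /* result_id: bit [0] */

	if (sbi_event_type(eidx) != SBI_EVENT_TYPE_CACHE ||
	    type >= CACHE_TYPE_MAX || op >= CACHE_OP_MAX ||
	    result >= CACHE_RESULT_MAX)
		return UINT64_MAX;

	return type | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}
```

For example, an L1D read miss (event index 0x10001) translates to config
0x10000, and a DTLB write access (0x1001a) translates to 0x103.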
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 366 +++++++++++++++++++++++++++++++++++++-
1 file changed, 360 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 894053a..73dccf7 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -12,10 +12,190 @@
#include <linux/perf/riscv_pmu.h>
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
+#include <asm/bitops.h>
#include <asm/kvm_vcpu_pmu.h>
#include <linux/kvm_host.h>
#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
+#define get_event_type(x) (((x) & SBI_PMU_EVENT_IDX_TYPE_MASK) >> 16)
+#define get_event_code(x) ((x) & SBI_PMU_EVENT_IDX_CODE_MASK)
+
+
+static enum perf_hw_id hw_event_perf_map[SBI_PMU_HW_GENERAL_MAX] = {
+ [SBI_PMU_HW_CPU_CYCLES] = PERF_COUNT_HW_CPU_CYCLES,
+ [SBI_PMU_HW_INSTRUCTIONS] = PERF_COUNT_HW_INSTRUCTIONS,
+ [SBI_PMU_HW_CACHE_REFERENCES] = PERF_COUNT_HW_CACHE_REFERENCES,
+ [SBI_PMU_HW_CACHE_MISSES] = PERF_COUNT_HW_CACHE_MISSES,
+ [SBI_PMU_HW_BRANCH_INSTRUCTIONS] = PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
+ [SBI_PMU_HW_BRANCH_MISSES] = PERF_COUNT_HW_BRANCH_MISSES,
+ [SBI_PMU_HW_BUS_CYCLES] = PERF_COUNT_HW_BUS_CYCLES,
+ [SBI_PMU_HW_STALLED_CYCLES_FRONTEND] = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
+ [SBI_PMU_HW_STALLED_CYCLES_BACKEND] = PERF_COUNT_HW_STALLED_CYCLES_BACKEND,
+ [SBI_PMU_HW_REF_CPU_CYCLES] = PERF_COUNT_HW_REF_CPU_CYCLES,
+};
+
+static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
+{
+ u64 counter_val_mask = GENMASK_ULL(pmc->cinfo.width, 0);
+ u64 sample_period;
+
+ if (!pmc->counter_val)
+ sample_period = counter_val_mask + 1;
+ else
+ sample_period = (-pmc->counter_val) & counter_val_mask;
+
+ return sample_period;
+}
+
+static u32 kvm_pmu_get_perf_event_type(unsigned long eidx)
+{
+ enum sbi_pmu_event_type etype = get_event_type(eidx);
+ u32 type = PERF_TYPE_MAX;
+
+ switch (etype) {
+ case SBI_PMU_EVENT_TYPE_HW:
+ type = PERF_TYPE_HARDWARE;
+ break;
+ case SBI_PMU_EVENT_TYPE_CACHE:
+ type = PERF_TYPE_HW_CACHE;
+ break;
+ case SBI_PMU_EVENT_TYPE_RAW:
+ case SBI_PMU_EVENT_TYPE_FW:
+ type = PERF_TYPE_RAW;
+ break;
+ default:
+ break;
+ }
+
+ return type;
+}
+
+static bool kvm_pmu_is_fw_event(unsigned long eidx)
+{
+ return get_event_type(eidx) == SBI_PMU_EVENT_TYPE_FW;
+}
+
+static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc)
+{
+ if (pmc->perf_event) {
+ perf_event_disable(pmc->perf_event);
+ perf_event_release_kernel(pmc->perf_event);
+ pmc->perf_event = NULL;
+ }
+}
+
+static u64 kvm_pmu_get_perf_event_hw_config(u32 sbi_event_code)
+{
+ return hw_event_perf_map[sbi_event_code];
+}
+
+static u64 kvm_pmu_get_perf_event_cache_config(u32 sbi_event_code)
+{
+ u64 config = U64_MAX;
+ unsigned int cache_type, cache_op, cache_result;
+
+ /* All the cache event masks lie within 0xFF. No separate masking is necessary */
+ cache_type = (sbi_event_code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >>
+ SBI_PMU_EVENT_CACHE_ID_SHIFT;
+ cache_op = (sbi_event_code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >>
+ SBI_PMU_EVENT_CACHE_OP_SHIFT;
+ cache_result = sbi_event_code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK;
+
+ if (cache_type >= PERF_COUNT_HW_CACHE_MAX ||
+ cache_op >= PERF_COUNT_HW_CACHE_OP_MAX ||
+ cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+ return config;
+
+ config = cache_type | (cache_op << 8) | (cache_result << 16);
+
+ return config;
+}
+
+static u64 kvm_pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
+{
+ enum sbi_pmu_event_type etype = get_event_type(eidx);
+ u32 ecode = get_event_code(eidx);
+ u64 config = U64_MAX;
+
+ switch (etype) {
+ case SBI_PMU_EVENT_TYPE_HW:
+ if (ecode < SBI_PMU_HW_GENERAL_MAX)
+ config = kvm_pmu_get_perf_event_hw_config(ecode);
+ break;
+ case SBI_PMU_EVENT_TYPE_CACHE:
+ config = kvm_pmu_get_perf_event_cache_config(ecode);
+ break;
+ case SBI_PMU_EVENT_TYPE_RAW:
+ config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
+ break;
+ case SBI_PMU_EVENT_TYPE_FW:
+ if (ecode < SBI_PMU_FW_MAX)
+ config = (1ULL << 63) | ecode;
+ break;
+ default:
+ break;
+ }
+
+ return config;
+}
+
+static int kvm_pmu_get_fixed_pmc_index(unsigned long eidx)
+{
+ u32 etype = get_event_type(eidx);
+ u32 ecode = get_event_code(eidx);
+
+ if (etype != SBI_PMU_EVENT_TYPE_HW)
+ return -EINVAL;
+
+ if (ecode == SBI_PMU_HW_CPU_CYCLES)
+ return 0;
+ else if (ecode == SBI_PMU_HW_INSTRUCTIONS)
+ return 2;
+ else
+ return -EINVAL;
+}
+
+static int kvm_pmu_get_programmable_pmc_index(struct kvm_pmu *kvpmu, unsigned long eidx,
+ unsigned long cbase, unsigned long cmask)
+{
+ int ctr_idx = -1;
+ int i, pmc_idx;
+ int min, max;
+
+ if (kvm_pmu_is_fw_event(eidx)) {
+ /* Firmware counters are mapped 1:1 starting from num_hw_ctrs for simplicity */
+ min = kvpmu->num_hw_ctrs;
+ max = min + kvpmu->num_fw_ctrs;
+ } else {
+ /* First 3 counters are reserved for fixed counters */
+ min = 3;
+ max = kvpmu->num_hw_ctrs;
+ }
+
+ for_each_set_bit(i, &cmask, BITS_PER_LONG) {
+ pmc_idx = i + cbase;
+ if ((pmc_idx >= min && pmc_idx < max) &&
+ !test_bit(pmc_idx, kvpmu->pmc_in_use)) {
+ ctr_idx = pmc_idx;
+ break;
+ }
+ }
+
+ return ctr_idx;
+}
+
+static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
+ unsigned long cbase, unsigned long cmask)
+{
+ int ret;
+
+ /* Fixed counters need to have a fixed mapping as they have a different width */
+ ret = kvm_pmu_get_fixed_pmc_index(eidx);
+ if (ret >= 0)
+ return ret;
+
+ return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
+}
static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
unsigned long *out_val)
@@ -34,6 +214,16 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
return 0;
}
+static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ctr_base,
+ unsigned long ctr_mask)
+{
+ /* Make sure we have a valid counter mask requested from the caller */
+ if (!ctr_mask || (ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu)))
+ return -EINVAL;
+
+ return 0;
+}
+
int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
unsigned long *val, unsigned long new_val,
unsigned long wr_mask)
@@ -83,7 +273,39 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
unsigned long ctr_mask, unsigned long flag, uint64_t ival,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int i, pmc_index, sbiret = 0;
+ struct kvm_pmc *pmc;
+
+ if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ /* Start the counters that have been configured and requested by the guest */
+ for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
+ pmc_index = i + ctr_base;
+ if (!test_bit(pmc_index, kvpmu->pmc_in_use))
+ continue;
+ pmc = &kvpmu->pmc[pmc_index];
+ if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
+ pmc->counter_val = ival;
+ if (pmc->perf_event) {
+ if (unlikely(pmc->started)) {
+ sbiret = SBI_ERR_ALREADY_STARTED;
+ continue;
+ }
+ perf_event_period(pmc->perf_event, kvm_pmu_get_sample_period(pmc));
+ perf_event_enable(pmc->perf_event);
+ pmc->started = true;
+ } else {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ }
+ }
+
+out:
+ edata->err_val = sbiret;
+
return 0;
}
@@ -91,7 +313,45 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
unsigned long ctr_mask, unsigned long flag,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int i, pmc_index, sbiret = 0;
+ u64 enabled, running;
+ struct kvm_pmc *pmc;
+
+ if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ /* Stop the counters that have been configured and requested by the guest */
+ for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
+ pmc_index = i + ctr_base;
+ if (!test_bit(pmc_index, kvpmu->pmc_in_use))
+ continue;
+ pmc = &kvpmu->pmc[pmc_index];
+ if (pmc->perf_event) {
+ if (pmc->started) {
+ /* Stop counting the counter */
+ perf_event_disable(pmc->perf_event);
+ pmc->started = false;
+ } else
+ sbiret = SBI_ERR_ALREADY_STOPPED;
+
+ if (flag & SBI_PMU_STOP_FLAG_RESET) {
+ /* Release the counter if this is a reset request */
+ pmc->counter_val += perf_event_read_value(pmc->perf_event,
+ &enabled, &running);
+ kvm_pmu_release_perf_event(pmc);
+ clear_bit(pmc_index, kvpmu->pmc_in_use);
+ }
+ } else {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ }
+ }
+
+out:
+ edata->err_val = sbiret;
+
return 0;
}
@@ -100,7 +360,89 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
unsigned long eidx, uint64_t evtdata,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct perf_event *event;
+ int ctr_idx;
+ u32 etype = kvm_pmu_get_perf_event_type(eidx);
+ u64 config;
+ struct kvm_pmc *pmc;
+ int sbiret = 0;
+ struct perf_event_attr attr = {
+ .type = etype,
+ .size = sizeof(struct perf_event_attr),
+ .pinned = true,
+ /*
+ * It should never reach here if the platform doesn't support the sscofpmf
+ * extension as mode filtering won't work without it.
+ */
+ .exclude_host = true,
+ .exclude_hv = true,
+ .exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH),
+ .exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH),
+ .config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS,
+ };
+
+ if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ if (kvm_pmu_is_fw_event(eidx)) {
+ sbiret = SBI_ERR_NOT_SUPPORTED;
+ goto out;
+ }
+
+ /*
+ * SKIP_MATCH flag indicates the caller is aware of the assigned counter
+ * for this event. Just do a sanity check if it already marked used.
+ */
+ if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
+ if (!test_bit(ctr_base + __ffs(ctr_mask), kvpmu->pmc_in_use)) {
+ sbiret = SBI_ERR_FAILURE;
+ goto out;
+ }
+ ctr_idx = ctr_base + __ffs(ctr_mask);
+ } else {
+
+ ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
+ if (ctr_idx < 0) {
+ sbiret = SBI_ERR_NOT_SUPPORTED;
+ goto out;
+ }
+ }
+
+ pmc = &kvpmu->pmc[ctr_idx];
+ kvm_pmu_release_perf_event(pmc);
+ pmc->idx = ctr_idx;
+
+ config = kvm_pmu_get_perf_event_config(eidx, evtdata);
+ attr.config = config;
+ if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
+ //TODO: Do we really want to clear the value in hardware counter
+ pmc->counter_val = 0;
+ }
+
+ /*
+ * Set the default sample_period for now. The guest specified value
+ * will be updated in the start call.
+ */
+ attr.sample_period = kvm_pmu_get_sample_period(pmc);
+
+ event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
+ if (IS_ERR(event)) {
+ pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
+ return PTR_ERR(event);
+ }
+
+ set_bit(ctr_idx, kvpmu->pmc_in_use);
+ pmc->perf_event = event;
+ if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
+ perf_event_enable(pmc->perf_event);
+
+ edata->out_val = ctr_idx;
+out:
+ edata->err_val = sbiret;
+
return 0;
}
@@ -164,9 +506,9 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
if (i < 3)
/* CY, IR counters */
- kvpmu->pmc[i].cinfo.width = 63;
+ pmc->cinfo.width = 63;
else
- kvpmu->pmc[i].cinfo.width = hpm_width;
+ pmc->cinfo.width = hpm_width;
/*
* The CSR number doesn't have any relation with the logical
* hardware counters. The CSR numbers are encoded sequentially
@@ -187,7 +529,19 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ int i;
+
+ if (!kvpmu)
+ return;
+
+ for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
+ pmc = &kvpmu->pmc[i];
+ pmc->counter_val = 0;
+ kvm_pmu_release_perf_event(pmc);
+ }
+ bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
}
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
--
2.25.1
The SBI PMU extension defines a set of firmware events which can provide
useful information to guests about the number of SBI calls. As the
hypervisor implements the SBI PMU extension, these firmware events
correspond to ecall invocations from VS to HS mode. All other firmware
events will always report zero if monitored, as KVM doesn't implement them.
This patch adds all the infrastructure required to support firmware
events.
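A minimal model of the firmware event bookkeeping, assuming the semantics
of kvm_riscv_vcpu_pmu_incr_fw() and the firmware branch of the start
path: an ecall only increments the counter while it is started, and
COUNTER_START seeds it with the guest-provided initial value.
NUM_FW_EVENTS is an assumed stand-in for SBI_PMU_FW_MAX, and all names
here are hypothetical.

```c
#include <stdbool.h>

#define NUM_FW_EVENTS 22 /* assumed value; the patch uses SBI_PMU_FW_MAX */

struct fw_event {
	unsigned long value;
	bool started;
};

/* Called on each trapped guest ecall that corresponds to firmware event fid */
static int fw_event_incr(struct fw_event *ev, unsigned long fid)
{
	if (fid >= NUM_FW_EVENTS)
		return -1;
	if (ev[fid].started)
		ev[fid].value++;
	return 0;
}

/* COUNTER_START for a firmware counter: seed with the initial value */
static int fw_event_start(struct fw_event *ev, unsigned long fid,
			  unsigned long ival)
{
	if (fid >= NUM_FW_EVENTS || ev[fid].started)
		return -1; /* invalid event, or already started */
	ev[fid].value = ival;
	ev[fid].started = true;
	return 0;
}

/* COUNTER_STOP: keep the accumulated value but stop counting */
static void fw_event_stop(struct fw_event *ev, unsigned long fid)
{
	if (fid < NUM_FW_EVENTS)
		ev[fid].started = false;
}
```

For example, with SBI_PMU_FW_SET_TIMER (event code 5) started at 100, two
trapped set_timer ecalls leave the counter at 102.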
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +++
arch/riscv/kvm/vcpu_pmu.c | 144 +++++++++++++++++++-------
2 files changed, 124 insertions(+), 36 deletions(-)
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
index 022d45d..b235e7e 100644
--- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -17,6 +17,14 @@
#define RISCV_KVM_MAX_FW_CTRS 32
#define RISCV_MAX_COUNTERS 64
+struct kvm_fw_event {
+ /* Current value of the event */
+ unsigned long value;
+
+ /* Event monitoring status */
+ bool started;
+};
+
/* Per virtual pmu counter data */
struct kvm_pmc {
u8 idx;
@@ -25,11 +33,14 @@ struct kvm_pmc {
union sbi_pmu_ctr_info cinfo;
/* Event monitoring status */
bool started;
+ /* Monitoring event ID */
+ unsigned long event_idx;
};
/* PMU data structure per vcpu */
struct kvm_pmu {
struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
+ struct kvm_fw_event fw_event[RISCV_KVM_MAX_FW_CTRS];
/* Number of the virtual firmware counters available */
int num_fw_ctrs;
/* Number of the virtual hardware counters available */
@@ -52,6 +63,7 @@ struct kvm_pmu {
{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
#endif
+int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid);
int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
unsigned long *val, unsigned long new_val,
unsigned long wr_mask);
@@ -81,6 +93,10 @@ struct kvm_pmu {
#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
{ .base = 0, .count = 0, .func = NULL },
+static inline int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid)
+{
+ return 0;
+}
static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
{
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 73dccf7..b8d6aba 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -203,12 +203,15 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
struct kvm_pmc *pmc;
u64 enabled, running;
+ int fevent_code;
pmc = &kvpmu->pmc[cidx];
- if (!pmc->perf_event)
- return -EINVAL;
- pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
+ if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
+ fevent_code = get_event_code(pmc->event_idx);
+ pmc->counter_val = kvpmu->fw_event[fevent_code].value;
+ } else if (pmc->perf_event)
+ pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
*out_val = pmc->counter_val;
return 0;
@@ -224,6 +227,55 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
return 0;
}
+static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, int ctr_idx,
+ struct perf_event_attr *attr, unsigned long flag,
+ unsigned long eidx, unsigned long evtdata)
+{
+ struct perf_event *event;
+
+ kvm_pmu_release_perf_event(pmc);
+ pmc->idx = ctr_idx;
+
+ attr->config = kvm_pmu_get_perf_event_config(eidx, evtdata);
+ if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
+ //TODO: Do we really want to clear the value in hardware counter
+ pmc->counter_val = 0;
+ }
+
+ /*
+ * Set the default sample_period for now. The guest specified value
+ * will be updated in the start call.
+ */
+ attr->sample_period = kvm_pmu_get_sample_period(pmc);
+
+ event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
+ if (IS_ERR(event)) {
+ pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
+ return PTR_ERR(event);
+ }
+
+ pmc->perf_event = event;
+ if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
+ perf_event_enable(pmc->perf_event);
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_fw_event *fevent;
+
+ if (!kvpmu || fid >= SBI_PMU_FW_MAX)
+ return -EINVAL;
+
+ fevent = &kvpmu->fw_event[fid];
+ if (fevent->started)
+ fevent->value++;
+
+ return 0;
+}
+
int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
unsigned long *val, unsigned long new_val,
unsigned long wr_mask)
@@ -276,6 +328,7 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
int i, pmc_index, sbiret = 0;
struct kvm_pmc *pmc;
+ int fevent_code;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
@@ -290,7 +343,22 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
pmc = &kvpmu->pmc[pmc_index];
if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
pmc->counter_val = ival;
- if (pmc->perf_event) {
+ if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
+ fevent_code = get_event_code(pmc->event_idx);
+ if (fevent_code >= SBI_PMU_FW_MAX) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ /* Check if the counter was already started for some reason */
+ if (kvpmu->fw_event[fevent_code].started) {
+ sbiret = SBI_ERR_ALREADY_STARTED;
+ continue;
+ }
+
+ kvpmu->fw_event[fevent_code].started = true;
+ kvpmu->fw_event[fevent_code].value = pmc->counter_val;
+ } else if (pmc->perf_event) {
if (unlikely(pmc->started)) {
sbiret = SBI_ERR_ALREADY_STARTED;
continue;
@@ -317,6 +385,7 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
int i, pmc_index, sbiret = 0;
u64 enabled, running;
struct kvm_pmc *pmc;
+ int fevent_code;
if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
sbiret = SBI_ERR_INVALID_PARAM;
@@ -329,7 +398,18 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
if (!test_bit(pmc_index, kvpmu->pmc_in_use))
continue;
pmc = &kvpmu->pmc[pmc_index];
- if (pmc->perf_event) {
+ if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
+ fevent_code = get_event_code(pmc->event_idx);
+ if (fevent_code >= SBI_PMU_FW_MAX) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ if (!kvpmu->fw_event[fevent_code].started)
+ sbiret = SBI_ERR_ALREADY_STOPPED;
+
+ kvpmu->fw_event[fevent_code].started = false;
+ } else if (pmc->perf_event) {
if (pmc->started) {
/* Stop counting the counter */
perf_event_disable(pmc->perf_event);
@@ -342,11 +422,14 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
pmc->counter_val += perf_event_read_value(pmc->perf_event,
&enabled, &running);
kvm_pmu_release_perf_event(pmc);
- clear_bit(pmc_index, kvpmu->pmc_in_use);
}
} else {
sbiret = SBI_ERR_INVALID_PARAM;
}
+ if (flag & SBI_PMU_STOP_FLAG_RESET) {
+ pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
+ clear_bit(pmc_index, kvpmu->pmc_in_use);
+ }
}
out:
@@ -361,12 +444,11 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
struct kvm_vcpu_sbi_ext_data *edata)
{
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
- struct perf_event *event;
- int ctr_idx;
+ int ctr_idx, sbiret = 0, ret;
u32 etype = kvm_pmu_get_perf_event_type(eidx);
- u64 config;
- struct kvm_pmc *pmc;
- int sbiret = 0;
+ struct kvm_pmc *pmc = NULL;
+ bool is_fevent;
+ unsigned long event_code;
struct perf_event_attr attr = {
.type = etype,
.size = sizeof(struct perf_event_attr),
@@ -387,7 +469,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
goto out;
}
- if (kvm_pmu_is_fw_event(eidx)) {
+ event_code = get_event_code(eidx);
+ is_fevent = kvm_pmu_is_fw_event(eidx);
+ if (is_fevent && event_code >= SBI_PMU_FW_MAX) {
sbiret = SBI_ERR_NOT_SUPPORTED;
goto out;
}
@@ -412,33 +496,17 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
}
pmc = &kvpmu->pmc[ctr_idx];
- kvm_pmu_release_perf_event(pmc);
- pmc->idx = ctr_idx;
-
- config = kvm_pmu_get_perf_event_config(eidx, evtdata);
- attr.config = config;
- if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
- //TODO: Do we really want to clear the value in hardware counter
- pmc->counter_val = 0;
- }
-
- /*
- * Set the default sample_period for now. The guest specified value
- * will be updated in the start call.
- */
- attr.sample_period = kvm_pmu_get_sample_period(pmc);
-
- event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
- if (IS_ERR(event)) {
- pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
- return PTR_ERR(event);
+ if (is_fevent) {
+ if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
+ kvpmu->fw_event[event_code].started = true;
+ } else {
+ ret = kvm_pmu_create_perf_event(pmc, ctr_idx, &attr, flag, eidx, evtdata);
+ if (ret)
+ return ret;
}
set_bit(ctr_idx, kvpmu->pmc_in_use);
- pmc->perf_event = event;
- if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
- perf_event_enable(pmc->perf_event);
-
+ pmc->event_idx = eidx;
edata->out_val = ctr_idx;
out:
edata->err_val = sbiret;
@@ -489,6 +557,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
kvpmu->num_hw_ctrs = num_hw_ctrs;
kvpmu->num_fw_ctrs = num_fw_ctrs;
+ memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
/*
* There is no correlation between the logical hardware counter and virtual counters.
@@ -502,6 +571,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
continue;
pmc = &kvpmu->pmc[i];
pmc->idx = i;
+ pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
if (i < kvpmu->num_hw_ctrs) {
kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
if (i < 3)
@@ -540,8 +610,10 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
pmc = &kvpmu->pmc[i];
pmc->counter_val = 0;
kvm_pmu_release_perf_event(pmc);
+ pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
}
bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
+ memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
}
void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
--
2.25.1
KVM now supports firmware events. Invoke the firmware event increment
function from the appropriate places.
Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/tlb.c | 4 ++++
arch/riscv/kvm/vcpu_sbi_replace.c | 7 +++++++
2 files changed, 11 insertions(+)
diff --git a/arch/riscv/kvm/tlb.c b/arch/riscv/kvm/tlb.c
index 309d79b..b797f7c 100644
--- a/arch/riscv/kvm/tlb.c
+++ b/arch/riscv/kvm/tlb.c
@@ -181,6 +181,7 @@ void kvm_riscv_local_tlb_sanitize(struct kvm_vcpu *vcpu)
void kvm_riscv_fence_i_process(struct kvm_vcpu *vcpu)
{
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_FENCE_I_RCVD);
local_flush_icache_all();
}
@@ -264,15 +265,18 @@ void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu)
d.addr, d.size, d.order);
break;
case KVM_RISCV_HFENCE_VVMA_ASID_GVA:
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
kvm_riscv_local_hfence_vvma_asid_gva(
READ_ONCE(v->vmid), d.asid,
d.addr, d.size, d.order);
break;
case KVM_RISCV_HFENCE_VVMA_ASID_ALL:
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
kvm_riscv_local_hfence_vvma_asid_all(
READ_ONCE(v->vmid), d.asid);
break;
case KVM_RISCV_HFENCE_VVMA_GVA:
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_RCVD);
kvm_riscv_local_hfence_vvma_gva(
READ_ONCE(v->vmid),
d.addr, d.size, d.order);
diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
index abeb55f..71a671e 100644
--- a/arch/riscv/kvm/vcpu_sbi_replace.c
+++ b/arch/riscv/kvm/vcpu_sbi_replace.c
@@ -11,6 +11,7 @@
#include <linux/kvm_host.h>
#include <asm/sbi.h>
#include <asm/kvm_vcpu_timer.h>
+#include <asm/kvm_vcpu_pmu.h>
#include <asm/kvm_vcpu_sbi.h>
static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
@@ -25,6 +26,7 @@ static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
return 0;
}
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_SET_TIMER);
#if __riscv_xlen == 32
next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
#else
@@ -57,6 +59,7 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
return 0;
}
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_IPI_SENT);
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
if (hbase != -1UL) {
if (tmp->vcpu_id < hbase)
@@ -67,6 +70,7 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
ret = kvm_riscv_vcpu_set_interrupt(tmp, IRQ_VS_SOFT);
if (ret < 0)
break;
+ kvm_riscv_vcpu_pmu_incr_fw(tmp, SBI_PMU_FW_IPI_RECVD);
}
return ret;
@@ -90,6 +94,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
switch (funcid) {
case SBI_EXT_RFENCE_REMOTE_FENCE_I:
kvm_riscv_fence_i(vcpu->kvm, hbase, hmask);
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_FENCE_I_SENT);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
if (cp->a2 == 0 && cp->a3 == 0)
@@ -97,6 +102,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
else
kvm_riscv_hfence_vvma_gva(vcpu->kvm, hbase, hmask,
cp->a2, cp->a3, PAGE_SHIFT);
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_SENT);
break;
case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
if (cp->a2 == 0 && cp->a3 == 0)
@@ -107,6 +113,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
hbase, hmask,
cp->a2, cp->a3,
PAGE_SHIFT, cp->a4);
+ kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_SENT);
break;
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
--
2.25.1
Yo Atish,
On Fri, Jan 27, 2023 at 10:25:47AM -0800, Atish Patra wrote:
> This patch fixes/improves a few minor things in the SBI PMU extension
> definition.
>
> 1. Align all the firmware event names.
> @@ -171,7 +171,7 @@ enum sbi_pmu_fw_generic_events_t {
> SBI_PMU_FW_IPI_RECVD = 7,
> - SBI_PMU_FW_FENCE_I_RECVD = 9,
> + SBI_PMU_FW_FENCE_I_RCVD = 9,
> SBI_PMU_FW_SFENCE_VMA_RCVD = 11,
Alignment looks incomplete to me! Looks like you went from 2 RECVD and
1 RCVD to 2 RCVD and 1 RECVD! FWIW, the spec uses RECEIVED for all of
these:
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc#114-event-firmware-events-type-15
Thanks,
Conor.
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> The KVM module needs to know how many hardware counters the platform
> supports and how wide they are. Otherwise, it cannot expose an optimal
> number of virtual counters to the guest. For simplicity, the virtual
> hardware counters also need to have the same width as the logical
> hardware counters. However, there should be no mapping between virtual
> hardware counters and logical hardware counters. As we don't support
> heterogeneous harts or counters with different widths as of now, the
> implementation relies on the counter width of the first available
> programmable counter.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> drivers/perf/riscv_pmu_sbi.c | 37 ++++++++++++++++++++++++++++++++--
> include/linux/perf/riscv_pmu.h | 3 +++
> 2 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index f6507ef..6b53adc 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -44,7 +44,7 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
> };
>
> /*
> - * RISC-V doesn't have hetergenous harts yet. This need to be part of
> + * RISC-V doesn't have heterogeneous harts yet. This need to be part of
> * per_cpu in case of harts with different pmu counters
> */
> static union sbi_pmu_ctr_info *pmu_ctr_list;
> @@ -52,6 +52,9 @@ static bool riscv_pmu_use_irq;
> static unsigned int riscv_pmu_irq_num;
> static unsigned int riscv_pmu_irq;
>
> +/* Cache the available counters in a bitmask */
> +static unsigned long cmask;
> +
> struct sbi_pmu_event_data {
> union {
> union {
> @@ -267,6 +270,37 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
> return (info->type == SBI_PMU_CTR_TYPE_FW) ? true : false;
> }
>
> +/*
> + * Returns the counter width of a programmable counter and number of hardware
> + * counters. As we don't support heterogeneous CPUs yet, it is okay to just
> + * return the counter width of the first programmable counter.
> + */
> +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> +{
> + int i;
> + union sbi_pmu_ctr_info *info;
> + u32 hpm_width = 0, hpm_count = 0;
> +
> + if (!cmask)
> + return -EINVAL;
> +
> + for_each_set_bit(i, &cmask, RISCV_MAX_COUNTERS) {
> + info = &pmu_ctr_list[i];
> + if (!info)
> + continue;
> + if (!hpm_width && info->csr != CSR_CYCLE && info->csr != CSR_INSTRET)
> + hpm_width = info->width;
> + if (info->type == SBI_PMU_CTR_TYPE_HW)
> + hpm_count++;
> + }
> +
> + *hw_ctr_width = hpm_width;
> + *num_hw_ctr = hpm_count;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);
> +
> static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -812,7 +846,6 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
> static int pmu_sbi_device_probe(struct platform_device *pdev)
> {
> struct riscv_pmu *pmu = NULL;
> - unsigned long cmask = 0;
> int ret = -ENODEV;
> int num_counters;
>
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index e17e86a..a1c3f77 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -73,6 +73,9 @@ void riscv_pmu_legacy_skip_init(void);
> static inline void riscv_pmu_legacy_skip_init(void) {};
> #endif
> struct riscv_pmu *riscv_pmu_alloc(void);
> +#ifdef CONFIG_RISCV_PMU_SBI
> +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
> +#endif
>
> #endif /* CONFIG_RISCV_PMU */
>
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> Currently, the host driver has no way to identify whether the
> requested perf event is from KVM or bare metal. As KVM runs in HS
> mode, there is no separate hypervisor privilege mode to distinguish
> between the attributes for guest and host.
>
> Improve the privilege mode filtering by using the event-specific
> config1 field.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> drivers/perf/riscv_pmu_sbi.c | 27 ++++++++++++++++++++++-----
> include/linux/perf/riscv_pmu.h | 2 ++
> 2 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 6b53adc..e862b13 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -301,6 +301,27 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> }
> EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);
>
> +static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
> +{
> + unsigned long cflags = 0;
> + bool guest_events = false;
> +
> + if (event->attr.config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS)
> + guest_events = true;
> + if (event->attr.exclude_kernel)
> + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_user)
> + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
> + if (guest_events && event->attr.exclude_hv)
> + cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_host)
> + cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_guest)
> + cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;
> +
> + return cflags;
> +}
> +
> static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -311,11 +332,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> uint64_t cbase = 0;
> unsigned long cflags = 0;
>
> - if (event->attr.exclude_kernel)
> - cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> - if (event->attr.exclude_user)
> - cflags |= SBI_PMU_CFG_FLAG_SET_UINH;
> -
> + cflags = pmu_sbi_get_filter_flags(event);
> /* retrieve the available counter index */
> #if defined(CONFIG_32BIT)
> ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index a1c3f77..1c42146 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -26,6 +26,8 @@
>
> #define RISCV_PMU_STOP_FLAG_RESET 1
>
> +#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1
For consistency with the other defines in this header:
s/RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS/RISCV_PMU_CONFIG1_GUEST_EVENTS/
> +
> struct cpu_hw_events {
> /* currently enabled events */
> int n_events;
> --
> 2.25.1
>
Otherwise, it looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> Currently, the probe function just checks whether an SBI extension is
> registered or not. However, an extension may not want to advertise
> itself depending on some other condition.
> An additional extension-specific probe function will allow
> extensions to decide whether they want to be advertised to the caller.
> Any extension that does not require additional dependency checks
> can skip implementing this function.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 +++
> arch/riscv/kvm/vcpu_sbi_base.c | 13 +++++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> index f79478a..45ba341 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> @@ -29,6 +29,9 @@ struct kvm_vcpu_sbi_extension {
> int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
> unsigned long *out_val, struct kvm_cpu_trap *utrap,
> bool *exit);
> +
> + /* Extension specific probe function */
> + unsigned long (*probe)(struct kvm_vcpu *vcpu);
> };
>
> void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
> diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
> index 5d65c63..846d518 100644
> --- a/arch/riscv/kvm/vcpu_sbi_base.c
> +++ b/arch/riscv/kvm/vcpu_sbi_base.c
> @@ -19,6 +19,7 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> {
> int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> + const struct kvm_vcpu_sbi_extension *sbi_ext;
>
> switch (cp->a6) {
> case SBI_EXT_BASE_GET_SPEC_VERSION:
> @@ -43,8 +44,16 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> */
> kvm_riscv_vcpu_sbi_forward(vcpu, run);
> *exit = true;
> - } else
> - *out_val = kvm_vcpu_sbi_find_ext(cp->a0) ? 1 : 0;
> + } else {
> + sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
> + if (sbi_ext) {
> + if (sbi_ext->probe)
> + *out_val = sbi_ext->probe(vcpu);
> + else
> + *out_val = 1;
> + } else
> + *out_val = 0;
> + }
> break;
> case SBI_EXT_BASE_GET_MVENDORID:
> *out_val = vcpu->arch.mvendorid;
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> According to the SBI specification, the stop function can only
> return the error code SBI_ERR_FAILED. However, it currently returns
> -EINVAL, which is mapped to SBI_ERR_INVALID_PARAM.
>
> Return a Linux error code that maps to SBI_ERR_FAILED, i.e. one that
> doesn't map to any other SBI error code. While EACCES is not the best
> error code to describe the situation, it is close enough and will be
> replaced with SBI error codes directly anyway.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/vcpu_sbi_hsm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
> index 2e915ca..619ac0f 100644
> --- a/arch/riscv/kvm/vcpu_sbi_hsm.c
> +++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
> @@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
> {
> if (vcpu->arch.power_off)
> - return -EINVAL;
> + return -EACCES;
>
> kvm_riscv_vcpu_power_off(vcpu);
>
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> Currently, the SBI extension handler is expected to return a Linux error
> code. The top SBI layer converts the Linux error code to an SBI-specific
> error code that can be returned to the guest invoking the SBI call. This
> model works as long as there is a 1-to-1 mapping between the two sets of
> error codes. However, that may not always be true. This patch attempts to
> decouple the two by allowing the SBI extension implementation to
> return SBI-specific error codes as well.
>
> The extension will continue to return a Linux error code, which
> will indicate any problem *with* the extension emulation, while the
> SBI-specific error will indicate the problem *of* the emulation.
>
> Suggested-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/include/asm/kvm_vcpu_sbi.h | 10 ++++--
> arch/riscv/kvm/vcpu_sbi.c | 46 ++++++++------------------
> arch/riscv/kvm/vcpu_sbi_base.c | 38 ++++++++++------------
> arch/riscv/kvm/vcpu_sbi_hsm.c | 29 +++++++++--------
> arch/riscv/kvm/vcpu_sbi_replace.c | 47 ++++++++++++++-------------
> arch/riscv/kvm/vcpu_sbi_v01.c | 11 +++----
> 6 files changed, 84 insertions(+), 97 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> index 45ba341..38407b3 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> @@ -18,6 +18,12 @@ struct kvm_vcpu_sbi_context {
> int return_handled;
> };
>
> +struct kvm_vcpu_sbi_ext_data {
s/kvm_vcpu_sbi_ext_data/kvm_vcpu_sbi_return/
> + unsigned long out_val;
> + unsigned long err_val;
Add "struct kvm_cpu_trap utrap" here.
> + bool uexit;
> +};
> +
> struct kvm_vcpu_sbi_extension {
> unsigned long extid_start;
> unsigned long extid_end;
> @@ -27,8 +33,8 @@ struct kvm_vcpu_sbi_extension {
> * specific error codes.
> */
> int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val, struct kvm_cpu_trap *utrap,
> - bool *exit);
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap);
>
> /* Extension specific probe function */
> unsigned long (*probe)(struct kvm_vcpu *vcpu);
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> index f96991d..aa42da6 100644
> --- a/arch/riscv/kvm/vcpu_sbi.c
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -12,26 +12,6 @@
> #include <asm/sbi.h>
> #include <asm/kvm_vcpu_sbi.h>
>
> -static int kvm_linux_err_map_sbi(int err)
> -{
> - switch (err) {
> - case 0:
> - return SBI_SUCCESS;
> - case -EPERM:
> - return SBI_ERR_DENIED;
> - case -EINVAL:
> - return SBI_ERR_INVALID_PARAM;
> - case -EFAULT:
> - return SBI_ERR_INVALID_ADDRESS;
> - case -EOPNOTSUPP:
> - return SBI_ERR_NOT_SUPPORTED;
> - case -EALREADY:
> - return SBI_ERR_ALREADY_AVAILABLE;
> - default:
> - return SBI_ERR_FAILURE;
> - };
> -}
> -
> #ifndef CONFIG_RISCV_SBI_V01
> static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
> .extid_start = -1UL,
> @@ -125,11 +105,10 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> int ret = 1;
> bool next_sepc = true;
> - bool userspace_exit = false;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> const struct kvm_vcpu_sbi_extension *sbi_ext;
> struct kvm_cpu_trap utrap = { 0 };
Remove "struct kvm_cpu_trap utrap" from here since it can be
part of "struct kvm_vcpu_sbi_return"
> - unsigned long out_val = 0;
> + struct kvm_vcpu_sbi_ext_data edata_out = { 0 };
> bool ext_is_v01 = false;
>
> sbi_ext = kvm_vcpu_sbi_find_ext(cp->a7);
> @@ -139,13 +118,22 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> cp->a7 <= SBI_EXT_0_1_SHUTDOWN)
> ext_is_v01 = true;
> #endif
> - ret = sbi_ext->handler(vcpu, run, &out_val, &utrap, &userspace_exit);
> + ret = sbi_ext->handler(vcpu, run, &edata_out, &utrap);
> } else {
> /* Return error for unsupported SBI calls */
> cp->a0 = SBI_ERR_NOT_SUPPORTED;
> goto ecall_done;
> }
>
> + /*
> + * When the SBI extension returns a Linux error code, it exits the ioctl
> + * loop and forwards the error to userspace.
> + */
> + if (ret < 0) {
> + next_sepc = false;
> + goto ecall_done;
> + }
> +
> /* Handle special error cases i.e trap, exit or userspace forward */
> if (utrap.scause) {
> /* No need to increment sepc or exit ioctl loop */
> @@ -157,24 +145,18 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> /* Exit ioctl loop or Propagate the error code the guest */
> - if (userspace_exit) {
> + if (edata_out.uexit) {
> next_sepc = false;
> ret = 0;
> } else {
> - /**
> - * SBI extension handler always returns an Linux error code. Convert
> - * it to the SBI specific error code that can be propagated the SBI
> - * caller.
> - */
> - ret = kvm_linux_err_map_sbi(ret);
> - cp->a0 = ret;
> + cp->a0 = edata_out.err_val;
> ret = 1;
> }
> ecall_done:
> if (next_sepc)
> cp->sepc += 4;
> if (!ext_is_v01)
> - cp->a1 = out_val;
> + cp->a1 = edata_out.out_val;
Strange! This now updates the "a1" register when ret < 0, which it should not.
Ideally, the "a1" register should only be updated when "ret == 1".
>
> return ret;
> }
> diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
> index 846d518..84885e5 100644
> --- a/arch/riscv/kvm/vcpu_sbi_base.c
> +++ b/arch/riscv/kvm/vcpu_sbi_base.c
> @@ -14,24 +14,23 @@
> #include <asm/kvm_vcpu_sbi.h>
>
> static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *trap, bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *trap)
> {
> - int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> const struct kvm_vcpu_sbi_extension *sbi_ext;
>
> switch (cp->a6) {
> case SBI_EXT_BASE_GET_SPEC_VERSION:
> - *out_val = (KVM_SBI_VERSION_MAJOR <<
> + edata->out_val = (KVM_SBI_VERSION_MAJOR <<
> SBI_SPEC_VERSION_MAJOR_SHIFT) |
> KVM_SBI_VERSION_MINOR;
> break;
> case SBI_EXT_BASE_GET_IMP_ID:
> - *out_val = KVM_SBI_IMPID;
> + edata->out_val = KVM_SBI_IMPID;
> break;
> case SBI_EXT_BASE_GET_IMP_VERSION:
> - *out_val = LINUX_VERSION_CODE;
> + edata->out_val = LINUX_VERSION_CODE;
> break;
> case SBI_EXT_BASE_PROBE_EXT:
> if ((cp->a0 >= SBI_EXT_EXPERIMENTAL_START &&
> @@ -43,33 +42,33 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> * forward it to the userspace
> */
> kvm_riscv_vcpu_sbi_forward(vcpu, run);
> - *exit = true;
> + edata->uexit = true;
> } else {
> sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
> if (sbi_ext) {
> if (sbi_ext->probe)
> - *out_val = sbi_ext->probe(vcpu);
> + edata->out_val = sbi_ext->probe(vcpu);
> else
> - *out_val = 1;
> + edata->out_val = 1;
> } else
> - *out_val = 0;
> + edata->out_val = 0;
> }
> break;
> case SBI_EXT_BASE_GET_MVENDORID:
> - *out_val = vcpu->arch.mvendorid;
> + edata->out_val = vcpu->arch.mvendorid;
> break;
> case SBI_EXT_BASE_GET_MARCHID:
> - *out_val = vcpu->arch.marchid;
> + edata->out_val = vcpu->arch.marchid;
> break;
> case SBI_EXT_BASE_GET_MIMPID:
> - *out_val = vcpu->arch.mimpid;
> + edata->out_val = vcpu->arch.mimpid;
> break;
> default:
> - ret = -EOPNOTSUPP;
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> break;
> }
>
> - return ret;
> + return 0;
> }
>
> const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
> @@ -79,17 +78,16 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
> };
>
> static int kvm_sbi_ext_forward_handler(struct kvm_vcpu *vcpu,
> - struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap,
> - bool *exit)
> + struct kvm_run *run,
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> /*
> * Both SBI experimental and vendor extensions are
> * unconditionally forwarded to userspace.
> */
> kvm_riscv_vcpu_sbi_forward(vcpu, run);
> - *exit = true;
> + edata->uexit = true;
> return 0;
> }
>
> diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
> index 619ac0f..5fb526c 100644
> --- a/arch/riscv/kvm/vcpu_sbi_hsm.c
> +++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
> @@ -21,9 +21,9 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
>
> target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
> if (!target_vcpu)
> - return -EINVAL;
> + return SBI_ERR_INVALID_PARAM;
> if (!target_vcpu->arch.power_off)
> - return -EALREADY;
> + return SBI_ERR_ALREADY_AVAILABLE;
>
> reset_cntx = &target_vcpu->arch.guest_reset_context;
> /* start address */
> @@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
> {
> if (vcpu->arch.power_off)
> - return -EACCES;
> + return SBI_ERR_FAILURE;
>
> kvm_riscv_vcpu_power_off(vcpu);
>
> @@ -57,7 +57,7 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
>
> target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
> if (!target_vcpu)
> - return -EINVAL;
> + return SBI_ERR_INVALID_PARAM;
> if (!target_vcpu->arch.power_off)
> return SBI_HSM_STATE_STARTED;
> else if (vcpu->stat.generic.blocking)
> @@ -67,9 +67,8 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
> }
>
> static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap,
> - bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> @@ -88,27 +87,29 @@ static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> case SBI_EXT_HSM_HART_STATUS:
> ret = kvm_sbi_hsm_vcpu_get_status(vcpu);
> if (ret >= 0) {
> - *out_val = ret;
> - ret = 0;
> + edata->out_val = ret;
> + edata->err_val = 0;
> }
> - break;
> + return 0;
> case SBI_EXT_HSM_HART_SUSPEND:
> switch (cp->a0) {
> case SBI_HSM_SUSPEND_RET_DEFAULT:
> kvm_riscv_vcpu_wfi(vcpu);
> break;
> case SBI_HSM_SUSPEND_NON_RET_DEFAULT:
> - ret = -EOPNOTSUPP;
> + ret = SBI_ERR_NOT_SUPPORTED;
> break;
> default:
> - ret = -EINVAL;
> + ret = SBI_ERR_INVALID_PARAM;
> }
> break;
> default:
> - ret = -EOPNOTSUPP;
> + ret = SBI_ERR_NOT_SUPPORTED;
> }
>
> - return ret;
> + edata->err_val = ret;
> +
> + return 0;
> }
>
> const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm = {
> diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
> index 03a0198..abeb55f 100644
> --- a/arch/riscv/kvm/vcpu_sbi_replace.c
> +++ b/arch/riscv/kvm/vcpu_sbi_replace.c
> @@ -14,15 +14,16 @@
> #include <asm/kvm_vcpu_sbi.h>
>
> static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap, bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> - int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> u64 next_cycle;
>
> - if (cp->a6 != SBI_EXT_TIME_SET_TIMER)
> - return -EINVAL;
> + if (cp->a6 != SBI_EXT_TIME_SET_TIMER) {
> + edata->err_val = SBI_ERR_INVALID_PARAM;
> + return 0;
> + }
>
> #if __riscv_xlen == 32
> next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> @@ -31,7 +32,7 @@ static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> #endif
> kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
>
> - return ret;
> + return 0;
> }
>
> const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
> @@ -41,8 +42,8 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
> };
>
> static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap, bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> int ret = 0;
> unsigned long i;
> @@ -51,8 +52,10 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> unsigned long hmask = cp->a0;
> unsigned long hbase = cp->a1;
>
> - if (cp->a6 != SBI_EXT_IPI_SEND_IPI)
> - return -EINVAL;
> + if (cp->a6 != SBI_EXT_IPI_SEND_IPI) {
> + edata->err_val = SBI_ERR_INVALID_PARAM;
> + return 0;
> + }
>
> kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> if (hbase != -1UL) {
> @@ -76,10 +79,9 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_ipi = {
> };
>
> static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap, bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> - int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> unsigned long hmask = cp->a0;
> unsigned long hbase = cp->a1;
> @@ -116,10 +118,10 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
> */
> break;
> default:
> - ret = -EOPNOTSUPP;
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> }
>
> - return ret;
> + return 0;
> }
>
> const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
> @@ -130,14 +132,13 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
>
> static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
> struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap, bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> unsigned long funcid = cp->a6;
> u32 reason = cp->a1;
> u32 type = cp->a0;
> - int ret = 0;
>
> switch (funcid) {
> case SBI_EXT_SRST_RESET:
> @@ -146,24 +147,24 @@ static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
> kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> KVM_SYSTEM_EVENT_SHUTDOWN,
> reason);
> - *exit = true;
> + edata->uexit = true;
> break;
> case SBI_SRST_RESET_TYPE_COLD_REBOOT:
> case SBI_SRST_RESET_TYPE_WARM_REBOOT:
> kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> KVM_SYSTEM_EVENT_RESET,
> reason);
> - *exit = true;
> + edata->uexit = true;
> break;
> default:
> - ret = -EOPNOTSUPP;
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> }
> break;
> default:
> - ret = -EOPNOTSUPP;
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> }
>
> - return ret;
> + return 0;
> }
>
> const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_srst = {
> diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
> index 489f225..c0ccc58 100644
> --- a/arch/riscv/kvm/vcpu_sbi_v01.c
> +++ b/arch/riscv/kvm/vcpu_sbi_v01.c
> @@ -14,9 +14,8 @@
> #include <asm/kvm_vcpu_sbi.h>
>
> static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> - unsigned long *out_val,
> - struct kvm_cpu_trap *utrap,
> - bool *exit)
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> {
> ulong hmask;
> int i, ret = 0;
> @@ -33,7 +32,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> * handled in kernel so we forward these to user-space
> */
> kvm_riscv_vcpu_sbi_forward(vcpu, run);
> - *exit = true;
> + edata->uexit = true;
> break;
> case SBI_EXT_0_1_SET_TIMER:
> #if __riscv_xlen == 32
> @@ -65,7 +64,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> case SBI_EXT_0_1_SHUTDOWN:
> kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> KVM_SYSTEM_EVENT_SHUTDOWN, 0);
> - *exit = true;
> + edata->uexit = true;
> break;
> case SBI_EXT_0_1_REMOTE_FENCE_I:
> case SBI_EXT_0_1_REMOTE_SFENCE_VMA:
> @@ -103,7 +102,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> }
> break;
> default:
> - ret = -EINVAL;
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> break;
> }
>
> --
> 2.25.1
>
Regards,
Anup
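The handler convention this patch moves to can be modeled outside the kernel. The sketch below is a minimal user-space illustration, not KVM code. `struct ext_data`, `struct regs`, `demo_time_handler()` and `demo_dispatch()` are hypothetical stand-ins for `struct kvm_vcpu_sbi_ext_data`, the guest context and the real ecall path.

In this model:
- Guest-visible failures are reported as SBI error codes through `err_val`. The error values come from the SBI specification.
- The C return value is reserved for host-side errors.
- The dispatcher's handling of `uexit` and negative returns is a simplification assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* SBI error codes as defined by the RISC-V SBI specification. */
#define SBI_SUCCESS             0
#define SBI_ERR_FAILED         -1
#define SBI_ERR_NOT_SUPPORTED  -2
#define SBI_ERR_INVALID_PARAM  -3

/* Hypothetical stand-in for struct kvm_vcpu_sbi_ext_data. */
struct ext_data {
	unsigned long out_val; /* value returned to the guest in a1 */
	long err_val;          /* SBI error code returned to the guest in a0 */
	bool uexit;            /* exit to user space instead of returning */
};

/* Hypothetical guest return registers. */
struct regs {
	long a0;
	unsigned long a1;
};

/* Hypothetical function ID, for illustration only. */
#define DEMO_FID_SET_TIMER 0

/*
 * Guest-visible failures go into edata->err_val as SBI error codes;
 * the C return value stays 0 unless the host itself failed.
 */
static int demo_time_handler(unsigned long funcid, struct ext_data *edata)
{
	if (funcid != DEMO_FID_SET_TIMER) {
		edata->err_val = SBI_ERR_INVALID_PARAM;
		return 0;
	}
	edata->err_val = SBI_SUCCESS;
	return 0;
}

/* Simplified dispatcher: copy err_val/out_val back unless exiting. */
static int demo_dispatch(unsigned long funcid, struct regs *r)
{
	struct ext_data edata = { 0 };
	int ret = demo_time_handler(funcid, &edata);

	if (ret < 0)
		return ret; /* host-side error: propagate, do not touch a0/a1 */
	if (!edata.uexit) {
		r->a0 = edata.err_val;
		r->a1 = edata.out_val;
	}
	return 0;
}
```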
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> This patch only adds barebore structure of perf implementation. Most of
s/barebore/barebone/
> the functions return zero at this point and will be implemented
> fully in the future.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/include/asm/kvm_host.h | 3 +
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> arch/riscv/kvm/Makefile | 1 +
> arch/riscv/kvm/vcpu.c | 5 +
> arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
> 5 files changed, 230 insertions(+)
> create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> create mode 100644 arch/riscv/kvm/vcpu_pmu.c
>
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 93f43a3..f9874b4 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -18,6 +18,7 @@
> #include <asm/kvm_vcpu_insn.h>
> #include <asm/kvm_vcpu_sbi.h>
> #include <asm/kvm_vcpu_timer.h>
> +#include <asm/kvm_vcpu_pmu.h>
>
> #define KVM_MAX_VCPUS 1024
>
> @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
>
> /* Don't run the VCPU (blocked) */
> bool pause;
> +
> + struct kvm_pmu pmu;
Add a single-line comment, as for the other members of the structure.
I also suggest naming this variable "pmu_context" or something similar
for naming consistency.
> };
>
> static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> new file mode 100644
> index 0000000..3f43a43
> --- /dev/null
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#ifndef __KVM_VCPU_RISCV_PMU_H
> +#define __KVM_VCPU_RISCV_PMU_H
> +
> +#include <linux/perf/riscv_pmu.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +#include <asm/sbi.h>
> +
> +#ifdef CONFIG_RISCV_PMU_SBI
> +#define RISCV_KVM_MAX_FW_CTRS 32
> +#define RISCV_MAX_COUNTERS 64
> +
> +/* Per virtual pmu counter data */
> +struct kvm_pmc {
> + u8 idx;
> + struct perf_event *perf_event;
> + uint64_t counter_val;
> + union sbi_pmu_ctr_info cinfo;
> + /* Event monitoring status */
> + bool started;
> +};
> +
> +/* PMU data structure per vcpu */
> +struct kvm_pmu {
> + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> + /* Number of the virtual firmware counters available */
> + int num_fw_ctrs;
> + /* Number of the virtual hardware counters available */
> + int num_hw_ctrs;
> + /* A flag to indicate that pmu initialization is done */
> + bool init_done;
> + /* Bitmap of all the virtual counters in use */
> + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> +};
> +
> +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> +
> +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + unsigned long eidx, uint64_t evtdata,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> +
> +#else
> +struct kvm_pmu {
> +};
> +
> +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> +{
> + return 0;
> +}
> +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> +#endif /* CONFIG_RISCV_PMU_SBI */
> +#endif /* !__KVM_VCPU_RISCV_PMU_H */
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 019df920..5de1053 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> kvm-y += vcpu_sbi_replace.o
> kvm-y += vcpu_sbi_hsm.o
> kvm-y += vcpu_timer.o
> +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 7c08567..b746f21 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>
> WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
Add an empty line here.
> + kvm_riscv_vcpu_pmu_reset(vcpu);
>
> vcpu->arch.hfence_head = 0;
> vcpu->arch.hfence_tail = 0;
> @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> /* Setup VCPU timer */
> kvm_riscv_vcpu_timer_init(vcpu);
>
> + /* setup performance monitoring */
> + kvm_riscv_vcpu_pmu_init(vcpu);
> +
> /* Reset VCPU */
> kvm_riscv_reset_vcpu(vcpu);
>
> @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> /* Cleanup VCPU timer */
> kvm_riscv_vcpu_timer_deinit(vcpu);
>
> + kvm_riscv_vcpu_pmu_deinit(vcpu);
Add an empty line here.
> /* Free unused pages pre-allocated for G-stage page table mappings */
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> }
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> new file mode 100644
> index 0000000..d3fd551
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2023 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/perf/riscv_pmu.h>
> +#include <asm/csr.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +#include <asm/kvm_vcpu_pmu.h>
> +#include <linux/kvm_host.h>
> +
> +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> +
> +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + edata->out_val = kvm_pmu_num_counters(kvpmu);
> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + if (cidx > RISCV_MAX_COUNTERS || cidx == 1) {
> + edata->err_val = SBI_ERR_INVALID_PARAM;
> + return 0;
> + }
> +
> + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + unsigned long eidx, uint64_t evtdata,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> +{
> + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_pmc *pmc;
> +
> + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> + if (ret < 0)
> + return ret;
> +
> + if (!hpm_width || !num_hw_ctrs) {
> + pr_err("Cannot initialize VCPU with NULL hpmcounter width or number of counters\n");
What will happen if the underlying M-mode firmware does not implement
the SBI PMU extension?
> + return -EINVAL;
> + }
> +
> + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
> + pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
How is this possible?
The maximum number of HW counters is 32 (including time, cycle, and instret),
RISCV_KVM_MAX_FW_CTRS = 32, and
RISCV_MAX_COUNTERS = 64, so the sum can never exceed the limit.
> + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> + } else
> + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> +
> + kvpmu->num_hw_ctrs = num_hw_ctrs;
> + kvpmu->num_fw_ctrs = num_fw_ctrs;
> +
> + /*
> + * There is no correlation between the logical hardware counter and virtual counters.
> + * However, we need to encode a hpmcounter CSR in the counter info field so that
> + * KVM can trap n emulate the read. This works well in the migration use case as
> + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> + */
> + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> + /* TIME CSR shouldn't be read from perf interface */
> + if (i == 1)
> + continue;
> + pmc = &kvpmu->pmc[i];
> + pmc->idx = i;
> + if (i < kvpmu->num_hw_ctrs) {
> + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> + if (i < 3)
> + /* CY, IR counters */
> + kvpmu->pmc[i].cinfo.width = 63;
> + else
> + kvpmu->pmc[i].cinfo.width = hpm_width;
> + /*
> + * The CSR number doesn't have any relation with the logical
> + * hardware counters. The CSR numbers are encoded sequentially
> + * to avoid maintaining a map between the virtual counter
> + * and CSR number.
> + */
> + pmc->cinfo.csr = CSR_CYCLE + i;
> + } else {
> + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> + pmc->cinfo.width = BITS_PER_LONG - 1;
> + }
> + }
> +
> + kvpmu->init_done = true;
> +
> + return 0;
> +}
> +
> +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> +{
> + /* TODO */
> +}
> +
> +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> +{
> + kvm_riscv_vcpu_pmu_deinit(vcpu);
> +}
> --
> 2.25.1
>
Regards,
Anup
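The counter layout that `kvm_riscv_vcpu_pmu_init()` builds can be sketched in user space. The sketch performs the same index-to-CSR assignment. `struct ctr_info`, `demo_layout()` and the `CTR_TYPE_*` values are hypothetical. `CSR_CYCLE` is 0xC00, and the other limits are copied from the patch.

The sketch also makes the point above concrete. With at most 32 hardware counters and 32 firmware counters, the total never exceeds `RISCV_MAX_COUNTERS`. The limiting branch would only run if more than 32 hardware counters were ever reported.

```c
#include <assert.h>
#include <limits.h>

#define CSR_CYCLE             0xc00
#define RISCV_KVM_MAX_FW_CTRS 32
#define RISCV_MAX_COUNTERS    64
#define CTR_TYPE_HW 0 /* hypothetical type values */
#define CTR_TYPE_FW 1
#define DEMO_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Hypothetical per-counter info (csr == 0 means "no CSR"). */
struct ctr_info {
	unsigned int csr;
	unsigned int width;
	unsigned int type;
};

/* Fill info[] like the patch does; returns the total number of counters. */
static int demo_layout(struct ctr_info *info, int num_hw_ctrs,
		       unsigned int hpm_width)
{
	int num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
	int i;

	if (num_hw_ctrs + num_fw_ctrs > RISCV_MAX_COUNTERS)
		num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;

	for (i = 0; i < num_hw_ctrs + num_fw_ctrs; i++) {
		if (i == 1) /* index 1 is TIME: never exposed through perf */
			continue;
		if (i < num_hw_ctrs) {
			info[i].type = CTR_TYPE_HW;
			/* CY (0) and IR (2) are fixed 64-bit counters */
			info[i].width = (i < 3) ? 63 : hpm_width;
			/* CSR numbers are assigned sequentially from CSR_CYCLE */
			info[i].csr = CSR_CYCLE + i;
		} else {
			info[i].type = CTR_TYPE_FW;
			info[i].width = DEMO_BITS_PER_LONG - 1;
		}
	}
	return num_hw_ctrs + num_fw_ctrs;
}
```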
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> SBI PMU extension allows KVM guests to configure/start/stop/query
> the PMU counters in a virtualized environment as well.
>
> In order to allow that, KVM implements the entire SBI PMU extension.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/Makefile | 2 +-
> arch/riscv/kvm/vcpu_sbi.c | 11 +++++
> arch/riscv/kvm/vcpu_sbi_pmu.c | 86 +++++++++++++++++++++++++++++++++++
> 3 files changed, 98 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c
>
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 5de1053..278e97c 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -25,4 +25,4 @@ kvm-y += vcpu_sbi_base.o
> kvm-y += vcpu_sbi_replace.o
> kvm-y += vcpu_sbi_hsm.o
> kvm-y += vcpu_timer.o
> -kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> index aa42da6..04a3b4b 100644
> --- a/arch/riscv/kvm/vcpu_sbi.c
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -20,6 +20,16 @@ static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
> };
> #endif
>
> +#ifdef CONFIG_RISCV_PMU_SBI
> +extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
> +#else
> +static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> + .extid_start = -1UL,
> + .extid_end = -1UL,
> + .handler = NULL,
> +};
> +#endif
> +
> static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> &vcpu_sbi_ext_v01,
> &vcpu_sbi_ext_base,
> @@ -28,6 +38,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> &vcpu_sbi_ext_rfence,
> &vcpu_sbi_ext_srst,
> &vcpu_sbi_ext_hsm,
> + &vcpu_sbi_ext_pmu,
> &vcpu_sbi_ext_experimental,
> &vcpu_sbi_ext_vendor,
> };
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> new file mode 100644
> index 0000000..73aab30
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2023 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +#include <asm/sbi.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +
> +static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> +{
> + int ret = 0;
> + struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + unsigned long funcid = cp->a6;
> + uint64_t temp;
> +
> + /* Return not supported if PMU is not initialized */
> + if (!kvpmu->init_done)
> + return -EINVAL;
> +
> + switch (funcid) {
> + case SBI_EXT_PMU_NUM_COUNTERS:
> + ret = kvm_riscv_vcpu_pmu_num_ctrs(vcpu, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_GET_INFO:
> + ret = kvm_riscv_vcpu_pmu_ctr_info(vcpu, cp->a0, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_CFG_MATCH:
> +#if defined(CONFIG_32BIT)
> + temp = ((uint64_t)cp->a5 << 32) | cp->a4;
> +#else
> + temp = cp->a4;
> +#endif
> + /*
> + * This can fail if perf core framework fails to create an event.
> + * Forward the error to user space because it is an error that happened
> + * within the host kernel. The other option would be to convert this to
> + * an SBI error and forward it to the guest.
> + */
> + ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
> + cp->a2, cp->a3, temp, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_START:
> +#if defined(CONFIG_32BIT)
> + temp = ((uint64_t)cp->a4 << 32) | cp->a3;
> +#else
> + temp = cp->a3;
> +#endif
> + ret = kvm_riscv_vcpu_pmu_ctr_start(vcpu, cp->a0, cp->a1, cp->a2,
> + temp, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_STOP:
> + ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_FW_READ:
> + ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, edata);
> + break;
> + default:
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> + }
> +
> + return ret;
> +}
> +
> +unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + return kvpmu->init_done;
> +}
> +
> +const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> + .extid_start = SBI_EXT_PMU,
> + .extid_end = SBI_EXT_PMU,
> + .handler = kvm_sbi_ext_pmu_handler,
> + .probe = kvm_sbi_ext_pmu_probe,
> +};
> --
> 2.25.1
>
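On RV32, two 64-bit arguments arrive split across two registers, low half first:
- `event_data` for COUNTER_CFG_MATCH, in a4/a5.
- `initial_value` for COUNTER_START, in a3/a4.

The `#if defined(CONFIG_32BIT)` blocks in `kvm_sbi_ext_pmu_handler()` reassemble these halves. Below is a minimal user-space sketch of that step. It assumes a hypothetical `uint32_t a[8]` array in place of the guest context. The function IDs are the SBI PMU IDs from the specification, and the `demo_*` names are hypothetical.

The cast to `uint64_t` before the shift matters, because shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* SBI PMU function IDs (passed in a6), per the SBI specification. */
enum {
	PMU_NUM_COUNTERS      = 0,
	PMU_COUNTER_GET_INFO  = 1,
	PMU_COUNTER_CFG_MATCH = 2,
	PMU_COUNTER_START     = 3,
	PMU_COUNTER_STOP      = 4,
	PMU_COUNTER_FW_READ   = 5,
};

/* Join two 32-bit register halves into one 64-bit value. */
static uint64_t demo_join_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Pick the 64-bit argument for a function from hypothetical RV32 a0..a7. */
static uint64_t demo_u64_arg(unsigned long fid, const uint32_t a[8])
{
	switch (fid) {
	case PMU_COUNTER_CFG_MATCH:
		return demo_join_u64(a[4], a[5]); /* event_data */
	case PMU_COUNTER_START:
		return demo_join_u64(a[3], a[4]); /* initial_value */
	default:
		return 0; /* no 64-bit argument */
	}
}
```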
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> The privilege mode filtering feature must be available in the host so
> that the host can inhibit the counters while the execution is in HS mode.
> Otherwise, the guests may have access to critical guest information.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/vcpu_pmu.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index d3fd551..7713927 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -79,6 +79,14 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> struct kvm_pmc *pmc;
>
> + /*
> + * PMU functionality should be only available to guests if privilege mode
> + * filtering is available in the host. Otherwise, guest will always count
> + * events while the execution is in hypervisor mode.
> + */
> + if (!riscv_isa_extension_available(NULL, SSCOFPMF))
> + return 0;
> +
> ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> if (ret < 0)
> return ret;
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> No guest should get access to any hpmcounter, including cycle/instret,
> without checks. We achieve that by disabling all the bits except the TM
> bit in hcounteren.
>
> However, instret and cycle access for guest user space can be enabled
> upon explicit request (via ONE REG) or on first trap from VU mode
> to maintain ABI requirement in the future. This patch doesn't support
> that as ONE REG interface is not settled yet.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 58c5489..c5d400f 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -49,7 +49,8 @@ int kvm_arch_hardware_enable(void)
> hideleg |= (1UL << IRQ_VS_EXT);
> csr_write(CSR_HIDELEG, hideleg);
>
> - csr_write(CSR_HCOUNTEREN, -1UL);
> + /* VS should access only the time counter directly. Everything else should trap */
> + csr_write(CSR_HCOUNTEREN, 0x02);
>
> csr_write(CSR_HVIP, 0);
>
> --
> 2.25.1
>
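The value 0x02 follows from the hcounteren layout, where bit i controls direct access to counter CSR 0xC00 + i:
- bit 0 is CY;
- bit 1 is TM;
- bit 2 is IR;
- bits 3-31 are HPM3-HPM31.

The sketch below only models the hcounteren check. It ignores scounteren and mcounteren. The `HCOUNTEREN_*` names and `demo_counter_direct()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the fixed hcounteren bits. */
#define HCOUNTEREN_CY (1u << 0)
#define HCOUNTEREN_TM (1u << 1)
#define HCOUNTEREN_IR (1u << 2)

/*
 * Nonzero if hcounteren permits a direct guest read of counter CSR
 * 0xC00 + idx; zero means the access traps to the hypervisor.
 */
static int demo_counter_direct(uint32_t hcounteren, unsigned int idx)
{
	/* idx < 32 is checked first so the shift below is always defined */
	return idx < 32 && (hcounteren & (1u << idx)) != 0;
}
```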
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> As the KVM guests only see the virtual PMU counters, all hpmcounter
> access should trap and KVM emulates the read access on behalf of guests.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 ++++++++++
> arch/riscv/kvm/vcpu_insn.c | 4 ++-
> arch/riscv/kvm/vcpu_pmu.c | 45 ++++++++++++++++++++++++++-
> 3 files changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 3f43a43..022d45d 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -43,6 +43,19 @@ struct kvm_pmu {
> #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> #define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
>
> +#if defined(CONFIG_32BIT)
> +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> +{ .base = CSR_CYCLEH, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm }, \
> +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> +#else
> +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> +#endif
> +
> +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> + unsigned long *val, unsigned long new_val,
> + unsigned long wr_mask);
> +
> int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_vcpu_sbi_ext_data *edata);
> @@ -65,6 +78,9 @@ void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> #else
> struct kvm_pmu {
> };
> +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> +{ .base = 0, .count = 0, .func = NULL },
> +
Redundant newline here.
>
> static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> index 0bb5276..f689337 100644
> --- a/arch/riscv/kvm/vcpu_insn.c
> +++ b/arch/riscv/kvm/vcpu_insn.c
> @@ -213,7 +213,9 @@ struct csr_func {
> unsigned long wr_mask);
> };
>
> -static const struct csr_func csr_funcs[] = { };
> +static const struct csr_func csr_funcs[] = {
> + KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS
> +};
>
> /**
> * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 7713927..894053a 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -17,6 +17,44 @@
>
> #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
>
> +static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> + unsigned long *out_val)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_pmc *pmc;
> + u64 enabled, running;
> +
> + pmc = &kvpmu->pmc[cidx];
> + if (!pmc->perf_event)
> + return -EINVAL;
> +
> + pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> + *out_val = pmc->counter_val;
> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> + unsigned long *val, unsigned long new_val,
> + unsigned long wr_mask)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + int cidx, ret = KVM_INSN_CONTINUE_NEXT_SEPC;
> +
> + if (!kvpmu || !kvpmu->init_done)
> + return KVM_INSN_EXIT_TO_USER_SPACE;
As discussed previously, this should be KVM_INSN_ILLEGAL_TRAP.
> +
> + if (wr_mask)
> + return KVM_INSN_ILLEGAL_TRAP;
> +
> + cidx = csr_num - CSR_CYCLE;
> +
> + if (pmu_ctr_read(vcpu, cidx, val) < 0)
> + return KVM_INSN_EXIT_TO_USER_SPACE;
Same as above.
> +
> + return ret;
> +}
> +
> int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> {
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> @@ -69,7 +107,12 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + int ret;
> +
> + ret = pmu_ctr_read(vcpu, cidx, &edata->out_val);
> + if (ret == -EINVAL)
> + edata->err_val = SBI_ERR_INVALID_PARAM;
> +
> return 0;
> }
>
> --
> 2.25.1
>
Regards,
Anup
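The `csr_funcs[]` entries added here are matched by CSR number range. Below is a user-space sketch of that lookup. It assumes a half-open match on [base, base + count). `struct demo_csr_func`, `demo_read_hpm()` and `demo_find()` are hypothetical, and the `KVM_INSN_*` return codes are not modeled.

Two consequences follow from the arithmetic:
- The `{ .base = 0, .count = 0 }` placeholder used without CONFIG_RISCV_PMU_SBI can never match.
- With `.count = 31`, the range starting at CSR_CYCLE ends at 0xC1E.

```c
#include <assert.h>
#include <stddef.h>

#define CSR_CYCLE 0xc00

typedef int (*csr_fn)(unsigned int csr_num, unsigned long *val);

/* Hypothetical mirror of struct csr_func: a handler for a CSR range. */
struct demo_csr_func {
	unsigned int base;
	unsigned int count;
	csr_fn func;
};

/* Hypothetical handler: reports the counter index instead of a count. */
static int demo_read_hpm(unsigned int csr_num, unsigned long *val)
{
	*val = csr_num - CSR_CYCLE;
	return 0;
}

static const struct demo_csr_func demo_funcs[] = {
	{ .base = 0, .count = 0, .func = NULL }, /* empty placeholder */
	{ .base = CSR_CYCLE, .count = 31, .func = demo_read_hpm },
};

/* Return the entry whose range [base, base + count) contains csr_num. */
static const struct demo_csr_func *demo_find(unsigned int csr_num)
{
	size_t i;

	for (i = 0; i < sizeof(demo_funcs) / sizeof(demo_funcs[0]); i++) {
		const struct demo_csr_func *f = &demo_funcs[i];

		if (f->base <= csr_num && csr_num < f->base + f->count)
			return f;
	}
	return NULL;
}
```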
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> RISC-V SBI PMU & Sscofpmf ISA extensions allow supporting perf in
> the virtualization environment as well. KVM implementation
> relies on SBI PMU extension for the most part while trapping
> & emulating the CSRs read for counter access.
>
> This patch doesn't have the event sampling support yet.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/kvm/vcpu_pmu.c | 366 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 360 insertions(+), 6 deletions(-)
>
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 894053a..73dccf7 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -12,10 +12,190 @@
> #include <linux/perf/riscv_pmu.h>
> #include <asm/csr.h>
> #include <asm/kvm_vcpu_sbi.h>
> +#include <asm/bitops.h>
> #include <asm/kvm_vcpu_pmu.h>
> #include <linux/kvm_host.h>
>
> #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> +#define get_event_type(x) (((x) & SBI_PMU_EVENT_IDX_TYPE_MASK) >> 16)
> +#define get_event_code(x) ((x) & SBI_PMU_EVENT_IDX_CODE_MASK)
> +
> +
Redundant newline.
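Two pieces of this patch are plain bit arithmetic:
- the `get_event_type()`/`get_event_code()` macros quoted just above;
- `kvm_pmu_get_sample_period()`, a little further down.

Below is a user-space sketch of both. It assumes the SBI event_idx layout: type in bits [19:16] and code in bits [15:0]. The `demo_*` names are hypothetical, and `demo_genmask()` stands in for the kernel's `GENMASK(width, 0)`. The period is the number of events left before a counter of width + 1 bits wraps.

```c
#include <assert.h>
#include <stdint.h>

/* SBI event_idx fields: type in [19:16], code in [15:0]. */
#define EVENT_IDX_TYPE_MASK 0xF0000u
#define EVENT_IDX_CODE_MASK 0xFFFFu

static unsigned int demo_event_type(unsigned long eidx)
{
	return (eidx & EVENT_IDX_TYPE_MASK) >> 16;
}

static unsigned int demo_event_code(unsigned long eidx)
{
	return eidx & EVENT_IDX_CODE_MASK;
}

/* Bits [width:0] set, like GENMASK(width, 0); avoids a 64-bit shift. */
static uint64_t demo_genmask(unsigned int width)
{
	return width >= 63 ? UINT64_MAX : ((UINT64_C(1) << (width + 1)) - 1);
}

/*
 * Events remaining until the counter wraps. A zero counter gets the
 * full range, mask + 1; note this itself wraps to 0 when width is 63.
 */
static uint64_t demo_sample_period(uint64_t counter_val, unsigned int width)
{
	uint64_t mask = demo_genmask(width);

	if (!counter_val)
		return mask + 1;
	return (-counter_val) & mask;
}
```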
> +static enum perf_hw_id hw_event_perf_map[SBI_PMU_HW_GENERAL_MAX] = {
> + [SBI_PMU_HW_CPU_CYCLES] = PERF_COUNT_HW_CPU_CYCLES,
> + [SBI_PMU_HW_INSTRUCTIONS] = PERF_COUNT_HW_INSTRUCTIONS,
> + [SBI_PMU_HW_CACHE_REFERENCES] = PERF_COUNT_HW_CACHE_REFERENCES,
> + [SBI_PMU_HW_CACHE_MISSES] = PERF_COUNT_HW_CACHE_MISSES,
> + [SBI_PMU_HW_BRANCH_INSTRUCTIONS] = PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
> + [SBI_PMU_HW_BRANCH_MISSES] = PERF_COUNT_HW_BRANCH_MISSES,
> + [SBI_PMU_HW_BUS_CYCLES] = PERF_COUNT_HW_BUS_CYCLES,
> + [SBI_PMU_HW_STALLED_CYCLES_FRONTEND] = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
> + [SBI_PMU_HW_STALLED_CYCLES_BACKEND] = PERF_COUNT_HW_STALLED_CYCLES_BACKEND,
> + [SBI_PMU_HW_REF_CPU_CYCLES] = PERF_COUNT_HW_REF_CPU_CYCLES,
> +};
> +
> +static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
> +{
> + u64 counter_val_mask = GENMASK(pmc->cinfo.width, 0);
> + u64 sample_period;
> +
> + if (!pmc->counter_val)
> + sample_period = counter_val_mask + 1;
> + else
> + sample_period = (-pmc->counter_val) & counter_val_mask;
> +
> + return sample_period;
> +}
> +
> +static u32 kvm_pmu_get_perf_event_type(unsigned long eidx)
> +{
> + enum sbi_pmu_event_type etype = get_event_type(eidx);
> + u32 type = PERF_TYPE_MAX;
> +
> + switch (etype) {
> + case SBI_PMU_EVENT_TYPE_HW:
> + type = PERF_TYPE_HARDWARE;
> + break;
> + case SBI_PMU_EVENT_TYPE_CACHE:
> + type = PERF_TYPE_HW_CACHE;
> + break;
> + case SBI_PMU_EVENT_TYPE_RAW:
> + case SBI_PMU_EVENT_TYPE_FW:
> + type = PERF_TYPE_RAW;
> + break;
> + default:
> + break;
> + }
> +
> + return type;
> +}
> +
> +static bool kvm_pmu_is_fw_event(unsigned long eidx)
> +{
> + return get_event_type(eidx) == SBI_PMU_EVENT_TYPE_FW;
> +}
> +
> +static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc)
> +{
> + if (pmc->perf_event) {
> + perf_event_disable(pmc->perf_event);
> + perf_event_release_kernel(pmc->perf_event);
> + pmc->perf_event = NULL;
> + }
> +}
> +
> +static u64 kvm_pmu_get_perf_event_hw_config(u32 sbi_event_code)
> +{
> + return hw_event_perf_map[sbi_event_code];
> +}
> +
> +static u64 kvm_pmu_get_perf_event_cache_config(u32 sbi_event_code)
> +{
> + u64 config = U64_MAX;
> + unsigned int cache_type, cache_op, cache_result;
> +
> + /* All the cache event masks lie within 0xFF. No separate masking is necesssary */
s/necesssary/necessary/
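`kvm_pmu_get_perf_event_cache_config()` repacks an SBI cache event code into perf's PERF_TYPE_HW_CACHE config, which is cache id | op << 8 | result << 16. The sketch below is a user-space version of that step. It assumes the SBI layout of bits [15:3] cache id, [2:1] op id and [0] result id. The mask and limit names are hypothetical. The limits mirror perf's seven generic caches, three ops and two results.

The SBI and perf enumerations appear to use the same order:
- caches: L1D, L1I, LL, DTLB, ITLB, BPU, NODE;
- ops: read, write, prefetch;
- results: access, miss.

This would let the fields be copied directly, without a translation table.

```c
#include <assert.h>
#include <stdint.h>

/* SBI cache event_code layout (hypothetical names for the masks). */
#define CACHE_ID_MASK     0xFFF8u
#define CACHE_ID_SHIFT    3
#define CACHE_OP_MASK     0x0006u
#define CACHE_OP_SHIFT    1
#define CACHE_RESULT_MASK 0x0001u

/* perf generic cache limits: 7 caches, 3 ops, 2 results. */
#define DEMO_CACHE_MAX        7
#define DEMO_CACHE_OP_MAX     3
#define DEMO_CACHE_RESULT_MAX 2

/* Returns the perf HW_CACHE config, or UINT64_MAX if not representable. */
static uint64_t demo_cache_config(uint32_t code)
{
	unsigned int type = (code & CACHE_ID_MASK) >> CACHE_ID_SHIFT;
	unsigned int op = (code & CACHE_OP_MASK) >> CACHE_OP_SHIFT;
	unsigned int result = code & CACHE_RESULT_MASK;

	if (type >= DEMO_CACHE_MAX || op >= DEMO_CACHE_OP_MAX ||
	    result >= DEMO_CACHE_RESULT_MAX)
		return UINT64_MAX;

	return type | (op << 8) | ((uint64_t)result << 16);
}
```

For example, an SBI DTLB read miss (code 0x19) maps to perf config 0x10003.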
> + cache_type = (sbi_event_code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >>
> + SBI_PMU_EVENT_CACHE_ID_SHIFT;
> + cache_op = (sbi_event_code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >>
> + SBI_PMU_EVENT_CACHE_OP_SHIFT;
> + cache_result = sbi_event_code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK;
> +
> + if (cache_type >= PERF_COUNT_HW_CACHE_MAX ||
> + cache_op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> + cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> + return config;
> +
> + config = cache_type | (cache_op << 8) | (cache_result << 16);
> +
> + return config;
> +}
> +
> +static u64 kvm_pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
> +{
> + enum sbi_pmu_event_type etype = get_event_type(eidx);
> + u32 ecode = get_event_code(eidx);
> + u64 config = U64_MAX;
> +
> + switch (etype) {
> + case SBI_PMU_EVENT_TYPE_HW:
> + if (ecode < SBI_PMU_HW_GENERAL_MAX)
> + config = kvm_pmu_get_perf_event_hw_config(ecode);
> + break;
> + case SBI_PMU_EVENT_TYPE_CACHE:
> + config = kvm_pmu_get_perf_event_cache_config(ecode);
> + break;
> + case SBI_PMU_EVENT_TYPE_RAW:
> + config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
> + break;
> + case SBI_PMU_EVENT_TYPE_FW:
> + if (ecode < SBI_PMU_FW_MAX)
> + config = (1ULL << 63) | ecode;
> + break;
> + default:
> + break;
> + }
> +
> + return config;
> +}
> +
> +static int kvm_pmu_get_fixed_pmc_index(unsigned long eidx)
> +{
> + u32 etype = kvm_pmu_get_perf_event_type(eidx);
> + u32 ecode = get_event_code(eidx);
> +
> + if (etype != SBI_PMU_EVENT_TYPE_HW)
> + return -EINVAL;
> +
> + if (ecode == SBI_PMU_HW_CPU_CYCLES)
> + return 0;
> + else if (ecode == SBI_PMU_HW_INSTRUCTIONS)
> + return 2;
> + else
> + return -EINVAL;
> +}
> +
> +static int kvm_pmu_get_programmable_pmc_index(struct kvm_pmu *kvpmu, unsigned long eidx,
> + unsigned long cbase, unsigned long cmask)
> +{
> + int ctr_idx = -1;
> + int i, pmc_idx;
> + int min, max;
> +
> + if (kvm_pmu_is_fw_event(eidx)) {
> + /* Firmware counters are mapped 1:1 starting from num_hw_ctrs for simplicity */
> + min = kvpmu->num_hw_ctrs;
> + max = min + kvpmu->num_fw_ctrs;
> + } else {
> + /* First 3 counters are reserved for fixed counters */
> + min = 3;
> + max = kvpmu->num_hw_ctrs;
> + }
> +
> + for_each_set_bit(i, &cmask, BITS_PER_LONG) {
> + pmc_idx = i + cbase;
> + if ((pmc_idx >= min && pmc_idx < max) &&
> + !test_bit(pmc_idx, kvpmu->pmc_in_use)) {
> + ctr_idx = pmc_idx;
> + break;
> + }
> + }
> +
> + return ctr_idx;
> +}
> +
> +static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
> + unsigned long cbase, unsigned long cmask)
> +{
> + int ret;
> +
> + /* Fixed counters need a fixed mapping as they have a different width */
> + ret = kvm_pmu_get_fixed_pmc_index(eidx);
> + if (ret >= 0)
> + return ret;
> +
> + return kvm_pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
> +}
>
> static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> unsigned long *out_val)
> @@ -34,6 +214,16 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> return 0;
> }
>
> +static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ctr_base,
> + unsigned long ctr_mask)
> +{
> + /* Make sure we have a valid counter mask requested from the caller */
> + if (!ctr_mask || (ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu)))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> unsigned long *val, unsigned long new_val,
> unsigned long wr_mask)
> @@ -83,7 +273,39 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + int i, pmc_index, sbiret = 0;
> + struct kvm_pmc *pmc;
> +
> + if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + /* Start the counters that have been configured and requested by the guest */
> + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> + pmc_index = i + ctr_base;
> + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> + continue;
> + pmc = &kvpmu->pmc[pmc_index];
> + if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> + pmc->counter_val = ival;
> + if (pmc->perf_event) {
> + if (unlikely(pmc->started)) {
> + sbiret = SBI_ERR_ALREADY_STARTED;
> + continue;
> + }
> + perf_event_period(pmc->perf_event, kvm_pmu_get_sample_period(pmc));
> + perf_event_enable(pmc->perf_event);
> + pmc->started = true;
> + } else {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + }
> + }
> +
> +out:
> + edata->err_val = sbiret;
> +
> return 0;
> }
>
> @@ -91,7 +313,45 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> unsigned long ctr_mask, unsigned long flag,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + int i, pmc_index, sbiret = 0;
> + u64 enabled, running;
> + struct kvm_pmc *pmc;
> +
> + if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + /* Stop the counters that have been configured and requested by the guest */
> + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> + pmc_index = i + ctr_base;
> + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> + continue;
> + pmc = &kvpmu->pmc[pmc_index];
> + if (pmc->perf_event) {
> + if (pmc->started) {
> + /* Stop counting the counter */
> + perf_event_disable(pmc->perf_event);
> + pmc->started = false;
> + } else
> + sbiret = SBI_ERR_ALREADY_STOPPED;
> +
> + if (flag & SBI_PMU_STOP_FLAG_RESET) {
> + /* Release the counter if this is a reset request */
> + pmc->counter_val += perf_event_read_value(pmc->perf_event,
> + &enabled, &running);
> + kvm_pmu_release_perf_event(pmc);
> + clear_bit(pmc_index, kvpmu->pmc_in_use);
> + }
> + } else {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + }
> + }
> +
> +out:
> + edata->err_val = sbiret;
> +
> return 0;
> }
>
> @@ -100,7 +360,89 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> unsigned long eidx, uint64_t evtdata,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + int ctr_idx;
> + u32 etype = kvm_pmu_get_perf_event_type(eidx);
> + u64 config;
> + struct kvm_pmc *pmc;
> + int sbiret = 0;
Organize these local variables in a better way.
> + struct perf_event_attr attr = {
> + .type = etype,
> + .size = sizeof(struct perf_event_attr),
> + .pinned = true,
> + /*
> + * It should never reach here if the platform doesn't support the sscofpmf
> + * extension as mode filtering won't work without it.
> + */
> + .exclude_host = true,
> + .exclude_hv = true,
> + .exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH),
> + .exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH),
> + .config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS,
> + };
> +
> + if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + if (kvm_pmu_is_fw_event(eidx)) {
> + sbiret = SBI_ERR_NOT_SUPPORTED;
> + goto out;
> + }
> +
> + /*
> + * SKIP_MATCH flag indicates the caller is aware of the assigned counter
> + * for this event. Just do a sanity check that it is already marked as used.
> + */
> + if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
> + if (!test_bit(ctr_base + __ffs(ctr_mask), kvpmu->pmc_in_use)) {
> + sbiret = SBI_ERR_FAILURE;
> + goto out;
> + }
> + ctr_idx = ctr_base + __ffs(ctr_mask);
> + } else {
> +
> + ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
> + if (ctr_idx < 0) {
> + sbiret = SBI_ERR_NOT_SUPPORTED;
> + goto out;
> + }
> + }
> +
> + pmc = &kvpmu->pmc[ctr_idx];
> + kvm_pmu_release_perf_event(pmc);
> + pmc->idx = ctr_idx;
> +
> + config = kvm_pmu_get_perf_event_config(eidx, evtdata);
> + attr.config = config;
> + if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
> + //TODO: Do we really want to clear the value in hardware counter
> + pmc->counter_val = 0;
> + }
> +
> + /*
> + * Set the default sample_period for now. The guest specified value
> + * will be updated in the start call.
> + */
> + attr.sample_period = kvm_pmu_get_sample_period(pmc);
> +
> + event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
> + if (IS_ERR(event)) {
> + pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
> + return PTR_ERR(event);
> + }
> +
> + set_bit(ctr_idx, kvpmu->pmc_in_use);
> + pmc->perf_event = event;
> + if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> + perf_event_enable(pmc->perf_event);
> +
> + edata->out_val = ctr_idx;
> +out:
> + edata->err_val = sbiret;
> +
> return 0;
> }
>
> @@ -164,9 +506,9 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> if (i < 3)
> /* CY, IR counters */
> - kvpmu->pmc[i].cinfo.width = 63;
> + pmc->cinfo.width = 63;
> else
> - kvpmu->pmc[i].cinfo.width = hpm_width;
> + pmc->cinfo.width = hpm_width;
> /*
> * The CSR number doesn't have any relation with the logical
> * hardware counters. The CSR numbers are encoded sequentially
> @@ -187,7 +529,19 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>
> void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_pmc *pmc;
> + int i;
> +
> + if (!kvpmu)
> + return;
> +
> + for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
> + pmc = &kvpmu->pmc[i];
> + pmc->counter_val = 0;
> + kvm_pmu_release_perf_event(pmc);
> + }
> + bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
> }
>
> void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> --
> 2.25.1
>
Apart from minor nits, it looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
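For anyone following the counter allocation logic in this patch, here is a minimal user-space model of it, assuming at most 64 counters and a plain uint64_t in place of the kernel bitmap. The names (struct model_pmu, model_mask_valid(), model_pick_counter()) are hypothetical and not part of the series, and the GCC/Clang builtin __builtin_clzll stands in for __fls(). Fixed events (cycles at index 0, instructions at index 2) are resolved earlier by kvm_pmu_get_fixed_pmc_index(), so the model only covers the programmable and firmware windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_COUNTERS 64

/* Hypothetical layout: 0 = cycle, 1 = time, 2 = instret,
 * 3..num_hw_ctrs-1 = programmable hpmcounters, then num_fw_ctrs
 * firmware counters starting at num_hw_ctrs. */
struct model_pmu {
	int num_hw_ctrs;
	int num_fw_ctrs;
	uint64_t in_use;	/* bit i set: counter i is already allocated */
};

/* Mirrors kvm_pmu_validate_counter_mask(): the mask must be non-empty and
 * its highest set bit, offset by cbase, must name an existing counter. */
static bool model_mask_valid(const struct model_pmu *p, unsigned long cbase,
			     uint64_t cmask)
{
	if (!cmask)
		return false;
	/* Index of the highest set bit, like the kernel's __fls(). */
	unsigned long fls = 63 - __builtin_clzll(cmask);

	return cbase + fls < (unsigned long)(p->num_hw_ctrs + p->num_fw_ctrs);
}

/* Mirrors kvm_pmu_get_programmable_pmc_index(): return the first unused
 * counter selected by cbase/cmask that lies in the window for this class. */
static int model_pick_counter(const struct model_pmu *p, bool is_fw,
			      unsigned long cbase, uint64_t cmask)
{
	/* The first 3 indices are reserved for the fixed counters. */
	int min = is_fw ? p->num_hw_ctrs : 3;
	int max = is_fw ? p->num_hw_ctrs + p->num_fw_ctrs : p->num_hw_ctrs;

	assert(max <= MODEL_MAX_COUNTERS);	/* keeps the shifts below defined */
	for (int i = 0; i < 64; i++) {
		if (!(cmask & (1ULL << i)))
			continue;
		unsigned long idx = cbase + i;

		if (idx >= (unsigned long)min && idx < (unsigned long)max &&
		    !(p->in_use & (1ULL << idx)))
			return (int)idx;
	}
	return -1;
}
```

With eight hardware counters and a mask of 0xff, the model hands out index 3 first, because indices 0 to 2 are outside the programmable window.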
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> SBI PMU extension defines a set of firmware events which can provide
> useful information to guests about number of SBI calls. As hypervisor
s/about number/about the number/
> implements the SBI PMU extension, these firmware events corresponds
s/corresponds/correspond/
> to ecall invocations between VS->HS mode. All other firmware events
> will always report zero if monitored as KVM doesn't implement them.
>
> This patch adds all the infrastructure required to support firmware
> events.
>
> Signed-off-by: Atish Patra <[email protected]>
Otherwise, it looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
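The firmware event bookkeeping added by this patch reduces to a small state machine per event: increments are dropped while the event is stopped, a start seeds the value from the counter's current value, and a read returns the accumulated value. A minimal sketch of that behaviour, assuming a hypothetical MODEL_FW_MAX bound in place of SBI_PMU_FW_MAX and model_* helper names that do not exist in the series; the two error values follow the SBI error code list.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FW_MAX			16	/* hypothetical stand-in for SBI_PMU_FW_MAX */
#define MODEL_ERR_ALREADY_STARTED	(-7)	/* SBI_ERR_ALREADY_STARTED */
#define MODEL_ERR_ALREADY_STOPPED	(-8)	/* SBI_ERR_ALREADY_STOPPED */

struct model_fw_event {
	unsigned long value;	/* current event count */
	bool started;		/* counting enabled by the guest */
};

struct model_fw_pmu {
	struct model_fw_event ev[MODEL_FW_MAX];
};

/* Like kvm_riscv_vcpu_pmu_incr_fw(): count only while the event is started. */
static int model_incr_fw(struct model_fw_pmu *p, unsigned long fid)
{
	if (fid >= MODEL_FW_MAX)
		return -1;
	if (p->ev[fid].started)
		p->ev[fid].value++;
	return 0;
}

/* Like the SBI_PMU_CTR_TYPE_FW branch of ctr_start: seed from counter_val. */
static int model_start_fw(struct model_fw_pmu *p, unsigned long fid,
			  unsigned long counter_val)
{
	assert(fid < MODEL_FW_MAX);
	if (p->ev[fid].started)
		return MODEL_ERR_ALREADY_STARTED;
	p->ev[fid].started = true;
	p->ev[fid].value = counter_val;
	return 0;
}

/* Like the SBI_PMU_CTR_TYPE_FW branch of ctr_stop: stop, report if idle. */
static int model_stop_fw(struct model_fw_pmu *p, unsigned long fid)
{
	assert(fid < MODEL_FW_MAX);
	int ret = p->ev[fid].started ? 0 : MODEL_ERR_ALREADY_STOPPED;

	p->ev[fid].started = false;
	return ret;
}

/* Like pmu_ctr_read() for a firmware counter: report the accumulated value. */
static unsigned long model_read_fw(const struct model_fw_pmu *p,
				   unsigned long fid)
{
	assert(fid < MODEL_FW_MAX);
	return p->ev[fid].value;
}
```

Note that a stop leaves the value in place, so a later read still sees the count accumulated while the event was running.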
> ---
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 +++
> arch/riscv/kvm/vcpu_pmu.c | 144 +++++++++++++++++++-------
> 2 files changed, 124 insertions(+), 36 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> index 022d45d..b235e7e 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -17,6 +17,14 @@
> #define RISCV_KVM_MAX_FW_CTRS 32
> #define RISCV_MAX_COUNTERS 64
>
> +struct kvm_fw_event {
> + /* Current value of the event */
> + unsigned long value;
> +
> + /* Event monitoring status */
> + bool started;
> +};
> +
> /* Per virtual pmu counter data */
> struct kvm_pmc {
> u8 idx;
> @@ -25,11 +33,14 @@ struct kvm_pmc {
> union sbi_pmu_ctr_info cinfo;
> /* Event monitoring status */
> bool started;
> + /* Monitoring event ID */
> + unsigned long event_idx;
> };
>
> /* PMU data structure per vcpu */
> struct kvm_pmu {
> struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> + struct kvm_fw_event fw_event[RISCV_KVM_MAX_FW_CTRS];
> /* Number of the virtual firmware counters available */
> int num_fw_ctrs;
> /* Number of the virtual hardware counters available */
> @@ -52,6 +63,7 @@ struct kvm_pmu {
> { .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> #endif
>
> +int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid);
> int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> unsigned long *val, unsigned long new_val,
> unsigned long wr_mask);
> @@ -81,6 +93,10 @@ struct kvm_pmu {
> #define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> { .base = 0, .count = 0, .func = NULL },
>
> +static inline int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid)
> +{
> + return 0;
> +}
>
> static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 73dccf7..b8d6aba 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -203,12 +203,15 @@ static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> struct kvm_pmc *pmc;
> u64 enabled, running;
> + int fevent_code;
>
> pmc = &kvpmu->pmc[cidx];
> - if (!pmc->perf_event)
> - return -EINVAL;
>
> - pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> + if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
> + fevent_code = get_event_code(pmc->event_idx);
> + pmc->counter_val = kvpmu->fw_event[fevent_code].value;
> + } else if (pmc->perf_event)
> + pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> *out_val = pmc->counter_val;
>
> return 0;
> @@ -224,6 +227,55 @@ static int kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu, unsigned long ct
> return 0;
> }
>
> +static int kvm_pmu_create_perf_event(struct kvm_pmc *pmc, int ctr_idx,
> + struct perf_event_attr *attr, unsigned long flag,
> + unsigned long eidx, unsigned long evtdata)
> +{
> + struct perf_event *event;
> +
> + kvm_pmu_release_perf_event(pmc);
> + pmc->idx = ctr_idx;
> +
> + attr->config = kvm_pmu_get_perf_event_config(eidx, evtdata);
> + if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
> + //TODO: Do we really want to clear the value in hardware counter
> + pmc->counter_val = 0;
> + }
> +
> + /*
> + * Set the default sample_period for now. The guest specified value
> + * will be updated in the start call.
> + */
> + attr->sample_period = kvm_pmu_get_sample_period(pmc);
> +
> + event = perf_event_create_kernel_counter(attr, -1, current, NULL, pmc);
> + if (IS_ERR(event)) {
> + pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
> + return PTR_ERR(event);
> + }
> +
> + pmc->perf_event = event;
> + if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> + perf_event_enable(pmc->perf_event);
> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_incr_fw(struct kvm_vcpu *vcpu, unsigned long fid)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_fw_event *fevent;
> +
> + if (!kvpmu || fid >= SBI_PMU_FW_MAX)
> + return -EINVAL;
> +
> + fevent = &kvpmu->fw_event[fid];
> + if (fevent->started)
> + fevent->value++;
> +
> + return 0;
> +}
> +
> int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> unsigned long *val, unsigned long new_val,
> unsigned long wr_mask)
> @@ -276,6 +328,7 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> int i, pmc_index, sbiret = 0;
> struct kvm_pmc *pmc;
> + int fevent_code;
>
> if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> sbiret = SBI_ERR_INVALID_PARAM;
> @@ -290,7 +343,22 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> pmc = &kvpmu->pmc[pmc_index];
> if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> pmc->counter_val = ival;
> - if (pmc->perf_event) {
> + if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
> + fevent_code = get_event_code(pmc->event_idx);
> + if (fevent_code >= SBI_PMU_FW_MAX) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + /* Check if the counter was already started for some reason */
> + if (kvpmu->fw_event[fevent_code].started) {
> + sbiret = SBI_ERR_ALREADY_STARTED;
> + continue;
> + }
> +
> + kvpmu->fw_event[fevent_code].started = true;
> + kvpmu->fw_event[fevent_code].value = pmc->counter_val;
> + } else if (pmc->perf_event) {
> if (unlikely(pmc->started)) {
> sbiret = SBI_ERR_ALREADY_STARTED;
> continue;
> @@ -317,6 +385,7 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> int i, pmc_index, sbiret = 0;
> u64 enabled, running;
> struct kvm_pmc *pmc;
> + int fevent_code;
>
> if (kvm_pmu_validate_counter_mask(kvpmu, ctr_base, ctr_mask) < 0) {
> sbiret = SBI_ERR_INVALID_PARAM;
> @@ -329,7 +398,18 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> continue;
> pmc = &kvpmu->pmc[pmc_index];
> - if (pmc->perf_event) {
> + if (pmc->cinfo.type == SBI_PMU_CTR_TYPE_FW) {
> + fevent_code = get_event_code(pmc->event_idx);
> + if (fevent_code >= SBI_PMU_FW_MAX) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + if (!kvpmu->fw_event[fevent_code].started)
> + sbiret = SBI_ERR_ALREADY_STOPPED;
> +
> + kvpmu->fw_event[fevent_code].started = false;
> + } else if (pmc->perf_event) {
> if (pmc->started) {
> /* Stop counting the counter */
> perf_event_disable(pmc->perf_event);
> @@ -342,11 +422,14 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> pmc->counter_val += perf_event_read_value(pmc->perf_event,
> &enabled, &running);
> kvm_pmu_release_perf_event(pmc);
> - clear_bit(pmc_index, kvpmu->pmc_in_use);
> }
> } else {
> sbiret = SBI_ERR_INVALID_PARAM;
> }
> + if (flag & SBI_PMU_STOP_FLAG_RESET) {
> + pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> + clear_bit(pmc_index, kvpmu->pmc_in_use);
> + }
> }
>
> out:
> @@ -361,12 +444,11 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> - struct perf_event *event;
> - int ctr_idx;
> + int ctr_idx, sbiret = 0, ret;
> u32 etype = kvm_pmu_get_perf_event_type(eidx);
> - u64 config;
> - struct kvm_pmc *pmc;
> - int sbiret = 0;
> + struct kvm_pmc *pmc = NULL;
> + bool is_fevent;
> + unsigned long event_code;
> struct perf_event_attr attr = {
> .type = etype,
> .size = sizeof(struct perf_event_attr),
> @@ -387,7 +469,9 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> goto out;
> }
>
> - if (kvm_pmu_is_fw_event(eidx)) {
> + event_code = get_event_code(eidx);
> + is_fevent = kvm_pmu_is_fw_event(eidx);
> + if (is_fevent && event_code >= SBI_PMU_FW_MAX) {
> sbiret = SBI_ERR_NOT_SUPPORTED;
> goto out;
> }
> @@ -412,33 +496,17 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> }
>
> pmc = &kvpmu->pmc[ctr_idx];
> - kvm_pmu_release_perf_event(pmc);
> - pmc->idx = ctr_idx;
> -
> - config = kvm_pmu_get_perf_event_config(eidx, evtdata);
> - attr.config = config;
> - if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
> - //TODO: Do we really want to clear the value in hardware counter
> - pmc->counter_val = 0;
> - }
> -
> - /*
> - * Set the default sample_period for now. The guest specified value
> - * will be updated in the start call.
> - */
> - attr.sample_period = kvm_pmu_get_sample_period(pmc);
> -
> - event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
> - if (IS_ERR(event)) {
> - pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
> - return PTR_ERR(event);
> + if (is_fevent) {
> + if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> + kvpmu->fw_event[event_code].started = true;
> + } else {
> + ret = kvm_pmu_create_perf_event(pmc, ctr_idx, &attr, flag, eidx, evtdata);
> + if (ret)
> + return ret;
> }
>
> set_bit(ctr_idx, kvpmu->pmc_in_use);
> - pmc->perf_event = event;
> - if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> - perf_event_enable(pmc->perf_event);
> -
> + pmc->event_idx = eidx;
> edata->out_val = ctr_idx;
> out:
> edata->err_val = sbiret;
> @@ -489,6 +557,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
>
> kvpmu->num_hw_ctrs = num_hw_ctrs;
> kvpmu->num_fw_ctrs = num_fw_ctrs;
> + memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
>
> /*
> * There is no correlation between the logical hardware counter and virtual counters.
> @@ -502,6 +571,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> continue;
> pmc = &kvpmu->pmc[i];
> pmc->idx = i;
> + pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> if (i < kvpmu->num_hw_ctrs) {
> kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> if (i < 3)
> @@ -540,8 +610,10 @@ void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> pmc = &kvpmu->pmc[i];
> pmc->counter_val = 0;
> kvm_pmu_release_perf_event(pmc);
> + pmc->event_idx = SBI_PMU_EVENT_IDX_INVALID;
> }
> bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
> + memset(&kvpmu->fw_event, 0, SBI_PMU_FW_MAX * sizeof(struct kvm_fw_event));
> }
>
> void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
>
> KVM supports firmware events now. Invoke the firmware event increment
> function from appropriate places.
>
> Signed-off-by: Atish Patra <[email protected]>
Looks good to me.
Reviewed-by: Anup Patel <[email protected]>
Regards,
Anup
> ---
> arch/riscv/kvm/tlb.c | 4 ++++
> arch/riscv/kvm/vcpu_sbi_replace.c | 7 +++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/arch/riscv/kvm/tlb.c b/arch/riscv/kvm/tlb.c
> index 309d79b..b797f7c 100644
> --- a/arch/riscv/kvm/tlb.c
> +++ b/arch/riscv/kvm/tlb.c
> @@ -181,6 +181,7 @@ void kvm_riscv_local_tlb_sanitize(struct kvm_vcpu *vcpu)
>
> void kvm_riscv_fence_i_process(struct kvm_vcpu *vcpu)
> {
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_FENCE_I_RCVD);
> local_flush_icache_all();
> }
>
> @@ -264,15 +265,18 @@ void kvm_riscv_hfence_process(struct kvm_vcpu *vcpu)
> d.addr, d.size, d.order);
> break;
> case KVM_RISCV_HFENCE_VVMA_ASID_GVA:
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
> kvm_riscv_local_hfence_vvma_asid_gva(
> READ_ONCE(v->vmid), d.asid,
> d.addr, d.size, d.order);
> break;
> case KVM_RISCV_HFENCE_VVMA_ASID_ALL:
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_RCVD);
> kvm_riscv_local_hfence_vvma_asid_all(
> READ_ONCE(v->vmid), d.asid);
> break;
> case KVM_RISCV_HFENCE_VVMA_GVA:
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_RCVD);
> kvm_riscv_local_hfence_vvma_gva(
> READ_ONCE(v->vmid),
> d.addr, d.size, d.order);
> diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
> index abeb55f..71a671e 100644
> --- a/arch/riscv/kvm/vcpu_sbi_replace.c
> +++ b/arch/riscv/kvm/vcpu_sbi_replace.c
> @@ -11,6 +11,7 @@
> #include <linux/kvm_host.h>
> #include <asm/sbi.h>
> #include <asm/kvm_vcpu_timer.h>
> +#include <asm/kvm_vcpu_pmu.h>
> #include <asm/kvm_vcpu_sbi.h>
>
> static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> @@ -25,6 +26,7 @@ static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> return 0;
> }
>
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_SET_TIMER);
> #if __riscv_xlen == 32
> next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> #else
> @@ -57,6 +59,7 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> return 0;
> }
>
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_IPI_SENT);
> kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> if (hbase != -1UL) {
> if (tmp->vcpu_id < hbase)
> @@ -67,6 +70,7 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> ret = kvm_riscv_vcpu_set_interrupt(tmp, IRQ_VS_SOFT);
> if (ret < 0)
> break;
> + kvm_riscv_vcpu_pmu_incr_fw(tmp, SBI_PMU_FW_IPI_RECVD);
> }
>
> return ret;
> @@ -90,6 +94,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
> switch (funcid) {
> case SBI_EXT_RFENCE_REMOTE_FENCE_I:
> kvm_riscv_fence_i(vcpu->kvm, hbase, hmask);
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_FENCE_I_SENT);
> break;
> case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
> if (cp->a2 == 0 && cp->a3 == 0)
> @@ -97,6 +102,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
> else
> kvm_riscv_hfence_vvma_gva(vcpu->kvm, hbase, hmask,
> cp->a2, cp->a3, PAGE_SHIFT);
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_SENT);
> break;
> case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
> if (cp->a2 == 0 && cp->a3 == 0)
> @@ -107,6 +113,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
> hbase, hmask,
> cp->a2, cp->a3,
> PAGE_SHIFT, cp->a4);
> + kvm_riscv_vcpu_pmu_incr_fw(vcpu, SBI_PMU_FW_HFENCE_VVMA_ASID_SENT);
> break;
> case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
> case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
> --
> 2.25.1
>
On Fri, Jan 27, 2023 at 2:53 PM Conor Dooley <[email protected]> wrote:
>
> Yo Atish,
>
> On Fri, Jan 27, 2023 at 10:25:47AM -0800, Atish Patra wrote:
> > This patch fixes/improves a few minor things in the SBI PMU extension
> > definition.
> >
> > 1. Align all the firmware event names.
>
> > @@ -171,7 +171,7 @@ enum sbi_pmu_fw_generic_events_t {
> > SBI_PMU_FW_IPI_RECVD = 7,
> > - SBI_PMU_FW_FENCE_I_RECVD = 9,
> > + SBI_PMU_FW_FENCE_I_RCVD = 9,
> > SBI_PMU_FW_SFENCE_VMA_RCVD = 11,
>
> Alignment looks incomplete to me! Looks like you went from 2 RECVD and
> 1 RCVD to 2 RCVD and 1 RECVD! FWIW, the spec uses RECEIVED for all of
Ahh, I missed the other one. I have changed everything to RCVD just to
keep it short.
"RECEIVED" is too long :)
> these:
> https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc#114-event-firmware-events-type-15
>
> Thanks,
> Conor.
>
--
Regards,
Atish
On Sun, Jan 29, 2023 at 4:16 AM Anup Patel <[email protected]> wrote:
>
> On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> >
> > Currently, the SBI extension handler is expected to return a Linux error code.
> > The top SBI layer converts the Linux error code to an SBI-specific error code
> > that can be returned to the guest invoking the SBI call. This model works
> > as long as the Linux and SBI error codes have a 1-to-1 mapping between them.
> > However, that may not always be true. This patch attempts to disassociate
> > both these error codes by allowing the SBI extension implementation to
> > return SBI-specific error codes as well.
> >
> > The extension will continue to return a Linux-specific error code, which
> > indicates any problem *with* the extension emulation, while the
> > SBI-specific error code indicates a problem *of* the emulation.
> >
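A minimal sketch of the split described above, assuming a hypothetical handler and struct (model_handler(), struct model_sbi_ext_data) that only mirror the shape of struct kvm_vcpu_sbi_ext_data in this patch: a failure on the host side is returned as a negative Linux errno, while a bad guest request is reported through err_val and the handler itself returns 0. err_val is a signed long here for readability; the patch uses unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_SBI_SUCCESS		0
#define MODEL_SBI_ERR_INVALID_PARAM	(-3)	/* SBI_ERR_INVALID_PARAM */

/* Shape of the per-call result introduced by this patch. */
struct model_sbi_ext_data {
	unsigned long out_val;	/* value handed back to the guest in a1 */
	long err_val;		/* SBI error handed back to the guest in a0 */
	bool uexit;		/* forward the call to userspace */
};

/* Hypothetical handler: 'host_ok' models whether KVM itself could carry out
 * the emulation; 'hart' is a guest-supplied argument checked against
 * 'nr_harts'. */
static int model_handler(bool host_ok, unsigned long hart,
			 unsigned long nr_harts,
			 struct model_sbi_ext_data *edata)
{
	assert(edata);
	if (!host_ok)
		return -EFAULT;	/* problem *with* the emulation: Linux errno */

	if (hart >= nr_harts) {
		/* Problem *of* the emulation: the guest passed a bad argument. */
		edata->err_val = MODEL_SBI_ERR_INVALID_PARAM;
		return 0;
	}

	edata->out_val = hart;
	edata->err_val = MODEL_SBI_SUCCESS;
	return 0;
}
```

The point of the design is that the two channels no longer have to be translated into each other: an SBI error that has no sensible Linux counterpart can still be reported to the guest exactly.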
> > Suggested-by: Andrew Jones <[email protected]>
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/include/asm/kvm_vcpu_sbi.h | 10 ++++--
> > arch/riscv/kvm/vcpu_sbi.c | 46 ++++++++------------------
> > arch/riscv/kvm/vcpu_sbi_base.c | 38 ++++++++++------------
> > arch/riscv/kvm/vcpu_sbi_hsm.c | 29 +++++++++--------
> > arch/riscv/kvm/vcpu_sbi_replace.c | 47 ++++++++++++++-------------
> > arch/riscv/kvm/vcpu_sbi_v01.c | 11 +++----
> > 6 files changed, 84 insertions(+), 97 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > index 45ba341..38407b3 100644
> > --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > @@ -18,6 +18,12 @@ struct kvm_vcpu_sbi_context {
> > int return_handled;
> > };
> >
> > +struct kvm_vcpu_sbi_ext_data {
>
> s/kvm_vcpu_sbi_ext_data/kvm_vcpu_sbi_return/
>
> > + unsigned long out_val;
> > + unsigned long err_val;
>
> Add "struct kvm_cpu_trap utrap" here.
>
Done.
> > + bool uexit;
> > +};
> > +
> > struct kvm_vcpu_sbi_extension {
> > unsigned long extid_start;
> > unsigned long extid_end;
> > @@ -27,8 +33,8 @@ struct kvm_vcpu_sbi_extension {
> > * specific error codes.
> > */
> > int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val, struct kvm_cpu_trap *utrap,
> > - bool *exit);
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap);
> >
> > /* Extension specific probe function */
> > unsigned long (*probe)(struct kvm_vcpu *vcpu);
> > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> > index f96991d..aa42da6 100644
> > --- a/arch/riscv/kvm/vcpu_sbi.c
> > +++ b/arch/riscv/kvm/vcpu_sbi.c
> > @@ -12,26 +12,6 @@
> > #include <asm/sbi.h>
> > #include <asm/kvm_vcpu_sbi.h>
> >
> > -static int kvm_linux_err_map_sbi(int err)
> > -{
> > - switch (err) {
> > - case 0:
> > - return SBI_SUCCESS;
> > - case -EPERM:
> > - return SBI_ERR_DENIED;
> > - case -EINVAL:
> > - return SBI_ERR_INVALID_PARAM;
> > - case -EFAULT:
> > - return SBI_ERR_INVALID_ADDRESS;
> > - case -EOPNOTSUPP:
> > - return SBI_ERR_NOT_SUPPORTED;
> > - case -EALREADY:
> > - return SBI_ERR_ALREADY_AVAILABLE;
> > - default:
> > - return SBI_ERR_FAILURE;
> > - };
> > -}
> > -
> > #ifndef CONFIG_RISCV_SBI_V01
> > static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
> > .extid_start = -1UL,
> > @@ -125,11 +105,10 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > {
> > int ret = 1;
> > bool next_sepc = true;
> > - bool userspace_exit = false;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > const struct kvm_vcpu_sbi_extension *sbi_ext;
> > struct kvm_cpu_trap utrap = { 0 };
>
> Remove "struct kvm_cpu_trap utrap" from here since it can be
> part of "struct kvm_vcpu_sbi_return"
>
Done.
> > - unsigned long out_val = 0;
> > + struct kvm_vcpu_sbi_ext_data edata_out = { 0 };
> > bool ext_is_v01 = false;
> >
> > sbi_ext = kvm_vcpu_sbi_find_ext(cp->a7);
> > @@ -139,13 +118,22 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > cp->a7 <= SBI_EXT_0_1_SHUTDOWN)
> > ext_is_v01 = true;
> > #endif
> > - ret = sbi_ext->handler(vcpu, run, &out_val, &utrap, &userspace_exit);
> > + ret = sbi_ext->handler(vcpu, run, &edata_out, &utrap);
> > } else {
> > /* Return error for unsupported SBI calls */
> > cp->a0 = SBI_ERR_NOT_SUPPORTED;
> > goto ecall_done;
> > }
> >
> > + /*
> > + * When the SBI extension returns a Linux error code, it exits the ioctl
> > + * loop and forwards the error to userspace.
> > + */
> > + if (ret < 0) {
> > + next_sepc = false;
> > + goto ecall_done;
> > + }
> > +
> > /* Handle special error cases i.e trap, exit or userspace forward */
> > if (utrap.scause) {
> > /* No need to increment sepc or exit ioctl loop */
> > @@ -157,24 +145,18 @@ int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > }
> >
> > /* Exit ioctl loop or Propagate the error code the guest */
> > - if (userspace_exit) {
> > + if (edata_out.uexit) {
> > next_sepc = false;
> > ret = 0;
> > } else {
> > - /**
> > - * SBI extension handler always returns an Linux error code. Convert
> > - * it to the SBI specific error code that can be propagated the SBI
> > - * caller.
> > - */
> > - ret = kvm_linux_err_map_sbi(ret);
> > - cp->a0 = ret;
> > + cp->a0 = edata_out.err_val;
> > ret = 1;
> > }
> > ecall_done:
> > if (next_sepc)
> > cp->sepc += 4;
> > if (!ext_is_v01)
> > - cp->a1 = out_val;
> > + cp->a1 = edata_out.out_val;
>
> Strange! This now updates the "a1" register when ret < 0, which it should not.
>
> Ideally, the "a1" register should only be updated when "ret == 1".
>
Ahh, thanks for catching that. Fixed it.
Earlier, we were also updating the a1 register in the userspace exit scenario.
I have fixed that as well.
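Put together with the a1 fix discussed above, the tail of kvm_riscv_vcpu_sbi_ecall() is expected to follow the rule sketched below: a0, a1 and sepc are only written when the guest is resumed with an SBI result (ret == 1). This is a simplified model with hypothetical names (struct model_regs, struct model_sbi_ret, model_dispatch()); it leaves out the trap-redirect path and the SBI v0.1 special case, both of which the real function also handles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the guest registers touched by the dispatcher. */
struct model_regs {
	unsigned long sepc, a0, a1;
};

/* Hypothetical model of the handler result. */
struct model_sbi_ret {
	unsigned long out_val;
	unsigned long err_val;
	bool uexit;
};

/* Returns 1 to resume the guest, 0 to exit to userspace, <0 on host error. */
static int model_dispatch(int handler_ret, const struct model_sbi_ret *r,
			  struct model_regs *regs)
{
	assert(r && regs);
	if (handler_ret < 0)
		return handler_ret;	/* exit with Linux error; regs untouched */
	if (r->uexit)
		return 0;		/* userspace finishes the call; regs untouched */

	regs->a0 = r->err_val;
	regs->a1 = r->out_val;		/* only reached on the "ret == 1" path */
	regs->sepc += 4;		/* step past the ecall instruction */
	return 1;
}
```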
> >
> > return ret;
> > }
> > diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
> > index 846d518..84885e5 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_base.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_base.c
> > @@ -14,24 +14,23 @@
> > #include <asm/kvm_vcpu_sbi.h>
> >
> > static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *trap, bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *trap)
> > {
> > - int ret = 0;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > const struct kvm_vcpu_sbi_extension *sbi_ext;
> >
> > switch (cp->a6) {
> > case SBI_EXT_BASE_GET_SPEC_VERSION:
> > - *out_val = (KVM_SBI_VERSION_MAJOR <<
> > + edata->out_val = (KVM_SBI_VERSION_MAJOR <<
> > SBI_SPEC_VERSION_MAJOR_SHIFT) |
> > KVM_SBI_VERSION_MINOR;
> > break;
> > case SBI_EXT_BASE_GET_IMP_ID:
> > - *out_val = KVM_SBI_IMPID;
> > + edata->out_val = KVM_SBI_IMPID;
> > break;
> > case SBI_EXT_BASE_GET_IMP_VERSION:
> > - *out_val = LINUX_VERSION_CODE;
> > + edata->out_val = LINUX_VERSION_CODE;
> > break;
> > case SBI_EXT_BASE_PROBE_EXT:
> > if ((cp->a0 >= SBI_EXT_EXPERIMENTAL_START &&
> > @@ -43,33 +42,33 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > * forward it to the userspace
> > */
> > kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > - *exit = true;
> > + edata->uexit = true;
> > } else {
> > sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
> > if (sbi_ext) {
> > if (sbi_ext->probe)
> > - *out_val = sbi_ext->probe(vcpu);
> > + edata->out_val = sbi_ext->probe(vcpu);
> > else
> > - *out_val = 1;
> > + edata->out_val = 1;
> > } else
> > - *out_val = 0;
> > + edata->out_val = 0;
> > }
> > break;
> > case SBI_EXT_BASE_GET_MVENDORID:
> > - *out_val = vcpu->arch.mvendorid;
> > + edata->out_val = vcpu->arch.mvendorid;
> > break;
> > case SBI_EXT_BASE_GET_MARCHID:
> > - *out_val = vcpu->arch.marchid;
> > + edata->out_val = vcpu->arch.marchid;
> > break;
> > case SBI_EXT_BASE_GET_MIMPID:
> > - *out_val = vcpu->arch.mimpid;
> > + edata->out_val = vcpu->arch.mimpid;
> > break;
> > default:
> > - ret = -EOPNOTSUPP;
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > break;
> > }
> >
> > - return ret;
> > + return 0;
> > }
> >
> > const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
> > @@ -79,17 +78,16 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_base = {
> > };
> >
> > static int kvm_sbi_ext_forward_handler(struct kvm_vcpu *vcpu,
> > - struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap,
> > - bool *exit)
> > + struct kvm_run *run,
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > /*
> > * Both SBI experimental and vendor extensions are
> > * unconditionally forwarded to userspace.
> > */
> > kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > - *exit = true;
> > + edata->uexit = true;
> > return 0;
> > }
> >
> > diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
> > index 619ac0f..5fb526c 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_hsm.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
> > @@ -21,9 +21,9 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> >
> > target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
> > if (!target_vcpu)
> > - return -EINVAL;
> > + return SBI_ERR_INVALID_PARAM;
> > if (!target_vcpu->arch.power_off)
> > - return -EALREADY;
> > + return SBI_ERR_ALREADY_AVAILABLE;
> >
> > reset_cntx = &target_vcpu->arch.guest_reset_context;
> > /* start address */
> > @@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> > static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
> > {
> > if (vcpu->arch.power_off)
> > - return -EACCES;
> > + return SBI_ERR_FAILURE;
> >
> > kvm_riscv_vcpu_power_off(vcpu);
> >
> > @@ -57,7 +57,7 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
> >
> > target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, target_vcpuid);
> > if (!target_vcpu)
> > - return -EINVAL;
> > + return SBI_ERR_INVALID_PARAM;
> > if (!target_vcpu->arch.power_off)
> > return SBI_HSM_STATE_STARTED;
> > else if (vcpu->stat.generic.blocking)
> > @@ -67,9 +67,8 @@ static int kvm_sbi_hsm_vcpu_get_status(struct kvm_vcpu *vcpu)
> > }
> >
> > static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap,
> > - bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > int ret = 0;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > @@ -88,27 +87,29 @@ static int kvm_sbi_ext_hsm_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > case SBI_EXT_HSM_HART_STATUS:
> > ret = kvm_sbi_hsm_vcpu_get_status(vcpu);
> > if (ret >= 0) {
> > - *out_val = ret;
> > - ret = 0;
> > + edata->out_val = ret;
> > + edata->err_val = 0;
> > }
> > - break;
> > + return 0;
> > case SBI_EXT_HSM_HART_SUSPEND:
> > switch (cp->a0) {
> > case SBI_HSM_SUSPEND_RET_DEFAULT:
> > kvm_riscv_vcpu_wfi(vcpu);
> > break;
> > case SBI_HSM_SUSPEND_NON_RET_DEFAULT:
> > - ret = -EOPNOTSUPP;
> > + ret = SBI_ERR_NOT_SUPPORTED;
> > break;
> > default:
> > - ret = -EINVAL;
> > + ret = SBI_ERR_INVALID_PARAM;
> > }
> > break;
> > default:
> > - ret = -EOPNOTSUPP;
> > + ret = SBI_ERR_NOT_SUPPORTED;
> > }
> >
> > - return ret;
> > + edata->err_val = ret;
> > +
> > + return 0;
> > }
> >
> > const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_hsm = {
> > diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
> > index 03a0198..abeb55f 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_replace.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_replace.c
> > @@ -14,15 +14,16 @@
> > #include <asm/kvm_vcpu_sbi.h>
> >
> > static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap, bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > - int ret = 0;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > u64 next_cycle;
> >
> > - if (cp->a6 != SBI_EXT_TIME_SET_TIMER)
> > - return -EINVAL;
> > + if (cp->a6 != SBI_EXT_TIME_SET_TIMER) {
> > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > + return 0;
> > + }
> >
> > #if __riscv_xlen == 32
> > next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
> > @@ -31,7 +32,7 @@ static int kvm_sbi_ext_time_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > #endif
> > kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
> >
> > - return ret;
> > + return 0;
> > }
> >
> > const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
> > @@ -41,8 +42,8 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_time = {
> > };
> >
> > static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap, bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > int ret = 0;
> > unsigned long i;
> > @@ -51,8 +52,10 @@ static int kvm_sbi_ext_ipi_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > unsigned long hmask = cp->a0;
> > unsigned long hbase = cp->a1;
> >
> > - if (cp->a6 != SBI_EXT_IPI_SEND_IPI)
> > - return -EINVAL;
> > + if (cp->a6 != SBI_EXT_IPI_SEND_IPI) {
> > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > + return 0;
> > + }
> >
> > kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > if (hbase != -1UL) {
> > @@ -76,10 +79,9 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_ipi = {
> > };
> >
> > static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap, bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > - int ret = 0;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > unsigned long hmask = cp->a0;
> > unsigned long hbase = cp->a1;
> > @@ -116,10 +118,10 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
> > */
> > break;
> > default:
> > - ret = -EOPNOTSUPP;
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > }
> >
> > - return ret;
> > + return 0;
> > }
> >
> > const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
> > @@ -130,14 +132,13 @@ const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_rfence = {
> >
> > static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
> > struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap, bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > unsigned long funcid = cp->a6;
> > u32 reason = cp->a1;
> > u32 type = cp->a0;
> > - int ret = 0;
> >
> > switch (funcid) {
> > case SBI_EXT_SRST_RESET:
> > @@ -146,24 +147,24 @@ static int kvm_sbi_ext_srst_handler(struct kvm_vcpu *vcpu,
> > kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> > KVM_SYSTEM_EVENT_SHUTDOWN,
> > reason);
> > - *exit = true;
> > + edata->uexit = true;
> > break;
> > case SBI_SRST_RESET_TYPE_COLD_REBOOT:
> > case SBI_SRST_RESET_TYPE_WARM_REBOOT:
> > kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> > KVM_SYSTEM_EVENT_RESET,
> > reason);
> > - *exit = true;
> > + edata->uexit = true;
> > break;
> > default:
> > - ret = -EOPNOTSUPP;
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > }
> > break;
> > default:
> > - ret = -EOPNOTSUPP;
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > }
> >
> > - return ret;
> > + return 0;
> > }
> >
> > const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_srst = {
> > diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
> > index 489f225..c0ccc58 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_v01.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_v01.c
> > @@ -14,9 +14,8 @@
> > #include <asm/kvm_vcpu_sbi.h>
> >
> > static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > - unsigned long *out_val,
> > - struct kvm_cpu_trap *utrap,
> > - bool *exit)
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > {
> > ulong hmask;
> > int i, ret = 0;
> > @@ -33,7 +32,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > * handled in kernel so we forward these to user-space
> > */
> > kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > - *exit = true;
> > + edata->uexit = true;
> > break;
> > case SBI_EXT_0_1_SET_TIMER:
> > #if __riscv_xlen == 32
> > @@ -65,7 +64,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > case SBI_EXT_0_1_SHUTDOWN:
> > kvm_riscv_vcpu_sbi_system_reset(vcpu, run,
> > KVM_SYSTEM_EVENT_SHUTDOWN, 0);
> > - *exit = true;
> > + edata->uexit = true;
> > break;
> > case SBI_EXT_0_1_REMOTE_FENCE_I:
> > case SBI_EXT_0_1_REMOTE_SFENCE_VMA:
> > @@ -103,7 +102,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > }
> > break;
> > default:
> > - ret = -EINVAL;
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > break;
> > }
> >
> > --
> > 2.25.1
> >
>
> Regards,
> Anup
--
Regards,
Atish
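Below is a minimal user-space model of the calling convention this patch moves to. Every extension handler now returns 0 to KVM and reports the guest-visible outcome through the out_val, err_val and uexit fields of struct kvm_vcpu_sbi_ext_data. The dispatcher that consumes edata is not part of this excerpt, so the sketch assumes it follows the SBI calling convention (error code in a0, value in a1) and skips the write-back when the call exits to user space. demo_handler(), demo_dispatch(), struct guest_regs and struct sbi_ext_data are hypothetical stand-ins, not kernel code. The error constants match the values in the SBI specification.

```c
#include <assert.h>
#include <stdbool.h>

/* SBI error codes (values from the RISC-V SBI specification). */
#define SBI_SUCCESS           (0L)
#define SBI_ERR_NOT_SUPPORTED (-2L)

/* Mirrors the three fields the patch uses in struct kvm_vcpu_sbi_ext_data. */
struct sbi_ext_data {
	unsigned long out_val; /* value returned to the guest in a1 */
	unsigned long err_val; /* SBI error code returned to the guest in a0 */
	bool uexit;            /* forward the call to user space instead */
};

/* Hypothetical stand-in for the guest's a0/a1 registers. */
struct guest_regs {
	unsigned long a0, a1;
};

/* Hypothetical handler: returns 0 to KVM in both cases and reports
 * the guest-visible failure through err_val. */
static int demo_handler(unsigned long funcid, struct sbi_ext_data *edata)
{
	switch (funcid) {
	case 0:
		edata->out_val = 42;
		break;
	default:
		edata->err_val = (unsigned long)SBI_ERR_NOT_SUPPORTED;
		break;
	}
	return 0;
}

/* Hypothetical dispatcher: a negative return is a host-side failure;
 * otherwise the SBI result is written back unless the call exits. */
static int demo_dispatch(unsigned long funcid, struct guest_regs *regs)
{
	struct sbi_ext_data edata = { 0 };
	int ret = demo_handler(funcid, &edata);

	if (ret < 0)
		return ret;
	if (!edata.uexit) {
		regs->a0 = edata.err_val;
		regs->a1 = edata.out_val;
	}
	return 0;
}
```

In this model a negative return is reserved for host-side failures, while everything the guest should observe travels in edata.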
On Sun, Jan 29, 2023 at 4:30 AM Anup Patel <[email protected]> wrote:
>
> On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> >
> > This patch only adds barebore structure of perf implementation. Most of
>
> s/barebore/barebone/
>
> > the functions return zero at this point and will be implemented
> > fully in the future.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/include/asm/kvm_host.h | 3 +
> > arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> > arch/riscv/kvm/Makefile | 1 +
> > arch/riscv/kvm/vcpu.c | 5 +
> > arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
> > 5 files changed, 230 insertions(+)
> > create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> > create mode 100644 arch/riscv/kvm/vcpu_pmu.c
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 93f43a3..f9874b4 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -18,6 +18,7 @@
> > #include <asm/kvm_vcpu_insn.h>
> > #include <asm/kvm_vcpu_sbi.h>
> > #include <asm/kvm_vcpu_timer.h>
> > +#include <asm/kvm_vcpu_pmu.h>
> >
> > #define KVM_MAX_VCPUS 1024
> >
> > @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
> >
> > /* Don't run the VCPU (blocked) */
> > bool pause;
> > +
> > + struct kvm_pmu pmu;
>
> Add a single line comment just like other members of the structure.
>
> I also suggest naming this variable "pmu_context" or something similar
> for naming consistency.
>
Done.
> > };
> >
> > static inline void kvm_arch_hardware_unsetup(void) {}
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > new file mode 100644
> > index 0000000..3f43a43
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > @@ -0,0 +1,76 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2023 Rivos Inc
> > + *
> > + * Authors:
> > + * Atish Patra <[email protected]>
> > + */
> > +
> > +#ifndef __KVM_VCPU_RISCV_PMU_H
> > +#define __KVM_VCPU_RISCV_PMU_H
> > +
> > +#include <linux/perf/riscv_pmu.h>
> > +#include <asm/kvm_vcpu_sbi.h>
> > +#include <asm/sbi.h>
> > +
> > +#ifdef CONFIG_RISCV_PMU_SBI
> > +#define RISCV_KVM_MAX_FW_CTRS 32
> > +#define RISCV_MAX_COUNTERS 64
> > +
> > +/* Per virtual pmu counter data */
> > +struct kvm_pmc {
> > + u8 idx;
> > + struct perf_event *perf_event;
> > + uint64_t counter_val;
> > + union sbi_pmu_ctr_info cinfo;
> > + /* Event monitoring status */
> > + bool started;
> > +};
> > +
> > +/* PMU data structure per vcpu */
> > +struct kvm_pmu {
> > + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> > + /* Number of the virtual firmware counters available */
> > + int num_fw_ctrs;
> > + /* Number of the virtual hardware counters available */
> > + int num_hw_ctrs;
> > + /* A flag to indicate that pmu initialization is done */
> > + bool init_done;
> > + /* Bit map of all the virtual counter used */
> > + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> > +};
> > +
> > +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > +
> > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + unsigned long eidx, uint64_t evtdata,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > +
> > +#else
> > +struct kvm_pmu {
> > +};
> > +
> > +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > +{
> > + return 0;
> > +}
> > +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> > +#endif /* CONFIG_RISCV_PMU_SBI */
> > +#endif /* !__KVM_VCPU_RISCV_PMU_H */
> > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > index 019df920..5de1053 100644
> > --- a/arch/riscv/kvm/Makefile
> > +++ b/arch/riscv/kvm/Makefile
> > @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > kvm-y += vcpu_sbi_replace.o
> > kvm-y += vcpu_sbi_hsm.o
> > kvm-y += vcpu_timer.o
> > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > index 7c08567..b746f21 100644
> > --- a/arch/riscv/kvm/vcpu.c
> > +++ b/arch/riscv/kvm/vcpu.c
> > @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> >
> > WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> > WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
>
> Add an empty newline here.
>
Done.
> > + kvm_riscv_vcpu_pmu_reset(vcpu);
> >
> > vcpu->arch.hfence_head = 0;
> > vcpu->arch.hfence_tail = 0;
> > @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > /* Setup VCPU timer */
> > kvm_riscv_vcpu_timer_init(vcpu);
> >
> > + /* setup performance monitoring */
> > + kvm_riscv_vcpu_pmu_init(vcpu);
> > +
> > /* Reset VCPU */
> > kvm_riscv_reset_vcpu(vcpu);
> >
> > @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> > /* Cleanup VCPU timer */
> > kvm_riscv_vcpu_timer_deinit(vcpu);
> >
> > + kvm_riscv_vcpu_pmu_deinit(vcpu);
>
> Add an empty newline here.
>
Done.
> > /* Free unused pages pre-allocated for G-stage page table mappings */
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > }
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > new file mode 100644
> > index 0000000..d3fd551
> > --- /dev/null
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2023 Rivos Inc
> > + *
> > + * Authors:
> > + * Atish Patra <[email protected]>
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/err.h>
> > +#include <linux/kvm_host.h>
> > +#include <linux/perf/riscv_pmu.h>
> > +#include <asm/csr.h>
> > +#include <asm/kvm_vcpu_sbi.h>
> > +#include <asm/kvm_vcpu_pmu.h>
> > +#include <linux/kvm_host.h>
> > +
> > +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > +
> > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + edata->out_val = kvm_pmu_num_counters(kvpmu);
> > +
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + if (cidx > RISCV_MAX_COUNTERS || cidx == 1) {
> > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > + return 0;
> > + }
> > +
> > + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> > +
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + unsigned long eidx, uint64_t evtdata,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > +{
> > + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + struct kvm_pmc *pmc;
> > +
> > + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > + if (ret < 0)
> > + return ret;
> > +
> > + if (!hpm_width || !num_hw_ctrs) {
> > + pr_err("Cannot initialize VCPU with NULL hpmcounter width or number of counters\n");
>
> What will happen if underlying M-mode firmware does not implement
> SBI PMU extension ?
>
riscv_pmu_get_hpm_info will return an error and
kvm_riscv_vcpu_pmu_init will fail.
> > + return -EINVAL;
> > + }
> > +
> > + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
> > + pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
>
> How is this possible?
>
> The maximum number of HW counters is 32 (including time, cycle, and instret),
> RISCV_KVM_MAX_FW_CTRS = 32, and
> RISCV_MAX_COUNTERS = 64
This check guards against someone changing the definition of
RISCV_KVM_MAX_FW_CTRS without also increasing RISCV_MAX_COUNTERS.
The resulting bug could be subtle: it may work on some platforms (with
fewer hardware counters) but fail on others (with more hardware counters).
I couldn't find a better way to express the relationship between
RISCV_KVM_MAX_FW_CTRS and RISCV_MAX_COUNTERS.
I can replace the condition check with a big comment instead if
you prefer.
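One way to encode that relationship without a runtime check is a compile-time assertion. This is a sketch under the assumption, taken from the review comment above, that a hart exposes at most 32 counter CSRs (cycle, time, instret and hpmcounter3 to hpmcounter31). RISCV_MAX_HW_CTRS and total_counters() are hypothetical names. The sketch uses C11 static_assert so it builds stand-alone; in the kernel, BUILD_BUG_ON() or static_assert() from <linux/build_bug.h> would serve the same purpose.

```c
#include <assert.h>

/* Values from the patch. */
#define RISCV_KVM_MAX_FW_CTRS 32
#define RISCV_MAX_COUNTERS    64

/* Hypothetical: architectural limit of 32 counter CSRs
 * (cycle, time, instret, hpmcounter3..hpmcounter31). */
#define RISCV_MAX_HW_CTRS     32

/* Breaks the build if the firmware counter budget grows
 * without RISCV_MAX_COUNTERS growing with it. */
static_assert(RISCV_MAX_HW_CTRS + RISCV_KVM_MAX_FW_CTRS <= RISCV_MAX_COUNTERS,
	      "RISCV_MAX_COUNTERS too small for HW + FW counters");

/* num_hw_ctrs comes from the host PMU driver and can never exceed
 * RISCV_MAX_HW_CTRS, so the sum always fits in pmc[RISCV_MAX_COUNTERS]. */
static int total_counters(int num_hw_ctrs)
{
	assert(num_hw_ctrs >= 0 && num_hw_ctrs <= RISCV_MAX_HW_CTRS);
	return num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS;
}
```

If RISCV_KVM_MAX_FW_CTRS is raised without raising RISCV_MAX_COUNTERS, the build fails on every platform instead of misbehaving only on hosts with many hardware counters.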
>
> > + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> > + } else
> > + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> > +
> > + kvpmu->num_hw_ctrs = num_hw_ctrs;
> > + kvpmu->num_fw_ctrs = num_fw_ctrs;
> > +
> > + /*
> > + * There is no correlation between the logical hardware counter and virtual counters.
> > + * However, we need to encode a hpmcounter CSR in the counter info field so that
> > + * KVM can trap n emulate the read. This works well in the migration use case as
> > + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> > + */
> > + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> > + /* TIME CSR shouldn't be read from perf interface */
> > + if (i == 1)
> > + continue;
> > + pmc = &kvpmu->pmc[i];
> > + pmc->idx = i;
> > + if (i < kvpmu->num_hw_ctrs) {
> > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > + if (i < 3)
> > + /* CY, IR counters */
> > + kvpmu->pmc[i].cinfo.width = 63;
> > + else
> > + kvpmu->pmc[i].cinfo.width = hpm_width;
> > + /*
> > + * The CSR number doesn't have any relation with the logical
> > + * hardware counters. The CSR numbers are encoded sequentially
> > + * to avoid maintaining a map between the virtual counter
> > + * and CSR number.
> > + */
> > + pmc->cinfo.csr = CSR_CYCLE + i;
> > + } else {
> > + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > + pmc->cinfo.width = BITS_PER_LONG - 1;
> > + }
> > + }
> > +
> > + kvpmu->init_done = true;
> > +
> > + return 0;
> > +}
> > +
> > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > +{
> > + /* TODO */
> > +}
> > +
> > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > +{
> > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > +}
> > --
> > 2.25.1
> >
>
> Regards,
> Anup
--
Regards,
Atish
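The counter-info word that kvm_riscv_vcpu_pmu_init() fills in and kvm_riscv_vcpu_pmu_ctr_info() returns follows the SBI PMU counter-info layout. The sketch below packs it with explicit shifts for RV64 (CSR number in bits [11:0], width minus one in bits [17:12], type in bit 63 with 1 meaning firmware) and models the per-index layout the init loop produces. It also shows a bounds check against the number of counters actually exposed: the patch's test `cidx > RISCV_MAX_COUNTERS` still accepts cidx == RISCV_MAX_COUNTERS, one past the end of pmc[], as well as indices above kvm_pmu_num_counters(). ctr_info_encode(), model_ctr_info() and ctr_idx_valid() are hypothetical helpers. Leaving the firmware counters' CSR field at 0 mirrors the patch, which does not set it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSR_CYCLE          0xc00
#define RISCV_MAX_COUNTERS 64
#define CTR_TYPE_FW        1ULL

/* RV64 counter-info word: csr in [11:0], width (bits - 1) in [17:12],
 * type in bit 63 (0 = hardware, 1 = firmware). */
static uint64_t ctr_info_encode(unsigned int csr, unsigned int width, bool fw)
{
	return (uint64_t)(csr & 0xfff) |
	       ((uint64_t)(width & 0x3f) << 12) |
	       ((fw ? CTR_TYPE_FW : 0) << 63);
}

/* Models the layout built by the init loop; index 1 (time) is skipped
 * and reported as 0 here. */
static uint64_t model_ctr_info(unsigned int i, int num_hw, int hpm_width)
{
	if (i == 1)
		return 0;
	if ((int)i < num_hw)
		/* cycle and instret are 64-bit; hpmcounters use hpm_width */
		return ctr_info_encode(CSR_CYCLE + i, i < 3 ? 63 : hpm_width, false);
	/* firmware counter: width BITS_PER_LONG - 1 on RV64, csr left at 0 */
	return ctr_info_encode(0, 63, true);
}

/* Valid indices are 0..num_ctrs-1, never past pmc[], and never time (1). */
static bool ctr_idx_valid(unsigned long cidx, int num_ctrs)
{
	return cidx < (unsigned long)num_ctrs &&
	       cidx < RISCV_MAX_COUNTERS && cidx != 1;
}
```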
On Sun, Jan 29, 2023 at 4:44 AM Anup Patel <[email protected]> wrote:
>
> On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> >
> > As KVM guests only see the virtual PMU counters, all hpmcounter
> > accesses should trap so that KVM can emulate the reads on behalf of guests.
> >
> > Reviewed-by: Andrew Jones <[email protected]>
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 ++++++++++
> > arch/riscv/kvm/vcpu_insn.c | 4 ++-
> > arch/riscv/kvm/vcpu_pmu.c | 45 ++++++++++++++++++++++++++-
> > 3 files changed, 63 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > index 3f43a43..022d45d 100644
> > --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > @@ -43,6 +43,19 @@ struct kvm_pmu {
> > #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > #define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> >
> > +#if defined(CONFIG_32BIT)
> > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > +{ .base = CSR_CYCLEH, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm }, \
> > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > +#else
> > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > +#endif
> > +
> > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > + unsigned long *val, unsigned long new_val,
> > + unsigned long wr_mask);
> > +
> > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > struct kvm_vcpu_sbi_ext_data *edata);
> > @@ -65,6 +78,9 @@ void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > #else
> > struct kvm_pmu {
> > };
> > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > +{ .base = 0, .count = 0, .func = NULL },
> > +
>
> Redundant newline here.
>
Fixed.
> >
> > static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > {
> > diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> > index 0bb5276..f689337 100644
> > --- a/arch/riscv/kvm/vcpu_insn.c
> > +++ b/arch/riscv/kvm/vcpu_insn.c
> > @@ -213,7 +213,9 @@ struct csr_func {
> > unsigned long wr_mask);
> > };
> >
> > -static const struct csr_func csr_funcs[] = { };
> > +static const struct csr_func csr_funcs[] = {
> > + KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS
> > +};
> >
> > /**
> > * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > index 7713927..894053a 100644
> > --- a/arch/riscv/kvm/vcpu_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -17,6 +17,44 @@
> >
> > #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> >
> > +static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + unsigned long *out_val)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + struct kvm_pmc *pmc;
> > + u64 enabled, running;
> > +
> > + pmc = &kvpmu->pmc[cidx];
> > + if (!pmc->perf_event)
> > + return -EINVAL;
> > +
> > + pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> > + *out_val = pmc->counter_val;
> > +
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > + unsigned long *val, unsigned long new_val,
> > + unsigned long wr_mask)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + int cidx, ret = KVM_INSN_CONTINUE_NEXT_SEPC;
> > +
> > + if (!kvpmu || !kvpmu->init_done)
> > + return KVM_INSN_EXIT_TO_USER_SPACE;
>
> As discussed previously, this should be KVM_INSN_ILLEGAL_TRAP.
>
Done.
> > +
> > + if (wr_mask)
> > + return KVM_INSN_ILLEGAL_TRAP;
> > +
> > + cidx = csr_num - CSR_CYCLE;
> > +
> > + if (pmu_ctr_read(vcpu, cidx, val) < 0)
> > + return KVM_INSN_EXIT_TO_USER_SPACE;
>
> Same as above.
>
Done.
> > +
> > + return ret;
> > +}
> > +
> > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > {
> > struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > @@ -69,7 +107,12 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> > int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > struct kvm_vcpu_sbi_ext_data *edata)
> > {
> > - /* TODO */
> > + int ret;
> > +
> > + ret = pmu_ctr_read(vcpu, cidx, &edata->out_val);
> > + if (ret == -EINVAL)
> > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > +
> > return 0;
> > }
> >
> > --
> > 2.25.1
> >
>
> Regards,
> Anup
--
Regards,
Atish
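Below is a sketch of the CSR-to-counter mapping behind kvm_riscv_vcpu_pmu_read_hpm(). It assumes the standard counter CSR addresses: cycle at 0xc00 through hpmcounter31 at 0xc1f, and the RV32 upper halves cycleh at 0xc80 through hpmcounter31h at 0xc9f. The patch computes `cidx = csr_num - CSR_CYCLE` for every CSR routed to the handler. On RV32 the CSR_CYCLEH range is routed there as well, and 0xc80 - 0xc00 = 0x80 (128) lies beyond pmc[RISCV_MAX_COUNTERS] (64 entries). The helpers below are hypothetical and use inclusive bounds covering 32 CSRs per range. If the csr_funcs lookup treats .count as a half-open length, a count of 31 would stop at 0xc1e.

```c
#include <assert.h>
#include <stdbool.h>

#define CSR_CYCLE         0xc00
#define CSR_HPMCOUNTER31  0xc1f
#define CSR_CYCLEH        0xc80
#define CSR_HPMCOUNTER31H 0xc9f

/* Hypothetical: maps a trapped counter CSR to a virtual counter index
 * and reports whether the upper 32 bits were requested (RV32 only).
 * Returns -1 for CSRs outside both ranges. */
static int hpm_csr_to_cidx(unsigned int csr_num, bool *high_half)
{
	if (csr_num >= CSR_CYCLE && csr_num <= CSR_HPMCOUNTER31) {
		*high_half = false;
		return (int)(csr_num - CSR_CYCLE);
	}
	if (csr_num >= CSR_CYCLEH && csr_num <= CSR_HPMCOUNTER31H) {
		*high_half = true;
		return (int)(csr_num - CSR_CYCLEH);
	}
	return -1;
}

/* Selects the 32-bit half of a 64-bit counter value that an RV32
 * guest reads through the low or high CSR. */
static unsigned int hpm_read_half(unsigned long long counter_val, bool high_half)
{
	return high_half ? (unsigned int)(counter_val >> 32)
			 : (unsigned int)counter_val;
}
```

A -1 result would naturally map to KVM_INSN_ILLEGAL_TRAP, in line with the review feedback on the init_done check.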
On Wed, Feb 1, 2023 at 4:05 AM Atish Patra <[email protected]> wrote:
>
> On Sun, Jan 29, 2023 at 4:30 AM Anup Patel <[email protected]> wrote:
> >
> > On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> > >
> > > This patch only adds barebore structure of perf implementation. Most of
> >
> > s/barebore/barebone/
> >
> > > the functions return zero at this point and will be implemented
> > > fully in the future.
> > >
> > > Signed-off-by: Atish Patra <[email protected]>
> > > ---
> > > arch/riscv/include/asm/kvm_host.h | 3 +
> > > arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> > > arch/riscv/kvm/Makefile | 1 +
> > > arch/riscv/kvm/vcpu.c | 5 +
> > > arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
> > > 5 files changed, 230 insertions(+)
> > > create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > create mode 100644 arch/riscv/kvm/vcpu_pmu.c
> > >
> > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > > index 93f43a3..f9874b4 100644
> > > --- a/arch/riscv/include/asm/kvm_host.h
> > > +++ b/arch/riscv/include/asm/kvm_host.h
> > > @@ -18,6 +18,7 @@
> > > #include <asm/kvm_vcpu_insn.h>
> > > #include <asm/kvm_vcpu_sbi.h>
> > > #include <asm/kvm_vcpu_timer.h>
> > > +#include <asm/kvm_vcpu_pmu.h>
> > >
> > > #define KVM_MAX_VCPUS 1024
> > >
> > > @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
> > >
> > > /* Don't run the VCPU (blocked) */
> > > bool pause;
> > > +
> > > + struct kvm_pmu pmu;
> >
> > Add a single line comment just like other members of the structure.
> >
> > I also suggest naming this variable "pmu_context" or something similar
> > for naming consistency.
> >
>
> Done.
>
> > > };
> > >
> > > static inline void kvm_arch_hardware_unsetup(void) {}
> > > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > new file mode 100644
> > > index 0000000..3f43a43
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > @@ -0,0 +1,76 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2023 Rivos Inc
> > > + *
> > > + * Authors:
> > > + * Atish Patra <[email protected]>
> > > + */
> > > +
> > > +#ifndef __KVM_VCPU_RISCV_PMU_H
> > > +#define __KVM_VCPU_RISCV_PMU_H
> > > +
> > > +#include <linux/perf/riscv_pmu.h>
> > > +#include <asm/kvm_vcpu_sbi.h>
> > > +#include <asm/sbi.h>
> > > +
> > > +#ifdef CONFIG_RISCV_PMU_SBI
> > > +#define RISCV_KVM_MAX_FW_CTRS 32
> > > +#define RISCV_MAX_COUNTERS 64
> > > +
> > > +/* Per virtual pmu counter data */
> > > +struct kvm_pmc {
> > > + u8 idx;
> > > + struct perf_event *perf_event;
> > > + uint64_t counter_val;
> > > + union sbi_pmu_ctr_info cinfo;
> > > + /* Event monitoring status */
> > > + bool started;
> > > +};
> > > +
> > > +/* PMU data structure per vcpu */
> > > +struct kvm_pmu {
> > > + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> > > + /* Number of the virtual firmware counters available */
> > > + int num_fw_ctrs;
> > > + /* Number of the virtual hardware counters available */
> > > + int num_hw_ctrs;
> > > + /* A flag to indicate that pmu initialization is done */
> > > + bool init_done;
> > > + /* Bit map of all the virtual counter used */
> > > + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> > > +};
> > > +
> > > +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > > +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > > +
> > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag,
> > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag,
> > > + unsigned long eidx, uint64_t evtdata,
> > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > > +
> > > +#else
> > > +struct kvm_pmu {
> > > +};
> > > +
> > > +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > +{
> > > + return 0;
> > > +}
> > > +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> > > +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> > > +#endif /* CONFIG_RISCV_PMU_SBI */
> > > +#endif /* !__KVM_VCPU_RISCV_PMU_H */
> > > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > > index 019df920..5de1053 100644
> > > --- a/arch/riscv/kvm/Makefile
> > > +++ b/arch/riscv/kvm/Makefile
> > > @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > > kvm-y += vcpu_sbi_replace.o
> > > kvm-y += vcpu_sbi_hsm.o
> > > kvm-y += vcpu_timer.o
> > > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > > index 7c08567..b746f21 100644
> > > --- a/arch/riscv/kvm/vcpu.c
> > > +++ b/arch/riscv/kvm/vcpu.c
> > > @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> > >
> > > WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> > > WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> >
> > Add an empty newline here.
> >
>
> Done.
> > > + kvm_riscv_vcpu_pmu_reset(vcpu);
> > >
> > > vcpu->arch.hfence_head = 0;
> > > vcpu->arch.hfence_tail = 0;
> > > @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > > /* Setup VCPU timer */
> > > kvm_riscv_vcpu_timer_init(vcpu);
> > >
> > > + /* setup performance monitoring */
> > > + kvm_riscv_vcpu_pmu_init(vcpu);
> > > +
> > > /* Reset VCPU */
> > > kvm_riscv_reset_vcpu(vcpu);
> > >
> > > @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> > > /* Cleanup VCPU timer */
> > > kvm_riscv_vcpu_timer_deinit(vcpu);
> > >
> > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> >
> > Add an empty newline here.
> >
>
> Done.
>
> > > /* Free unused pages pre-allocated for G-stage page table mappings */
> > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > > }
> > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > new file mode 100644
> > > index 0000000..d3fd551
> > > --- /dev/null
> > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > @@ -0,0 +1,145 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Copyright (c) 2023 Rivos Inc
> > > + *
> > > + * Authors:
> > > + * Atish Patra <[email protected]>
> > > + */
> > > +
> > > +#include <linux/errno.h>
> > > +#include <linux/err.h>
> > > +#include <linux/kvm_host.h>
> > > +#include <linux/perf/riscv_pmu.h>
> > > +#include <asm/csr.h>
> > > +#include <asm/kvm_vcpu_sbi.h>
> > > +#include <asm/kvm_vcpu_pmu.h>
> > > +#include <linux/kvm_host.h>
> > > +
> > > +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > > +
> > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > +
> > > + edata->out_val = kvm_pmu_num_counters(kvpmu);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > +
> > > + if (cidx > RISCV_MAX_COUNTERS || cidx == 1) {
> > > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > > + return 0;
> > > + }
> > > +
> > > + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + /* TODO */
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag,
> > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + /* TODO */
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > + unsigned long ctr_mask, unsigned long flag,
> > > + unsigned long eidx, uint64_t evtdata,
> > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + /* TODO */
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > +{
> > > + /* TODO */
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > +{
> > > + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > + struct kvm_pmc *pmc;
> > > +
> > > + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > > + if (ret < 0)
> > > + return ret;
> > > +
> > > + if (!hpm_width || !num_hw_ctrs) {
> > > + pr_err("Cannot initialize VCPU with NULL hpmcounter width or number of counters\n");
> >
> > What will happen if the underlying M-mode firmware does not implement
> > the SBI PMU extension?
> >
>
> riscv_pmu_get_hpm_info will return an error and
> kvm_riscv_vcpu_pmu_init will fail.
Ahh okay. kvm_arch_vcpu_create() does not check the return value
of kvm_riscv_vcpu_pmu_init(), but this function should fail
silently (i.e. "return 0") if riscv_pmu_get_hpm_info() fails.
>
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
> > > + pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
> >
> > How is this possible?
> >
> > The maximum number of HW counters is 32 (including time, cycle, and instret),
> > RISCV_KVM_MAX_FW_CTRS = 32, and
> > RISCV_MAX_COUNTERS = 64
>
> This was added to guard against someone changing the definition of
> RISCV_KVM_MAX_FW_CTRS without also increasing RISCV_MAX_COUNTERS.
> The error might be subtle, as it may work on some platforms (with
> fewer hardware counters) but fail on others (with more hardware counters).
> I couldn't find a better way to express the relationship between
> RISCV_KVM_MAX_FW_CTRS and RISCV_MAX_COUNTERS.
>
> I can just put a big comment here instead of the condition check if
> you prefer.
Maybe you can add a compile-time check using "#if" and "#error"?
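A minimal sketch of what such a compile-time check could look like, as standalone C. RISCV_KVM_MAX_FW_CTRS and RISCV_MAX_COUNTERS mirror the values in kvm_vcpu_pmu.h; RISCV_KVM_MAX_HW_CTRS is a hypothetical name for the architectural limit of 32 hardware counters (cycle, time, instret and hpmcounter3-31), so this checks only the worst case, not the count discovered at runtime.

```c
#include <assert.h>	/* C11 static_assert() */

/* Values mirrored from asm/kvm_vcpu_pmu.h in this patch. */
#define RISCV_KVM_MAX_FW_CTRS	32
#define RISCV_MAX_COUNTERS	64

/* Hypothetical: architectural maximum number of hardware counters. */
#define RISCV_KVM_MAX_HW_CTRS	32

/* Preprocessor form, as suggested: breaks the build if the limits drift apart. */
#if (RISCV_KVM_MAX_HW_CTRS + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS
#error "RISCV_MAX_COUNTERS cannot hold all hw and fw counters"
#endif

/* Equivalent C11 form; the kernel also provides a static_assert() wrapper. */
static_assert(RISCV_KVM_MAX_HW_CTRS + RISCV_KVM_MAX_FW_CTRS <= RISCV_MAX_COUNTERS,
	      "RISCV_MAX_COUNTERS cannot hold all hw and fw counters");
```

Assuming the firmware never reports more than 32 hardware counters, a check like this would make the runtime pr_warn() branch unreachable.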
>
>
> >
> > > + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> > > + } else
> > > + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> > > +
> > > + kvpmu->num_hw_ctrs = num_hw_ctrs;
> > > + kvpmu->num_fw_ctrs = num_fw_ctrs;
> > > +
> > > + /*
> > > + * There is no correlation between the logical hardware counter and virtual counters.
> > > + * However, we need to encode a hpmcounter CSR in the counter info field so that
> > > + * KVM can trap n emulate the read. This works well in the migration use case as
> > > + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> > > + */
> > > + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> > > + /* TIME CSR shouldn't be read from perf interface */
> > > + if (i == 1)
> > > + continue;
> > > + pmc = &kvpmu->pmc[i];
> > > + pmc->idx = i;
> > > + if (i < kvpmu->num_hw_ctrs) {
> > > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > > + if (i < 3)
> > > + /* CY, IR counters */
> > > + kvpmu->pmc[i].cinfo.width = 63;
> > > + else
> > > + kvpmu->pmc[i].cinfo.width = hpm_width;
> > > + /*
> > > + * The CSR number doesn't have any relation with the logical
> > > + * hardware counters. The CSR numbers are encoded sequentially
> > > + * to avoid maintaining a map between the virtual counter
> > > + * and CSR number.
> > > + */
> > > + pmc->cinfo.csr = CSR_CYCLE + i;
> > > + } else {
> > > + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > > + pmc->cinfo.width = BITS_PER_LONG - 1;
> > > + }
> > > + }
> > > +
> > > + kvpmu->init_done = true;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > > +{
> > > + /* TODO */
> > > +}
> > > +
> > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > > +{
> > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > > +}
> > > --
> > > 2.25.1
> > >
> >
> > Regards,
> > Anup
>
>
>
> --
> Regards,
> Atish
Regards,
Anup
On Tue, Jan 31, 2023 at 7:48 PM Anup Patel <[email protected]> wrote:
>
> On Wed, Feb 1, 2023 at 4:05 AM Atish Patra <[email protected]> wrote:
> >
> > On Sun, Jan 29, 2023 at 4:30 AM Anup Patel <[email protected]> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> > > >
> > > > This patch only adds barebore structure of perf implementation. Most of
> > >
> > > s/barebore/barebone/
> > >
> > > > the functions return zero at this point and will be implemented
> > > > fully in the future.
> > > >
> > > > Signed-off-by: Atish Patra <[email protected]>
> > > > ---
> > > > arch/riscv/include/asm/kvm_host.h | 3 +
> > > > arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> > > > arch/riscv/kvm/Makefile | 1 +
> > > > arch/riscv/kvm/vcpu.c | 5 +
> > > > arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
> > > > 5 files changed, 230 insertions(+)
> > > > create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > create mode 100644 arch/riscv/kvm/vcpu_pmu.c
> > > >
> > > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > > > index 93f43a3..f9874b4 100644
> > > > --- a/arch/riscv/include/asm/kvm_host.h
> > > > +++ b/arch/riscv/include/asm/kvm_host.h
> > > > @@ -18,6 +18,7 @@
> > > > #include <asm/kvm_vcpu_insn.h>
> > > > #include <asm/kvm_vcpu_sbi.h>
> > > > #include <asm/kvm_vcpu_timer.h>
> > > > +#include <asm/kvm_vcpu_pmu.h>
> > > >
> > > > #define KVM_MAX_VCPUS 1024
> > > >
> > > > @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
> > > >
> > > > /* Don't run the VCPU (blocked) */
> > > > bool pause;
> > > > +
> > > > + struct kvm_pmu pmu;
> > >
> > > Add a single-line comment just like the other members of the structure.
> > >
> > > I also suggest naming this variable "pmu_context" or something similar
> > > for naming consistency.
> > >
> >
> > Done.
> >
> > > > };
> > > >
> > > > static inline void kvm_arch_hardware_unsetup(void) {}
> > > > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > new file mode 100644
> > > > index 0000000..3f43a43
> > > > --- /dev/null
> > > > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > @@ -0,0 +1,76 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > +/*
> > > > + * Copyright (c) 2023 Rivos Inc
> > > > + *
> > > > + * Authors:
> > > > + * Atish Patra <[email protected]>
> > > > + */
> > > > +
> > > > +#ifndef __KVM_VCPU_RISCV_PMU_H
> > > > +#define __KVM_VCPU_RISCV_PMU_H
> > > > +
> > > > +#include <linux/perf/riscv_pmu.h>
> > > > +#include <asm/kvm_vcpu_sbi.h>
> > > > +#include <asm/sbi.h>
> > > > +
> > > > +#ifdef CONFIG_RISCV_PMU_SBI
> > > > +#define RISCV_KVM_MAX_FW_CTRS 32
> > > > +#define RISCV_MAX_COUNTERS 64
> > > > +
> > > > +/* Per virtual pmu counter data */
> > > > +struct kvm_pmc {
> > > > + u8 idx;
> > > > + struct perf_event *perf_event;
> > > > + uint64_t counter_val;
> > > > + union sbi_pmu_ctr_info cinfo;
> > > > + /* Event monitoring status */
> > > > + bool started;
> > > > +};
> > > > +
> > > > +/* PMU data structure per vcpu */
> > > > +struct kvm_pmu {
> > > > + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> > > > + /* Number of the virtual firmware counters available */
> > > > + int num_fw_ctrs;
> > > > + /* Number of the virtual hardware counters available */
> > > > + int num_hw_ctrs;
> > > > + /* A flag to indicate that pmu initialization is done */
> > > > + bool init_done;
> > > > + /* Bit map of all the virtual counter used */
> > > > + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> > > > +};
> > > > +
> > > > +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > > > +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag,
> > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag,
> > > > + unsigned long eidx, uint64_t evtdata,
> > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> > > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > > > +
> > > > +#else
> > > > +struct kvm_pmu {
> > > > +};
> > > > +
> > > > +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + return 0;
> > > > +}
> > > > +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> > > > +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> > > > +#endif /* CONFIG_RISCV_PMU_SBI */
> > > > +#endif /* !__KVM_VCPU_RISCV_PMU_H */
> > > > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > > > index 019df920..5de1053 100644
> > > > --- a/arch/riscv/kvm/Makefile
> > > > +++ b/arch/riscv/kvm/Makefile
> > > > @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > > > kvm-y += vcpu_sbi_replace.o
> > > > kvm-y += vcpu_sbi_hsm.o
> > > > kvm-y += vcpu_timer.o
> > > > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > > > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > > > index 7c08567..b746f21 100644
> > > > --- a/arch/riscv/kvm/vcpu.c
> > > > +++ b/arch/riscv/kvm/vcpu.c
> > > > @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> > > >
> > > > WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> > > > WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> > >
> > > Add an empty newline here.
> > >
> >
> > Done.
> > > > + kvm_riscv_vcpu_pmu_reset(vcpu);
> > > >
> > > > vcpu->arch.hfence_head = 0;
> > > > vcpu->arch.hfence_tail = 0;
> > > > @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > > > /* Setup VCPU timer */
> > > > kvm_riscv_vcpu_timer_init(vcpu);
> > > >
> > > > + /* setup performance monitoring */
> > > > + kvm_riscv_vcpu_pmu_init(vcpu);
> > > > +
> > > > /* Reset VCPU */
> > > > kvm_riscv_reset_vcpu(vcpu);
> > > >
> > > > @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> > > > /* Cleanup VCPU timer */
> > > > kvm_riscv_vcpu_timer_deinit(vcpu);
> > > >
> > > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > >
> > > Add an empty newline here.
> > >
> >
> > Done.
> >
> > > > /* Free unused pages pre-allocated for G-stage page table mappings */
> > > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > > > }
> > > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > > new file mode 100644
> > > > index 0000000..d3fd551
> > > > --- /dev/null
> > > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > > @@ -0,0 +1,145 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * Copyright (c) 2023 Rivos Inc
> > > > + *
> > > > + * Authors:
> > > > + * Atish Patra <[email protected]>
> > > > + */
> > > > +
> > > > +#include <linux/errno.h>
> > > > +#include <linux/err.h>
> > > > +#include <linux/kvm_host.h>
> > > > +#include <linux/perf/riscv_pmu.h>
> > > > +#include <asm/csr.h>
> > > > +#include <asm/kvm_vcpu_sbi.h>
> > > > +#include <asm/kvm_vcpu_pmu.h>
> > > > +#include <linux/kvm_host.h>
> > > > +
> > > > +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > +
> > > > + edata->out_val = kvm_pmu_num_counters(kvpmu);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > +
> > > > + if (cidx > RISCV_MAX_COUNTERS || cidx == 1) {
> > > > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > > > + return 0;
> > > > + }
> > > > +
> > > > + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + /* TODO */
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag,
> > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + /* TODO */
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > + unsigned long ctr_mask, unsigned long flag,
> > > > + unsigned long eidx, uint64_t evtdata,
> > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + /* TODO */
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > +{
> > > > + /* TODO */
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > + struct kvm_pmc *pmc;
> > > > +
> > > > + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > > > + if (ret < 0)
> > > > + return ret;
> > > > +
> > > > + if (!hpm_width || !num_hw_ctrs) {
> > > > + pr_err("Cannot initialize VCPU with NULL hpmcounter width or number of counters\n");
> > >
> > > What will happen if the underlying M-mode firmware does not implement
> > > the SBI PMU extension?
> > >
> >
> > riscv_pmu_get_hpm_info will return an error and
> > kvm_riscv_vcpu_pmu_init will fail.
>
> Ahh okay. kvm_arch_vcpu_create() does not check the return value
> of kvm_riscv_vcpu_pmu_init(), but this function should fail
> silently (i.e. "return 0") if riscv_pmu_get_hpm_info() fails.
>
Sure. I assume it should return 0 for the next case as well:
	if (!hpm_width || !num_hw_ctrs)
On second thought, does it make sense to print the error message in
that case at all?
If you do see some use for this error message, it probably needs
improvement as well, e.g.:
"Cannot initialize pmu for vcpu %d with invalid counter width or
number of counters"
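A userspace model of the proposed fail-silent behaviour, as a sketch: when the SBI PMU extension is missing or reports a zero width/count, init returns 0 and leaves init_done false, so later trap handling can tell that the virtual PMU is unavailable. struct model_pmu, model_pmu_init() and fake_get_hpm_info() (with its fake_* knobs) are hypothetical stand-ins for struct kvm_pmu, kvm_riscv_vcpu_pmu_init() and riscv_pmu_get_hpm_info().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant parts of struct kvm_pmu. */
struct model_pmu {
	int num_hw_ctrs;
	bool init_done;
};

/* Hypothetical knobs controlling the fake firmware below. */
static int fake_hpm_ret;
static int fake_hpm_width;
static int fake_num_hw;

/* Hypothetical stand-in for riscv_pmu_get_hpm_info(). */
static int fake_get_hpm_info(int *hpm_width, int *num_hw_ctrs)
{
	assert(hpm_width && num_hw_ctrs);
	if (fake_hpm_ret < 0)
		return fake_hpm_ret;	/* e.g. no SBI PMU extension */
	*hpm_width = fake_hpm_width;
	*num_hw_ctrs = fake_num_hw;
	return 0;
}

static int model_pmu_init(struct model_pmu *pmu, int vcpu_id)
{
	int hpm_width = 0, num_hw_ctrs = 0;

	pmu->init_done = false;

	/* Missing SBI PMU extension: not fatal, the vcpu just has no PMU. */
	if (fake_get_hpm_info(&hpm_width, &num_hw_ctrs) < 0)
		return 0;

	/* Bogus counter info: also fail silently; the message is optional. */
	if (!hpm_width || !num_hw_ctrs) {
		fprintf(stderr, "Cannot initialize pmu for vcpu %d with invalid "
			"counter width or number of counters\n", vcpu_id);
		return 0;
	}

	pmu->num_hw_ctrs = num_hw_ctrs;
	pmu->init_done = true;
	return 0;
}
```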
> >
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
> > > > + pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
> > >
> > > How is this possible?
> > >
> > > The maximum number of HW counters is 32 (including time, cycle, and instret),
> > > RISCV_KVM_MAX_FW_CTRS = 32, and
> > > RISCV_MAX_COUNTERS = 64
> >
> > This was added to guard against someone changing the definition of
> > RISCV_KVM_MAX_FW_CTRS without also increasing RISCV_MAX_COUNTERS.
> > The error might be subtle, as it may work on some platforms (with
> > fewer hardware counters) but fail on others (with more hardware counters).
> > I couldn't find a better way to express the relationship between
> > RISCV_KVM_MAX_FW_CTRS and RISCV_MAX_COUNTERS.
> >
> > I can just put a big comment here instead of the condition check if
> > you prefer.
>
> Maybe you can add a compile-time check using "#if" and "#error"?
>
Do you mean a compile-time error when RISCV_KVM_MAX_FW_CTRS > 32?
num_hw_ctrs is computed at runtime, so we can't determine the actual
limit on firmware counters until we know how many hardware counters
there are.
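For reference, the clamp in kvm_riscv_vcpu_pmu_init() boils down to min(RISCV_KVM_MAX_FW_CTRS, RISCV_MAX_COUNTERS - num_hw_ctrs), which can only be evaluated once num_hw_ctrs is known. A standalone sketch; kvm_pmu_fw_ctrs() is a hypothetical helper name and the constants mirror kvm_vcpu_pmu.h.

```c
#include <assert.h>

/* Values mirrored from asm/kvm_vcpu_pmu.h in this patch. */
#define RISCV_KVM_MAX_FW_CTRS	32
#define RISCV_MAX_COUNTERS	64

/*
 * Hypothetical helper: number of virtual firmware counters left after
 * num_hw_ctrs hardware counters (reported at runtime by
 * riscv_pmu_get_hpm_info()) have been placed in the counter array.
 */
static int kvm_pmu_fw_ctrs(int num_hw_ctrs)
{
	int room;

	assert(num_hw_ctrs >= 0 && num_hw_ctrs <= RISCV_MAX_COUNTERS);
	room = RISCV_MAX_COUNTERS - num_hw_ctrs;

	return room < RISCV_KVM_MAX_FW_CTRS ? room : RISCV_KVM_MAX_FW_CTRS;
}
```

With 32 or fewer hardware counters the result is always RISCV_KVM_MAX_FW_CTRS; only a non-architectural count above 32, e.g. 40, would clamp it (to 24).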
> >
> >
> > >
> > > > + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> > > > + } else
> > > > + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> > > > +
> > > > + kvpmu->num_hw_ctrs = num_hw_ctrs;
> > > > + kvpmu->num_fw_ctrs = num_fw_ctrs;
> > > > +
> > > > + /*
> > > > + * There is no correlation between the logical hardware counter and virtual counters.
> > > > + * However, we need to encode a hpmcounter CSR in the counter info field so that
> > > > + * KVM can trap n emulate the read. This works well in the migration use case as
> > > > + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> > > > + */
> > > > + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> > > > + /* TIME CSR shouldn't be read from perf interface */
> > > > + if (i == 1)
> > > > + continue;
> > > > + pmc = &kvpmu->pmc[i];
> > > > + pmc->idx = i;
> > > > + if (i < kvpmu->num_hw_ctrs) {
> > > > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > > > + if (i < 3)
> > > > + /* CY, IR counters */
> > > > + kvpmu->pmc[i].cinfo.width = 63;
> > > > + else
> > > > + kvpmu->pmc[i].cinfo.width = hpm_width;
> > > > + /*
> > > > + * The CSR number doesn't have any relation with the logical
> > > > + * hardware counters. The CSR numbers are encoded sequentially
> > > > + * to avoid maintaining a map between the virtual counter
> > > > + * and CSR number.
> > > > + */
> > > > + pmc->cinfo.csr = CSR_CYCLE + i;
> > > > + } else {
> > > > + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > > > + pmc->cinfo.width = BITS_PER_LONG - 1;
> > > > + }
> > > > + }
> > > > +
> > > > + kvpmu->init_done = true;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + /* TODO */
> > > > +}
> > > > +
> > > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > > > +}
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > Regards,
> > > Anup
> >
> >
> >
> > --
> > Regards,
> > Atish
>
> Regards,
> Anup
--
Regards,
Atish
On Tue, Jan 31, 2023 at 2:46 PM Atish Patra <[email protected]> wrote:
>
> On Sun, Jan 29, 2023 at 4:44 AM Anup Patel <[email protected]> wrote:
> >
> > On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> > >
> > > As KVM guests only see the virtual PMU counters, all hpmcounter
> > > accesses should trap so that KVM can emulate the reads on behalf of guests.
> > >
> > > Reviewed-by: Andrew Jones <[email protected]>
> > > Signed-off-by: Atish Patra <[email protected]>
> > > ---
> > > arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 ++++++++++
> > > arch/riscv/kvm/vcpu_insn.c | 4 ++-
> > > arch/riscv/kvm/vcpu_pmu.c | 45 ++++++++++++++++++++++++++-
> > > 3 files changed, 63 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > index 3f43a43..022d45d 100644
> > > --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > @@ -43,6 +43,19 @@ struct kvm_pmu {
> > > #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > > #define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > >
> > > +#if defined(CONFIG_32BIT)
> > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > +{ .base = CSR_CYCLEH, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm }, \
> > > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > > +#else
> > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > > +#endif
> > > +
> > > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > > + unsigned long *val, unsigned long new_val,
> > > + unsigned long wr_mask);
> > > +
> > > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > > int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > struct kvm_vcpu_sbi_ext_data *edata);
> > > @@ -65,6 +78,9 @@ void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > > #else
> > > struct kvm_pmu {
> > > };
> > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > +{ .base = 0, .count = 0, .func = NULL },
> > > +
> >
> > Redundant newline here.
> >
>
> Fixed.
>
> > >
> > > static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > {
> > > diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> > > index 0bb5276..f689337 100644
> > > --- a/arch/riscv/kvm/vcpu_insn.c
> > > +++ b/arch/riscv/kvm/vcpu_insn.c
> > > @@ -213,7 +213,9 @@ struct csr_func {
> > > unsigned long wr_mask);
> > > };
> > >
> > > -static const struct csr_func csr_funcs[] = { };
> > > +static const struct csr_func csr_funcs[] = {
> > > + KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS
> > > +};
> > >
> > > /**
> > > * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > index 7713927..894053a 100644
> > > --- a/arch/riscv/kvm/vcpu_pmu.c
> > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > @@ -17,6 +17,44 @@
> > >
> > > #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > >
> > > +static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > + unsigned long *out_val)
> > > +{
> > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > + struct kvm_pmc *pmc;
> > > + u64 enabled, running;
> > > +
> > > + pmc = &kvpmu->pmc[cidx];
> > > + if (!pmc->perf_event)
> > > + return -EINVAL;
> > > +
> > > + pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> > > + *out_val = pmc->counter_val;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > > + unsigned long *val, unsigned long new_val,
> > > + unsigned long wr_mask)
> > > +{
> > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > + int cidx, ret = KVM_INSN_CONTINUE_NEXT_SEPC;
> > > +
> > > + if (!kvpmu || !kvpmu->init_done)
> > > + return KVM_INSN_EXIT_TO_USER_SPACE;
> >
> > As discussed previously, this should be KVM_INSN_ILLEGAL_TRAP.
> >
Thinking about it more, this results in a panic in guest S-mode, which
is probably undesirable.
As per your earlier suggestion, we can return 0 for the cycle/instret
counters if they are accessed.
This can only happen with legacy PMU drivers running in guests, or with
some other OS that accesses hpmcounters for random reasons.
I think we should return KVM_INSN_ILLEGAL_TRAP for the other counters and
make the guest kernel panic.
This does make fixed and programmable counters behave differently
when all access is denied in hcounteren.
The new code will look like this:

	if (!kvpmu || !kvpmu->init_done) {
		if (csr_num == CSR_CYCLE || csr_num == CSR_INSTRET) {
			*val = 0;
			return ret;
		}
		return KVM_INSN_ILLEGAL_TRAP;
	}

Let me know if you think otherwise.
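A userspace model of the proposed behaviour, to make the two cases explicit. CSR_CYCLE (0xc00) and CSR_INSTRET (0xc02) are the user counter CSR numbers from the privileged spec; the enum, struct model_pmu and model_read_hpm() are hypothetical stand-ins for the KVM_INSN_* codes, struct kvm_pmu and kvm_riscv_vcpu_pmu_read_hpm(), and the plain counter array replaces the perf_event_read_value() path. The RV32 CSR_CYCLEH/CSR_INSTRETH range is ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* User counter CSR numbers (RISC-V privileged spec). */
#define CSR_CYCLE	0xc00
#define CSR_INSTRET	0xc02

/* Hypothetical mirrors of KVM_INSN_CONTINUE_NEXT_SEPC / KVM_INSN_ILLEGAL_TRAP. */
enum model_insn_ret {
	MODEL_CONTINUE_NEXT_SEPC,
	MODEL_ILLEGAL_TRAP,
};

/* Hypothetical stand-in for struct kvm_pmu. */
struct model_pmu {
	bool init_done;
	unsigned long counter[32];	/* stands in for the perf_event reads */
};

static enum model_insn_ret model_read_hpm(struct model_pmu *pmu,
					   unsigned int csr_num,
					   unsigned long *val,
					   unsigned long wr_mask)
{
	unsigned int cidx;

	assert(val);

	if (!pmu || !pmu->init_done) {
		/* Fixed counters read as zero so legacy guests keep running. */
		if (csr_num == CSR_CYCLE || csr_num == CSR_INSTRET) {
			*val = 0;
			return MODEL_CONTINUE_NEXT_SEPC;
		}
		/* Programmable counters raise an illegal instruction trap. */
		return MODEL_ILLEGAL_TRAP;
	}

	/* The counters are read-only from the guest's point of view. */
	if (wr_mask)
		return MODEL_ILLEGAL_TRAP;

	/* Virtual counters are encoded sequentially starting at CSR_CYCLE. */
	cidx = csr_num - CSR_CYCLE;
	if (cidx >= 32)
		return MODEL_ILLEGAL_TRAP;

	*val = pmu->counter[cidx];
	return MODEL_CONTINUE_NEXT_SEPC;
}
```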
>
> Done.
> > > +
> > > + if (wr_mask)
> > > + return KVM_INSN_ILLEGAL_TRAP;
> > > +
> > > + cidx = csr_num - CSR_CYCLE;
> > > +
> > > + if (pmu_ctr_read(vcpu, cidx, val) < 0)
> > > + return KVM_INSN_EXIT_TO_USER_SPACE;
> >
> > Same as above.
> >
We can get rid of this, as pmu_ctr_read() doesn't return errors anyway.
>
> Done.
>
> > > +
> > > + return ret;
> > > +}
> > > +
> > > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > > {
> > > struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > @@ -69,7 +107,12 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> > > int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > struct kvm_vcpu_sbi_ext_data *edata)
> > > {
> > > - /* TODO */
> > > + int ret;
> > > +
> > > + ret = pmu_ctr_read(vcpu, cidx, &edata->out_val);
> > > + if (ret == -EINVAL)
> > > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > > +
> > > return 0;
> > > }
> > >
> > > --
> > > 2.25.1
> > >
> >
> > Regards,
> > Anup
>
>
>
> --
> Regards,
> Atish
--
Regards,
Atish
On Wed, Feb 1, 2023 at 2:11 PM Atish Patra <[email protected]> wrote:
>
> On Tue, Jan 31, 2023 at 7:48 PM Anup Patel <[email protected]> wrote:
> >
> > On Wed, Feb 1, 2023 at 4:05 AM Atish Patra <[email protected]> wrote:
> > >
> > > On Sun, Jan 29, 2023 at 4:30 AM Anup Patel <[email protected]> wrote:
> > > >
> > > > On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> > > > >
> > > > > This patch only adds barebore structure of perf implementation. Most of
> > > >
> > > > s/barebore/barebone/
> > > >
> > > > > the functions return zero at this point and will be implemented
> > > > > fully in the future.
> > > > >
> > > > > Signed-off-by: Atish Patra <[email protected]>
> > > > > ---
> > > > > arch/riscv/include/asm/kvm_host.h | 3 +
> > > > > arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> > > > > arch/riscv/kvm/Makefile | 1 +
> > > > > arch/riscv/kvm/vcpu.c | 5 +
> > > > > arch/riscv/kvm/vcpu_pmu.c | 145 ++++++++++++++++++++++++++
> > > > > 5 files changed, 230 insertions(+)
> > > > > create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > > create mode 100644 arch/riscv/kvm/vcpu_pmu.c
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > > > > index 93f43a3..f9874b4 100644
> > > > > --- a/arch/riscv/include/asm/kvm_host.h
> > > > > +++ b/arch/riscv/include/asm/kvm_host.h
> > > > > @@ -18,6 +18,7 @@
> > > > > #include <asm/kvm_vcpu_insn.h>
> > > > > #include <asm/kvm_vcpu_sbi.h>
> > > > > #include <asm/kvm_vcpu_timer.h>
> > > > > +#include <asm/kvm_vcpu_pmu.h>
> > > > >
> > > > > #define KVM_MAX_VCPUS 1024
> > > > >
> > > > > @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
> > > > >
> > > > > /* Don't run the VCPU (blocked) */
> > > > > bool pause;
> > > > > +
> > > > > + struct kvm_pmu pmu;
> > > >
> > > > Add a single-line comment just like the other members of the structure.
> > > >
> > > > I also suggest naming this variable "pmu_context" or something similar
> > > > for naming consistency.
> > > >
> > >
> > > Done.
> > >
> > > > > };
> > > > >
> > > > > static inline void kvm_arch_hardware_unsetup(void) {}
> > > > > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > > new file mode 100644
> > > > > index 0000000..3f43a43
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > > @@ -0,0 +1,76 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > +/*
> > > > > + * Copyright (c) 2023 Rivos Inc
> > > > > + *
> > > > > + * Authors:
> > > > > + * Atish Patra <[email protected]>
> > > > > + */
> > > > > +
> > > > > +#ifndef __KVM_VCPU_RISCV_PMU_H
> > > > > +#define __KVM_VCPU_RISCV_PMU_H
> > > > > +
> > > > > +#include <linux/perf/riscv_pmu.h>
> > > > > +#include <asm/kvm_vcpu_sbi.h>
> > > > > +#include <asm/sbi.h>
> > > > > +
> > > > > +#ifdef CONFIG_RISCV_PMU_SBI
> > > > > +#define RISCV_KVM_MAX_FW_CTRS 32
> > > > > +#define RISCV_MAX_COUNTERS 64
> > > > > +
> > > > > +/* Per virtual pmu counter data */
> > > > > +struct kvm_pmc {
> > > > > + u8 idx;
> > > > > + struct perf_event *perf_event;
> > > > > + uint64_t counter_val;
> > > > > + union sbi_pmu_ctr_info cinfo;
> > > > > + /* Event monitoring status */
> > > > > + bool started;
> > > > > +};
> > > > > +
> > > > > +/* PMU data structure per vcpu */
> > > > > +struct kvm_pmu {
> > > > > + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> > > > > + /* Number of the virtual firmware counters available */
> > > > > + int num_fw_ctrs;
> > > > > + /* Number of the virtual hardware counters available */
> > > > > + int num_hw_ctrs;
> > > > > + /* A flag to indicate that pmu initialization is done */
> > > > > + bool init_done;
> > > > > + /* Bit map of all the virtual counter used */
> > > > > + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> > > > > +};
> > > > > +
> > > > > +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > > > > +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag,
> > > > > + unsigned long eidx, uint64_t evtdata,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata);
> > > > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > > > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> > > > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > > > > +
> > > > > +#else
> > > > > +struct kvm_pmu {
> > > > > +};
> > > > > +
> > > > > +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + return 0;
> > > > > +}
> > > > > +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> > > > > +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> > > > > +#endif /* CONFIG_RISCV_PMU_SBI */
> > > > > +#endif /* !__KVM_VCPU_RISCV_PMU_H */
> > > > > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > > > > index 019df920..5de1053 100644
> > > > > --- a/arch/riscv/kvm/Makefile
> > > > > +++ b/arch/riscv/kvm/Makefile
> > > > > @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > > > > kvm-y += vcpu_sbi_replace.o
> > > > > kvm-y += vcpu_sbi_hsm.o
> > > > > kvm-y += vcpu_timer.o
> > > > > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > > > > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > > > > index 7c08567..b746f21 100644
> > > > > --- a/arch/riscv/kvm/vcpu.c
> > > > > +++ b/arch/riscv/kvm/vcpu.c
> > > > > @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> > > > >
> > > > > WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> > > > > WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> > > >
> > > > Add an empty newline here.
> > > >
> > >
> > > Done.
> > > > > + kvm_riscv_vcpu_pmu_reset(vcpu);
> > > > >
> > > > > vcpu->arch.hfence_head = 0;
> > > > > vcpu->arch.hfence_tail = 0;
> > > > > @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > > > > /* Setup VCPU timer */
> > > > > kvm_riscv_vcpu_timer_init(vcpu);
> > > > >
> > > > > + /* setup performance monitoring */
> > > > > + kvm_riscv_vcpu_pmu_init(vcpu);
> > > > > +
> > > > > /* Reset VCPU */
> > > > > kvm_riscv_reset_vcpu(vcpu);
> > > > >
> > > > > @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> > > > > /* Cleanup VCPU timer */
> > > > > kvm_riscv_vcpu_timer_deinit(vcpu);
> > > > >
> > > > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > > >
> > > > Add an empty newline here.
> > > >
> > >
> > > Done.
> > >
> > > > > /* Free unused pages pre-allocated for G-stage page table mappings */
> > > > > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > > > > }
> > > > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > > > new file mode 100644
> > > > > index 0000000..d3fd551
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > > > @@ -0,0 +1,145 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > +/*
> > > > > + * Copyright (c) 2023 Rivos Inc
> > > > > + *
> > > > > + * Authors:
> > > > > + * Atish Patra <[email protected]>
> > > > > + */
> > > > > +
> > > > > +#include <linux/errno.h>
> > > > > +#include <linux/err.h>
> > > > > +#include <linux/kvm_host.h>
> > > > > +#include <linux/perf/riscv_pmu.h>
> > > > > +#include <asm/csr.h>
> > > > > +#include <asm/kvm_vcpu_sbi.h>
> > > > > +#include <asm/kvm_vcpu_pmu.h>
> > > > > +#include <linux/kvm_host.h>
> > > > > +
> > > > > +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > > +
> > > > > + edata->out_val = kvm_pmu_num_counters(kvpmu);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > > +
> > > > > + if (cidx >= RISCV_MAX_COUNTERS || cidx == 1) {
> > > > > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + /* TODO */
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + /* TODO */
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > > > > + unsigned long ctr_mask, unsigned long flag,
> > > > > + unsigned long eidx, uint64_t evtdata,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + /* TODO */
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > > + struct kvm_vcpu_sbi_ext_data *edata)
> > > > > +{
> > > > > + /* TODO */
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > > + struct kvm_pmc *pmc;
> > > > > +
> > > > > + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > > > > + if (ret < 0)
> > > > > + return ret;
> > > > > +
> > > > > + if (!hpm_width || !num_hw_ctrs) {
> > > > > + pr_err("Cannot initialize VCPU with NULL hpmcounter width or number of counters\n");
> > > >
> > > > What will happen if the underlying M-mode firmware does not implement
> > > > the SBI PMU extension?
> > > >
> > >
> > > riscv_pmu_get_hpm_info will return an error and
> > > kvm_riscv_vcpu_pmu_init will fail.
> >
> > Ahh okay. kvm_arch_vcpu_create() does not look at the return value
> > of kvm_riscv_vcpu_pmu_init(), so this function should fail
> > silently (i.e. "return 0") if riscv_pmu_get_hpm_info() fails.
> >
>
> Sure. I assume it should return 0 for the next case as well:
> if (!hpm_width || !num_hw_ctrs)
Yes, I agree. It would be better to make kvm_riscv_vcpu_pmu_init() return void.
>
> Thinking it over again, does it make sense to print the error message in
> the above case?
> If you do see some use for this error message, it probably needs
> improvement as well:
>
> "Cannot initialize pmu for vcpu %d with invalid counter width or
> number of counters"
For now, let's not have any error message in kvm_riscv_vcpu_pmu_init().
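A minimal user-space sketch of the behaviour agreed here: the init function returns void, prints nothing, and simply leaves init_done false when the host cannot describe its counters. Everything below (struct model_pmu, model_pmu_init(), the info_* stubs, and the int-pointer signature assumed for the riscv_pmu_get_hpm_info() stand-in) is hypothetical and only mirrors the control flow; it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-vCPU PMU state; not struct kvm_pmu. */
struct model_pmu {
	int num_hw_ctrs;
	int hpm_width;
	bool init_done;
};

/* Stand-in for riscv_pmu_get_hpm_info(); the signature is an assumption. */
typedef int (*hpm_info_fn)(int *hpm_width, int *num_hw_ctrs);

/* Host firmware without the SBI PMU extension: the query fails. */
static int info_unavailable(int *hpm_width, int *num_hw_ctrs)
{
	(void)hpm_width;
	(void)num_hw_ctrs;
	return -1;
}

/* Query succeeds but reports no usable counters. */
static int info_zero(int *hpm_width, int *num_hw_ctrs)
{
	*hpm_width = 0;
	*num_hw_ctrs = 0;
	return 0;
}

/* Query succeeds with some example values. */
static int info_ok(int *hpm_width, int *num_hw_ctrs)
{
	*hpm_width = 47;
	*num_hw_ctrs = 8;
	return 0;
}

/* void return and no pr_err(): the vCPU simply runs without a virtual PMU. */
static void model_pmu_init(struct model_pmu *pmu, hpm_info_fn get_info)
{
	int hpm_width = 0, num_hw_ctrs = 0;

	pmu->init_done = false;

	if (get_info(&hpm_width, &num_hw_ctrs) < 0)
		return;
	if (!hpm_width || !num_hw_ctrs)
		return;

	pmu->hpm_width = hpm_width;
	pmu->num_hw_ctrs = num_hw_ctrs;
	pmu->init_done = true;
}
```

With init_done left false, later guest hpmcounter reads take the !kvpmu->init_done path in kvm_riscv_vcpu_pmu_read_hpm().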
>
> > >
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS) {
> > > > > + pr_warn("Limiting fw counters as hw & fw counters exceed maximum counters\n");
> > > >
> > > > How is this possible?
> > > >
> > > > The maximum number of HW counters is 32 (including time, cycle, and
> > > > instret), RISCV_KVM_MAX_FW_CTRS = 32, and RISCV_MAX_COUNTERS = 64, so
> > > > num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS can never exceed RISCV_MAX_COUNTERS.
> > >
> > > This was added to guard against someone changing the definition of
> > > RISCV_KVM_MAX_FW_CTRS without also increasing RISCV_MAX_COUNTERS.
> > > The error might be subtle, as it may work on some platforms (with fewer
> > > hardware counters) but fail on others (with more hardware counters).
> > > I couldn't find a better way to describe the relationship between
> > > RISCV_KVM_MAX_FW_CTRS and RISCV_MAX_COUNTERS.
> > >
> > > I can just put a big comment here instead of the condition check if
> > > you prefer.
> >
> > Maybe you can add a compile-time check using "#if" and "#error"?
> >
>
> You mean a compile-time error when RISCV_KVM_MAX_FW_CTRS > 32?
> num_hw_ctrs is computed at runtime, so we can't determine the actual
> limit of firmware counters unless we know how many hardware counters
> there are.
Yes, we should ensure that a build with RISCV_KVM_MAX_FW_CTRS > 32
fails with a proper error message.
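One way to express this, sketched under the assumption that the limits are plain integer macros with the values quoted above (RISCV_MAX_HW_CTRS is a hypothetical name for the architectural limit of 32). The "#if"/"#error" pair rejects an oversized RISCV_KVM_MAX_FW_CTRS at build time, while the firmware-counter count itself is still derived at runtime once num_hw_ctrs is known.

```c
#include <assert.h>

/*
 * Values quoted in this thread. RISCV_MAX_HW_CTRS is a hypothetical name for
 * the architectural limit: cycle, time, instret and hpmcounter3..31.
 */
#define RISCV_MAX_HW_CTRS	32
#define RISCV_KVM_MAX_FW_CTRS	32
#define RISCV_MAX_COUNTERS	64

/* Build-time check in the "#if"/"#error" style suggested above. */
#if RISCV_KVM_MAX_FW_CTRS > 32
#error "RISCV_KVM_MAX_FW_CTRS must not exceed 32"
#endif

/* The same invariant as a C11 static assertion (the kernel has BUILD_BUG_ON()). */
static_assert(RISCV_MAX_HW_CTRS + RISCV_KVM_MAX_FW_CTRS <= RISCV_MAX_COUNTERS,
	      "hw + fw counters must fit in RISCV_MAX_COUNTERS");

/*
 * num_hw_ctrs is only known after probing the host, so the firmware counter
 * count is still computed at runtime, mirroring kvm_riscv_vcpu_pmu_init().
 * Given the assertion above, the clamp can only trigger if num_hw_ctrs is
 * larger than RISCV_MAX_HW_CTRS.
 */
static int model_num_fw_ctrs(int num_hw_ctrs)
{
	if (num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS > RISCV_MAX_COUNTERS)
		return RISCV_MAX_COUNTERS - num_hw_ctrs;
	return RISCV_KVM_MAX_FW_CTRS;
}
```

The static assertion checks the full relationship between the three limits, whereas the "#if" only bounds RISCV_KVM_MAX_FW_CTRS; either one turns the runtime pr_warn() path into a build failure.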
>
> > >
> > >
> > > >
> > > > > + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> > > > > + } else
> > > > > + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> > > > > +
> > > > > + kvpmu->num_hw_ctrs = num_hw_ctrs;
> > > > > + kvpmu->num_fw_ctrs = num_fw_ctrs;
> > > > > +
> > > > > + /*
> > > > > + * There is no correlation between the logical hardware counter and virtual counters.
> > > > > + * However, we need to encode a hpmcounter CSR in the counter info field so that
> > > > > + * KVM can trap n emulate the read. This works well in the migration use case as
> > > > > + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> > > > > + */
> > > > > + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> > > > > + /* TIME CSR shouldn't be read from perf interface */
> > > > > + if (i == 1)
> > > > > + continue;
> > > > > + pmc = &kvpmu->pmc[i];
> > > > > + pmc->idx = i;
> > > > > + if (i < kvpmu->num_hw_ctrs) {
> > > > > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > > > > + if (i < 3)
> > > > > + /* CY, IR counters */
> > > > > + kvpmu->pmc[i].cinfo.width = 63;
> > > > > + else
> > > > > + kvpmu->pmc[i].cinfo.width = hpm_width;
> > > > > + /*
> > > > > + * The CSR number doesn't have any relation with the logical
> > > > > + * hardware counters. The CSR numbers are encoded sequentially
> > > > > + * to avoid maintaining a map between the virtual counter
> > > > > + * and CSR number.
> > > > > + */
> > > > > + pmc->cinfo.csr = CSR_CYCLE + i;
> > > > > + } else {
> > > > > + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > > > > + pmc->cinfo.width = BITS_PER_LONG - 1;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + kvpmu->init_done = true;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + /* TODO */
> > > > > +}
> > > > > +
> > > > > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > > > > +}
> > > > > --
> > > > > 2.25.1
> > > > >
> > > >
> > > > Regards,
> > > > Anup
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
> >
> > Regards,
> > Anup
>
>
>
> --
> Regards,
> Atish
Regards,
Anup
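To make the counter layout built by kvm_riscv_vcpu_pmu_init() concrete, here is a stand-alone model of its cinfo loop. It assumes pmc[] has RISCV_MAX_COUNTERS (64) entries; all model_* names are hypothetical. The width field follows the SBI PMU counter-info encoding (one less than the counter's bit width). The bounds check uses cidx < MODEL_MAX_COUNTERS because, with a 64-entry array, a check of the form cidx > RISCV_MAX_COUNTERS would still accept cidx == 64, one past the end.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_MAX_COUNTERS	64	/* RISCV_MAX_COUNTERS in the patch */
#define MODEL_CSR_CYCLE		0xC00	/* RISC-V cycle CSR number */
#define MODEL_BITS_PER_LONG	(sizeof(long) * CHAR_BIT)

enum model_ctr_type { CTR_UNUSED, CTR_HW, CTR_FW };

struct model_cinfo {
	enum model_ctr_type type;
	int width;	/* counter width minus one, as in SBI ctr_info */
	int csr;	/* only meaningful for hardware counters */
};

static struct model_cinfo cinfo[MODEL_MAX_COUNTERS];

static void model_fill_cinfo(int num_hw_ctrs, int num_fw_ctrs, int hpm_width)
{
	for (int i = 0; i < num_hw_ctrs + num_fw_ctrs; i++) {
		/* Index 1 is the TIME CSR; it is never exposed through perf. */
		if (i == 1)
			continue;
		if (i < num_hw_ctrs) {
			cinfo[i].type = CTR_HW;
			/* cycle (0) and instret (2) are 64-bit counters */
			cinfo[i].width = i < 3 ? 63 : hpm_width;
			/* CSRs are assigned sequentially, with no link to a physical counter */
			cinfo[i].csr = MODEL_CSR_CYCLE + i;
		} else {
			cinfo[i].type = CTR_FW;
			cinfo[i].width = (int)(MODEL_BITS_PER_LONG - 1);
		}
	}
}

/* Validity check for a ctr_info request against a 64-entry pmc[] array. */
static bool model_cidx_valid(unsigned long cidx)
{
	return cidx < MODEL_MAX_COUNTERS && cidx != 1;
}
```

For example, with 8 hardware counters and 32 firmware counters, indices 0 to 7 map to CSRs 0xC00 to 0xC07 (index 1 left unused), and indices 8 to 39 are firmware counters.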
On Wed, Feb 1, 2023 at 2:29 PM Atish Patra <[email protected]> wrote:
>
> On Tue, Jan 31, 2023 at 2:46 PM Atish Patra <[email protected]> wrote:
> >
> > On Sun, Jan 29, 2023 at 4:44 AM Anup Patel <[email protected]> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 11:56 PM Atish Patra <[email protected]> wrote:
> > > >
> > > > As KVM guests only see the virtual PMU counters, all hpmcounter
> > > > accesses should trap, and KVM emulates the read access on behalf of guests.
> > > >
> > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > Signed-off-by: Atish Patra <[email protected]>
> > > > ---
> > > > arch/riscv/include/asm/kvm_vcpu_pmu.h | 16 ++++++++++
> > > > arch/riscv/kvm/vcpu_insn.c | 4 ++-
> > > > arch/riscv/kvm/vcpu_pmu.c | 45 ++++++++++++++++++++++++++-
> > > > 3 files changed, 63 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > index 3f43a43..022d45d 100644
> > > > --- a/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > > > @@ -43,6 +43,19 @@ struct kvm_pmu {
> > > > #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > > > #define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > > >
> > > > +#if defined(CONFIG_32BIT)
> > > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > > +{ .base = CSR_CYCLEH, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm }, \
> > > > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > > > +#else
> > > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > > +{ .base = CSR_CYCLE, .count = 31, .func = kvm_riscv_vcpu_pmu_read_hpm },
> > > > +#endif
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > > > + unsigned long *val, unsigned long new_val,
> > > > + unsigned long wr_mask);
> > > > +
> > > > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > > > int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > struct kvm_vcpu_sbi_ext_data *edata);
> > > > @@ -65,6 +78,9 @@ void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > > > #else
> > > > struct kvm_pmu {
> > > > };
> > > > +#define KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS \
> > > > +{ .base = 0, .count = 0, .func = NULL },
> > > > +
> > >
> > > Redundant newline here.
> > >
> >
> > Fixed.
> >
> > > >
> > > > static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > > > {
> > > > diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> > > > index 0bb5276..f689337 100644
> > > > --- a/arch/riscv/kvm/vcpu_insn.c
> > > > +++ b/arch/riscv/kvm/vcpu_insn.c
> > > > @@ -213,7 +213,9 @@ struct csr_func {
> > > > unsigned long wr_mask);
> > > > };
> > > >
> > > > -static const struct csr_func csr_funcs[] = { };
> > > > +static const struct csr_func csr_funcs[] = {
> > > > + KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS
> > > > +};
> > > >
> > > > /**
> > > > * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> > > > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > > > index 7713927..894053a 100644
> > > > --- a/arch/riscv/kvm/vcpu_pmu.c
> > > > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > > > @@ -17,6 +17,44 @@
> > > >
> > > > #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > > >
> > > > +static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > + unsigned long *out_val)
> > > > +{
> > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > + struct kvm_pmc *pmc;
> > > > + u64 enabled, running;
> > > > +
> > > > + pmc = &kvpmu->pmc[cidx];
> > > > + if (!pmc->perf_event)
> > > > + return -EINVAL;
> > > > +
> > > > + pmc->counter_val += perf_event_read_value(pmc->perf_event, &enabled, &running);
> > > > + *out_val = pmc->counter_val;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int kvm_riscv_vcpu_pmu_read_hpm(struct kvm_vcpu *vcpu, unsigned int csr_num,
> > > > + unsigned long *val, unsigned long new_val,
> > > > + unsigned long wr_mask)
> > > > +{
> > > > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > + int cidx, ret = KVM_INSN_CONTINUE_NEXT_SEPC;
> > > > +
> > > > + if (!kvpmu || !kvpmu->init_done)
> > > > + return KVM_INSN_EXIT_TO_USER_SPACE;
> > >
> > > As discussed previously, this should be KVM_INSN_ILLEGAL_TRAP.
> > >
>
> Thinking about it more, this results in a panic in guest S-mode, which
> is probably undesirable.
> As per your earlier suggestion, we can return 0 for the cycle/instret
> counters if they are accessed.
> Such accesses are only possible from legacy PMU drivers running in guests,
> or from some other OS that accesses hpmcounters for arbitrary reasons.
>
> I think we should return KVM_INSN_ILLEGAL_TRAP for the other counters and
> let the guest kernel panic.
> This does make fixed and programmable counters behave differently when
> all access is denied in hcounteren.
>
> The new code will look like this:
>
> if (!kvpmu || !kvpmu->init_done) {
> 	if (csr_num == CSR_CYCLE || csr_num == CSR_INSTRET) {
> 		*val = 0;
> 		return ret;
> 	} else {
> 		return KVM_INSN_ILLEGAL_TRAP;
> 	}
> }
>
> Let me know if you think otherwise.
Looks good to me. Please also add a comment block inside the
"if (!kvpmu || !kvpmu->init_done)" branch.
>
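A stand-alone sketch of kvm_riscv_vcpu_pmu_read_hpm() with the fallback proposed above and the requested comment block. The MODEL_INSN_* values are hypothetical stand-ins for KVM_INSN_ILLEGAL_TRAP and KVM_INSN_CONTINUE_NEXT_SEPC, a plain array stands in for pmu_ctr_read(), and 0xC00/0xC02 are the architectural cycle/instret CSR numbers. It is a model of the control flow, not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM_INSN_ILLEGAL_TRAP / KVM_INSN_CONTINUE_NEXT_SEPC. */
enum model_insn_ret {
	MODEL_INSN_ILLEGAL_TRAP,
	MODEL_INSN_CONTINUE_NEXT_SEPC,
};

#define MODEL_CSR_CYCLE		0xC00	/* cycle CSR */
#define MODEL_CSR_INSTRET	0xC02	/* instret CSR */

/*
 * counters[] stands in for pmu_ctr_read(); csr_num is assumed to already be
 * in the counter CSR range, as guaranteed by the csr_funcs[] lookup.
 */
static enum model_insn_ret model_read_hpm(bool pmu_ready,
					  const unsigned long *counters,
					  unsigned int csr_num, unsigned long *val,
					  unsigned long wr_mask)
{
	if (!pmu_ready) {
		/*
		 * No virtual PMU for this vCPU. A legacy PMU driver (or another
		 * guest OS) may still read cycle/instret, so return 0 rather
		 * than inject an illegal-instruction trap that would panic the
		 * guest. Any other hpmcounter must be accessed via the SBI PMU
		 * extension, so those reads trap.
		 */
		if (csr_num == MODEL_CSR_CYCLE || csr_num == MODEL_CSR_INSTRET) {
			*val = 0;
			return MODEL_INSN_CONTINUE_NEXT_SEPC;
		}
		return MODEL_INSN_ILLEGAL_TRAP;
	}

	/* The counter CSRs are read-only, so any write traps. */
	if (wr_mask)
		return MODEL_INSN_ILLEGAL_TRAP;

	*val = counters[csr_num - MODEL_CSR_CYCLE];
	return MODEL_INSN_CONTINUE_NEXT_SEPC;
}
```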
> >
> > Done.
> > > > +
> > > > + if (wr_mask)
> > > > + return KVM_INSN_ILLEGAL_TRAP;
> > > > +
> > > > + cidx = csr_num - CSR_CYCLE;
> > > > +
> > > > + if (pmu_ctr_read(vcpu, cidx, val) < 0)
> > > > + return KVM_INSN_EXIT_TO_USER_SPACE;
> > >
> > > Same as above.
> > >
>
> We can get rid of this, as pmu_ctr_read() doesn't return errors anyway.
>
> >
> > Done.
> >
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > > > {
> > > > struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > > > @@ -69,7 +107,12 @@ int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_ba
> > > > int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > > > struct kvm_vcpu_sbi_ext_data *edata)
> > > > {
> > > > - /* TODO */
> > > > + int ret;
> > > > +
> > > > + ret = pmu_ctr_read(vcpu, cidx, &edata->out_val);
> > > > + if (ret == -EINVAL)
> > > > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > > > +
> > > > return 0;
> > > > }
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > Regards,
> > > Anup
> >
> >
> >
> > --
> > Regards,
> > Atish
>
>
>
> --
> Regards,
> Atish
Regards,
Anup
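For reference, a sketch of how a csr_funcs[] entry such as KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS is matched, assuming the lookup in vcpu_insn.c treats an entry as covering the half-open range [base, base + count). The model_* names and the toy handler are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*model_csr_fn)(unsigned int csr_num, unsigned long *val);

struct model_csr_func {
	unsigned int base;
	unsigned int count;
	model_csr_fn func;
};

/* Toy handler: reports the counter index derived from the CSR number. */
static int model_read_counter(unsigned int csr_num, unsigned long *val)
{
	*val = csr_num - 0xC00;
	return 0;
}

/* Mirrors the 64-bit KVM_RISCV_VCPU_HPMCOUNTER_CSR_FUNCS entry. */
static const struct model_csr_func model_csr_funcs[] = {
	{ .base = 0xC00, .count = 31, .func = model_read_counter },
};

/* Return the first entry whose [base, base + count) range contains csr_num. */
static const struct model_csr_func *model_find_csr_func(unsigned int csr_num)
{
	for (size_t i = 0; i < sizeof(model_csr_funcs) / sizeof(model_csr_funcs[0]); i++) {
		const struct model_csr_func *f = &model_csr_funcs[i];

		if (csr_num >= f->base && csr_num < f->base + f->count)
			return f;
	}
	return NULL;
}
```

Under that assumption, .count = 31 covers 0xC00 through 0xC1E, so hpmcounter31 (0xC1F) would not be matched by this entry; .count = 32 would be needed to cover all 32 counter CSRs. Likewise, the { .base = 0, .count = 0 } stub used when CONFIG_RISCV_PMU_SBI is disabled never matches any CSR.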