2022-12-15 17:33:18

by Atish Patra

Subject: [PATCH v2 00/11] KVM perf support

This series extends perf support to KVM guests. The KVM implementation relies
on the SBI PMU extension and trap-and-emulation of the hpmcounter CSRs.
It exposes virtual counters to the guest and internally manages them
using kernel perf counters.
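
For context only (argument layout per the SBI PMU spec; the local variable
names are made up for illustration), the kind of call a guest kernel issues
and that KVM now services looks roughly like this; the counter it returns is
backed by a kernel perf event on the host:

  /* Guest-side sketch: find and configure a counter for an event */
  struct sbiret ret;

  ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH,
                  cbase, cmask, cflags, event_idx, event_data, 0);
  if (!ret.error)
          counter_idx = ret.value; /* counter index assigned by the hypervisor */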

This series doesn't support counter overflow yet, as the Sscofpmf extension
doesn't allow a trap & emulation mechanism for the scountovf CSR so far. The
required changes to allow that are still under discussion. Supporting the
overflow interrupt also requires AIA interrupt filtering support.

1. PATCH1-4 are generic KVM/PMU driver improvements.
2. PATCH8 disables hpmcounter access for now. It will be enabled to maintain
the ABI requirement once the ONE reg interface is settled.

perf stat works in kvm guests with this series.

Here is an example of running perf stat in a guest running on KVM.

===========================================================================
/ # /host/apps/perf stat -e instructions -e cycles -e r8000000000000005 \
> -e r8000000000000006 -e r8000000000000007 -e r8000000000000008 \
> -e r800000000000000a perf bench sched messaging -g 10 -l 10

# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 7.769 [sec]

Performance counter stats for 'perf bench sched messaging -g 10 -l 10':

73556259604 cycles
73387266056 instructions # 1.00 insn per cycle
0 dTLB-store-misses
0 iTLB-load-misses
0 r8000000000000005
2595 r8000000000000006
2272 r8000000000000007
10 r8000000000000008
0 r800000000000000a

12.173720400 seconds time elapsed

1.002716000 seconds user
21.931047000 seconds sys


Note: The SBI_PMU_FW_SET_TIMER firmware event (event id: r8000000000000005) is
zero because the KVM guest supports Sstc now.

This series can be found here as well.
https://github.com/atishp04/linux/tree/kvm_perf_v2

TODO:
1. Add Sscofpmf support.
2. Add ONE reg interface for the following operations:
   1. Enable/Disable PMU (should it be at the VM level rather than per vcpu?)
   2. Number of hpmcounters and width of the counters
   3. Init PMU
   4. Allow guest userspace to access cycle & instret without trapping

Changes from v1->v2:
1. Addressed comments from Andrew.
2. Removed kvpmu sanity check.
3. Added a kvm pmu init flag and the sanity check to the probe function.
4. Improved the Linux vs. SBI error code handling.


Atish Patra (11):
RISC-V: Define helper functions expose hpm counter width and count
RISC-V: KVM: Define a probe function for SBI extension data structures
RISC-V: KVM: Return correct code for hsm stop function
RISC-V: KVM: Modify SBI extension handler to return SBI error code
RISC-V: KVM: Improve privilege mode filtering for perf
RISC-V: KVM: Add skeleton support for perf
RISC-V: KVM: Add SBI PMU extension support
RISC-V: KVM: Disable all hpmcounter access for VS/VU mode
RISC-V: KVM: Implement trap & emulate for hpmcounters
RISC-V: KVM: Implement perf support without sampling
RISC-V: KVM: Implement firmware events

arch/riscv/include/asm/kvm_host.h | 3 +
arch/riscv/include/asm/kvm_vcpu_pmu.h | 108 +++++
arch/riscv/include/asm/kvm_vcpu_sbi.h | 13 +-
arch/riscv/include/asm/sbi.h | 2 +-
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/main.c | 3 +-
arch/riscv/kvm/tlb.c | 6 +-
arch/riscv/kvm/vcpu.c | 5 +
arch/riscv/kvm/vcpu_insn.c | 4 +-
arch/riscv/kvm/vcpu_pmu.c | 585 ++++++++++++++++++++++++++
arch/riscv/kvm/vcpu_sbi.c | 56 ++-
arch/riscv/kvm/vcpu_sbi_base.c | 45 +-
arch/riscv/kvm/vcpu_sbi_hsm.c | 22 +-
arch/riscv/kvm/vcpu_sbi_pmu.c | 86 ++++
arch/riscv/kvm/vcpu_sbi_replace.c | 51 ++-
drivers/perf/riscv_pmu_sbi.c | 62 ++-
include/linux/perf/riscv_pmu.h | 5 +
17 files changed, 963 insertions(+), 94 deletions(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
create mode 100644 arch/riscv/kvm/vcpu_pmu.c
create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c

--
2.25.1


2022-12-15 17:36:38

by Atish Patra

Subject: [PATCH v2 03/11] RISC-V: KVM: Return correct code for hsm stop function

According to the SBI specification, the stop function can only
return the error code SBI_ERR_FAILED. However, it currently returns
-EINVAL, which will be mapped to SBI_ERR_INVALID_PARAM.

Return the appropriate Linux error code.

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_sbi_hsm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
index 2e915ca..0f8d9fe 100644
--- a/arch/riscv/kvm/vcpu_sbi_hsm.c
+++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
@@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.power_off)
- return -EINVAL;
+ return -EPERM;

kvm_riscv_vcpu_power_off(vcpu);

--
2.25.1

2022-12-15 17:38:21

by Atish Patra

Subject: [PATCH v2 06/11] RISC-V: KVM: Add skeleton support for perf

This patch only adds the barebone structure of the perf implementation. Most
of the functions return zero at this point and will be implemented
fully in the future.

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_host.h | 3 +
arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
arch/riscv/kvm/Makefile | 1 +
arch/riscv/kvm/vcpu.c | 5 +
arch/riscv/kvm/vcpu_insn.c | 2 +-
arch/riscv/kvm/vcpu_pmu.c | 142 ++++++++++++++++++++++++++
6 files changed, 228 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
create mode 100644 arch/riscv/kvm/vcpu_pmu.c

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 93f43a3..f9874b4 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -18,6 +18,7 @@
#include <asm/kvm_vcpu_insn.h>
#include <asm/kvm_vcpu_sbi.h>
#include <asm/kvm_vcpu_timer.h>
+#include <asm/kvm_vcpu_pmu.h>

#define KVM_MAX_VCPUS 1024

@@ -228,6 +229,8 @@ struct kvm_vcpu_arch {

/* Don't run the VCPU (blocked) */
bool pause;
+
+ struct kvm_pmu pmu;
};

static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
new file mode 100644
index 0000000..6a8c0f7
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#ifndef __KVM_VCPU_RISCV_PMU_H
+#define __KVM_VCPU_RISCV_PMU_H
+
+#include <linux/perf/riscv_pmu.h>
+#include <asm/kvm_vcpu_sbi.h>
+#include <asm/sbi.h>
+
+#ifdef CONFIG_RISCV_PMU_SBI
+#define RISCV_KVM_MAX_FW_CTRS 32
+#define RISCV_MAX_COUNTERS 64
+
+/* Per virtual pmu counter data */
+struct kvm_pmc {
+ u8 idx;
+ struct perf_event *perf_event;
+ uint64_t counter_val;
+ union sbi_pmu_ctr_info cinfo;
+ /* Event monitoring status */
+ bool started;
+};
+
+/* PMU data structure per vcpu */
+struct kvm_pmu {
+ struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
+ /* Number of the virtual firmware counters available */
+ int num_fw_ctrs;
+ /* Number of the virtual hardware counters available */
+ int num_hw_ctrs;
+ /* A flag to indicate that pmu initialization is done */
+ bool init_done;
+ /* Bit map of all the virtual counter used */
+ DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
+};
+
+#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
+#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
+
+int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag, uint64_t ival,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ unsigned long eidx, uint64_t edata,
+ struct kvm_vcpu_sbi_ext_data *extdata);
+int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata);
+int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
+
+#else
+struct kvm_pmu {
+};
+
+static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
+#endif
+#endif
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 019df920..5de1053 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
kvm-y += vcpu_timer.o
+kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 7c08567..b746f21 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)

WRITE_ONCE(vcpu->arch.irqs_pending, 0);
WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
+ kvm_riscv_vcpu_pmu_reset(vcpu);

vcpu->arch.hfence_head = 0;
vcpu->arch.hfence_tail = 0;
@@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Setup VCPU timer */
kvm_riscv_vcpu_timer_init(vcpu);

+ /* setup performance monitoring */
+ kvm_riscv_vcpu_pmu_init(vcpu);
+
/* Reset VCPU */
kvm_riscv_reset_vcpu(vcpu);

@@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
/* Cleanup VCPU timer */
kvm_riscv_vcpu_timer_deinit(vcpu);

+ kvm_riscv_vcpu_pmu_deinit(vcpu);
/* Free unused pages pre-allocated for G-stage page table mappings */
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
}
diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
index 0bb5276..1ff2649 100644
--- a/arch/riscv/kvm/vcpu_insn.c
+++ b/arch/riscv/kvm/vcpu_insn.c
@@ -213,7 +213,7 @@ struct csr_func {
unsigned long wr_mask);
};

-static const struct csr_func csr_funcs[] = { };
+static const struct csr_func csr_funcs[] = {};

/**
* kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
new file mode 100644
index 0000000..0f0748f1
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/perf/riscv_pmu.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_sbi.h>
+#include <asm/kvm_vcpu_pmu.h>
+#include <linux/kvm_host.h>
+
+#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
+
+int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ edata->out_val = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ if ((cidx >= RISCV_MAX_COUNTERS) || (cidx == 1)) {
+ edata->err_val = SBI_ERR_INVALID_PARAM;
+ return 0;
+ }
+
+ edata->out_val = kvpmu->pmc[cidx].cinfo.value;
+
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag, uint64_t ival,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
+ unsigned long ctr_mask, unsigned long flag,
+ unsigned long eidx, uint64_t edata,
+ struct kvm_vcpu_sbi_ext_data *extdata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
+ struct kvm_vcpu_sbi_ext_data *edata)
+{
+ /* TODO */
+ return 0;
+}
+
+int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
+{
+ int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
+ if (ret < 0)
+ return ret;
+
+ if (!hpm_width || !num_hw_ctrs) {
+ pr_err("Can not initialize PMU for vcpu with NULL hpmcounter width/count\n");
+ return -EINVAL;
+ }
+
+ if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS)
+ num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
+ else
+ num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
+
+ kvpmu->num_hw_ctrs = num_hw_ctrs;
+ kvpmu->num_fw_ctrs = num_fw_ctrs;
+ /*
+ * There is no correlation between the logical hardware counters and virtual counters.
+ * However, we need to encode a hpmcounter CSR in the counter info field so that
+ * KVM can trap and emulate the read. This works well in the migration use case as
+ * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
+ */
+ for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
+ /* TIME CSR shouldn't be read from perf interface */
+ if (i == 1)
+ continue;
+ kvpmu->pmc[i].idx = i;
+ if (i < kvpmu->num_hw_ctrs) {
+ kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
+ if (i < 3)
+ /* CY, IR counters */
+ kvpmu->pmc[i].cinfo.width = 63;
+ else
+ kvpmu->pmc[i].cinfo.width = hpm_width;
+ /*
+ * The CSR number doesn't have any relation with the logical
+ * hardware counters. The CSR numbers are encoded sequentially
+ * to avoid maintaining a map between the virtual counter
+ * and CSR number.
+ */
+ kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
+ } else {
+ kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
+ kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
+ }
+ }
+
+ kvpmu->init_done = true;
+
+ return 0;
+}
+
+void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
+{
+ /* TODO */
+}
+
+void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
+{
+ /* TODO */
+}
+
--
2.25.1

2022-12-15 17:48:16

by Atish Patra

Subject: [PATCH v2 01/11] RISC-V: Define helper functions expose hpm counter width and count

The KVM module needs to know how many hardware counters the platform supports
and the width of those counters. Otherwise, it will not be able to show
optimal values of the virtual counters to the guest. The virtual hardware
counters also need to have the same width as the logical hardware
counters for simplicity. However, there shouldn't be a mapping between
virtual hardware counters and logical hardware counters. As we don't
support heterogeneous harts or counters with different widths as of now,
the implementation relies on the counter width of the first available
programmable counter.
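
For context, the consumer added later in this series (kvm_riscv_vcpu_pmu_init())
uses the helper roughly as follows:

  u32 hpm_width = 0, num_hw_ctrs = 0;
  int ret;

  ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
  if (ret < 0)
          return ret;
  /* size the guest's virtual counter space from these two values */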

Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 35 +++++++++++++++++++++++++++++++++-
include/linux/perf/riscv_pmu.h | 3 +++
2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 3852c18..65d4aa4 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -49,6 +49,9 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
static union sbi_pmu_ctr_info *pmu_ctr_list;
static unsigned int riscv_pmu_irq;

+/* Cache the available counters in a bitmask */
+unsigned long cmask;
+
struct sbi_pmu_event_data {
union {
union {
@@ -264,6 +267,37 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
return (info->type == SBI_PMU_CTR_TYPE_FW) ? true : false;
}

+/*
+ * Returns the counter width of a programmable counter and number of hardware
+ * counters. As we don't support heterneous CPUs yet, it is okay to just
+ * return the counter width of the first programmable counter.
+ */
+int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
+{
+ int i;
+ union sbi_pmu_ctr_info *info;
+ u32 hpm_width = 0, hpm_count = 0;
+
+ if (!cmask)
+ return -EINVAL;
+
+ for_each_set_bit(i, &cmask, RISCV_MAX_COUNTERS) {
+ info = &pmu_ctr_list[i];
+ if (!info)
+ continue;
+ if (!hpm_width && (info->csr != CSR_CYCLE) && (info->csr != CSR_INSTRET))
+ hpm_width = info->width;
+ if (info->type == SBI_PMU_CTR_TYPE_HW)
+ hpm_count++;
+ }
+
+ *hw_ctr_width = hpm_width;
+ *num_hw_ctr = hpm_count;
+
+ return 0;
+}
+EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
+
static int pmu_sbi_ctr_get_idx(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -798,7 +832,6 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
static int pmu_sbi_device_probe(struct platform_device *pdev)
{
struct riscv_pmu *pmu = NULL;
- unsigned long cmask = 0;
int ret = -ENODEV;
int num_counters;

diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index e17e86a..a1c3f77 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -73,6 +73,9 @@ void riscv_pmu_legacy_skip_init(void);
static inline void riscv_pmu_legacy_skip_init(void) {};
#endif
struct riscv_pmu *riscv_pmu_alloc(void);
+#ifdef CONFIG_RISCV_PMU_SBI
+int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
+#endif

#endif /* CONFIG_RISCV_PMU */

--
2.25.1

2022-12-15 17:50:58

by Atish Patra

Subject: [PATCH v2 10/11] RISC-V: KVM: Implement perf support without sampling

The RISC-V SBI PMU and Sscofpmf ISA extensions allow supporting perf in
a virtualized environment as well. The KVM implementation
relies on the SBI PMU extension for the most part, while trapping
& emulating the hpmcounter CSR reads for counter access.

This patch doesn't have event sampling support yet.
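
One detail worth highlighting from the config/start paths below: the
guest-programmed counter value is translated into a perf sample_period, i.e.
the number of events remaining before the virtual counter wraps. A minimal
illustration (standalone, not taken verbatim from the patch):

  u64 counter_val = 0xffffffffffffff00ULL;   /* value programmed by the guest */
  u64 mask = GENMASK_ULL(63, 0);             /* 64-bit wide virtual counter */
  u64 sample_period = (-counter_val) & mask; /* 0x100 events until wrap */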

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/vcpu_pmu.c | 358 ++++++++++++++++++++++++++++++++++++--
1 file changed, 342 insertions(+), 16 deletions(-)

diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
index 53c4163..21c1f0f 100644
--- a/arch/riscv/kvm/vcpu_pmu.c
+++ b/arch/riscv/kvm/vcpu_pmu.c
@@ -12,10 +12,163 @@
#include <linux/perf/riscv_pmu.h>
#include <asm/csr.h>
#include <asm/kvm_vcpu_sbi.h>
+#include <asm/bitops.h>
#include <asm/kvm_vcpu_pmu.h>
#include <linux/kvm_host.h>

#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
+#define get_event_type(x) ((x & SBI_PMU_EVENT_IDX_TYPE_MASK) >> 16)
+#define get_event_code(x) (x & SBI_PMU_EVENT_IDX_CODE_MASK)
+
+static inline u64 pmu_get_sample_period(struct kvm_pmc *pmc)
+{
+ u64 counter_val_mask = GENMASK(pmc->cinfo.width, 0);
+ u64 sample_period;
+
+ if (!pmc->counter_val)
+ sample_period = counter_val_mask;
+ else
+ sample_period = (-pmc->counter_val) & counter_val_mask;
+
+ return sample_period;
+}
+
+static u32 pmu_get_perf_event_type(unsigned long eidx)
+{
+ enum sbi_pmu_event_type etype = get_event_type(eidx);
+ u32 type;
+
+ if (etype == SBI_PMU_EVENT_TYPE_HW)
+ type = PERF_TYPE_HARDWARE;
+ else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
+ type = PERF_TYPE_HW_CACHE;
+ else if (etype == SBI_PMU_EVENT_TYPE_RAW || etype == SBI_PMU_EVENT_TYPE_FW)
+ type = PERF_TYPE_RAW;
+ else
+ type = PERF_TYPE_MAX;
+
+ return type;
+}
+
+static inline bool pmu_is_fw_event(unsigned long eidx)
+{
+
+ return get_event_type(eidx) == SBI_PMU_EVENT_TYPE_FW;
+}
+
+static void pmu_release_perf_event(struct kvm_pmc *pmc)
+{
+ if (pmc->perf_event) {
+ perf_event_disable(pmc->perf_event);
+ perf_event_release_kernel(pmc->perf_event);
+ pmc->perf_event = NULL;
+ }
+}
+
+static u64 pmu_get_perf_event_hw_config(u32 sbi_event_code)
+{
+ /* SBI PMU HW event code is offset by 1 from perf hw event codes */
+ return (u64)sbi_event_code - 1;
+}
+
+static u64 pmu_get_perf_event_cache_config(u32 sbi_event_code)
+{
+ u64 config = U64_MAX;
+ unsigned int cache_type, cache_op, cache_result;
+
+ /* All the cache event masks lie within 0xFF. No separate masking is necessary */
+ cache_type = (sbi_event_code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >> 3;
+ cache_op = (sbi_event_code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >> 1;
+ cache_result = sbi_event_code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK;
+
+ if (cache_type >= PERF_COUNT_HW_CACHE_MAX ||
+ cache_op >= PERF_COUNT_HW_CACHE_OP_MAX ||
+ cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+ return config;
+
+ config = cache_type | (cache_op << 8) | (cache_result << 16);
+
+ return config;
+}
+
+static u64 pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
+{
+ enum sbi_pmu_event_type etype = get_event_type(eidx);
+ u32 ecode = get_event_code(eidx);
+ u64 config = U64_MAX;
+
+ if (etype == SBI_PMU_EVENT_TYPE_HW)
+ config = pmu_get_perf_event_hw_config(ecode);
+ else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
+ config = pmu_get_perf_event_cache_config(ecode);
+ else if (etype == SBI_PMU_EVENT_TYPE_RAW)
+ config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
+ else if ((etype == SBI_PMU_EVENT_TYPE_FW) && (ecode < SBI_PMU_FW_MAX))
+ config = (1ULL << 63) | ecode;
+
+ return config;
+}
+
+static int pmu_get_fixed_pmc_index(unsigned long eidx)
+{
+ u32 etype = pmu_get_perf_event_type(eidx);
+ u32 ecode = get_event_code(eidx);
+ int ctr_idx;
+
+ if (etype != SBI_PMU_EVENT_TYPE_HW)
+ return -EINVAL;
+
+ if (ecode == SBI_PMU_HW_CPU_CYCLES)
+ ctr_idx = 0;
+ else if (ecode == SBI_PMU_HW_INSTRUCTIONS)
+ ctr_idx = 2;
+ else
+ return -EINVAL;
+
+ return ctr_idx;
+}
+
+static int pmu_get_programmable_pmc_index(struct kvm_pmu *kvpmu, unsigned long eidx,
+ unsigned long cbase, unsigned long cmask)
+{
+ int ctr_idx = -1;
+ int i, pmc_idx;
+ int min, max;
+
+ if (pmu_is_fw_event(eidx)) {
+ /* Firmware counters are mapped 1:1 starting from num_hw_ctrs for simplicity */
+ min = kvpmu->num_hw_ctrs;
+ max = min + kvpmu->num_fw_ctrs;
+ } else {
+ /* First 3 counters are reserved for fixed counters */
+ min = 3;
+ max = kvpmu->num_hw_ctrs;
+ }
+
+ for_each_set_bit(i, &cmask, BITS_PER_LONG) {
+ pmc_idx = i + cbase;
+ if ((pmc_idx >= min && pmc_idx < max) &&
+ !test_bit(pmc_idx, kvpmu->pmc_in_use)) {
+ ctr_idx = pmc_idx;
+ break;
+ }
+ }
+
+ return ctr_idx;
+}
+
+static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
+ unsigned long cbase, unsigned long cmask)
+{
+ int ret;
+
+ /* Fixed counters need to have a fixed mapping as they have a different width */
+ ret = pmu_get_fixed_pmc_index(eidx);
+ if (ret >= 0)
+ return ret;
+
+ return pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
+}

static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
unsigned long *out_val)
@@ -82,7 +235,41 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
unsigned long ctr_mask, unsigned long flag, uint64_t ival,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int i, num_ctrs, pmc_index, sbiret = 0;
+ struct kvm_pmc *pmc;
+
+ num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
+ if (ctr_base + __fls(ctr_mask) >= num_ctrs) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ /* Start the counters that have been configured and requested by the guest */
+ for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
+ pmc_index = i + ctr_base;
+ if (!test_bit(pmc_index, kvpmu->pmc_in_use))
+ continue;
+ pmc = &kvpmu->pmc[pmc_index];
+ if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
+ pmc->counter_val = ival;
+ if (pmc->perf_event) {
+ if (unlikely(pmc->started)) {
+ sbiret = SBI_ERR_ALREADY_STARTED;
+ continue;
+ }
+ perf_event_period(pmc->perf_event, pmu_get_sample_period(pmc));
+ perf_event_enable(pmc->perf_event);
+ pmc->started = true;
+ } else {
+ kvm_debug("Can not start counter due to invalid confiugartion\n");
+ sbiret = SBI_ERR_INVALID_PARAM;
+ }
+ }
+
+out:
+ edata->err_val = sbiret;
+
return 0;
}

@@ -90,16 +277,142 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
unsigned long ctr_mask, unsigned long flag,
struct kvm_vcpu_sbi_ext_data *edata)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ int i, num_ctrs, pmc_index, sbiret = 0;
+ u64 enabled, running;
+ struct kvm_pmc *pmc;
+
+ num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
+ if ((ctr_base + __fls(ctr_mask)) >= num_ctrs) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ /* Stop the counters that have been configured and requested by the guest */
+ for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
+ pmc_index = i + ctr_base;
+ if (!test_bit(pmc_index, kvpmu->pmc_in_use))
+ continue;
+ pmc = &kvpmu->pmc[pmc_index];
+ if (pmc->perf_event) {
+ if (pmc->started) {
+ /* Stop counting the counter */
+ perf_event_disable(pmc->perf_event);
+ pmc->started = false;
+ } else
+ sbiret = SBI_ERR_ALREADY_STOPPED;
+
+ if (flag & SBI_PMU_STOP_FLAG_RESET) {
+ /* Release the counter if this is a reset request */
+ pmc->counter_val += perf_event_read_value(pmc->perf_event,
+ &enabled, &running);
+ pmu_release_perf_event(pmc);
+ clear_bit(pmc_index, kvpmu->pmc_in_use);
+ }
+ } else {
+ kvm_debug("Can not stop counter due to invalid confiugartion\n");
+ sbiret = SBI_ERR_INVALID_PARAM;
+ }
+ }
+
+out:
+ edata->err_val = sbiret;
+
return 0;
}

int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
unsigned long ctr_mask, unsigned long flag,
- unsigned long eidx, uint64_t edata,
- struct kvm_vcpu_sbi_ext_data *extdata)
+ unsigned long eidx, uint64_t evt_data,
+ struct kvm_vcpu_sbi_ext_data *ext_data)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct perf_event *event;
+ struct perf_event_attr attr;
+ int num_ctrs, ctr_idx;
+ u32 etype = pmu_get_perf_event_type(eidx);
+ u64 config;
+ struct kvm_pmc *pmc;
+ int sbiret = 0;
+
+
+ num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
+ if (etype == PERF_TYPE_MAX || (ctr_base + __fls(ctr_mask) >= num_ctrs)) {
+ sbiret = SBI_ERR_INVALID_PARAM;
+ goto out;
+ }
+
+ if (pmu_is_fw_event(eidx)) {
+ sbiret = SBI_ERR_NOT_SUPPORTED;
+ goto out;
+ }
+
+ /*
+ * SKIP_MATCH flag indicates the caller is aware of the assigned counter
+ * for this event. Just do a sanity check if it already marked used.
+ */
+ if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
+ if (!test_bit(ctr_base, kvpmu->pmc_in_use)) {
+ sbiret = SBI_ERR_FAILURE;
+ goto out;
+ }
+ ctr_idx = ctr_base;
+ goto match_done;
+ }
+
+ ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
+ if (ctr_idx < 0) {
+ sbiret = SBI_ERR_NOT_SUPPORTED;
+ goto out;
+ }
+
+match_done:
+ pmc = &kvpmu->pmc[ctr_idx];
+ pmu_release_perf_event(pmc);
+ pmc->idx = ctr_idx;
+
+ config = pmu_get_perf_event_config(eidx, evt_data);
+ memset(&attr, 0, sizeof(struct perf_event_attr));
+ attr.type = etype;
+ attr.size = sizeof(attr);
+ attr.pinned = true;
+
+ /*
+ * It should never reach here if the platform doesn't support the sscofpmf extension
+ * as mode filtering won't work without it.
+ */
+ attr.exclude_host = true;
+ attr.exclude_hv = true;
+ attr.exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH);
+ attr.exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH);
+ attr.config = config;
+ attr.config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS;
+ if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
+ //TODO: Do we really want to clear the value in hardware counter
+ pmc->counter_val = 0;
+ }
+
+ /*
+ * Set the default sample_period for now. The guest specified value
+ * will be updated in the start call.
+ */
+ attr.sample_period = pmu_get_sample_period(pmc);
+
+ event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
+ if (IS_ERR(event)) {
+ pr_err("kvm pmu event creation failed event %pe for eidx %lx\n", event, eidx);
+ return -EOPNOTSUPP;
+ }
+
+ set_bit(ctr_idx, kvpmu->pmc_in_use);
+ pmc->perf_event = event;
+ if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
+ perf_event_enable(pmc->perf_event);
+
+ ext_data->out_val = ctr_idx;
+out:
+ ext_data->err_val = sbiret;
+
return 0;
}

@@ -119,6 +432,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
{
int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;

ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
if (ret < 0)
@@ -134,6 +448,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
else
num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;

+ bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
kvpmu->num_hw_ctrs = num_hw_ctrs;
kvpmu->num_fw_ctrs = num_fw_ctrs;
/*
@@ -146,24 +461,26 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
/* TIME CSR shouldn't be read from perf interface */
if (i == 1)
continue;
- kvpmu->pmc[i].idx = i;
+ pmc = &kvpmu->pmc[i];
+ pmc->idx = i;
+ pmc->counter_val = 0;
if (i < kvpmu->num_hw_ctrs) {
kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
if (i < 3)
/* CY, IR counters */
- kvpmu->pmc[i].cinfo.width = 63;
+ pmc->cinfo.width = 63;
else
- kvpmu->pmc[i].cinfo.width = hpm_width;
+ pmc->cinfo.width = hpm_width;
/*
* The CSR number doesn't have any relation with the logical
* hardware counters. The CSR numbers are encoded sequentially
* to avoid maintaining a map between the virtual counter
* and CSR number.
*/
- kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
+ pmc->cinfo.csr = CSR_CYCLE + i;
} else {
- kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
- kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
+ pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
+ pmc->cinfo.width = BITS_PER_LONG - 1;
}
}

@@ -172,13 +489,22 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
return 0;
}

-void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
+void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
{
- /* TODO */
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ struct kvm_pmc *pmc;
+ int i;
+
+ if (!kvpmu)
+ return;
+
+ for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
+ pmc = &kvpmu->pmc[i];
+ pmu_release_perf_event(pmc);
+ }
}

-void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
+void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
{
- /* TODO */
+ kvm_riscv_vcpu_pmu_deinit(vcpu);
}
-
--
2.25.1

2022-12-15 17:57:10

by Atish Patra

Subject: [PATCH v2 02/11] RISC-V: KVM: Define a probe function for SBI extension data structures

Currently the probe function just checks if an SBI extension is
registered or not. However, the extension may not want to advertise
itself depending on some other condition.
An additional extension-specific probe function will allow
extensions to decide whether they want to be advertised to the caller or
not. Any extension that does not require additional dependency checks
can avoid implementing this function.
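
For example, the SBI PMU extension added later in this series supplies such a
callback; a trimmed sketch (the real version also checks for Sscofpmf):

  unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu, unsigned long extid)
  {
          /* only advertise the extension if per-vcpu PMU setup succeeded */
          return vcpu_to_pmu(vcpu)->init_done;
  }

  const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
          .extid_start = SBI_EXT_PMU,
          .extid_end = SBI_EXT_PMU,
          .handler = kvm_sbi_ext_pmu_handler,
          .probe = kvm_sbi_ext_pmu_probe,
  };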

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 +++
arch/riscv/kvm/vcpu_sbi_base.c | 13 +++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
index f79478a..61dac1b 100644
--- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
+++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
@@ -29,6 +29,9 @@ struct kvm_vcpu_sbi_extension {
int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long *out_val, struct kvm_cpu_trap *utrap,
bool *exit);
+
+ /* Extension specific probe function */
+ unsigned long (*probe)(struct kvm_vcpu *vcpu, unsigned long extid);
};

void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
index 5d65c63..89e2415 100644
--- a/arch/riscv/kvm/vcpu_sbi_base.c
+++ b/arch/riscv/kvm/vcpu_sbi_base.c
@@ -19,6 +19,7 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
{
int ret = 0;
struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ const struct kvm_vcpu_sbi_extension *sbi_ext;

switch (cp->a6) {
case SBI_EXT_BASE_GET_SPEC_VERSION:
@@ -43,8 +44,16 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
*/
kvm_riscv_vcpu_sbi_forward(vcpu, run);
*exit = true;
- } else
- *out_val = kvm_vcpu_sbi_find_ext(cp->a0) ? 1 : 0;
+ } else {
+ sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
+ if (sbi_ext) {
+ if (sbi_ext->probe)
+ *out_val = sbi_ext->probe(vcpu, cp->a0);
+ else
+ *out_val = 1;
+ } else
+ *out_val = 0;
+ }
break;
case SBI_EXT_BASE_GET_MVENDORID:
*out_val = vcpu->arch.mvendorid;
--
2.25.1

2022-12-15 17:57:39

by Atish Patra

Subject: [PATCH v2 05/11] RISC-V: KVM: Improve privilege mode filtering for perf

Currently, the host driver doesn't have any method to identify if the
requested perf event is from KVM or bare metal. As KVM runs in HS
mode, there is no separate hypervisor privilege mode to distinguish
between the attributes for guest/host.

Improve the privilege mode filtering by using the event-specific
config1 field.
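
Concretely, the KVM side (added later in this series) tags the kernel perf
events it creates, and that is what the driver keys off; a rough sketch of the
relevant attribute setup:

  struct perf_event_attr attr = {
          .size = sizeof(attr),
          .config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS,
          .exclude_host = true,
          .exclude_hv = true,
          /* type/config/sample_period filled in from the guest request */
  };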

Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 27 ++++++++++++++++++++++-----
include/linux/perf/riscv_pmu.h | 2 ++
2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 65d4aa4..df795b7 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -298,6 +298,27 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
}
EXPORT_SYMBOL(riscv_pmu_get_hpm_info);

+static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
+{
+ unsigned long cflags = 0;
+ bool guest_events = false;
+
+ if (event->attr.config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS)
+ guest_events = true;
+ if (event->attr.exclude_kernel)
+ cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_user)
+ cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
+ if (guest_events && event->attr.exclude_hv)
+ cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_host)
+ cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
+ if (event->attr.exclude_guest)
+ cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;
+
+ return cflags;
+}
+
static int pmu_sbi_ctr_get_idx(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -308,11 +329,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
uint64_t cbase = 0;
unsigned long cflags = 0;

- if (event->attr.exclude_kernel)
- cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
- if (event->attr.exclude_user)
- cflags |= SBI_PMU_CFG_FLAG_SET_UINH;
-
+ cflags = pmu_sbi_get_filter_flags(event);
/* retrieve the available counter index */
#if defined(CONFIG_32BIT)
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index a1c3f77..1c42146 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -26,6 +26,8 @@

#define RISCV_PMU_STOP_FLAG_RESET 1

+#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1
+
struct cpu_hw_events {
/* currently enabled events */
int n_events;
--
2.25.1

2022-12-15 17:58:14

by Atish Patra

Subject: [PATCH v2 07/11] RISC-V: KVM: Add SBI PMU extension support

The SBI PMU extension allows KVM guests to configure/start/stop/query the
PMU counters in a virtualized environment as well.

In order to allow that, KVM implements the entire SBI PMU extension.

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/Makefile | 2 +-
arch/riscv/kvm/vcpu_sbi.c | 11 +++++
arch/riscv/kvm/vcpu_sbi_pmu.c | 86 +++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c

diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 5de1053..278e97c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -25,4 +25,4 @@ kvm-y += vcpu_sbi_base.o
kvm-y += vcpu_sbi_replace.o
kvm-y += vcpu_sbi_hsm.o
kvm-y += vcpu_timer.o
-kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
+kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index 50c5472..3b8b84e8 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -20,6 +20,16 @@ static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
};
#endif

+#ifdef CONFIG_RISCV_PMU_SBI
+extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
+#else
+static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
+ .extid_start = -1UL,
+ .extid_end = -1UL,
+ .handler = NULL,
+};
+#endif
+
static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
&vcpu_sbi_ext_v01,
&vcpu_sbi_ext_base,
@@ -28,6 +38,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
&vcpu_sbi_ext_rfence,
&vcpu_sbi_ext_srst,
&vcpu_sbi_ext_hsm,
+ &vcpu_sbi_ext_pmu,
&vcpu_sbi_ext_experimental,
&vcpu_sbi_ext_vendor,
};
diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
new file mode 100644
index 0000000..223752f
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Rivos Inc
+ *
+ * Authors:
+ * Atish Patra <[email protected]>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/sbi.h>
+#include <asm/kvm_vcpu_sbi.h>
+
+static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ struct kvm_vcpu_sbi_ext_data *edata,
+ struct kvm_cpu_trap *utrap)
+{
+ int ret = 0;
+ struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+ unsigned long funcid = cp->a6;
+ uint64_t temp;
+
+ /* Return not supported if PMU is not initialized */
+ if (!kvpmu->init_done)
+ return -EINVAL;
+
+ switch (funcid) {
+ case SBI_EXT_PMU_NUM_COUNTERS:
+ ret = kvm_riscv_vcpu_pmu_num_ctrs(vcpu, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_GET_INFO:
+ ret = kvm_riscv_vcpu_pmu_ctr_info(vcpu, cp->a0, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_CFG_MATCH:
+#if defined(CONFIG_32BIT)
+ temp = ((uint64_t)cp->a5 << 32) | cp->a4;
+#else
+ temp = cp->a4;
+#endif
+ ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
+ cp->a2, cp->a3, temp, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_START:
+#if defined(CONFIG_32BIT)
+ temp = ((uint64_t)cp->a4 << 32) | cp->a3;
+#else
+ temp = cp->a3;
+#endif
+ ret = kvm_riscv_vcpu_pmu_ctr_start(vcpu, cp->a0, cp->a1, cp->a2,
+ temp, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_STOP:
+ ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, edata);
+ break;
+ case SBI_EXT_PMU_COUNTER_FW_READ:
+ ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, edata);
+ break;
+ default:
+ edata->err_val = SBI_ERR_NOT_SUPPORTED;
+ }
+
+
+ return ret;
+}
+
+unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu, unsigned long extid)
+{
+ struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
+
+ /*
+ * PMU Extension is only available to guests if privilege mode filtering
+ * is available. Otherwise, guest will always count events while the
+ * execution is in hypervisor mode.
+ */
+ return kvpmu->init_done && riscv_isa_extension_available(NULL, SSCOFPMF);
+}
+
+const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
+ .extid_start = SBI_EXT_PMU,
+ .extid_end = SBI_EXT_PMU,
+ .handler = kvm_sbi_ext_pmu_handler,
+ .probe = kvm_sbi_ext_pmu_probe,
+};
--
2.25.1

2022-12-15 17:59:00

by Atish Patra

Subject: [PATCH v2 08/11] RISC-V: KVM: Disable all hpmcounter access for VS/VU mode

A guest must not get access to any hpmcounter, including cycle/instret,
without explicit checks. We achieve that by disabling all the bits except the
TM bit in hcounteren.

However, instret and cycle access for guest userspace can be enabled
upon explicit request (via ONE REG) or on the first trap from VU mode
to maintain the ABI requirement in the future. This patch doesn't support
that as the ONE REG interface is not settled yet.
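
For reference (bit layout per the privileged spec; sketch only), hcounteren
follows the [m|s]counteren layout, so writing 0x02 leaves only the time CSR
accessible to VS/VU mode:

  /*
   * bit 0 - CY (cycle), bit 1 - TM (time), bit 2 - IR (instret),
   * bits 3..31 - hpmcounter3..31; everything not set traps to HS mode.
   */
  csr_write(CSR_HCOUNTEREN, 0x02);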

Signed-off-by: Atish Patra <[email protected]>
---
arch/riscv/kvm/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 58c5489..9c2efd3 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -49,7 +49,8 @@ int kvm_arch_hardware_enable(void)
hideleg |= (1UL << IRQ_VS_EXT);
csr_write(CSR_HIDELEG, hideleg);

- csr_write(CSR_HCOUNTEREN, -1UL);
+ /* VS should access only TM bit. Everything else should trap */
+ csr_write(CSR_HCOUNTEREN, 0x02);

csr_write(CSR_HVIP, 0);

--
2.25.1

2022-12-15 20:26:37

by Conor Dooley

Subject: Re: [PATCH v2 05/11] RISC-V: KVM: Improve privilege mode filtering for perf

Hey Atish,

On Thu, Dec 15, 2022 at 09:00:40AM -0800, Atish Patra wrote:
> RISC-V: KVM: Improve privilege mode filtering for perf

I almost marked this as "not applicable" in patchwork as I was misled
by the $subject. I know our perf driver is a real mixed bag, but should
it not be something more like:
"perf: RISC-V: Improve privilege mode filtering for KVM"?
It was only when I noticed that the rest of the series had been marked
as "Handled Elsewhere" that I realised that this must not be a KVM patch
;)

Thanks,
Conor

> Currently, the host driver doesn't have any method to identify if the
> requested perf event is from kvm or bare metal. As KVM runs in HS
> mode, there are no separate hypervisor privilege mode to distinguish
> between the attributes for guest/host.
>
> Improve the privilege mode filtering by using the event specific
> config1 field.
>
> Reviewed-by: Andrew Jones <[email protected]>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> drivers/perf/riscv_pmu_sbi.c | 27 ++++++++++++++++++++++-----
> include/linux/perf/riscv_pmu.h | 2 ++
> 2 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 65d4aa4..df795b7 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -298,6 +298,27 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> }
> EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
>
> +static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
> +{
> + unsigned long cflags = 0;
> + bool guest_events = false;
> +
> + if (event->attr.config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS)
> + guest_events = true;
> + if (event->attr.exclude_kernel)
> + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_user)
> + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
> + if (guest_events && event->attr.exclude_hv)
> + cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_host)
> + cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
> + if (event->attr.exclude_guest)
> + cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;
> +
> + return cflags;
> +}
> +
> static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -308,11 +329,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> uint64_t cbase = 0;
> unsigned long cflags = 0;
>
> - if (event->attr.exclude_kernel)
> - cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> - if (event->attr.exclude_user)
> - cflags |= SBI_PMU_CFG_FLAG_SET_UINH;
> -
> + cflags = pmu_sbi_get_filter_flags(event);
> /* retrieve the available counter index */
> #if defined(CONFIG_32BIT)
> ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index a1c3f77..1c42146 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -26,6 +26,8 @@
>
> #define RISCV_PMU_STOP_FLAG_RESET 1
>
> +#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1
> +
> struct cpu_hw_events {
> /* currently enabled events */
> int n_events;
> --
> 2.25.1
>
>



2022-12-15 21:34:11

by Atish Patra

Subject: Re: [PATCH v2 05/11] RISC-V: KVM: Improve privilege mode filtering for perf

On Thu, Dec 15, 2022 at 12:18 PM Conor Dooley <[email protected]> wrote:
>
> Hey Atish,
>
> On Thu, Dec 15, 2022 at 09:00:40AM -0800, Atish Patra wrote:
> > RISC-V: KVM: Improve privilege mode filtering for perf
>
> I almost marked this as "not applicable" in patchwork as I was mislead
> by the $subject. I know our perf driver is a real mixed bag, but should
> it not be something more like:
> "perf: RISC-V: Improve privilege mode filtering for KVM"?

Sure. I will change it in the next version.

> It was only when I noticed that the rest of the series had been marked
> as "Handled Elsewhere" that I realised that this must not be a KVM patch
> ;)
>
> Thanks,
> Conor
>
> > Currently, the host driver doesn't have any method to identify if the
> > requested perf event is from kvm or bare metal. As KVM runs in HS
> > mode, there are no separate hypervisor privilege mode to distinguish
> > between the attributes for guest/host.
> >
> > Improve the privilege mode filtering by using the event specific
> > config1 field.
> >
> > Reviewed-by: Andrew Jones <[email protected]>
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > drivers/perf/riscv_pmu_sbi.c | 27 ++++++++++++++++++++++-----
> > include/linux/perf/riscv_pmu.h | 2 ++
> > 2 files changed, 24 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 65d4aa4..df795b7 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -298,6 +298,27 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> > }
> > EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
> >
> > +static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
> > +{
> > + unsigned long cflags = 0;
> > + bool guest_events = false;
> > +
> > + if (event->attr.config1 & RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS)
> > + guest_events = true;
> > + if (event->attr.exclude_kernel)
> > + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VSINH : SBI_PMU_CFG_FLAG_SET_SINH;
> > + if (event->attr.exclude_user)
> > + cflags |= guest_events ? SBI_PMU_CFG_FLAG_SET_VUINH : SBI_PMU_CFG_FLAG_SET_UINH;
> > + if (guest_events && event->attr.exclude_hv)
> > + cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> > + if (event->attr.exclude_host)
> > + cflags |= SBI_PMU_CFG_FLAG_SET_UINH | SBI_PMU_CFG_FLAG_SET_SINH;
> > + if (event->attr.exclude_guest)
> > + cflags |= SBI_PMU_CFG_FLAG_SET_VSINH | SBI_PMU_CFG_FLAG_SET_VUINH;
> > +
> > + return cflags;
> > +}
> > +
> > static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> > {
> > struct hw_perf_event *hwc = &event->hw;
> > @@ -308,11 +329,7 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> > uint64_t cbase = 0;
> > unsigned long cflags = 0;
> >
> > - if (event->attr.exclude_kernel)
> > - cflags |= SBI_PMU_CFG_FLAG_SET_SINH;
> > - if (event->attr.exclude_user)
> > - cflags |= SBI_PMU_CFG_FLAG_SET_UINH;
> > -
> > + cflags = pmu_sbi_get_filter_flags(event);
> > /* retrieve the available counter index */
> > #if defined(CONFIG_32BIT)
> > ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
> > diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> > index a1c3f77..1c42146 100644
> > --- a/include/linux/perf/riscv_pmu.h
> > +++ b/include/linux/perf/riscv_pmu.h
> > @@ -26,6 +26,8 @@
> >
> > #define RISCV_PMU_STOP_FLAG_RESET 1
> >
> > +#define RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS 0x1
> > +
> > struct cpu_hw_events {
> > /* currently enabled events */
> > int n_events;
> > --
> > 2.25.1
> >
> >

2023-01-12 10:23:38

by Andrew Jones

Subject: Re: [PATCH v2 01/11] RISC-V: Define helper functions expose hpm counter width and count

On Thu, Dec 15, 2022 at 09:00:36AM -0800, Atish Patra wrote:
> KVM module needs to know how many hardware counters and the counter
> width that the platform supports. Otherwise, it will not be able to show
> optimal value of virtual counters to the guest. The virtual hardware
> counters also need to have the same width as the logical hardware
> counters for simplicity. However, there shouldn't be mapping between
> virtual hardware counters and logical hardware counters. As we don't
> support hetergeneous harts or counters with different width as of now,
> the implementation relies on the counter width of the first available
> programmable counter.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> drivers/perf/riscv_pmu_sbi.c | 35 +++++++++++++++++++++++++++++++++-
> include/linux/perf/riscv_pmu.h | 3 +++
> 2 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 3852c18..65d4aa4 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -49,6 +49,9 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
> static union sbi_pmu_ctr_info *pmu_ctr_list;
> static unsigned int riscv_pmu_irq;
>
> +/* Cache the available counters in a bitmask */
> +unsigned long cmask;

I presume this can be static since it's not getting added to the header.
And don't we need this to be a long long for rv32? We should probably
just use u64.

> +
> struct sbi_pmu_event_data {
> union {
> union {
> @@ -264,6 +267,37 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
> return (info->type == SBI_PMU_CTR_TYPE_FW) ? true : false;
> }
>
> +/*
> + * Returns the counter width of a programmable counter and number of hardware
> + * counters. As we don't support heterneous CPUs yet, it is okay to just

heterogeneous

> + * return the counter width of the first programmable counter.
> + */
> +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> +{
> + int i;
> + union sbi_pmu_ctr_info *info;
> + u32 hpm_width = 0, hpm_count = 0;
> +
> + if (!cmask)
> + return -EINVAL;
> +
> + for_each_set_bit(i, &cmask, RISCV_MAX_COUNTERS) {
> + info = &pmu_ctr_list[i];
> + if (!info)
> + continue;
> + if (!hpm_width && (info->csr != CSR_CYCLE) && (info->csr != CSR_INSTRET))

nit: No need for () around the != expressions

> + hpm_width = info->width;
> + if (info->type == SBI_PMU_CTR_TYPE_HW)
> + hpm_count++;
> + }
> +
> + *hw_ctr_width = hpm_width;
> + *num_hw_ctr = hpm_count;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(riscv_pmu_get_hpm_info);

EXPORT_SYMBOL_GPL ?

> +
> static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -798,7 +832,6 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
> static int pmu_sbi_device_probe(struct platform_device *pdev)
> {
> struct riscv_pmu *pmu = NULL;
> - unsigned long cmask = 0;
> int ret = -ENODEV;
> int num_counters;
>
> diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> index e17e86a..a1c3f77 100644
> --- a/include/linux/perf/riscv_pmu.h
> +++ b/include/linux/perf/riscv_pmu.h
> @@ -73,6 +73,9 @@ void riscv_pmu_legacy_skip_init(void);
> static inline void riscv_pmu_legacy_skip_init(void) {};
> #endif
> struct riscv_pmu *riscv_pmu_alloc(void);
> +#ifdef CONFIG_RISCV_PMU_SBI
> +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
> +#endif
>
> #endif /* CONFIG_RISCV_PMU */
>
> --
> 2.25.1
>

Thanks,
drew

2023-01-12 10:41:40

by Andrew Jones

Subject: Re: [PATCH v2 02/11] RISC-V: KVM: Define a probe function for SBI extension data structures

On Thu, Dec 15, 2022 at 09:00:37AM -0800, Atish Patra wrote:
> Currently the probe function just checks if an SBI extension is
> registered or not. However, the extension may not want to advertise
> itself depending on some other condition.
> An additional extension specific probe function will allow
> extensions to decide if they want to be advertised to the caller or
> not. Any extension that does not require additional dependency checks
> can avoid implementing this function.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 +++
> arch/riscv/kvm/vcpu_sbi_base.c | 13 +++++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> index f79478a..61dac1b 100644
> --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> @@ -29,6 +29,9 @@ struct kvm_vcpu_sbi_extension {
> int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
> unsigned long *out_val, struct kvm_cpu_trap *utrap,
> bool *exit);
> +
> + /* Extension specific probe function */
> + unsigned long (*probe)(struct kvm_vcpu *vcpu, unsigned long extid);

It doesn't seem like the extid parameter should be necessary since the
probe function is specific to the extension, but it doesn't hurt either.

> };
>
> void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
> diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
> index 5d65c63..89e2415 100644
> --- a/arch/riscv/kvm/vcpu_sbi_base.c
> +++ b/arch/riscv/kvm/vcpu_sbi_base.c
> @@ -19,6 +19,7 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> {
> int ret = 0;
> struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> + const struct kvm_vcpu_sbi_extension *sbi_ext;
>
> switch (cp->a6) {
> case SBI_EXT_BASE_GET_SPEC_VERSION:
> @@ -43,8 +44,16 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> */
> kvm_riscv_vcpu_sbi_forward(vcpu, run);
> *exit = true;
> - } else
> - *out_val = kvm_vcpu_sbi_find_ext(cp->a0) ? 1 : 0;
> + } else {
> + sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
> + if (sbi_ext) {
> + if (sbi_ext->probe)
> + *out_val = sbi_ext->probe(vcpu, cp->a0);
> + else
> + *out_val = 1;
> + } else
> + *out_val = 0;
> + }
> break;
> case SBI_EXT_BASE_GET_MVENDORID:
> *out_val = vcpu->arch.mvendorid;
> --
> 2.25.1
>

Reviewed-by: Andrew Jones <[email protected]>

2023-01-12 11:34:23

by Andrew Jones

Subject: Re: [PATCH v2 03/11] RISC-V: KVM: Return correct code for hsm stop function

On Thu, Dec 15, 2022 at 09:00:38AM -0800, Atish Patra wrote:
> According to the SBI specification, the stop function can only
> return error code SBI_ERR_FAILED. However, currently it returns
> -EINVAL which will be mapped SBI_ERR_INVALID_PARAM.

I presume the mapping referred to here is kvm_linux_err_map_sbi().
If so, then -EPERM isn't correct either. That maps to SBI_ERR_DENIED.
The only thing that will ensure we get SBI_ERR_FAILURE (-1) is
anything not handled by the kvm_linux_err_map_sbi switch, as we
need to use the default.
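
A sketch of that mapping, covering only the cases relevant here and assuming a
kvm_linux_err_map_sbi() helper along the lines described above:

  static unsigned long kvm_linux_err_map_sbi(int err)
  {
          switch (err) {
          case 0:
                  return SBI_SUCCESS;
          case -EPERM:
                  return SBI_ERR_DENIED;
          case -EINVAL:
                  return SBI_ERR_INVALID_PARAM;
          default:
                  return SBI_ERR_FAILURE;
          }
  }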

Thanks,
drew

>
> Return the appropriate linux error code.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/kvm/vcpu_sbi_hsm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
> index 2e915ca..0f8d9fe 100644
> --- a/arch/riscv/kvm/vcpu_sbi_hsm.c
> +++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
> @@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
> {
> if (vcpu->arch.power_off)
> - return -EINVAL;
> + return -EPERM;
>
> kvm_riscv_vcpu_power_off(vcpu);
>
> --
> 2.25.1
>

2023-01-12 15:52:33

by Andrew Jones

Subject: Re: [PATCH v2 07/11] RISC-V: KVM: Add SBI PMU extension support

On Thu, Dec 15, 2022 at 09:00:42AM -0800, Atish Patra wrote:
> SBI PMU extension allows KVM guests to configure/start/stop/query about
> the PMU counters in virtualized enviornment as well.
>
> In order to allow that, KVM implements the entire SBI PMU extension.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/kvm/Makefile | 2 +-
> arch/riscv/kvm/vcpu_sbi.c | 11 +++++
> arch/riscv/kvm/vcpu_sbi_pmu.c | 86 +++++++++++++++++++++++++++++++++++
> 3 files changed, 98 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c
>
> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 5de1053..278e97c 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -25,4 +25,4 @@ kvm-y += vcpu_sbi_base.o
> kvm-y += vcpu_sbi_replace.o
> kvm-y += vcpu_sbi_hsm.o
> kvm-y += vcpu_timer.o
> -kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
> diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> index 50c5472..3b8b84e8 100644
> --- a/arch/riscv/kvm/vcpu_sbi.c
> +++ b/arch/riscv/kvm/vcpu_sbi.c
> @@ -20,6 +20,16 @@ static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
> };
> #endif
>
> +#ifdef CONFIG_RISCV_PMU_SBI
> +extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
> +#else
> +static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> + .extid_start = -1UL,
> + .extid_end = -1UL,
> + .handler = NULL,
> +};
> +#endif
> +
> static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> &vcpu_sbi_ext_v01,
> &vcpu_sbi_ext_base,
> @@ -28,6 +38,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> &vcpu_sbi_ext_rfence,
> &vcpu_sbi_ext_srst,
> &vcpu_sbi_ext_hsm,
> + &vcpu_sbi_ext_pmu,
> &vcpu_sbi_ext_experimental,
> &vcpu_sbi_ext_vendor,
> };
> diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> new file mode 100644
> index 0000000..223752f
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <asm/csr.h>
> +#include <asm/sbi.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +
> +static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> + struct kvm_vcpu_sbi_ext_data *edata,
> + struct kvm_cpu_trap *utrap)
> +{
> + int ret = 0;
> + struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + unsigned long funcid = cp->a6;
> + uint64_t temp;
> +
> + /* Return not supported if PMU is not initialized */
> + if (!kvpmu->init_done)
> + return -EINVAL;
> +
> + switch (funcid) {
> + case SBI_EXT_PMU_NUM_COUNTERS:
> + ret = kvm_riscv_vcpu_pmu_num_ctrs(vcpu, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_GET_INFO:
> + ret = kvm_riscv_vcpu_pmu_ctr_info(vcpu, cp->a0, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_CFG_MATCH:
> +#if defined(CONFIG_32BIT)
> + temp = ((uint64_t)cp->a5 << 32) | cp->a4;
> +#else
> + temp = cp->a4;
> +#endif
> + ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
> + cp->a2, cp->a3, temp, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_START:
> +#if defined(CONFIG_32BIT)
> + temp = ((uint64_t)cp->a4 << 32) | cp->a3;
> +#else
> + temp = cp->a3;
> +#endif
> + ret = kvm_riscv_vcpu_pmu_ctr_start(vcpu, cp->a0, cp->a1, cp->a2,
> + temp, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_STOP:
> + ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, edata);
> + break;
> + case SBI_EXT_PMU_COUNTER_FW_READ:
> + ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, edata);
> + break;
> + default:
> + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> + }
> +
> +
> + return ret;
> +}
> +
> +unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu, unsigned long extid)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + /*
> + * PMU Extension is only available to guests if privilege mode filtering
> + * is available. Otherwise, guest will always count events while the
> + * execution is in hypervisor mode.
> + */
> + return kvpmu->init_done && riscv_isa_extension_available(NULL, SSCOFPMF);

Assuming we're only supporting homogeneous systems, then can't we just
check for Sscofpmf at PMU init time? When the extension isn't present
we'd fail to init and then here init_done wouldn't be set.
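
E.g., just a sketch of that idea (the exact placement inside
kvm_riscv_vcpu_pmu_init() is illustrative):

    /*
     * Without Sscofpmf we can't filter out host/hypervisor counts,
     * so don't set up the vPMU at all; probe() then only needs to
     * report kvpmu->init_done.
     */
    if (!riscv_isa_extension_available(NULL, SSCOFPMF))
        return -EOPNOTSUPP;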

> +}
> +
> +const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> + .extid_start = SBI_EXT_PMU,
> + .extid_end = SBI_EXT_PMU,
> + .handler = kvm_sbi_ext_pmu_handler,
> + .probe = kvm_sbi_ext_pmu_probe,
> +};
> --
> 2.25.1
>

Thanks,
drew

2023-01-12 16:01:58

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] RISC-V: KVM: Add skeleton support for perf

On Thu, Dec 15, 2022 at 09:00:41AM -0800, Atish Patra wrote:
> This patch only adds the barebones structure of the perf implementation.
> Most of the functions return zero at this point and will be implemented
> fully in the future.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/include/asm/kvm_host.h | 3 +
> arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> arch/riscv/kvm/Makefile | 1 +
> arch/riscv/kvm/vcpu.c | 5 +
> arch/riscv/kvm/vcpu_insn.c | 2 +-
> arch/riscv/kvm/vcpu_pmu.c | 142 ++++++++++++++++++++++++++
> 6 files changed, 228 insertions(+), 1 deletion(-)
> create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> create mode 100644 arch/riscv/kvm/vcpu_pmu.c
>
> diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> index 93f43a3..f9874b4 100644
> --- a/arch/riscv/include/asm/kvm_host.h
> +++ b/arch/riscv/include/asm/kvm_host.h
> @@ -18,6 +18,7 @@
> #include <asm/kvm_vcpu_insn.h>
> #include <asm/kvm_vcpu_sbi.h>
> #include <asm/kvm_vcpu_timer.h>
> +#include <asm/kvm_vcpu_pmu.h>
>
> #define KVM_MAX_VCPUS 1024
>
> @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
>
> /* Don't run the VCPU (blocked) */
> bool pause;
> +
> + struct kvm_pmu pmu;
> };
>
> static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> new file mode 100644
> index 0000000..6a8c0f7
> --- /dev/null
> +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#ifndef __KVM_VCPU_RISCV_PMU_H
> +#define __KVM_VCPU_RISCV_PMU_H
> +
> +#include <linux/perf/riscv_pmu.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +#include <asm/sbi.h>
> +
> +#ifdef CONFIG_RISCV_PMU_SBI
> +#define RISCV_KVM_MAX_FW_CTRS 32
> +#define RISCV_MAX_COUNTERS 64
> +
> +/* Per virtual pmu counter data */
> +struct kvm_pmc {
> + u8 idx;
> + struct perf_event *perf_event;
> + uint64_t counter_val;
> + union sbi_pmu_ctr_info cinfo;
> + /* Event monitoring status */
> + bool started;
> +};
> +
> +/* PMU data structure per vcpu */
> +struct kvm_pmu {
> + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> + /* Number of the virtual firmware counters available */
> + int num_fw_ctrs;
> + /* Number of the virtual hardware counters available */
> + int num_hw_ctrs;
> + /* A flag to indicate that pmu initialization is done */
> + bool init_done;
> + /* Bit map of all the virtual counter used */
> + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> +};
> +
> +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> +
> +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + unsigned long eidx, uint64_t edata,
> + struct kvm_vcpu_sbi_ext_data *extdata);

How about replacing 'edata' with 'evtdata' and then using 'edata' for the
struct kvm_vcpu_sbi_ext_data pointer in order to keep the struct pointer
name consistent with the other functions?
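
I.e. something like:

    int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
                                         unsigned long ctr_mask, unsigned long flag,
                                         unsigned long eidx, uint64_t evtdata,
                                         struct kvm_vcpu_sbi_ext_data *edata);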

> +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata);
> +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> +
> +#else
> +struct kvm_pmu {
> +};
> +
> +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> +{
> + return 0;
> +}
> +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> +#endif
> +#endif

nit: it'd be nice to have

#endif /* CONFIG_RISCV_PMU_SBI */
#endif /* !__KVM_VCPU_RISCV_PMU_H */

> diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> index 019df920..5de1053 100644
> --- a/arch/riscv/kvm/Makefile
> +++ b/arch/riscv/kvm/Makefile
> @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> kvm-y += vcpu_sbi_replace.o
> kvm-y += vcpu_sbi_hsm.o
> kvm-y += vcpu_timer.o
> +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 7c08567..b746f21 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
>
> WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> + kvm_riscv_vcpu_pmu_reset(vcpu);
>
> vcpu->arch.hfence_head = 0;
> vcpu->arch.hfence_tail = 0;
> @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> /* Setup VCPU timer */
> kvm_riscv_vcpu_timer_init(vcpu);
>
> + /* setup performance monitoring */
> + kvm_riscv_vcpu_pmu_init(vcpu);
> +
> /* Reset VCPU */
> kvm_riscv_reset_vcpu(vcpu);
>
> @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> /* Cleanup VCPU timer */
> kvm_riscv_vcpu_timer_deinit(vcpu);
>
> + kvm_riscv_vcpu_pmu_deinit(vcpu);
> /* Free unused pages pre-allocated for G-stage page table mappings */
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> }
> diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> index 0bb5276..1ff2649 100644
> --- a/arch/riscv/kvm/vcpu_insn.c
> +++ b/arch/riscv/kvm/vcpu_insn.c
> @@ -213,7 +213,7 @@ struct csr_func {
> unsigned long wr_mask);
> };
>
> -static const struct csr_func csr_funcs[] = { };
> +static const struct csr_func csr_funcs[] = {};

Stray change

>
> /**
> * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> new file mode 100644
> index 0000000..0f0748f1
> --- /dev/null
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -0,0 +1,142 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Rivos Inc
> + *
> + * Authors:
> + * Atish Patra <[email protected]>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/kvm_host.h>
> +#include <linux/perf/riscv_pmu.h>
> +#include <asm/csr.h>
> +#include <asm/kvm_vcpu_sbi.h>
> +#include <asm/kvm_vcpu_pmu.h>
> +#include <linux/kvm_host.h>
> +
> +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> +
> +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + edata->out_val = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;

edata->out_val = kvm_pmu_num_counters(kvpmu);

> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + if ((cidx > RISCV_MAX_COUNTERS) || (cidx == 1)) {

nit: No need for () around the expressions

> + edata->err_val = SBI_ERR_INVALID_PARAM;
> + return 0;
> + }
> +
> + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> +
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> + unsigned long ctr_mask, unsigned long flag,
> + unsigned long eidx, uint64_t edata,
> + struct kvm_vcpu_sbi_ext_data *extdata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> + struct kvm_vcpu_sbi_ext_data *edata)
> +{
> + /* TODO */
> + return 0;
> +}
> +
> +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> +{
> + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> +
> + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> + if (ret < 0)
> + return ret;
> +
> + if (!hpm_width || !num_hw_ctrs) {
> + pr_err("Can not initialize PMU for vcpu with NULL hpmcounter width/count\n");
^ Cannot ^ VCPU ^ or number counters

> + return -EINVAL;
> + }
> +
> + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS)

Shouldn't we warn about this condition? Presumably it means Linux selected
RISCV_MAX_COUNTERS too small, so a warning would let us know we need to
bump it up.
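
E.g. (sketch):

    if (num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS > RISCV_MAX_COUNTERS) {
        pr_warn_once("Limiting firmware counters; consider bumping RISCV_MAX_COUNTERS\n");
        num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
    } else {
        num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
    }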

> + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> + else
> + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> +
> + kvpmu->num_hw_ctrs = num_hw_ctrs;
> + kvpmu->num_fw_ctrs = num_fw_ctrs;

nit: add blank line here

> + /*
> + * There is no corelation betwen the logical hardware counter and virtual counters.

correlation

> + * However, we need to encode a hpmcounter CSR in the counter info field so that
> + * KVM can trap n emulate the read. This works well in the migraiton usecase as

migration

> + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> + */
> + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> + /* TIME CSR shouldn't be read from perf interface */
> + if (i == 1)
> + continue;
> + kvpmu->pmc[i].idx = i;
> + if (i < kvpmu->num_hw_ctrs) {
> + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> + if (i < 3)
> + /* CY, IR counters */
> + kvpmu->pmc[i].cinfo.width = 63;
> + else
> + kvpmu->pmc[i].cinfo.width = hpm_width;
> + /*
> + * The CSR number doesn't have any relation with the logical
> + * hardware counters. The CSR numbers are encoded sequentially
> + * to avoid maintaining a map between the virtual counter
> + * and CSR number.
> + */
> + kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
> + } else {
> + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
> + kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
> + }
> + }
> +
> + kvpmu->init_done = true;
> +
> + return 0;
> +}
> +
> +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> +{
> + /* TODO */
> +}
> +
> +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> +{
> + /* TODO */
> +}
> +
> --
> 2.25.1
>

Thanks,
drew

2023-01-12 16:05:31

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 08/11] RISC-V: KVM: Disable all hpmcounter access for VS/VU mode

On Thu, Dec 15, 2022 at 09:00:43AM -0800, Atish Patra wrote:
> Any guest must not get access to any hpmcounter including cycle/instret
> without any checks. We achieve that by disabling all the bits except TM
> bit in hcountern.

hcounteren

>
> However, instret and cycle access for guest userspace can be enabled
> upon explicit request (via ONE REG) or on first trap from VU mode
> to maintain ABI requirement in the future. This patch doesn't support
> that as the ONE REG interface is not settled yet.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/kvm/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
> index 58c5489..9c2efd3 100644
> --- a/arch/riscv/kvm/main.c
> +++ b/arch/riscv/kvm/main.c
> @@ -49,7 +49,8 @@ int kvm_arch_hardware_enable(void)
> hideleg |= (1UL << IRQ_VS_EXT);
> csr_write(CSR_HIDELEG, hideleg);
>
> - csr_write(CSR_HCOUNTEREN, -1UL);
> + /* VS should access only TM bit. Everything else should trap */

s/TM bit/the time counter/

> + csr_write(CSR_HCOUNTEREN, 0x02);
>
> csr_write(CSR_HVIP, 0);
>
> --
> 2.25.1
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew

2023-01-12 18:53:32

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 07/11] RISC-V: KVM: Add SBI PMU extension support

On Thu, Jan 12, 2023 at 7:29 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:42AM -0800, Atish Patra wrote:
> > SBI PMU extension allows KVM guests to configure/start/stop/query about
> > the PMU counters in virtualized environment as well.
> >
> > In order to allow that, KVM implements the entire SBI PMU extension.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/kvm/Makefile | 2 +-
> > arch/riscv/kvm/vcpu_sbi.c | 11 +++++
> > arch/riscv/kvm/vcpu_sbi_pmu.c | 86 +++++++++++++++++++++++++++++++++++
> > 3 files changed, 98 insertions(+), 1 deletion(-)
> > create mode 100644 arch/riscv/kvm/vcpu_sbi_pmu.c
> >
> > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > index 5de1053..278e97c 100644
> > --- a/arch/riscv/kvm/Makefile
> > +++ b/arch/riscv/kvm/Makefile
> > @@ -25,4 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > kvm-y += vcpu_sbi_replace.o
> > kvm-y += vcpu_sbi_hsm.o
> > kvm-y += vcpu_timer.o
> > -kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o vcpu_sbi_pmu.o
> > diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
> > index 50c5472..3b8b84e8 100644
> > --- a/arch/riscv/kvm/vcpu_sbi.c
> > +++ b/arch/riscv/kvm/vcpu_sbi.c
> > @@ -20,6 +20,16 @@ static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_v01 = {
> > };
> > #endif
> >
> > +#ifdef CONFIG_RISCV_PMU_SBI
> > +extern const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu;
> > +#else
> > +static const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> > + .extid_start = -1UL,
> > + .extid_end = -1UL,
> > + .handler = NULL,
> > +};
> > +#endif
> > +
> > static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> > &vcpu_sbi_ext_v01,
> > &vcpu_sbi_ext_base,
> > @@ -28,6 +38,7 @@ static const struct kvm_vcpu_sbi_extension *sbi_ext[] = {
> > &vcpu_sbi_ext_rfence,
> > &vcpu_sbi_ext_srst,
> > &vcpu_sbi_ext_hsm,
> > + &vcpu_sbi_ext_pmu,
> > &vcpu_sbi_ext_experimental,
> > &vcpu_sbi_ext_vendor,
> > };
> > diff --git a/arch/riscv/kvm/vcpu_sbi_pmu.c b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > new file mode 100644
> > index 0000000..223752f
> > --- /dev/null
> > +++ b/arch/riscv/kvm/vcpu_sbi_pmu.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Rivos Inc
> > + *
> > + * Authors:
> > + * Atish Patra <[email protected]>
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/err.h>
> > +#include <linux/kvm_host.h>
> > +#include <asm/csr.h>
> > +#include <asm/sbi.h>
> > +#include <asm/kvm_vcpu_sbi.h>
> > +
> > +static int kvm_sbi_ext_pmu_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > + struct kvm_vcpu_sbi_ext_data *edata,
> > + struct kvm_cpu_trap *utrap)
> > +{
> > + int ret = 0;
> > + struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + unsigned long funcid = cp->a6;
> > + uint64_t temp;
> > +
> > + /* Return not supported if PMU is not initialized */
> > + if (!kvpmu->init_done)
> > + return -EINVAL;
> > +
> > + switch (funcid) {
> > + case SBI_EXT_PMU_NUM_COUNTERS:
> > + ret = kvm_riscv_vcpu_pmu_num_ctrs(vcpu, edata);
> > + break;
> > + case SBI_EXT_PMU_COUNTER_GET_INFO:
> > + ret = kvm_riscv_vcpu_pmu_ctr_info(vcpu, cp->a0, edata);
> > + break;
> > + case SBI_EXT_PMU_COUNTER_CFG_MATCH:
> > +#if defined(CONFIG_32BIT)
> > + temp = ((uint64_t)cp->a5 << 32) | cp->a4;
> > +#else
> > + temp = cp->a4;
> > +#endif
> > + ret = kvm_riscv_vcpu_pmu_ctr_cfg_match(vcpu, cp->a0, cp->a1,
> > + cp->a2, cp->a3, temp, edata);
> > + break;
> > + case SBI_EXT_PMU_COUNTER_START:
> > +#if defined(CONFIG_32BIT)
> > + temp = ((uint64_t)cp->a4 << 32) | cp->a3;
> > +#else
> > + temp = cp->a3;
> > +#endif
> > + ret = kvm_riscv_vcpu_pmu_ctr_start(vcpu, cp->a0, cp->a1, cp->a2,
> > + temp, edata);
> > + break;
> > + case SBI_EXT_PMU_COUNTER_STOP:
> > + ret = kvm_riscv_vcpu_pmu_ctr_stop(vcpu, cp->a0, cp->a1, cp->a2, edata);
> > + break;
> > + case SBI_EXT_PMU_COUNTER_FW_READ:
> > + ret = kvm_riscv_vcpu_pmu_ctr_read(vcpu, cp->a0, edata);
> > + break;
> > + default:
> > + edata->err_val = SBI_ERR_NOT_SUPPORTED;
> > + }
> > +
> > +
> > + return ret;
> > +}
> > +
> > +unsigned long kvm_sbi_ext_pmu_probe(struct kvm_vcpu *vcpu, unsigned long extid)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + /*
> > + * PMU Extension is only available to guests if privilege mode filtering
> > + * is available. Otherwise, guest will always count events while the
> > + * execution is in hypervisor mode.
> > + */
> > + return kvpmu->init_done && riscv_isa_extension_available(NULL, SSCOFPMF);
>
> Assuming we're only supporting homogeneous systems, then can't we just
> check for Sscofpmf at PMU init time? When the extension isn't present
> we'd fail to init and then here init_done wouldn't be set.
>

Sure. We can do that too. Will revise it in v3.

> > +}
> > +
> > +const struct kvm_vcpu_sbi_extension vcpu_sbi_ext_pmu = {
> > + .extid_start = SBI_EXT_PMU,
> > + .extid_end = SBI_EXT_PMU,
> > + .handler = kvm_sbi_ext_pmu_handler,
> > + .probe = kvm_sbi_ext_pmu_probe,
> > +};
> > --
> > 2.25.1
> >
>
> Thanks,
> drew

2023-01-12 19:06:34

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 03/11] RISC-V: KVM: Return correct code for hsm stop function

On Thu, Jan 12, 2023 at 2:28 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:38AM -0800, Atish Patra wrote:
> > According to the SBI specification, the stop function can only
> > return error code SBI_ERR_FAILED. However, currently it returns
> > -EINVAL which will be mapped to SBI_ERR_INVALID_PARAM.
>
> I presume the mapping referred to here is kvm_linux_err_map_sbi().
> If so, then -EPERM isn't correct either. That maps to SBI_ERR_DENIED.
> The only thing that will ensure we get SBI_ERR_FAILURE (-1) is
> anything not handled by the kvm_linux_err_map_sbi switch, as we
> need to use the default.
>

It returns SBI_ERR_FAILURE in the next patch when kvm_linux_err_map_sbi
is removed. Maybe I should drop this patch. The next patch does the
correct thing anyways.

> Thanks,
> drew
>
> >
> > Return the appropriate linux error code.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/kvm/vcpu_sbi_hsm.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/kvm/vcpu_sbi_hsm.c b/arch/riscv/kvm/vcpu_sbi_hsm.c
> > index 2e915ca..0f8d9fe 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_hsm.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_hsm.c
> > @@ -42,7 +42,7 @@ static int kvm_sbi_hsm_vcpu_start(struct kvm_vcpu *vcpu)
> > static int kvm_sbi_hsm_vcpu_stop(struct kvm_vcpu *vcpu)
> > {
> > if (vcpu->arch.power_off)
> > - return -EINVAL;
> > + return -EPERM;
> >
> > kvm_riscv_vcpu_power_off(vcpu);
> >
> > --
> > 2.25.1
> >

2023-01-12 19:27:28

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] RISC-V: KVM: Define a probe function for SBI extension data structures

On Thu, Jan 12, 2023 at 2:21 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:37AM -0800, Atish Patra wrote:
> > Currently the probe function just checks if an SBI extension is
> > registered or not. However, the extension may not want to advertise
> > itself depending on some other condition.
> > An additional extension specific probe function will allow
> > extensions to decide if they want to be advertised to the caller or
> > not. Any extension that does not require additional dependency checks
> > can avoid implementing this function.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 +++
> > arch/riscv/kvm/vcpu_sbi_base.c | 13 +++++++++++--
> > 2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_sbi.h b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > index f79478a..61dac1b 100644
> > --- a/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > +++ b/arch/riscv/include/asm/kvm_vcpu_sbi.h
> > @@ -29,6 +29,9 @@ struct kvm_vcpu_sbi_extension {
> > int (*handler)(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > unsigned long *out_val, struct kvm_cpu_trap *utrap,
> > bool *exit);
> > +
> > + /* Extension specific probe function */
> > + unsigned long (*probe)(struct kvm_vcpu *vcpu, unsigned long extid);
>
> It doesn't seem like the extid parameter should be necessary since the
> probe function is specific to the extension, but it doesn't hurt either.
>

Yeah. You are correct. I will drop it. Thanks.

> > };
> >
> > void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu, struct kvm_run *run);
> > diff --git a/arch/riscv/kvm/vcpu_sbi_base.c b/arch/riscv/kvm/vcpu_sbi_base.c
> > index 5d65c63..89e2415 100644
> > --- a/arch/riscv/kvm/vcpu_sbi_base.c
> > +++ b/arch/riscv/kvm/vcpu_sbi_base.c
> > @@ -19,6 +19,7 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > {
> > int ret = 0;
> > struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
> > + const struct kvm_vcpu_sbi_extension *sbi_ext;
> >
> > switch (cp->a6) {
> > case SBI_EXT_BASE_GET_SPEC_VERSION:
> > @@ -43,8 +44,16 @@ static int kvm_sbi_ext_base_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > */
> > kvm_riscv_vcpu_sbi_forward(vcpu, run);
> > *exit = true;
> > - } else
> > - *out_val = kvm_vcpu_sbi_find_ext(cp->a0) ? 1 : 0;
> > + } else {
> > + sbi_ext = kvm_vcpu_sbi_find_ext(cp->a0);
> > + if (sbi_ext) {
> > + if (sbi_ext->probe)
> > + *out_val = sbi_ext->probe(vcpu, cp->a0);
> > + else
> > + *out_val = 1;
> > + } else
> > + *out_val = 0;
> > + }
> > break;
> > case SBI_EXT_BASE_GET_MVENDORID:
> > *out_val = vcpu->arch.mvendorid;
> > --
> > 2.25.1
> >
>
> Reviewed-by: Andrew Jones <[email protected]>

2023-01-12 19:45:47

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] RISC-V: KVM: Add skeleton support for perf

On Thu, Jan 12, 2023 at 7:10 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:41AM -0800, Atish Patra wrote:
> > This patch only adds the barebones structure of the perf implementation.
> > Most of the functions return zero at this point and will be implemented
> > fully in the future.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/include/asm/kvm_host.h | 3 +
> > arch/riscv/include/asm/kvm_vcpu_pmu.h | 76 ++++++++++++++
> > arch/riscv/kvm/Makefile | 1 +
> > arch/riscv/kvm/vcpu.c | 5 +
> > arch/riscv/kvm/vcpu_insn.c | 2 +-
> > arch/riscv/kvm/vcpu_pmu.c | 142 ++++++++++++++++++++++++++
> > 6 files changed, 228 insertions(+), 1 deletion(-)
> > create mode 100644 arch/riscv/include/asm/kvm_vcpu_pmu.h
> > create mode 100644 arch/riscv/kvm/vcpu_pmu.c
> >
> > diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
> > index 93f43a3..f9874b4 100644
> > --- a/arch/riscv/include/asm/kvm_host.h
> > +++ b/arch/riscv/include/asm/kvm_host.h
> > @@ -18,6 +18,7 @@
> > #include <asm/kvm_vcpu_insn.h>
> > #include <asm/kvm_vcpu_sbi.h>
> > #include <asm/kvm_vcpu_timer.h>
> > +#include <asm/kvm_vcpu_pmu.h>
> >
> > #define KVM_MAX_VCPUS 1024
> >
> > @@ -228,6 +229,8 @@ struct kvm_vcpu_arch {
> >
> > /* Don't run the VCPU (blocked) */
> > bool pause;
> > +
> > + struct kvm_pmu pmu;
> > };
> >
> > static inline void kvm_arch_hardware_unsetup(void) {}
> > diff --git a/arch/riscv/include/asm/kvm_vcpu_pmu.h b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > new file mode 100644
> > index 0000000..6a8c0f7
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/kvm_vcpu_pmu.h
> > @@ -0,0 +1,76 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2022 Rivos Inc
> > + *
> > + * Authors:
> > + * Atish Patra <[email protected]>
> > + */
> > +
> > +#ifndef __KVM_VCPU_RISCV_PMU_H
> > +#define __KVM_VCPU_RISCV_PMU_H
> > +
> > +#include <linux/perf/riscv_pmu.h>
> > +#include <asm/kvm_vcpu_sbi.h>
> > +#include <asm/sbi.h>
> > +
> > +#ifdef CONFIG_RISCV_PMU_SBI
> > +#define RISCV_KVM_MAX_FW_CTRS 32
> > +#define RISCV_MAX_COUNTERS 64
> > +
> > +/* Per virtual pmu counter data */
> > +struct kvm_pmc {
> > + u8 idx;
> > + struct perf_event *perf_event;
> > + uint64_t counter_val;
> > + union sbi_pmu_ctr_info cinfo;
> > + /* Event monitoring status */
> > + bool started;
> > +};
> > +
> > +/* PMU data structure per vcpu */
> > +struct kvm_pmu {
> > + struct kvm_pmc pmc[RISCV_MAX_COUNTERS];
> > + /* Number of the virtual firmware counters available */
> > + int num_fw_ctrs;
> > + /* Number of the virtual hardware counters available */
> > + int num_hw_ctrs;
> > + /* A flag to indicate that pmu initialization is done */
> > + bool init_done;
> > + /* Bit map of all the virtual counter used */
> > + DECLARE_BITMAP(pmc_in_use, RISCV_MAX_COUNTERS);
> > +};
> > +
> > +#define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
> > +#define pmu_to_vcpu(pmu) (container_of((pmu), struct kvm_vcpu, arch.pmu))
> > +
> > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + unsigned long eidx, uint64_t edata,
> > + struct kvm_vcpu_sbi_ext_data *extdata);
>
> How about replacing 'edata' with 'evtdata' and then using 'edata' for the
> struct kvm_vcpu_sbi_ext_data pointer in order to keep the struct pointer
> name consistent with the other functions?
>

Sure.

> > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata);
> > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu);
> > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu);
> > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu);
> > +
> > +#else
> > +struct kvm_pmu {
> > +};
> > +
> > +static inline int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > +{
> > + return 0;
> > +}
> > +static inline void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu) {}
> > +#endif
> > +#endif
>
> nit: it'd be nice to have
>
> #endif /* CONFIG_RISCV_PMU_SBI */
> #endif /* !__KVM_VCPU_RISCV_PMU_H */
>
> > diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
> > index 019df920..5de1053 100644
> > --- a/arch/riscv/kvm/Makefile
> > +++ b/arch/riscv/kvm/Makefile
> > @@ -25,3 +25,4 @@ kvm-y += vcpu_sbi_base.o
> > kvm-y += vcpu_sbi_replace.o
> > kvm-y += vcpu_sbi_hsm.o
> > kvm-y += vcpu_timer.o
> > +kvm-$(CONFIG_RISCV_PMU_SBI) += vcpu_pmu.o
> > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> > index 7c08567..b746f21 100644
> > --- a/arch/riscv/kvm/vcpu.c
> > +++ b/arch/riscv/kvm/vcpu.c
> > @@ -137,6 +137,7 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
> >
> > WRITE_ONCE(vcpu->arch.irqs_pending, 0);
> > WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
> > + kvm_riscv_vcpu_pmu_reset(vcpu);
> >
> > vcpu->arch.hfence_head = 0;
> > vcpu->arch.hfence_tail = 0;
> > @@ -194,6 +195,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > /* Setup VCPU timer */
> > kvm_riscv_vcpu_timer_init(vcpu);
> >
> > + /* setup performance monitoring */
> > + kvm_riscv_vcpu_pmu_init(vcpu);
> > +
> > /* Reset VCPU */
> > kvm_riscv_reset_vcpu(vcpu);
> >
> > @@ -216,6 +220,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> > /* Cleanup VCPU timer */
> > kvm_riscv_vcpu_timer_deinit(vcpu);
> >
> > + kvm_riscv_vcpu_pmu_deinit(vcpu);
> > /* Free unused pages pre-allocated for G-stage page table mappings */
> > kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > }
> > diff --git a/arch/riscv/kvm/vcpu_insn.c b/arch/riscv/kvm/vcpu_insn.c
> > index 0bb5276..1ff2649 100644
> > --- a/arch/riscv/kvm/vcpu_insn.c
> > +++ b/arch/riscv/kvm/vcpu_insn.c
> > @@ -213,7 +213,7 @@ struct csr_func {
> > unsigned long wr_mask);
> > };
> >
> > -static const struct csr_func csr_funcs[] = { };
> > +static const struct csr_func csr_funcs[] = {};
>
> Stray change
>
> >
> > /**
> > * kvm_riscv_vcpu_csr_return -- Handle CSR read/write after user space
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > new file mode 100644
> > index 0000000..0f0748f1
> > --- /dev/null
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -0,0 +1,142 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Rivos Inc
> > + *
> > + * Authors:
> > + * Atish Patra <[email protected]>
> > + */
> > +
> > +#include <linux/errno.h>
> > +#include <linux/err.h>
> > +#include <linux/kvm_host.h>
> > +#include <linux/perf/riscv_pmu.h>
> > +#include <asm/csr.h>
> > +#include <asm/kvm_vcpu_sbi.h>
> > +#include <asm/kvm_vcpu_pmu.h>
> > +#include <linux/kvm_host.h>
> > +
> > +#define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > +
> > +int kvm_riscv_vcpu_pmu_num_ctrs(struct kvm_vcpu *vcpu, struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + edata->out_val = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
>
> edata->out_val = kvm_pmu_num_counters(kvpmu);
>

ok.

> > +
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_info(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + if ((cidx > RISCV_MAX_COUNTERS) || (cidx == 1)) {
>
> nit: No need for () around the expressions
>
> > + edata->err_val = SBI_ERR_INVALID_PARAM;
> > + return 0;
> > + }
> > +
> > + edata->out_val = kvpmu->pmc[cidx].cinfo.value;
> > +
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > + unsigned long ctr_mask, unsigned long flag,
> > + unsigned long eidx, uint64_t edata,
> > + struct kvm_vcpu_sbi_ext_data *extdata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > + struct kvm_vcpu_sbi_ext_data *edata)
> > +{
> > + /* TODO */
> > + return 0;
> > +}
> > +
> > +int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > +{
> > + int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > +
> > + ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > + if (ret < 0)
> > + return ret;
> > +
> > + if (!hpm_width || !num_hw_ctrs) {
> > + pr_err("Can not initialize PMU for vcpu with NULL hpmcounter width/count\n");
> ^ Cannot ^ VCPU ^ or number counters
>
> > + return -EINVAL;
> > + }
> > +
> > + if ((num_hw_ctrs + RISCV_KVM_MAX_FW_CTRS) > RISCV_MAX_COUNTERS)
>
> Shouldn't we warn about this condition? Presumably it means Linux selected
> RISCV_MAX_COUNTERS too small, so a warning would let us know we need to
> bump it up.
>

OK. I will add a warning message. We are unlikely to hit that
scenario given RISCV_MAX_COUNTERS is 64, unless we add a bunch of
firmware counters for KVM. At that point both RISCV_MAX_COUNTERS and
RISCV_KVM_MAX_FW_CTRS would need to be increased.

> > + num_fw_ctrs = RISCV_MAX_COUNTERS - num_hw_ctrs;
> > + else
> > + num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> > +
> > + kvpmu->num_hw_ctrs = num_hw_ctrs;
> > + kvpmu->num_fw_ctrs = num_fw_ctrs;
>
> nit: add blank line here
>
> > + /*
> > + * There is no corelation betwen the logical hardware counter and virtual counters.
>
> correlation
>
> > + * However, we need to encode a hpmcounter CSR in the counter info field so that
> > + * KVM can trap n emulate the read. This works well in the migraiton usecase as
>
> migration
>

Thanks for the review. I will address all other comments as well.

> > + * KVM doesn't care if the actual hpmcounter is available in the hardware or not.
> > + */
> > + for (i = 0; i < kvm_pmu_num_counters(kvpmu); i++) {
> > + /* TIME CSR shouldn't be read from perf interface */
> > + if (i == 1)
> > + continue;
> > + kvpmu->pmc[i].idx = i;
> > + if (i < kvpmu->num_hw_ctrs) {
> > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > + if (i < 3)
> > + /* CY, IR counters */
> > + kvpmu->pmc[i].cinfo.width = 63;
> > + else
> > + kvpmu->pmc[i].cinfo.width = hpm_width;
> > + /*
> > + * The CSR number doesn't have any relation with the logical
> > + * hardware counters. The CSR numbers are encoded sequentially
> > + * to avoid maintaining a map between the virtual counter
> > + * and CSR number.
> > + */
> > + kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
> > + } else {
> > + kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > + kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
> > + }
> > + }
> > +
> > + kvpmu->init_done = true;
> > +
> > + return 0;
> > +}
> > +
> > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > +{
> > + /* TODO */
> > +}
> > +
> > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > +{
> > + /* TODO */
> > +}
> > +
> > --
> > 2.25.1
> >
>
> Thanks,
> drew

2023-01-12 19:57:00

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 01/11] RISC-V: Define helper functions expose hpm counter width and count

On Thu, Jan 12, 2023 at 2:06 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:36AM -0800, Atish Patra wrote:
> > KVM module needs to know how many hardware counters and the counter
> > width that the platform supports. Otherwise, it will not be able to show
> > optimal value of virtual counters to the guest. The virtual hardware
> > counters also need to have the same width as the logical hardware
> > counters for simplicity. However, there shouldn't be mapping between
> > virtual hardware counters and logical hardware counters. As we don't
> > support heterogeneous harts or counters with different width as of now,
> > the implementation relies on the counter width of the first available
> > programmable counter.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > drivers/perf/riscv_pmu_sbi.c | 35 +++++++++++++++++++++++++++++++++-
> > include/linux/perf/riscv_pmu.h | 3 +++
> > 2 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> > index 3852c18..65d4aa4 100644
> > --- a/drivers/perf/riscv_pmu_sbi.c
> > +++ b/drivers/perf/riscv_pmu_sbi.c
> > @@ -49,6 +49,9 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
> > static union sbi_pmu_ctr_info *pmu_ctr_list;
> > static unsigned int riscv_pmu_irq;
> >
> > +/* Cache the available counters in a bitmask */
> > +unsigned long cmask;
>
> I presume this can be static since it's not getting added to the header.
> And don't we need this to be a long long for rv32? We should probably
> just use u64.
>

Yeah. u64 would be better. I will change it along with static. Thanks.

> > +
> > struct sbi_pmu_event_data {
> > union {
> > union {
> > @@ -264,6 +267,37 @@ static bool pmu_sbi_ctr_is_fw(int cidx)
> > return (info->type == SBI_PMU_CTR_TYPE_FW) ? true : false;
> > }
> >
> > +/*
> > + * Returns the counter width of a programmable counter and number of hardware
> > + * counters. As we don't support heterneous CPUs yet, it is okay to just
>
> heterogeneous
>

Fixed.

> > + * return the counter width of the first programmable counter.
> > + */
> > +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
> > +{
> > + int i;
> > + union sbi_pmu_ctr_info *info;
> > + u32 hpm_width = 0, hpm_count = 0;
> > +
> > + if (!cmask)
> > + return -EINVAL;
> > +
> > + for_each_set_bit(i, &cmask, RISCV_MAX_COUNTERS) {
> > + info = &pmu_ctr_list[i];
> > + if (!info)
> > + continue;
> > + if (!hpm_width && (info->csr != CSR_CYCLE) && (info->csr != CSR_INSTRET))
>
> nit: No need for () around the != expressions
>

Fixed.

> > + hpm_width = info->width;
> > + if (info->type == SBI_PMU_CTR_TYPE_HW)
> > + hpm_count++;
> > + }
> > +
> > + *hw_ctr_width = hpm_width;
> > + *num_hw_ctr = hpm_count;
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
>
> EXPORT_SYMBOL_GPL ?
>

Is that mandatory ? I have seen usage of both in arch/riscv and other
places though.
I am also not sure if any other non-GPL module should/need access to this.

> > +
> > static int pmu_sbi_ctr_get_idx(struct perf_event *event)
> > {
> > struct hw_perf_event *hwc = &event->hw;
> > @@ -798,7 +832,6 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
> > static int pmu_sbi_device_probe(struct platform_device *pdev)
> > {
> > struct riscv_pmu *pmu = NULL;
> > - unsigned long cmask = 0;
> > int ret = -ENODEV;
> > int num_counters;
> >
> > diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
> > index e17e86a..a1c3f77 100644
> > --- a/include/linux/perf/riscv_pmu.h
> > +++ b/include/linux/perf/riscv_pmu.h
> > @@ -73,6 +73,9 @@ void riscv_pmu_legacy_skip_init(void);
> > static inline void riscv_pmu_legacy_skip_init(void) {};
> > #endif
> > struct riscv_pmu *riscv_pmu_alloc(void);
> > +#ifdef CONFIG_RISCV_PMU_SBI
> > +int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);
> > +#endif
> >
> > #endif /* CONFIG_RISCV_PMU */
> >
> > --
> > 2.25.1
> >
>
> Thanks,
> drew

2023-01-13 07:52:34

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 01/11] RISC-V: Define helper functions expose hpm counter width and count

On Thu, Jan 12, 2023 at 10:18:05AM -0800, Atish Kumar Patra wrote:
> On Thu, Jan 12, 2023 at 2:06 AM Andrew Jones <[email protected]> wrote:
> >
> > On Thu, Dec 15, 2022 at 09:00:36AM -0800, Atish Patra wrote:
...
> > > +EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
> >
> > EXPORT_SYMBOL_GPL ?
> >
>
> Is that mandatory ? I have seen usage of both in arch/riscv and other
> places though.
> I am also not sure if any other non-GPL module should/need access to this.

TBH, I'm not sure what the best policy is, but I presumed we should use
_GPL when we aren't aware of anything non-GPL and then when a day comes
that something non-GPL would like this to be exported, the patch that
flips it will provide the justification in its commit message.

Thanks,
drew

2023-01-13 08:03:35

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 03/11] RISC-V: KVM: Return correct code for hsm stop function

On Thu, Jan 12, 2023 at 10:25:06AM -0800, Atish Kumar Patra wrote:
> On Thu, Jan 12, 2023 at 2:28 AM Andrew Jones <[email protected]> wrote:
> >
> > On Thu, Dec 15, 2022 at 09:00:38AM -0800, Atish Patra wrote:
> > > According to the SBI specification, the stop function can only
> > > return error code SBI_ERR_FAILED. However, currently it returns
> > > -EINVAL which will be mapped to SBI_ERR_INVALID_PARAM.
> >
> > I presume the mapping referred to here is kvm_linux_err_map_sbi().
> > If so, then -EPERM isn't correct either. That maps to SBI_ERR_DENIED.
> > The only thing that will ensure we get SBI_ERR_FAILURE (-1) is
> > anything not handled by the kvm_linux_err_map_sbi switch, as we
> > need to use the default.
> >
>
> It returns SBI_ERR_FAILURE in the next patch when kvm_linux_err_map_sbi
> is removed. Maybe I should drop this patch. The next patch does the
> correct thing anyways.

Yeah, I saw that, but then we silently fix a bug in the next patch.
I like that this is a separate patch, but it should do what it says
it's doing :-)

Thanks,
drew

2023-01-13 12:05:10

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] RISC-V: KVM: Implement perf support without sampling

On Thu, Dec 15, 2022 at 09:00:45AM -0800, Atish Patra wrote:
> RISC-V SBI PMU & Sscofpmf ISA extension allows supporting perf in
> the virtualization environment as well. KVM implementation
> relies on SBI PMU extension for most the most part while trapping

s/most the most/the most/

> & emulating the CSRs read for counter access.
>
> This patch doesn't have the event sampling support yet.
>
> Signed-off-by: Atish Patra <[email protected]>
> ---
> arch/riscv/kvm/vcpu_pmu.c | 358 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 342 insertions(+), 16 deletions(-)
>
> diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> index 53c4163..21c1f0f 100644
> --- a/arch/riscv/kvm/vcpu_pmu.c
> +++ b/arch/riscv/kvm/vcpu_pmu.c
> @@ -12,10 +12,163 @@
> #include <linux/perf/riscv_pmu.h>
> #include <asm/csr.h>
> #include <asm/kvm_vcpu_sbi.h>
> +#include <asm/bitops.h>
> #include <asm/kvm_vcpu_pmu.h>
> #include <linux/kvm_host.h>
>
> #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> +#define get_event_type(x) ((x & SBI_PMU_EVENT_IDX_TYPE_MASK) >> 16)
> +#define get_event_code(x) (x & SBI_PMU_EVENT_IDX_CODE_MASK)

Should put () around x

> +
> +static inline u64 pmu_get_sample_period(struct kvm_pmc *pmc)

I'd rather we use kvm_pmu_ for the prefix instead of just pmu for
this and all the other functions.

Also I'd drop the inline attribute here and in other functions below to
let the compiler completely decide what to do.

> +{
> + u64 counter_val_mask = GENMASK(pmc->cinfo.width, 0);
> + u64 sample_period;
> +
> + if (!pmc->counter_val)
> + sample_period = counter_val_mask;
> + else
> + sample_period = (-pmc->counter_val) & counter_val_mask;

I probably don't understand this, since I see arm64/kvm doing the same
thing, but if sample_period is the number of remaining counts a counter
has, then I'd write it as

sample_period = counter_val_mask - (pmc->counter_val & counter_val_mask);

As it stands, the pmc->counter_val == 0 case would be the same, but
the other case differs by 1. arm64/kvm doesn't appear to handle the
zero case.
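
(For a concrete example of the off-by-one: with width 63, i.e.
mask = 2^64 - 1, and pmc->counter_val = 5, the patch computes
(-5) & mask = 2^64 - 5 remaining counts, whereas the expression above
gives (2^64 - 1) - 5 = 2^64 - 6; the counter_val == 0 case is the same
either way.)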

> +
> + return sample_period;
> +}
> +
> +static u32 pmu_get_perf_event_type(unsigned long eidx)
> +{
> + enum sbi_pmu_event_type etype = get_event_type(eidx);
> + u32 type;
> +
> + if (etype == SBI_PMU_EVENT_TYPE_HW)
> + type = PERF_TYPE_HARDWARE;
> + else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
> + type = PERF_TYPE_HW_CACHE;
> + else if (etype == SBI_PMU_EVENT_TYPE_RAW || etype == SBI_PMU_EVENT_TYPE_FW)
> + type = PERF_TYPE_RAW;
> + else
> + type = PERF_TYPE_MAX;

nit: This might look nicer as a switch
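
E.g. (sketch):

    switch (etype) {
    case SBI_PMU_EVENT_TYPE_HW:
        return PERF_TYPE_HARDWARE;
    case SBI_PMU_EVENT_TYPE_CACHE:
        return PERF_TYPE_HW_CACHE;
    case SBI_PMU_EVENT_TYPE_RAW:
    case SBI_PMU_EVENT_TYPE_FW:
        return PERF_TYPE_RAW;
    default:
        return PERF_TYPE_MAX;
    }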

> +
> + return type;
> +}
> +
> +static inline bool pmu_is_fw_event(unsigned long eidx)
> +{
> +

extra blank line here

> + return get_event_type(eidx) == SBI_PMU_EVENT_TYPE_FW;
> +}
> +
> +static void pmu_release_perf_event(struct kvm_pmc *pmc)
> +{
> + if (pmc->perf_event) {
> + perf_event_disable(pmc->perf_event);
> + perf_event_release_kernel(pmc->perf_event);
> + pmc->perf_event = NULL;
> + }
> +}
> +
> +static u64 pmu_get_perf_event_hw_config(u32 sbi_event_code)
> +{
> + /* SBI PMU HW event code is offset by 1 from perf hw event codes */
> + return (u64)sbi_event_code - 1;

This is probably fine to do since we're mapping specified codes to UAPI
codes, so it's unlikely something will change this relationship, but if we
expose the mapping explicitly it'll give code readers a chance to see
what's what without looking stuff up elsewhere. How about creating an
array indexed by sbi_event_code with the mapping?

enum perf_hw_id hw_event_perf_map[SBI_PMU_HW_GENERAL_MAX] = {
[SBI_PMU_HW_CPU_CYCLES] = PERF_COUNT_HW_CPU_CYCLES,
...
};

> +}
> +
> +static u64 pmu_get_perf_event_cache_config(u32 sbi_event_code)
> +{
> + u64 config = U64_MAX;
> + unsigned int cache_type, cache_op, cache_result;
> +
> + /* All the cache event masks lie within 0xFF. No separate masking is necesssary */
> + cache_type = (sbi_event_code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >> 3;

It'd be nice to have a define for that 3 shift.

> + cache_op = (sbi_event_code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >> 1;

Also for this 1 shift.
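
Something like (the names are only suggestions):

    #define SBI_PMU_EVENT_CACHE_ID_SHIFT	3
    #define SBI_PMU_EVENT_CACHE_OP_SHIFT	1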

> + cache_result = sbi_event_code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK;
> +
> + if (cache_type >= PERF_COUNT_HW_CACHE_MAX ||
> + cache_op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> + cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> + return config;
> +
> + config = cache_type | (cache_op << 8) | (cache_result << 16);
> +
> + return config;
> +}
> +
> +static u64 pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
> +{
> + enum sbi_pmu_event_type etype = get_event_type(eidx);
> + u32 ecode = get_event_code(eidx);
> + u64 config = U64_MAX;
> +
> + if (etype == SBI_PMU_EVENT_TYPE_HW)
> + config = pmu_get_perf_event_hw_config(ecode);
> + else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
> + config = pmu_get_perf_event_cache_config(ecode);
> + else if (etype == SBI_PMU_EVENT_TYPE_RAW)
> + config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
> + else if ((etype == SBI_PMU_EVENT_TYPE_FW) && (ecode < SBI_PMU_FW_MAX))
> + config = (1ULL << 63) | ecode;

nit: switch?

> +
> + return config;
> +}
> +
> +static int pmu_get_fixed_pmc_index(unsigned long eidx)
> +{
> + u32 etype = pmu_get_perf_event_type(eidx);
> + u32 ecode = get_event_code(eidx);
> + int ctr_idx;
> +
> + if (etype != SBI_PMU_EVENT_TYPE_HW)
> + return -EINVAL;
> +
> + if (ecode == SBI_PMU_HW_CPU_CYCLES)
> + ctr_idx = 0;
> + else if (ecode == SBI_PMU_HW_INSTRUCTIONS)
> + ctr_idx = 2;

nit: Could drop the ctr_idx variable and return directly

> + else
> + return -EINVAL;
> +
> + return ctr_idx;
> +}
> +
> +static int pmu_get_programmable_pmc_index(struct kvm_pmu *kvpmu, unsigned long eidx,
> + unsigned long cbase, unsigned long cmask)
> +{
> + int ctr_idx = -1;
> + int i, pmc_idx;
> + int min, max;
> +
> + if (pmu_is_fw_event(eidx)) {
> + /* Firmware counters are mapped 1:1 starting from num_hw_ctrs for simplicity */
> + min = kvpmu->num_hw_ctrs;
> + max = min + kvpmu->num_fw_ctrs;
> + } else {
> + /* First 3 counters are reserved for fixed counters */
> + min = 3;
> + max = kvpmu->num_hw_ctrs;
> + }
> +
> + for_each_set_bit(i, &cmask, BITS_PER_LONG) {
> + pmc_idx = i + cbase;
> + if ((pmc_idx >= min && pmc_idx < max) &&
> + !test_bit(pmc_idx, kvpmu->pmc_in_use)) {
> + ctr_idx = pmc_idx;
> + break;

nit: Could drop the ctr_idx variable and return directly

> + }
> + }
> +
> + return ctr_idx;
> +}
> +
> +static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
> + unsigned long cbase, unsigned long cmask)
> +{
> + int ret;
> +
> + /* Fixed counters need to be have fixed mapping as they have different width */
> + ret = pmu_get_fixed_pmc_index(eidx);
> + if (ret >= 0)
> + return ret;
> +
> + return pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
> +}
>
> static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> unsigned long *out_val)
> @@ -82,7 +235,41 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + int i, num_ctrs, pmc_index, sbiret = 0;
> + struct kvm_pmc *pmc;
> +
> + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;

We can put kvm_pmu_num_counters(kvpmu) in its one use below and
drop num_ctrs.

> + if (ctr_base + __fls(ctr_mask) >= num_ctrs) {

__fls is undefined when ctr_mask is zero, so we should check it first,
unless the caller of this function is required to check it.

(Same two comments for the functions below that have the same check.)
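
E.g., combining it with the kvm_pmu_num_counters() suggestion above
(sketch):

    if (!ctr_mask || ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu)) {
        sbiret = SBI_ERR_INVALID_PARAM;
        goto out;
    }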

> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + /* Start the counters that have been configured and requested by the guest */
> + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> + pmc_index = i + ctr_base;
> + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> + continue;
> + pmc = &kvpmu->pmc[pmc_index];
> + if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> + pmc->counter_val = ival;
> + if (pmc->perf_event) {
> + if (unlikely(pmc->started)) {
> + sbiret = SBI_ERR_ALREADY_STARTED;
> + continue;
> + }
> + perf_event_period(pmc->perf_event, pmu_get_sample_period(pmc));
> + perf_event_enable(pmc->perf_event);
> + pmc->started = true;
> + } else {
> + kvm_debug("Can not start counter due to invalid confiugartion\n");
^ Cannot ^ configuration

> + sbiret = SBI_ERR_INVALID_PARAM;
> + }
> + }

Possibly a spec oversight is that we continue to try and start counters,
even when we've seen errors. The problem with implementing that is that
if we have both errors we only return the last one. I.e. one counter
was already started and another counter resulted in invalid-param, we
only return invalid-param. We also don't say anything about the number
of failures / successes. I think we should bail on the first error and
even stop counters that we started. Callers can then try again after
correcting their input without potentially getting already-started errors.
We'd need to change the spec to do that though.

> +
> +out:
> + edata->err_val = sbiret;
> +
> return 0;
> }
>
> @@ -90,16 +277,142 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> unsigned long ctr_mask, unsigned long flag,
> struct kvm_vcpu_sbi_ext_data *edata)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + int i, num_ctrs, pmc_index, sbiret = 0;
> + u64 enabled, running;
> + struct kvm_pmc *pmc;
> +
> + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
> + if ((ctr_base + __fls(ctr_mask)) >= num_ctrs) {

nit: Can drop ()

> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + /* Stop the counters that have been configured and requested by the guest */
> + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> + pmc_index = i + ctr_base;
> + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> + continue;
> + pmc = &kvpmu->pmc[pmc_index];
> + if (pmc->perf_event) {
> + if (pmc->started) {
> + /* Stop counting the counter */
> + perf_event_disable(pmc->perf_event);
> + pmc->started = false;
> + } else
> + sbiret = SBI_ERR_ALREADY_STOPPED;
> +
> + if (flag & SBI_PMU_STOP_FLAG_RESET) {
> + /* Relase the counter if this is a reset request */
> + pmc->counter_val += perf_event_read_value(pmc->perf_event,
> + &enabled, &running);
> + pmu_release_perf_event(pmc);
> + clear_bit(pmc_index, kvpmu->pmc_in_use);
> + }
> + } else {
> + kvm_debug("Can not stop counter due to invalid confiugartion\n");
^Cannot ^ configuration

> + sbiret = SBI_ERR_INVALID_PARAM;
> + }
> + }

Same comment about the multiple errors problem.

> +
> +out:
> + edata->err_val = sbiret;
> +
> return 0;
> }
>
> int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> unsigned long ctr_mask, unsigned long flag,
> - unsigned long eidx, uint64_t edata,
> - struct kvm_vcpu_sbi_ext_data *extdata)
> + unsigned long eidx, uint64_t evt_data,
> + struct kvm_vcpu_sbi_ext_data *ext_data)

This should be part of the skeleton patch.

> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + struct perf_event_attr attr;
> + int num_ctrs, ctr_idx;
> + u32 etype = pmu_get_perf_event_type(eidx);
> + u64 config;
> + struct kvm_pmc *pmc;
> + int sbiret = 0;
> +
> +
> + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
> + if (etype == PERF_TYPE_MAX || (ctr_base + __fls(ctr_mask) >= num_ctrs)) {
> + sbiret = SBI_ERR_INVALID_PARAM;
> + goto out;
> + }
> +
> + if (pmu_is_fw_event(eidx)) {
> + sbiret = SBI_ERR_NOT_SUPPORTED;
> + goto out;
> + }
> +
> + /*
> + * SKIP_MATCH flag indicates the caller is aware of the assigned counter
> + * for this event. Just do a sanity check if it already marked used.
> + */
> + if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
> + if (!test_bit(ctr_base, kvpmu->pmc_in_use)) {
> + sbiret = SBI_ERR_FAILURE;
> + goto out;

I see this is the same way OpenSBI implements this, but I don't really
understand it. The spec says

"""
NOTE: When SBI_PMU_CFG_FLAG_SKIP_MATCH is set in config_flags, the SBI
implementation will unconditionally select the first counter from the set
of counters specified by the counter_idx_base and counter_idx_mask.
"""

which doesn't say anything about the caller just wanting to confirm the
counter is in use. Indeed, the spec says this function is for

"""
Find and configure a counter from a set of counters which is not started
(or enabled) and can monitor the specified event.
"""

which would imply errors should returned anytime we match something
already started, rather than the opposite.

Also, I think we should be using ctr_base + first-set-bit(ctr_mask),
because the spec says to look at both the base and the mask, and take
the first counter.

Finally, the spec doesn't have SBI_ERR_FAILURE listed as a possible
error.

> + }
> + ctr_idx = ctr_base;
> + goto match_done;

nit: We don't need the match_done label and goto. We can just use
an if-else.
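
E.g. (sketch):

    if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
        if (!test_bit(ctr_base, kvpmu->pmc_in_use)) {
            sbiret = SBI_ERR_FAILURE;
            goto out;
        }
        ctr_idx = ctr_base;
    } else {
        ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
        if (ctr_idx < 0) {
            sbiret = SBI_ERR_NOT_SUPPORTED;
            goto out;
        }
    }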

> + }
> +
> + ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
> + if (ctr_idx < 0) {
> + sbiret = SBI_ERR_NOT_SUPPORTED;
> + goto out;
> + }
> +
> +match_done:
> + pmc = &kvpmu->pmc[ctr_idx];
> + pmu_release_perf_event(pmc);
> + pmc->idx = ctr_idx;
> +
> + config = pmu_get_perf_event_config(eidx, evt_data);
> + memset(&attr, 0, sizeof(struct perf_event_attr));

At the top of the function we can use

struct perf_event_attr attr = {
.type = etype,
.size = sizeof(struct perf_event_attr),

/* anything else we know at the top */

};

instead of the memset and some of the assignments here.

> + attr.type = etype;
> + attr.size = sizeof(attr);
> + attr.pinned = true;
> +
> + /*
> + * It should never reach here if the platform doesn't support sscofpmf extensio
^ the ^ extension

> + * as mode filtering won't work without it.
> + */
> + attr.exclude_host = true;
> + attr.exclude_hv = true;
> + attr.exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH);
> + attr.exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH);
> + attr.config = config;
> + attr.config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS;
> + if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
> + //TODO: Do we really want to clear the value in hardware counter
> + pmc->counter_val = 0;
> + }
> +
> + /*
> + * Set the default sample_period for now. The guest specified value
> + * will be updated in the start call.
> + */
> + attr.sample_period = pmu_get_sample_period(pmc);
> +
> + event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
> + if (IS_ERR(event)) {
> + pr_err("kvm pmu event creation failed event %pe for eidx %lx\n", event, eidx);
> + return -EOPNOTSUPP;

event is an error, so we shouldn't call it an event in the error message.
How about

pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));

and then PTR_ERR(event) instead of -EOPNOTSUPP.

> + }
> +
> + set_bit(ctr_idx, kvpmu->pmc_in_use);
> + pmc->perf_event = event;
> + if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> + perf_event_enable(pmc->perf_event);
> +
> + ext_data->out_val = ctr_idx;
> +out:
> + ext_data->err_val = sbiret;
> +
> return 0;
> }
>
> @@ -119,6 +432,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> {
> int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_pmc *pmc;
>
> ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> if (ret < 0)
> @@ -134,6 +448,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> else
> num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
>
> + bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);

Could move this bitmap clearing to deinit as it should be clear the
first time already since vcpus are allocated with __GFP_ZERO.
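
E.g., deinit could then look roughly like this (just a sketch reshuffling the
deinit added later in this patch):

void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
	int i;

	if (!kvpmu)
		return;

	for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS)
		pmu_release_perf_event(&kvpmu->pmc[i]);

	/* leave the bitmap clean for the next init */
	bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
}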

> kvpmu->num_hw_ctrs = num_hw_ctrs;
> kvpmu->num_fw_ctrs = num_fw_ctrs;
> /*
> @@ -146,24 +461,26 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> /* TIME CSR shouldn't be read from perf interface */
> if (i == 1)
> continue;
> - kvpmu->pmc[i].idx = i;
> + pmc = &kvpmu->pmc[i];
> + pmc->idx = i;
> + pmc->counter_val = 0;

Also could be moved to deinit.

> if (i < kvpmu->num_hw_ctrs) {
> kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> if (i < 3)
> /* CY, IR counters */
> - kvpmu->pmc[i].cinfo.width = 63;
> + pmc->cinfo.width = 63;
> else
> - kvpmu->pmc[i].cinfo.width = hpm_width;
> + pmc->cinfo.width = hpm_width;
> /*
> * The CSR number doesn't have any relation with the logical
> * hardware counters. The CSR numbers are encoded sequentially
> * to avoid maintaining a map between the virtual counter
> * and CSR number.
> */
> - kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
> + pmc->cinfo.csr = CSR_CYCLE + i;
> } else {
> - kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
> - kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
> + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> + pmc->cinfo.width = BITS_PER_LONG - 1;

Almost all of these changes can be avoided by using the pmc pointer in the
skeleton patch from the start.

> }
> }
>
> @@ -172,13 +489,22 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> -void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> {
> - /* TODO */
> + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> + struct kvm_pmc *pmc;
> + int i;
> +
> + if (!kvpmu)
> + return;
> +
> + for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
> + pmc = &kvpmu->pmc[i];
> + pmu_release_perf_event(pmc);
> + }
> }
>
> -void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> {
> - /* TODO */
> + kvm_riscv_vcpu_pmu_deinit(vcpu);

The skeleton patch could put deinit and reset in the right order to avoid
this change.

> }
> -
> --
> 2.25.1
>

Thanks,
drew

2023-01-23 07:24:07

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] RISC-V: KVM: Implement perf support without sampling

On Fri, Jan 13, 2023 at 12:45:02PM +0100, Andrew Jones wrote:
> On Thu, Dec 15, 2022 at 09:00:45AM -0800, Atish Patra wrote:
...
> > + /* Start the counters that have been configured and requested by the guest */
> > + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> > + pmc_index = i + ctr_base;
> > + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> > + continue;
> > + pmc = &kvpmu->pmc[pmc_index];
> > + if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> > + pmc->counter_val = ival;
> > + if (pmc->perf_event) {
> > + if (unlikely(pmc->started)) {
> > + sbiret = SBI_ERR_ALREADY_STARTED;
> > + continue;
> > + }
> > + perf_event_period(pmc->perf_event, pmu_get_sample_period(pmc));
> > + perf_event_enable(pmc->perf_event);
> > + pmc->started = true;
> > + } else {
> > + kvm_debug("Can not start counter due to invalid confiugartion\n");
> ^ Cannot ^ configuration
>
> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + }
> > + }
>
> Possibly a spec oversight is that we continue to try and start counters,
> even when we've seen errors. The problem with implementing that is that
> if we have both errors we only return the last one. I.e. one counter
> was already started and another counter resulted in invalid-param, we
> only return invalid-param. We also don't say anything about the number
> of failures / successes. I think we should bail on the first error and
> even stop counters that we started. Callers can then try again after
> correcting their input without potentially getting already-started errors.
> We'd need to change the spec to do that though.
>

Thinking about this some more, the spec doesn't prohibit implementations
from bailing on the first error, so we can do that. But maybe we don't
need to stop the counters we started. We can leave it to the driver to
sort out what got configured/started and what didn't when it gets an
error.
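
Concretely, the start loop could bail like this (sketch only, reusing the
names from the patch):

	for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
		pmc_index = i + ctr_base;
		if (!test_bit(pmc_index, kvpmu->pmc_in_use))
			continue;
		pmc = &kvpmu->pmc[pmc_index];
		if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
			pmc->counter_val = ival;
		if (!pmc->perf_event) {
			/* stop on the first error instead of continuing */
			sbiret = SBI_ERR_INVALID_PARAM;
			goto out;
		}
		if (pmc->started) {
			sbiret = SBI_ERR_ALREADY_STARTED;
			goto out;
		}
		perf_event_period(pmc->perf_event, pmu_get_sample_period(pmc));
		perf_event_enable(pmc->perf_event);
		pmc->started = true;
	}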

Thanks,
drew

2023-01-24 20:41:48

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 01/11] RISC-V: Define helper functions expose hpm counter width and count

On Thu, Jan 12, 2023 at 11:22 PM Andrew Jones <[email protected]> wrote:
>
> On Thu, Jan 12, 2023 at 10:18:05AM -0800, Atish Kumar Patra wrote:
> > On Thu, Jan 12, 2023 at 2:06 AM Andrew Jones <[email protected]> wrote:
> > >
> > > On Thu, Dec 15, 2022 at 09:00:36AM -0800, Atish Patra wrote:
> ...
> > > > +EXPORT_SYMBOL(riscv_pmu_get_hpm_info);
> > >
> > > EXPORT_SYMBOL_GPL ?
> > >
> >
> > Is that mandatory ? I have seen usage of both in arch/riscv and other
> > places though.
> > I am also not sure if any other non-GPL module should/need access to this.
>
> TBH, I'm not sure what the best policy is, but I presumed we should use
> _GPL when we aren't aware of anything non-GPL and then when a day comes
> that something non-GPL would like this to be exported, the patch that
> flips it will provide the justification in its commit message.
>

Sgtm. Changed it to EXPORT_SYMBOL_GPL for now.
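
I.e.

EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);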

> Thanks,
> drew



--
Regards,
Atish

2023-01-26 00:50:55

by Atish Patra

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] RISC-V: KVM: Implement perf support without sampling

On Fri, Jan 13, 2023 at 3:45 AM Andrew Jones <[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 09:00:45AM -0800, Atish Patra wrote:
> > RISC-V SBI PMU & Sscofpmf ISA extension allows supporting perf in
> > the virtualization enviornment as well. KVM implementation
> > relies on SBI PMU extension for most the most part while trapping
>
> s/most the most/the most/
>

Done.

> > & emulating the CSRs read for counter access.
> >
> > This patch doesn't have the event sampling support yet.
> >
> > Signed-off-by: Atish Patra <[email protected]>
> > ---
> > arch/riscv/kvm/vcpu_pmu.c | 358 ++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 342 insertions(+), 16 deletions(-)
> >
> > diff --git a/arch/riscv/kvm/vcpu_pmu.c b/arch/riscv/kvm/vcpu_pmu.c
> > index 53c4163..21c1f0f 100644
> > --- a/arch/riscv/kvm/vcpu_pmu.c
> > +++ b/arch/riscv/kvm/vcpu_pmu.c
> > @@ -12,10 +12,163 @@
> > #include <linux/perf/riscv_pmu.h>
> > #include <asm/csr.h>
> > #include <asm/kvm_vcpu_sbi.h>
> > +#include <asm/bitops.h>
> > #include <asm/kvm_vcpu_pmu.h>
> > #include <linux/kvm_host.h>
> >
> > #define kvm_pmu_num_counters(pmu) ((pmu)->num_hw_ctrs + (pmu)->num_fw_ctrs)
> > +#define get_event_type(x) ((x & SBI_PMU_EVENT_IDX_TYPE_MASK) >> 16)
> > +#define get_event_code(x) (x & SBI_PMU_EVENT_IDX_CODE_MASK)
>
> Should put () around x
>

Sure,

> > +
> > +static inline u64 pmu_get_sample_period(struct kvm_pmc *pmc)
>
> I'd rather we use kvm_pmu_ for the prefix instead of just pmu for
> this and all the other functions.
>
> Also I'd drop the inline attribute here and in other functions below to
> let the compiler completely decide what to do.
>

Done.

> > +{
> > + u64 counter_val_mask = GENMASK(pmc->cinfo.width, 0);
> > + u64 sample_period;
> > +
> > + if (!pmc->counter_val)
> > + sample_period = counter_val_mask;
> > + else
> > + sample_period = (-pmc->counter_val) & counter_val_mask;
>
> I probably don't understand this, since I see arm64/kvm doing the same
> thing, but if sample_period is the number of remaining counts a counter
> has, then I'd write it as

That's correct.

>
> sample_period = counter_val_mask - (pmc->counter_val & counter_val_mask);
>
> As it stands, the pmc->counter_val == 0 case would be the same, but
> the other case differs by 1. arm64/kvm doesn't appear to handle the
> zero case.
>

We do have to consider the last increment as well, where the counter actually
overflows. Thus, if we want to rewrite it as per your preference, it should be

sample_period = counter_val_mask - (pmc->counter_val & counter_val_mask) + 1;

For the zero case, I got it wrong. It should be

counter_val_mask + 1

I guess if a user in a guest sets the sampling period to 0 (even if
it is impractical to do that), that's the user's choice. KVM should just
create the perf event accordingly. I cross-checked the x86
implementation; it does the same.
All the counters are 64 bits wide on ARM64. I guess that's why ARM64
didn't have to deal with the special 0 case.

Thanks for catching this case.
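
Putting both cases together, the helper would then be something like (sketch,
with the kvm_pmu_ prefix from the other comment):

static u64 kvm_pmu_get_sample_period(struct kvm_pmc *pmc)
{
	u64 counter_val_mask = GENMASK(pmc->cinfo.width, 0);
	u64 sample_period;

	if (!pmc->counter_val)
		sample_period = counter_val_mask + 1;
	else
		/* same as (-pmc->counter_val) & counter_val_mask for a non-zero masked value */
		sample_period = counter_val_mask -
				(pmc->counter_val & counter_val_mask) + 1;

	return sample_period;
}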

> > +
> > + return sample_period;
> > +}
> > +
> > +static u32 pmu_get_perf_event_type(unsigned long eidx)
> > +{
> > + enum sbi_pmu_event_type etype = get_event_type(eidx);
> > + u32 type;
> > +
> > + if (etype == SBI_PMU_EVENT_TYPE_HW)
> > + type = PERF_TYPE_HARDWARE;
> > + else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
> > + type = PERF_TYPE_HW_CACHE;
> > + else if (etype == SBI_PMU_EVENT_TYPE_RAW || etype == SBI_PMU_EVENT_TYPE_FW)
> > + type = PERF_TYPE_RAW;
> > + else
> > + type = PERF_TYPE_MAX;
>
> nit: This might look nicer as a switch
>
> > +
> > + return type;
> > +}
> > +
> > +static inline bool pmu_is_fw_event(unsigned long eidx)
> > +{
> > +
>
> extra blank line here
>
> > + return get_event_type(eidx) == SBI_PMU_EVENT_TYPE_FW;
> > +}
> > +
> > +static void pmu_release_perf_event(struct kvm_pmc *pmc)
> > +{
> > + if (pmc->perf_event) {
> > + perf_event_disable(pmc->perf_event);
> > + perf_event_release_kernel(pmc->perf_event);
> > + pmc->perf_event = NULL;
> > + }
> > +}
> > +
> > +static u64 pmu_get_perf_event_hw_config(u32 sbi_event_code)
> > +{
> > + /* SBI PMU HW event code is offset by 1 from perf hw event codes */
> > + return (u64)sbi_event_code - 1;
>
> This is probably fine to do since we're mapping specified codes to UAPI
> codes, so it's unlikely something will change this relationship, but if we
> expose the mapping explicitly it'll give code readers a chance to see
> what's what without looking stuff up elsewhere. How about creating an
> array indexed by sbi_event_code with the mapping?
>
> enum perf_hw_id hw_event_perf_map[SBI_PMU_HW_GENERAL_MAX] = {
> [SBI_PMU_HW_CPU_CYCLES] = PERF_COUNT_HW_CPU_CYCLES,
> ...
> };
>

Done.
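
For reference, the first few entries would presumably look like this (partial,
only to show the shape; the remaining codes map to their perf counterparts the
same way):

static const enum perf_hw_id hw_event_perf_map[SBI_PMU_HW_GENERAL_MAX] = {
	[SBI_PMU_HW_CPU_CYCLES] = PERF_COUNT_HW_CPU_CYCLES,
	[SBI_PMU_HW_INSTRUCTIONS] = PERF_COUNT_HW_INSTRUCTIONS,
	[SBI_PMU_HW_CACHE_REFERENCES] = PERF_COUNT_HW_CACHE_REFERENCES,
	[SBI_PMU_HW_CACHE_MISSES] = PERF_COUNT_HW_CACHE_MISSES,
};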

> > +}
> > +
> > +static u64 pmu_get_perf_event_cache_config(u32 sbi_event_code)
> > +{
> > + u64 config = U64_MAX;
> > + unsigned int cache_type, cache_op, cache_result;
> > +
> > + /* All the cache event masks lie within 0xFF. No separate masking is necesssary */
> > + cache_type = (sbi_event_code & SBI_PMU_EVENT_CACHE_ID_CODE_MASK) >> 3;
>
> It'd be nice to have a define for that 3 shift.
>
> > + cache_op = (sbi_event_code & SBI_PMU_EVENT_CACHE_OP_ID_CODE_MASK) >> 1;
>
> Also for this 1 shift.
>

Done.

> > + cache_result = sbi_event_code & SBI_PMU_EVENT_CACHE_RESULT_ID_CODE_MASK;
> > +
> > + if (cache_type >= PERF_COUNT_HW_CACHE_MAX ||
> > + cache_op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> > + cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> > + return config;
> > +
> > + config = cache_type | (cache_op << 8) | (cache_result << 16);
> > +
> > + return config;
> > +}
> > +
> > +static u64 pmu_get_perf_event_config(unsigned long eidx, uint64_t evt_data)
> > +{
> > + enum sbi_pmu_event_type etype = get_event_type(eidx);
> > + u32 ecode = get_event_code(eidx);
> > + u64 config = U64_MAX;
> > +
> > + if (etype == SBI_PMU_EVENT_TYPE_HW)
> > + config = pmu_get_perf_event_hw_config(ecode);
> > + else if (etype == SBI_PMU_EVENT_TYPE_CACHE)
> > + config = pmu_get_perf_event_cache_config(ecode);
> > + else if (etype == SBI_PMU_EVENT_TYPE_RAW)
> > + config = evt_data & RISCV_PMU_RAW_EVENT_MASK;
> > + else if ((etype == SBI_PMU_EVENT_TYPE_FW) && (ecode < SBI_PMU_FW_MAX))
> > + config = (1ULL << 63) | ecode;
>
> nit: switch?
>
> > +
> > + return config;
> > +}
> > +
> > +static int pmu_get_fixed_pmc_index(unsigned long eidx)
> > +{
> > + u32 etype = pmu_get_perf_event_type(eidx);
> > + u32 ecode = get_event_code(eidx);
> > + int ctr_idx;
> > +
> > + if (etype != SBI_PMU_EVENT_TYPE_HW)
> > + return -EINVAL;
> > +
> > + if (ecode == SBI_PMU_HW_CPU_CYCLES)
> > + ctr_idx = 0;
> > + else if (ecode == SBI_PMU_HW_INSTRUCTIONS)
> > + ctr_idx = 2;
>
> nit: Could drop the ctr_idx variable and return directly
>
> > + else
> > + return -EINVAL;
> > +
> > + return ctr_idx;
> > +}
> > +
> > +static int pmu_get_programmable_pmc_index(struct kvm_pmu *kvpmu, unsigned long eidx,
> > + unsigned long cbase, unsigned long cmask)
> > +{
> > + int ctr_idx = -1;
> > + int i, pmc_idx;
> > + int min, max;
> > +
> > + if (pmu_is_fw_event(eidx)) {
> > + /* Firmware counters are mapped 1:1 starting from num_hw_ctrs for simplicity */
> > + min = kvpmu->num_hw_ctrs;
> > + max = min + kvpmu->num_fw_ctrs;
> > + } else {
> > + /* First 3 counters are reserved for fixed counters */
> > + min = 3;
> > + max = kvpmu->num_hw_ctrs;
> > + }
> > +
> > + for_each_set_bit(i, &cmask, BITS_PER_LONG) {
> > + pmc_idx = i + cbase;
> > + if ((pmc_idx >= min && pmc_idx < max) &&
> > + !test_bit(pmc_idx, kvpmu->pmc_in_use)) {
> > + ctr_idx = pmc_idx;
> > + break;
>
> nit: Could drop the ctr_idx variable and return directly
>
> > + }
> > + }
> > +
> > + return ctr_idx;
> > +}
> > +
> > +static int pmu_get_pmc_index(struct kvm_pmu *pmu, unsigned long eidx,
> > + unsigned long cbase, unsigned long cmask)
> > +{
> > + int ret;
> > +
> > + /* Fixed counters need to be have fixed mapping as they have different width */
> > + ret = pmu_get_fixed_pmc_index(eidx);
> > + if (ret >= 0)
> > + return ret;
> > +
> > + return pmu_get_programmable_pmc_index(pmu, eidx, cbase, cmask);
> > +}
> >
> > static int pmu_ctr_read(struct kvm_vcpu *vcpu, unsigned long cidx,
> > unsigned long *out_val)
> > @@ -82,7 +235,41 @@ int kvm_riscv_vcpu_pmu_ctr_start(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > unsigned long ctr_mask, unsigned long flag, uint64_t ival,
> > struct kvm_vcpu_sbi_ext_data *edata)
> > {
> > - /* TODO */
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + int i, num_ctrs, pmc_index, sbiret = 0;
> > + struct kvm_pmc *pmc;
> > +
> > + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
>
> We can put kvm_pmu_num_counters(kvpmu) in its one use below and
> drop num_ctrs.
>
> > + if (ctr_base + __fls(ctr_mask) >= num_ctrs) {
>
> __fls is undefined when ctr_mask is zero, so we should check it first,
> unless the caller of this function is required to check it.
>
> (Same two comments for the functions below that have the same check.)
>

Moved these checks to a common counter validation function that can be called
from start/stop/config.
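
Roughly (the helper name and exact shape here are illustrative, not
necessarily what v3 will use):

static bool kvm_pmu_validate_counter_mask(struct kvm_pmu *kvpmu,
					  unsigned long ctr_base,
					  unsigned long ctr_mask)
{
	/* an empty mask would make __fls() undefined, so reject it up front */
	if (!ctr_mask)
		return false;

	return ctr_base + __fls(ctr_mask) < kvm_pmu_num_counters(kvpmu);
}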

> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + goto out;
> > + }
> > +
> > + /* Start the counters that have been configured and requested by the guest */
> > + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> > + pmc_index = i + ctr_base;
> > + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> > + continue;
> > + pmc = &kvpmu->pmc[pmc_index];
> > + if (flag & SBI_PMU_START_FLAG_SET_INIT_VALUE)
> > + pmc->counter_val = ival;
> > + if (pmc->perf_event) {
> > + if (unlikely(pmc->started)) {
> > + sbiret = SBI_ERR_ALREADY_STARTED;
> > + continue;
> > + }
> > + perf_event_period(pmc->perf_event, pmu_get_sample_period(pmc));
> > + perf_event_enable(pmc->perf_event);
> > + pmc->started = true;
> > + } else {
> > + kvm_debug("Can not start counter due to invalid confiugartion\n");
> ^ Cannot ^ configuration
>
> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + }
> > + }
>
> Possibly a spec oversight is that we continue to try and start counters,
> even when we've seen errors. The problem with implementing that is that
> if we have both errors we only return the last one. I.e. one counter
> was already started and another counter resulted in invalid-param, we
> only return invalid-param. We also don't say anything about the number
> of failures / successes. I think we should bail on the first error and
> even stop counters that we started. Callers can then try again after
> correcting their input without potentially getting already-started errors.
> We'd need to change the spec to do that though.
>

The idea was to provide a best-effort service for batching use cases.
As long as the caller has a valid counter, the SBI implementation should
try to start that counter. For the invalid counters, it notifies the
caller that one or more counters from the requested set are invalid.
It's up to the caller to decide whether it actually cares about the error.

E.g. the caller may just want to stop all the counters without
tracking which counters are actually configured/started.

As per the spec:

SBI_ERR_INVALID_PARAM: the set of counters has at least one invalid counter.

SBI_ERR_ALREADY_STARTED: the set of counters includes at least one
counter which is already started.


> > +
> > +out:
> > + edata->err_val = sbiret;
> > +
> > return 0;
> > }
> >
> > @@ -90,16 +277,142 @@ int kvm_riscv_vcpu_pmu_ctr_stop(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > unsigned long ctr_mask, unsigned long flag,
> > struct kvm_vcpu_sbi_ext_data *edata)
> > {
> > - /* TODO */
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + int i, num_ctrs, pmc_index, sbiret = 0;
> > + u64 enabled, running;
> > + struct kvm_pmc *pmc;
> > +
> > + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
> > + if ((ctr_base + __fls(ctr_mask)) >= num_ctrs) {
>
> nit: Can drop ()
>
> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + goto out;
> > + }
> > +
> > + /* Stop the counters that have been configured and requested by the guest */
> > + for_each_set_bit(i, &ctr_mask, RISCV_MAX_COUNTERS) {
> > + pmc_index = i + ctr_base;
> > + if (!test_bit(pmc_index, kvpmu->pmc_in_use))
> > + continue;
> > + pmc = &kvpmu->pmc[pmc_index];
> > + if (pmc->perf_event) {
> > + if (pmc->started) {
> > + /* Stop counting the counter */
> > + perf_event_disable(pmc->perf_event);
> > + pmc->started = false;
> > + } else
> > + sbiret = SBI_ERR_ALREADY_STOPPED;
> > +
> > + if (flag & SBI_PMU_STOP_FLAG_RESET) {
> > + /* Relase the counter if this is a reset request */
> > + pmc->counter_val += perf_event_read_value(pmc->perf_event,
> > + &enabled, &running);
> > + pmu_release_perf_event(pmc);
> > + clear_bit(pmc_index, kvpmu->pmc_in_use);
> > + }
> > + } else {
> > + kvm_debug("Can not stop counter due to invalid confiugartion\n");
> ^Cannot ^ configuration
>
> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + }
> > + }
>
> Same comment about the multiple errors problem.
>
> > +
> > +out:
> > + edata->err_val = sbiret;
> > +
> > return 0;
> > }
> >
> > int kvm_riscv_vcpu_pmu_ctr_cfg_match(struct kvm_vcpu *vcpu, unsigned long ctr_base,
> > unsigned long ctr_mask, unsigned long flag,
> > - unsigned long eidx, uint64_t edata,
> > - struct kvm_vcpu_sbi_ext_data *extdata)
> > + unsigned long eidx, uint64_t evt_data,
> > + struct kvm_vcpu_sbi_ext_data *ext_data)
>
> This should be part of the skeleton patch.
>
> > {
> > - /* TODO */
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + struct perf_event *event;
> > + struct perf_event_attr attr;
> > + int num_ctrs, ctr_idx;
> > + u32 etype = pmu_get_perf_event_type(eidx);
> > + u64 config;
> > + struct kvm_pmc *pmc;
> > + int sbiret = 0;
> > +
> > +
> > + num_ctrs = kvpmu->num_fw_ctrs + kvpmu->num_hw_ctrs;
> > + if (etype == PERF_TYPE_MAX || (ctr_base + __fls(ctr_mask) >= num_ctrs)) {
> > + sbiret = SBI_ERR_INVALID_PARAM;
> > + goto out;
> > + }
> > +
> > + if (pmu_is_fw_event(eidx)) {
> > + sbiret = SBI_ERR_NOT_SUPPORTED;
> > + goto out;
> > + }
> > +
> > + /*
> > + * SKIP_MATCH flag indicates the caller is aware of the assigned counter
> > + * for this event. Just do a sanity check if it already marked used.
> > + */
> > + if (flag & SBI_PMU_CFG_FLAG_SKIP_MATCH) {
> > + if (!test_bit(ctr_base, kvpmu->pmc_in_use)) {
> > + sbiret = SBI_ERR_FAILURE;
> > + goto out;
>
> I see this is the same way OpenSBI implements this, but I don't really
> understand it. The spec says
>
> """
> NOTE: When SBI_PMU_CFG_FLAG_SKIP_MATCH is set in config_flags, the SBI
> implementation will unconditionally select the first counter from the set
> of counters specified by the counter_idx_base and counter_idx_mask.
> """
>
> which doesn't say anything about the caller just wanting to confirm the
> counter is in use. Indeed, the spec says this function is for
>
> """
> Find and configure a counter from a set of counters which is not started
> (or enabled) and can monitor the specified event.
> """
>
> which would imply errors should be returned anytime we match something
> already started, rather than the opposite.
>

If the caller requested a specific counter and that one is already
configured, the SBI implementation cannot assign another counter.
In that case, it can only return an error.

> Also, I think we should be using ctr_base + first-set-bit(ctr_mask),
> because the spec says to look at both the base and the mask, and take
> the first counter.
>

Ahh yes. Thanks for catching. Fixed.

> Finally, the spec doesn't have SBI_ERR_FAILURE listed as a possible
> error.
>

It should be SBI_ERR_INVALID_PARAM. Fixed.

> > + }
> > + ctr_idx = ctr_base;
> > + goto match_done;
>
> nit: We don't need the match_done label and goto. We can just use
> an if-else.
>
> > + }
> > +
> > + ctr_idx = pmu_get_pmc_index(kvpmu, eidx, ctr_base, ctr_mask);
> > + if (ctr_idx < 0) {
> > + sbiret = SBI_ERR_NOT_SUPPORTED;
> > + goto out;
> > + }
> > +
> > +match_done:
> > + pmc = &kvpmu->pmc[ctr_idx];
> > + pmu_release_perf_event(pmc);
> > + pmc->idx = ctr_idx;
> > +
> > + config = pmu_get_perf_event_config(eidx, evt_data);
> > + memset(&attr, 0, sizeof(struct perf_event_attr));
>
> At the top of the function we can use
>
> struct perf_event_attr attr = {
> .type = etype,
> .size = sizeof(struct perf_event_attr),
>
> /* anything else we know at the top */
>
> };
>
> instead of the memset and some of the assignments here.
>

Done.
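
With that, the top of the function can look roughly like below (sketch; config
and sample_period are still filled in later once they are computed):

	struct perf_event_attr attr = {
		.type = etype,
		.size = sizeof(struct perf_event_attr),
		.pinned = true,
		.exclude_host = true,
		.exclude_hv = true,
		.exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH),
		.exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH),
		.config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS,
	};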

> > + attr.type = etype;
> > + attr.size = sizeof(attr);
> > + attr.pinned = true;
> > +
> > + /*
> > + * It should never reach here if the platform doesn't support sscofpmf extensio
> ^ the ^ extension
>
> > + * as mode filtering won't work without it.
> > + */
> > + attr.exclude_host = true;
> > + attr.exclude_hv = true;
> > + attr.exclude_user = !!(flag & SBI_PMU_CFG_FLAG_SET_UINH);
> > + attr.exclude_kernel = !!(flag & SBI_PMU_CFG_FLAG_SET_SINH);
> > + attr.config = config;
> > + attr.config1 = RISCV_KVM_PMU_CONFIG1_GUEST_EVENTS;
> > + if (flag & SBI_PMU_CFG_FLAG_CLEAR_VALUE) {
> > + //TODO: Do we really want to clear the value in hardware counter
> > + pmc->counter_val = 0;
> > + }
> > +
> > + /*
> > + * Set the default sample_period for now. The guest specified value
> > + * will be updated in the start call.
> > + */
> > + attr.sample_period = pmu_get_sample_period(pmc);
> > +
> > + event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
> > + if (IS_ERR(event)) {
> > + pr_err("kvm pmu event creation failed event %pe for eidx %lx\n", event, eidx);
> > + return -EOPNOTSUPP;
>
> event is an error, so we shouldn't call it an event in the error message.
> How about
>
> pr_err("kvm pmu event creation failed for eidx %lx: %ld\n", eidx, PTR_ERR(event));
>
> and then PTR_ERR(event) instead of -EOPNOTSUPP.
>

Done.

> > + }
> > +
> > + set_bit(ctr_idx, kvpmu->pmc_in_use);
> > + pmc->perf_event = event;
> > + if (flag & SBI_PMU_CFG_FLAG_AUTO_START)
> > + perf_event_enable(pmc->perf_event);
> > +
> > + ext_data->out_val = ctr_idx;
> > +out:
> > + ext_data->err_val = sbiret;
> > +
> > return 0;
> > }
> >
> > @@ -119,6 +432,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > {
> > int i = 0, num_fw_ctrs, ret, num_hw_ctrs = 0, hpm_width = 0;
> > struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + struct kvm_pmc *pmc;
> >
> > ret = riscv_pmu_get_hpm_info(&hpm_width, &num_hw_ctrs);
> > if (ret < 0)
> > @@ -134,6 +448,7 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > else
> > num_fw_ctrs = RISCV_KVM_MAX_FW_CTRS;
> >
> > + bitmap_zero(kvpmu->pmc_in_use, RISCV_MAX_COUNTERS);
>
> Could move this bitmap clearing to deinit as it should be clear the
> first time already since vcpus are allocated with __GFP_ZERO.
>
> > kvpmu->num_hw_ctrs = num_hw_ctrs;
> > kvpmu->num_fw_ctrs = num_fw_ctrs;
> > /*
> > @@ -146,24 +461,26 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > /* TIME CSR shouldn't be read from perf interface */
> > if (i == 1)
> > continue;
> > - kvpmu->pmc[i].idx = i;
> > + pmc = &kvpmu->pmc[i];
> > + pmc->idx = i;
> > + pmc->counter_val = 0;
>
> Also could be moved to deinit.
>
> > if (i < kvpmu->num_hw_ctrs) {
> > kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_HW;
> > if (i < 3)
> > /* CY, IR counters */
> > - kvpmu->pmc[i].cinfo.width = 63;
> > + pmc->cinfo.width = 63;
> > else
> > - kvpmu->pmc[i].cinfo.width = hpm_width;
> > + pmc->cinfo.width = hpm_width;
> > /*
> > * The CSR number doesn't have any relation with the logical
> > * hardware counters. The CSR numbers are encoded sequentially
> > * to avoid maintaining a map between the virtual counter
> > * and CSR number.
> > */
> > - kvpmu->pmc[i].cinfo.csr = CSR_CYCLE + i;
> > + pmc->cinfo.csr = CSR_CYCLE + i;
> > } else {
> > - kvpmu->pmc[i].cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > - kvpmu->pmc[i].cinfo.width = BITS_PER_LONG - 1;
> > + pmc->cinfo.type = SBI_PMU_CTR_TYPE_FW;
> > + pmc->cinfo.width = BITS_PER_LONG - 1;
>
> Almost all of these changes can be avoided by using the pmc pointer in the
> skeleton patch from the start.
>

Fixed.

> > }
> > }
> >
> > @@ -172,13 +489,22 @@ int kvm_riscv_vcpu_pmu_init(struct kvm_vcpu *vcpu)
> > return 0;
> > }
> >
> > -void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > +void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > {
> > - /* TODO */
> > + struct kvm_pmu *kvpmu = vcpu_to_pmu(vcpu);
> > + struct kvm_pmc *pmc;
> > + int i;
> > +
> > + if (!kvpmu)
> > + return;
> > +
> > + for_each_set_bit(i, kvpmu->pmc_in_use, RISCV_MAX_COUNTERS) {
> > + pmc = &kvpmu->pmc[i];
> > + pmu_release_perf_event(pmc);
> > + }
> > }
> >
> > -void kvm_riscv_vcpu_pmu_deinit(struct kvm_vcpu *vcpu)
> > +void kvm_riscv_vcpu_pmu_reset(struct kvm_vcpu *vcpu)
> > {
> > - /* TODO */
> > + kvm_riscv_vcpu_pmu_deinit(vcpu);
>
> The skeleton patch could put deinit and reset in the right order to avoid
> this change.
>

Done. I have addressed all the nit comments as well.
Thanks for the detailed review.

> > }
> > -
> > --
> > 2.25.1
> >
>
> Thanks,
> drew



--
Regards,
Atish