2022-05-24 04:30:22

by Atish Kumar Patra

[permalink] [raw]
Subject: [PATCH v9 00/12] Improve PMU support

The latest version of the SBI specification includes a Performance Monitoring
Unit(PMU) extension[1] which allows the supervisor to start/stop/configure
various PMU events. The Sscofpmf ('Ss' for Privileged arch and Supervisor-level
extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
extension[2] allows the perf like tool to handle overflow interrupts and
filtering support.

This series implements full PMU infrastructure to support
PMU in virt machine. This will allow us to add any PMU events in future.

Currently, this series enables the following omu events.
1. cycle count
2. instruction count
3. DTLB load/store miss
4. ITLB prefetch miss

The first two are computed using host ticks while last three are counted during
cpu_tlb_fill. We can do both sampling and count from guest userspace.
This series has been tested on both RV64 and RV32. Both Linux[3] and Opensbi[4]
patches are required to get the perf working.

Here is an output of perf stat/report while running hackbench with latest
OpenSBI & Linux kernel.

Perf stat:
==========
[root@fedora-riscv ~]# perf stat -e cycles -e instructions -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> perf bench sched messaging -g 1 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1 groups == 40 processes run

Total time: 0.265 [sec]

Performance counter stats for 'perf bench sched messaging -g 1 -l 10':

4,167,825,362 cycles
4,166,609,256 instructions # 1.00 insn per cycle
3,092,026 dTLB-load-misses
258,280 dTLB-store-misses
2,068,966 iTLB-load-misses

0.585791767 seconds time elapsed

0.373802000 seconds user
1.042359000 seconds sys

Perf record:
============
[root@fedora-riscv ~]# perf record -e cycles -e instructions \
> -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> perf bench sched messaging -g 1 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1 groups == 40 processes run

Total time: 1.397 [sec]
[ perf record: Woken up 10 times to write data ]
Check IO/CPU overload!
[ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]

[root@fedora-riscv riscv]# perf report
Available samples
107K cycles ◆
107K instructions ▒
250 dTLB-load-misses ▒
13 dTLB-store-misses ▒
172 iTLB-load-misses
..

Changes from v8->v9:
1. Added the write_done flags to the vmstate.
2. Fixed the hpmcounter read access from M-mode.

Changes from v7->v8:
1. Removeding ordering constraints for mhpmcounter & mhpmevent.

Changes from v6->v7:
1. Fixed all the compilation errors for the usermode.

Changes from v5->v6:
1. Fixed compilation issue with PATCH 1.
2. Addressed other comments.

Changes from v4->v5:
1. Rebased on top of the -next with following patches.
- isa extension
- priv 1.12 spec
2. Addressed all the comments on v4
3. Removed additional isa-ext DT node in favor of riscv,isa string update

Changes from v3->v4:
1. Removed the dummy events from pmu DT node.
2. Fixed pmu_avail_counters mask generation.
3. Added a patch to simplify the predicate function for counters.

Changes from v2->v3:
1. Addressed all the comments on PATCH1-4.
2. Split patch1 into two separate patches.
3. Added explicit comments to explain the event types in DT node.
4. Rebased on latest Qemu.

Changes from v1->v2:
1. Dropped the ACks from v1 as signficant changes happened after v1.
2. sscofpmf support.
3. A generic counter management framework.

[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
[2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
[3] https://github.com/atishp04/qemu/tree/riscv_pmu_v9

Atish Patra (12):
target/riscv: Fix PMU CSR predicate function
target/riscv: Implement PMU CSR predicate function for S-mode
target/riscv: pmu: Rename the counters extension to pmu
target/riscv: pmu: Make number of counters configurable
target/riscv: Implement mcountinhibit CSR
target/riscv: Add support for hpmcounters/hpmevents
target/riscv: Support mcycle/minstret write operation
target/riscv: Add sscofpmf extension support
target/riscv: Simplify counter predicate function
target/riscv: Add few cache related PMU events
hw/riscv: virt: Add PMU DT node to the device tree
target/riscv: Update the privilege field for sscofpmf CSRs

hw/riscv/virt.c | 28 ++
target/riscv/cpu.c | 14 +-
target/riscv/cpu.h | 56 ++-
target/riscv/cpu_bits.h | 59 +++
target/riscv/cpu_helper.c | 25 ++
target/riscv/csr.c | 906 ++++++++++++++++++++++++++++----------
target/riscv/machine.c | 29 ++
target/riscv/meson.build | 3 +-
target/riscv/pmu.c | 432 ++++++++++++++++++
target/riscv/pmu.h | 36 ++
10 files changed, 1353 insertions(+), 235 deletions(-)
create mode 100644 target/riscv/pmu.c
create mode 100644 target/riscv/pmu.h

--
2.25.1



2022-05-24 04:33:03

by Atish Kumar Patra

[permalink] [raw]
Subject: [PATCH v9 05/12] target/riscv: Implement mcountinhibit CSR

From: Atish Patra <[email protected]>

As per the privilege specification v1.11, mcountinhibit allows to start/stop
a pmu counter selectively.

Reviewed-by: Bin Meng <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
Signed-off-by: Atish Patra <[email protected]>
---
target/riscv/cpu.h | 2 ++
target/riscv/cpu_bits.h | 4 ++++
target/riscv/csr.c | 25 +++++++++++++++++++++++++
target/riscv/machine.c | 1 +
4 files changed, 32 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7cbcd8d62fc1..45ac0f2d2614 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -269,6 +269,8 @@ struct CPUArchState {
target_ulong scounteren;
target_ulong mcounteren;

+ target_ulong mcountinhibit;
+
target_ulong sscratch;
target_ulong mscratch;

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 4d04b20d064e..b3f7fa713000 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -367,6 +367,10 @@
#define CSR_MHPMCOUNTER29 0xb1d
#define CSR_MHPMCOUNTER30 0xb1e
#define CSR_MHPMCOUNTER31 0xb1f
+
+/* Machine counter-inhibit register */
+#define CSR_MCOUNTINHIBIT 0x320
+
#define CSR_MHPMEVENT3 0x323
#define CSR_MHPMEVENT4 0x324
#define CSR_MHPMEVENT5 0x325
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 7e14f7685fb9..ea1cde68610c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1475,6 +1475,28 @@ static RISCVException write_mtvec(CPURISCVState *env, int csrno,
return RISCV_EXCP_NONE;
}

+static RISCVException read_mcountinhibit(CPURISCVState *env, int csrno,
+ target_ulong *val)
+{
+ if (env->priv_ver < PRIV_VERSION_1_11_0) {
+ return RISCV_EXCP_ILLEGAL_INST;
+ }
+
+ *val = env->mcountinhibit;
+ return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_mcountinhibit(CPURISCVState *env, int csrno,
+ target_ulong val)
+{
+ if (env->priv_ver < PRIV_VERSION_1_11_0) {
+ return RISCV_EXCP_ILLEGAL_INST;
+ }
+
+ env->mcountinhibit = val;
+ return RISCV_EXCP_NONE;
+}
+
static RISCVException read_mcounteren(CPURISCVState *env, int csrno,
target_ulong *val)
{
@@ -3741,6 +3763,9 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
[CSR_MHPMCOUNTER30] = { "mhpmcounter30", mctr, read_zero },
[CSR_MHPMCOUNTER31] = { "mhpmcounter31", mctr, read_zero },

+ [CSR_MCOUNTINHIBIT] = { "mcountinhibit", any, read_mcountinhibit,
+ write_mcountinhibit },
+
[CSR_MHPMEVENT3] = { "mhpmevent3", any, read_zero },
[CSR_MHPMEVENT4] = { "mhpmevent4", any, read_zero },
[CSR_MHPMEVENT5] = { "mhpmevent5", any, read_zero },
diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 2a437b29a1ce..87cd55bfd3a7 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -330,6 +330,7 @@ const VMStateDescription vmstate_riscv_cpu = {
VMSTATE_UINTTL(env.siselect, RISCVCPU),
VMSTATE_UINTTL(env.scounteren, RISCVCPU),
VMSTATE_UINTTL(env.mcounteren, RISCVCPU),
+ VMSTATE_UINTTL(env.mcountinhibit, RISCVCPU),
VMSTATE_UINTTL(env.sscratch, RISCVCPU),
VMSTATE_UINTTL(env.mscratch, RISCVCPU),
VMSTATE_UINT64(env.mfromhost, RISCVCPU),
--
2.25.1