2023-06-30 08:46:08

by Alexandre Ghiti

Subject: [PATCH v3 00/10] riscv: Allow userspace to directly access perf counters

riscv used to allow direct access to the cycle/time/instret counters,
bypassing the perf framework. This patchset intends to allow the user to
mmap any counter when accessed through perf. But we can't break the
existing behaviour, so we introduce a sysctl perf_user_access like arm64
does, which still provides the legacy mode described above.
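
As a rough illustration of the sysctl modes, the sketch below reads the
value and maps it to a description. The helper names are hypothetical; the
0/1/2 meanings are taken from the documentation patch (08/10) in this
series, and the path assumes the usual /proc/sys/kernel/ location for
kernel.* sysctls. It is not code from the series.

```c
#include <stdio.h>

/*
 * Hypothetical helper: describe a riscv perf_user_access value.
 * The meanings follow the documentation patch in this series.
 */
static const char *perf_user_access_desc(int mode)
{
	switch (mode) {
	case 0:
		return "user access disabled";
	case 1:
		return "counters readable from user space through perf";
	case 2:
		return "legacy: direct access to cycle and instret CSRs";
	default:
		return NULL; /* not a documented value */
	}
}

/*
 * Hypothetical helper: read the current mode from procfs.
 * Returns -1 if the file is missing or cannot be parsed.
 */
static int perf_user_access_read(void)
{
	FILE *f = fopen("/proc/sys/kernel/perf_user_access", "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

A caller would typically pass the result of perf_user_access_read() to
perf_user_access_desc() and treat a NULL return as "unknown mode".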

This version needs OpenSBI v1.3 *and* a fix that recently went upstream
(https://lore.kernel.org/lkml/[email protected]/T/).

**Important**: In this version, the default mode is now user access, not
legacy mode, so some applications will break.

base-commit-tag: v6.4-rc6

Changes in v3:
- patch 1 now contains the ref to the faulty commit (no Fixes tag as it is only a comment), as Andrew suggested
- Removed RISCV_PMU_LEGACY_TIME from patch 3, as Andrew suggested
- Rename RISCV_PMU_PDEV_NAME to "riscv-pmu-sbi", patch4 is just cosmetic now, as Andrew suggested
- Removed a few useless (and wrong) comments, as Andrew suggested
- Simplify arch_perf_update_userpage code, as Andrew suggested
- Documentation now mentions that time CSR is *always* accessible, whatever the mode, as suggested by Andrew
- Removed CYCLEH reference and add TODO for rv32 support, as suggested by Atish
- Do not rename the pmu instance as Atish suggested
- Set pmc_width only if rdpmc is enabled and CONFIG_RISCV_PMU is set and the event is a hw event
- Move arch_perf_update_userpage https://lore.kernel.org/lkml/[email protected]/T/
- **Switch to user mode access by default**

Changes in v2:
- Split into smaller patches, way better!
- Add RB from Conor
- Simplify the way we checked riscv architecture
- Fix race mmap and other thread running on other cpus
- Use hwc when available
- Set all userspace access flags in event_init, too cumbersome to handle sysctl changes
- Fix arch_perf_update_userpage for pmu other than riscv-pmu by renaming pmu driver
- Fixed kernel test robot build error
- Fixed documentation (Andrew and Bagas)
- perf testsuite passes mmap tests in all 3 modes

Alexandre Ghiti (10):
perf: Fix wrong comment about default event_idx
include: riscv: Fix wrong include guard in riscv_pmu.h
riscv: Make legacy counter enum match the HW numbering
drivers: perf: Rename riscv pmu sbi driver
riscv: Prepare for user-space perf event mmap support
drivers: perf: Implement perf event mmap support in the legacy backend
drivers: perf: Implement perf event mmap support in the SBI backend
Documentation: admin-guide: Add riscv sysctl_perf_user_access
tools: lib: perf: Implement riscv mmap support
perf: tests: Adapt mmap-basic.c for riscv

Documentation/admin-guide/sysctl/kernel.rst | 26 ++-
drivers/perf/riscv_pmu.c | 113 +++++++++++
drivers/perf/riscv_pmu_legacy.c | 28 ++-
drivers/perf/riscv_pmu_sbi.c | 196 +++++++++++++++++++-
include/linux/perf/riscv_pmu.h | 12 +-
include/linux/perf_event.h | 3 +-
tools/lib/perf/mmap.c | 65 +++++++
tools/perf/tests/mmap-basic.c | 4 +-
8 files changed, 427 insertions(+), 20 deletions(-)

--
2.39.2



2023-06-30 09:00:57

by Alexandre Ghiti

Subject: [PATCH v3 08/10] Documentation: admin-guide: Add riscv sysctl_perf_user_access

riscv now uses this sysctl so document its usage for this architecture.

Signed-off-by: Alexandre Ghiti <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 26 +++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d85d90f5d000..c376692b372b 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -941,16 +941,34 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
The default value is 8.


-perf_user_access (arm64 only)
-=================================
+perf_user_access (arm64 and riscv only)
+=======================================
+
+Controls user space access for reading perf event counters.

-Controls user space access for reading perf event counters. When set to 1,
-user space can read performance monitor counter registers directly.
+arm64
+=====

The default value is 0 (access disabled).
+When set to 1, user space can read performance monitor counter registers
+directly.

See Documentation/arm64/perf.rst for more information.

+riscv
+=====
+
+When set to 0, user access is disabled.
+
+When set to 1, user space can read performance monitor counter registers
+directly only through perf, any direct access without perf intervention will
+trigger an illegal instruction.
+
+The default value is 2, which enables legacy mode (user space has direct
+access to cycle and insret CSRs only). Note that this legacy value
+is deprecated and will be removed once all userspace applications are fixed.
+
+Note that the time CSR is for now always accessible to all modes.

pid_max
=======
--
2.39.2


2023-06-30 09:02:22

by Alexandre Ghiti

Subject: [PATCH v3 04/10] drivers: perf: Rename riscv pmu sbi driver

This is purely cosmetic; there are no functional changes.

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
---
drivers/perf/riscv_pmu_sbi.c | 4 ++--
include/linux/perf/riscv_pmu.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 4f3ac296b3e2..83c3f1c4d2f1 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -914,7 +914,7 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
static struct platform_driver pmu_sbi_driver = {
.probe = pmu_sbi_device_probe,
.driver = {
- .name = RISCV_PMU_PDEV_NAME,
+ .name = RISCV_PMU_SBI_PDEV_NAME,
},
};

@@ -941,7 +941,7 @@ static int __init pmu_sbi_devinit(void)
if (ret)
return ret;

- pdev = platform_device_register_simple(RISCV_PMU_PDEV_NAME, -1, NULL, 0);
+ pdev = platform_device_register_simple(RISCV_PMU_SBI_PDEV_NAME, -1, NULL, 0);
if (IS_ERR(pdev)) {
platform_driver_unregister(&pmu_sbi_driver);
return PTR_ERR(pdev);
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 9f70d94942e0..5deeea0be7cb 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -21,7 +21,7 @@

#define RISCV_MAX_COUNTERS 64
#define RISCV_OP_UNSUPP (-EOPNOTSUPP)
-#define RISCV_PMU_PDEV_NAME "riscv-pmu"
+#define RISCV_PMU_SBI_PDEV_NAME "riscv-pmu-sbi"
#define RISCV_PMU_LEGACY_PDEV_NAME "riscv-pmu-legacy"

#define RISCV_PMU_STOP_FLAG_RESET 1
--
2.39.2


2023-06-30 09:03:13

by Alexandre Ghiti

Subject: [PATCH v3 09/10] tools: lib: perf: Implement riscv mmap support

riscv now supports mmapping hardware counters, so add what's needed to
take advantage of that in libperf.
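
For context, a minimal sketch of how a user-space reader is expected to
combine the mmapped page with a direct counter read. The struct below is a
hypothetical stand-in for the few perf_event_mmap_page fields involved, and
fake_counter() replaces the CSR read so the sketch runs anywhere; the real
libperf code differs in detail.

```c
#include <stdint.h>

/* Hypothetical stand-in for the perf_event_mmap_page fields used here. */
struct userpage {
	volatile uint32_t lock;	/* sequence count, changes while the kernel updates */
	uint32_t index;		/* hardware counter index + 1, 0 if unavailable */
	int64_t offset;		/* value to add to the raw counter */
	uint16_t pmc_width;	/* number of valid bits in the raw counter */
	int cap_user_rdpmc;	/* non-zero if direct reads are allowed */
};

typedef uint64_t (*read_counter_fn)(unsigned int counter);

/* Stand-in for read_perf_counter(); real code would read a CSR. */
static uint64_t fake_counter(unsigned int counter)
{
	return 1000 + counter;
}

/* Returns 0 and stores the count in *out, or -1 if direct reads are off. */
static int read_user_counter(const struct userpage *pg, read_counter_fn rd,
			     uint64_t *out)
{
	uint32_t seq, idx;
	int64_t count, raw;
	unsigned int shift;

	do {
		seq = pg->lock;
		__sync_synchronize();

		idx = pg->index;
		if (!pg->cap_user_rdpmc || idx == 0 ||
		    pg->pmc_width == 0 || pg->pmc_width > 64)
			return -1;

		/* The page stores index + 1, so subtract 1 for the counter. */
		shift = 64 - pg->pmc_width;
		/* Sign-extend the raw value from pmc_width bits. */
		raw = (int64_t)(rd(idx - 1) << shift) >> shift;
		count = pg->offset + raw;

		__sync_synchronize();
	} while (pg->lock != seq);	/* retry if the kernel updated the page */

	*out = (uint64_t)count;
	return 0;
}
```

The retry loop is what makes the read safe against a concurrent kernel
update of the page; an index of 0 means the caller has to fall back to
read(2) on the event file descriptor.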

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
---
tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 0d1634cedf44..378a163f0554 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)

static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }

+#elif __riscv_xlen == 64
+
+/* TODO: implement rv32 support */
+
+#define CSR_CYCLE 0xc00
+#define CSR_TIME 0xc01
+
+#define csr_read(csr) \
+({ \
+ register unsigned long __v; \
+ __asm__ __volatile__ ("csrr %0, " #csr \
+ : "=r" (__v) : \
+ : "memory"); \
+ __v; \
+})
+
+static unsigned long csr_read_num(int csr_num)
+{
+#define switchcase_csr_read(__csr_num, __val) {\
+ case __csr_num: \
+ __val = csr_read(__csr_num); \
+ break; }
+#define switchcase_csr_read_2(__csr_num, __val) {\
+ switchcase_csr_read(__csr_num + 0, __val) \
+ switchcase_csr_read(__csr_num + 1, __val)}
+#define switchcase_csr_read_4(__csr_num, __val) {\
+ switchcase_csr_read_2(__csr_num + 0, __val) \
+ switchcase_csr_read_2(__csr_num + 2, __val)}
+#define switchcase_csr_read_8(__csr_num, __val) {\
+ switchcase_csr_read_4(__csr_num + 0, __val) \
+ switchcase_csr_read_4(__csr_num + 4, __val)}
+#define switchcase_csr_read_16(__csr_num, __val) {\
+ switchcase_csr_read_8(__csr_num + 0, __val) \
+ switchcase_csr_read_8(__csr_num + 8, __val)}
+#define switchcase_csr_read_32(__csr_num, __val) {\
+ switchcase_csr_read_16(__csr_num + 0, __val) \
+ switchcase_csr_read_16(__csr_num + 16, __val)}
+
+ unsigned long ret = 0;
+
+ switch (csr_num) {
+ switchcase_csr_read_32(CSR_CYCLE, ret)
+ default:
+ break;
+ }
+
+ return ret;
+#undef switchcase_csr_read_32
+#undef switchcase_csr_read_16
+#undef switchcase_csr_read_8
+#undef switchcase_csr_read_4
+#undef switchcase_csr_read_2
+#undef switchcase_csr_read
+}
+
+static u64 read_perf_counter(unsigned int counter)
+{
+ return csr_read_num(CSR_CYCLE + counter);
+}
+
+static u64 read_timestamp(void)
+{
+ return csr_read_num(CSR_TIME);
+}
+
#else
static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
static u64 read_timestamp(void) { return 0; }
--
2.39.2


2023-06-30 09:03:54

by Alexandre Ghiti

Subject: [PATCH v3 10/10] perf: tests: Adapt mmap-basic.c for riscv

riscv now supports mmapping hardware counters to userspace, so adapt the test
to run on this architecture.

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
---
tools/perf/tests/mmap-basic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index e68ca6229756..f5075ca774f8 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -284,7 +284,7 @@ static struct test_case tests__basic_mmap[] = {
"permissions"),
TEST_CASE_REASON("User space counter reading of instructions",
mmap_user_read_instr,
-#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
"permissions"
#else
"unsupported"
@@ -292,7 +292,7 @@ static struct test_case tests__basic_mmap[] = {
),
TEST_CASE_REASON("User space counter reading of cycles",
mmap_user_read_cycles,
-#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
"permissions"
#else
"unsupported"
--
2.39.2


2023-06-30 09:10:45

by Alexandre Ghiti

Subject: [PATCH v3 05/10] riscv: Prepare for user-space perf event mmap support

Provide all the necessary bits in the generic riscv pmu driver to be
able to mmap perf events in userspace: the heavy lifting lies in the
driver backend, namely the legacy and sbi implementations.

Note that arch_perf_update_userpage is almost a copy of arm64 code.
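
To show what the time fields published by arch_perf_update_userpage are
for, here is a sketch of the user-space side of the conversion, following
the algorithm described in the perf_event_mmap_page comments in
perf_event.h. The struct and function names are hypothetical and only
mirror the relevant fields.

```c
#include <stdint.h>

/* Hypothetical mirror of the time-related perf_event_mmap_page fields. */
struct user_time {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
	uint64_t time_cycles;
	uint64_t time_mask;
	int cap_user_time_short;
};

/* Convert a raw cycle count from the time source to nanoseconds. */
static uint64_t cycles_to_ns(const struct user_time *t, uint64_t cyc)
{
	uint64_t quot, rem;

	/* With a short counter, rebuild the full value around time_cycles. */
	if (t->cap_user_time_short)
		cyc = t->time_cycles + ((cyc - t->time_cycles) & t->time_mask);

	/* Split the multiply to avoid overflowing 64 bits. */
	quot = cyc >> t->time_shift;
	rem = cyc & (((uint64_t)1 << t->time_shift) - 1);

	return t->time_zero + quot * t->time_mult +
	       ((rem * t->time_mult) >> t->time_shift);
}
```

The split into quot and rem assumes time_shift is below 64, which is
consistent with the kernel code capping it at 31.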

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
---
drivers/perf/riscv_pmu.c | 106 +++++++++++++++++++++++++++++++++
include/linux/perf/riscv_pmu.h | 4 ++
2 files changed, 110 insertions(+)

diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index ebca5eab9c9b..e1b0992f34df 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -14,9 +14,74 @@
#include <linux/perf/riscv_pmu.h>
#include <linux/printk.h>
#include <linux/smp.h>
+#include <linux/sched_clock.h>

#include <asm/sbi.h>

+static bool riscv_perf_user_access(struct perf_event *event)
+{
+ return ((event->attr.type == PERF_TYPE_HARDWARE) ||
+ (event->attr.type == PERF_TYPE_HW_CACHE) ||
+ (event->attr.type == PERF_TYPE_RAW)) &&
+ !!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
+}
+
+void arch_perf_update_userpage(struct perf_event *event,
+ struct perf_event_mmap_page *userpg, u64 now)
+{
+ struct clock_read_data *rd;
+ unsigned int seq;
+ u64 ns;
+
+ userpg->cap_user_time = 0;
+ userpg->cap_user_time_zero = 0;
+ userpg->cap_user_time_short = 0;
+ userpg->cap_user_rdpmc = riscv_perf_user_access(event);
+
+ if (userpg->cap_user_rdpmc)
+ userpg->pmc_width = 64;
+
+ do {
+ rd = sched_clock_read_begin(&seq);
+
+ userpg->time_mult = rd->mult;
+ userpg->time_shift = rd->shift;
+ userpg->time_zero = rd->epoch_ns;
+ userpg->time_cycles = rd->epoch_cyc;
+ userpg->time_mask = rd->sched_clock_mask;
+
+ /*
+ * Subtract the cycle base, such that software that
+ * doesn't know about cap_user_time_short still 'works'
+ * assuming no wraps.
+ */
+ ns = mul_u64_u32_shr(rd->epoch_cyc, rd->mult, rd->shift);
+ userpg->time_zero -= ns;
+
+ } while (sched_clock_read_retry(seq));
+
+ userpg->time_offset = userpg->time_zero - now;
+
+ /*
+ * time_shift is not expected to be greater than 31 due to
+ * the original published conversion algorithm shifting a
+ * 32-bit value (now specifies a 64-bit value) - refer
+ * perf_event_mmap_page documentation in perf_event.h.
+ */
+ if (userpg->time_shift == 32) {
+ userpg->time_shift = 31;
+ userpg->time_mult >>= 1;
+ }
+
+ /*
+ * Internal timekeeping for enabled/running/stopped times
+ * is always computed with the sched_clock.
+ */
+ userpg->cap_user_time = 1;
+ userpg->cap_user_time_zero = 1;
+ userpg->cap_user_time_short = 1;
+}
+
static unsigned long csr_read_num(int csr_num)
{
#define switchcase_csr_read(__csr_num, __val) {\
@@ -171,6 +236,8 @@ int riscv_pmu_event_set_period(struct perf_event *event)

local64_set(&hwc->prev_count, (u64)-left);

+ perf_event_update_userpage(event);
+
return overflow;
}

@@ -267,6 +334,9 @@ static int riscv_pmu_event_init(struct perf_event *event)
hwc->idx = -1;
hwc->event_base = mapped_event;

+ if (rvpmu->event_init)
+ rvpmu->event_init(event);
+
if (!is_sampling_event(event)) {
/*
* For non-sampling runs, limit the sample_period to half
@@ -283,6 +353,39 @@ static int riscv_pmu_event_init(struct perf_event *event)
return 0;
}

+static int riscv_pmu_event_idx(struct perf_event *event)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
+ return 0;
+
+ if (rvpmu->csr_index)
+ return rvpmu->csr_index(event) + 1;
+
+ return 0;
+}
+
+static void riscv_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (rvpmu->event_mapped) {
+ rvpmu->event_mapped(event, mm);
+ perf_event_update_userpage(event);
+ }
+}
+
+static void riscv_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (rvpmu->event_unmapped) {
+ rvpmu->event_unmapped(event, mm);
+ perf_event_update_userpage(event);
+ }
+}
+
struct riscv_pmu *riscv_pmu_alloc(void)
{
struct riscv_pmu *pmu;
@@ -307,6 +410,9 @@ struct riscv_pmu *riscv_pmu_alloc(void)
}
pmu->pmu = (struct pmu) {
.event_init = riscv_pmu_event_init,
+ .event_mapped = riscv_pmu_event_mapped,
+ .event_unmapped = riscv_pmu_event_unmapped,
+ .event_idx = riscv_pmu_event_idx,
.add = riscv_pmu_add,
.del = riscv_pmu_del,
.start = riscv_pmu_start,
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 5deeea0be7cb..43282e22ebe1 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -55,6 +55,10 @@ struct riscv_pmu {
void (*ctr_start)(struct perf_event *event, u64 init_val);
void (*ctr_stop)(struct perf_event *event, unsigned long flag);
int (*event_map)(struct perf_event *event, u64 *config);
+ void (*event_init)(struct perf_event *event);
+ void (*event_mapped)(struct perf_event *event, struct mm_struct *mm);
+ void (*event_unmapped)(struct perf_event *event, struct mm_struct *mm);
+ uint8_t (*csr_index)(struct perf_event *event);

struct cpu_hw_events __percpu *hw_events;
struct hlist_node node;
--
2.39.2


2023-06-30 12:07:55

by Andrew Jones

Subject: Re: [PATCH v3 08/10] Documentation: admin-guide: Add riscv sysctl_perf_user_access

On Fri, Jun 30, 2023 at 10:30:11AM +0200, Alexandre Ghiti wrote:
> riscv now uses this sysctl so document its usage for this architecture.
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 26 +++++++++++++++++----
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d85d90f5d000..c376692b372b 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -941,16 +941,34 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
> The default value is 8.
>
>
> -perf_user_access (arm64 only)
> -=================================
> +perf_user_access (arm64 and riscv only)
> +=======================================
> +
> +Controls user space access for reading perf event counters.
>
> -Controls user space access for reading perf event counters. When set to 1,
> -user space can read performance monitor counter registers directly.
> +arm64
> +=====
>
> The default value is 0 (access disabled).

Should add a blank line here.

> +When set to 1, user space can read performance monitor counter registers
> +directly.
>
> See Documentation/arm64/perf.rst for more information.
>
> +riscv
> +=====
> +
> +When set to 0, user access is disabled.
> +
> +When set to 1, user space can read performance monitor counter registers
> +directly only through perf, any direct access without perf intervention will

Remove 'directly only'

(It can't be both "direct" and "through" at the same time.)

> +trigger an illegal instruction.
> +
> +The default value is 2,

This is no longer true.

> which enables legacy mode (user space has direct
> +access to cycle and insret CSRs only). Note that this legacy value
> +is deprecated and will be removed once all userspace applications are fixed.
> +
> +Note that the time CSR is for now always accessible to all modes.

s/always accessible/always directly accessible/

Also, remove 'for now'. While we may change this in the future, I'm not
sure if the 'for now' helps much. Maybe a "This may change in the future."
type of sentence? Or, just nothing (for now :-) and we'll modify this
document if it changes later.

Thanks,
drew

>
> pid_max
> =======
> --
> 2.39.2
>

2023-07-03 09:50:23

by Alexandre Ghiti

Subject: Re: [PATCH v3 08/10] Documentation: admin-guide: Add riscv sysctl_perf_user_access

On Fri, Jun 30, 2023 at 1:16 PM Andrew Jones <[email protected]> wrote:
>
> On Fri, Jun 30, 2023 at 10:30:11AM +0200, Alexandre Ghiti wrote:
> > riscv now uses this sysctl so document its usage for this architecture.
> >
> > Signed-off-by: Alexandre Ghiti <[email protected]>
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 26 +++++++++++++++++----
> > 1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index d85d90f5d000..c376692b372b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -941,16 +941,34 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
> > The default value is 8.
> >
> >
> > -perf_user_access (arm64 only)
> > -=================================
> > +perf_user_access (arm64 and riscv only)
> > +=======================================
> > +
> > +Controls user space access for reading perf event counters.
> >
> > -Controls user space access for reading perf event counters. When set to 1,
> > -user space can read performance monitor counter registers directly.
> > +arm64
> > +=====
> >
> > The default value is 0 (access disabled).
>
> Should add a blank line here.

Done, thanks

>
> > +When set to 1, user space can read performance monitor counter registers
> > +directly.
> >
> > See Documentation/arm64/perf.rst for more information.
> >
> > +riscv
> > +=====
> > +
> > +When set to 0, user access is disabled.
> > +
> > +When set to 1, user space can read performance monitor counter registers
> > +directly only through perf, any direct access without perf intervention will
>
> Remove 'directly only'
>
> (It can't be both "direct" and "through" at the same time.)
>
> > +trigger an illegal instruction.
> > +
> > +The default value is 2,
>
> This is no longer true.

Damn, sorry about that.

>
> > which enables legacy mode (user space has direct
> > +access to cycle and insret CSRs only). Note that this legacy value
> > +is deprecated and will be removed once all userspace applications are fixed.
> > +
> > +Note that the time CSR is for now always accessible to all modes.
>
> s/always accessible/always directly accessible/
>
> Also, remove 'for now'. While we may change this in the future, I'm not
> sure if the 'for now' helps much. Maybe a "This may change in the future."
> type of sentence? Or, just nothing (for now :-) and we'll modify this
> document if it changes later.

I won't say anything about the future, thanks!

I also harmonized the "user space" and "userspace" in this document
with what arm64 does.

Thanks

>
> Thanks,
> drew
>
> >
> > pid_max
> > =======
> > --
> > 2.39.2
> >