2023-07-27 14:47:48

by Alexandre Ghiti

Subject: [PATCH v4 00/10] riscv: Allow userspace to directly access perf counters

riscv used to allow direct access to the cycle/time/instret counters,
bypassing the perf framework; this patchset intends to allow the user to
mmap any counter when accessed through perf.

**Important**: The default mode is now user access through perf only, not
the legacy mode, so some applications will break. However, we introduce a
sysctl, perf_user_access, like arm64 does, which allows switching back to
the legacy mode described above.
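
For example (values as defined in patch 7: 0 = no user access, 1 = user
access through perf, 2 = legacy; the sysctl is registered under "kernel"),
the legacy behaviour should be restorable at runtime with something like:

  sysctl -w kernel.perf_user_access=2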

This version needs OpenSBI v1.3 *and* a kernel fix that recently went upstream
(https://lore.kernel.org/lkml/[email protected]/T/).

base-commit-tag: v6.5-rc1

Changes in v5:
- Fix typo from Atish
- Add RB from Atish and Andrew
- Improve the cover letter and patch 7 commit log to explain why we chose to
break userspace for security reasons, thanks to Atish and Rémi
- Rebase on top of v6.5-rc1

Changes in v4:
- Fixed some nits in riscv_pmu_sbi.c thanks to Andrew
- Fixed the documentation thanks to Andrew
- Added RB from Andrew \o/

Changes in v3:
- patch 1 now contains the ref to the faulty commit (no Fixes tag as it is only a comment), as Andrew suggested
- Removed RISCV_PMU_LEGACY_TIME from patch 3, as Andrew suggested
- Rename RISCV_PMU_PDEV_NAME to "riscv-pmu-sbi", patch 4 is just cosmetic now, as Andrew suggested
- Removed a few useless (and wrong) comments, as Andrew suggested
- Simplify arch_perf_update_userpage code, as Andrew suggested
- Documentation now mentions that time CSR is *always* accessible, whatever the mode, as suggested by Andrew
- Removed CYCLEH reference and add TODO for rv32 support, as suggested by Atish
- Do not rename the pmu instance as Atish suggested
- Set pmc_width only if rdpmc is enabled and CONFIG_RISCV_PMU is set and the event is a hw event
- Move arch_perf_update_userpage https://lore.kernel.org/lkml/[email protected]/T/
- **Switch to user mode access by default**

Changes in v2:
- Split into smaller patches, way better!
- Add RB from Conor
- Simplify the way we check the riscv architecture
- Fix race between mmap and other threads running on other cpus
- Use hwc when available
- Set all userspace access flags in event_init, as it is too cumbersome to handle sysctl changes
- Fix arch_perf_update_userpage for pmus other than riscv-pmu by renaming the pmu driver
- Fixed kernel test robot build error
- Fixed documentation (Andrew and Bagas)
- perf testsuite passes mmap tests in all 3 modes

Alexandre Ghiti (10):
perf: Fix wrong comment about default event_idx
include: riscv: Fix wrong include guard in riscv_pmu.h
riscv: Make legacy counter enum match the HW numbering
drivers: perf: Rename riscv pmu sbi driver
riscv: Prepare for user-space perf event mmap support
drivers: perf: Implement perf event mmap support in the legacy backend
drivers: perf: Implement perf event mmap support in the SBI backend
Documentation: admin-guide: Add riscv sysctl_perf_user_access
tools: lib: perf: Implement riscv mmap support
perf: tests: Adapt mmap-basic.c for riscv

Documentation/admin-guide/sysctl/kernel.rst | 27 ++-
drivers/perf/riscv_pmu.c | 113 +++++++++++
drivers/perf/riscv_pmu_legacy.c | 28 ++-
drivers/perf/riscv_pmu_sbi.c | 196 +++++++++++++++++++-
include/linux/perf/riscv_pmu.h | 12 +-
include/linux/perf_event.h | 3 +-
tools/lib/perf/mmap.c | 65 +++++++
tools/perf/tests/mmap-basic.c | 4 +-
8 files changed, 428 insertions(+), 20 deletions(-)

--
2.39.2



2023-07-27 14:48:48

by Alexandre Ghiti

Subject: [PATCH v4 02/10] include: riscv: Fix wrong include guard in riscv_pmu.h

The current include guard prevents the inclusion of asm/perf_event.h
which uses the same include guard: fix the one in riscv_pmu.h so that it
matches the file name.
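
A minimal illustration of the clash (my sketch, not taken from the patch):
with both headers sharing the guard, whichever one is included first
silently hides the other:

  #include <linux/perf/riscv_pmu.h>  /* used to define _ASM_RISCV_PERF_EVENT_H */
  #include <asm/perf_event.h>        /* same guard already set, body skipped */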

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
include/linux/perf/riscv_pmu.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 43fc892aa7d9..9f70d94942e0 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -6,8 +6,8 @@
*
*/

-#ifndef _ASM_RISCV_PERF_EVENT_H
-#define _ASM_RISCV_PERF_EVENT_H
+#ifndef _RISCV_PMU_H
+#define _RISCV_PMU_H

#include <linux/perf_event.h>
#include <linux/ptrace.h>
@@ -81,4 +81,4 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr);

#endif /* CONFIG_RISCV_PMU */

-#endif /* _ASM_RISCV_PERF_EVENT_H */
+#endif /* _RISCV_PMU_H */
--
2.39.2


2023-07-27 14:50:31

by Alexandre Ghiti

Subject: [PATCH v4 10/10] perf: tests: Adapt mmap-basic.c for riscv

riscv now supports mmapping hardware counters to userspace, so adapt the test
to run on this architecture.

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
tools/perf/tests/mmap-basic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index e68ca6229756..f5075ca774f8 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -284,7 +284,7 @@ static struct test_case tests__basic_mmap[] = {
"permissions"),
TEST_CASE_REASON("User space counter reading of instructions",
mmap_user_read_instr,
-#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
"permissions"
#else
"unsupported"
@@ -292,7 +292,7 @@ static struct test_case tests__basic_mmap[] = {
),
TEST_CASE_REASON("User space counter reading of cycles",
mmap_user_read_cycles,
-#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
"permissions"
#else
"unsupported"
--
2.39.2


2023-07-27 15:26:01

by Alexandre Ghiti

Subject: [PATCH v4 05/10] riscv: Prepare for user-space perf event mmap support

Provide all the necessary bits in the generic riscv pmu driver to be
able to mmap perf events in userspace: the heavy lifting lies in the
driver backends, namely the legacy and SBI implementations.

Note that arch_perf_update_userpage is almost a copy of the arm64 code.
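
For context, a hedged sketch of the userspace side this enables (not part of
this patch, error handling omitted): mmap the event fd and check
cap_user_rdpmc, which arch_perf_update_userpage() fills in, before attempting
direct counter reads:

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <unistd.h>

/* perf_fd is a file descriptor returned by perf_event_open(). Returns 1 if
 * the kernel advertises direct user-space counter reads for this event. */
static int counter_is_user_readable(int perf_fd)
{
	struct perf_event_mmap_page *pg;

	pg = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, perf_fd, 0);
	if (pg == MAP_FAILED)
		return 0;

	return pg->cap_user_rdpmc;
}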

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
drivers/perf/riscv_pmu.c | 105 +++++++++++++++++++++++++++++++++
include/linux/perf/riscv_pmu.h | 4 ++
2 files changed, 109 insertions(+)

diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index ebca5eab9c9b..432ad2e80ce3 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -14,9 +14,73 @@
#include <linux/perf/riscv_pmu.h>
#include <linux/printk.h>
#include <linux/smp.h>
+#include <linux/sched_clock.h>

#include <asm/sbi.h>

+static bool riscv_perf_user_access(struct perf_event *event)
+{
+ return ((event->attr.type == PERF_TYPE_HARDWARE) ||
+ (event->attr.type == PERF_TYPE_HW_CACHE) ||
+ (event->attr.type == PERF_TYPE_RAW)) &&
+ !!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
+}
+
+void arch_perf_update_userpage(struct perf_event *event,
+ struct perf_event_mmap_page *userpg, u64 now)
+{
+ struct clock_read_data *rd;
+ unsigned int seq;
+ u64 ns;
+
+ userpg->cap_user_time = 0;
+ userpg->cap_user_time_zero = 0;
+ userpg->cap_user_time_short = 0;
+ userpg->cap_user_rdpmc = riscv_perf_user_access(event);
+
+ userpg->pmc_width = 64;
+
+ do {
+ rd = sched_clock_read_begin(&seq);
+
+ userpg->time_mult = rd->mult;
+ userpg->time_shift = rd->shift;
+ userpg->time_zero = rd->epoch_ns;
+ userpg->time_cycles = rd->epoch_cyc;
+ userpg->time_mask = rd->sched_clock_mask;
+
+ /*
+ * Subtract the cycle base, such that software that
+ * doesn't know about cap_user_time_short still 'works'
+ * assuming no wraps.
+ */
+ ns = mul_u64_u32_shr(rd->epoch_cyc, rd->mult, rd->shift);
+ userpg->time_zero -= ns;
+
+ } while (sched_clock_read_retry(seq));
+
+ userpg->time_offset = userpg->time_zero - now;
+
+ /*
+ * time_shift is not expected to be greater than 31 due to
+ * the original published conversion algorithm shifting a
+ * 32-bit value (now specifies a 64-bit value) - refer
+ * perf_event_mmap_page documentation in perf_event.h.
+ */
+ if (userpg->time_shift == 32) {
+ userpg->time_shift = 31;
+ userpg->time_mult >>= 1;
+ }
+
+ /*
+ * Internal timekeeping for enabled/running/stopped times
+ * is always computed with the sched_clock.
+ */
+ userpg->cap_user_time = 1;
+ userpg->cap_user_time_zero = 1;
+ userpg->cap_user_time_short = 1;
+}
+
static unsigned long csr_read_num(int csr_num)
{
#define switchcase_csr_read(__csr_num, __val) {\
@@ -171,6 +235,8 @@ int riscv_pmu_event_set_period(struct perf_event *event)

local64_set(&hwc->prev_count, (u64)-left);

+ perf_event_update_userpage(event);
+
return overflow;
}

@@ -267,6 +333,9 @@ static int riscv_pmu_event_init(struct perf_event *event)
hwc->idx = -1;
hwc->event_base = mapped_event;

+ if (rvpmu->event_init)
+ rvpmu->event_init(event);
+
if (!is_sampling_event(event)) {
/*
* For non-sampling runs, limit the sample_period to half
@@ -283,6 +352,39 @@ static int riscv_pmu_event_init(struct perf_event *event)
return 0;
}

+static int riscv_pmu_event_idx(struct perf_event *event)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
+ return 0;
+
+ if (rvpmu->csr_index)
+ return rvpmu->csr_index(event) + 1;
+
+ return 0;
+}
+
+static void riscv_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (rvpmu->event_mapped) {
+ rvpmu->event_mapped(event, mm);
+ perf_event_update_userpage(event);
+ }
+}
+
+static void riscv_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
+{
+ struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
+
+ if (rvpmu->event_unmapped) {
+ rvpmu->event_unmapped(event, mm);
+ perf_event_update_userpage(event);
+ }
+}
+
struct riscv_pmu *riscv_pmu_alloc(void)
{
struct riscv_pmu *pmu;
@@ -307,6 +409,9 @@ struct riscv_pmu *riscv_pmu_alloc(void)
}
pmu->pmu = (struct pmu) {
.event_init = riscv_pmu_event_init,
+ .event_mapped = riscv_pmu_event_mapped,
+ .event_unmapped = riscv_pmu_event_unmapped,
+ .event_idx = riscv_pmu_event_idx,
.add = riscv_pmu_add,
.del = riscv_pmu_del,
.start = riscv_pmu_start,
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index 5deeea0be7cb..43282e22ebe1 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -55,6 +55,10 @@ struct riscv_pmu {
void (*ctr_start)(struct perf_event *event, u64 init_val);
void (*ctr_stop)(struct perf_event *event, unsigned long flag);
int (*event_map)(struct perf_event *event, u64 *config);
+ void (*event_init)(struct perf_event *event);
+ void (*event_mapped)(struct perf_event *event, struct mm_struct *mm);
+ void (*event_unmapped)(struct perf_event *event, struct mm_struct *mm);
+ uint8_t (*csr_index)(struct perf_event *event);

struct cpu_hw_events __percpu *hw_events;
struct hlist_node node;
--
2.39.2


2023-07-27 15:30:39

by Alexandre Ghiti

Subject: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

riscv now supports mmapping hardware counters, so add what's needed to
take advantage of that in libperf.
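
A hedged note on how this connects to the kernel side (my summary, not from
the patch itself): the event_idx callback added in patch 5 publishes the csr
offset from CSR_CYCLE plus one in the mmap page, so perf_mmap__read_self()
ends up doing roughly:

	u32 idx = READ_ONCE(pc->index);		/* 0 means no direct access */

	if (pc->cap_user_rdpmc && idx)
		cnt += read_perf_counter(idx - 1);	/* csr_read_num(CSR_CYCLE + offset) */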

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
---
tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 0d1634cedf44..378a163f0554 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)

static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }

+#elif __riscv_xlen == 64
+
+/* TODO: implement rv32 support */
+
+#define CSR_CYCLE 0xc00
+#define CSR_TIME 0xc01
+
+#define csr_read(csr) \
+({ \
+ register unsigned long __v; \
+ __asm__ __volatile__ ("csrr %0, " #csr \
+ : "=r" (__v) : \
+ : "memory"); \
+ __v; \
+})
+
+static unsigned long csr_read_num(int csr_num)
+{
+#define switchcase_csr_read(__csr_num, __val) {\
+ case __csr_num: \
+ __val = csr_read(__csr_num); \
+ break; }
+#define switchcase_csr_read_2(__csr_num, __val) {\
+ switchcase_csr_read(__csr_num + 0, __val) \
+ switchcase_csr_read(__csr_num + 1, __val)}
+#define switchcase_csr_read_4(__csr_num, __val) {\
+ switchcase_csr_read_2(__csr_num + 0, __val) \
+ switchcase_csr_read_2(__csr_num + 2, __val)}
+#define switchcase_csr_read_8(__csr_num, __val) {\
+ switchcase_csr_read_4(__csr_num + 0, __val) \
+ switchcase_csr_read_4(__csr_num + 4, __val)}
+#define switchcase_csr_read_16(__csr_num, __val) {\
+ switchcase_csr_read_8(__csr_num + 0, __val) \
+ switchcase_csr_read_8(__csr_num + 8, __val)}
+#define switchcase_csr_read_32(__csr_num, __val) {\
+ switchcase_csr_read_16(__csr_num + 0, __val) \
+ switchcase_csr_read_16(__csr_num + 16, __val)}
+
+ unsigned long ret = 0;
+
+ switch (csr_num) {
+ switchcase_csr_read_32(CSR_CYCLE, ret)
+ default:
+ break;
+ }
+
+ return ret;
+#undef switchcase_csr_read_32
+#undef switchcase_csr_read_16
+#undef switchcase_csr_read_8
+#undef switchcase_csr_read_4
+#undef switchcase_csr_read_2
+#undef switchcase_csr_read
+}
+
+static u64 read_perf_counter(unsigned int counter)
+{
+ return csr_read_num(CSR_CYCLE + counter);
+}
+
+static u64 read_timestamp(void)
+{
+ return csr_read_num(CSR_TIME);
+}
+
#else
static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
static u64 read_timestamp(void) { return 0; }
--
2.39.2


2023-07-27 16:07:07

by Alexandre Ghiti

Subject: [PATCH v4 07/10] drivers: perf: Implement perf event mmap support in the SBI backend

We used to unconditionally expose the cycle and instret CSRs to
userspace, which gives rise to security concerns.

So now we only allow access to hw counters from userspace through the perf
framework, which will handle context switches, per-task events, etc. A
sysctl allows reverting the behaviour to the legacy mode so that userspace
applications which are not ready for this change do not break.

But the default is to allow userspace access only through perf: this will
break userspace applications which rely on direct access to rdcycle.
This choice was made for security reasons [1][2]: most of the applications
which use rdcycle can instead use rdtime to count the elapsed time.

[1] https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/REWcwYnzsKE?pli=1
[2] https://www.youtube.com/watch?v=3-c4C_L2PRQ&ab_channel=IEEESymposiumonSecurityandPrivacy
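
For applications that only need elapsed time, a minimal userspace sketch of
the rdtime-based replacement (my illustration, assuming an rv64 target; the
time CSR remains accessible in every mode):

	static inline unsigned long rdtime(void)
	{
		unsigned long val;

		__asm__ __volatile__ ("rdtime %0" : "=r" (val));
		return val;
	}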

Signed-off-by: Alexandre Ghiti <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
---
drivers/perf/riscv_pmu.c | 10 +-
drivers/perf/riscv_pmu_sbi.c | 192 +++++++++++++++++++++++++++++++++--
2 files changed, 195 insertions(+), 7 deletions(-)

diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index 432ad2e80ce3..80c052e93f9e 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -38,7 +38,15 @@ void arch_perf_update_userpage(struct perf_event *event,
userpg->cap_user_time_short = 0;
userpg->cap_user_rdpmc = riscv_perf_user_access(event);

- userpg->pmc_width = 64;
+#ifdef CONFIG_RISCV_PMU
+ /*
+ * The counters are 64-bit but the priv spec doesn't mandate all the
+ * bits to be implemented: that's why, counter width can vary based on
+ * the cpu vendor.
+ */
+ if (userpg->cap_user_rdpmc)
+ userpg->pmc_width = to_riscv_pmu(event->pmu)->ctr_get_width(event->hw.idx) + 1;
+#endif

do {
rd = sched_clock_read_begin(&seq);
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 760eb2afcf82..9a51053b1f99 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -24,6 +24,14 @@
#include <asm/sbi.h>
#include <asm/hwcap.h>

+#define SYSCTL_NO_USER_ACCESS 0
+#define SYSCTL_USER_ACCESS 1
+#define SYSCTL_LEGACY 2
+
+#define PERF_EVENT_FLAG_NO_USER_ACCESS BIT(SYSCTL_NO_USER_ACCESS)
+#define PERF_EVENT_FLAG_USER_ACCESS BIT(SYSCTL_USER_ACCESS)
+#define PERF_EVENT_FLAG_LEGACY BIT(SYSCTL_LEGACY)
+
PMU_FORMAT_ATTR(event, "config:0-47");
PMU_FORMAT_ATTR(firmware, "config:63");

@@ -43,6 +51,9 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
NULL,
};

+/* Allow user mode access by default */
+static int sysctl_perf_user_access __read_mostly = SYSCTL_USER_ACCESS;
+
/*
* RISC-V doesn't have heterogeneous harts yet. This need to be part of
* per_cpu in case of harts with different pmu counters
@@ -301,6 +312,11 @@ int riscv_pmu_get_hpm_info(u32 *hw_ctr_width, u32 *num_hw_ctr)
}
EXPORT_SYMBOL_GPL(riscv_pmu_get_hpm_info);

+static uint8_t pmu_sbi_csr_index(struct perf_event *event)
+{
+ return pmu_ctr_list[event->hw.idx].csr - CSR_CYCLE;
+}
+
static unsigned long pmu_sbi_get_filter_flags(struct perf_event *event)
{
unsigned long cflags = 0;
@@ -329,18 +345,34 @@ static int pmu_sbi_ctr_get_idx(struct perf_event *event)
struct cpu_hw_events *cpuc = this_cpu_ptr(rvpmu->hw_events);
struct sbiret ret;
int idx;
- uint64_t cbase = 0;
+ uint64_t cbase = 0, cmask = rvpmu->cmask;
unsigned long cflags = 0;

cflags = pmu_sbi_get_filter_flags(event);
+
+ /*
+ * In legacy mode, we have to force the fixed counters for those events
+ * but not in the user access mode as we want to use the other counters
+ * that support sampling/filtering.
+ */
+ if (hwc->flags & PERF_EVENT_FLAG_LEGACY) {
+ if (event->attr.config == PERF_COUNT_HW_CPU_CYCLES) {
+ cflags |= SBI_PMU_CFG_FLAG_SKIP_MATCH;
+ cmask = 1;
+ } else if (event->attr.config == PERF_COUNT_HW_INSTRUCTIONS) {
+ cflags |= SBI_PMU_CFG_FLAG_SKIP_MATCH;
+ cmask = 1UL << (CSR_INSTRET - CSR_CYCLE);
+ }
+ }
+
/* retrieve the available counter index */
#if defined(CONFIG_32BIT)
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
- rvpmu->cmask, cflags, hwc->event_base, hwc->config,
+ cmask, cflags, hwc->event_base, hwc->config,
hwc->config >> 32);
#else
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_CFG_MATCH, cbase,
- rvpmu->cmask, cflags, hwc->event_base, hwc->config, 0);
+ cmask, cflags, hwc->event_base, hwc->config, 0);
#endif
if (ret.error) {
pr_debug("Not able to find a counter for event %lx config %llx\n",
@@ -474,6 +506,22 @@ static u64 pmu_sbi_ctr_read(struct perf_event *event)
return val;
}

+static void pmu_sbi_set_scounteren(void *arg)
+{
+ struct perf_event *event = (struct perf_event *)arg;
+
+ csr_write(CSR_SCOUNTEREN,
+ csr_read(CSR_SCOUNTEREN) | (1 << pmu_sbi_csr_index(event)));
+}
+
+static void pmu_sbi_reset_scounteren(void *arg)
+{
+ struct perf_event *event = (struct perf_event *)arg;
+
+ csr_write(CSR_SCOUNTEREN,
+ csr_read(CSR_SCOUNTEREN) & ~(1 << pmu_sbi_csr_index(event)));
+}
+
static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
{
struct sbiret ret;
@@ -490,6 +538,10 @@ static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
if (ret.error && (ret.error != SBI_ERR_ALREADY_STARTED))
pr_err("Starting counter idx %d failed with error %d\n",
hwc->idx, sbi_err_map_linux_errno(ret.error));
+
+ if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
+ (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
+ pmu_sbi_set_scounteren((void *)event);
}

static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
@@ -497,6 +549,10 @@ static void pmu_sbi_ctr_stop(struct perf_event *event, unsigned long flag)
struct sbiret ret;
struct hw_perf_event *hwc = &event->hw;

+ if ((hwc->flags & PERF_EVENT_FLAG_USER_ACCESS) &&
+ (hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
+ pmu_sbi_reset_scounteren((void *)event);
+
ret = sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, hwc->idx, 1, flag, 0, 0, 0);
if (ret.error && (ret.error != SBI_ERR_ALREADY_STOPPED) &&
flag != SBI_PMU_STOP_FLAG_RESET)
@@ -704,10 +760,13 @@ static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);

/*
- * Enable the access for CYCLE, TIME, and INSTRET CSRs from userspace,
- * as is necessary to maintain uABI compatibility.
+ * We keep enabling userspace access to CYCLE, TIME and INSTRET via the
+ * legacy option but that will be removed in the future.
*/
- csr_write(CSR_SCOUNTEREN, 0x7);
+ if (sysctl_perf_user_access == SYSCTL_LEGACY)
+ csr_write(CSR_SCOUNTEREN, 0x7);
+ else
+ csr_write(CSR_SCOUNTEREN, 0x2);

/* Stop all the counters so that they can be enabled from perf */
pmu_sbi_stop_all(pmu);
@@ -838,6 +897,121 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
cpuhp_state_remove_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
}

+static void pmu_sbi_event_init(struct perf_event *event)
+{
+ /*
+ * The permissions are set at event_init so that we do not depend
+ * on the sysctl value that can change.
+ */
+ if (sysctl_perf_user_access == SYSCTL_NO_USER_ACCESS)
+ event->hw.flags |= PERF_EVENT_FLAG_NO_USER_ACCESS;
+ else if (sysctl_perf_user_access == SYSCTL_USER_ACCESS)
+ event->hw.flags |= PERF_EVENT_FLAG_USER_ACCESS;
+ else
+ event->hw.flags |= PERF_EVENT_FLAG_LEGACY;
+}
+
+static void pmu_sbi_event_mapped(struct perf_event *event, struct mm_struct *mm)
+{
+ if (event->hw.flags & PERF_EVENT_FLAG_NO_USER_ACCESS)
+ return;
+
+ if (event->hw.flags & PERF_EVENT_FLAG_LEGACY) {
+ if (event->attr.config != PERF_COUNT_HW_CPU_CYCLES &&
+ event->attr.config != PERF_COUNT_HW_INSTRUCTIONS) {
+ return;
+ }
+ }
+
+ /*
+ * The user mmapped the event to directly access it: this is where
+ * we determine based on sysctl_perf_user_access if we grant userspace
+ * the direct access to this event. That means that within the same
+ * task, some events may be directly accessible and some other may not,
+ * if the user changes the value of sysctl_perf_user_accesss in the
+ * meantime.
+ */
+
+ event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT;
+
+ /*
+ * We must enable userspace access *before* advertising in the user page
+ * that it is possible to do so to avoid any race.
+ * And we must notify all cpus here because threads that currently run
+ * on other cpus will try to directly access the counter too without
+ * calling pmu_sbi_ctr_start.
+ */
+ if (event->hw.flags & PERF_EVENT_FLAG_USER_ACCESS)
+ on_each_cpu_mask(mm_cpumask(mm),
+ pmu_sbi_set_scounteren, (void *)event, 1);
+}
+
+static void pmu_sbi_event_unmapped(struct perf_event *event, struct mm_struct *mm)
+{
+ if (event->hw.flags & PERF_EVENT_FLAG_NO_USER_ACCESS)
+ return;
+
+ if (event->hw.flags & PERF_EVENT_FLAG_LEGACY) {
+ if (event->attr.config != PERF_COUNT_HW_CPU_CYCLES &&
+ event->attr.config != PERF_COUNT_HW_INSTRUCTIONS) {
+ return;
+ }
+ }
+
+ /*
+ * Here we can directly remove user access since the user does not have
+ * access to the user page anymore so we avoid the racy window where the
+ * user could have read cap_user_rdpmc to true right before we disable
+ * it.
+ */
+ event->hw.flags &= ~PERF_EVENT_FLAG_USER_READ_CNT;
+
+ if (event->hw.flags & PERF_EVENT_FLAG_USER_ACCESS)
+ on_each_cpu_mask(mm_cpumask(mm),
+ pmu_sbi_reset_scounteren, (void *)event, 1);
+}
+
+static void riscv_pmu_update_counter_access(void *info)
+{
+ if (sysctl_perf_user_access == SYSCTL_LEGACY)
+ csr_write(CSR_SCOUNTEREN, 0x7);
+ else
+ csr_write(CSR_SCOUNTEREN, 0x2);
+}
+
+static int riscv_pmu_proc_user_access_handler(struct ctl_table *table,
+ int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ int prev = sysctl_perf_user_access;
+ int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+ /*
+ * Test against the previous value since we clear SCOUNTEREN when
+ * sysctl_perf_user_access is set to SYSCTL_USER_ACCESS, but we should
+ * not do that if that was already the case.
+ */
+ if (ret || !write || prev == sysctl_perf_user_access)
+ return ret;
+
+ on_each_cpu(riscv_pmu_update_counter_access, NULL, 1);
+
+ return 0;
+}
+
+static struct ctl_table sbi_pmu_sysctl_table[] = {
+ {
+ .procname = "perf_user_access",
+ .data = &sysctl_perf_user_access,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = riscv_pmu_proc_user_access_handler,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_TWO,
+ },
+ { }
+};
+
static int pmu_sbi_device_probe(struct platform_device *pdev)
{
struct riscv_pmu *pmu = NULL;
@@ -881,6 +1055,10 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
pmu->ctr_get_width = pmu_sbi_ctr_get_width;
pmu->ctr_clear_idx = pmu_sbi_ctr_clear_idx;
pmu->ctr_read = pmu_sbi_ctr_read;
+ pmu->event_init = pmu_sbi_event_init;
+ pmu->event_mapped = pmu_sbi_event_mapped;
+ pmu->event_unmapped = pmu_sbi_event_unmapped;
+ pmu->csr_index = pmu_sbi_csr_index;

ret = cpuhp_state_add_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
if (ret)
@@ -894,6 +1072,8 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
if (ret)
goto out_unregister;

+ register_sysctl("kernel", sbi_pmu_sysctl_table);
+
return 0;

out_unregister:
--
2.39.2


2023-07-28 18:24:07

by Ian Rogers

Subject: Re: [PATCH v4 10/10] perf: tests: Adapt mmap-basic.c for riscv

On Thu, Jul 27, 2023 at 7:30 AM Alexandre Ghiti <[email protected]> wrote:
>
> riscv now supports mmaping hardware counters to userspace so adapt the test
> to run on this architecture.
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Atish Patra <[email protected]>
> ---
> tools/perf/tests/mmap-basic.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
> index e68ca6229756..f5075ca774f8 100644
> --- a/tools/perf/tests/mmap-basic.c
> +++ b/tools/perf/tests/mmap-basic.c
> @@ -284,7 +284,7 @@ static struct test_case tests__basic_mmap[] = {
> "permissions"),
> TEST_CASE_REASON("User space counter reading of instructions",
> mmap_user_read_instr,
> -#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
> +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64

Same comment relating to the guard, why not defined(__riscv) ?

Thanks,
Ian

> "permissions"
> #else
> "unsupported"
> @@ -292,7 +292,7 @@ static struct test_case tests__basic_mmap[] = {
> ),
> TEST_CASE_REASON("User space counter reading of cycles",
> mmap_user_read_cycles,
> -#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
> +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
> "permissions"
> #else
> "unsupported"
> --
> 2.39.2
>

2023-07-28 19:06:34

by Ian Rogers

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
>
> riscv now supports mmaping hardware counters so add what's needed to
> take advantage of that in libperf.
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Atish Patra <[email protected]>
> ---
> tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index 0d1634cedf44..378a163f0554 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
>
> static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
>
> +#elif __riscv_xlen == 64

This is something of an odd guard, perhaps:
#elif defined(__riscv) && __riscv_xlen == 64

That way it is more intention revealing that this is riscv code. Could
you add a comment relating to the __riscv_xlen ?

> +
> +/* TODO: implement rv32 support */
> +
> +#define CSR_CYCLE 0xc00
> +#define CSR_TIME 0xc01
> +
> +#define csr_read(csr) \
> +({ \
> + register unsigned long __v; \
> + __asm__ __volatile__ ("csrr %0, " #csr \
> + : "=r" (__v) : \
> + : "memory"); \

To avoid the macro pasting that could potentially go weird, could this be:

__asm__ __volatile__ ("csrr %0, %1",
: "=r"(__v) /* outputs */
: "i"(csr) /* inputs */
: "memory" /* clobbers */)

Also, why is this clobbering memory? Worth adding a comment.

Thanks,
Ian

> + __v; \
> +})
> +
> +static unsigned long csr_read_num(int csr_num)
> +{
> +#define switchcase_csr_read(__csr_num, __val) {\
> + case __csr_num: \
> + __val = csr_read(__csr_num); \
> + break; }
> +#define switchcase_csr_read_2(__csr_num, __val) {\
> + switchcase_csr_read(__csr_num + 0, __val) \
> + switchcase_csr_read(__csr_num + 1, __val)}
> +#define switchcase_csr_read_4(__csr_num, __val) {\
> + switchcase_csr_read_2(__csr_num + 0, __val) \
> + switchcase_csr_read_2(__csr_num + 2, __val)}
> +#define switchcase_csr_read_8(__csr_num, __val) {\
> + switchcase_csr_read_4(__csr_num + 0, __val) \
> + switchcase_csr_read_4(__csr_num + 4, __val)}
> +#define switchcase_csr_read_16(__csr_num, __val) {\
> + switchcase_csr_read_8(__csr_num + 0, __val) \
> + switchcase_csr_read_8(__csr_num + 8, __val)}
> +#define switchcase_csr_read_32(__csr_num, __val) {\
> + switchcase_csr_read_16(__csr_num + 0, __val) \
> + switchcase_csr_read_16(__csr_num + 16, __val)}
> +
> + unsigned long ret = 0;
> +
> + switch (csr_num) {
> + switchcase_csr_read_32(CSR_CYCLE, ret)
> + default:
> + break;
> + }
> +
> + return ret;
> +#undef switchcase_csr_read_32
> +#undef switchcase_csr_read_16
> +#undef switchcase_csr_read_8
> +#undef switchcase_csr_read_4
> +#undef switchcase_csr_read_2
> +#undef switchcase_csr_read
> +}
> +
> +static u64 read_perf_counter(unsigned int counter)
> +{
> + return csr_read_num(CSR_CYCLE + counter);
> +}
> +
> +static u64 read_timestamp(void)
> +{
> + return csr_read_num(CSR_TIME);
> +}
> +
> #else
> static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> static u64 read_timestamp(void) { return 0; }
> --
> 2.39.2
>

2023-07-29 11:44:31

by Andrew Jones

Subject: Re: [PATCH v4 10/10] perf: tests: Adapt mmap-basic.c for riscv

On Fri, Jul 28, 2023 at 10:54:02AM -0700, Ian Rogers wrote:
> On Thu, Jul 27, 2023 at 7:30 AM Alexandre Ghiti <[email protected]> wrote:
> >
> > riscv now supports mmaping hardware counters to userspace so adapt the test
> > to run on this architecture.
> >
> > Signed-off-by: Alexandre Ghiti <[email protected]>
> > Reviewed-by: Andrew Jones <[email protected]>
> > Reviewed-by: Atish Patra <[email protected]>
> > ---
> > tools/perf/tests/mmap-basic.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
> > index e68ca6229756..f5075ca774f8 100644
> > --- a/tools/perf/tests/mmap-basic.c
> > +++ b/tools/perf/tests/mmap-basic.c
> > @@ -284,7 +284,7 @@ static struct test_case tests__basic_mmap[] = {
> > "permissions"),
> > TEST_CASE_REASON("User space counter reading of instructions",
> > mmap_user_read_instr,
> > -#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
> > +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) || __riscv_xlen == 64
>
> Same comment relating to the guard, why not defined(__riscv) ?

__riscv_xlen will always be defined for all riscv targets, so also
checking __riscv isn't necessary when a specific bit width also needs
to be checked. __riscv is useful for checks that don't need to be
concerned with the bit width. Grepping, I see instances of both
"defined(__riscv) && __riscv_xlen ==" and just "__riscv_xlen ==".
IMHO, the former should be reduced to the latter, rather than creating
more instances of them.

Thanks,
drew

2023-07-31 10:26:31

by Alexandre Ghiti

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

Hi Ian,

On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
>
> On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> >
> > riscv now supports mmaping hardware counters so add what's needed to
> > take advantage of that in libperf.
> >
> > Signed-off-by: Alexandre Ghiti <[email protected]>
> > Reviewed-by: Andrew Jones <[email protected]>
> > Reviewed-by: Atish Patra <[email protected]>
> > ---
> > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 65 insertions(+)
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index 0d1634cedf44..378a163f0554 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> >
> > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> >
> > +#elif __riscv_xlen == 64
>
> This is something of an odd guard, perhaps:
> #elif defined(__riscv) && __riscv_xlen == 64
>
> That way it is more intention revealing that this is riscv code. Could
> you add a comment relating to the __riscv_xlen ?

I guess Andrew answered that already.

>
> > +
> > +/* TODO: implement rv32 support */
> > +
> > +#define CSR_CYCLE 0xc00
> > +#define CSR_TIME 0xc01
> > +
> > +#define csr_read(csr) \
> > +({ \
> > + register unsigned long __v; \
> > + __asm__ __volatile__ ("csrr %0, " #csr \
> > + : "=r" (__v) : \
> > + : "memory"); \
>
> To avoid the macro pasting that could potentially go weird, could this be:
>
> __asm__ __volatile__ ("csrr %0, %1",
> : "=r"(__v) /* outputs */
> : "i"(csr) /* inputs */
> : "memory" /* clobbers */)
>
> Also, why is this clobbering memory? Worth adding a comment.

No idea, I see that it is also done this way in
arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?

Thanks for your comments!

Alex

>
> Thanks,
> Ian
>
> > + __v; \
> > +})
> > +
> > +static unsigned long csr_read_num(int csr_num)
> > +{
> > +#define switchcase_csr_read(__csr_num, __val) {\
> > + case __csr_num: \
> > + __val = csr_read(__csr_num); \
> > + break; }
> > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > + switchcase_csr_read(__csr_num + 0, __val) \
> > + switchcase_csr_read(__csr_num + 1, __val)}
> > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > +
> > + unsigned long ret = 0;
> > +
> > + switch (csr_num) {
> > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > + default:
> > + break;
> > + }
> > +
> > + return ret;
> > +#undef switchcase_csr_read_32
> > +#undef switchcase_csr_read_16
> > +#undef switchcase_csr_read_8
> > +#undef switchcase_csr_read_4
> > +#undef switchcase_csr_read_2
> > +#undef switchcase_csr_read
> > +}
> > +
> > +static u64 read_perf_counter(unsigned int counter)
> > +{
> > + return csr_read_num(CSR_CYCLE + counter);
> > +}
> > +
> > +static u64 read_timestamp(void)
> > +{
> > + return csr_read_num(CSR_TIME);
> > +}
> > +
> > #else
> > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > static u64 read_timestamp(void) { return 0; }
> > --
> > 2.39.2
> >

2023-07-31 12:40:17

by Alexandre Ghiti

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
>
> Hi Ian,
>
> On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
> >
> > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> > >
> > > riscv now supports mmaping hardware counters so add what's needed to
> > > take advantage of that in libperf.
> > >
> > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > Reviewed-by: Andrew Jones <[email protected]>
> > > Reviewed-by: Atish Patra <[email protected]>
> > > ---
> > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 65 insertions(+)
> > >
> > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > index 0d1634cedf44..378a163f0554 100644
> > > --- a/tools/lib/perf/mmap.c
> > > +++ b/tools/lib/perf/mmap.c
> > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> > >
> > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> > >
> > > +#elif __riscv_xlen == 64
> >
> > This is something of an odd guard, perhaps:
> > #elif defined(__riscv) && __riscv_xlen == 64
> >
> > That way it is more intention revealing that this is riscv code. Could
> > you add a comment relating to the __riscv_xlen ?
>
> I guess Andrew answered that already.
>
> >
> > > +
> > > +/* TODO: implement rv32 support */
> > > +
> > > +#define CSR_CYCLE 0xc00
> > > +#define CSR_TIME 0xc01
> > > +
> > > +#define csr_read(csr) \
> > > +({ \
> > > + register unsigned long __v; \
> > > + __asm__ __volatile__ ("csrr %0, " #csr \
> > > + : "=r" (__v) : \
> > > + : "memory"); \
> >
> > To avoid the macro pasting that could potentially go weird, could this be:
> >
> > __asm__ __volatile__ ("csrr %0, %1",
> > : "=r"(__v) /* outputs */
> > : "i"(csr) /* inputs */
> > : "memory" /* clobbers */)

Forgot to answer this one: it compiles, but I have to admit that I
don't understand the difference, whether that's correct (all macros in
arch/riscv/include/asm/csr.h use # to do this), or what benefits it
brings. Can you elaborate more on things that could "go weird"?

Thanks again,

Alex

> >
> > Also, why is this clobbering memory? Worth adding a comment.
>
> No idea, I see that it is also done this way in
> arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?
>
> Thanks for your comments!
>
> Alex
>
> >
> > Thanks,
> > Ian
> >
> > > + __v; \
> > > +})
> > > +
> > > +static unsigned long csr_read_num(int csr_num)
> > > +{
> > > +#define switchcase_csr_read(__csr_num, __val) {\
> > > + case __csr_num: \
> > > + __val = csr_read(__csr_num); \
> > > + break; }
> > > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > > + switchcase_csr_read(__csr_num + 0, __val) \
> > > + switchcase_csr_read(__csr_num + 1, __val)}
> > > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > > +
> > > + unsigned long ret = 0;
> > > +
> > > + switch (csr_num) {
> > > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > > + default:
> > > + break;
> > > + }
> > > +
> > > + return ret;
> > > +#undef switchcase_csr_read_32
> > > +#undef switchcase_csr_read_16
> > > +#undef switchcase_csr_read_8
> > > +#undef switchcase_csr_read_4
> > > +#undef switchcase_csr_read_2
> > > +#undef switchcase_csr_read
> > > +}
> > > +
> > > +static u64 read_perf_counter(unsigned int counter)
> > > +{
> > > + return csr_read_num(CSR_CYCLE + counter);
> > > +}
> > > +
> > > +static u64 read_timestamp(void)
> > > +{
> > > + return csr_read_num(CSR_TIME);
> > > +}
> > > +
> > > #else
> > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > static u64 read_timestamp(void) { return 0; }
> > > --
> > > 2.39.2
> > >

2023-07-31 16:26:19

by Ian Rogers

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
> >
> > Hi Ian,
> >
> > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
> > >
> > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> > > >
> > > > riscv now supports mmaping hardware counters so add what's needed to
> > > > take advantage of that in libperf.
> > > >
> > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > Reviewed-by: Atish Patra <[email protected]>
> > > > ---
> > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 65 insertions(+)
> > > >
> > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > index 0d1634cedf44..378a163f0554 100644
> > > > --- a/tools/lib/perf/mmap.c
> > > > +++ b/tools/lib/perf/mmap.c
> > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> > > >
> > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> > > >
> > > > +#elif __riscv_xlen == 64
> > >
> > > This is something of an odd guard, perhaps:
> > > #elif defined(__riscv) && __riscv_xlen == 64
> > >
> > > That way it is more intention revealing that this is riscv code. Could
> > > you add a comment relating to the __riscv_xlen ?
> >
> > I guess Andrew answered that already.
> >

Not sure. I still think it looks weird:
...
#if defined(__i386__) || defined(__x86_64__)
...
#elif defined(__aarch64__)
...
#elif __riscv_xlen == 64
...
#else
static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
static u64 read_timestamp(void) { return 0; }
#endif

The first two are clearly #ifdef-ing architecture specific assembly
code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth
a comment like "csrr is only available when you have xlens of 64
because ..."

> > >
> > > > +
> > > > +/* TODO: implement rv32 support */
> > > > +
> > > > +#define CSR_CYCLE 0xc00
> > > > +#define CSR_TIME 0xc01
> > > > +
> > > > +#define csr_read(csr) \
> > > > +({ \
> > > > + register unsigned long __v; \
> > > > + __asm__ __volatile__ ("csrr %0, " #csr \
> > > > + : "=r" (__v) : \
> > > > + : "memory"); \
> > >
> > > To avoid the macro pasting that could potentially go weird, could this be:
> > >
> > > __asm__ __volatile__ ("csrr %0, %1",
> > > : "=r"(__v) /* outputs */
> > > : "i"(csr) /* inputs */
> > > : "memory" /* clobbers */)
>
> Forgot to answer this one: it compiles, but I have to admit that I
> don't understand the difference and if that's correct (all macros in
> arch/riscv/include/asm/csr.h use # to do this) and what benefits it
> brings. Can you elaborate more on things that could "go weird"?

So rather than use an input constraint for the asm block you are using
the C preprocessor to paste in the csr argument. If csr is something
like "1" then it looks good and you'll get "csrr %0,1". If you pass
something like "1 << 31" then that will be pasted as "csrr %0, 1 <<
31" and that starts to get weird in the context of being in the
assembler where it is unlikely the C operators work. Using the input
constraint avoids this, causes the C compiler to check the type of the
argument and you'll probably get more intelligible error messages as a
consequence.
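
Concretely, a sketch of the constraint-based variant (untested, and it still
requires csr to be a compile-time constant, as it is in the switch cases
below):

	#define csr_read(csr)						\
	({								\
		register unsigned long __v;				\
		__asm__ __volatile__ ("csrr %0, %1"			\
				      : "=r" (__v)			\
				      : "i" (csr)			\
				      : "memory");			\
		__v;							\
	})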

>
> Thanks again,
>
> Alex
>
> > >
> > > Also, why is this clobbering memory? Worth adding a comment.
> >
> > No idea, I see that it is also done this way in
> > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?

It would seem to make sense then not to have a memory constraint until
we know why we're doing it?

Thanks,
Ian

> >
> > Thanks for your comments!
> >
> > Alex
> >
> > >
> > > Thanks,
> > > Ian
> > >
> > > > + __v; \
> > > > +})
> > > > +
> > > > +static unsigned long csr_read_num(int csr_num)
> > > > +{
> > > > +#define switchcase_csr_read(__csr_num, __val) {\
> > > > + case __csr_num: \
> > > > + __val = csr_read(__csr_num); \
> > > > + break; }
> > > > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > > > + switchcase_csr_read(__csr_num + 0, __val) \
> > > > + switchcase_csr_read(__csr_num + 1, __val)}
> > > > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > > > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > > > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > > > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > > > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > > > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > > > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > > > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > > > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > > > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > > > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > > > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > > > +
> > > > + unsigned long ret = 0;
> > > > +
> > > > + switch (csr_num) {
> > > > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > > > + default:
> > > > + break;
> > > > + }
> > > > +
> > > > + return ret;
> > > > +#undef switchcase_csr_read_32
> > > > +#undef switchcase_csr_read_16
> > > > +#undef switchcase_csr_read_8
> > > > +#undef switchcase_csr_read_4
> > > > +#undef switchcase_csr_read_2
> > > > +#undef switchcase_csr_read
> > > > +}
> > > > +
> > > > +static u64 read_perf_counter(unsigned int counter)
> > > > +{
> > > > + return csr_read_num(CSR_CYCLE + counter);
> > > > +}
> > > > +
> > > > +static u64 read_timestamp(void)
> > > > +{
> > > > + return csr_read_num(CSR_TIME);
> > > > +}
> > > > +
> > > > #else
> > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > > static u64 read_timestamp(void) { return 0; }
> > > > --
> > > > 2.39.2
> > > >

2023-07-31 16:33:13

by Alexandre Ghiti

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <[email protected]> wrote:
> >
> > On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
> > > >
> > > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> > > > >
> > > > > riscv now supports mmaping hardware counters so add what's needed to
> > > > > take advantage of that in libperf.
> > > > >
> > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > > Reviewed-by: Atish Patra <[email protected]>
> > > > > ---
> > > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 65 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > > index 0d1634cedf44..378a163f0554 100644
> > > > > --- a/tools/lib/perf/mmap.c
> > > > > +++ b/tools/lib/perf/mmap.c
> > > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> > > > >
> > > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> > > > >
> > > > > +#elif __riscv_xlen == 64
> > > >
> > > > This is something of an odd guard, perhaps:
> > > > #elif defined(__riscv) && __riscv_xlen == 64
> > > >
> > > > That way it is more intention revealing that this is riscv code. Could
> > > > you add a comment relating to the __riscv_xlen ?
> > >
> > > I guess Andrew answered that already.
> > >
>
> Not sure. I still think it looks weird:
> ...
> #if defined(__i386__) || defined(__x86_64__)
> ...
> #elif defined(__aarch64__)
> ...
> #elif __riscv_xlen == 64
> ...
> #else
> static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> static u64 read_timestamp(void) { return 0; }
> #endif
>
> The first two are clearly #ifdef-ing architecture specific assembly
> code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth
> a comment like "csrr is only available when you have xlens of 64
> because ..."

__riscv_xlen indicates riscv64, which is the only target of this
patchset. But if you prefer, I don't mind adding back the
defined(__riscv) if I re-spin a new version.

>
> > > >
> > > > > +
> > > > > +/* TODO: implement rv32 support */
> > > > > +
> > > > > +#define CSR_CYCLE 0xc00
> > > > > +#define CSR_TIME 0xc01
> > > > > +
> > > > > +#define csr_read(csr) \
> > > > > +({ \
> > > > > + register unsigned long __v; \
> > > > > + __asm__ __volatile__ ("csrr %0, " #csr \
> > > > > + : "=r" (__v) : \
> > > > > + : "memory"); \
> > > >
> > > > To avoid the macro pasting that could potentially go weird, could this be:
> > > >
> > > > __asm__ __volatile__ ("csrr %0, %1",
> > > > : "=r"(__v) /* outputs */
> > > > : "i"(csr) /* inputs */
> > > > : "memory" /* clobbers */)
> >
> > Forgot to answer this one: it compiles, but I have to admit that I
> > don't understand the difference and if that's correct (all macros in
> > arch/riscv/include/asm/csr.h use # to do this) and what benefits it
> > brings. Can you elaborate more on things that could "go weird"?
>
> So rather than use an input constraint for the asm block you are using
> the C preprocessor to paste in the csr argument. If csr is something
> like "1" then it looks good and you'll get "csrr %0,1". If you pass
> something like "1 << 31" then that will be pasted as "csrr %0, 1 <<
> 31" and that starts to get weird in the context of being in the
> assembler where it is unlikely the C operators work. Using the input
> constraint avoids this, causes the C compiler to check the type of the
> argument and you'll probably get more intelligible error messages as a
> consequence.
>

Thanks. So if I'm not mistaken, in this exact context, given we only
use csr_read() through the csr_read_num() function, it seems ok right?

> >
> > Thanks again,
> >
> > Alex
> >
> > > >
> > > > Also, why is this clobbering memory? Worth adding a comment.
> > >
> > > No idea, I see that it is also done this way in
> > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?
>
> It would seem to make sense then not to have a memory constraint until
> we know why we're doing it?
>

I have just had the answer internally (thanks to @Brendan Sweeney):
csr modifications can alter how the memory is accessed (satp which
changes the address space, sum which allows/disallows userspace
access...), so we need the memory barrier to make sure the compiler
does not reorder the memory accesses.

Thanks,

Alex

> Thanks,
> Ian
>
> > >
> > > Thanks for your comments!
> > >
> > > Alex
> > >
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > > > > + __v; \
> > > > > +})
> > > > > +
> > > > > +static unsigned long csr_read_num(int csr_num)
> > > > > +{
> > > > > +#define switchcase_csr_read(__csr_num, __val) {\
> > > > > + case __csr_num: \
> > > > > + __val = csr_read(__csr_num); \
> > > > > + break; }
> > > > > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > > > > + switchcase_csr_read(__csr_num + 0, __val) \
> > > > > + switchcase_csr_read(__csr_num + 1, __val)}
> > > > > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > > > > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > > > > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > > > > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > > > > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > > > > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > > > > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > > > > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > > > > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > > > > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > > > > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > > > > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > > > > +
> > > > > + unsigned long ret = 0;
> > > > > +
> > > > > + switch (csr_num) {
> > > > > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > > > > + default:
> > > > > + break;
> > > > > + }
> > > > > +
> > > > > + return ret;
> > > > > +#undef switchcase_csr_read_32
> > > > > +#undef switchcase_csr_read_16
> > > > > +#undef switchcase_csr_read_8
> > > > > +#undef switchcase_csr_read_4
> > > > > +#undef switchcase_csr_read_2
> > > > > +#undef switchcase_csr_read
> > > > > +}
> > > > > +
> > > > > +static u64 read_perf_counter(unsigned int counter)
> > > > > +{
> > > > > + return csr_read_num(CSR_CYCLE + counter);
> > > > > +}
> > > > > +
> > > > > +static u64 read_timestamp(void)
> > > > > +{
> > > > > + return csr_read_num(CSR_TIME);
> > > > > +}
> > > > > +
> > > > > #else
> > > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > > > static u64 read_timestamp(void) { return 0; }
> > > > > --
> > > > > 2.39.2
> > > > >

2023-07-31 17:09:26

by Ian Rogers

Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <[email protected]> wrote:
> >
> > On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <[email protected]> wrote:
> > >
> > > On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
> > > > >
> > > > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> > > > > >
> > > > > > riscv now supports mmaping hardware counters so add what's needed to
> > > > > > take advantage of that in libperf.
> > > > > >
> > > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > > > Reviewed-by: Atish Patra <[email protected]>
> > > > > > ---
> > > > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > > > > > 1 file changed, 65 insertions(+)
> > > > > >
> > > > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > > > index 0d1634cedf44..378a163f0554 100644
> > > > > > --- a/tools/lib/perf/mmap.c
> > > > > > +++ b/tools/lib/perf/mmap.c
> > > > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> > > > > >
> > > > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> > > > > >
> > > > > > +#elif __riscv_xlen == 64
> > > > >
> > > > > This is something of an odd guard, perhaps:
> > > > > #elif defined(__riscv) && __riscv_xlen == 64
> > > > >
> > > > > That way it is more intention revealing that this is riscv code. Could
> > > > > you add a comment relating to the __riscv_xlen ?
> > > >
> > > > I guess Andrew answered that already.
> > > >
> >
> > Not sure. I still think it looks weird:
> > ...
> > #if defined(__i386__) || defined(__x86_64__)
> > ...
> > #elif defined(__aarch64__)
> > ...
> > #elif __riscv_xlen == 64
> > ...
> > #else
> > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > static u64 read_timestamp(void) { return 0; }
> > #endif
> >
> > The first two are clearly #ifdef-ing architecture specific assembly
> > code, under what conditions I get RISC-V code ¯\(ツ)/¯ At least worth
> > a comment like "csrr is only available when you have xlens of 64
> > because ..."
>
> __riscv_xlen indicates riscv64, which is the only target of this
> patchset. But if you prefer, I don't mind adding back the
> defined(__riscv) if I re-spin a new version.

This kind of begs the question as to why there is no __riscv64 ifdef.
The issue with xlen is it isn't intention revealing so for regular
people trying to understand the code it would be nice to document it.

> >
> > > > >
> > > > > > +
> > > > > > +/* TODO: implement rv32 support */
> > > > > > +
> > > > > > +#define CSR_CYCLE 0xc00
> > > > > > +#define CSR_TIME 0xc01
> > > > > > +
> > > > > > +#define csr_read(csr) \
> > > > > > +({ \
> > > > > > + register unsigned long __v; \
> > > > > > + __asm__ __volatile__ ("csrr %0, " #csr \
> > > > > > + : "=r" (__v) : \
> > > > > > + : "memory"); \
> > > > >
> > > > > To avoid the macro pasting that could potentially go weird, could this be:
> > > > >
> > > > > __asm__ __volatile__ ("csrr %0, %1",
> > > > > : "=r"(__v) /* outputs */
> > > > > : "i"(csr) /* inputs */
> > > > > : "memory" /* clobbers */)
> > >
> > > Forgot to answer this one: it compiles, but I have to admit that I
> > > don't understand the difference and if that's correct (all macros in
> > > arch/riscv/include/asm/csr.h use # to do this) and what benefits it
> > > brings. Can you elaborate more on things that could "go weird"?
> >
> > So rather than use an input constraint for the asm block you are using
> > the C preprocessor to paste in the csr argument. If csr is something
> > like "1" then it looks good and you'll get "csrr %0,1". If you pass
> > something like "1 << 31" then that will be pasted as "csrr %0, 1 <<
> > 31" and that starts to get weird in the context of being in the
> > assembler where it is unlikely the C operators work. Using the input
> > constraint avoids this, causes the C compiler to check the type of the
> > argument and you'll probably get more intelligible error messages as a
> > consequence.
> >
>
> Thanks. So if I'm not mistaken, in this exact context, given we only
> use csr_read() through the csr_read_num() function, it seems ok right?

So you've formed a cargo cult and the justification is not wanting to
stop a copy-paste chain from somewhere else. This code itself will be
copy-pasted, and we go to some lengths to encourage that by placing parts
of it in include/uapi/linux/perf_event.h. It seems better to catch
this issue early rather than propagate it.
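For concreteness, here is a minimal sketch of the input-constraint form
being suggested; it assumes csr_read() is only ever reached with a
compile-time constant CSR number, which is what the switch in
csr_read_num() below guarantees:

#define csr_read(csr)						\
({								\
	register unsigned long __v;				\
	/* "i" checks csr is a constant and emits it as the CSR number */ \
	__asm__ __volatile__ ("csrr %0, %1"			\
			      : "=r" (__v)			\
			      : "i" (csr)			\
			      : "memory");			\
	__v;							\
})

The "memory" clobber is kept as in the patch for now; whether it is
needed at all is taken up below.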

> > >
> > > Thanks again,
> > >
> > > Alex
> > >
> > > > >
> > > > > Also, why is this clobbering memory? Worth adding a comment.
> > > >
> > > > No idea, I see that it is also done this way in
> > > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?
> >
> > It would seem to make sense then not to have a memory constraint until
> > we know why we're doing it?
> >
>
> I have just had the answer internally (thanks to @Brendan Sweeney):
> csr modifications can alter how the memory is accessed (satp which
> changes the address space, sum which allows/disallows userspace
> access...), so we need the memory barrier to make sure the compiler
> does not reorder the memory accesses.

The conditions you mention shouldn't apply here though? Even if you
add a memory barrier for the compiler, what is stopping the hardware
reordering loads and stores? If it absolutely has to be there then a
comment something like "There is a bug in riscv where the csrr
instruction can clobber memory, breaking future reads, and somehow this
constraint fixes it by ... ".

Thanks,
Ian

> Thanks,
>
> Alex
>
> > Thanks,
> > Ian
> >
> > > >
> > > > Thanks for your comments!
> > > >
> > > > Alex
> > > >
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > > > > + __v; \
> > > > > > +})
> > > > > > +
> > > > > > +static unsigned long csr_read_num(int csr_num)
> > > > > > +{
> > > > > > +#define switchcase_csr_read(__csr_num, __val) {\
> > > > > > + case __csr_num: \
> > > > > > + __val = csr_read(__csr_num); \
> > > > > > + break; }
> > > > > > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > > > > > + switchcase_csr_read(__csr_num + 0, __val) \
> > > > > > + switchcase_csr_read(__csr_num + 1, __val)}
> > > > > > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > > > > > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > > > > > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > > > > > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > > > > > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > > > > > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > > > > > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > > > > > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > > > > > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > > > > > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > > > > > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > > > > > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > > > > > +
> > > > > > + unsigned long ret = 0;
> > > > > > +
> > > > > > + switch (csr_num) {
> > > > > > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > > > > > + default:
> > > > > > + break;
> > > > > > + }
> > > > > > +
> > > > > > + return ret;
> > > > > > +#undef switchcase_csr_read_32
> > > > > > +#undef switchcase_csr_read_16
> > > > > > +#undef switchcase_csr_read_8
> > > > > > +#undef switchcase_csr_read_4
> > > > > > +#undef switchcase_csr_read_2
> > > > > > +#undef switchcase_csr_read
> > > > > > +}
> > > > > > +
> > > > > > +static u64 read_perf_counter(unsigned int counter)
> > > > > > +{
> > > > > > + return csr_read_num(CSR_CYCLE + counter);
> > > > > > +}
> > > > > > +
> > > > > > +static u64 read_timestamp(void)
> > > > > > +{
> > > > > > + return csr_read_num(CSR_TIME);
> > > > > > +}
> > > > > > +
> > > > > > #else
> > > > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > > > > static u64 read_timestamp(void) { return 0; }
> > > > > > --
> > > > > > 2.39.2
> > > > > >

2023-07-31 20:21:48

by Jessica Clarke

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On 31 Jul 2023, at 17:06, Alexandre Ghiti <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <[email protected]> wrote:
>>
>> On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <[email protected]> wrote:
>>>
>>> On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
>>>>
>>>> Hi Ian,
>>>>
>>>> On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
>>>>>
>>>>> On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
>>>>>>
>>>>>> riscv now supports mmaping hardware counters so add what's needed to
>>>>>> take advantage of that in libperf.
>>>>>>
>>>>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>>>>> Reviewed-by: Andrew Jones <[email protected]>
>>>>>> Reviewed-by: Atish Patra <[email protected]>
>>>>>> ---
>>>>>> tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
>>>>>> 1 file changed, 65 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
>>>>>> index 0d1634cedf44..378a163f0554 100644
>>>>>> --- a/tools/lib/perf/mmap.c
>>>>>> +++ b/tools/lib/perf/mmap.c
>>>>>> @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
>>>>>>
>>>>>> static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
>>>>>>
>>>>>> +#elif __riscv_xlen == 64
>>>>>
>>>>> This is something of an odd guard, perhaps:
>>>>> #elif defined(__riscv) && __riscv_xlen == 64
>>>>>
>>>>> That way it is more intention revealing that this is riscv code. Could
>>>>> you add a comment relating to the __riscv_xlen ?
>>>>
>>>> I guess Andrew answered that already.
>>>>
>>
>> Not sure. I still think it looks weird:
>> ...
>> #if defined(__i386__) || defined(__x86_64__)
>> ...
>> #elif defined(__aarch64__)
>> ...
>> #elif __riscv_xlen == 64
>> ...
>> #else
>> static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
>> static u64 read_timestamp(void) { return 0; }
>> #endif
>>
>> The first two are clearly #ifdef-ing architecture specific assembly
>> code, under what conditions I get RISC-V code ¯\_(ツ)_/¯ At least worth
>> a comment like "csrr is only available when you have xlens of 64
>> because ..."
>
> __riscv_xlen indicates riscv64, which is the only target of this
> patchset. But if you prefer, I don't mind adding back the
> defined(__riscv) if I re-spin a new version.

I mean, -Wundef is a thing that will scream at you if you evaluate a
macro that’s undefined and get an implicit 0, and parts of the linux
build seem to enable it. So I would strongly recommend against
(ab)using that feature of the preprocessor, especially when avoiding it
also gives greater clarity.

Jess


2023-07-31 21:31:00

by Jessica Clarke

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On 31 Jul 2023, at 20:47, Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 09:46:07AM -0700, Ian Rogers wrote:
>> On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <[email protected]> wrote:
>>> I have just had the answer internally (thanks to @Brendan Sweeney):
>>> csr modifications can alter how the memory is accessed (satp which
>>> changes the address space, sum which allows/disallows userspace
>>> access...), so we need the memory barrier to make sure the compiler
>>> does not reorder the memory accesses.
>>
>> The conditions you mention shouldn't apply here though? Even if you
>> add a memory barrier for the compiler, what is stopping the hardware
>> reordering loads and stores? If it absolutely has to be there then a
>> comment something like "There is a bug in riscv where the csrr
>> instruction can clobber memory, breaking future reads, and somehow this
>> constraint fixes it by ... ".
>
> If the hardware doesn't know that writing to a csr can change how memory
> accesses are done and reorders memory accesses around that csr write,
> you've got a really broken piece of hardware on your hands ...
>
> I know nothing about risc-v, and maybe the definition says that you need
> to put a memory barrier before and/or after it in the instruction stream,
> but on all hardware I'm familiar with, writing to a CSR is an implicitly
> serialising operation.

satp has some special rules due to the interaction with TLBs. Enabling
/ disabling translation by toggling between Bare and non-Bare modes
doesn’t require any kind of fence, nor does changing the ASID (if not
recycling), but any other changes to satp (changing between
Sv39/Sv48/Sv57 or changing the page table base address) do require
fences (not really sure why to be honest, maybe something about having
separate kernel and user page tables?..). §5.2.1 Supervisor
Memory-Management Fence Instruction of the privileged spec (the one I’m
looking at is version 20211203 but dated October 4, 2022) details this.
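Purely as an illustration of that rule, with a made-up helper name and
not code from this series: changing the page-table base is the classic
case where the write to satp has to be followed by an sfence.vma before
the new translations may be relied upon.

/* hypothetical helper, supervisor-mode context assumed */
static inline void switch_page_table(unsigned long satp_val)
{
	__asm__ __volatile__ ("csrw satp, %0\n\t"
			      "sfence.vma"
			      : : "r" (satp_val) : "memory");
}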

Jess


2023-07-31 21:31:41

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 09:46:07AM -0700, Ian Rogers wrote:
> On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <[email protected]> wrote:
> > I have just had the answer internally (thanks to @Brendan Sweeney):
> > csr modifications can alter how the memory is accessed (satp which
> > changes the address space, sum which allows/disallows userspace
> > access...), so we need the memory barrier to make sure the compiler
> > does not reorder the memory accesses.
>
> The conditions you mention shouldn't apply here though? Even if you
> add a memory barrier for the compiler, what is stopping the hardware
> reordering loads and stores? If it absolutely has to be there then a
> comment something like "There is a bug in riscv where the csrr
> instruction can clobber memory, breaking future reads, and somehow this
> constraint fixes it by ... ".

If the hardware doesn't know that writing to a csr can change how memory
accesses are done and reorders memory accesses around that csr write,
you've got a really broken piece of hardware on your hands ...

I know nothing about risc-v, and maybe the definition says that you need
to put a memory barrier before and/or after it in the instruction stream,
but on all hardware I'm familiar with, writing to a CSR is an implicitly
serialising operation.

2023-08-01 07:44:18

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v4 09/10] tools: lib: perf: Implement riscv mmap support

On Mon, Jul 31, 2023 at 6:46 PM Ian Rogers <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 9:06 AM Alexandre Ghiti <[email protected]> wrote:
> >
> > On Mon, Jul 31, 2023 at 5:10 PM Ian Rogers <[email protected]> wrote:
> > >
> > > On Mon, Jul 31, 2023 at 3:27 AM Alexandre Ghiti <[email protected]> wrote:
> > > >
> > > > On Mon, Jul 31, 2023 at 12:15 PM Alexandre Ghiti <[email protected]> wrote:
> > > > >
> > > > > Hi Ian,
> > > > >
> > > > > On Fri, Jul 28, 2023 at 7:53 PM Ian Rogers <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Jul 27, 2023 at 7:28 AM Alexandre Ghiti <[email protected]> wrote:
> > > > > > >
> > > > > > > riscv now supports mmaping hardware counters so add what's needed to
> > > > > > > take advantage of that in libperf.
> > > > > > >
> > > > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > > > > Reviewed-by: Atish Patra <[email protected]>
> > > > > > > ---
> > > > > > > tools/lib/perf/mmap.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> > > > > > > 1 file changed, 65 insertions(+)
> > > > > > >
> > > > > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > > > > index 0d1634cedf44..378a163f0554 100644
> > > > > > > --- a/tools/lib/perf/mmap.c
> > > > > > > +++ b/tools/lib/perf/mmap.c
> > > > > > > @@ -392,6 +392,71 @@ static u64 read_perf_counter(unsigned int counter)
> > > > > > >
> > > > > > > static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
> > > > > > >
> > > > > > > +#elif __riscv_xlen == 64
> > > > > >
> > > > > > This is something of an odd guard, perhaps:
> > > > > > #elif defined(__riscv) && __riscv_xlen == 64
> > > > > >
> > > > > > That way it is more intention revealing that this is riscv code. Could
> > > > > > you add a comment relating to the __riscv_xlen ?
> > > > >
> > > > > I guess Andrew answered that already.
> > > > >
> > >
> > > Not sure. I still think it looks weird:
> > > ...
> > > #if defined(__i386__) || defined(__x86_64__)
> > > ...
> > > #elif defined(__aarch64__)
> > > ...
> > > #elif __riscv_xlen == 64
> > > ...
> > > #else
> > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > static u64 read_timestamp(void) { return 0; }
> > > #endif
> > >
> > > The first two are clearly #ifdef-ing architecture specific assembly
> > > code, under what conditions I get RISC-V code ¯\_(ツ)_/¯ At least worth
> > > a comment like "csrr is only available when you have xlens of 64
> > > because ..."
> >
> > __riscv_xlen indicates riscv64, which is the only target of this
> > patchset. But if you prefer, I don't mind adding back the
> > defined(__riscv) if I re-spin a new version.
>
> This kind of begs the question as to why there is no __riscv64 ifdef.
> The issue with xlen is that it isn't intention revealing, so for regular
> people trying to understand the code it would be nice to document it.

I understand, I'll add the defined(__riscv) and a comment to explain
the __riscv_xlen.
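Roughly, one way that guard and comment could read, mirroring the
existing #ifdef chain in mmap.c (the comment wording is only a
suggestion):

#if defined(__i386__) || defined(__x86_64__)
...
#elif defined(__aarch64__)
...
#elif defined(__riscv) && __riscv_xlen == 64
/*
 * __riscv_xlen is the width in bits of the base integer registers, so
 * this selects riscv64 only; rv32 would also need the upper-half
 * CYCLEH/TIMEH CSRs (see the rv32 TODO).
 */
...
#else
static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
static u64 read_timestamp(void) { return 0; }
#endif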

>
> > >
> > > > > >
> > > > > > > +
> > > > > > > +/* TODO: implement rv32 support */
> > > > > > > +
> > > > > > > +#define CSR_CYCLE 0xc00
> > > > > > > +#define CSR_TIME 0xc01
> > > > > > > +
> > > > > > > +#define csr_read(csr) \
> > > > > > > +({ \
> > > > > > > + register unsigned long __v; \
> > > > > > > + __asm__ __volatile__ ("csrr %0, " #csr \
> > > > > > > + : "=r" (__v) : \
> > > > > > > + : "memory"); \
> > > > > >
> > > > > > To avoid the macro pasting that could potentially go weird, could this be:
> > > > > >
> > > > > > __asm__ __volatile__ ("csrr %0, %1",
> > > > > > : "=r"(__v) /* outputs */
> > > > > > : "i"(csr) /* inputs */
> > > > > > : "memory" /* clobbers */)
> > > >
> > > > Forgot to answer this one: it compiles, but I have to admit that I
> > > > don't understand the difference and if that's correct (all macros in
> > > > arch/riscv/include/asm/csr.h use # to do this) and what benefits it
> > > > brings. Can you elaborate more on things that could "go weird"?
> > >
> > > So rather than use an input constraint for the asm block you are using
> > > the C preprocessor to paste in the csr argument. If csr is something
> > > like "1" then it looks good and you'll get "csrr %0,1". If you pass
> > > something like "1 << 31" then that will be pasted as "csrr %0, 1 <<
> > > 31" and that starts to get weird in the context of being in the
> > > assembler where it is unlikely the C operators work. Using the input
> > > constraint avoids this, causes the C compiler to check the type of the
> > > argument and you'll probably get more intelligible error messages as a
> > > consequence.
> > >
> >
> > Thanks. So if I'm not mistaken, in this exact context, given we only
> > use csr_read() through the csr_read_num() function, it seems ok right?
>
> So you've formed a cargo cult and the justification is not wanting to
> stop a copy-paste chain from somewhere else. This code itself will be
> copy-pasted, and we go to some lengths to encourage that by placing parts
> of it in include/uapi/linux/perf_event.h. It seems better to catch
> this issue early rather than propagate it.

OK, nothing to argue here, I'll use your first proposition in the next version.

>
> > > >
> > > > Thanks again,
> > > >
> > > > Alex
> > > >
> > > > > >
> > > > > > Also, why is this clobbering memory? Worth adding a comment.
> > > > >
> > > > > No idea, I see that it is also done this way in
> > > > > arch/riscv/include/asm/csr.h. @Atish Kumar Patra , @Palmer Dabbelt ?
> > >
> > > It would seem to make sense then not to have a memory constraint until
> > > we know why we're doing it?
> > >
> >
> > I have just had the answer internally (thanks to @Brendan Sweeney):
> > csr modifications can alter how the memory is accessed (satp which
> > changes the address space, sum which allows/disallows userspace
> > access...), so we need the memory barrier to make sure the compiler
> > does not reorder the memory accesses.
>
> The conditions you mention shouldn't apply here though? Even if you
> add a memory barrier for the compiler, what is stopping the hardware
> reordering loads and stores? If it absolutely has to be there then a
> comment something like "There is a bug in riscv where the csrr
> instruction can clobber memory, breaking future reads, and somehow this
> constraint fixes it by ... ".

You're right, it does not apply here, so I can remove this memory
barrier. Regarding the hardware that could speculate across a csr
instruction, that's dealt with in the riscv spec:

"In particular, a CSR access is performed after the execution of any
prior instructions in program order whose behavior modifies or is
modified by the CSR state and before the execution of any subsequent
instructions in program order whose behavior modifies or is modified
by the CSR state"
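With the two changes agreed above (the "i" input constraint and dropping
the "memory" clobber), the macro in the next revision might look roughly
like this; it is only a sketch of the direction, not the eventual patch:

#define csr_read(csr)						\
({								\
	register unsigned long __v;				\
	/* cycle/time are read-only CSRs, so no clobber is needed */ \
	__asm__ __volatile__ ("csrr %0, %1"			\
			      : "=r" (__v)			\
			      : "i" (csr));			\
	__v;							\
})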

Thanks for your comments,

Alex

>
> Thanks,
> Ian
>
> > Thanks,
> >
> > Alex
> >
> > > Thanks,
> > > Ian
> > >
> > > > >
> > > > > Thanks for your comments!
> > > > >
> > > > > Alex
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Ian
> > > > > >
> > > > > > > + __v; \
> > > > > > > +})
> > > > > > > +
> > > > > > > +static unsigned long csr_read_num(int csr_num)
> > > > > > > +{
> > > > > > > +#define switchcase_csr_read(__csr_num, __val) {\
> > > > > > > + case __csr_num: \
> > > > > > > + __val = csr_read(__csr_num); \
> > > > > > > + break; }
> > > > > > > +#define switchcase_csr_read_2(__csr_num, __val) {\
> > > > > > > + switchcase_csr_read(__csr_num + 0, __val) \
> > > > > > > + switchcase_csr_read(__csr_num + 1, __val)}
> > > > > > > +#define switchcase_csr_read_4(__csr_num, __val) {\
> > > > > > > + switchcase_csr_read_2(__csr_num + 0, __val) \
> > > > > > > + switchcase_csr_read_2(__csr_num + 2, __val)}
> > > > > > > +#define switchcase_csr_read_8(__csr_num, __val) {\
> > > > > > > + switchcase_csr_read_4(__csr_num + 0, __val) \
> > > > > > > + switchcase_csr_read_4(__csr_num + 4, __val)}
> > > > > > > +#define switchcase_csr_read_16(__csr_num, __val) {\
> > > > > > > + switchcase_csr_read_8(__csr_num + 0, __val) \
> > > > > > > + switchcase_csr_read_8(__csr_num + 8, __val)}
> > > > > > > +#define switchcase_csr_read_32(__csr_num, __val) {\
> > > > > > > + switchcase_csr_read_16(__csr_num + 0, __val) \
> > > > > > > + switchcase_csr_read_16(__csr_num + 16, __val)}
> > > > > > > +
> > > > > > > + unsigned long ret = 0;
> > > > > > > +
> > > > > > > + switch (csr_num) {
> > > > > > > + switchcase_csr_read_32(CSR_CYCLE, ret)
> > > > > > > + default:
> > > > > > > + break;
> > > > > > > + }
> > > > > > > +
> > > > > > > + return ret;
> > > > > > > +#undef switchcase_csr_read_32
> > > > > > > +#undef switchcase_csr_read_16
> > > > > > > +#undef switchcase_csr_read_8
> > > > > > > +#undef switchcase_csr_read_4
> > > > > > > +#undef switchcase_csr_read_2
> > > > > > > +#undef switchcase_csr_read
> > > > > > > +}
> > > > > > > +
> > > > > > > +static u64 read_perf_counter(unsigned int counter)
> > > > > > > +{
> > > > > > > + return csr_read_num(CSR_CYCLE + counter);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static u64 read_timestamp(void)
> > > > > > > +{
> > > > > > > + return csr_read_num(CSR_TIME);
> > > > > > > +}
> > > > > > > +
> > > > > > > #else
> > > > > > > static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
> > > > > > > static u64 read_timestamp(void) { return 0; }
> > > > > > > --
> > > > > > > 2.39.2
> > > > > > >