LinuxLists.cc - [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

2018-04-02 12:33:08

Subject: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

This implements the baseline PMU for RISC-V platforms.

To ease future PMU portings, a guide is also written, containing
perf concepts, arch porting practices and some hints.

Changes in v2:
- Fix the bug reported by Alex, which was caused by not sufficient
initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
discussion.

Alan Kao (2):
perf: riscv: preliminary RISC-V support
perf: riscv: Add Document for Future Porting Guide

Documentation/riscv/pmu.txt | 249 +++++++++++++++++++
arch/riscv/Kconfig | 12 +
arch/riscv/include/asm/perf_event.h | 76 +++++-
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
5 files changed, 802 insertions(+), 4 deletions(-)
create mode 100644 Documentation/riscv/pmu.txt
create mode 100644 arch/riscv/kernel/perf_event.c

--
2.16.2

2018-04-02 12:33:36

by Alan Kao

[permalink] [raw]

Subject: [PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide

Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Signed-off-by: Alan Kao <[email protected]>
---
Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 249 insertions(+)
create mode 100644 Documentation/riscv/pmu.txt

diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
new file mode 100644
index 000000000000..a3e930ed5141
--- /dev/null
+++ b/Documentation/riscv/pmu.txt
@@ -0,0 +1,249 @@
+Supporting PMUs on RISC-V platforms
+==========================================
+Alan Kao <[email protected]>, Mar 2018
+
+Introduction
+------------
+
+As of this writing, perf_event-related features mentioned in The RISC-V ISA
+Privileged Version 1.10 are as follows:
+(please check the manual for more details)
+
+* [m|s]counteren
+* mcycle[h], cycle[h]
+* minstret[h], instret[h]
+* mhpeventx, mhpcounterx[h]
+
+With such function set only, porting perf would require a lot of work, due to
+the lack of the following general architectural performance monitoring features:
+
+* Enabling/Disabling counters
+ Counters are just free-running all the time in our case.
+* Interrupt caused by counter overflow
+ No such design in the spec.
+* Interrupt indicator
+ It is not possible to have many interrupt ports for all counters, so an
+ interrupt indicator is required for software to tell which counter has
+ just overflowed.
+* Writing to counters
+ There will be an SBI to support this since the kernel cannot modify the
+ counters [1]. Alternatively, some vendor considers to implement
+ hardware-extension for M-S-U model machines to write counters directly.
+
+This document aims to provide developers a quick guide on supporting their
+PMUs in the kernel. The following sections briefly explain perf' mechanism
+and todos.
+
+You may check previous discussions here [1][2]. Also, it might be helpful
+to check the appendix for related kernel structures.
+
+
+1. Initialization
+-----------------
+
+*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
+various methods according to perf's internal convention and PMU-specific
+parameters. One should declare such instance to represent the PMU. By default,
+*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
+basic support to a baseline QEMU model.
+
+Then he/she can either assign the instance's pointer to *riscv_pmu* so that
+the minimal and already-implemented logic can be leveraged, or invent his/her
+own *riscv_init_platform_pmu* implementation.
+
+In other words, existing sources of *riscv_base_pmu* merely provide a
+reference implementation. Developers can flexibly decide how many parts they
+can leverage, and in the most extreme case, they can customize every function
+according to their needs.
+
+
+2. Event Initialization
+-----------------------
+
+When a user launches a perf command to monitor some events, it is first
+interpreted by the userspace perf tool into multiple *perf_event_open*
+system calls, and then each of them calls to the body of *event_init*
+member function that was assigned in the previous step. In *riscv_base_pmu*'s
+case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by user
+into bitmap, so that HW-related control registers or counters can directly be
+manipulated. The translation is based on the mappings and methods provided in
+*riscv_pmu*.
+
+Note that some features can be done in this stage as well:
+
+(1) interrupt setting, which is stated in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+*perf_adjust_period*, usually something like the follows:
+
+if (!is_sampling_event(event)) {
+ hwc->sample_period = x86_pmu.max_period;
+ hwc->last_period = hwc->sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+}
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+------------
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method. In common
+practice, this should be a code segment like
+
+int x86_reserve_hardware(void)
+{
+ int err = 0;
+
+ if (!atomic_inc_not_zero(&pmc_refcount)) {
+ mutex_lock(&pmc_reserve_mutex);
+ if (atomic_read(&pmc_refcount) == 0) {
+ if (!reserve_pmc_hardware())
+ err = -EBUSY;
+ else
+ reserve_ds_buffers();
+ }
+ if (!err)
+ atomic_inc(&pmc_refcount);
+ mutex_unlock(&pmc_reserve_mutex);
+ }
+
+ return err;
+}
+
+And the magic is in *reserve_pmc_hardware*, which usually does atomic
+operations to make implemented IRQ accessible from some global function pointer.
+*release_pmc_hardware* serves the opposite purpose, and it is used in event
+destructors mentioned in previous section.
+
+(Note: From the implementations in all the architectures, the *reserve/release*
+pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading.
+It does NOT deal with the binding between an event and a physical counter,
+which will be introduced in the next section.)
+
+3.2. IRQ Structure
+
+Basically, a IRQ runs the following pseudo code:
+
+for each hardware counter that triggered this overflow
+
+ get the event of this counter
+
+ // following two steps are defined as *read()*,
+ // check the section Reading/Writing Counters for details.
+ count the delta value since previous interrupt
+ update the event->count (# event occurs) by adding delta, and
+ event->hw.period_left by subtracting delta
+
+ if the event overflows
+ sample data
+ set the counter appropriately for the next overflow
+
+ if the event overflows again
+ too frequently, throttle this event
+ fi
+ fi
+
+end for
+
+However as of this writing, none of the RISC-V implementations have designed an
+interrupt for perf, so the details are to be completed in the future.
+
+4. Reading/Writing Counters
+---------------------------
+
+They seem symmetric but perf treats them quite differently. For reading, there
+is a *read* interface in *struct pmu*, but it serves more than just reading.
+According to the context, the *read* function not only read the content of the
+counter (event->count), but also update the left period to the next interrupt
+(event->hw.period_left).
+
+But the core of perf does not need direct write to counters. Writing counters
+hides behind the abstraction of 1) *pmu->start*, literally start counting so one
+has to set the counter to a good value for the next interrupt; 2) inside the IRQ
+it should set the counter with the same reason.
+
+Reading is not a problem in RISC-V but writing would need some effort, since
+counters are not allowed to be written by S-mode.
+
+
+5. add()/del()/start()/stop()
+-----------------------------
+
+Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
+starts/stop the counter of some event in the PMU. All of them take the same
+arguments: *struct perf_event *event* and *int flag*.
+
+Consider perf as a state machine, then you will find that these functions serve
+as the state transition process between those states.
+Three states (event->hw.state) are defined:
+
+* PERF_HES_STOPPED: the counter is stopped
+* PERF_HES_UPTODATE: the event->count is up-to-date
+* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now
+
+A normal flow of these state transitions are as follows:
+
+* A user launches a perf event, resulting in calling to *event_init*.
+* When being context-switched in, *add* is called by the perf core, with flag
+ PERF_EF_START, which mean that the event should be started after it is added.
+ In this stage, an general event is binded to a physical counter, if any.
+ The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
+ stopped, and the (software) event count does not need updating.
+** *start* is then called, and the counter is enabled.
+ With flag PERF_EF_RELOAD, it write the counter to an appropriate value (check
+ previous section for detail).
+ No writing is made if the flag does not contain PERF_EF_RELOAD.
+ The state now is reset to none, because it is neither stopped nor update
+ (the counting already starts)
+* When being context-switched out, *del* is called. It then checkout all the
+ events in the PMU and call *stop* to update their counts.
+** *stop* is called by *del*
+ and the perf core with flag PERF_EF_UPDATE, and it often shares the same
+ subroutine as *read* with the same logic.
+ The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
+
+** Life cycles of these two pairs: *add* and *del* are called repeatedly as
+ tasks switch in-and-out; *start* and *stop* is also called when the perf core
+ needs a quick stop-and-start, for instance, when the interrupt period is being
+ adjusted.
+
+Current implementation is sufficient for now and can be easily extend to
+features in the future.
+
+A. Related Structures
+---------------------
+
+* struct pmu: include/linux/perf_events.h
+* struct riscv_pmu: arch/riscv/include/asm/perf_events.h
+
+ Both structures are designed to be read-only.
+
+ *struct pmu* defines some function pointer interfaces, and most of them take
+*struct perf_event* as a main argument, dealing with perf events according to
+perf's internal state machine (check kernel/events/core.c for details).
+
+ *struct riscv_pmu* defines PMU-specific parameters. The naming follows the
+convention of all other architectures.
+
+* struct perf_event: include/linux/perf_events.h
+* struct hw_perf_event
+
+ The generic structure that represents perf events, and the hardware-related
+details.
+
+* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h
+
+ The structure that holds the status of events, has two fixed members:
+the number of events and the array of the events.
+
+References
+----------
+
+[1] https://github.com/riscv/riscv-linux/pull/124
+[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
--
2.16.2

2018-04-02 12:34:40

by Alan Kao

[permalink] [raw]

Subject: [PATCH v2 1/2] perf: riscv: preliminary RISC-V support

This patch provide a basic PMU, riscv_base_pmu, which supports two
general hardware event, instructions and cycles. Furthermore, this
PMU serves as a reference implementation to ease the portings in
the future.

riscv_base_pmu should be able to run on any RISC-V machine that
conforms to the Priv-Spec. Note that the latest qemu model hasn't
fully support a proper behavior of Priv-Spec 1.10 yet, but work
around should be easy with very small fixes. Please check
https://github.com/riscv/riscv-qemu/pull/115 for future updates.

Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Signed-off-by: Alan Kao <[email protected]>
---
arch/riscv/Kconfig | 12 +
arch/riscv/include/asm/perf_event.h | 76 +++++-
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
4 files changed, 553 insertions(+), 4 deletions(-)
create mode 100644 arch/riscv/kernel/perf_event.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c22ebe08e902..3fbf19456c9a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -203,6 +203,18 @@ config RISCV_ISA_C
config RISCV_ISA_A
def_bool y

+menu "PMU type"
+ depends on PERF_EVENTS
+
+config RISCV_BASE_PMU
+ bool "Base Performance Monitoring Unit"
+ def_bool y
+ help
+ A base PMU that serves as a reference implementation and has limited
+ feature of perf.
+
+endmenu
+
endmenu

menu "Kernel type"
diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
index e13d2ff29e83..98e2efb02d25 100644
--- a/arch/riscv/include/asm/perf_event.h
+++ b/arch/riscv/include/asm/perf_event.h
@@ -1,13 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2018 SiFive
+ * Copyright (C) 2018 Andes Technology Corporation
*
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
*/

#ifndef _ASM_RISCV_PERF_EVENT_H
#define _ASM_RISCV_PERF_EVENT_H

+#include <linux/perf_event.h>
+#include <linux/ptrace.h>
+
+#define RISCV_BASE_COUNTERS 2
+
+/*
+ * The RISCV_MAX_COUNTERS parameter should be specified.
+ */
+
+#ifdef CONFIG_RISCV_BASE_PMU
+#define RISCV_MAX_COUNTERS 2
+#endif
+
+#ifndef RISCV_MAX_COUNTERS
+#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
+#endif
+
+/*
+ * These are the indexes of bits in counteren register *minus* 1,
+ * except for cycle. It would be coherent if it can directly mapped
+ * to counteren bit definition, but there is a *time* register at
+ * counteren[1]. Per-cpu structure is scarce resource here.
+ *
+ * According to the spec, an implementation can support counter up to
+ * mhpmcounter31, but many high-end processors has at most 6 general
+ * PMCs, we give the definition to MHPMCOUNTER8 here.
+ */
+#define RISCV_PMU_CYCLE 0
+#define RISCV_PMU_INSTRET 1
+#define RISCV_PMU_MHPMCOUNTER3 2
+#define RISCV_PMU_MHPMCOUNTER4 3
+#define RISCV_PMU_MHPMCOUNTER5 4
+#define RISCV_PMU_MHPMCOUNTER6 5
+#define RISCV_PMU_MHPMCOUNTER7 6
+#define RISCV_PMU_MHPMCOUNTER8 7
+
+#define RISCV_OP_UNSUPP (-EOPNOTSUPP)
+
+struct cpu_hw_events {
+ /* # currently enabled events*/
+ int n_events;
+ /* currently enabled events */
+ struct perf_event *events[RISCV_MAX_COUNTERS];
+ /* vendor-defined PMU data */
+ void *platform;
+};
+
+struct riscv_pmu {
+ struct pmu *pmu;
+
+ /* generic hw/cache events table */
+ const int *hw_events;
+ const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
+ [PERF_COUNT_HW_CACHE_OP_MAX]
+ [PERF_COUNT_HW_CACHE_RESULT_MAX];
+ /* method used to map hw/cache events */
+ int (*map_hw_event)(u64 config);
+ int (*map_cache_event)(u64 config);
+
+ /* max generic hw events in map */
+ int max_events;
+ /* number total counters, 2(base) + x(general) */
+ int num_counters;
+ /* the width of the counter */
+ int counter_width;
+
+ /* vendor-defined PMU features */
+ void *platform;
+};
+
#endif /* _ASM_RISCV_PERF_EVENT_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index ffa439d4a364..f50d19816757 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
obj-$(CONFIG_FUNCTION_TRACER) += mcount.o
obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
+obj-$(CONFIG_PERF_EVENTS) += perf_event.o

clean:
diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
new file mode 100644
index 000000000000..cac4abd0a884
--- /dev/null
+++ b/arch/riscv/kernel/perf_event.c
@@ -0,0 +1,468 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2008 Thomas Gleixner <[email protected]>
+ * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
+ * Copyright (C) 2009 Jaswinder Singh Rajput
+ * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
+ * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
+ * Copyright (C) 2009 Intel Corporation, <[email protected]>
+ * Copyright (C) 2009 Google, Inc., Stephane Eranian
+ * Copyright 2014 Tilera Corporation. All Rights Reserved.
+ * Copyright (C) 2018 Andes Technology Corporation
+ *
+ * Perf_events support for RISC-V platforms.
+ *
+ * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
+ * functionality for perf event to fully work, this file provides
+ * the very basic framework only.
+ *
+ * For platform portings, please check Documentations/riscv/pmu.txt.
+ *
+ * The Copyright line includes x86 and tile ones.
+ */
+
+#include <linux/kprobes.h>
+#include <linux/kernel.h>
+#include <linux/kdebug.h>
+#include <linux/mutex.h>
+#include <linux/bitmap.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/perf_event.h>
+#include <linux/atomic.h>
+#include <asm/perf_event.h>
+
+static const struct riscv_pmu *riscv_pmu __read_mostly;
+static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
+
+/*
+ * Hardware & cache maps and their methods
+ */
+
+static const int riscv_hw_event_map[] = {
+ [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE,
+ [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET,
+ [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP,
+ [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP,
+ [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP,
+ [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP,
+};
+
+#define C(x) PERF_COUNT_HW_CACHE_##x
+static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
+[PERF_COUNT_HW_CACHE_OP_MAX]
+[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+ [C(L1D)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+ [C(L1I)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+ [C(LL)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+ [C(DTLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+ [C(ITLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+ [C(BPU)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
+ [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
+ },
+ },
+};
+
+static int riscv_map_hw_event(u64 config)
+{
+ if (config >= riscv_pmu->max_events)
+ return -EINVAL;
+
+ return riscv_pmu->hw_events[config];
+}
+
+int riscv_map_cache_decode(u64 config, unsigned int *type,
+ unsigned int *op, unsigned int *result)
+{
+ return -ENOENT;
+}
+
+static int riscv_map_cache_event(u64 config)
+{
+ unsigned int type, op, result;
+ int err = -ENOENT;
+ int code;
+
+ err = riscv_map_cache_decode(config, &type, &op, &result);
+ if (!riscv_pmu->cache_events || err)
+ return err;
+
+ if (type >= PERF_COUNT_HW_CACHE_MAX ||
+ op >= PERF_COUNT_HW_CACHE_OP_MAX ||
+ result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+ return -EINVAL;
+
+ code = (*riscv_pmu->cache_events)[type][op][result];
+ if (code == RISCV_OP_UNSUPP)
+ return -EINVAL;
+
+ return code;
+}
+
+/*
+ * Low-level functions: reading/writing counters
+ */
+
+static inline u64 read_counter(int idx)
+{
+ u64 val = 0;
+
+ switch (idx) {
+ case RISCV_PMU_CYCLE:
+ val = csr_read(cycle);
+ break;
+ case RISCV_PMU_INSTRET:
+ val = csr_read(instret);
+ break;
+ default:
+ WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
+ return -EINVAL;
+ }
+
+ return val;
+}
+
+static inline void write_counter(int idx, u64 value)
+{
+ /* currently not supported */
+}
+
+/*
+ * pmu->read: read and update the counter
+ *
+ * Other architectures' implementation often have a xxx_perf_event_update
+ * routine, which can return counter values when called in the IRQ, but
+ * return void when being called by the pmu->read method.
+ */
+static void riscv_pmu_read(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ u64 prev_raw_count, new_raw_count;
+ u64 oldval;
+ int idx = hwc->idx;
+ u64 delta;
+
+ do {
+ prev_raw_count = local64_read(&hwc->prev_count);
+ new_raw_count = read_counter(idx);
+
+ oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+ new_raw_count);
+ } while (oldval != prev_raw_count);
+
+ /*
+ * delta is the value to update the counter we maintain in the kernel.
+ */
+ delta = (new_raw_count - prev_raw_count) &
+ ((1ULL << riscv_pmu->counter_width) - 1);
+ local64_add(delta, &event->count);
+ /*
+ * Something like local64_sub(delta, &hwc->period_left) here is
+ * needed if there is an interrupt for perf.
+ */
+}
+
+/*
+ * State transition functions:
+ *
+ * stop()/start() & add()/del()
+ */
+
+/*
+ * pmu->stop: stop the counter
+ */
+static void riscv_pmu_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ riscv_pmu_read(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+/*
+ * pmu->start: start the event.
+ */
+static void riscv_pmu_start(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+ return;
+
+ if (flags & PERF_EF_RELOAD) {
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
+ /*
+ * Set the counter to the period to the next interrupt here,
+ * if you have any.
+ */
+ }
+
+ hwc->state = 0;
+ perf_event_update_userpage(event);
+
+ /*
+ * Since we cannot write to counters, this serves as an initialization
+ * to the delta-mechanism in pmu->read(); otherwise, the delta would be
+ * wrong when pmu->read is called for the first time.
+ */
+ local64_set(&hwc->prev_count, read_counter(hwc->idx));
+}
+
+/*
+ * pmu->add: add the event to PMU.
+ */
+static int riscv_pmu_add(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (cpuc->n_events == riscv_pmu->num_counters)
+ return -ENOSPC;
+
+ /*
+ * We don't have general conunters, so no binding-event-to-counter
+ * process here.
+ *
+ * Indexing using hwc->config generally not works, since config may
+ * contain extra information, but here the only info we have in
+ * hwc->config is the event index.
+ */
+ hwc->idx = hwc->config;
+ cpuc->events[hwc->idx] = event;
+ cpuc->n_events++;
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+ if (flags & PERF_EF_START)
+ riscv_pmu_start(event, PERF_EF_RELOAD);
+
+ return 0;
+}
+
+/*
+ * pmu->del: delete the event from PMU.
+ */
+static void riscv_pmu_del(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+
+ cpuc->events[hwc->idx] = NULL;
+ cpuc->n_events--;
+ riscv_pmu_stop(event, PERF_EF_UPDATE);
+ perf_event_update_userpage(event);
+}
+
+/*
+ * Interrupt
+ */
+
+static DEFINE_MUTEX(pmc_reserve_mutex);
+typedef void (*perf_irq_t)(void *riscv_perf_irq);
+perf_irq_t perf_irq;
+
+void riscv_pmu_handle_irq(void *riscv_perf_irq)
+{
+}
+
+static perf_irq_t reserve_pmc_hardware(void)
+{
+ perf_irq_t old;
+
+ mutex_lock(&pmc_reserve_mutex);
+ old = perf_irq;
+ perf_irq = &riscv_pmu_handle_irq;
+ mutex_unlock(&pmc_reserve_mutex);
+
+ return old;
+}
+
+void release_pmc_hardware(void)
+{
+ mutex_lock(&pmc_reserve_mutex);
+ perf_irq = NULL;
+ mutex_unlock(&pmc_reserve_mutex);
+}
+
+/*
+ * Event Initialization
+ */
+
+static atomic_t riscv_active_events;
+
+static void riscv_event_destroy(struct perf_event *event)
+{
+ if (atomic_dec_return(&riscv_active_events) == 0)
+ release_pmc_hardware();
+}
+
+static int riscv_event_init(struct perf_event *event)
+{
+ struct perf_event_attr *attr = &event->attr;
+ struct hw_perf_event *hwc = &event->hw;
+ perf_irq_t old_irq_handler = NULL;
+ int code;
+
+ if (atomic_inc_return(&riscv_active_events) == 1)
+ old_irq_handler = reserve_pmc_hardware();
+
+ if (old_irq_handler) {
+ pr_warn("PMC hardware busy (reserved by oprofile)\n");
+ atomic_dec(&riscv_active_events);
+ return -EBUSY;
+ }
+
+ switch (event->attr.type) {
+ case PERF_TYPE_HARDWARE:
+ code = riscv_pmu->map_hw_event(attr->config);
+ break;
+ case PERF_TYPE_HW_CACHE:
+ code = riscv_pmu->map_cache_event(attr->config);
+ break;
+ case PERF_TYPE_RAW:
+ return -EOPNOTSUPP;
+ default:
+ return -ENOENT;
+ }
+
+ event->destroy = riscv_event_destroy;
+ if (code < 0) {
+ event->destroy(event);
+ return code;
+ }
+
+ /*
+ * idx is set to -1 because the index of a general event should not be
+ * decided until binding to some counter in pmu->add().
+ *
+ * But since we don't have such support, later in pmu->add(), we just
+ * use hwc->config as the index instead.
+ */
+ hwc->config = code;
+ hwc->idx = -1;
+
+ return 0;
+}
+
+/*
+ * Initialization
+ */
+
+static struct pmu min_pmu = {
+ .name = "riscv-base",
+ .event_init = riscv_event_init,
+ .add = riscv_pmu_add,
+ .del = riscv_pmu_del,
+ .start = riscv_pmu_start,
+ .stop = riscv_pmu_stop,
+ .read = riscv_pmu_read,
+};
+
+static const struct riscv_pmu riscv_base_pmu = {
+ .pmu = &min_pmu,
+ .max_events = ARRAY_SIZE(riscv_hw_event_map),
+ .map_hw_event = riscv_map_hw_event,
+ .hw_events = riscv_hw_event_map,
+ .map_cache_event = riscv_map_cache_event,
+ .cache_events = &riscv_cache_event_map,
+ .counter_width = 63,
+ .num_counters = RISCV_BASE_COUNTERS + 0,
+};
+
+struct pmu * __weak __init riscv_init_platform_pmu(void)
+{
+ riscv_pmu = &riscv_base_pmu;
+ return riscv_pmu->pmu;
+}
+
+int __init init_hw_perf_events(void)
+{
+ struct pmu *pmu = riscv_init_platform_pmu();
+
+ perf_irq = NULL;
+ perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
+ return 0;
+}
+arch_initcall(init_hw_perf_events);
--
2.16.2

2018-04-02 12:35:42

by Alan Kao

[permalink] [raw]

Subject: Re: [PATCH v2 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

Sorry for the lack of version prefix in the title. This patchset should be
version 2.

On Mon, Apr 02, 2018 at 08:31:22PM +0800, Alan Kao wrote:
> This implements the baseline PMU for RISC-V platforms.
>
> To ease future PMU portings, a guide is also written, containing
> perf concepts, arch porting practices and some hints.
>
> Changes in v2:
> - Fix the bug reported by Alex, which was caused by not sufficient
> initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
> discussion.
>
> Alan Kao (2):
> perf: riscv: preliminary RISC-V support
> perf: riscv: Add Document for Future Porting Guide
>
> Documentation/riscv/pmu.txt | 249 +++++++++++++++++++
> arch/riscv/Kconfig | 12 +
> arch/riscv/include/asm/perf_event.h | 76 +++++-
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 802 insertions(+), 4 deletions(-)
> create mode 100644 Documentation/riscv/pmu.txt
> create mode 100644 arch/riscv/kernel/perf_event.c
>
> --
> 2.16.2
>

2018-04-03 01:37:12

by Alex Solomatnikov

[permalink] [raw]

Subject: Re: [PATCH v2 1/2] perf: riscv: preliminary RISC-V support

This works for cycle and instruction counts.

Alex

On Mon, Apr 2, 2018 at 5:31 AM, Alan Kao <[email protected]> wrote:
>
> This patch provide a basic PMU, riscv_base_pmu, which supports two
> general hardware event, instructions and cycles. Furthermore, this
> PMU serves as a reference implementation to ease the portings in
> the future.
>
> riscv_base_pmu should be able to run on any RISC-V machine that
> conforms to the Priv-Spec. Note that the latest qemu model hasn't
> fully support a proper behavior of Priv-Spec 1.10 yet, but work
> around should be easy with very small fixes. Please check
> https://github.com/riscv/riscv-qemu/pull/115 for future updates.
>
> Cc: Nick Hu <[email protected]>
> Cc: Greentime Hu <[email protected]>
> Signed-off-by: Alan Kao <[email protected]>
> ---
> arch/riscv/Kconfig | 12 +
> arch/riscv/include/asm/perf_event.h | 76 +++++-
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 553 insertions(+), 4 deletions(-)
> create mode 100644 arch/riscv/kernel/perf_event.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index c22ebe08e902..3fbf19456c9a 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -203,6 +203,18 @@ config RISCV_ISA_C
> config RISCV_ISA_A
> def_bool y
>
> +menu "PMU type"
> + depends on PERF_EVENTS
> +
> +config RISCV_BASE_PMU
> + bool "Base Performance Monitoring Unit"
> + def_bool y
> + help
> + A base PMU that serves as a reference implementation and has limited
> + feature of perf.
> +
> +endmenu
> +
> endmenu
>
> menu "Kernel type"
> diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
> index e13d2ff29e83..98e2efb02d25 100644
> --- a/arch/riscv/include/asm/perf_event.h
> +++ b/arch/riscv/include/asm/perf_event.h
> @@ -1,13 +1,81 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> /*
> * Copyright (C) 2018 SiFive
> + * Copyright (C) 2018 Andes Technology Corporation
> *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public Licence
> - * as published by the Free Software Foundation; either version
> - * 2 of the Licence, or (at your option) any later version.
> */
>
> #ifndef _ASM_RISCV_PERF_EVENT_H
> #define _ASM_RISCV_PERF_EVENT_H
>
> +#include <linux/perf_event.h>
> +#include <linux/ptrace.h>
> +
> +#define RISCV_BASE_COUNTERS 2
> +
> +/*
> + * The RISCV_MAX_COUNTERS parameter should be specified.
> + */
> +
> +#ifdef CONFIG_RISCV_BASE_PMU
> +#define RISCV_MAX_COUNTERS 2
> +#endif
> +
> +#ifndef RISCV_MAX_COUNTERS
> +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
> +#endif
> +
> +/*
> + * These are the indexes of bits in counteren register *minus* 1,
> + * except for cycle. It would be coherent if it can directly mapped
> + * to counteren bit definition, but there is a *time* register at
> + * counteren[1]. Per-cpu structure is scarce resource here.
> + *
> + * According to the spec, an implementation can support counter up to
> + * mhpmcounter31, but many high-end processors has at most 6 general
> + * PMCs, we give the definition to MHPMCOUNTER8 here.
> + */
> +#define RISCV_PMU_CYCLE 0
> +#define RISCV_PMU_INSTRET 1
> +#define RISCV_PMU_MHPMCOUNTER3 2
> +#define RISCV_PMU_MHPMCOUNTER4 3
> +#define RISCV_PMU_MHPMCOUNTER5 4
> +#define RISCV_PMU_MHPMCOUNTER6 5
> +#define RISCV_PMU_MHPMCOUNTER7 6
> +#define RISCV_PMU_MHPMCOUNTER8 7
> +
> +#define RISCV_OP_UNSUPP (-EOPNOTSUPP)
> +
> +struct cpu_hw_events {
> + /* # currently enabled events*/
> + int n_events;
> + /* currently enabled events */
> + struct perf_event *events[RISCV_MAX_COUNTERS];
> + /* vendor-defined PMU data */
> + void *platform;
> +};
> +
> +struct riscv_pmu {
> + struct pmu *pmu;
> +
> + /* generic hw/cache events table */
> + const int *hw_events;
> + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
> + [PERF_COUNT_HW_CACHE_OP_MAX]
> + [PERF_COUNT_HW_CACHE_RESULT_MAX];
> + /* method used to map hw/cache events */
> + int (*map_hw_event)(u64 config);
> + int (*map_cache_event)(u64 config);
> +
> + /* max generic hw events in map */
> + int max_events;
> + /* number total counters, 2(base) + x(general) */
> + int num_counters;
> + /* the width of the counter */
> + int counter_width;
> +
> + /* vendor-defined PMU features */
> + void *platform;
> +};
> +
> #endif /* _ASM_RISCV_PERF_EVENT_H */
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index ffa439d4a364..f50d19816757 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o
> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
> +obj-$(CONFIG_PERF_EVENTS) += perf_event.o
>
> clean:
> diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
> new file mode 100644
> index 000000000000..cac4abd0a884
> --- /dev/null
> +++ b/arch/riscv/kernel/perf_event.c
> @@ -0,0 +1,468 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2008 Thomas Gleixner <[email protected]>
> + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
> + * Copyright (C) 2009 Jaswinder Singh Rajput
> + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
> + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
> + * Copyright (C) 2009 Intel Corporation, <[email protected]>
> + * Copyright (C) 2009 Google, Inc., Stephane Eranian
> + * Copyright 2014 Tilera Corporation. All Rights Reserved.
> + * Copyright (C) 2018 Andes Technology Corporation
> + *
> + * Perf_events support for RISC-V platforms.
> + *
> + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
> + * functionality for perf event to fully work, this file provides
> + * the very basic framework only.
> + *
> + * For platform portings, please check Documentations/riscv/pmu.txt.
> + *
> + * The Copyright line includes x86 and tile ones.
> + */
> +
> +#include <linux/kprobes.h>
> +#include <linux/kernel.h>
> +#include <linux/kdebug.h>
> +#include <linux/mutex.h>
> +#include <linux/bitmap.h>
> +#include <linux/irq.h>
> +#include <linux/interrupt.h>
> +#include <linux/perf_event.h>
> +#include <linux/atomic.h>
> +#include <asm/perf_event.h>
> +
> +static const struct riscv_pmu *riscv_pmu __read_mostly;
> +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> +
> +/*
> + * Hardware & cache maps and their methods
> + */
> +
> +static const int riscv_hw_event_map[] = {
> + [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE,
> + [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET,
> + [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP,
> + [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP,
> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP,
> + [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP,
> + [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP,
> +};
> +
> +#define C(x) PERF_COUNT_HW_CACHE_##x
> +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
> +[PERF_COUNT_HW_CACHE_OP_MAX]
> +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> + [C(L1D)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> + [C(L1I)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> + [C(LL)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> + [C(DTLB)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> + [C(ITLB)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> + [C(BPU)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> + [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> + },
> + },
> +};
> +
> +static int riscv_map_hw_event(u64 config)
> +{
> + if (config >= riscv_pmu->max_events)
> + return -EINVAL;
> +
> + return riscv_pmu->hw_events[config];
> +}
> +
> +int riscv_map_cache_decode(u64 config, unsigned int *type,
> + unsigned int *op, unsigned int *result)
> +{
> + return -ENOENT;
> +}
> +
> +static int riscv_map_cache_event(u64 config)
> +{
> + unsigned int type, op, result;
> + int err = -ENOENT;
> + int code;
> +
> + err = riscv_map_cache_decode(config, &type, &op, &result);
> + if (!riscv_pmu->cache_events || err)
> + return err;
> +
> + if (type >= PERF_COUNT_HW_CACHE_MAX ||
> + op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> + result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> + return -EINVAL;
> +
> + code = (*riscv_pmu->cache_events)[type][op][result];
> + if (code == RISCV_OP_UNSUPP)
> + return -EINVAL;
> +
> + return code;
> +}
> +
> +/*
> + * Low-level functions: reading/writing counters
> + */
> +
> +static inline u64 read_counter(int idx)
> +{
> + u64 val = 0;
> +
> + switch (idx) {
> + case RISCV_PMU_CYCLE:
> + val = csr_read(cycle);
> + break;
> + case RISCV_PMU_INSTRET:
> + val = csr_read(instret);
> + break;
> + default:
> + WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
> + return -EINVAL;
> + }
> +
> + return val;
> +}
> +
> +static inline void write_counter(int idx, u64 value)
> +{
> + /* currently not supported */
> +}
> +
> +/*
> + * pmu->read: read and update the counter
> + *
> + * Other architectures' implementation often have a xxx_perf_event_update
> + * routine, which can return counter values when called in the IRQ, but
> + * return void when being called by the pmu->read method.
> + */
> +static void riscv_pmu_read(struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + u64 prev_raw_count, new_raw_count;
> + u64 oldval;
> + int idx = hwc->idx;
> + u64 delta;
> +
> + do {
> + prev_raw_count = local64_read(&hwc->prev_count);
> + new_raw_count = read_counter(idx);
> +
> + oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> + new_raw_count);
> + } while (oldval != prev_raw_count);
> +
> + /*
> + * delta is the value to update the counter we maintain in the kernel.
> + */
> + delta = (new_raw_count - prev_raw_count) &
> + ((1ULL << riscv_pmu->counter_width) - 1);
> + local64_add(delta, &event->count);
> + /*
> + * Something like local64_sub(delta, &hwc->period_left) here is
> + * needed if there is an interrupt for perf.
> + */
> +}
> +
> +/*
> + * State transition functions:
> + *
> + * stop()/start() & add()/del()
> + */
> +
> +/*
> + * pmu->stop: stop the counter
> + */
> +static void riscv_pmu_stop(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> + hwc->state |= PERF_HES_STOPPED;
> +
> + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> + riscv_pmu_read(event);
> + hwc->state |= PERF_HES_UPTODATE;
> + }
> +}
> +
> +/*
> + * pmu->start: start the event.
> + */
> +static void riscv_pmu_start(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> + return;
> +
> + if (flags & PERF_EF_RELOAD) {
> + WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
> +
> + /*
> + * Set the counter to the period to the next interrupt here,
> + * if you have any.
> + */
> + }
> +
> + hwc->state = 0;
> + perf_event_update_userpage(event);
> +
> + /*
> + * Since we cannot write to counters, this serves as an initialization
> + * to the delta-mechanism in pmu->read(); otherwise, the delta would be
> + * wrong when pmu->read is called for the first time.
> + */
> + local64_set(&hwc->prev_count, read_counter(hwc->idx));
> +}
> +
> +/*
> + * pmu->add: add the event to PMU.
> + */
> +static int riscv_pmu_add(struct perf_event *event, int flags)
> +{
> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> + struct hw_perf_event *hwc = &event->hw;
> +
> + if (cpuc->n_events == riscv_pmu->num_counters)
> + return -ENOSPC;
> +
> + /*
> + * We don't have general conunters, so no binding-event-to-counter
> + * process here.
> + *
> + * Indexing using hwc->config generally not works, since config may
> + * contain extra information, but here the only info we have in
> + * hwc->config is the event index.
> + */
> + hwc->idx = hwc->config;
> + cpuc->events[hwc->idx] = event;
> + cpuc->n_events++;
> +
> + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> +
> + if (flags & PERF_EF_START)
> + riscv_pmu_start(event, PERF_EF_RELOAD);
> +
> + return 0;
> +}
> +
> +/*
> + * pmu->del: delete the event from PMU.
> + */
> +static void riscv_pmu_del(struct perf_event *event, int flags)
> +{
> + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> + struct hw_perf_event *hwc = &event->hw;
> +
> + cpuc->events[hwc->idx] = NULL;
> + cpuc->n_events--;
> + riscv_pmu_stop(event, PERF_EF_UPDATE);
> + perf_event_update_userpage(event);
> +}
> +
> +/*
> + * Interrupt
> + */
> +
> +static DEFINE_MUTEX(pmc_reserve_mutex);
> +typedef void (*perf_irq_t)(void *riscv_perf_irq);
> +perf_irq_t perf_irq;
> +
> +void riscv_pmu_handle_irq(void *riscv_perf_irq)
> +{
> +}
> +
> +static perf_irq_t reserve_pmc_hardware(void)
> +{
> + perf_irq_t old;
> +
> + mutex_lock(&pmc_reserve_mutex);
> + old = perf_irq;
> + perf_irq = &riscv_pmu_handle_irq;
> + mutex_unlock(&pmc_reserve_mutex);
> +
> + return old;
> +}
> +
> +void release_pmc_hardware(void)
> +{
> + mutex_lock(&pmc_reserve_mutex);
> + perf_irq = NULL;
> + mutex_unlock(&pmc_reserve_mutex);
> +}
> +
> +/*
> + * Event Initialization
> + */
> +
> +static atomic_t riscv_active_events;
> +
> +static void riscv_event_destroy(struct perf_event *event)
> +{
> + if (atomic_dec_return(&riscv_active_events) == 0)
> + release_pmc_hardware();
> +}
> +
> +static int riscv_event_init(struct perf_event *event)
> +{
> + struct perf_event_attr *attr = &event->attr;
> + struct hw_perf_event *hwc = &event->hw;
> + perf_irq_t old_irq_handler = NULL;
> + int code;
> +
> + if (atomic_inc_return(&riscv_active_events) == 1)
> + old_irq_handler = reserve_pmc_hardware();
> +
> + if (old_irq_handler) {
> + pr_warn("PMC hardware busy (reserved by oprofile)\n");
> + atomic_dec(&riscv_active_events);
> + return -EBUSY;
> + }
> +
> + switch (event->attr.type) {
> + case PERF_TYPE_HARDWARE:
> + code = riscv_pmu->map_hw_event(attr->config);
> + break;
> + case PERF_TYPE_HW_CACHE:
> + code = riscv_pmu->map_cache_event(attr->config);
> + break;
> + case PERF_TYPE_RAW:
> + return -EOPNOTSUPP;
> + default:
> + return -ENOENT;
> + }
> +
> + event->destroy = riscv_event_destroy;
> + if (code < 0) {
> + event->destroy(event);
> + return code;
> + }
> +
> + /*
> + * idx is set to -1 because the index of a general event should not be
> + * decided until binding to some counter in pmu->add().
> + *
> + * But since we don't have such support, later in pmu->add(), we just
> + * use hwc->config as the index instead.
> + */
> + hwc->config = code;
> + hwc->idx = -1;
> +
> + return 0;
> +}
> +
> +/*
> + * Initialization
> + */
> +
> +static struct pmu min_pmu = {
> + .name = "riscv-base",
> + .event_init = riscv_event_init,
> + .add = riscv_pmu_add,
> + .del = riscv_pmu_del,
> + .start = riscv_pmu_start,
> + .stop = riscv_pmu_stop,
> + .read = riscv_pmu_read,
> +};
> +
> +static const struct riscv_pmu riscv_base_pmu = {
> + .pmu = &min_pmu,
> + .max_events = ARRAY_SIZE(riscv_hw_event_map),
> + .map_hw_event = riscv_map_hw_event,
> + .hw_events = riscv_hw_event_map,
> + .map_cache_event = riscv_map_cache_event,
> + .cache_events = &riscv_cache_event_map,
> + .counter_width = 63,
> + .num_counters = RISCV_BASE_COUNTERS + 0,
> +};
> +
> +struct pmu * __weak __init riscv_init_platform_pmu(void)
> +{
> + riscv_pmu = &riscv_base_pmu;
> + return riscv_pmu->pmu;
> +}
> +
> +int __init init_hw_perf_events(void)
> +{
> + struct pmu *pmu = riscv_init_platform_pmu();
> +
> + perf_irq = NULL;
> + perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
> + return 0;
> +}
> +arch_initcall(init_hw_perf_events);
> --
> 2.16.2
>

2018-04-03 03:17:10

by Palmer Dabbelt

[permalink] [raw]

Subject: Re: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

On Mon, 02 Apr 2018 05:31:22 PDT (-0700), [email protected] wrote:
> This implements the baseline PMU for RISC-V platforms.
>
> To ease future PMU portings, a guide is also written, containing
> perf concepts, arch porting practices and some hints.
>
> Changes in v2:
> - Fix the bug reported by Alex, which was caused by not sufficient
> initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
> discussion.
>
> Alan Kao (2):
> perf: riscv: preliminary RISC-V support
> perf: riscv: Add Document for Future Porting Guide
>
> Documentation/riscv/pmu.txt | 249 +++++++++++++++++++
> arch/riscv/Kconfig | 12 +
> arch/riscv/include/asm/perf_event.h | 76 +++++-
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 802 insertions(+), 4 deletions(-)
> create mode 100644 Documentation/riscv/pmu.txt
> create mode 100644 arch/riscv/kernel/perf_event.c

I'm having some trouble pulling this into my tree. I think you might have
another patch floating around somewhere, as I don't have any
arch/riscv/include/asm/perf_event.h right now.

Do you mind rebasing this on top of linux-4.16 so I can look properly?

Thanks!

2018-04-04 02:10:07

by Alex Solomatnikov

[permalink] [raw]

Subject: Re: [PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide

Doc fixes:

diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
index a3e930e..ae90a5e 100644
--- a/Documentation/riscv/pmu.txt
+++ b/Documentation/riscv/pmu.txt
@@ -20,7 +20,7 @@ the lack of the following general architectural
performance monitoring features:
* Enabling/Disabling counters
Counters are just free-running all the time in our case.
* Interrupt caused by counter overflow
- No such design in the spec.
+ No such feature in the spec.
* Interrupt indicator
It is not possible to have many interrupt ports for all counters, so an
interrupt indicator is required for software to tell which counter has
@@ -159,14 +159,14 @@ interrupt for perf, so the details are to be
completed in the future.

They seem symmetric but perf treats them quite differently. For reading, there
is a *read* interface in *struct pmu*, but it serves more than just reading.
-According to the context, the *read* function not only read the content of the
-counter (event->count), but also update the left period to the next interrupt
+According to the context, the *read* function not only reads the content of the
+counter (event->count), but also updates the left period for the next interrupt
(event->hw.period_left).

But the core of perf does not need direct write to counters. Writing counters
-hides behind the abstraction of 1) *pmu->start*, literally start
counting so one
+is hidden behind the abstraction of 1) *pmu->start*, literally start
counting so one
has to set the counter to a good value for the next interrupt; 2)
inside the IRQ
-it should set the counter with the same reason.
+it should set the counter to the same reasonable value.

Reading is not a problem in RISC-V but writing would need some effort, since
counters are not allowed to be written by S-mode.
@@ -190,37 +190,37 @@ Three states (event->hw.state) are defined:
A normal flow of these state transitions are as follows:

* A user launches a perf event, resulting in calling to *event_init*.
-* When being context-switched in, *add* is called by the perf core, with flag
- PERF_EF_START, which mean that the event should be started after it is added.
- In this stage, an general event is binded to a physical counter, if any.
+* When being context-switched in, *add* is called by the perf core, with a flag
+ PERF_EF_START, which means that the event should be started after
it is added.
+ At this stage, a general event is bound to a physical counter, if any.
The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE,
because it is now
stopped, and the (software) event count does not need updating.
** *start* is then called, and the counter is enabled.
- With flag PERF_EF_RELOAD, it write the counter to an appropriate
value (check
- previous section for detail).
- No writing is made if the flag does not contain PERF_EF_RELOAD.
- The state now is reset to none, because it is neither stopped nor update
- (the counting already starts)
-* When being context-switched out, *del* is called. It then checkout all the
- events in the PMU and call *stop* to update their counts.
+ With flag PERF_EF_RELOAD, it writes an appropriate value to the
counter (check
+ the previous section for details).
+ Nothing is written if the flag does not contain PERF_EF_RELOAD.
+ The state now is reset to none, because it is neither stopped nor updated
+ (the counting already started)
+* When being context-switched out, *del* is called. It then checks out all the
+ events in the PMU and calls *stop* to update their counts.
** *stop* is called by *del*
and the perf core with flag PERF_EF_UPDATE, and it often shares the same
subroutine as *read* with the same logic.
The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.

-** Life cycles of these two pairs: *add* and *del* are called repeatedly as
+** Life cycle of these two pairs: *add* and *del* are called repeatedly as
tasks switch in-and-out; *start* and *stop* is also called when the perf core
needs a quick stop-and-start, for instance, when the interrupt
period is being
adjusted.

-Current implementation is sufficient for now and can be easily extend to
+Current implementation is sufficient for now and can be easily
extended with new
features in the future.

A. Related Structures
---------------------

-* struct pmu: include/linux/perf_events.h
-* struct riscv_pmu: arch/riscv/include/asm/perf_events.h
+* struct pmu: include/linux/perf_event.h
+* struct riscv_pmu: arch/riscv/include/asm/perf_event.h

Both structures are designed to be read-only.

@@ -231,13 +231,13 @@ perf's internal state machine (check
kernel/events/core.c for details).
*struct riscv_pmu* defines PMU-specific parameters. The naming follows the
convention of all other architectures.

-* struct perf_event: include/linux/perf_events.h
+* struct perf_event: include/linux/perf_event.h
* struct hw_perf_event

The generic structure that represents perf events, and the hardware-related
details.

-* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h
+* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h

The structure that holds the status of events, has two fixed members:
the number of events and the array of the events.

On Mon, Apr 2, 2018 at 5:31 AM, Alan Kao <[email protected]> wrote:
> Cc: Nick Hu <[email protected]>
> Cc: Greentime Hu <[email protected]>
> Signed-off-by: Alan Kao <[email protected]>
> ---
> Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 249 insertions(+)
> create mode 100644 Documentation/riscv/pmu.txt
>
> diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
> new file mode 100644
> index 000000000000..a3e930ed5141
> --- /dev/null
> +++ b/Documentation/riscv/pmu.txt
> @@ -0,0 +1,249 @@
> +Supporting PMUs on RISC-V platforms
> +==========================================
> +Alan Kao <[email protected]>, Mar 2018
> +
> +Introduction
> +------------
> +
> +As of this writing, perf_event-related features mentioned in The RISC-V ISA
> +Privileged Version 1.10 are as follows:
> +(please check the manual for more details)
> +
> +* [m|s]counteren
> +* mcycle[h], cycle[h]
> +* minstret[h], instret[h]
> +* mhpeventx, mhpcounterx[h]
> +
> +With such function set only, porting perf would require a lot of work, due to
> +the lack of the following general architectural performance monitoring features:
> +
> +* Enabling/Disabling counters
> + Counters are just free-running all the time in our case.
> +* Interrupt caused by counter overflow
> + No such design in the spec.
> +* Interrupt indicator
> + It is not possible to have many interrupt ports for all counters, so an
> + interrupt indicator is required for software to tell which counter has
> + just overflowed.
> +* Writing to counters
> + There will be an SBI to support this since the kernel cannot modify the
> + counters [1]. Alternatively, some vendor considers to implement
> + hardware-extension for M-S-U model machines to write counters directly.
> +
> +This document aims to provide developers a quick guide on supporting their
> +PMUs in the kernel. The following sections briefly explain perf' mechanism
> +and todos.
> +
> +You may check previous discussions here [1][2]. Also, it might be helpful
> +to check the appendix for related kernel structures.
> +
> +
> +1. Initialization
> +-----------------
> +
> +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
> +various methods according to perf's internal convention and PMU-specific
> +parameters. One should declare such instance to represent the PMU. By default,
> +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
> +basic support to a baseline QEMU model.
> +
> +Then he/she can either assign the instance's pointer to *riscv_pmu* so that
> +the minimal and already-implemented logic can be leveraged, or invent his/her
> +own *riscv_init_platform_pmu* implementation.
> +
> +In other words, existing sources of *riscv_base_pmu* merely provide a
> +reference implementation. Developers can flexibly decide how many parts they
> +can leverage, and in the most extreme case, they can customize every function
> +according to their needs.
> +
> +
> +2. Event Initialization
> +-----------------------
> +
> +When a user launches a perf command to monitor some events, it is first
> +interpreted by the userspace perf tool into multiple *perf_event_open*
> +system calls, and then each of them calls to the body of *event_init*
> +member function that was assigned in the previous step. In *riscv_base_pmu*'s
> +case, it is *riscv_event_init*.
> +
> +The main purpose of this function is to translate the event provided by user
> +into bitmap, so that HW-related control registers or counters can directly be
> +manipulated. The translation is based on the mappings and methods provided in
> +*riscv_pmu*.
> +
> +Note that some features can be done in this stage as well:
> +
> +(1) interrupt setting, which is stated in the next section;
> +(2) privilege level setting (user space only, kernel space only, both);
> +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*;
> +(4) tweaks for non-sampling events, which will be utilized by functions such as
> +*perf_adjust_period*, usually something like the follows:
> +
> +if (!is_sampling_event(event)) {
> + hwc->sample_period = x86_pmu.max_period;
> + hwc->last_period = hwc->sample_period;
> + local64_set(&hwc->period_left, hwc->sample_period);
> +}
> +
> +In the case of *riscv_base_pmu*, only (3) is provided for now.
> +
> +
> +3. Interrupt
> +------------
> +
> +3.1. Interrupt Initialization
> +
> +This often occurs at the beginning of the *event_init* method. In common
> +practice, this should be a code segment like
> +
> +int x86_reserve_hardware(void)
> +{
> + int err = 0;
> +
> + if (!atomic_inc_not_zero(&pmc_refcount)) {
> + mutex_lock(&pmc_reserve_mutex);
> + if (atomic_read(&pmc_refcount) == 0) {
> + if (!reserve_pmc_hardware())
> + err = -EBUSY;
> + else
> + reserve_ds_buffers();
> + }
> + if (!err)
> + atomic_inc(&pmc_refcount);
> + mutex_unlock(&pmc_reserve_mutex);
> + }
> +
> + return err;
> +}
> +
> +And the magic is in *reserve_pmc_hardware*, which usually does atomic
> +operations to make implemented IRQ accessible from some global function pointer.
> +*release_pmc_hardware* serves the opposite purpose, and it is used in event
> +destructors mentioned in previous section.
> +
> +(Note: From the implementations in all the architectures, the *reserve/release*
> +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading.
> +It does NOT deal with the binding between an event and a physical counter,
> +which will be introduced in the next section.)
> +
> +3.2. IRQ Structure
> +
> +Basically, a IRQ runs the following pseudo code:
> +
> +for each hardware counter that triggered this overflow
> +
> + get the event of this counter
> +
> + // following two steps are defined as *read()*,
> + // check the section Reading/Writing Counters for details.
> + count the delta value since previous interrupt
> + update the event->count (# event occurs) by adding delta, and
> + event->hw.period_left by subtracting delta
> +
> + if the event overflows
> + sample data
> + set the counter appropriately for the next overflow
> +
> + if the event overflows again
> + too frequently, throttle this event
> + fi
> + fi
> +
> +end for
> +
> +However as of this writing, none of the RISC-V implementations have designed an
> +interrupt for perf, so the details are to be completed in the future.
> +
> +4. Reading/Writing Counters
> +---------------------------
> +
> +They seem symmetric but perf treats them quite differently. For reading, there
> +is a *read* interface in *struct pmu*, but it serves more than just reading.
> +According to the context, the *read* function not only read the content of the
> +counter (event->count), but also update the left period to the next interrupt
> +(event->hw.period_left).
> +
> +But the core of perf does not need direct write to counters. Writing counters
> +hides behind the abstraction of 1) *pmu->start*, literally start counting so one
> +has to set the counter to a good value for the next interrupt; 2) inside the IRQ
> +it should set the counter with the same reason.
> +
> +Reading is not a problem in RISC-V but writing would need some effort, since
> +counters are not allowed to be written by S-mode.
> +
> +
> +5. add()/del()/start()/stop()
> +-----------------------------
> +
> +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
> +starts/stop the counter of some event in the PMU. All of them take the same
> +arguments: *struct perf_event *event* and *int flag*.
> +
> +Consider perf as a state machine, then you will find that these functions serve
> +as the state transition process between those states.
> +Three states (event->hw.state) are defined:
> +
> +* PERF_HES_STOPPED: the counter is stopped
> +* PERF_HES_UPTODATE: the event->count is up-to-date
> +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now
> +
> +A normal flow of these state transitions are as follows:
> +
> +* A user launches a perf event, resulting in calling to *event_init*.
> +* When being context-switched in, *add* is called by the perf core, with flag
> + PERF_EF_START, which mean that the event should be started after it is added.
> + In this stage, an general event is binded to a physical counter, if any.
> + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
> + stopped, and the (software) event count does not need updating.
> +** *start* is then called, and the counter is enabled.
> + With flag PERF_EF_RELOAD, it write the counter to an appropriate value (check
> + previous section for detail).
> + No writing is made if the flag does not contain PERF_EF_RELOAD.
> + The state now is reset to none, because it is neither stopped nor update
> + (the counting already starts)
> +* When being context-switched out, *del* is called. It then checkout all the
> + events in the PMU and call *stop* to update their counts.
> +** *stop* is called by *del*
> + and the perf core with flag PERF_EF_UPDATE, and it often shares the same
> + subroutine as *read* with the same logic.
> + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
> +
> +** Life cycles of these two pairs: *add* and *del* are called repeatedly as
> + tasks switch in-and-out; *start* and *stop* is also called when the perf core
> + needs a quick stop-and-start, for instance, when the interrupt period is being
> + adjusted.
> +
> +Current implementation is sufficient for now and can be easily extend to
> +features in the future.
> +
> +A. Related Structures
> +---------------------
> +
> +* struct pmu: include/linux/perf_events.h
> +* struct riscv_pmu: arch/riscv/include/asm/perf_events.h
> +
> + Both structures are designed to be read-only.
> +
> + *struct pmu* defines some function pointer interfaces, and most of them take
> +*struct perf_event* as a main argument, dealing with perf events according to
> +perf's internal state machine (check kernel/events/core.c for details).
> +
> + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the
> +convention of all other architectures.
> +
> +* struct perf_event: include/linux/perf_events.h
> +* struct hw_perf_event
> +
> + The generic structure that represents perf events, and the hardware-related
> +details.
> +
> +* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h
> +
> + The structure that holds the status of events, has two fixed members:
> +the number of events and the array of the events.
> +
> +References
> +----------
> +
> +[1] https://github.com/riscv/riscv-linux/pull/124
> +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
> --
> 2.16.2
>

2018-04-05 05:04:07

by Alan Kao

[permalink] [raw]

Subject: Re: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

On Tue, Apr 03, 2018 at 03:45:17PM -0700, Palmer Dabbelt wrote:
> On Tue, 03 Apr 2018 07:29:02 PDT (-0700), [email protected] wrote:
> >On Mon, Apr 02, 2018 at 08:15:44PM -0700, Palmer Dabbelt wrote:
> >>On Mon, 02 Apr 2018 05:31:22 PDT (-0700), [email protected] wrote:
> >>>This implements the baseline PMU for RISC-V platforms.
> >>>
> >>>To ease future PMU portings, a guide is also written, containing
> >>>perf concepts, arch porting practices and some hints.
> >>>
> >>>Changes in v2:
> >>> - Fix the bug reported by Alex, which was caused by not sufficient
> >>> initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
> >>> discussion.
> >>>
> >>>Alan Kao (2):
> >>> perf: riscv: preliminary RISC-V support
> >>> perf: riscv: Add Document for Future Porting Guide
> >>>
> >>> Documentation/riscv/pmu.txt | 249 +++++++++++++++++++
> >>> arch/riscv/Kconfig | 12 +
> >>> arch/riscv/include/asm/perf_event.h | 76 +++++-
> >>> arch/riscv/kernel/Makefile | 1 +
> >>> arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
> >>> 5 files changed, 802 insertions(+), 4 deletions(-)
> >>> create mode 100644 Documentation/riscv/pmu.txt
> >>> create mode 100644 arch/riscv/kernel/perf_event.c
> >>
> >>I'm having some trouble pulling this into my tree. I think you might have
> >>another patch floating around somewhere, as I don't have any
> >>arch/riscv/include/asm/perf_event.h right now.
> >>
> >>Do you mind rebasing this on top of linux-4.16 so I can look properly?
> >>
> >>Thanks!
> >
> >Sorry for the inconvenience, but this patch was based on Alex's patch at
> >https://github.com/riscv/riscv-linux/pull/124/files. I thought that one
> >had already been picked into your tree.
> >
> >Any ideas?
>
> Thanks, it applies on top of that. I'm going to play around with this a
> bit, but it looks generally good.

Note that to make it work better when wraparound occurs, you should change the
value of *.counter_width* into the width of real hardware counters. This is
because this patch does not handle wraparound checking, so using a wider
bit mask may sometimes report a extremely large number.

Ideally this should be done by adding a Kconfig option called
"Hifive Unleashed PMU" which automatically sets the width an reuses most of the
baseline codes. What do you think about this?

Thanks.

2018-04-05 05:11:16

by Alan Kao

[permalink] [raw]

Subject: Re: [PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide

Hi Alex,

On Tue, Apr 03, 2018 at 07:08:43PM -0700, Alex Solomatnikov wrote:
> Doc fixes:
>
>

Thanks for these fixes. I'll edit this patch and send a v3 once I am done
with the PMU patch.

I suppose a "Reviewed-by: Alex Solomatnikov" appending at the end of the
commit will be great, right?

Alan

> diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
> index a3e930e..ae90a5e 100644
> --- a/Documentation/riscv/pmu.txt
> +++ b/Documentation/riscv/pmu.txt
> @@ -20,7 +20,7 @@ the lack of the following general architectural
> performance monitoring features:
> * Enabling/Disabling counters
> Counters are just free-running all the time in our case.
> * Interrupt caused by counter overflow
> - No such design in the spec.
> + No such feature in the spec.
> * Interrupt indicator
> It is not possible to have many interrupt ports for all counters, so an
> interrupt indicator is required for software to tell which counter has
> @@ -159,14 +159,14 @@ interrupt for perf, so the details are to be
> completed in the future.
>
> They seem symmetric but perf treats them quite differently. For reading, there
> is a *read* interface in *struct pmu*, but it serves more than just reading.
> -According to the context, the *read* function not only read the content of the
> -counter (event->count), but also update the left period to the next interrupt
> +According to the context, the *read* function not only reads the content of the
> +counter (event->count), but also updates the left period for the next interrupt
> (event->hw.period_left).
>
> But the core of perf does not need direct write to counters. Writing counters
> -hides behind the abstraction of 1) *pmu->start*, literally start
> counting so one
> +is hidden behind the abstraction of 1) *pmu->start*, literally start
> counting so one
> has to set the counter to a good value for the next interrupt; 2)
> inside the IRQ
> -it should set the counter with the same reason.
> +it should set the counter to the same reasonable value.
>
> Reading is not a problem in RISC-V but writing would need some effort, since
> counters are not allowed to be written by S-mode.
> @@ -190,37 +190,37 @@ Three states (event->hw.state) are defined:
> A normal flow of these state transitions are as follows:
>
> * A user launches a perf event, resulting in calling to *event_init*.
> -* When being context-switched in, *add* is called by the perf core, with flag
> - PERF_EF_START, which mean that the event should be started after it is added.
> - In this stage, an general event is binded to a physical counter, if any.
> +* When being context-switched in, *add* is called by the perf core, with a flag
> + PERF_EF_START, which means that the event should be started after
> it is added.
> + At this stage, a general event is bound to a physical counter, if any.
> The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE,
> because it is now
> stopped, and the (software) event count does not need updating.
> ** *start* is then called, and the counter is enabled.
> - With flag PERF_EF_RELOAD, it write the counter to an appropriate
> value (check
> - previous section for detail).
> - No writing is made if the flag does not contain PERF_EF_RELOAD.
> - The state now is reset to none, because it is neither stopped nor update
> - (the counting already starts)
> -* When being context-switched out, *del* is called. It then checkout all the
> - events in the PMU and call *stop* to update their counts.
> + With flag PERF_EF_RELOAD, it writes an appropriate value to the
> counter (check
> + the previous section for details).
> + Nothing is written if the flag does not contain PERF_EF_RELOAD.
> + The state now is reset to none, because it is neither stopped nor updated
> + (the counting already started)
> +* When being context-switched out, *del* is called. It then checks out all the
> + events in the PMU and calls *stop* to update their counts.
> ** *stop* is called by *del*
> and the perf core with flag PERF_EF_UPDATE, and it often shares the same
> subroutine as *read* with the same logic.
> The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
>
> -** Life cycles of these two pairs: *add* and *del* are called repeatedly as
> +** Life cycle of these two pairs: *add* and *del* are called repeatedly as
> tasks switch in-and-out; *start* and *stop* is also called when the perf core
> needs a quick stop-and-start, for instance, when the interrupt
> period is being
> adjusted.
>
> -Current implementation is sufficient for now and can be easily extend to
> +Current implementation is sufficient for now and can be easily
> extended with new
> features in the future.
>
> A. Related Structures
> ---------------------
>
> -* struct pmu: include/linux/perf_events.h
> -* struct riscv_pmu: arch/riscv/include/asm/perf_events.h
> +* struct pmu: include/linux/perf_event.h
> +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
>
> Both structures are designed to be read-only.
>
> @@ -231,13 +231,13 @@ perf's internal state machine (check
> kernel/events/core.c for details).
> *struct riscv_pmu* defines PMU-specific parameters. The naming follows the
> convention of all other architectures.
>
> -* struct perf_event: include/linux/perf_events.h
> +* struct perf_event: include/linux/perf_event.h
> * struct hw_perf_event
>
> The generic structure that represents perf events, and the hardware-related
> details.
>
> -* struct riscv_hw_events: arch/riscv/include/asm/perf_events.h
> +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
>
> The structure that holds the status of events, has two fixed members:
> the number of events and the array of the events.
>
>

2018-04-05 16:30:38

by Palmer Dabbelt

[permalink] [raw]

Subject: Re: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

On Wed, 04 Apr 2018 22:02:29 PDT (-0700), [email protected] wrote:
> On Tue, Apr 03, 2018 at 03:45:17PM -0700, Palmer Dabbelt wrote:
>> On Tue, 03 Apr 2018 07:29:02 PDT (-0700), [email protected] wrote:
>> >On Mon, Apr 02, 2018 at 08:15:44PM -0700, Palmer Dabbelt wrote:
>> >>On Mon, 02 Apr 2018 05:31:22 PDT (-0700), [email protected] wrote:
>> >>>This implements the baseline PMU for RISC-V platforms.
>> >>>
>> >>>To ease future PMU portings, a guide is also written, containing
>> >>>perf concepts, arch porting practices and some hints.
>> >>>
>> >>>Changes in v2:
>> >>> - Fix the bug reported by Alex, which was caused by not sufficient
>> >>> initialization. Check https://lkml.org/lkml/2018/3/31/251 for the
>> >>> discussion.
>> >>>
>> >>>Alan Kao (2):
>> >>> perf: riscv: preliminary RISC-V support
>> >>> perf: riscv: Add Document for Future Porting Guide
>> >>>
>> >>> Documentation/riscv/pmu.txt | 249 +++++++++++++++++++
>> >>> arch/riscv/Kconfig | 12 +
>> >>> arch/riscv/include/asm/perf_event.h | 76 +++++-
>> >>> arch/riscv/kernel/Makefile | 1 +
>> >>> arch/riscv/kernel/perf_event.c | 468 ++++++++++++++++++++++++++++++++++++
>> >>> 5 files changed, 802 insertions(+), 4 deletions(-)
>> >>> create mode 100644 Documentation/riscv/pmu.txt
>> >>> create mode 100644 arch/riscv/kernel/perf_event.c
>> >>
>> >>I'm having some trouble pulling this into my tree. I think you might have
>> >>another patch floating around somewhere, as I don't have any
>> >>arch/riscv/include/asm/perf_event.h right now.
>> >>
>> >>Do you mind rebasing this on top of linux-4.16 so I can look properly?
>> >>
>> >>Thanks!
>> >
>> >Sorry for the inconvenience, but this patch was based on Alex's patch at
>> >https://github.com/riscv/riscv-linux/pull/124/files. I thought that one
>> >had already been picked into your tree.
>> >
>> >Any ideas?
>>
>> Thanks, it applies on top of that. I'm going to play around with this a
>> bit, but it looks generally good.
>
> Note that to make it work better when wraparound occurs, you should change the
> value of *.counter_width* into the width of real hardware counters. This is
> because this patch does not handle wraparound checking, so using a wider
> bit mask may sometimes report a extremely large number.
>
> Ideally this should be done by adding a Kconfig option called
> "Hifive Unleashed PMU" which automatically sets the width an reuses most of the
> baseline codes. What do you think about this?

We're working through this now :)